JP2021071957A - Information processing apparatus, control method, and program

Info

Publication number: JP2021071957A
Application number: JP2019198690A
Authority: JP
Inventors: 下郡山　敬己 (Itsuki Shimokooriyama)
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2021-05-06
Anticipated expiration: 2039-10-31
Also published as: JP7464814B2

Abstract

To provide a technology for selecting an appropriate learning model during a search for data. SOLUTION: The present invention is an information processing apparatus that manages a plurality of learning models, and comprises: receiving means that presents groups into which data retrieved according to a search condition has been analyzed and classified, and receives designation of a group; and selection means that selects a learning model according to first characteristic data based on the data classified into the designated group and second characteristic data associated with each of the learning models. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for selecting a learning model according to the characteristics of data.

Conventionally, in order to present appropriate search results to a user, there are techniques that calculate, as a statistical value, the relevance between a search condition and the terms (character strings extracted by a fixed criterion such as morphological analysis or N-grams) contained in each document of a document group. These techniques are called similarity search and the like (hereinafter, in the description of the present invention, they are uniformly referred to as similarity search, and are distinguished from the search by rank learning described later).

There is also a rank learning technique that improves the accuracy of similarity search: the features observed when learning data and the document group to be searched are similar are modeled by machine learning, and when a new search condition is specified, the ranking is adjusted based on that learning model.

Rank learning requires a large amount of learning data, but collecting that data is difficult. Learning data could be collected from users' search logs after the similarity search has gone into operation as a system, but evaluating search results places a burden on users, so it cannot be said with certainty that a sufficient amount of logs can be collected. Before operation starts, the available data is limited to, for example, learning data created by developers for testing.

Patent Document 1 concerns a technique that, for answers prepared in advance (a group of FAQ documents, so to speak), finds the question most similar to a user's inquiry (a question sentence in the learning data) and returns the corresponding answer. For that technique, it provides a way to improve topic estimation accuracy even when there are few question sentences.

Specifically, words appearing in a question sentence of the learning data are replaced with words in the corresponding answer, thereby expanding the question sentences of the learning data, that is, increasing the number of learning data items. In addition, to exclude unnatural sentences from the expanded question sentences, the existence probability of each question sentence is calculated using a probabilistic language model, and a sentence is used as learning data only when its existence probability exceeds a certain threshold.

Japanese Unexamined Patent Publication No. 2017-37588

However, although the technique of Patent Document 1 uses a probabilistic language model to determine whether an expanded question sentence is appropriate, the substituted words are taken from the answers prepared in advance, so technical terms or terms specific to a particular organization may be used. In that case, the probabilistic language model lacks sufficient examples, and the question sentences may not be expanded appropriately.

Furthermore, the technique of Patent Document 1 aims to enhance the learning effect by expanding the question sentences used as learning data. However, as the number of learning data items increases, the computation time required for learning becomes enormous, and the technique may become impractical.

An object of the present invention is to provide a technique for selecting an appropriate learning model when searching for data.

The present invention is an information processing apparatus that manages a plurality of learning models, characterized by comprising: receiving means that presents groups into which data retrieved according to a search condition has been analyzed and classified, and receives designation of a group; and selection means that selects a learning model according to first feature data based on the data classified into the designated group and second feature data associated with each of the learning models.

According to the present invention, it is possible to provide a technique for selecting an appropriate learning model when searching for data.

FIG. 1 is a diagram showing an example of a functional configuration according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a hardware configuration applicable to the information processing apparatus 100 according to the embodiment of the present invention.
FIG. 3 is an example of a user interface for entering a search condition and designating a correct answer from the search results according to the embodiment of the present invention.
FIG. 4 is an example of a search result of a similarity search according to the embodiment of the present invention.
FIG. 5 is an example of the result of clustering the search results according to the embodiment of the present invention.
FIG. 6 is an example of the result of drilling down using the clustering according to the embodiment of the present invention.
FIG. 7 is an example of the structure of the learning data registered in the learning data storage unit according to the embodiment of the present invention.
FIG. 8 is an example of a flowchart explaining the learning model generation process according to the embodiment of the present invention.
FIG. 9 is an example of a diagram explaining the group similarity and grouping of the learning data according to the embodiment of the present invention.
FIG. 10 is an example of a diagram explaining the data structure used when storing a learning model and the similarity calculation for the cluster selected at search time according to the embodiment of the present invention.
FIG. 11 is an example of a flowchart explaining the search process according to the embodiment of the present invention.

In the present invention, machine learning is used to reassign the ranking of results obtained by a conventional document search. This is called rank learning. For convenience of explanation, the process of determining a learning model in advance is called "generation of a learning model", and the process of reassigning the order of search results actually obtained from a search condition given by a user or the like, using the generated learning model, is called "re-ranking".

The present invention has the following three features. The first is how learning data is stored so that the learning models described later can be generated for a document group that has no classification information. The second is how, based on that learning data, the document group serving as learning data is restricted to a subset when generating each learning model. The third is how an appropriate learning model is selected from among a plurality of learning models when re-ranking a document group that was clustered dynamically at search time.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram showing an example of a functional configuration according to an embodiment of the present invention. The search condition receiving unit 101 receives a search condition (a character string) from a search user or another program and sends it to the similarity search unit 102. The similarity search unit 102 searches the document storage unit 121 and acquires the search result, that is, a list of the documents that hit the search condition. Various well-known techniques exist for this search process, such as calculating the similarity between the search condition and each document based on word frequency and returning the top-ranked documents as the document list, so a detailed description is omitted.
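As one illustration of the kind of well-known similarity search referred to above, the following is a minimal sketch that scores documents by the cosine similarity of raw term counts. It is not the method of the embodiment, which leaves the search technique open. The function names, the whitespace tokenizer (standing in for morphological analysis or N-grams), and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    # Whitespace tokenization is an assumption; a real system would use
    # morphological analysis or N-grams to extract terms.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(query, documents, top_k=10):
    # Return (doc_id, score) pairs with a non-zero score, best first.
    q = term_vector(query)
    scored = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in documents.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical document store keyed by document ID.
docs = {
    "0005": "otani batting record major league",
    "0006": "otani pitching high school",
    "0100": "weather forecast tokyo",
}
results = similarity_search("otani batting record", docs)
```

Here `results` lists the documents that share at least one term with the query, ordered by score; documents with no overlap are dropped from the list.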

The document list obtained as the search result is passed to the clustering unit 103, which divides it into clusters based on the similarity between documents, computed by natural language processing. Clustering is also a well-known technique, and its description is omitted. Some clustering methods always assign each document to exactly one cluster, while others allow a document to belong to multiple clusters; either may be used in the present invention.

The display unit 104 displays the clusters and, for example, has the user select one of them. It receives the cluster selected by the user and displays the document group classified into that cluster (a part of the document list of the search result). At that time, the learning model selection unit 105 selects an appropriate learning model from the learning model storage unit 123 according to the cluster selected by the user, re-ranks the document group according to that learning model, and displays the result on the display unit 104.

In addition, the user is asked, for example, to select one document from the document group contained in the cluster selected on the display unit 104. That document is passed to the learning data registration unit 107 as the correct answer for the search condition, and the learning data registration unit 107 composes a learning data item and registers it in the learning data storage unit 122.

The learning model generation unit 108 generates learning models for re-ranking using the learning data stored in the learning data storage unit 122. Rather than generating a single learning model from all of the learning data, the generation model determination unit 109 groups the learning data using information about the document group that was contained in the cluster when each learning data item was registered, and the learning model generation unit 108 generates a learning model for each of those groups.

FIG. 2 is a block diagram showing an example of a hardware configuration applicable to the information processing apparatus 100 according to the embodiment of the present invention.

As shown in FIG. 2, the information processing apparatus 100 has a configuration in which a CPU (Central Processing Unit) 201, a RAM (Random Access Memory) 202, a ROM (Read Only Memory) 203, an input controller 205, a video controller 206, a memory controller 207, a communication I/F controller 208, and the like are connected via a system bus 204.

The CPU 201 comprehensively controls each device and controller connected to the system bus 204.

The ROM 203 or the external memory 211 stores the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs of the CPU 201, as well as the various programs, described later, that are needed to realize the functions executed by each server or PC. Information necessary for carrying out the present invention is also stored there. The external memory may be a database.

The RAM 202 functions as the main memory, work area, and the like of the CPU 201. When executing a process, the CPU 201 loads the necessary programs and the like from the ROM 203 or the external memory 211 into the RAM 202 and executes the loaded programs to realize various operations.

The input controller 205 controls input from a keyboard (KB) 209 and from pointing devices such as a mouse (not shown).

The video controller 206 controls display on a display device such as the display 210. The display device may be, for example, a liquid crystal display. These are used by an administrator as needed.

The memory controller 207 controls access to the external memory 211, such as an external storage device (hard disk (HD)) that stores a boot program, various applications, font data, user files, edit files, and various data, a flexible disk (FD), or a CompactFlash (registered trademark) memory connected via an adapter to a PCMCIA (Personal Computer Memory Card International Association) card slot.

The communication I/F controller 208 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, communication using TCP/IP (Transmission Control Protocol/Internet Protocol) is possible.

The CPU 201 enables display on the display 210 by, for example, executing outline font expansion (rasterization) into a display information area in the RAM 202. The CPU 201 also enables user instructions via a mouse cursor (not shown) or the like on the display 210.

The various programs, described later, for realizing the present invention are recorded in the external memory 211 and are executed by the CPU 201 after being loaded into the RAM 202 as needed.

Next, an outline of the search will be described with reference to FIGS. 3 to 6. FIG. 3 is an example of a user interface for entering a search condition and designating a correct answer from the search results according to the embodiment of the present invention.

The condition search input screen 301 is a screen on which the search user enters "Otani's batting record" in the question text field 311 and presses the search button 312 to start the search.

The document data that the search user ultimately wants to browse is described using the document browsing screen 302. The display area 321 shows that the document ID is "0028", together with the title, the body text, and so on.

If the search user reads the document displayed in the display area 321 and judges that it is exactly the information he or she wanted, pressing the correct answer button 322 registers it in the learning data storage unit 122.

However, before reaching document ID "0028", the user is first presented with the search result list 400 of FIG. 4. In general, the search result list 400 is often presented to the search user as is, but some commercial systems classify the results automatically to limit the document group presented to the search user. Although it depends on the number of registered documents, when several hundred documents hit a search condition it is hard for the search user to reach the desired information. Narrowing the results to a few dozen by some method, for example by having the search user specify a further condition, makes the desired information easier to reach.

One such method is to cluster the document group that hit the search condition and present documents with similar content as groups called clusters. The search user first selects the cluster for the information he or she wants, which limits the documents to be checked to a subset, as described above.

Even though the search result list of FIG. 4 consists of information about the player Ichiro Otani, it can be expected to divide into clusters such as his high school days, the period when he played in Japanese professional baseball, and his results in the major leagues. In FIG. 5, these classifications are presented to the search user as the cluster list example 501.

When the search user selects "Major League" of the cluster 503, the list of documents classified into that cluster (600 in FIG. 6) is displayed. The search result list 400 of FIG. 4 contained a variety of information about Ichiro Otani, whereas the document list 600 of FIG. 6 contains only articles about his major league career. Limiting the document list in this way lets the search user easily reach the desired information (document ID "0028" in FIG. 3).

Naturally, clustering produces different groups when the document group that hits the question sentence specified by the search user (311 in FIG. 3) is different. If the search uses the question sentence "baseball in the United States", as in 502 of FIG. 5, the hits are not limited to information about Ichiro Otani, and documents different from those in the example of FIG. 4 are returned. Clustering those documents then produces different clusters. In the example of 502, a cluster called Major League (504) is generated, but although its heading is the same, it contains many documents entirely different from those in 503. For example, it might contain documents on the position of the major leagues within American baseball, or the records of star players other than Ichiro Otani. Thus, even when clusters carry the same title, the title "Major League" itself has little meaning. What matters more is the list of document IDs contained in each cluster, because that list represents the linguistic features of the cluster.

FIG. 7 is an example of the structure of the learning data according to the embodiment of the present invention. In rank learning, the learning data used to generate a learning model is obtained, for example, when a user actually performs a search and designates a document that matches his or her search intention. Therefore, at least the user's search condition and information identifying the document selected by the user must be registered as a pair. In the learning data 701, these are stored as the question sentence 703 and the correct answer document ID 704.

The learning data 701 (a to f) stored in the learning data storage unit 122 in the present invention is characterized in that, in addition to the question sentence 703 and the correct answer document ID 704, it stores a same-cluster document ID list 705.

For example, FIG. 6 shows that the search user who entered the search condition "Otani's batting record" selected "Major League" of the cluster 503. That cluster contains the document IDs "0005", "0006", ..., "0028", so the same-cluster document ID list 705 of the learning data 701a records this list of document IDs as is.
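The learning data record described for FIG. 7 can be sketched as a small data class. This is a minimal sketch under assumptions: the class name, field names, and the `register_correct_answer` helper are hypothetical, and the ID format follows the "L0001" style seen in FIG. 9. Only the three stored items (question sentence 703, correct answer document ID 704, same-cluster document ID list 705) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearningData:
    learning_data_id: str          # e.g. "L0001" (format assumed)
    question: str                  # question sentence 703
    correct_doc_id: str            # correct answer document ID 704
    same_cluster_doc_ids: List[str] = field(default_factory=list)  # list 705


def register_correct_answer(store, question, correct_doc_id, cluster_doc_ids):
    # Called when the user presses the correct answer button: the IDs of all
    # documents in the cluster selected at that time are saved with the pair.
    record = LearningData(
        learning_data_id=f"L{len(store) + 1:04d}",
        question=question,
        correct_doc_id=correct_doc_id,
        same_cluster_doc_ids=list(cluster_doc_ids),
    )
    store.append(record)
    return record


store = []  # stands in for the learning data storage unit 122
rec = register_correct_answer(
    store, "Otani's batting record", "0028", ["0005", "0006", "0028"]
)
```

Keeping the cluster membership with each record is what later allows learning data to be grouped without any classification information in the documents themselves.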

Next, the learning model generation process will be described with reference to FIGS. 8 and 9. FIG. 8 is an example of a flowchart explaining the learning model generation process according to the embodiment of the present invention. Each step of the flowchart of FIG. 8 is executed by the CPU 201 on the information processing apparatus 100.

<Embodiment 1>

In step S801, all of the learning data stored in the learning data storage unit 122 is read. The data that was read is shown as the document ID vector (Example 1) 901 of the learning data in FIG. 9. This table is essentially the same as FIG. 7, but whereas FIG. 7 represents the document IDs as a list, here they are represented as a vector with one element per document ID (for example, if the document IDs run from 1 to 500). Each element of the vector is "1" if that document ID is included in the same-cluster document ID list 705 and "0" if it is not.
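The conversion from a same-cluster document ID list to the 0/1 vector of step S801 can be sketched as follows. The function name is hypothetical, and the sketch assumes, as in the example in the text, that document IDs are the numbers 1 to 500 written as zero-padded strings.

```python
def to_doc_id_vector(same_cluster_doc_ids, num_docs):
    # One element per document ID: 1 if the ID is in the list, else 0.
    # Assumes IDs are numeric strings from "0001" up to num_docs.
    vector = [0] * num_docs
    for doc_id in same_cluster_doc_ids:
        vector[int(doc_id) - 1] = 1
    return vector


vec = to_doc_id_vector(["0005", "0006", "0028"], num_docs=500)
```

The resulting `vec` has 500 elements, with ones only at the positions of documents 5, 6, and 28.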

In step S802, the learning data (901) entries that share the same correct answer document ID (911) are merged, as in the document ID vector (Example 2) 902 of the learning data in FIG. 9. For example, two entries, "L0001" and "L0003", have the correct answer document ID "0028". These are merged as in the first row of 902. Specifically, the two learning data IDs are listed in the learning data ID list 916, and whereas the document ID vector of 901 indicated only presence (1) or absence (0) for each ID, the merged vector indicates how many times each ID occurred in total. For example, the value for document ID "0005" is "1" in 914 but "2" in 917. Simple summation is only an example; needless to say, various other calculation methods are possible.
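The merge in step S802 can be sketched as an element-wise sum of the vectors that share a correct answer document ID. The function name, the row layout, and the shortened 6-element vectors are assumptions for illustration; simple summation is the example the text itself gives, and other aggregation rules are possible.

```python
def merge_by_correct_doc(rows):
    # rows: iterable of (learning_data_id, correct_doc_id, doc_id_vector).
    # Returns {correct_doc_id: {"learning_data_ids": [...], "vector": [...]}}.
    merged = {}
    for learning_id, correct_id, vector in rows:
        if correct_id not in merged:
            merged[correct_id] = {"learning_data_ids": [], "vector": [0] * len(vector)}
        entry = merged[correct_id]
        entry["learning_data_ids"].append(learning_id)
        # Element-wise sum: counts how often each document shared a cluster
        # with this correct answer document.
        entry["vector"] = [a + b for a, b in zip(entry["vector"], vector)]
    return merged


# Toy 6-dimensional vectors instead of the 500 dimensions in the text.
rows = [
    ("L0001", "0028", [1, 1, 0, 0, 0, 1]),
    ("L0002", "0040", [0, 0, 1, 1, 0, 0]),
    ("L0003", "0028", [1, 0, 0, 0, 0, 1]),
]
merged = merge_by_correct_doc(rows)
```

In this toy data, the entries "L0001" and "L0003" collapse into one row for document "0028", and a position that was 1 in both vectors becomes 2.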

In step S803, the learning data of 902 is grouped. The purpose is to determine which learning data to use when generating a single learning model. If the documents contained some classification information, the learning data could be grouped by, for example, collecting the entries whose correct answer documents share the same classification information. The present invention, however, assumes that such classification information does not exist or cannot be used. It therefore assumes that learning data entries whose drill-down results, as in FIG. 6, contained similar document groups are to be used together to generate one learning model.

One well-known technique for grouping the learning data here is vector clustering. In Example 2 of FIG. 9, 500-dimensional vectors, corresponding to the number of documents, are compared with one another and clustered. There are also overlapping clustering techniques that allow the same vector to belong to multiple clusters. In any case, needless to say, any method may be used as long as it divides these vectors into clusters.
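Since the text allows any vector clustering method, the following is only one possible sketch: a greedy, single-pass grouping that assigns each vector to the most similar existing group when the cosine similarity reaches a threshold, and otherwise starts a new group. The function names, the threshold value, and the toy 4-dimensional vectors are assumptions; this is not presented as the clustering used in the embodiment.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_vectors(vectors, threshold=0.5):
    # vectors: {correct_doc_id: merged document ID vector}.
    # Each group keeps its member keys and the running sum of their vectors.
    groups = []
    for key, vec in vectors.items():
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["sum"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"members": [key], "sum": list(vec)})
        else:
            best["members"].append(key)
            best["sum"] = [a + b for a, b in zip(best["sum"], vec)]
    return groups


vectors = {
    "0028": [2, 1, 0, 0],
    "0030": [1, 1, 0, 0],
    "0040": [0, 0, 1, 2],
    "0041": [0, 0, 1, 1],
}
groups = group_vectors(vectors)
```

A single-pass scheme like this depends on input order; a production system would more likely use an established clustering algorithm, which the description leaves open.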

As a result, even when the document group has no classification information in advance, as described above, the learning data can be divided into appropriate groups and a plurality of learning models can be generated. The grouped learning data is illustrated as the groups 903 (a, b) in FIG. 9.

The loop from S804 to S806 is performed for each of the learning data groups formed in S803.

In S805, the grouped learning data groups (for example, 903a and 903b in FIG. 9) are taken one at a time, and a learning model is generated from the learning data of that group. For rank learning, this can be realized with an SVM (support vector machine) or the like. The generated learning model is stored in the learning model storage unit 123.
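The loop of S804 to S806 amounts to training one model per group and storing it. The sketch below shows only that control flow. The trainer is passed in as a function, and the `count_correct_answers` trainer used here is a deliberately trivial placeholder, not an SVM or any real rank learning method. All names, the group keys, and the second sample question are assumptions for illustration.

```python
def generate_models(groups, train_fn):
    # groups: {group_id: list of (question, correct_doc_id) pairs}.
    # Returns a dict standing in for the learning model storage unit 123.
    model_store = {}
    for group_id, learning_data in groups.items():   # S804 to S806 loop
        model_store[group_id] = train_fn(learning_data)  # S805
    return model_store


def count_correct_answers(learning_data):
    # Placeholder "model": how often each document was marked correct.
    # A real system would train a ranking model (for example an SVM) here.
    counts = {}
    for _question, correct_doc_id in learning_data:
        counts[correct_doc_id] = counts.get(correct_doc_id, 0) + 1
    return counts


groups = {
    "903a": [("Otani's batting record", "0028"), ("Otani home runs", "0028")],
    "903b": [("baseball in the United States", "0040")],
}
model_store = generate_models(groups, count_correct_answers)
```

Passing the trainer as a parameter keeps the per-group loop independent of whichever rank learning method is actually chosen.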

This completes the description of learning model generation in the present invention according to the flowchart of FIG. 8.

FIG. 10 is an example of a diagram explaining the data structure used when storing a learning model and the similarity calculation for the cluster selected at search time according to the embodiment of the present invention. First, the learning models stored in the learning model storage unit 123 will be described. In this figure, two learning models are assumed to be stored.

The body of a learning model generated by an SVM (support vector machine) or the like is stored in the learning model storage unit 123. In the prior art, however, no information is included about the conditions under which each learning model should be used, and the application (or user) that uses the learning models has to choose the one to use from among them. As a premise of the present invention, the document group has no fixed information corresponding to classification information, and the clusters of FIG. 5 are generated dynamically, so which learning model to use can only be decided at search time.

A feature of the present invention is that each learning model is stored in association with a linguistic feature 1003 of the learning model, which indicates the situations in which that learning model is to be used.

As an example of the setting data 1005 of the linguistic features 1003, the case where it is expressed as a document ID vector (sum) is shown. The document ID vector (sum) 1006 is the simple sum of the document ID vectors of the learning data included in group 903a of FIG. 9. In other words, it records the tendency, in the learning data used to generate this learning model, of which documents appeared in the same cluster as the correct document and how often. In similarity search and document clustering this is a kind of linguistic feature, and in the present invention it serves as the linguistic feature of the learning model.
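The summation described above can be illustrated with a minimal sketch. It assumes that each document ID vector is a binary vector over a fixed, ordered list of all document IDs (1 where the document appeared in the same cluster as the correct document); that encoding, the function names, and the sample IDs are hypothetical and are not specified in this description.

```python
from typing import List, Sequence


def document_id_vector(cluster_doc_ids: Sequence[str],
                       all_doc_ids: Sequence[str]) -> List[int]:
    """Binary vector (assumed encoding): 1 where a document is in the cluster."""
    present = set(cluster_doc_ids)
    return [1 if doc_id in present else 0 for doc_id in all_doc_ids]


def summed_document_id_vector(training_clusters: Sequence[Sequence[str]],
                              all_doc_ids: Sequence[str]) -> List[int]:
    """Element-wise sum of the document ID vectors of one learning data group."""
    total = [0] * len(all_doc_ids)
    for cluster in training_clusters:
        vec = document_id_vector(cluster, all_doc_ids)
        total = [t + v for t, v in zip(total, vec)]
    return total


# Hypothetical data: three learning data items in one group (cf. 903a).
ALL_DOC_IDS = ["d1", "d2", "d3", "d4"]
group_903a = [["d1", "d2"], ["d1", "d3"], ["d1", "d2", "d3"]]
feature_1006 = summed_document_id_vector(group_903a, ALL_DOC_IDS)
print(feature_1006)  # [3, 2, 2, 0]
```

Each element of the result counts how often a document co-occurred with the correct document across the group, which is the tendency that the document ID vector (sum) 1006 is described as recording.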

図１１は、本発明の実施形態に係る検索処理を説明するフローチャートの一例である。図１１のフローチャートの各ステップは、情報処理装置１００上のＣＰＵ２０１で実行される。 FIG. 11 is an example of a flowchart for explaining the search process according to the embodiment of the present invention. Each step of the flowchart of FIG. 11 is executed by the CPU 201 on the information processing apparatus 100.

ステップＳ１１０１では、検索ユーザあるいはアプリケーションから検索条件を受け付け、ステップＳ１１０２では、文書記憶部１２１から当該検索条件にヒットする文書群を取得する。 In step S1101, the search condition is received from the search user or the application, and in step S1102, the document group that hits the search condition is acquired from the document storage unit 121.

In step S1103, the document group acquired in step S1102 is clustered. An example is 501 in FIG. 5, which shows three clusters.

In step S1104, the list of clusters is presented to the user, and in step S1105 the user selects one of the presented clusters, that is, drills down. Suppose the search user selects the cluster "Major League" from 501 in FIG. 5.

In step S1106, the list of IDs of the documents included in the cluster selected by the user ("Major League" in the example) is generated as a vector. The result is 1007 in FIG. 10 (the document ID vector of the cluster automatically generated at search time). Next, among the document ID vectors (sum) 1006 of 1001a to 1001b, the one with the highest similarity to vector 1007 is identified.
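As an illustration of step S1106, the following sketch picks the learning model whose stored vector is most similar to the cluster vector. The description does not name a similarity measure; cosine similarity is used here purely as an assumption, and the model IDs and vectors are hypothetical sample data.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (an assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_model(cluster_vector: Sequence[float],
                       model_features: Dict[str, Sequence[float]]
                       ) -> Tuple[Optional[str], float]:
    """Return (model ID, similarity) of the best-matching stored vector."""
    best_id, best_score = None, -1.0
    for model_id, feature in model_features.items():
        score = cosine_similarity(cluster_vector, feature)
        if score > best_score:
            best_id, best_score = model_id, score
    return best_id, best_score


# Hypothetical summed vectors for two stored models and one selected cluster.
model_features = {"1001a": [3, 2, 2, 0], "1001b": [0, 0, 1, 4]}
cluster_vector_1007 = [1, 1, 0, 0]
best_id, best_score = most_similar_model(cluster_vector_1007, model_features)
print(best_id)  # 1001a
```

Cosine similarity ignores vector length, so a model trained on many learning data items is not favored merely because its summed vector has larger values; other measures could be substituted without changing the overall flow.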

Even the learning model with the highest similarity, however, may be inappropriate to adopt. In step S1107, the similarity is therefore compared with a threshold stored in a storage unit (not shown). If no learning model has a vector similarity exceeding the threshold (NO), the ranking of the similarity search results of step S1102 may be presented to the user as is, without using a learning model. If an appropriate learning model exists, the results of the similarity search of step S1102 are re-ranked in step S1108 and presented to the user as search results in step S1109.
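The branch in steps S1107 to S1109 can be sketched as follows. The learning model is represented here as a plain scoring function, which is an assumption made for illustration; the function name, threshold value, and scores are hypothetical.

```python
from typing import Callable, List, Optional


def rerank_or_fallback(results: List[str],
                       best_model: Optional[Callable[[str], float]],
                       best_score: float,
                       threshold: float) -> List[str]:
    """Re-rank with the selected model only if its similarity exceeds the threshold.

    Otherwise the original similarity-search ranking is returned unchanged.
    """
    if best_model is None or best_score <= threshold:
        return list(results)
    # Higher model score means a better rank; sorted() is stable for ties.
    return sorted(results, key=best_model, reverse=True)


# Hypothetical similarity-search ranking and a toy scoring function.
similarity_ranking = ["d1", "d2", "d3"]
toy_scores = {"d1": 0.1, "d2": 0.9, "d3": 0.5}


def toy_model(doc_id: str) -> float:
    return toy_scores[doc_id]


print(rerank_or_fallback(similarity_ranking, toy_model, 0.86, 0.5))
# ['d2', 'd3', 'd1']
print(rerank_or_fallback(similarity_ranking, toy_model, 0.30, 0.5))
# ['d1', 'd2', 'd3']
```

Returning the original ranking when no model qualifies keeps the search usable even when the selected cluster resembles none of the stored learning models.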

Finally, in step S1110, when the user designates one document among the presented search results as having been appropriate for the user's own search, that designation is accepted as a correct-answer selection, and in step S1111 it is registered in the learning data storage unit 122 as new learning data. This learning data is used the next time learning models are generated.
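Steps S1110 and S1111 can be pictured with the sketch below. The record fields (search condition, IDs of documents in the selected cluster, correct document ID) and the class names are assumptions for illustration only; the actual layout of the learning data storage unit 122 is not restated here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class LearningRecord:
    """One learning data item (hypothetical field layout)."""
    search_condition: str
    cluster_doc_ids: List[str]
    correct_doc_id: str


@dataclass
class LearningDataStore:
    """In-memory stand-in for the learning data storage unit 122."""
    records: List[LearningRecord] = field(default_factory=list)

    def register(self, search_condition: str,
                 cluster_doc_ids: Sequence[str],
                 correct_doc_id: str) -> LearningRecord:
        # The user picks the correct answer from the presented results,
        # so it is expected to be among the cluster's documents.
        if correct_doc_id not in cluster_doc_ids:
            raise ValueError("correct document is not in the presented cluster")
        record = LearningRecord(search_condition, list(cluster_doc_ids),
                                correct_doc_id)
        self.records.append(record)
        return record


store = LearningDataStore()
store.register("baseball schedule", ["d1", "d2", "d3"], "d2")
print(len(store.records))  # 1
```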

This completes the description, with reference to the flowchart of FIG. 11, of the process of selecting an appropriate learning model through clustering and drill-down, re-ranking the documents that appear in the cluster, and presenting them to the user.
<Embodiment 2>
Another embodiment will be described. In the flowchart of FIG. 8, learning data having the same correct-answer document ID is merged into one in step S802, but this processing may be omitted.

In that case, learning data having the same correct-answer document ID may be included in different learning data groups. Even when the correct document is the same, if the users' search intentions differ, the linguistic features of the search conditions also differ, and the other document IDs included in the same cluster may be entirely different. There is no need to force such learning data together and use it to generate the same learning model; using it to generate different learning models makes it possible to generate learning models that better reflect the user's intent.
<Embodiment 3>
In the example of FIG. 10, the setting data 1005 of the linguistic features 1003 of the learning model is the document ID vector (sum) 1006, but it goes without saying that any data may be used as long as it represents the linguistic features of the cluster.

For example, feature words (important words and the like) may be extracted by natural language processing from the "question sentence" of the learning data used to generate the corresponding learning model and from the text of the correct document, and stored in 1005.

この場合、検索時にも検索ユーザが選択したクラスタに含まれる文書から特徴語を抽出して、１００５と比較しても良い。 In this case, the feature words may be extracted from the documents included in the cluster selected by the search user during the search and compared with 1005.
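A minimal sketch of this feature-word comparison follows. It assumes whitespace tokenization, term frequency as the measure of importance, and Jaccard overlap as the comparison; none of these choices is prescribed by the description, and Japanese text would need morphological analysis in place of `split()`. All names and sample texts are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set


def feature_words(texts: Iterable[str], top_k: int = 3,
                  stopwords: FrozenSet[str] = frozenset()) -> Set[str]:
    """Pick the top_k most frequent words as feature words (assumed criterion)."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in stopwords)
    return {word for word, _ in counts.most_common(top_k)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature-word sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical feature words stored with a model (cf. setting data 1005).
model_words = {"baseball", "pitcher", "league"}
cluster_texts = [
    "major league baseball pitcher",
    "league baseball schedule",
    "baseball league",
]
cluster_words = feature_words(cluster_texts, top_k=2)
print(sorted(cluster_words))  # ['baseball', 'league']
score = jaccard(model_words, cluster_words)
```

The resulting score can play the same role as the vector similarity in the search flow, including comparison against a threshold.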

The features also need not be the words themselves. A model based on a well-known technique is available: "Tomas Mikolov, Kai Chen, and Jeffrey Dean, Efficient estimation of word representations in vector space, CoRR, Vol. abs/1301.3781, 2013".
With this technique, words appearing in a large body of documents are learned so that each is represented as, for example, a 200-dimensional feature vector. The content of a document can then be regarded as the sum of those feature vectors. Accordingly, the features of the question sentences and of the documents in the clusters that contributed to generating a learning model can be expressed as a feature vector, and at search time a feature vector can be generated from the documents in the cluster selected by the user so that the similarity between the two can be compared.
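The idea of summing word feature vectors can be sketched as below. The three-dimensional embeddings are toy values invented for illustration (a trained model would supply vectors of, for example, 200 dimensions), and cosine similarity is again an assumed comparison measure.

```python
import math
from typing import Dict, List, Sequence


def text_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Sum the feature vectors of the known words; unknown words are skipped."""
    total = [0.0] * dim
    for word in words:
        vec = embeddings.get(word)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy embeddings (hypothetical values, 3 dimensions for readability).
EMB = {
    "baseball": [1.0, 0.0, 0.0],
    "pitcher": [0.8, 0.2, 0.0],
    "stock": [0.0, 0.0, 1.0],
    "market": [0.0, 0.1, 0.9],
}
model_vec = text_vector(["baseball", "pitcher"], EMB, 3)
sports_cluster = text_vector(["pitcher", "baseball", "unknown"], EMB, 3)
finance_cluster = text_vector(["stock", "market"], EMB, 3)
print(cosine(model_vec, sports_cluster) > cosine(model_vec, finance_cluster))
# True
```

Because related words receive nearby vectors, two texts can score as similar even when they share no words, which is the benefit over comparing document IDs or feature words directly.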

これにより、単なる文書ＩＤで構成される数値のベクトルの類似度や、特徴語の類似度だけではなく、より意味的に類似した学習モデルを選択することが可能になるという効果が得られる。 This has the effect of making it possible to select learning models that are more semantically similar, not just the similarity of numerical vectors composed of document IDs and the similarity of feature words.

クラスタに含まれる文書一覧を最適に再ランク付けするための学習モデルを選択するための方法であれば、いかなる素性を利用しても良いことはいうまでもない。 It goes without saying that any feature can be used as long as it is a method for selecting a learning model for optimally re-ranking the list of documents contained in the cluster.

また、本実施例では文書を対象としたが、データとして検索、分類、評価が可能な画像等、様々な種類のデータにも適用可能である。 Further, although a document is targeted in this embodiment, it can be applied to various types of data such as images that can be searched, classified, and evaluated as data.

なお、上述した各種データの構成及びその内容はこれに限定されるものではなく、用途や目的に応じて、様々な構成や内容で構成されることは言うまでもない。 It should be noted that the structure and contents of the various data described above are not limited to this, and it goes without saying that the structure and contents are various depending on the intended use and purpose.

Several embodiments have been described above. The present invention can be embodied as, for example, a system, an apparatus, a method, a computer program, or a recording medium; specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The computer program of the present invention is a computer program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 8 and 11, and the storage medium of the present invention stores a computer program that enables a computer to execute the processing methods of FIGS. 8 and 11. The computer program of the present invention may be a separate computer program for each processing method of each apparatus in FIGS. 8 and 11.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which a computer program realizing the functions of the above embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the computer program stored on the recording medium.

この場合、記録媒体から読み出されたコンピュータプログラム自体が本発明の新規な機能を実現することになり、そのコンピュータプログラムを記憶した記録媒体は本発明を構成することになる。 In this case, the computer program itself read from the recording medium realizes the novel function of the present invention, and the recording medium storing the computer program constitutes the present invention.

As the recording medium for supplying the computer program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, non-volatile memory card, ROM, EEPROM, silicon disk, solid state drive, or the like can be used.

It also goes without saying that the present invention covers not only the case where the functions of the above embodiments are realized by the computer executing the computer program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of that computer program, and the functions of the above embodiments are realized by that processing.

Furthermore, it goes without saying that the present invention also covers the case where the computer program read from the recording medium is written to a memory provided on a function expansion board inserted in the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the computer program code, and the functions of the above embodiments are realized by that processing.

また、本発明は、複数の機器から構成されるシステムに適用しても、１つの機器からなる装置に適用してもよい。また、本発明は、システムあるいは装置にコンピュータプログラムを供給することによって達成される場合にも適応できることは言うまでもない。この場合、本発明を達成するためのコンピュータプログラムを格納した記録媒体を該システムあるいは装置に読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。 Further, the present invention may be applied to a system composed of a plurality of devices or a device composed of one device. It goes without saying that the present invention can also be applied when it is achieved by supplying a computer program to a system or device. In this case, by reading the recording medium in which the computer program for achieving the present invention is stored into the system or the device, the system or the device can enjoy the effect of the present invention.

さらに、本発明を達成するためのコンピュータプログラムをネットワーク上のサーバ、データベース等から通信プログラムによりダウンロードして読み出すことによって、そのシステムあるいは装置が、本発明の効果を享受することが可能となる。 Further, by downloading and reading a computer program for achieving the present invention from a server, database, or the like on the network by a communication program, the system or device can enjoy the effect of the present invention.

なお、上述した各実施形態およびその変形例を組み合わせた構成も全て本発明に含まれるものである。 It should be noted that all the configurations in which each of the above-described embodiments and modifications thereof are combined are also included in the present invention.

100 Information processing apparatus
101 Search condition reception unit
102 Similarity search unit
103 Clustering unit
104 Display unit
105 Learning model selection unit
106 Re-ranking unit
107 Learning data registration unit
108 Learning model generation unit
109 Generation model determination unit
121 Document storage unit
122 Learning data storage unit
123 Learning model storage unit

Claims

An information processing apparatus that manages a plurality of learning models, comprising:
reception means for presenting groups into which data retrieved according to a search condition has been analyzed and classified, and for accepting designation of a group; and
selection means for selecting a learning model according to first feature data based on the data classified into the designated group and second feature data associated with each of the learning models.

The information processing device according to claim 1, wherein the learning model is a learning model for ranking learning that ranks the searched data.

The information processing apparatus according to claim 1 or 2, wherein the data is a document, and the first feature data and the second feature data represent, respectively, linguistic features of the data classified into the designated group and linguistic features of the data used for training the learning model.

The information processing apparatus according to any one of claims 1 to 3, wherein the first feature data and the second feature data are, respectively, data indicating the relationship between the group and the data classified into the group, and data indicating the relationship between the learning model and the data used for training the learning model.

The information processing apparatus according to any one of claims 1 to 4, further comprising display control means for controlling display of the retrieved data using the selected learning model.

A control method for an information processing apparatus that manages a plurality of learning models, the method comprising:
a reception step in which reception means presents groups into which data retrieved according to a search condition has been analyzed and classified, and accepts designation of a group; and
a selection step in which selection means selects a learning model according to first feature data based on the data classified into the designated group and second feature data associated with each of the learning models.

A program executable by an information processing apparatus that manages a plurality of learning models, the program causing the information processing apparatus to function as:
reception means for presenting groups into which data retrieved according to a search condition has been analyzed and classified, and for accepting designation of a group; and
selection means for selecting a learning model according to first feature data based on the data classified into the designated group and second feature data associated with each of the learning models.