JP2021182329A - Learning model selection method

Info

Publication number: JP2021182329A
Application number: JP2020088266A
Authority: JP
Inventors: 祐介船矢; Yusuke Funaya; 実佳高田; Mika TAKATA; 俊彦樫山; Toshihiko Kashiyama
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-05-20
Filing date: 2020-05-20
Publication date: 2021-11-25
Also published as: WO2021235177A1

Abstract

To select, from among existing models, a learning model that solves problems on new data with high accuracy. SOLUTION: A model selection device separates input data into test data and learning data. The learning data obtained from the input data is then input to one or more existing models to create a model by transfer learning from each of them. Next, the test data obtained from the input data is input to each transfer-learning model, and the mean squared error between the objective variable obtained by inference and the true value of the corresponding item in the input data is calculated. A transfer model with a small mean squared error is then selected as the optimum model. When selecting existing models, a statistic (variance, mean) of a data item of each model is compared with the same statistic of the input data, and models with a smaller difference are selected. SELECTED DRAWING: Figure 3

Description

The present invention relates to a learning model selection method, and more particularly to a learning model selection method suitable for selecting an optimum model from a small amount of data in transfer learning.

With advances in information processing technology and the base technologies that support it, expectations for machine learning are rising in various fields. Machine learning is a technique that constructs a learning model (also referred to simply as a "model" in this specification) from a large amount of data or from certain rules, and derives an optimum solution to a problem according to that learning model.

In general, machine learning achieves higher accuracy as the amount of data increases. Conversely, when data is scarce, it is difficult to create a sufficiently accurate model. Therefore, even when prediction systems with the same purpose are built, accuracy may be sufficient for a project in which a large amount of data is available but insufficient for a project with little data.

In the field of machine learning, known methods for achieving high accuracy with a small amount of data include reusing a model created from other data (an existing model) in its entirety, and reusing part of an existing model through transfer learning that uses the existing model and new data. Transfer learning using an existing model and new data is, for example, a method in which the new data is input to the existing model, the inference result is combined with the new data, and a new model is created from the combined data.
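
The form of transfer learning described above (feed the new data to the existing model, append the inference result to the new data, then train a new model on the combined data) can be sketched as follows. This is only an illustrative sketch: the 1-nearest-neighbour learner, the `tokyo_model` coefficients, and all data values are assumptions made for the example and do not come from the specification.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def augment(existing_model: Model, rows: List[Row]) -> List[List[float]]:
    """Combine each new data row with the existing model's inference result."""
    return [list(row) + [existing_model(row)] for row in rows]


def fit_nearest_neighbour(rows: List[List[float]], targets: List[float]) -> Model:
    """Stand-in learner (assumption): 1-nearest-neighbour regression."""
    def predict(row: Row) -> float:
        def sq_dist(other: Row) -> float:
            return sum((a - b) ** 2 for a, b in zip(row, other))
        best = min(range(len(rows)), key=lambda i: sq_dist(rows[i]))
        return targets[best]
    return predict


def transfer_learn(existing_model: Model, rows: List[Row], targets: List[float]) -> Model:
    """Create a new model from the new data combined with the existing model's output."""
    new_model = fit_nearest_neighbour(augment(existing_model, rows), targets)
    # The same augmentation has to be applied to every row at inference time.
    return lambda row: new_model(list(row) + [existing_model(row)])


def tokyo_model(row: Row) -> float:
    """Hypothetical existing model: price from (age, walk minutes, area)."""
    age, walk, area = row
    return 50.0 * area - 20.0 * age - 10.0 * walk


new_rows = [[10.0, 5.0, 60.0], [30.0, 15.0, 80.0]]  # hypothetical new data
new_prices = [1500.0, 1200.0]
hakodate_model = transfer_learn(tokyo_model, new_rows, new_prices)
```

Any regression learner could take the place of `fit_nearest_neighbour`; the point of the sketch is only that the existing model's output becomes an additional input of the new model.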

In this transfer learning approach, how to select a model that fits the new data is important. A technique for selecting an optimum model for such machine learning is disclosed in, for example, Patent Document 1. In the prediction model selection system described in Patent Document 1, a plurality of prediction models are created from a single data set by varying the initial values. The elements constituting each prediction model are then quantified, and prediction models whose values do not satisfy the selection criteria are excluded. According to Patent Document 1, this makes it possible to select an appropriate prediction model even when the amount of training data is insufficient.

Patent Document 1: International Publication No. 2017/168458

In general, when an optimum model is selected to construct a new learning system, the characteristics of the existing model used for reuse or transfer learning determine the accuracy of the new model. For example, when building a learning system that predicts house prices for each region, a highly accurate model can be built for urban areas with many samples, but sufficient accuracy cannot be obtained for rural areas with few samples. In this case, an urban model could be used for transfer learning, but it is necessary to decide which city's model to select from among the many models built for multiple cities.

Patent Document 1 discloses a technique in which a plurality of prediction models are created from a single data set by varying the initial values, the elements constituting each prediction model are quantified, and prediction models whose values do not satisfy the selection criteria are excluded. However, it does not consider the case of selecting, from among models generated from different data, a model similar to the new data.

An object of the present invention is to provide a learning model selection method that selects, from among existing models, a learning model that solves problems on new data with high accuracy.

The learning model selection method of the present invention is preferably a learning model selection method for selecting, from a plurality of existing models stored in an information processing device, a model suited to a transfer learning technique that creates a highly accurate model from new input data, the method comprising: a step in which the information processing device separates the input data into test data and learning data; a step in which the information processing device inputs the learning data obtained from the input data to the existing models and creates a transfer-learning model from each of one or more existing models; a step in which the information processing device inputs the test data obtained from the input data to each transfer-learning model and calculates the error between the objective variable of the data obtained by inference and the true value of the item corresponding to the objective variable in the input data; and a step in which the information processing device selects a predetermined number of transfer models with small errors as optimum models.

According to the present invention, it is possible to provide a learning model selection method that selects, from among existing models, a learning model that solves problems on new data with high accuracy.

FIG. 1 is a diagram showing the functional configuration and data flow of the model selection device.
FIG. 2 is a diagram showing the hardware and software configuration of the model selection device.
FIG. 3 is a diagram explaining the concept of the model selection method of Embodiment 1 using a specific example.
FIG. 4 is a diagram showing an example of the house data table of the housing analysis system.
FIG. 5 is a diagram showing an example of the statistic item table.
FIG. 6 is a diagram showing an example of the model statistic information table.
FIG. 7 is a diagram showing an example of the statistic comparison table.
FIG. 8 is a diagram showing an example of the transfer model result table.
FIG. 9 is a flowchart showing the series of processes performed by the model selection device of Embodiment 1, from acquiring input data to selecting a model.
FIG. 10 is a flowchart showing the data analysis process of Embodiment 1.
FIG. 11 is a flowchart showing the candidate model search process of Embodiment 1.
FIG. 12 is a diagram showing an example of the priority item input screen.
FIG. 13 is a flowchart showing the candidate model selection process of Embodiment 1.
FIG. 14 is a flowchart showing the transfer learning process.
FIG. 15 is a flowchart showing the transfer model evaluation process.
FIG. 16 is a flowchart showing the optimum model evaluation process.
FIG. 17 is a diagram explaining the concept of the model selection method of Embodiment 2 using a specific example.
FIG. 18 is a diagram showing an example of the model result table.
FIG. 19 is a flowchart showing the series of processes performed by the model selection device of Embodiment 2, from acquiring input data to selecting a model.
FIG. 20 is a flowchart showing the data analysis process of Embodiment 2.
FIG. 21 is a flowchart showing the candidate model search process of Embodiment 2.
FIG. 22 is a diagram showing an example of the model FI management table.
FIG. 23 is a diagram showing an example of the new model FI result table.
FIG. 24 is a diagram showing an example of the model FI comparison table.
FIG. 25 is a flowchart showing the series of processes performed by the model selection device of Embodiment 3, from acquiring input data to selecting a model.
FIG. 26 is a flowchart showing the candidate model search process of Embodiment 3.
FIG. 27 is a flowchart showing the candidate model selection process of Embodiment 3.
FIG. 28 is a diagram showing an example of the transfer candidate selection screen.

Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 28.

[Embodiment 1]
Hereinafter, Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 16.
In the learning model selection method of Embodiment 1, the optimum model to be adapted into a new model is selected from the existing models according to its difference from the values of the input data.

First, the configuration of the model selection device will be described with reference to FIGS. 1 and 2.
As shown in FIG. 1, the model selection device 100 has, as its functional configuration, a candidate model analysis unit 110, a transfer learning unit 120, a model evaluation unit 130, an optimum model selection unit 140, a model management unit 150, and a storage unit 160.

The candidate model analysis unit 110 is a functional unit that retrieves, from the model management DB (database) 200 via the model management unit 150, candidate models for analyzing the input data 10. The transfer learning unit 120 is a functional unit that performs transfer learning on the candidate models using the input data 10. The model evaluation unit 130 is a functional unit that evaluates the models built by transfer learning. The optimum model selection unit 140 is a functional unit that selects the optimum learning model for the input data 10 according to the evaluation by the model evaluation unit 130. The model management unit 150 is a functional unit that, following instructions from the other units, retrieves the required models from the model management DB 200 and inputs, outputs, and processes information on the models. The storage unit 160 is a functional unit that stores data.

The storage unit 160 stores the model management DB 200, the input data DB 210, and the calculation result DB 220.
The model management DB 200 is a DB that stores the models subject to model selection. These models are assumed to have already been created in separate learning processes. The input data DB 210 is a DB that stores the input data for which a learning system is to be built. The calculation result DB 220 is a DB that stores the results of the calculations performed by the model selection device 100.
The data structures used in the processing of the model selection device 100 will be described in detail later.

Next, the hardware and software configuration of the model selection device will be described with reference to FIG. 2.
The hardware of the model selection device 100 is realized by, for example, a general information processing device such as the personal computer shown in FIG. 2.

In the model selection device 100, a CPU (Central Processing Unit) 502, a main storage device 504, a network I/F (interface) 506, a display I/F 508, an input/output I/F 510, and an auxiliary storage I/F 512 are connected by a bus.

The CPU 502 controls each part of the model selection device 100, and loads the necessary programs into the main storage device 504 and executes them.
The main storage device 504 is usually composed of volatile memory such as RAM, and stores the programs executed by the CPU 502 and the data they reference.

The network I/F 506 is an interface for connecting to a network.

The display I/F 508 is an interface for connecting a display device 520 such as an LCD (Liquid Crystal Display).

The input/output I/F 510 is an interface for connecting input/output devices. In the example of FIG. 2, a keyboard 530 and a mouse 532, which is a pointing device, are connected.

The auxiliary storage I/F 512 is an interface for connecting auxiliary storage devices such as an HDD (Hard Disk Drive) 550 or an SSD (Solid State Drive).
The HDD 550 has a large storage capacity and stores the programs for executing the present embodiment.

A candidate model analysis program 560, a transfer learning program 561, a model construction evaluation program 562, and an optimum model selection program 563 are installed in the model selection device 100. These programs execute the functions of the candidate model analysis unit 110, the transfer learning unit 120, the model evaluation unit 130, and the optimum model selection unit 140, respectively.
The HDD 550 also stores the model management DB 200, the input data DB 210, and the calculation result DB 220.

Next, the concept of the model selection method of the present embodiment will be described concretely with reference to FIG. 3.
As shown in FIG. 3, the present embodiment takes as an example a housing analysis system that analyzes housing data for Hakodate as input data, and describes how the model best suited to transfer learning is selected from the existing models. The example system predicts the house price, which is the objective variable, by building a model that uses the price factors in the housing data (age of the building, walking time to the nearest station, and area) as explanatory variables and performing inference with it.

Here, an explanatory variable is a machine learning term for a variable that affects the result, and the objective variable is the variable representing the value that the system is intended to obtain.

First, assume that existing models have been built from the housing data of other cities such as Tokyo, Sapporo, and Asahikawa. Next, consider building a Hakodate house price prediction model using the learning data of the Hakodate housing data, which is the input data. Because little learning data is available for Hakodate, transfer learning is to be used to improve accuracy. The question is then which of the existing models should be selected.

In the present embodiment, statistics (for example, variance and mean) of specified items are calculated from the Hakodate housing data, and existing models whose statistics are close to them are selected as candidate models. Next, the learning data of the Hakodate housing data is input to the existing models (for example, model (Tokyo), model (Sapporo), ...), and transfer learning is performed to obtain new transfer-learning models (hereinafter referred to as "transfer models"; for example, transfer model (Tokyo), transfer model (Sapporo), ...). The test data of the Hakodate housing data is then input to the transfer models, the error between the objective variable (house price) obtained by inference and the objective variable (house price) of the input data is calculated, and the optimum model is selected by evaluating that error.
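
The flow from transfer learning to selection can be sketched end to end as follows. The sketch is an illustration under stated assumptions rather than the method of the specification: `offset_transfer` is a deliberately simple stand-in for transfer learning (it shifts an existing model by its mean residual on the learning data), and the candidate models and housing values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], float]
Dataset = List[Tuple[Row, float]]  # (explanatory variables, true house price)


def offset_transfer(existing: Model, train: Dataset) -> Model:
    """Stand-in for transfer learning (assumption): shift the existing model
    by its mean residual on the new learning data."""
    bias = sum(y - existing(x) for x, y in train) / len(train)
    return lambda x: existing(x) + bias


def mean_squared_error(model: Model, test: Dataset) -> float:
    """Mean squared error between inferred and true objective-variable values."""
    return sum((model(x) - y) ** 2 for x, y in test) / len(test)


def select_optimum(candidates: Dict[str, Model], train: Dataset,
                   test: Dataset) -> Tuple[str, Dict[str, float]]:
    """Build one transfer model per candidate and pick the smallest test error."""
    errors = {name: mean_squared_error(offset_transfer(model, train), test)
              for name, model in candidates.items()}
    best = min(errors, key=lambda name: errors[name])
    return best, errors


# Hypothetical existing models; each row is (age, walk minutes, area).
candidates: Dict[str, Model] = {
    "tokyo": lambda x: 60.0 * x[2],
    "sapporo": lambda x: 30.0 * x[2],
}
# Hypothetical Hakodate data, already split into learning and test data.
train: Dataset = [([10.0, 5.0, 50.0], 1600.0), ([20.0, 10.0, 70.0], 2200.0)]
test: Dataset = [([5.0, 3.0, 90.0], 2800.0)]

best, errors = select_optimum(candidates, train, test)
```

With these made-up values the Sapporo-based transfer model reproduces the test price exactly, so it is the one selected.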

Next, the data structures used in the model selection device of Embodiment 1 will be described with reference to FIGS. 4 to 8.
The house data table 300 stores the basic housing data of the housing analysis system. As shown in FIG. 4, for example, it consists of the fields house ID 300a, age 300b, walking time to the nearest station 300c, use area 300d, area 300e, floor plan 300f, form 300g, and house price 300h.

The house ID 300a stores an ID that uniquely identifies the house. The age 300b stores the number of years since the house was built. The walking time to the nearest station 300c stores the time it takes to walk from the house to the nearest station. This time is generally converted to a distance at a standard rate of 80 m/min. The use area 300d stores a character string, code, or the like representing the use area (the zoning category defined by the City Planning Act) in which the house is located, for example, "residential only" or "commercial area". The area 300e stores the building area of the house. The floor plan 300f stores a character string or code representing the room layout of the house, for example, 2LDK or 1DK. The form 300g stores a character string, code, or the like representing the building form of the house, for example, apartment, condominium (unit ownership), or detached house. The house price 300h stores the appraised price of the house.

In the present embodiment, the input data 10 and the data used for learning are described using the data represented by this house data table 300 as an example. When machine learning is applied, the house price is the objective variable and the other items are explanatory variables.
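
As a rough illustration of one record of the house data table 300, the fields could be modelled as below. The Python names, the field types, and the sample values are assumptions made for this sketch; only the field list, the split into objective and explanatory variables, and the 80 m/min rate come from the description. Excluding the house ID from the explanatory variables is also an assumption, since it is only an identifier.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Union

WALK_SPEED_M_PER_MIN = 80.0  # standard rate given for the walking-time field


@dataclass(frozen=True)
class HouseRecord:
    house_id: str        # 300a: unique ID of the house
    age_years: float     # 300b: years since the house was built
    walk_minutes: float  # 300c: walking time to the nearest station
    use_area: str        # 300d: zoning category, e.g. "residential only"
    area: float          # 300e: building area
    floor_plan: str      # 300f: room layout, e.g. "2LDK"
    form: str            # 300g: apartment, condominium, detached house
    price: float         # 300h: house price (objective variable)

    def explanatory_variables(self) -> Dict[str, Union[str, float]]:
        """All items except the ID and the objective variable."""
        fields = asdict(self)
        del fields["house_id"], fields["price"]
        return fields

    def walk_distance_m(self) -> float:
        """Walking time converted to a distance at 80 m/min."""
        return self.walk_minutes * WALK_SPEED_M_PER_MIN


# Hypothetical record.
record = HouseRecord("H001", 12.0, 10.0, "residential only",
                     75.0, "2LDK", "detached house", 1800.0)
```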

The statistic item table 420 associates each item of the housing data with each statistic. The item field stores the name of each item shown in FIG. 4, for example, age, walking time to the nearest station, area, and so on. For each of the histogram, mean, and variance columns, the corresponding field stores a character string or code representing the statistic. In FIG. 5, the symbol "statistic i-j" is stored for the i-th record and the j-th statistic field.

The model statistic information table 430 associates statistic information with each model. As shown in FIG. 6, it has the fields model ID 430a, model file path 430b, statistic 1-1, statistic 1-2, statistic 1-3, and so on.

The model ID 430a stores an ID that uniquely identifies the model. The model file path 430b stores the path of the model in the file system. Statistic 1-1, statistic 1-2, statistic 1-3, and so on store the values for the items associated with them by the statistic item table 420 of FIG. 5. For example, statistic 1-1 stores the variance of the age.
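
How the two tables work together can be sketched with plain dictionaries. Everything concrete here is hypothetical: the model IDs, file paths, and values are invented, and apart from "statistic 1-1" being the variance of the age, the assignment of symbols to statistics is an assumption, because the column order of FIG. 5 is not given in the text.

```python
from typing import Dict, List, Union

# Statistic item table 420: item -> statistic kind -> symbol "statistic i-j".
STATISTIC_ITEM_TABLE: Dict[str, Dict[str, str]] = {
    "age": {"variance": "statistic 1-1", "mean": "statistic 1-2"},
    "walk_minutes": {"variance": "statistic 2-1", "mean": "statistic 2-2"},
}

# Model statistic information table 430: one row per existing model.
MODEL_STATISTIC_TABLE: List[Dict[str, Union[str, float]]] = [
    {"model_id": "M001", "model_file_path": "models/tokyo.bin",
     "statistic 1-1": 250.0, "statistic 1-2": 22.0,
     "statistic 2-1": 30.0, "statistic 2-2": 8.0},
    {"model_id": "M002", "model_file_path": "models/sapporo.bin",
     "statistic 1-1": 130.0, "statistic 1-2": 18.0,
     "statistic 2-1": 45.0, "statistic 2-2": 12.0},
]


def lookup_statistic(model_id: str, item: str, kind: str) -> float:
    """Resolve (item, statistic kind) to its symbol, then read the model's value."""
    symbol = STATISTIC_ITEM_TABLE[item][kind]
    for row in MODEL_STATISTIC_TABLE:
        if row["model_id"] == model_id:
            return float(row[symbol])
    raise KeyError(model_id)
```

For example, `lookup_statistic("M002", "age", "variance")` reads the "statistic 1-1" field of the row for model M002.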

The statistic comparison table 400 stores the difference between the statistic of the input data and the statistic of the learning data from which each model was created (also referred to simply as the "model statistic" in this specification). As shown in FIG. 7, it consists of model ID 400a, variance σ 400b, and difference from the variance of the input data 400c. Here, as an example of a statistic, the variance of the distribution of a data item is used, and the difference between the variance of the specified item for the model and the variance of the specified item for the input data is stored in this table.

The model ID 400a stores an ID that uniquely identifies the model. The variance σ 400b stores the variance of the distribution of a given data item for that model. The difference from the variance of the input data 400c stores the difference between the variance of the distribution of that data item for each model and the variance of the distribution of the same data item in the input data.

The transfer model result table 410 stores the difference between the value of the objective-variable item obtained from a transfer model and the true value of that item in the input data. As shown in FIG. 8, it consists of transfer model ID 410a, model file path 410b, and error 410c.

The transfer model ID 410a stores an ID that uniquely identifies the transfer model. The model file path 410b stores the path of the transfer model in the file system. The error 410c stores the mean squared error between the value of the objective-variable item obtained by inference with the transfer model and the true value of that item in the input data (details will be described later).
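
For reference, the mean squared error stored in the error 410c can be written in the standard form below. The symbols are introduced here for illustration and do not appear in the specification: n is the number of test records, y_i is the true house price of the i-th test record, and \hat{y}_i is the price inferred for it by the transfer model.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
```

As a worked example, for inferred prices (1, 2, 5) and true prices (1, 2, 3) the squared errors are 0, 0, and 4, so the mean squared error is 4/3.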

Next, the processing performed by the model selection device will be described with reference to FIGS. 9 to 16.

First, the series of processes performed by the model selection device, from acquiring input data to selecting a model, will be described with reference to FIG. 9.
First, the model selection device 100 performs the data analysis process (S01). The details of the data analysis process will be described later with reference to FIG. 10.

Next, the model selection device 100 performs the candidate model search process (S02). The details of the candidate model search process will be described later with reference to FIG. 11.

Next, the model selection device 100 performs the candidate model selection process (S03). The details of the candidate model selection process will be described later with reference to FIG. 13.

Next, the model selection device 100 performs the transfer learning process (S04). The details of the transfer learning process will be described later with reference to FIG. 14.

Next, the model selection device 100 performs the transfer model evaluation process (S05). The details of the transfer model evaluation process will be described later with reference to FIG. 15.

Next, the model selection device 100 performs the optimum model selection process (S06). The details of the optimum model selection process will be described later with reference to FIG. 16.

Next, the data analysis process will be described with reference to FIG. 10.
First, the candidate model analysis unit 110 of the model selection device 100 acquires the input data 10 from the input data DB 210 via the model management unit 150 (S100).
Next, it acquires, via the model management unit 150, the correspondence between items and statistics from the statistic item table 420 of the model management DB 200 (S101).

Next, each statistic of the items of the input data 10 is calculated according to the correspondence information in the statistic item table 420 (S102).
Next, the calculation results are stored in the calculation result DB 220 (S103).
Next, the acquired input data 10 is separated into test data and learning data (S104). Separating data into test data and learning data is a common technique in the field of machine learning. The split may be made at random; the test data is generally smaller than the learning data. The processes that use the test data and the learning data will each be described later.
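
Steps S102 and S104 might look roughly like the following sketch. It is written under several assumptions that the description does not fix: the population variance is used, the histogram statistic is omitted, the split is a seeded random shuffle, and the 20% test ratio and the sample records are invented for the example.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def item_statistics(records: List[Record],
                    items: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """S102: mean and variance of each listed item of the input data."""
    result: Dict[str, Dict[str, float]] = {}
    for item in items:
        values = [record[item] for record in records]
        result[item] = {
            "mean": statistics.fmean(values),
            "variance": statistics.pvariance(values),  # population variance (assumption)
        }
    return result


def split_input_data(records: List[Record], test_ratio: float = 0.2,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """S104: random split into (test data, learning data); test data is the smaller part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[:n_test], shuffled[n_test:]


# Hypothetical input data: ten records with an age and an area.
input_data: List[Record] = [{"age": 10.0 * i, "area": 50.0 + i} for i in range(1, 11)]
stats = item_statistics(input_data, ["age", "area"])
test_data, learning_data = split_input_data(input_data)
```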

Next, the candidate model search process will be described with reference to FIGS. 11 and 12.
First, the analysis item that the user prioritizes and the statistic to be used for model analysis are received from the priority item input screen 500 (S200).

As shown in FIG. 12, the priority item input screen 500 consists of a priority item list 510 and a statistic list 520, and the user can select entries with a pointing device such as a mouse. In this example, "year of construction" is selected as the priority item and variance is selected as the statistic.

次に、選択された重視項目の選択された統計量に関して、計算結果ＤＢに格納している入力データの重視項目におけるＳ１０３で計算したその統計量の計算結果を取得する（Ｓ２０１）。 Next, with respect to the selected statistic of the selected priority item, the calculation result of the statistic calculated in S103 in the priority item of the input data stored in the calculation result DB is acquired (S201).

次に、モデルＮの統計量の内で、図６のモデル統計量情報テーブル４３０を参照し、選択された重視項目の選択された統計量を取得する（Ｓ２０２）。ここで、Ｎの初期値は、１であり、フローチャートの中では、モデル１、モデル２、…のように処理するように記述している。 Next, among the statistics of the model N, the model statistic information table 430 of FIG. 6 is referred to, and the selected statistic of the selected important item is acquired (S202). Here, the initial value of N is 1, and in the flowchart, it is described that the processing is performed as model 1, model 2, ....

次に、入力データ１０に関し、その重視項目におけるＳ１０３で計算したその統計量の計算結果Ｓ１０３と、モデルＮの選択された重視項目の選択された統計量との差を計算し、図７に示した統計量比較テーブル４００に格納する（Ｓ２０３）。 Next, with respect to the input data 10, the difference between the calculation result S103 of the statistic calculated in S103 of the important item and the selected statistic of the selected important item of the model N is calculated and shown in FIG. It is stored in the statistic comparison table 400 (S203).

そして、モデル管理ＤＢで対象とするモデルの全てについて計算したか判定し（Ｓ２０４）、全てのモデルについて計算したときには（Ｓ２０４：ＹＥＳ）、処理を終了し、全てのモデルについて計算していないときには（Ｓ２０４：ＮＯ）、Ｎをインクリメントし（Ｎ＋＋）、次のモデルＮを取り出し（Ｓ２１０）、Ｓ２０２に戻る。 Then, it is determined in the model management DB whether all the target models have been calculated (S204), when all the models have been calculated (S204: YES), the processing is terminated, and all the models have not been calculated (S204). S204: NO), N is incremented (N ++), the next model N is taken out (S210), and the process returns to S202.
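
The loop S201 to S210 can be sketched as follows. This is an illustration under stated assumptions: the function names, the use of the population variance, and the use of the absolute value of the difference are choices made here, and the model variances are made-up placeholder numbers rather than values from the model statistic information table 430.

```python
from statistics import mean, pvariance

# Statistics the user may pick on the priority item input screen (assumed mapping).
STATISTICS = {"mean": mean, "variance": pvariance}

def compute_statistic(records, item, statistic):
    """Compute the chosen statistic of one item over the input records (as in S103)."""
    return STATISTICS[statistic]([record[item] for record in records])

def build_comparison_table(input_statistic, model_statistics):
    """Return {model_id: absolute difference from the input statistic}."""
    table = {}
    for model_id, model_statistic in model_statistics.items():  # S202, S204, S210
        table[model_id] = abs(input_statistic - model_statistic)  # S203
    return table

# Placeholder input data with "year of construction" as the priority item.
input_records = [{"year_built": year} for year in (1990, 2000, 2010)]
input_variance = compute_statistic(input_records, "year_built", "variance")

# Placeholder variances for three existing models.
model_variances = {"tokyo": 150.0, "sapporo": 70.0, "asahikawa": 60.0}
comparison_table = build_comparison_table(input_variance, model_variances)
```

Taking the absolute value makes the later ascending sort rank models by closeness regardless of whether their statistic lies above or below that of the input data.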

Next, the candidate model selection process will be described with reference to FIG. 13.
First, the entries stored in the statistic comparison table 400 are sorted in ascending order of their difference from the variance of the input data (S300).

Next, the top M models (where M is a predetermined number), that is, the M models with the smallest difference from the variance of the input data, are selected as candidate models (S301).
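
Steps S300 and S301 amount to a sort followed by a slice. A minimal sketch, assuming the statistic comparison table is available as a dictionary from model ID to difference (the function name and the numbers are hypothetical):

```python
def select_candidate_models(comparison_table, m):
    """Return the IDs of the m models whose statistic is closest to the input data."""
    # S300: sort by difference, smallest first.
    ranked = sorted(comparison_table.items(), key=lambda entry: entry[1])
    # S301: keep the top m entries.
    return [model_id for model_id, _ in ranked[:m]]

# Placeholder differences, not values from the patent.
comparison_table = {"tokyo": 83.3, "sapporo": 3.3, "asahikawa": 6.7}
candidates = select_candidate_models(comparison_table, m=2)
```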

Next, the transfer learning process will be described with reference to FIG. 14.
First, the training data separated in S104 is acquired from the input data (S401).
Next, the transfer learning unit 120 of the model selection device 100 acquires, via the model management unit 150, the predetermined top M models selected by the candidate model selection process of FIG. 13 (S401).

Next, transfer learning is performed using the acquired model and the training data, a transfer model ID is assigned to the created transfer model, the transfer model ID is stored in the model management DB 200, and the transfer model ID and file path are stored in the transfer model result table 410 of FIG. 7 (S402).

Next, it is determined whether all of the selected top M models have been processed (S403). When all have been processed (S403: YES), the process ends. When not all have been processed (S403: NO), the next model is taken out (S410), and the process returns to S402.
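
The text does not specify which transfer learning technique is used in S402. As a stand-in, the sketch below fine-tunes a one-variable linear model by gradient descent, starting from the parameters of each candidate model instead of from scratch. The linear model, the learning rate, the epoch count, the ID format `transfer-N`, and all numbers are assumptions made for illustration only.

```python
def fine_tune(params, training_data, learning_rate=0.01, epochs=200):
    """Warm-start gradient descent on y = w * x + b using squared error.

    params is the (w, b) pair of an existing model; the returned pair is
    the transfer model adapted to the training data.
    """
    w, b = params
    n = len(training_data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in training_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in training_data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b)

# Placeholder candidate models (the top M from S301) and training data.
existing_models = {"sapporo": (1.8, 0.5), "asahikawa": (1.0, 0.0)}
training_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

transfer_model_table = {}
for index, (model_id, params) in enumerate(existing_models.items(), start=1):
    transfer_model_id = f"transfer-{index}"  # ID assigned in S402 (assumed format)
    transfer_model_table[transfer_model_id] = fine_tune(params, training_data)  # S402, S403, S410
```

Each candidate is processed independently, so the resulting transfer models differ only in their starting point, which is what the later evaluation step compares.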

Next, the transfer model evaluation process will be described with reference to FIG. 15.
First, the model evaluation unit 130 of the model selection device 100 acquires the test data separated in S104 from the input data DB (S500).

Next, the transfer model M of the transfer model ID stored in the transfer model result table 410 is acquired (S501). Here, the initial value of M is 1, and the flowchart describes the processing as proceeding through transfer model 1, transfer model 2, and so on.

Next, inference is performed on the test data with the transfer model M, the mean squared error between the value of the objective variable obtained by the inference and the true value of the corresponding item in the input data is calculated, and the error is stored in the transfer model result table 410 (S502). The mean squared error is the arithmetic mean of the squared differences between the measured values and the true values. It quantifies the variation of the measured values, and the smaller the mean squared error, the better the measurement accuracy is considered to be. That is, it is the quantity obtained by the following (Equation 1). Here, a smaller error means that the model is better suited to the input data.
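
(Equation 1) is not reproduced in this text. Assuming it is the standard definition of the mean squared error described in the preceding paragraph, it can be written as follows, where the symbols are notation introduced here for illustration: n is the number of test records, \hat{y}_i is the value inferred by the transfer model for record i, and y_i is the true value of the objective variable for record i.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
```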

Next, it is determined whether the calculation has been performed for all transfer models (S503). When all have been calculated (S503: YES), the process ends. When not all have been calculated (S503: NO), M is incremented (M++) (S510), and the process returns to S502.
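
The evaluation loop S501 to S510 can be sketched as below. The sketch assumes each transfer model is available as a callable that maps the explanatory variable to a predicted objective variable; the function names, the two toy models, and the test records are hypothetical.

```python
def mean_squared_error(predictions, true_values):
    """Arithmetic mean of the squared differences between predictions and true values."""
    if len(predictions) != len(true_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, true_values)) / len(true_values)

def evaluate_transfer_models(transfer_models, test_data):
    """Return {transfer_model_id: mean squared error on the test data}."""
    true_values = [true_value for _, true_value in test_data]
    result_table = {}
    for transfer_model_id, predict in transfer_models.items():  # S501, S503, S510
        predictions = [predict(features) for features, _ in test_data]
        result_table[transfer_model_id] = mean_squared_error(predictions, true_values)  # S502
    return result_table

# Toy transfer models and test data, for illustration only.
transfer_models = {
    "transfer-1": lambda x: 2.0 * x,
    "transfer-2": lambda x: 2.0 * x + 1.0,
}
test_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]
result_table = evaluate_transfer_models(transfer_models, test_data)
```

With these toy values the first model misses only the last record, so it receives the smaller error of the two.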

Next, the optimum model selection process will be described with reference to FIG. 16.
The optimum model selection unit 140 of the model selection device 100 selects the transfer model with the smallest error in the transfer model result table 410 as the optimum model (S600).
Alternatively, a predetermined number K of optimum models may be selected.
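
Step S600, including the variant that returns K models, can be sketched with the standard library. The function name and the error values are placeholders introduced here.

```python
import heapq

def select_optimum_models(result_table, k=1):
    """Return the k transfer model IDs with the smallest error, best first."""
    return heapq.nsmallest(k, result_table, key=result_table.get)

# Placeholder contents of the transfer model result table 410.
result_table = {"transfer-1": 0.33, "transfer-2": 0.67, "transfer-3": 0.12}
optimum = select_optimum_models(result_table)        # single optimum model
top_two = select_optimum_models(result_table, k=2)   # K = 2 variant
```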

As described above, according to this embodiment, a model optimal for the input data can be obtained by evaluating the error between the true value of the target item (house price) in the input data and the value of the objective variable (house price) obtained by inference from the model produced by transfer learning.

[Embodiment 2]
Hereinafter, Embodiment 2 according to the present invention will be described with reference to FIGS. 5 to 18.

In Embodiment 1, the optimum model for inference on the input data was obtained by evaluating the error between the true value of the target item in the input data and the value of the objective variable obtained by inference from the model produced by transfer learning.

This embodiment attempts to obtain the optimum model for inference on the input data by evaluating the error between the true value of the target item in the input data and the value of the objective variable obtained by inputting that input data into an existing model and performing inference.

This embodiment will be described with a focus on the points that differ from Embodiment 1.

Hereinafter, the concept of the model selection method of this embodiment will be described with reference to FIG. 17 using a specific example.
As in Embodiment 1, and as shown in FIG. 17, this embodiment takes as an example a housing analysis system that analyzes Hakodate housing data as input data, and describes how the existing model best suited for use in transfer learning is selected from among the existing models.

First, it is assumed that existing models have been constructed from housing data of other cities such as Tokyo, Sapporo, and Asahikawa.
The issue is then which of these models should be selected as the model used for transfer learning.

This embodiment attempts to select the optimum model by evaluating the error between the true value of the target item (house price) in the Hakodate housing data and the objective variable (house price) obtained by inputting that input data into an existing model and performing inference.

First, the data structure used in the model selection device of Embodiment 2 will be described with reference to FIG. 18.
The model result table 440 is a table that stores the difference between the statistics of the existing models and the statistics of the distribution of the input data and, as shown in FIG. 18, stores a transfer model ID 440a, a model file path 440b, and an error 440c.

The transfer model ID 440a stores an ID that uniquely identifies the model. The model file path 440b stores the path of the model in the file system. The error 440c stores the mean squared error between the value of the objective variable obtained from the model and the true value of the item in the input data that corresponds to the objective variable (details will be described later).

Next, the processing performed by the model selection device will be described with reference to FIGS. 19 to 21.

First, the series of processes from the acquisition of input data to the selection of a model by the model selection device will be described with reference to FIG. 19.
First, the model selection device 100 performs a data analysis process (S11). This data analysis process will be described later with reference to FIG. 20.
Next, the model selection device 100 performs a candidate model search process (S12). The details of the candidate model search process will be described later with reference to FIG. 21.
Next, the model selection device 100 performs the optimum model selection process (S06). This optimum model selection process is the same as the process of FIG. 16 of Embodiment 1.

Next, the data analysis process of Embodiment 2 will be described with reference to FIG. 20.
First, the candidate model analysis unit 110 of the model selection device 100 acquires the input data 10 from the input data DB 210 via the model management unit 150 (S100).
Next, the acquired input data 10 is separated into test data and training data (S104).
Each of these steps is the same as the corresponding step of the data analysis process shown in FIG. 10 of Embodiment 1.

Next, the candidate model search process of Embodiment 2 will be described with reference to FIG. 21.
First, the test data of the input data 10 separated in S104 of FIG. 20 is acquired from the input data DB (S1000).
Next, model N is acquired from the model management DB 200 via the model management unit 150 (S1001). Here, the initial value of N is 1, and the flowchart describes the processing as proceeding through model 1, model 2, and so on.

Next, inference is performed on the test data with model N, the mean squared error between the resulting value of the objective variable and the true value of the corresponding item in the input data is calculated, and the error is stored in the model result table 440 (S1002).

Then, it is determined whether the calculation has been performed for all target models in the model management DB (S1003). When all models have been calculated (S1003: YES), the process ends. When not all models have been calculated (S1003: NO), N is incremented (N++) (S1010), and the process returns to S1001.

In the optimum model selection process, the model with the smallest error value in the model result table of FIG. 18 is selected as the optimum model.
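
The flow of Embodiment 2 (S1000 to S1010 followed by the optimum model selection) can be sketched as below. Unlike Embodiment 1, no transfer learning is performed before the evaluation: each existing model is applied to the test data as it is. The function name, the representation of each model as a callable from floor area to price, and all numbers are made-up assumptions used only to make the sketch runnable.

```python
def select_existing_model(existing_models, test_data):
    """Evaluate every existing model on the test data and pick the smallest error.

    Returns (best_model_id, model_result_table).
    """
    model_result_table = {}
    for model_id, predict in existing_models.items():  # S1001, S1003, S1010
        squared_errors = [(predict(x) - y) ** 2 for x, y in test_data]
        model_result_table[model_id] = sum(squared_errors) / len(squared_errors)  # S1002
    best_model_id = min(model_result_table, key=model_result_table.get)  # optimum model selection
    return best_model_id, model_result_table

# Toy existing models built from other cities' data (placeholder coefficients).
existing_models = {
    "tokyo": lambda area: 80.0 * area,
    "sapporo": lambda area: 30.0 * area,
    "asahikawa": lambda area: 20.0 * area,
}
# Toy test data: (floor area, true house price) pairs from the input data.
test_data = [(50.0, 1400.0), (80.0, 2300.0)]
best_model_id, model_result_table = select_existing_model(existing_models, test_data)
```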

As described above, according to this embodiment, a model optimal for the input data can be obtained by evaluating the error between the true value of the target item (house price) in the input data and the value of the objective variable (house price) obtained by inference when the test data of the input data is input to an existing model.

[Embodiment 3]
Hereinafter, Embodiment 3 according to the present invention will be described with reference to FIGS. 22 to 28.

This embodiment attempts to obtain an optimum model by comparing the FI (Feature Importance) of a new model obtained from the input data with the FI of the existing models. FI is an index that represents the contribution of each explanatory variable to the objective variable in a tree-based learning model.

First, the data structure used in the model selection device of this embodiment will be described with reference to FIGS. 22 to 24.

The model FI management table 450 is a table that holds the FI information of each model in rank order from highest to lowest and, as shown in FIG. 22, has the fields model ID 450a, model file path 450b, FI rank-1 item name 450c1, FI rank-1 item value 450c2, FI rank-2 item name 450d1, FI rank-2 item value 450d2, and so on.

The model ID 450a stores an ID that uniquely identifies the model. The model file path 450b stores the path of the model in the file system. The remaining fields store pairs of an FI item name and its value, in descending order of FI. In the example of FIG. 22, the first record stores "building age" as the FI rank-1 item name with a value of 4.5, and "walk to the nearest station" as the FI rank-2 item name with a value of 2.5.
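
One record of the model FI management table 450 can be sketched as a flat dictionary built from per-item FI values. The field names, the function names, and the file path are hypothetical; only the two item values (4.5 and 2.5) come from the example of FIG. 22 described above.

```python
def rank_feature_importance(fi_by_item):
    """Return [(item_name, fi_value), ...] sorted from largest FI to smallest."""
    return sorted(fi_by_item.items(), key=lambda entry: entry[1], reverse=True)

def to_fi_record(model_id, model_file_path, fi_by_item):
    """Flatten ranked FI pairs into rank-1, rank-2, ... name/value fields."""
    record = {"model_id": model_id, "model_file_path": model_file_path}
    ranked = rank_feature_importance(fi_by_item)
    for rank, (item_name, fi_value) in enumerate(ranked, start=1):
        record[f"fi_rank{rank}_item_name"] = item_name
        record[f"fi_rank{rank}_value"] = fi_value
    return record

record = to_fi_record(
    "model-1",
    "/models/model-1.bin",  # hypothetical path
    {"walk_to_nearest_station": 2.5, "building_age": 4.5},
)
```

Storing the pairs already sorted means a later comparison of the top-ranked items only needs to read the first few fields of each record.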

The new model FI result table 460 is a table that stores the result of calculating the FI of the new model newly generated from the training data of the input data and, as shown in FIG. 23, has the fields FI rank-1 item name 460a1, FI rank-1 item value 460a2, FI rank-2 item name 460b1, FI rank-2 item value 460b2, and so on.

The FI rank-1 item name 460a1 and the FI rank-1 item value 460a2 store the name and value of the item with the largest FI in the new model newly generated from the training data of the input data. Likewise, the remaining fields store pairs of an FI item name and its value, in descending order of FI.

The model FI comparison table 470 is a table that stores the result of comparing the FI of the new model with the FI of the existing models and, as shown in FIG. 24, has the fields model ID 470a and candidate selection flag 470b.

The model ID 470a stores an ID that uniquely identifies the model. The candidate selection flag 470b stores a flag indicating the result of the comparison with the new model (details will be described later). The initial value of the candidate selection flag 470b is 0 (not selected).

Next, the processing performed by the model selection device of Embodiment 3 will be described with reference to FIGS. 25 to 28.

First, the series of processes from the acquisition of input data to the selection of a model by the model selection device will be described with reference to FIG. 25.
First, the model selection device 100 performs a data analysis process (S01). This data analysis process is the same as the process of FIG. 9 of Embodiment 1.

Next, the model selection device 100 performs a candidate model search process (S22). The details of the candidate model search process will be described later with reference to FIG. 26.

Next, the model selection device 100 performs a candidate model selection process (S23). The details of the candidate model selection process will be described later with reference to FIG. 27.

Next, the model selection device 100 performs a transfer learning process (S04). The transfer learning process is the same as the process of FIG. 9 of Embodiment 1.

Next, the model selection device 100 performs a transfer model evaluation process (S05). The transfer model evaluation process is the same as the process of FIG. 9 of Embodiment 1.

Next, the model selection device 100 performs the optimum model selection process (S06). The optimum model selection process is the same as the process of FIG. 9 of Embodiment 1.

Next, the candidate model search process of Embodiment 3 will be described with reference to FIG. 26.
First, the training data of the input data 10 separated in S104 of FIG. 10 is acquired from the input data DB (S1200).
Next, a new model is created using the training data, and its FI is calculated (S1201).

Next, the model ID and FI of model N are acquired from the model FI management table 450 of the model management DB 200 via the model management unit 150 (S1202). Here, the initial value of N is 1, and the flowchart describes the processing as proceeding through model 1, model 2, and so on.

Next, it is determined whether the FI item names of the new model created in S1201 and the FI item names of model N are the same for a predetermined number of items (for example, the top 10) (S1203). When they are the same (S1203: YES), the candidate selection flag 470b of the model FI comparison table 470 is set to 1 (S1210). When they differ (S1203: NO), the process proceeds to S1204.

Then, it is determined whether all target models in the model management DB have been processed (S1204). When all models have been processed (S1204: YES), the process ends. When not all models have been processed (S1204: NO), N is incremented (N++) (S1210), and the process returns to S1202.
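
The comparison in S1203 can be sketched as below. The text does not say whether the order of the top-ranked items must also agree, so this sketch assumes that only the set of the top k item names is compared. The function names, item names, and FI values are hypothetical, and k is set to 2 instead of 10 to keep the example short.

```python
def top_item_names(ranked_fi, k):
    """Names of the k items with the largest FI; ranked_fi is sorted in descending FI."""
    return [item_name for item_name, _ in ranked_fi[:k]]

def flag_candidates(new_model_fi, existing_model_fi, k=10):
    """Return {model_id: 1 or 0}, 1 when the top-k item names match the new model's."""
    new_top = set(top_item_names(new_model_fi, k))
    flags = {}
    for model_id, ranked_fi in existing_model_fi.items():  # S1202, S1204
        same = set(top_item_names(ranked_fi, k)) == new_top  # S1203 (order ignored)
        flags[model_id] = 1 if same else 0  # candidate selection flag 470b
    return flags

# Placeholder FI rankings for the new model and two existing models.
new_model_fi = [("building_age", 4.1), ("walk_to_station", 2.2), ("floor_area", 1.0)]
existing_model_fi = {
    "model-1": [("building_age", 4.5), ("walk_to_station", 2.5), ("rooms", 0.8)],
    "model-2": [("floor_area", 3.0), ("building_age", 2.0), ("walk_to_station", 1.0)],
}
flags = flag_candidates(new_model_fi, existing_model_fi, k=2)
```

A stricter variant would compare the two name lists directly, which would also require the ranks to agree.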

Next, the candidate model selection process of Embodiment 3 will be described with reference to FIGS. 27 and 28.

First, based on the FI of the new model obtained in S1201 of FIG. 26 and the comparison in S1203, the FI of each model ID whose candidate selection flag in the model FI comparison table 470 is 1 is displayed on the transfer candidate selection screen 340 (S1300).

As shown in FIG. 28, the transfer candidate selection screen 340 is a screen for selecting transfer model candidates. The transfer candidate selection screen 340 consists of a new model FI display field 350 and an existing model display field 360.

The existing model display field 360 consists of a selection check box 361, a model ID display field 362, and an FI display field 363. The user checks the FI of each model and can select transfer model candidates by operating the selection check boxes 361 with a pointing device such as a mouse.
Next, the user selects the transfer model candidates (S1301).

As described above, in this embodiment, FI, an index representing the contribution of each item to the model, is explicitly presented and the user makes the selection, so that more suitable model candidates for transfer learning can be selected.

In this embodiment, the FI of the model's items is calculated and presented to the user, but a SHAP summary plot may similarly be presented to the user as an index of the contribution of each item. Here, a SHAP summary plot is a visual representation of how much each explanatory variable used in building the model contributes to the objective variable.

10 ... Input data
110 ... Candidate model analysis unit
120 ... Transfer learning unit
130 ... Model evaluation unit
140 ... Optimum model selection unit
150 ... Model management unit
160 ... Storage unit
200 ... Model management DB
210 ... Input data DB
220 ... Calculation result DB

Claims

A learning model selection method in which an information processing device selects a learning model suitable for transfer learning from existing learning models when generating a learning model for input data by transfer learning, wherein
the input data includes an objective variable and explanatory variables, the method comprising:
a step in which the information processing device generates a model by transfer learning using an existing learning model and the input data;
a step of calculating an error between the inference result obtained by inputting the input data to the model generated by the transfer learning and the true value, which is the objective variable included in the input data; and
a step of selecting a learning model having a small error.

The learning model selection method according to claim 1, wherein the error is a mean square error.

The learning model selection method according to claim 1, in which the information processing device selects a learning model suitable for transfer learning from existing learning models when generating a learning model for input data by transfer learning, further comprising:
a step of calculating a statistic of the values of a specified item of the input data;
a step of calculating the difference between that statistic and the statistic of the values of the corresponding item of an existing model; and
a step of selecting a learning model having a small difference as a candidate for the learning model used for transfer learning.

The learning model selection method according to claim 3, wherein a mean or a variance is used as the statistic.

The learning model selection method according to claim 1, in which the information processing device selects a learning model suitable for transfer learning from existing learning models when generating a learning model for input data by transfer learning, further comprising:
a step of calculating an error between the inference result obtained by inputting the input data to an existing learning model and the true value, which is the objective variable included in the input data; and
a step of selecting a learning model having a small error as a candidate for the learning model used for transfer learning.

The learning model selection method according to claim 5, wherein the error is a mean square error.

The learning model selection method according to claim 1, in which the information processing device selects a learning model suitable for transfer learning from existing learning models when generating a learning model for input data by transfer learning, further comprising:
a step of generating a new learning model from the input data and calculating the contribution of each item of the input data to the inference result of the new learning model; and
a step of comparing these contributions with the contributions of each item of the input data used when generating the existing models, and selecting, as a candidate for the learning model used for transfer learning, an existing model for which a predetermined number of the top-ranked items with large contributions match between the new model and the existing model.

The learning model selection method according to claim 7, wherein the contribution is FI (Feature Importance) or a SHAP summary plot.