JP2016091343A

JP2016091343A - Information processing system, information processing method, and program

Info

Publication number: JP2016091343A
Application number: JP2014225791A
Authority: JP
Inventors: 大志加藤; Hiroshi Kato
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-11-06
Filing date: 2014-11-06
Publication date: 2016-05-23

Abstract

PROBLEM TO BE SOLVED: To efficiently execute operations on attributes in data analysis using a statistical model. SOLUTION: A data analysis system 100 includes a statistic change calculation unit 140 and an output unit 180. The statistic change calculation unit 140 calculates, for each of a plurality of attributes, the change in a statistic of a model generated using the set of selected attributes obtained by a selection operation on that attribute, relative to the statistic of a model generated using the current set of selected attributes among the plurality of attributes. The output unit 180 outputs the calculated change in the statistic in association with each of the plurality of attributes. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing system, an information processing method, and a program, and more particularly to an information processing system, an information processing apparatus, an information processing method, and a program that perform data analysis.

In data analysis using a statistical model, the attributes to be used for analysis must be selected from the many attributes that represent the data. Here, this selection of attributes is referred to as attribute selection.

FIG. 5 is a diagram illustrating an example of data used in data analysis. The data in FIG. 5 is an example of data on rent in each region. In the example of FIG. 5, the data includes, as attributes, the region ID (Identifier), population, number of households, average age, male ratio, marriage rate, number of properties, vacancy rate, average annual income, and average rent.

When a regression analysis model is used as the statistical model for such data, for example, a linear function that predicts the average rent is expressed with the other attributes (population, number of households, average age, male ratio, marriage rate, number of properties, vacancy rate, and average annual income) as variables.

When the number of data items (learning data) used to generate a statistical model is smaller than the number of attributes, overfitting may occur, that is, a state in which the generated statistical model does not fit unknown data (test data). To avoid overfitting, various information criteria such as BIC (Bayesian Information Criterion), and algorithms that estimate an optimal set of attributes based on an information criterion, have been proposed. By using such an algorithm to select some of the many attributes, a statistical model free of overfitting can be obtained.

However, the set of attributes obtained by such algorithmic attribute selection may not match the analyst's intention. The algorithm selects attributes based on the given learning data, whereas the analyst also has general knowledge, domain knowledge, intuition, and the like, and may therefore judge the set of attributes obtained by the algorithm to be inappropriate. For example, when the attributes are characteristics of a specific domain and the analyst is familiar with that domain, the analyst may delete an attribute obtained by the algorithm, or add another attribute, based on factors such as how difficult the attribute values are to acquire.

In general, if the analyst's knowledge described above could be expressed in advance for all attributes, it could be reflected in algorithmic attribute selection. However, if the analyst does not know how to express that knowledge, for example, it cannot be reflected. When the number of attributes is large, knowledge about all attributes cannot necessarily be reflected. Furthermore, when new knowledge arises only once the analyst sees the result of algorithmic attribute selection, that knowledge cannot be reflected in advance.

Therefore, the analyst needs to be able to perform operations such as adding and deleting attributes on the set of attributes obtained by algorithmic attribute selection.

Techniques relating to such attribute operations by an analyst in data analysis using a statistical model are disclosed in, for example, Patent Document 1 and Patent Document 2.

The input variable selection support device described in Patent Document 1 generates the sensitivity of each candidate input variable with respect to the output variable, and selects and presents input variables based on the generated sensitivity. The data analysis method described in Patent Document 2 presents, as an analysis result, the relationship between the explanatory attributes and the objective attribute when these are selected in regression analysis.

As a related technique, Non-Patent Document 1 discloses a technique using a greedy method, which is an approximation algorithm.

Patent Document 1: JP 2010-282547 A
Patent Document 2: JP 2004-139199 A

Non-Patent Document 1: Ji Liu et al., "Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint", [online], Journal of Machine Learning Research, [retrieved October 29, 2015], Internet <http://jmlr.org/proceedings/papers/v32/liub14.pdf>

The techniques of Patent Document 1 and Patent Document 2 described above present the sensitivity of input variables with respect to the output variable and the relationship between explanatory attributes and the objective attribute. However, with these techniques, the analyst cannot grasp how the generated statistical model will change when an operation such as adding or deleting an input variable or explanatory attribute is performed on the currently selected input variables or explanatory attributes.

An object of the present invention is to solve the above problem and to provide an information processing system, an information processing method, and a program that allow operations on attributes to be executed efficiently in data analysis using a statistical model.

The information processing system of the present invention includes: calculation means for calculating, for each of a plurality of attributes, a change in a predetermined statistic of a predetermined model generated using the set of selected attributes obtained by a selection operation on that attribute, relative to the predetermined statistic of the predetermined model generated using the current set of selected attributes among the plurality of attributes; and output means for outputting the calculated change in the predetermined statistic in association with each of the plurality of attributes.

The information processing method of the present invention calculates, for each of a plurality of attributes, a change in a predetermined statistic of a predetermined model generated using the set of selected attributes obtained by a selection operation on that attribute, relative to the predetermined statistic of the predetermined model generated using the current set of selected attributes among the plurality of attributes, and outputs the calculated change in the predetermined statistic in association with each of the plurality of attributes.

The program of the present invention causes a computer to execute processing of calculating, for each of a plurality of attributes, a change in a predetermined statistic of a predetermined model generated using the set of selected attributes obtained by a selection operation on that attribute, relative to the predetermined statistic of the predetermined model generated using the current set of selected attributes among the plurality of attributes, and outputting the calculated change in the predetermined statistic in association with each of the plurality of attributes.

An effect of the present invention is that operations on attributes can be executed efficiently in data analysis using a statistical model.

FIG. 1 is a block diagram showing the characteristic configuration of the first embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the data analysis system 100 in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the data analysis system 100 implemented by a computer in the first embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the data analysis system 100 in the first embodiment of the present invention.
FIG. 5 is a diagram showing an example of data used in data analysis.
FIG. 6 is a diagram showing an example of the calculation results of the evaluation function values in the first embodiment of the present invention.
FIG. 7 is a diagram showing an example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of a transition of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 9 is a diagram showing another example of the calculation results of the evaluation function values in the first embodiment of the present invention.
FIG. 10 is a diagram showing another example of a transition of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 11 is a diagram showing another example of the calculation results of the evaluation function values in the first embodiment of the present invention.
FIG. 12 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 13 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 14 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 15 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 16 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 17 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 18 is a diagram showing another example of the attribute display screen 200 in the first embodiment of the present invention.
FIG. 19 is a diagram showing another example of a transition of the attribute display screen 200 in the first embodiment of the present invention.

(First embodiment)
A first embodiment of the present invention will be described.

First, the configuration of the first embodiment of the present invention will be described. FIG. 2 is a diagram showing the configuration of the data analysis system 100 in the first embodiment of the present invention. The data analysis system 100 is one embodiment of the information processing system of the present invention.

Referring to FIG. 2, the data analysis system 100 includes a data storage unit 110, an initial attribute selection unit 120, a selection attribute storage unit 130, a statistic change calculation unit 140 (or calculation unit), a model generation unit 150, an analysis unit 160, an input unit 170, and an output unit 180.

The data storage unit 110 stores the data to be analyzed (learning data and test data). The data storage unit 110 stores, for example, data such as that shown in FIG. 5. The data may be expressed in, for example, CSV (Comma-Separated Values) format. In this case, the attribute names are typically given in the first row.
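As a minimal sketch of the CSV layout described above, the following Python reads a header row of attribute names followed by one row per region. The column names, the values, and the function name `load_records` are illustrative assumptions and do not come from FIG. 5.

```python
import csv
import io

# Hypothetical sample; column names and values are illustrative only.
SAMPLE_CSV = """region_id,population,households,average_rent
R001,12000,5200,61000
R002,8500,3900,54000
"""

def load_records(text):
    """Parse CSV text whose first row holds the attribute names."""
    reader = csv.DictReader(io.StringIO(text))
    attribute_names = list(reader.fieldnames)
    rows = []
    for row in reader:
        record = {}
        for name in attribute_names:
            # Keep the identifier as a string; treat other attributes as numbers.
            record[name] = row[name] if name == "region_id" else float(row[name])
        rows.append(record)
    return attribute_names, rows

names, records = load_records(SAMPLE_CSV)
```

Here `names` holds the attribute names from the first row and `records` holds one dictionary per data row.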

The initial attribute selection unit 120 performs attribute selection by an algorithm from among the plurality of attributes of the analysis target data. In the algorithm, attributes are selected based on a statistic of a predetermined type that represents the "goodness" of a predetermined statistical model (hereinafter simply referred to as a model) generated using the attributes. Types of statistic that may be used include an information criterion for the model, the likelihood, and the prediction error of the model (hereinafter simply referred to as the error).

As the model, a regression model is used, for example. In this case, the selected attributes are used as explanatory variables. The model may be another model, such as an autoregressive model or a neural network model, as long as it predicts the value of one attribute using the values of a plurality of attributes.

The selection attribute storage unit 130 stores the currently selected attributes. Hereinafter, among the plurality of attributes of the data to be analyzed, an attribute that is selected is called a selected attribute, and an attribute that is not selected (any attribute other than the selected attributes) is called a non-selected attribute.

For each of the plurality of attributes of the analysis target data, the statistic change calculation unit 140 calculates the change in the statistic of the model generated using the set of selected attributes obtained by a selection operation on that attribute, relative to the statistic of the model generated using the current set of selected attributes. Here, the selection operation includes changing a currently selected attribute to a non-selected attribute (attribute deletion) and changing a currently non-selected attribute to a selected attribute (attribute addition).

The magnitude of the change in the statistic calculated for each attribute indicates the value of the selection operation on that attribute. In the first embodiment of the present invention, the function representing the magnitude of the change in the statistic is called the evaluation function of the selection operation. The statistic change calculation unit 140 calculates the change in the statistic using this evaluation function. As the evaluation function, for example, the difference or the ratio of the statistics is used.
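The two evaluation functions named above can be sketched as follows. This is an illustration under stated assumptions: the function name `evaluation_value` is hypothetical, the difference is taken as the statistic after the operation minus the current statistic (the direction used later for the error), and the direction of the ratio is an assumption because the text does not specify it.

```python
def evaluation_value(current_statistic, candidate_statistic, kind="difference"):
    """Change in the statistic caused by one selection operation."""
    if kind == "difference":
        # Statistic after the operation minus the current statistic.
        return candidate_statistic - current_statistic
    if kind == "ratio":
        # Assumed direction: statistic after the operation over the current one.
        if current_statistic == 0:
            raise ValueError("ratio is undefined when the current statistic is zero")
        return candidate_statistic / current_statistic
    raise ValueError(f"unknown kind: {kind}")
```

With the difference form, a value near zero means the selection operation barely changes the statistic.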

The type of statistic that the initial attribute selection unit 120 uses for algorithmic attribute selection and the type of statistic that the statistic change calculation unit 140 uses to calculate the evaluation function may be the same or different. The statistic change calculation unit 140 may also change the type of statistic used to calculate the evaluation function in response to a request from an analyst or the like.

The model generation unit 150 uses the learning data to generate the model specified by the statistic change calculation unit 140, and calculates the statistic of the generated model.

The analysis unit 160 performs a predetermined analysis using the model, such as prediction for new analysis target data.

The input unit 170 receives input of a selection operation from an analyst or the like.

The output unit 180 outputs (displays) the evaluation function values calculated by the statistic change calculation unit 140 in association with the respective attributes.

The data analysis system 100 may be a computer that includes a CPU (Central Processing Unit) and a storage medium storing a program, and that operates under control based on the program.

FIG. 3 is a block diagram showing the configuration of the data analysis system 100 implemented by a computer in the first embodiment of the present invention.

Referring to FIG. 3, the data analysis system 100 includes a CPU 101, a storage device (storage medium) 102 such as a hard disk or memory, a communication device 103 that communicates with other apparatuses, an input device 104 such as a mouse or keyboard, and an output device 105 such as a display.

The CPU 101 executes a computer program for implementing the functions of the initial attribute selection unit 120, the statistic change calculation unit 140, the model generation unit 150, the analysis unit 160, the input unit 170, and the output unit 180. The storage device 102 stores the data of the data storage unit 110 and the selection attribute storage unit 130. The input device 104 receives input of a selection operation from an analyst or the like. The output device 105 outputs the evaluation function values to the analyst or the like. Instead of the input device 104, the communication device 103 may receive input of a selection operation from another apparatus. Likewise, instead of the output device 105, the communication device 103 may output the evaluation function values to another apparatus.

The data analysis system 100 may also be configured by distributing the components shown in FIG. 2 across a plurality of physical apparatuses connected by wire or wirelessly.

Next, the operation of the first embodiment of the present invention will be described. FIG. 4 is a flowchart showing the operation of the data analysis system 100 in the first embodiment of the present invention.

Here, it is assumed that the data shown in FIG. 5 is stored in the data storage unit 110 as the data to be analyzed. It is also assumed that, as the model, a regression model of the form of Equation 1 is generated, with the attribute "average rent" in FIG. 5 as the objective variable and attributes selected from those other than "region ID" and "average rent" as the explanatory variables.

Here, y'_i is the estimated value of the objective variable in the i-th data item, x_ji is the value of the j-th explanatory variable in the i-th data item, A_j (j = 1, ..., M) is the coefficient of the j-th explanatory variable, and M is the number of explanatory variables.
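The body of Equation 1 did not survive in this text. A linear form consistent with the symbol definitions above would be the following reconstruction. It is an assumption; in particular, the original equation may also contain a constant term, which the definitions do not mention.

```latex
y'_i = \sum_{j=1}^{M} A_j \, x_{ji}
```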

First, the initial attribute selection unit 120 performs attribute selection by an algorithm from among the plurality of attributes included in the analysis target data stored in the data storage unit 110 (step S101).

For example, the initial attribute selection unit 120 selects the attributes "population", "male ratio", and "vacancy rate" by an algorithm based on an information criterion.
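The text does not say which algorithm is used in step S101. As one possible illustration only, the sketch below uses greedy forward selection, which adds attributes one at a time while a criterion keeps decreasing. The function `greedy_forward_selection`, the callable `criterion`, and the toy scores are all hypothetical. In practice `criterion` would fit a model on the given attributes and return an information criterion such as BIC.

```python
def greedy_forward_selection(candidates, criterion):
    """Add attributes one at a time while the criterion keeps decreasing.

    `criterion` maps a frozenset of attribute names to a score where
    smaller is better (for example, an information criterion).
    """
    selected = frozenset()
    best = criterion(selected)
    while True:
        remaining = [a for a in candidates if a not in selected]
        if not remaining:
            return selected
        # Score every one-attribute extension and keep the best one.
        score, attribute = min((criterion(selected | {a}), a) for a in remaining)
        if score >= best:
            return selected
        selected, best = selected | {attribute}, score

# Toy criterion (illustrative numbers): a fit term plus a per-attribute penalty.
USEFULNESS = {"population": 5.0, "male ratio": 3.0, "vacancy rate": 2.0}

def toy_criterion(attributes):
    fit_gain = sum(USEFULNESS.get(a, 0.0) for a in attributes)
    return 10.0 - fit_gain + 1.0 * len(attributes)

chosen = greedy_forward_selection(
    ["population", "households", "male ratio", "vacancy rate"], toy_criterion
)
```

With these toy scores, "households" is rejected because its penalty outweighs its contribution.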

The initial attribute selection unit 120 stores the selected attributes in the selection attribute storage unit 130 (step S102).

The statistic change calculation unit 140 instructs the model generation unit 150 to generate a model using the current set of selected attributes and to calculate the statistic of the generated model (step S103).

The model generation unit 150 generates a model using the current set of selected attributes, based on the learning data stored in the data storage unit 110. The model generation unit 150 then calculates the statistic of the generated model using the data stored in the data storage unit 110.

For example, the model generation unit 150 generates a regression model using the current set of selected attributes {"population", "male ratio", "vacancy rate"}. When the error is used as the statistic, the model generation unit 150 calculates, as the error of the generated model, for example the root mean square error (RMSE) of the regression model given by Equation 2.

Here, N is the number of data items used to calculate the error, and y_i is the observed value of the objective variable in the i-th data item. The error is calculated using, for example, the learning data. The test data, or both the learning data and the test data, may also be used to calculate the error. The model generation unit 150 is not limited to Equation 2 and may calculate the error of the generated model by another method, such as the maximum absolute error (MAE: Max Absolute Error).
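The body of Equation 2 is likewise missing from this text. The standard definition of the root mean square error, written with the symbols defined above, is shown here as a reconstruction and not as the original equation image.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - y'_i \right)^2}
```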

Next, the statistic change calculation unit 140 extracts one of the currently selected attributes (step S104).

For example, the statistic change calculation unit 140 extracts the attribute "population" from the currently selected attributes.

The statistic change calculation unit 140 instructs the model generation unit 150 to generate a model using the set of selected attributes obtained by deleting the extracted attribute from the current set of selected attributes, and to calculate the statistic of the generated model (step S105).

For example, the model generation unit 150 generates a regression model using the set of selected attributes {"male ratio", "vacancy rate"} obtained by deleting the attribute "population" from the current set of selected attributes. The model generation unit 150 calculates the error of the generated regression model using Equation 2.
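The model generation and error calculation performed for a given set of selected attributes can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: the model is a linear regression without a constant term, the coefficients are fitted by least squares via the normal equations (the text does not name a fitting method), and the helper names and data are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix copy so the inputs are not modified.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; attributes may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_coefficients(rows, targets):
    """Least-squares coefficients A_j via the normal equations."""
    count = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(count)]
           for j in range(count)]
    xty = [sum(r[j] * y for r, y in zip(rows, targets)) for j in range(count)]
    return solve_linear_system(xtx, xty)

def rmse(rows, targets, coefficients):
    """Root mean square error of the fitted model on the given data."""
    squared = 0.0
    for r, y in zip(rows, targets):
        estimate = sum(a * x for a, x in zip(coefficients, r))
        squared += (y - estimate) ** 2
    return math.sqrt(squared / len(targets))

# Illustrative data only: two explanatory variables, target = 2*x1 + 3*x2.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
targets = [8.0, 7.0, 18.0, 17.0]
coefficients = fit_coefficients(rows, targets)
error = rmse(rows, targets, coefficients)
```

Deleting an attribute corresponds to dropping its column from `rows` before fitting, and adding one corresponds to appending a column.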

The statistic change calculation unit 140 calculates the value of the evaluation function for the extracted attribute (step S106).

When the difference of the statistic (error) is used as the evaluation function, the statistic change calculation unit 140 calculates the value of the evaluation function by subtracting the error calculated in step S103 from the error of the model generated in step S105.

The statistic change calculation unit 140 repeats the processing from step S104 for all of the currently selected attributes (step S107).

For example, the model generation unit 150 generates a regression model, calculates the error, and calculates the evaluation function value using each of the sets of selected attributes obtained by deleting "male ratio" and "vacancy rate", respectively, from the currently selected attributes.

Next, the statistic change calculation unit 140 extracts one of the currently non-selected attributes (step S108).

For example, the statistic change calculation unit 140 extracts the attribute "number of households" from the currently non-selected attributes.

The statistic change calculation unit 140 instructs the model generation unit 150 to generate a model using the set of selected attributes obtained by adding the extracted attribute to the currently selected attributes, and to calculate the statistic of the generated model (step S109).

For example, the model generation unit 150 generates a regression model using the set of selected attributes {"population", "male ratio", "vacancy rate", "number of households"} obtained by adding the attribute "number of households" to the currently selected attributes. The model generation unit 150 calculates the error of the generated regression model using Equation 2.

The statistic change calculation unit 140 calculates the value of the evaluation function for the extracted attribute (step S110).

Here, the statistic change calculation unit 140 calculates the value of the evaluation function by subtracting the error calculated in step S103 from the error of the model generated in step S109.

The statistic change calculation unit 140 repeats the processing from step S108 for all of the currently non-selected attributes (step S111).

For example, the model generation unit 150 generates a regression model, calculates the error, and calculates the evaluation function value using each of the sets of selected attributes obtained by adding "average age", "marriage rate", "number of properties", and "average annual income", respectively, to the currently selected attributes.

FIG. 6 is a diagram showing an example of the calculation results of the evaluation function values in the first embodiment of the present invention. For example, as shown in FIG. 6, an evaluation function value is calculated for each of the currently selected attributes and non-selected attributes.
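Steps S103 to S111 can be summarized in the following sketch, which tries the deletion of each selected attribute and the addition of each non-selected attribute and records the change in error. The function `evaluation_table`, the callable `model_error`, and the toy numbers are hypothetical. In a real system `model_error` would fit a model on the given attributes and return a statistic such as RMSE.

```python
def evaluation_table(all_attributes, selected, model_error):
    """Return {attribute: (state, change in error)} for every attribute.

    `model_error` is a hypothetical callable that takes a frozenset of
    attribute names and returns the error of the model fitted on them.
    """
    current = frozenset(selected)
    baseline = model_error(current)            # step S103
    table = {}
    for attribute in all_attributes:
        if attribute in current:
            trial = current - {attribute}      # deletion, steps S104 to S107
            state = "selected"
        else:
            trial = current | {attribute}      # addition, steps S108 to S111
            state = "non-selected"
        # Difference form of the evaluation function.
        table[attribute] = (state, model_error(trial) - baseline)
    return table

# Toy error model (illustrative numbers): each attribute lowers the error.
CONTRIBUTION = {"population": 3.0, "male ratio": 2.0,
                "vacancy rate": 1.0, "households": 0.5}

def toy_error(attributes):
    return 10.0 - sum(CONTRIBUTION[a] for a in attributes)

table = evaluation_table(
    ["population", "male ratio", "vacancy rate", "households"],
    ["population", "male ratio", "vacancy rate"],
    toy_error,
)
```

Each entry pairs the selection state with the change in error, which is the information the description attributes to FIG. 6.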

Next, the output unit 180 generates an attribute display screen 200 in which the attributes included in the analysis target data are arranged in an order determined by their selection state (selected or non-selected) and their calculated evaluation function values, and outputs (displays) it to the analyst or the like (step S112).

Here, the attributes are arranged on the attribute display screen 200 in an order that follows the first rule and the second rule below.

The first rule is ordering by selection state (selected or non-selected). That is, on the attribute display screen 200, the selected attributes are placed first, followed by the non-selected attributes.

The second rule is ordering by the magnitude of the evaluation function value. That is, on the attribute display screen 200, the selected attributes and the non-selected attributes are each arranged in descending order or in ascending order of evaluation function value.

Here, when a smaller statistic indicates a better model, the selected attributes are arranged in descending order of evaluation function value and the non-selected attributes in ascending order. Conversely, when a larger statistic indicates a better model, the selected attributes are arranged in ascending order of evaluation function value and the non-selected attributes in descending order.

As a result, the selected and non-selected attributes are arranged so that the change in the statistic caused by a selection operation, measured from the statistic of the model generated using the current set of selected attributes, is smallest (forms a peak) at the boundary between the selected and non-selected attributes. In this case, a selection operation on an attribute near the boundary yields a set of selected attributes whose model is not much worse than the model generated using the current set. In contrast, a selection operation on an attribute far from the boundary degrades the model considerably.
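The two ordering rules can be sketched in a few lines. This is an illustrative sketch only; the function name `arrange`, the `smaller_is_better` flag, and the dictionary input are assumptions that do not appear in the source.

```python
def arrange(values, selected, smaller_is_better=True):
    """Return attribute names in display order.

    Rule 1: selected attributes come before non-selected attributes.
    Rule 2: within each group, sort by evaluation function value so that
    the smallest change sits next to the selected/non-selected boundary.
    """
    sel = [a for a in values if a in selected]
    non = [a for a in values if a not in selected]
    # Smaller statistic is better: selected descending, non-selected ascending.
    # Larger statistic is better: the two directions are swapped.
    sel.sort(key=lambda a: values[a], reverse=smaller_is_better)
    non.sort(key=lambda a: values[a], reverse=not smaller_is_better)
    return sel + non
```

The grid of rectangles in the attribute display screen can then be filled row by row from this list.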

図７は、本発明の第１の実施の形態における、属性表示画面２００の例を示す図である。 FIG. 7 is a diagram showing an example of the attribute display screen 200 in the first embodiment of the present invention.

例えば、出力部１８０は、図６の評価関数の値をもとに、図７のような属性表示画面２００を表示する。 For example, the output unit 180 displays the attribute display screen 200 as shown in FIG. 7 based on the value of the evaluation function shown in FIG.

In FIG. 7, rectangles showing the attribute names are arranged in order in a grid on the attribute display screen 200, filling from the top first and from the left second. The current selected attributes are distinguished by bold outlines. In accordance with the first rule described above, the attributes are displayed in the order of current selected attributes followed by non-selected attributes. In accordance with the second rule, the current selected attributes “population”, “male ratio”, and “vacancy rate” are arranged in descending order of evaluation function value (difference in error), and the non-selected attributes “number of households”, “average age”, “married rate”, “number of properties”, and “average annual income” are arranged in ascending order of evaluation function value (difference in error).

The analyst or the like refers to the attribute display screen 200 of FIG. 7 and decides which attributes to delete and which to add. The order in which the attributes are arranged lets the analyst or the like see how much the goodness of the model would decrease if a selection operation were performed on each attribute. That is, by deleting or adding attributes near the boundary between the selected and non-selected attributes, the analyst or the like can obtain a new set of selected attributes without greatly degrading the model relative to the model for the current selected attributes.

次に、入力部１７０は、分析者等から、属性表示画面２００上で、属性に対する選択操作の入力を受け付ける（ステップＳ１１３）。 Next, the input unit 170 receives an input of a selection operation for the attribute on the attribute display screen 200 from an analyst or the like (step S113).

ここで、入力部１７０は、例えば、選択属性がマウスによりクリックされた場合、当該選択属性を非選択属性へ変更（属性削除）と判断する。また、非選択属性がマウスによりクリックされた場合、当該非選択属性を選択属性へ変更（属性追加）と判断する。 Here, for example, when the selection attribute is clicked with the mouse, the input unit 170 determines that the selection attribute is changed to a non-selection attribute (attribute deletion). If the non-selected attribute is clicked with the mouse, it is determined that the non-selected attribute is changed to the selected attribute (attribute addition).

入力部１７０は、選択操作に従って、現在の選択属性を更新し、選択属性記憶部１３０に保存する（ステップＳ１１４）。また、出力部１８０は、選択操作の結果を、属性表示画面２００上に反映する。 In accordance with the selection operation, the input unit 170 updates the current selection attribute and stores it in the selection attribute storage unit 130 (step S114). Further, the output unit 180 reflects the result of the selection operation on the attribute display screen 200.
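The click handling in steps S113 and S114 amounts to toggling membership in the set of selected attributes. A minimal sketch, with the hypothetical name `apply_click` and an immutable set standing in for the selection attribute storage unit 130:

```python
def apply_click(selected, clicked):
    """Return the new set of selected attributes after one click.

    A click on a selected attribute deletes it; a click on a
    non-selected attribute adds it.
    """
    selected = frozenset(selected)
    if clicked in selected:
        return selected - {clicked}   # attribute deletion
    return selected | {clicked}       # attribute addition
```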

以降、ステップＳ１０３からの処理が繰り返し実行されることにより、新たな選択状態、及び、評価関数の値に応じて、属性表示画面２００における属性が再配置される。 Thereafter, by repeatedly executing the processing from step S103, the attributes on the attribute display screen 200 are rearranged according to the new selection state and the value of the evaluation function.

FIGS. 8 and 10 are diagrams showing examples of transitions of the attribute display screen 200 in the first embodiment of the present invention. FIGS. 9 and 11 are diagrams showing other examples of the calculation result of the evaluation function value in the first embodiment of the present invention.

For example, when the selected attribute “population” is clicked (deleted) on the attribute display screen 200 in state (a) of FIG. 8, the output unit 180 displays the attribute display screen 200 in state (b) of FIG. 8.

The model generation unit 150 generates a regression model using the new set of selected attributes {“male ratio”, “vacancy rate”} and calculates its error. The model generation unit 150 also generates a regression model, calculates the error, and calculates the value of the evaluation function for each set of selected attributes obtained by deleting one of the new selected attributes “male ratio” and “vacancy rate”. Further, the model generation unit 150 generates a regression model, calculates the error, and calculates the value of the evaluation function for each set of selected attributes obtained by adding one of the non-selected attributes “population”, “number of households”, “average age”, “married rate”, “number of properties”, and “average annual income” to the new selected attributes. As shown in FIG. 9, the statistic change calculation unit 140 calculates the value of the evaluation function for each of the new selected attributes and non-selected attributes. The output unit 180 then rearranges the attributes on the attribute display screen 200 based on the evaluation function values of FIG. 9, as in state (c) of FIG. 8.

Similarly, for example, when the non-selected attribute “average age” is clicked (added) on the attribute display screen 200 in state (a) of FIG. 10, the output unit 180 displays the attribute display screen 200 in state (b) of FIG. 10.

The model generation unit 150 generates a regression model using the new set of selected attributes {“population”, “male ratio”, “vacancy rate”, “average age”} and calculates its error. The model generation unit 150 also generates a regression model, calculates the error, and calculates the value of the evaluation function for each set of selected attributes obtained by deleting one of the new selected attributes “population”, “male ratio”, “vacancy rate”, and “average age”. Further, the model generation unit 150 generates a regression model, calculates the error, and calculates the value of the evaluation function for each set of selected attributes obtained by adding one of the non-selected attributes “number of households”, “married rate”, “number of properties”, and “average annual income” to the new selected attributes. As shown in FIG. 11, the statistic change calculation unit 140 calculates the value of the evaluation function for each of the new selected attributes and non-selected attributes. The output unit 180 then rearranges the attributes on the attribute display screen 200 based on the evaluation function values of FIG. 11, as in state (c) of FIG. 10.

以上により、本発明の第１の実施の形態の動作が完了する。 Thus, the operation of the first exemplary embodiment of the present invention is completed.

なお、上述の説明では、属性表示画面２００において、属性が、評価関数の値に応じた順番で配置された。しかしながら、これに限らず、属性は、各属性に係る評価関数の値が識別可能なように配置されてもよい。 In the above description, the attributes are arranged in the order according to the value of the evaluation function on the attribute display screen 200. However, the present invention is not limited to this, and the attributes may be arranged so that the value of the evaluation function related to each attribute can be identified.

FIGS. 12, 13, 14, 15, 16, and 17 are diagrams showing other examples of the attribute display screen 200 in the first embodiment of the present invention.

例えば、出力部１８０は、属性表示画面２００において、図１２のように、属性間の評価関数の値の差分に応じた間隔で各属性を配置してもよい。 For example, in the attribute display screen 200, the output unit 180 may arrange the attributes at intervals according to the difference in the value of the evaluation function between the attributes as shown in FIG.
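One way to realize spacing that reflects differences in evaluation function value is to accumulate the absolute difference between neighbouring attributes along the display order. The sketch below is an assumption about how such a layout might be computed; `spaced_positions` and the `unit` scale factor are hypothetical.

```python
def spaced_positions(ordered, values, unit=1.0):
    """Assign a 1-D position to each attribute in display order.

    The gap between two neighbouring attributes is proportional to the
    absolute difference of their evaluation function values.
    """
    positions = {}
    pos = 0.0
    prev = None
    for name in ordered:
        if prev is not None:
            pos += unit * abs(values[name] - values[prev])
        positions[name] = pos
        prev = name
    return positions
```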

また、出力部１８０は、属性表示画面２００において、図１３のように、１次元のスケール上の評価関数の値に応じた位置に各属性を配置してもよい。この場合、１次元のスケールは、直線状である必要はなく、例えば、螺旋状であってもよい。 Further, the output unit 180 may arrange each attribute at a position corresponding to the value of the evaluation function on the one-dimensional scale on the attribute display screen 200 as shown in FIG. In this case, the one-dimensional scale does not need to be linear, and may be, for example, a spiral.

また、出力部１８０は、属性表示画面２００において、図１４のように、２次元の平面上で中心からの距離が評価関数の絶対値に対応するように各属性を配置してもよい。この場合、出力部１８０は、各属性の位置を、ばねモデルにより調整してもよい。 Further, the output unit 180 may arrange each attribute on the attribute display screen 200 such that the distance from the center on the two-dimensional plane corresponds to the absolute value of the evaluation function as shown in FIG. In this case, the output unit 180 may adjust the position of each attribute using a spring model.
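The radial arrangement can be sketched as below, under the assumption that angles are simply spread evenly around the center; the source does not say how angles are chosen, and the spring-model adjustment it mentions is not implemented here. The name `radial_layout` and the `scale` parameter are hypothetical.

```python
import math

def radial_layout(values, scale=100.0):
    """Place each attribute at a distance from the origin proportional to |value|."""
    names = sorted(values)
    positions = {}
    for i, name in enumerate(names):
        angle = 2 * math.pi * i / len(names)   # evenly spaced angles (assumption)
        r = scale * abs(values[name])          # distance encodes the absolute value
        positions[name] = (r * math.cos(angle), r * math.sin(angle))
    return positions
```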

In addition, as shown in FIG. 15, the output unit 180 may show, on the attribute display screen 200, the statistics used to calculate each attribute's evaluation function value together with that value. Here, for example, “RMSE:0.7->1.1(+0.4)” indicates that the RMSE of the model generated using the current set of selected attributes is “0.7”, the RMSE of the model generated using the set of selected attributes obtained by the selection operation is “1.1”, and the difference in RMSE is “+0.4”. In this case, as shown in FIG. 16, when the mouse 201 is placed over an attribute, the statistics for that attribute may be displayed in a pop-up area 202. Alternatively, as shown in FIG. 17, when the mouse 201 is placed over an attribute, the statistics for that attribute may be displayed in a common display area 203.
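A label in the format quoted above can be produced with ordinary string formatting. The helper name `format_change` is hypothetical, and one decimal place is assumed because that is what the quoted example shows.

```python
def format_change(name, before, after):
    """Format a statistic as 'NAME:before->after(signed difference)'."""
    return f"{name}:{before:.1f}->{after:.1f}({after - before:+.1f})"
```

For instance, `format_change("RMSE", 0.7, 1.1)` yields `RMSE:0.7->1.1(+0.4)`.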

Each attribute may also be displayed in a color that depends on its evaluation function value (for example, a hue, brightness, or saturation that depends on the value) so that the value can be identified. Each attribute may likewise be displayed with a shape, size, or the like that depends on its evaluation function value.

また、出力部１８０は、各属性に係る評価関数の値や当該評価関数の値の算出に用いられた統計量を示す代わりに、モデルによる予測値を出力してもよい。 Further, the output unit 180 may output a predicted value based on the model instead of indicating the value of the evaluation function related to each attribute or the statistic used to calculate the value of the evaluation function.

図１８は、本発明の第１の実施の形態における、属性表示画面２００のさらに他の例を示す図である。 FIG. 18 is a diagram showing still another example of the attribute display screen 200 in the first exemplary embodiment of the present invention.

In the example of FIG. 18, a two-dimensional graph 204 is placed on the attribute display screen 200, with the data identifier (“region ID”) on the horizontal axis and the objective variable “average rent” on the vertical axis. The graph 204 shows the observed values of the objective variable “average rent” contained in the data and the values predicted by the regression model. In this case, for example, when the mouse 201 is not over any attribute, as in state (a) of FIG. 18, the graph shows the values of “average rent” predicted by the regression model generated using the current set of selected attributes. When the mouse 201 is over an attribute, as in state (b) of FIG. 18, the graph shows the values of “average rent” predicted by the regression model generated using the set of selected attributes obtained by a selection operation on that attribute. This lets the analyst or the like see how the model's predicted values change with a selection operation on an attribute.

In the above description, the attributes are arranged on the attribute display screen 200 in an order that follows the first rule and the second rule. However, the arrangement is not limited to this: instead of applying the first rule, the output unit 180 may place the selected attributes and the non-selected attributes in different areas of the attribute display screen 200, for example the left half and the right half.

In the above description, attribute selection by the analyst or the like is performed after attribute selection by an algorithm. However, the two may be performed in another order: for example, attribute selection by an algorithm may follow attribute selection by the analyst or the like, or the analyst or the like may select attributes partway through attribute selection by the algorithm.

また、上述の説明では、選択操作として、現在の選択属性の非選択属性への変更（属性削除）、及び、現在の非選択属性の選択属性への変更（属性追加）が行われた。しかしながら、これに限らず、選択操作として、現在の選択属性の非選択属性への変更と同時に現在の非選択属性の選択属性への変更（属性置換）が行われてもよい。 In the above description, as the selection operation, the current selection attribute is changed to a non-selection attribute (attribute deletion), and the current non-selection attribute is changed to a selection attribute (attribute addition). However, the present invention is not limited to this, and as a selection operation, a change (attribute replacement) of the current non-selection attribute to the selection attribute may be performed simultaneously with the change of the current selection attribute to the non-selection attribute.

この場合、入力部１７０は、例えば、モードの切り替え（トグルボタン等の利用）や、キーボードコンビネーション（シフトキー等を押下しながらマウスをクリック）により、属性置換を、属性削除や属性追加と区別する。 In this case, for example, the input unit 170 distinguishes attribute replacement from attribute deletion or attribute addition by switching modes (using toggle buttons or the like) or keyboard combinations (clicking the mouse while pressing a shift key or the like).

入力部１７０は、属性置換が指定された場合に、クリックされた選択属性（置換元属性）を非選択属性へ変更し、クリックされた非選択属性（置換先属性）を選択属性へ変更する。 When attribute replacement is designated, the input unit 170 changes the clicked selection attribute (replacement source attribute) to a non-selection attribute, and changes the clicked non-selection attribute (replacement destination attribute) to a selection attribute.
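Attribute replacement is a deletion and an addition applied together. A minimal sketch follows; the function name `replace_attribute` and the error handling are assumptions, since the source does not describe how invalid combinations are treated.

```python
def replace_attribute(selected, source, target):
    """Swap one selected attribute (source) for one non-selected attribute (target)."""
    selected = frozenset(selected)
    if source not in selected:
        raise ValueError(f"{source!r} is not currently selected")
    if target in selected:
        raise ValueError(f"{target!r} is already selected")
    return (selected - {source}) | {target}
```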

属性置換が指定された場合は、選択属性や非選択属性がクリックされる度に、上述のステップＳ１０３〜Ｓ１１２と同様の処理により、属性表示画面２００上の属性の配置が更新されてもよい。また、属性がクリックされる度に属性の配置を更新する代わりに、置換元属性と置換先属性の組み合わせが決定された後で、属性表示画面２００上の属性の配置が更新されてもよい。 When attribute replacement is designated, the arrangement of attributes on the attribute display screen 200 may be updated by the same processing as steps S103 to S112 described above each time a selected attribute or non-selected attribute is clicked. Further, instead of updating the arrangement of attributes every time the attribute is clicked, the arrangement of attributes on the attribute display screen 200 may be updated after the combination of the replacement source attribute and the replacement destination attribute is determined.

図１９は、本発明の第１の実施の形態における、属性表示画面２００の遷移の他の例を示す図である。 FIG. 19 is a diagram showing another example of the transition of the attribute display screen 200 in the first exemplary embodiment of the present invention.

For example, when attribute replacement is specified on the attribute display screen 200 in state (a) of FIG. 19 and the selected attribute “population” and the non-selected attribute “average age” are clicked, the output unit 180 displays the attribute display screen 200 in state (b) of FIG. 19. The statistic change calculation unit 140 calculates the value of the evaluation function for each of the new selected attributes “male ratio”, “vacancy rate”, and “average age” and each of the non-selected attributes “population”, “number of households”, “married rate”, “number of properties”, and “average annual income”. The output unit 180 then rearranges the attributes on the attribute display screen 200 based on the calculated evaluation function values, for example as in state (c) of FIG. 19.

次に、本発明の第１の実施の形態の特徴的な構成を説明する。図１は、本発明の第１の実施の形態の特徴的な構成を示すブロック図である。 Next, a characteristic configuration of the first exemplary embodiment of the present invention will be described. FIG. 1 is a block diagram showing a characteristic configuration of the first embodiment of the present invention.

Referring to FIG. 1, the data analysis system 100 (information processing system) includes the statistic change calculation unit 140 (calculation unit) and the output unit 180. The statistic change calculation unit 140 calculates the change in a statistic from the statistic of the model generated using the current set of selected attributes among a plurality of attributes to the statistic of the model generated using the set of selected attributes obtained by a selection operation on each of the plurality of attributes. The output unit 180 outputs the calculated change in the statistic in association with each of the plurality of attributes.

According to the first embodiment of the present invention, operations on attributes can be performed efficiently in data analysis using a statistical model. This is because the statistic change calculation unit 140 calculates the change from the statistic of the model generated for the current selected attributes to the statistic of the model generated for the selected attributes after a selection operation, and the output unit 180 outputs that change for each of the plurality of attributes. The analyst or the like is thereby given information that supports attribute selection operations (the change in the model statistic after a selection operation on each attribute) and can easily adjust the result of attribute selection by an algorithm.

(Second Embodiment)
Next, a second embodiment of the present invention will be described.

本発明の第１の実施の形態では、分析者等により属性に対する選択操作が行われると、新たな選択属性の集合や、選択操作によって得られる選択属性の集合について、評価関数の値が再計算され、属性表示画面２００が更新された。しかしながら、評価関数の再計算では、これらの選択属性の集合についてのモデルの生成、統計量の算出を繰り返し実行する必要がある。このため、属性数やデータ数によっては計算時間が膨大となり、分析者に負担が生じる。 In the first embodiment of the present invention, when an attribute selection operation is performed by an analyst or the like, the value of the evaluation function is recalculated for a new selection attribute set or a selection attribute set obtained by the selection operation. The attribute display screen 200 has been updated. However, in the recalculation of the evaluation function, it is necessary to repeatedly generate a model and calculate a statistic for the set of selected attributes. For this reason, depending on the number of attributes and the number of data, the calculation time becomes enormous, resulting in a burden on the analyst.

In the second embodiment of the present invention, when statistics are calculated in steps S103, S105, and S109 described above, the statistic change calculation unit 140 caches each pair of a set of selected attributes and the statistic calculated for it in a statistic storage unit (not shown). The statistic change calculation unit 140 then uses this cache when recalculating the evaluation function after a selection operation by the analyst or the like. That is, in the recalculation of the evaluation function (steps S103 to S111), the statistic change calculation unit 140 searches the statistic storage unit for a cached statistic, using the set of selected attributes for which a statistic is to be calculated as the key. If the statistic exists, the statistic change calculation unit 140 uses it to recalculate the value of the evaluation function.
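The cache keyed by a set of selected attributes can be sketched as follows. This is an illustrative sketch: the class name `StatisticCache`, the `compute` callback, and the `misses` counter are hypothetical, and an in-memory dictionary stands in for the statistic storage unit.

```python
class StatisticCache:
    """Cache statistics keyed by the set of selected attributes."""

    def __init__(self, compute):
        self._compute = compute   # function: frozenset of attributes -> statistic
        self._store = {}
        self.misses = 0           # number of times the statistic was actually computed

    def get(self, attrs):
        key = frozenset(attrs)    # order-independent, hashable key
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(key)
        return self._store[key]
```

Using a `frozenset` as the key means that the same attributes supplied in a different order hit the same cache entry.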

本発明の第２の実施の形態によれば、分析者等による選択操作後の評価関数の再計算の時間を低減できる。その理由は、統計量変化算出部１４０が、モデルを生成するために用いた属性の集合と算出された統計量との組をキャッシュし、分析者等による選択操作後の評価関数の再計算で、キャッシュされた統計量を利用するためである。これにより、分析者等の選択操作後の評価関数の再計算に伴う負担が軽減される。 According to the second embodiment of the present invention, it is possible to reduce the time for recalculating the evaluation function after the selection operation by an analyst or the like. The reason is that the statistic change calculation unit 140 caches a set of the attribute set used to generate the model and the calculated statistic, and recalculates the evaluation function after the selection operation by an analyst or the like. This is because the cached statistics are used. Thereby, the burden accompanying recalculation of the evaluation function after the selection operation by the analyst or the like is reduced.

(Third Embodiment)
Next, a third embodiment of the present invention will be described.

In the third embodiment of the present invention, as another way to reduce the burden that recalculation places on the analyst, the output unit 180 presents the progress of the recalculation to the analyst.

出力部１８０は、上述のステップＳ１０３〜Ｓ１１１により、評価関数の値の再計算が行われているときに、評価関数の値の算出が完了した属性について、その旨を表示する。 When the recalculation of the evaluation function value is performed in steps S103 to S111 described above, the output unit 180 displays that fact for the attribute for which the calculation of the evaluation function value has been completed.

For example, the output unit 180 displays the rectangle showing the name of an attribute whose evaluation function value has been calculated in a color or outline thickness different from that of attributes whose calculation is not yet complete. The output unit 180 may also attach an icon such as a flag mark to attributes whose evaluation function value has been calculated.

The output unit 180 may also rearrange (sort) the attributes whose evaluation function value has been calculated, in the order given by the first rule and the second rule described above, and then place the attributes whose calculation is not yet complete in a position separate from the completed attributes. In this case, the attributes are re-sorted each time the calculation of an attribute's evaluation function value completes. The movement of attribute positions on the attribute display screen 200 may be animated so that changes in attribute order caused by re-sorting can be followed visually.

また、入力部１７０は、評価関数の値の算出が完了していない属性があっても、完了した属性から、選択操作を受け付けてもよい。上述の第１のルール、及び、第２のルールに従って属性が配置されている場合、上述のように、選択属性と非選択属性の境目（統計量の変化が最小となる位置）に近い属性に対して選択操作が行われることが多い。したがって、統計量変化算出部１４０は、統計量の変化が最小となる属性を予測し、当該属性から順番に、評価関数の値を算出することが望ましい。 Further, the input unit 170 may accept a selection operation from the completed attribute even if there is an attribute for which the calculation of the value of the evaluation function has not been completed. When the attributes are arranged according to the first rule and the second rule described above, as described above, the attribute is close to the boundary between the selected attribute and the non-selected attribute (position where the change in statistics is minimized). On the other hand, a selection operation is often performed. Therefore, it is desirable that the statistic change calculation unit 140 predicts an attribute that minimizes the change in the statistic, and calculates the value of the evaluation function in order from the attribute.

ここで、統計量によっては、より計算時間の小さい近似計算によって、統計量を算出できることが知られている。例えば、統計量としてモデルの尤度を用いる場合、尤度は、ベクトルの勾配を用いて近似できることが、非特許文献１に開示されている。統計量変化算出部１４０は、このような近似計算を用いて評価関数の値を算出する。そして、統計量変化算出部１４０は、近似計算により算出された評価関数の値から、選択属性と非選択属性の境目に近い属性を推定し、当該境目に近いと推定された属性から、上述のステップＳ１０３〜Ｓ１１１による評価関数の算出を行う。 Here, depending on the statistic, it is known that the statistic can be calculated by an approximate calculation with a shorter calculation time. For example, Non-Patent Document 1 discloses that when the likelihood of a model is used as a statistic, the likelihood can be approximated using a vector gradient. The statistic change calculation unit 140 calculates the value of the evaluation function using such approximate calculation. Then, the statistic change calculation unit 140 estimates the attribute close to the boundary between the selected attribute and the non-selected attribute from the value of the evaluation function calculated by the approximate calculation, and calculates the above-described attribute from the attribute estimated to be close to the boundary. The evaluation function is calculated in steps S103 to S111.
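The prioritization described above can be sketched as a generator that performs the exact calculation in order of the smallest approximate change first. The sketch assumes that "near the boundary" means a small absolute approximate value; the name `evaluate_in_priority_order` and the `exact_fn` callback are hypothetical, and the approximation itself is not shown.

```python
def evaluate_in_priority_order(approx_values, exact_fn):
    """Yield (attribute, exact value), starting with attributes whose
    approximate change is smallest, i.e. those estimated to lie near
    the selected/non-selected boundary."""
    order = sorted(approx_values, key=lambda a: abs(approx_values[a]))
    for attr in order:
        yield attr, exact_fn(attr)
```

Because the results arrive one attribute at a time, a display routine could mark or reposition each attribute as soon as its exact value is available.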

本発明の第３の実施の形態によれば、分析者等による選択操作後の評価関数の再計算に伴う、分析者等の負担を低減できる。その理由は、評価関数の値の再計算が行われているときに、出力部１８０が、評価関数の値の算出が完了した属性の提示や、評価関数の値の算出が完了した属性からの再配置を行うためである。 According to the third embodiment of the present invention, it is possible to reduce the burden on the analyst and the like due to recalculation of the evaluation function after the selection operation by the analyst and the like. The reason is that when the evaluation function value is being recalculated, the output unit 180 presents the attribute for which the calculation of the evaluation function value has been completed, or the attribute from which the calculation of the evaluation function value has been completed. This is for rearrangement.

以上、実施形態を参照して本願発明を説明したが、本願発明は上記実施形態に限定されるものではない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。 While the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

100 data analysis system
101 CPU
102 storage device
103 communication device
104 input device
105 output device
110 data storage unit
120 initial attribute selection unit
130 selection attribute storage unit
140 statistic change calculation unit
150 model generation unit
160 analysis unit
170 input unit
180 output unit
200 attribute display screen
201 mouse
202 pop-up area
203 display area
204 graph

Claims

An information processing system comprising:
calculation means for calculating, from a predetermined statistic relating to a predetermined model generated using a set of currently selected attributes among a plurality of attributes, a change in the predetermined statistic relating to the predetermined model generated using a set of selected attributes obtained by a selection operation on each of the plurality of attributes; and
output means for outputting the calculated change in the predetermined statistic in association with each of the plurality of attributes.

The information processing system according to claim 1, wherein
the output means outputs the change in the predetermined statistic on an output screen using at least one of a position, a size, a color, and a shape corresponding to the amount of the change.

The information processing system according to claim 1 or 2, wherein
the selection operation on each of the plurality of attributes includes at least one of changing each currently selected attribute to a non-selected attribute and changing each currently non-selected attribute to a selected attribute.

The information processing system according to claim 3, wherein
the selection operation on each of the plurality of attributes further includes at least one of: changing each currently non-selected attribute to a selected attribute in a case where any one of the currently selected attributes has been changed to a non-selected attribute; and changing each currently selected attribute to a non-selected attribute in a case where any one of the currently non-selected attributes has been changed to a selected attribute.

The information processing system according to claim 3 or 4, wherein
the output means arranges, on the output screen, the changes in the predetermined statistic for the currently selected attributes in ascending or descending order of the amount of the change, and arranges the changes in the predetermined statistic for the currently non-selected attributes in the order opposite to that used for the currently selected attributes.

The information processing system according to claim 1, wherein
the output means outputs a predicted value obtained by the predetermined model generated using the set of currently selected attributes, and a predicted value obtained by the predetermined model generated using the set of selected attributes obtained by the selection operation on each of the plurality of attributes.

The information processing system according to claim 1, further comprising
input means for receiving a selection operation on each of the plurality of attributes and updating the set of currently selected attributes with the set of selected attributes obtained by the selection operation.

The information processing system according to claim 1, wherein
the currently selected attributes are selected in advance by an algorithm based on the predetermined statistic.

The information processing system according to claim 1, wherein
the predetermined statistic is one of a value of an information criterion relating to the predetermined model, a likelihood relating to the predetermined model, and a prediction error of the predetermined model.

The information processing system according to claim 1, further comprising
storage means for storing the predetermined statistic relating to the predetermined model calculated by the calculation means, in association with the set of selected attributes used for calculating the statistic, wherein
the calculation means calculates the change in the predetermined statistic using the predetermined statistic stored in the storage means that corresponds to the set of selected attributes obtained by the selection operation on each of the plurality of attributes, or to the set of currently selected attributes.

The information processing system according to claim 1, wherein
the output means displays on the output screen, for each of the plurality of attributes, whether the calculation of the change in the predetermined statistic by the calculation means has been completed.

An information processing method comprising:
calculating, from a predetermined statistic relating to a predetermined model generated using a set of currently selected attributes among a plurality of attributes, a change in the predetermined statistic relating to the predetermined model generated using a set of selected attributes obtained by a selection operation on each of the plurality of attributes; and
outputting the calculated change in the predetermined statistic in association with each of the plurality of attributes.

A program that causes a computer to execute processing comprising:
calculating, from a predetermined statistic relating to a predetermined model generated using a set of currently selected attributes among a plurality of attributes, a change in the predetermined statistic relating to the predetermined model generated using a set of selected attributes obtained by a selection operation on each of the plurality of attributes; and
outputting the calculated change in the predetermined statistic in association with each of the plurality of attributes.