JP2013025349A

JP2013025349A - Feature selection device, method, and program

Info

Publication number: JP2013025349A
Application number: JP2011156365A
Authority: JP
Inventors: Takayuki Nakada; 貴之中田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-07-15
Filing date: 2011-07-15
Publication date: 2013-02-04

Abstract

PROBLEM TO BE SOLVED: To provide a technique for selecting an appropriate combination of features in a practical time, under the condition that the relationship between variables is not restricted to linear correlation, even when the number of features is large and the number of feature combinations grows exponentially. SOLUTION: A feature selection device comprises: sparse relationship learning means for learning a sparse relationship so as to improve the value of an evaluation function calculated on the basis of observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and prediction function learning means for learning, on the basis of the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

Description

The present invention relates to a feature selection apparatus, method, and program.

Consider a regression problem consisting of an objective variable Y and explanatory variables X_1, ..., X_p. When a linear model is used, the prediction function is given by the following equation.
$$\hat{y} = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
Using least squares estimation, with X = (X_1, ..., X_p) and β = (β_1, ..., β_p)^T, the parameter estimate is obtained as
$$\beta = (X^{T} X)^{-1} X^{T} y$$
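The least squares estimate above can be illustrated with a small sketch. It is not part of the original disclosure: the helper names `solve_linear_system` and `least_squares` and the toy data are assumptions made for illustration, and the normal equations are solved by plain Gaussian elimination rather than by forming the inverse explicitly.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations apply to both sides.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; columns of X are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, y):
    """Return beta solving (X^T X) beta = X^T y, with X given as a list of rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data (hypothetical) generated exactly from y = 2*X1 - 1*X2, no noise.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
beta = least_squares(X, y)
```

Because the toy data contain no noise, `beta` recovers the generating coefficients up to floating-point error. Note that every explanatory variable receives a coefficient, which is the situation the following paragraphs identify as a problem.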

However, if regression analysis is performed using the parameters obtained above as they are, all explanatory variables are used to estimate the objective variable, which generally causes the following problems.
・ The estimation accuracy of the objective variable may not be the best possible.
・ The relationship between the objective variable and the explanatory variables is difficult to understand.
These problems can be addressed by variable selection, in which only some of the explanatory variables are used to estimate the objective variable. This is equivalent to setting some elements of β = (β_1, ..., β_p) to 0. Here, an explanatory variable that is important for estimating the objective variable is called a feature, and the selection of the variables used to estimate the objective variable is called feature selection.

Section 3.3.1 of Non-Patent Document 1 describes an example of a feature selection system. This feature selection system includes feature combination means, function learning means, accuracy evaluation means, and optimum feature selection means. The feature combination means generates all combinations of features. The function learning means learns, for every feature combination obtained by the feature combination means, a prediction function from the feature combination to the objective variable. The accuracy evaluation means evaluates the accuracy of each learned prediction function. The optimum feature selection means selects the combination of features with the highest accuracy.
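The exhaustive procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `all_feature_subsets`, `best_subset`, and `toy_score` are hypothetical names, and the scorer is a stand-in for fitting a prediction function on each subset and measuring its accuracy.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score):
    """Score every subset exhaustively and return the highest-scoring one."""
    return max(all_feature_subsets(features), key=score)


# Hypothetical scorer: rewards the "useful" features X1 and X2 and charges a
# small penalty per feature used. A real system would instead learn a
# prediction function on each subset and evaluate its accuracy.
USEFUL = {"X1", "X2"}


def toy_score(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


features = ["X1", "X2", "X3", "X4"]
subsets = list(all_feature_subsets(features))
chosen = best_subset(features, toy_score)
```

With p features there are 2^p - 1 non-empty subsets, so four features give 15 candidates while twenty features already give more than a million, each requiring its own model fit.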

Section 3.3.2 of Non-Patent Document 1 describes another example of a feature selection system. This feature selection system includes feature combination means, function learning means, accuracy evaluation means, and optimum feature selection means. The feature combination means generates a new feature combination by adding a feature to, or removing a feature from, the feature combination obtained so far. The function learning means learns, for the feature combination generated by the feature combination means, a prediction function from the feature combination to the objective variable. The accuracy evaluation means evaluates the accuracy of the learned prediction function. The optimum feature selection means selects the combination of features with the highest accuracy.
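A minimal sketch of the additive variant of this stepwise procedure is given below. The function name `forward_stepwise` and the scorer `toy_score` are hypothetical, and the scorer again stands in for fitting and evaluating a prediction function; the removal (backward) variant is not shown.

```python
def forward_stepwise(features, score):
    """Greedily add the single feature that most improves the score."""
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Score each candidate obtained by adding one more feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand_feature = max(candidates)
        if cand_score <= best:
            break  # no single addition improves the score, so stop
        best = cand_score
        selected.append(cand_feature)
        remaining.remove(cand_feature)
    return selected, best


# Hypothetical scorer: rewards X1 and X2, with a small penalty per feature.
USEFUL = {"X1", "X2"}


def toy_score(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)


selected, best = forward_stepwise(["X1", "X2", "X3", "X4"], toy_score)
```

The greedy search evaluates far fewer combinations than the exhaustive one, at the cost of possibly missing the best subset when features are only useful jointly.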

Non-Patent Document 1: Trevor Hastie, Robert Tibshirani, Jerome Friedman, "The Elements of Statistical Learning," Springer, 2009.
Non-Patent Document 2: Tosaka, Yoshiba, "Explanation of concrete utilization methods of copulas in financial practice" (in Japanese), Kin'yu Kenkyu, Institute for Monetary and Economic Studies, Bank of Japan, December 2005.
Non-Patent Document 3: S. Kirshner, "Learning with tree-averaged densities and distributions," Advances in Neural Information Processing Systems, NIPS 2007, December 2007.
Non-Patent Document 4: G. Elidan, "Copula Bayesian Networks," Advances in Neural Information Processing Systems, NIPS 2010, December 2010.

The first problem with the feature selection systems described in Non-Patent Document 1 is that, when the number of features is large, not all combinations can be evaluated. This is because no adequate consideration is given to how to select the optimum feature combination from a set of combinations that grows exponentially with the number of features.

The second problem with the feature selection systems described in Non-Patent Document 1 is that the relationship between the objective variable and the features is limited to linear correlation. This is because no adequate consideration is given to how to express nonlinear correlation in the model used.

In view of the above, an object of the present invention is to provide a technique capable of selecting an appropriate combination of features and generating a prediction function in a practical time, under the condition that the relationship between variables is not limited to linear correlation, even for feature combinations whose number grows exponentially when the number of features is large.

The present invention for solving the above problems comprises: sparse relationship learning means for learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and prediction function learning means for learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

The present invention for solving the above problems also comprises: a sparse relationship learning step of learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and a prediction function learning step of learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

The present invention for solving the above problems is also a program for a prediction function generation device, the program causing the prediction function generation device to function as: sparse relationship learning means for learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and prediction function learning means for learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

According to the present invention, an appropriate combination of features can be selected and a prediction function generated in a practical time, even for feature combinations whose number grows exponentially when the number of features is large.

FIG. 1 is a block diagram showing a prediction function generation device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation procedure.
FIGS. 3(a) and 3(b) are diagrams of models showing relationships.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 shows a prediction function generation device (feature selection system) according to an embodiment of the present invention. The feature selection system 100 includes sparse relationship learning means 102 and prediction function learning means 103. The feature selection system 100 can be configured as a computer system that operates under program control. The function of each unit in the feature selection system 100 can be realized by a computer operating according to a predetermined program.

The input device 101 takes in the data to be processed. The data taken in by the input device 101 include the explanatory variables X and the objective variable Y. The input device 101 inputs the acquired data to the feature selection system 100. The observation data storage unit 111 stores the observation data input from the input device 101.

The sparse relationship learning means 102 reads the observation data from the observation data storage unit 111 and learns a sparse relationship for the observation data. Here, a sparse relationship is one in which not all elements of the data are correlated with one another; some pairs of elements have no correlation. The sparse relationship learning means 102 stores data representing the learned sparse relationship model in the learning result storage unit 112. Note that the observation data storage unit 111 and the learning result storage unit 112 need only be accessible from the feature selection system 100, and may be located either inside or outside the feature selection system 100.

The sparse relationship learning means 102 also reads the observation data from the observation data storage unit 111 and the learning result from the learning result storage unit 112, and calculates the value of an appropriately chosen evaluation function based on the observation data and the learned sparse relationship. Here, the evaluation function is a function that quantifies the distance between the distribution of the input data and the distribution of the learned model.

The sparse relationship learning means 102 learns the sparse relationship so that the calculated value of the evaluation function improves. The feature selection system 100 repeats the evaluation and learning of the sparse relationship until the learning result converges. When the learning result converges, the sparse relationship learning means 102 notifies the prediction function learning means 103 that learning has ended. When learning has ended, the prediction function learning means 103 reads the learning result of the sparse relationship from the learning result storage unit 112 and learns a prediction function based on the sparse relationship learned by the sparse relationship learning means 102. The prediction function learning means 103 outputs the learned prediction function via the output device 104.

FIG. 2 shows the operation procedure. The input device 101 takes in the data to be processed (observation data) (step S1). The sparse relationship learning means 102 calculates the value of the evaluation function based on the observation data and the learned sparse relationship (step S2). The learning result storage unit 112 stores an initial value of the sparse relationship before the sparse relationship learning means 102 performs any learning. When step S2 is executed for the first time, the sparse relationship learning means 102 reads this arbitrarily set initial value of the sparse relationship from the learning result storage unit 112 and uses it to calculate the value of the evaluation function. In calculating the value of the evaluation function, the sparse relationship learning means 102 estimates random variables that are not directly observed (hidden variables).

The sparse relationship learning means 102 learns the sparse relationship so that the calculated value of the evaluation function increases (step S3). When evaluating the sparse relationship in step S2, the sparse relationship learning means 102 estimates the random variables that are not directly observed based on the previous learning result for the sparse relationship; in step S3, it then learns a sparse relationship that differs from the previous learning result. The feature selection system 100 repeats the evaluation of the learning result in step S2 and the learning in step S3 until the learning result converges.

When the learning result converges, the prediction function learning means 103 reads the learning result from the learning result storage unit 112 and learns a prediction function based on the sparse relationship learned by the sparse relationship learning means 102 (step S4). The prediction function learning means 103 outputs the prediction function learned in step S4 via the output device 104 (step S5).

The evaluation and learning of the sparse relationship in S2 and S3 will now be described in more detail. As the model representing the sparse relationship, a mixture model of Chow-Liu trees (probability models based on maximum spanning trees) can be used, for example. When modeling the relationships between variables, this approach considers joint distributions only between variables that show a strong correlation. The mixture model of Chow-Liu trees is represented as a probability model by Formula 1.

In Formula 1, X is the observation data, p(X) is the probability distribution of the observation data, K is the number of components, π_i is the mixing probability of component i, p(T_i) is the probability distribution of the tree structure (sparse relationship) in component i, p(θ_i | T_i) is the probability distribution of the parameters given the tree structure in component i, and p(X | θ_i, T_i) is the probability distribution of the observation data given the tree structure and parameters in component i. Here, the tree structure is a structure in which lines are drawn between variables with large partial correlation and the remaining partial correlations are set to zero. The parameters are, for example, the mean and variance of each variable. The mixing probability is the weight of each tree when each tree is regarded as one component.

When the objective variable and the explanatory variables are given, the sparse relationship learning means 102 treats them as multivariate observation data and estimates, for example, the maximum spanning tree probability model described above. Estimating the tree structures, parameters, and mixing probabilities corresponds to "learning the sparse relationship"; "evaluating the sparse relationship" estimates, with those quantities fixed, which component each observation actually belongs to. Once the component to which each observation belongs has been estimated, the evaluation calculates, as the value of the evaluation function, the difference in likelihood (the distance between distributions) between the distribution of the input data and the distribution given by the learned sparse relationship model. Here, joint distributions are assigned in order starting from the most strongly correlated pairs of variables, under the constraint that the relationships among the objective variable and the explanatory variables do not form a loop. Each joint distribution can be expressed using a function called a copula. A copula can account for nonlinear correlation and can also express rank correlations such as Kendall's tau and Spearman's rho. There is also an index called the tail dependence coefficient, which expresses dependence in the tails of the distribution; feature selection can also be performed by treating this index as the strength of correlation. Note that the method described in any of Non-Patent Documents 2 to 4 can be used for the above probability model.
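To illustrate why rank correlations such as Kendall's tau and Spearman's rho capture relationships that linear correlation understates, the following sketch compares them on a monotone but strongly nonlinear toy relationship. It is not taken from the original disclosure: the function names and the data are assumptions, the tie-free tau-a variant is used for simplicity, and no copula is fitted.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / pairs. Assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def ranks(v):
    """Rank values from 1 to n. Assumes no ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y increases monotonically but very nonlinearly with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(2 * v) for v in x]

linear = pearson(x, y)
tau = kendall_tau(x, y)
rho = spearman_rho(x, y)
```

On these data both rank correlations equal 1 because the ordering of y follows the ordering of x exactly, whereas the Pearson coefficient is noticeably below 1 because the relationship is far from a straight line.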

The feature selection system 100 repeats the evaluation and learning of the sparse relationship until the learning result converges. For example, the feature selection system 100 determines that the learning result has converged, and ends the evaluation and learning of the sparse relationship, when the value of the evaluation function no longer changes as the number of learning iterations increases, or when the change is smaller than a predetermined threshold. By repeating the evaluation and learning of the sparse relationship, a sparse relationship covering the objective variable and all the explanatory variables is obtained. Learning a function from the explanatory variables to the objective variable based on this relationship corresponds to "learning the prediction function".
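The alternation of evaluation and learning with a threshold-based stopping rule can be sketched generically as follows. The function `learn_until_converged`, its parameters, and the toy `evaluate` and `learn` callbacks are hypothetical placeholders; they do not implement the tree mixture model itself, only the control flow of steps S2 and S3.

```python
def learn_until_converged(evaluate, learn, model, threshold=1e-6, max_iters=1000):
    """Alternate learning and evaluation until the score stops changing."""
    previous = evaluate(model)           # evaluate the initial model (first S2)
    for iteration in range(1, max_iters + 1):
        model = learn(model)             # S3: update the model
        current = evaluate(model)        # S2: evaluate the updated model
        if abs(current - previous) < threshold:
            return model, current, iteration
        previous = current
    return model, previous, max_iters


# Toy stand-ins (assumptions): the "model" is a single number, the score
# peaks at 3.0, and each learning step moves the model halfway toward 3.0.
def evaluate(m):
    return -(m - 3.0) ** 2


def learn(m):
    return m + 0.5 * (3.0 - m)


model, score, iterations = learn_until_converged(evaluate, learn, model=0.0)
```

The `max_iters` cap is an added safeguard, not something stated in the text, so that the loop terminates even if the score never settles below the threshold.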

With a sparse relationship covering the objective variable and all the features thus obtained, the prediction function learning means 103 selects, based on this sparse relationship, the features to be used for predicting the objective variable, and learns the prediction function. At this point, a regression equation using a copula can be constructed and used as the prediction function.

FIG. 3 shows models representing relationships. Y is the objective variable, and X_1 to X_4 are explanatory variables. FIG. 3(a) shows the model used in Non-Patent Document 1. In this model, all pairs of explanatory variables X_i and X_j (i ≠ j) are connected, so estimation accuracy does not improve much when the amount of data is small, and the calculation can break down when there is a strong correlation between X_i and X_j. FIG. 3(b) shows the model used in this embodiment. Five variables in total, Y and X_1 to X_4, are input as data. There are 5 × 4 / 2 = 10 pairwise combinations of these five variables. For all 10 combinations, the magnitude of the correlation is obtained through the processing of S2 and S3 described above. The variables are then connected by lines in descending order of correlation, subject to the condition that no loop is formed. The processing ends when every variable is connected by at least one line. Then, in S4, those of X_1 to X_4 that are directly connected to Y are judged to be features that strongly influence Y and are selected. Because the correlations are strong, the influence of X_4 is contained in X_3 and the influence of X_3 is contained in X_2, so X_2 and X_1 suffice when estimating Y.
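The procedure of connecting variables in descending order of correlation without forming loops corresponds to building a maximum spanning tree, which can be sketched with Kruskal's algorithm as below. The correlation strengths are invented for illustration (the source gives no numerical values), and the names `maximum_spanning_tree` and `features_linked_to` are hypothetical.

```python
from itertools import combinations


def maximum_spanning_tree(variables, strength):
    """Kruskal's algorithm: add the strongest pairs first, skipping loops."""
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the group representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = sorted(combinations(variables, 2),
                   key=lambda p: strength[frozenset(p)], reverse=True)
    edges = []
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:  # joining two separate groups cannot create a loop
            parent[ra] = rb
            edges.append((a, b))
        if len(edges) == len(variables) - 1:
            break  # every variable is now connected
    return edges


def features_linked_to(target, edges):
    """Return the variables joined directly to the target by a line."""
    return sorted(b if a == target else a for a, b in edges if target in (a, b))


variables = ["Y", "X1", "X2", "X3", "X4"]
all_pairs = list(combinations(variables, 2))  # 5 * 4 / 2 = 10 pairs

# Hypothetical correlation strengths: weak by default, strong for a few pairs.
strength = {frozenset(p): 0.1 for p in all_pairs}
strength.update({
    frozenset(("X2", "X3")): 0.9,
    frozenset(("X3", "X4")): 0.85,
    frozenset(("Y", "X1")): 0.8,
    frozenset(("Y", "X2")): 0.7,
    frozenset(("Y", "X3")): 0.6,  # never added: Y and X3 already linked via X2
})

edges = maximum_spanning_tree(variables, strength)
selected = features_linked_to("Y", edges)
```

With these assumed strengths the tree links Y to X1 and X2, X2 to X3, and X3 to X4, so the features selected for predicting Y are X1 and X2, matching the situation described for FIG. 3(b).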

The present invention has been described above with reference to embodiments and examples, but it is not necessarily limited to them and can be modified and carried out in various ways within the scope of its technical idea.

The present invention may be applied as follows. In finance and economics, nonlinear correlation structures are observed in the behavior of various measured quantities. For example, when the stock price of one company falls, the stock prices of related companies also fall. Ordinarily these correlations are linear and often not very large. However, in a recession or financial crisis, for example, extreme situations exhibiting nonlinear correlation are seen, in which a sharp fall in one company's stock price is accompanied by sharp falls in the stock prices of related companies. In such cases, conventional methods used to predict a company's stock price cannot adequately represent the extreme situation, and prediction performance deteriorates. In addition, the number of companies serving as features (explanatory variables) for the prediction is large, making it difficult to select appropriate features. The proposed method can solve these problems.

100: Feature selection system (prediction function generation device)
101: Input device
102: Sparse relationship learning means
103: Prediction function learning means
104: Output device
111: Observation data storage unit
112: Learning result storage unit

Claims

1. A prediction function generation device comprising:
sparse relationship learning means for learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and
prediction function learning means for learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

2. The prediction function generation device according to claim 1, wherein the prediction function learning means forms correlations between variables in descending order of correlation.

3. A prediction function generation method comprising:
a sparse relationship learning step of learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and
a prediction function learning step of learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.

4. The prediction function generation method according to claim 3, wherein the prediction function learning step forms correlations between variables in descending order of correlation.

5. A program for a prediction function generation device, the program causing the prediction function generation device to function as:
sparse relationship learning means for learning a sparse relationship so as to improve the value of an evaluation function calculated based on observation data including an objective variable and explanatory variables and on the sparse relationship between the objective variable and each of the explanatory variables; and
prediction function learning means for learning, based on the learned sparse relationship, a prediction function in which every variable is correlated with at least one other variable.