JP2004086897A

JP2004086897A - Method and system for constructing model

Info

Publication number: JP2004086897A
Application number: JP2003283909A
Authority: JP
Inventors: Tetsuo Matsui (松井 哲郎); Michio Takenaka (竹中 道夫)
Original assignee: Fuji Electric Holdings Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 2002-08-06
Filing date: 2003-07-31
Publication date: 2004-03-18

Abstract

PROBLEM TO BE SOLVED: To construct an accurate model efficiently by automating the procedure required for modeling, and to follow changes in the application object flexibly by automatically adjusting the model form.
SOLUTION: A multidimensional data space of a plurality of factors is divided into divided data spaces. For each divided data space, the input factors and output factors of a model are determined from the factors included in that space, and modeling is performed using the multivariate data in that space. The divided data spaces are corrected until the accuracy of the integrated model, obtained by integrating the models of the individual divided data spaces, satisfies a predetermined condition.
COPYRIGHT: (C)2004,JPO

Description

The present invention relates to a model construction method and a model construction system that analyze multivariate data composed of a plurality of factors, find among those factors the input factors that have an input/output relationship with a given output factor, and model that input/output relationship.

Analysis that finds the input/output relationships hidden in multivariate data, models them, and explores trends in the data has long been performed.
For example, multivariate data composed of a plurality of factors, typified by measurement information, production information, operation information, and failure information, is acquired for a plant or an information processing system, and a computer analyzes this data to build various models of the plant or information processing system, typified by prediction models and abnormality diagnosis models.

The input/output relationships of such multivariate data can be modeled in various ways. As an example, the following describes conventional techniques for constructing a prediction model that predicts the future state of one factor in the multivariate data.
A prediction model here is assumed to have output factors and input factors, and to determine the input/output relationship using the multivariate data corresponding to those input and output factors.

Methods for constructing a prediction model fall broadly into the following three types, all of which are in common use.
(1) Method using past cases
(2) Method using a multiple regression equation
(3) Method using a neural network

As a method using past cases (1), Patent Document 1 (Japanese Patent Application Laid-Open No. 6-161989, title of the invention "Prediction Device"), for example, has been disclosed. In this method, a case database is created in advance by pairing the variable to be predicted with the values of variables considered to affect it. Cases similar to the input variables given for prediction are then retrieved from the database and used as the predicted value. This method is often used when the prediction target exhibits similarity or periodicity.
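The retrieval idea behind the past-case method can be sketched as a nearest-case lookup. This is a minimal illustration, not the method of Patent Document 1: the function name `predict_from_cases`, the Euclidean distance, and the sample values are all assumptions made for the example.

```python
import math

def predict_from_cases(cases, query):
    """Return the target value of the stored case most similar to `query`.

    `cases` is a list of (explanatory_values, target_value) pairs.
    Similarity is taken to be Euclidean distance (an assumption).
    """
    nearest = min(cases, key=lambda case: math.dist(case[0], query))
    return nearest[1]

# Hypothetical case database: (temperature, holiday flag) -> demand
cases = [((30.0, 1.0), 520.0), ((22.0, 0.0), 410.0), ((12.0, 1.0), 480.0)]
prediction = predict_from_cases(cases, (29.0, 1.0))
```

In practice the explanatory variables would need scaling so that no single variable dominates the distance.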

As a method using a multiple regression equation (2), Patent Document 2 (Japanese Patent Application Laid-Open No. 6-174285, title of the invention "Air Conditioning Heat Load Prediction Method"), for example, has been disclosed. It predicts the heat load from a linear regression equation in the next day's maximum temperature, minimum temperature, and average humidity. Methods of this kind use a multiple regression equation in which the output factor is expressed as a linear sum of the input factors used for prediction, and they are the most common prediction technique.
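A multiple regression equation of this kind can be fitted by ordinary least squares. The sketch below solves the normal equations with plain Gaussian elimination. The helper names and the synthetic data (generated from an invented linear rule) are assumptions for illustration and are not taken from Patent Document 2.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_regression(rows, targets):
    """Least-squares fit of target = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))

# Synthetic data: (max temperature, min temperature, average humidity)
rows = [(30.0, 22.0, 60.0), (28.0, 20.0, 70.0), (33.0, 25.0, 55.0),
        (25.0, 18.0, 80.0), (31.0, 21.0, 65.0)]
targets = [10.0 + 2.0 * a + 1.0 * b + 0.5 * h for a, b, h in rows]
coeffs = fit_multiple_regression(rows, targets)
heat_load = predict(coeffs, (29.0, 21.0, 62.0))
```

Note that the fit uses whatever input factors it is given. Nothing in the procedure rejects an input that is unrelated to the output.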

The method using a neural network (3) can model a nonlinear relationship between input factors and output factors, and has been used particularly often in recent years. An example is Non-Patent Document 1 (IEEJ Transactions B, "Development of NN Applied Power Demand Forecasting System").

Modeling with a neural network, one concrete example of modeling, is described here with reference to the drawings. FIG. 11 is a conceptual diagram illustrating a multilayer neural network. As shown in FIG. 11, a neural network generally has a multilayer structure consisting of an input layer, an intermediate layer, and an output layer. Each layer contains elements, and couplings exist between the elements of the input layer and those of the intermediate layer, and between the elements of the intermediate layer and those of the output layer. The elements of the input layer correspond to the input factors, and the elements of the output layer correspond to the output factors.

A coupling coefficient represents the weight of a coupling between elements of the neural network. A large coupling coefficient means that the coupling carries weight, that is, it is a necessary coupling. A small coupling coefficient means that the coupling carries no weight, that is, it is an unnecessary coupling.

Building a prediction model with such a neural network means changing the coupling coefficients between the input layer and the intermediate layer, and between the intermediate layer and the output layer, so that the desired output values are obtained from the output layer elements (output factors) for the input values (multivariate data) supplied to the input layer elements (input factors).
This is what a prediction model based on a neural network is.
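The role of the coupling coefficients can be seen in a forward pass through a three-layer network. This is only a sketch: the `tanh` activation, the absence of bias terms, and the weight values are assumptions, and the training procedure that adjusts the coefficients is omitted.

```python
import math

def forward(inputs, w_hidden, w_output):
    """Propagate input values through one intermediate layer.

    w_hidden[j][i] is the coupling coefficient from input element i to
    intermediate element j; w_output[k][j] couples intermediate element j
    to output element k.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(weights, inputs)))
              for weights in w_hidden]
    return [sum(w * h for w, h in zip(weights, hidden)) for weights in w_output]

# Two input factors, two intermediate elements, one output factor.
w_hidden = [[0.5, -0.2], [0.0, 0.0]]   # second intermediate element is uncoupled
w_output = [[1.5, 0.7]]
outputs = forward([1.0, 2.0], w_hidden, w_output)
```

A coefficient of zero removes the coupling from the computation entirely, which matches the description of small coefficients as unnecessary couplings.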

Such a prediction model is generally constructed according to the following procedure.
(1) Analysis of the multivariate data of the output factor to be predicted
(2) Determination of the model configuration
(3) Modeling
(4) Adjustment of the model

In (1), the multivariate data on the output factor to be predicted (hereinafter, the prediction target data) is analyzed in various ways. Typically, one first analyzes what trends the prediction target data shows, or what correlations exist between the prediction target data and the multivariate data of other factors, and then decides which type of prediction model to apply.

If the prediction target data shows strong periodicity, using past cases is considered effective.
If a linear correlation is found between the prediction target data and the multivariate data of one or more other factors, using a multiple regression equation is considered effective.
If a nonlinear correlation is found, or if no clear relationship can be identified, using a neural network model is considered effective.

In (2), the model configuration is determined. Determining the model configuration means deciding whether the whole prediction target is predicted by a single prediction model or by a plurality of prediction models, each covering one part of the target.
For example, when a certain output of the plant mentioned above is to be predicted throughout the year, the options include using one prediction model, building a prediction model for each season, or building separate prediction models for weekdays and holidays. In Non-Patent Document 1 cited above (IEEJ Transactions B, "Development of NN Applied Power Demand Forecasting System"), six prediction models are built: for spring, spring-summer, summer, summer-autumn, autumn, and winter.

In the modeling of (3), prediction models are created according to the results of (1) and (2). When a plurality of prediction models is used, the input factors can be tailored to each prediction model.

In the model adjustment of (4), using prediction accuracy or a similar measure as the evaluation index, input factors are added to or deleted from a prediction model, or the model configuration is revised and each of the prediction models is changed accordingly.
Prediction models have been constructed in this way.
Although prediction was used as the example of the conventional technology, essentially the same work is required when building models that represent input/output relationships for monitoring, maintenance, abnormality diagnosis, and other purposes.

Patent Document 1: Japanese Patent Application Laid-Open No. 6-161989 (paragraphs 0014 to 0035, FIGS. 1 to 5)
Patent Document 2: Japanese Patent Application Laid-Open No. 6-174285 (paragraphs 0012 to 0019, FIGS. 1 to 4)
Non-Patent Document 1: IEEJ Transactions B (Power and Energy), Vol. 120-B, No. 12, pp. 1550-1556 (2000), "Development of NN Applied Power Demand Forecasting System"

The above procedure is commonly followed, but it is not actually an established modeling methodology. At present, an expert with specialized knowledge of the field carries out tasks (1) to (4) manually, by trial and error, over a certain period, so a considerable number of man-hours is needed before modeling is complete.

Moreover, because specialized knowledge is required, no one other than an expert can carry out modeling work that achieves satisfactory performance.
Furthermore, because no unified framework applicable to various purposes exists, the general procedure described above has to be worked out concretely for each individual problem.

Furthermore, in models such as the multiple regression equation or the neural network described above, the input and output factors are selected and fixed in advance, and the parameters inside the model are then calculated automatically from the supplied data.
However, the multiple regression or neural network model itself has no function for selecting input or output factors. Consequently, even if input factors unrelated to the chosen output factor are selected, a model using those input factors is still built. The result may be an inappropriate model whose output contains large errors.

Furthermore, when an integrated model combining a plurality of models is constructed, the model is determined by the model configuration (for example, the number of models) and by the structure of each constituent model (for example, the number of elements in a neural network). The model configuration and the structure of the models themselves (hereinafter, the model form) are decided at the design stage and treated as fixed thereafter. Adaptive adjustment to follow changes in the target has been limited to adjusting the parameters inside each model through a learning function (regression coefficients for a multiple regression equation, neuron weight coefficients for a neural network).

However, conventional methods cannot change the model form automatically when the application target changes significantly. If adjusting the internal parameters no longer yields satisfactory performance, an expert must adjust the model form again or the system must be redesigned.

In short, the conventional techniques described above offer no way to automate modeling in the sense of designing and changing the model form. An expert must do this manually, relying on experience and intuition, so repeated construction and adjustment of the model form is unavoidable and building an optimal model takes time.

The present invention has been made to solve the above problems. Its object is to provide a model construction method and a model construction system that enable an accurate model to be constructed efficiently, either by automating the procedure required for modeling or by supporting efficient work when that procedure is carried out manually.
A further object is to provide a model construction method and a model construction system that enable efficient construction of an adaptive model that flexibly follows changes in the application target, by automatically adjusting the model form (the model configuration and the structure of the models themselves).

To solve the above problems, the model construction method according to the invention of claim 1 is a model construction method that analyzes multivariate data of a plurality of factors, finds among those factors the input factors having an input/output relationship with a given output factor, and models the input/output relationship, the method comprising:
a divided data space generation step of dividing a multidimensional data space represented by some or all of the plurality of factors to generate divided data spaces;
a factor determination step of determining, for each divided data space, the input factors and output factors of a model using the factors included in that divided data space;
a modeling step of modeling, for each divided data space, the input/output relationship using the multivariate data that is included in that divided data space and corresponds to the input factors and output factors;
a model accuracy evaluation step of evaluating the accuracy of an integrated model obtained by integrating the plurality of models constructed for the individual divided data spaces; and
a divided data space correction step of correcting the divided data spaces so that the evaluated accuracy of the integrated model satisfies a predetermined condition,
wherein the divided data space correction step, the factor determination step, the modeling step, and the model accuracy evaluation step are performed in sequence until the evaluated accuracy of the integrated model satisfies the predetermined condition, thereby constructing the model.
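The control flow recited in claim 1 can be sketched as a loop over caller-supplied steps. Everything concrete here is an assumption for illustration: the function and parameter names, the `max_rounds` safeguard, and the toy steps, in which each "model" is simply the mean of a one-dimensional space and a correction halves every space.

```python
def build_model(data, generate, determine_factors, fit, evaluate, correct,
                accepted, max_rounds=20):
    """Repeat factor determination, modeling, and evaluation, correcting the
    divided data spaces until the accuracy condition is met."""
    spaces = generate(data)                      # initial division
    for _ in range(max_rounds):
        models = []
        for space in spaces:
            inputs, output = determine_factors(space)
            models.append(fit(space, inputs, output))
        score = evaluate(spaces, models)         # accuracy of the integrated model
        if accepted(score):
            break
        spaces = correct(spaces, models)         # divided data space correction
    return spaces, models, score

def halve(spaces, models):
    """Toy correction: split every space with more than one item in half."""
    result = []
    for space in spaces:
        mid = len(space) // 2
        result.extend([space[:mid], space[mid:]] if len(space) > 1 else [space])
    return result

spaces, models, score = build_model(
    data=[1, 2, 3, 11, 12, 13],
    generate=lambda d: [sorted(d)],
    determine_factors=lambda space: (None, "value"),
    fit=lambda space, inputs, output: sum(space) / len(space),
    evaluate=lambda spaces, models: max(
        abs(x - m) for s, m in zip(spaces, models) for x in s),
    correct=halve,
    accepted=lambda score: score <= 1.5,
)
```

With this toy data the first round yields a single model with a worst-case error of 6, so the space is corrected once and the second round is accepted.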

The model construction method according to the invention of claim 2 is the model construction method according to claim 1, wherein the divided data space correction step corrects the divided data spaces by merging and dividing them according to the presence or absence of correlation, using a correlation coefficient, which is an index of the correlation between the inputs and outputs of a model.
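One way to read the correlation-based correction of claim 2 is sketched below. The Pearson coefficient is standard, but the decision rule (merge two spaces when their pooled data is still strongly correlated), the threshold of 0.8, and the function names are assumptions, since the claim itself does not fix them.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def is_correlated(space, threshold=0.8):
    """`space` is a list of (input_value, output_value) pairs."""
    xs = [x for x, _ in space]
    ys = [y for _, y in space]
    return abs(pearson(xs, ys)) >= threshold

def should_merge(space_a, space_b, threshold=0.8):
    """Hypothetical rule: merge when the pooled data remains correlated."""
    return is_correlated(space_a + space_b, threshold)

a = [(1, 2), (2, 4), (3, 6)]
b = [(4, 8), (5, 10)]
c = [(4, 1), (5, -3), (6, 0)]
merge_ab = should_merge(a, b)   # pooled data lies on one line
merge_ac = should_merge(a, c)   # pooled data loses its linear relationship
```

A space that fails `is_correlated` on its own would, under the same reading, be a candidate for division.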

The model construction method according to the invention of claim 3 is the model construction method according to claim 1, wherein the divided data space correction step corrects the divided data spaces by splitting off, into a new data space, the data for which the error between the model output, obtained by supplying multivariate data to the input factors of the model determined for each divided data space, and the multivariate data of the output factor is equal to or greater than a predetermined threshold.
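The error-based correction of claim 3 amounts to partitioning one space by model error. In this sketch the function name `split_by_error`, the absolute-error measure, and the sample model are assumptions.

```python
def split_by_error(space, model, threshold):
    """Move data whose model error is at or above `threshold` to a new space.

    `space` is a list of (inputs, target) pairs and `model` maps inputs to a
    predicted output. Returns (kept_space, new_space).
    """
    kept, new_space = [], []
    for inputs, target in space:
        error = abs(model(inputs) - target)
        (new_space if error >= threshold else kept).append((inputs, target))
    return kept, new_space

space = [((1.0,), 2.0), ((2.0,), 4.1), ((3.0,), 9.0)]
model = lambda inputs: 2.0 * inputs[0]      # hypothetical fitted model
kept, new_space = split_by_error(space, model, threshold=1.0)
```

The new space would then go through factor determination and modeling on the next pass of the overall loop.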

The model construction method according to the invention of claim 4 is the model construction method according to any one of claims 1 to 3, wherein the divided data space generation step employs a clustering technique that generates clusters as the divided data spaces, and comprises:
a preliminary clustering step of clustering all of the multivariate data to be used and dividing it into preliminary clusters;
a clustering factor selection step of calculating the importance of the clustering factors for each preliminary cluster and selecting clustering factors; and
a main clustering step of clustering the multivariate data of the clustering factors selected in the clustering factor selection step and dividing it into clusters,
whereby the multidimensional data space represented by some or all of the plurality of factors is divided to generate divided data spaces in units of clusters.

The model construction method according to the invention of claim 5 is the model construction method according to claim 4, wherein the clustering factor selection step calculates, for each preliminary cluster generated by the preliminary clustering, a value representing the center of each factor, calculates the importance of each clustering factor from the degree of dispersion of those values, and selects the clustering factors.
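The center-dispersion measure of claim 5 can be illustrated as follows. Taking the cluster mean as the center, the population variance as the dispersion, and "larger dispersion means more important" as the selection rule are all assumptions, as are the names and the cut-off value.

```python
from statistics import mean, pvariance

def center_dispersion_importance(rows, labels, factor):
    """Variance of the per-cluster centers (means) of one factor."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row[factor])
    centers = [mean(values) for values in groups.values()]
    return pvariance(centers)

# Two factors, two preliminary clusters (labels from a prior clustering).
rows = [(1.0, 5.0), (2.0, 4.0), (9.0, 5.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
scores = [center_dispersion_importance(rows, labels, f) for f in range(2)]
selected = [f for f, s in enumerate(scores) if s > 1.0]   # hypothetical cut-off
```

Factor 0 has cluster centers 1.5 and 9.5 and is kept. Factor 1 has the same center in both clusters and is dropped. Factors on different scales would need normalizing before their scores are compared.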

The model construction method according to the invention of claim 6 is the model construction method according to claim 4, wherein the clustering factor selection step calculates the importance of each clustering factor from the degree of overlap of the ranges, from minimum value to maximum value, that each factor takes in the individual preliminary clusters generated by the preliminary clustering, and selects the clustering factors.
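The range-overlap measure of claim 6 can be sketched in a similar way. The specific score (one minus the average pairwise overlap ratio, so that non-overlapping ranges score highest) is an assumption, as are the function name and data.

```python
from itertools import combinations

def range_overlap_importance(rows, labels, factor):
    """Score a factor by how little its per-cluster [min, max] ranges overlap."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row[factor])
    ranges = [(min(values), max(values)) for values in groups.values()]
    ratios = []
    for (lo_a, hi_a), (lo_b, hi_b) in combinations(ranges, 2):
        overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
        span = max(hi_a, hi_b) - min(lo_a, lo_b)
        # Two identical single-point ranges count as fully overlapping.
        ratios.append(overlap / span if span > 0 else 1.0)
    return 1.0 - sum(ratios) / len(ratios) if ratios else 0.0

rows = [(1.0, 5.0), (2.0, 4.0), (9.0, 5.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
importance_f0 = range_overlap_importance(rows, labels, 0)
importance_f1 = range_overlap_importance(rows, labels, 1)
```

Factor 0 separates the two preliminary clusters completely. Factor 1 spans the same range in both, so it contributes nothing to telling them apart.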

The model construction system according to the invention of claim 7 is a model construction system, such as a computer, that analyzes multivariate data of a plurality of factors, finds among those factors the input factors having an input/output relationship with a given output factor, and models the input/output relationship, the system comprising:
divided data space generation means for dividing a multidimensional data space represented by some or all of the plurality of factors to generate divided data spaces;
factor determination means for determining, for each divided data space, the input factors and output factors of a model using the factors included in that divided data space;
modeling means for modeling, for each divided data space, the input/output relationship using the multivariate data that is included in that divided data space and corresponds to the input factors and output factors;
model accuracy evaluation means for evaluating the accuracy of an integrated model obtained by integrating the plurality of models constructed for the individual divided data spaces; and
divided data space correction means for correcting the divided data spaces so that the evaluated accuracy of the integrated model satisfies a predetermined condition,
wherein the divided data space correction means, the factor determination means, the modeling means, and the model accuracy evaluation means are operated in sequence until the evaluated accuracy of the integrated model satisfies the predetermined condition, thereby constructing the model.

The model construction system according to the invention of claim 8 is the model construction system according to claim 7, wherein the divided data space correction means corrects the divided data spaces by merging and dividing them according to the presence or absence of correlation, using a correlation coefficient, which is an index of the correlation between the inputs and outputs of a model.

The model construction system according to the invention of claim 9 is the model construction system according to claim 7, wherein the divided data space correction means corrects the divided data spaces by splitting off, into a new data space, the data for which the error between the model output, obtained by supplying multivariate data to the input factors of the model determined for each divided data space, and the multivariate data of the output factor is equal to or greater than a predetermined threshold.

The model construction system according to the invention of claim 10 is the model construction system according to any one of claims 7 to 9, wherein the divided data space generation means employs a clustering technique that generates clusters as the divided data spaces, and comprises:
preliminary clustering means for clustering all of the multivariate data to be used and dividing it into preliminary clusters;
clustering factor selection means for calculating the importance of the clustering factors for each preliminary cluster and selecting clustering factors; and
main clustering means for clustering the multivariate data of the clustering factors selected by the clustering factor selection means and dividing it into clusters,
whereby the multidimensional data space represented by some or all of the plurality of factors is divided to generate divided data spaces in units of clusters.

The model construction system according to the invention of claim 11 is the model construction system according to claim 10, wherein the clustering factor selection means calculates, for each preliminary cluster generated by the preliminary clustering, a value representing the center of each factor, calculates the importance of each clustering factor from the degree of dispersion of those values, and selects the clustering factors.

The model construction system according to the invention of claim 12 is the model construction system according to claim 10, wherein the clustering factor selection means calculates the importance of each clustering factor from the degree of overlap of the ranges, from minimum value to maximum value, that each factor takes in the individual preliminary clusters generated by the preliminary clustering, and selects the clustering factors.

The present invention as described above provides a model construction method and a model construction system that enable an accurate model to be constructed efficiently, either by automating the procedure required for modeling or by supporting efficient work when that procedure is carried out manually.
It further provides a model construction method and a model construction system that enable efficient construction of an adaptive model that flexibly follows changes in the application target, by automatically adjusting the model form (the model configuration and the structure of the models themselves).

The best mode for carrying out the present invention is described below. An embodiment of the model construction method according to claim 1 and of the model construction system according to claim 7 is described with reference to the drawings. FIG. 1 is a flowchart illustrating the flow of the model construction method, and FIG. 2 is an explanatory diagram illustrating the division of the data space.
The model construction method of the first embodiment analyzes multivariate data of a plurality of factors, finds among those factors input factors and output factors that have an input/output relationship, and models that relationship.

(1) Divided data space generation step (initial division)
Step S1 in FIG. 1 is the divided data space generation step, which divides the multidimensional data space represented by some or all of the factors of the multivariate data into a plurality of data spaces, generating divided data spaces. This step divides the multidimensional data space into a plurality of data spaces, each containing input factors and output factors that have an input/output relationship together with the associated multivariate data.

Suppose that the data space is three-dimensional and is divided as shown in FIG. 2(a). In this example, each divided data space contains data such as that listed in the table of FIG. 2(b). In the table of FIG. 2(b), "□(i)", "×(i)", "△(i)", and "○(i)" each denote some value other than 0 (zero).
Divided data space 1 and divided data space 2 are spaces in which factors 1 and 2 are nonzero and factor 3 is 0.
Divided data space 3 is a space in which factors 1 and 3 are nonzero and factor 2 is 0.
Divided data space 4 is a space in which factors 1, 2, and 3 are all nonzero.
Divided data spaces 1, 2, and 3 illustrate the case in which a divided data space is generated where one of the factors is 0.
Divided data spaces 1 and 2 share the property that factor 3 is 0, and illustrate the case in which such data is nevertheless divided further into separate divided data spaces.
Techniques such as cluster analysis (clustering) and decision tree analysis can be applied in this divided data space generation means.

Cluster analysis falls broadly into hierarchical methods and non-hierarchical methods.
A hierarchical method initially treats each individual data point in the multivariate data as its own cluster and then repeatedly merges the clusters with the greatest similarity (the smallest dissimilarity) to form larger clusters. A cluster corresponds to a divided data space; in this specification a divided data space generated by cluster analysis is specifically called a cluster. The two terms denote the same thing and differ only in name.

A non-hierarchical method sets initial clusters, compares each data point with the centroids of all clusters, reassigns it to the nearest cluster, recomputes the centroids, and repeats this until the centroids no longer change.
These cluster analyses generate a plurality of clusters (divided data spaces), each of which groups factors of the multivariate data that are highly similar. Because factors with no similarity are excluded from a cluster, determining the input factors and output factors for modeling becomes easier.
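The non-hierarchical procedure described above (assign each data point to the nearest centroid, recompute the centroids, repeat until they stop moving) can be sketched as follows. This is a minimal illustration in the style of k-means, not the patent's implementation; the function name `kmeans`, the `max_iter` safeguard, and the choice of Euclidean distance are assumptions made for the example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Non-hierarchical clustering: assign, recompute centroids, repeat until stable.

    `max_iter` is an assumed safeguard; the text only says "until the centroids
    no longer change".
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assign every data point to the cluster with the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster unchanged
        if updated == centroids:  # the centroids no longer move
            break
        centroids = updated
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```

Each resulting label identifies the cluster (divided data space) to which a data point belongs.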

Decision tree analysis expresses (models), as a tree structure, the rules (conditions) for classifying (predicting) multivariate data for a given purpose. Representative techniques include ID3 and its successor C4.5, CART, and CHAID.
Decision tree analysis generates a plurality of divided data spaces, each grouping the factors of the multivariate data that satisfy the same conditions.

(2) Factor determination step
Step S2 determines, for each divided data space, the input factors and output factors of the model from the factors contained in that divided data space.
The input factors used in the modeling of (3), described later, are determined by evaluating the result of generating the divided data spaces and the information obtained during the division process.

For example, if factor 1 is selected as the output factor in the divided data spaces of FIG. 2, the input factor is factor 2 in divided data spaces 1 and 2, factor 3 in divided data space 3, and factors 2 and 3 in divided data space 4. Thus, even for the same output factor, the input factors differ among the divided data spaces; the divided data spaces are generated so that the input factors differ when the conditions differ. A divided data space may therefore contain all the factors or only specific factors. Moreover, more than one divided data space may be generated for the same set of factors, which can happen with cluster analysis.
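The FIG. 2 example can be read as a simple rule: within a divided data space, the input factors are the factors that take nonzero values, excluding the chosen output factor. The sketch below encodes that reading. The function name, the dictionary layout, and the numeric values are hypothetical; only the zero/nonzero pattern follows the description of FIG. 2.

```python
def determine_input_factors(space_rows, output_factor):
    """Return the factors that are nonzero somewhere in the space, minus the output."""
    active = set()
    for row in space_rows:  # each row maps a factor name to its value
        active.update(name for name, value in row.items() if value != 0)
    return sorted(active - {output_factor})


# Illustrative rows; the numbers are made up, only the zero pattern matters.
space_1 = [{"factor1": 1.2, "factor2": 0.7, "factor3": 0}]
space_3 = [{"factor1": 0.9, "factor2": 0, "factor3": 2.1}]
space_4 = [{"factor1": 1.5, "factor2": 0.3, "factor3": 0.8}]

inputs_1 = determine_input_factors(space_1, "factor1")  # factor 2 only
inputs_3 = determine_input_factors(space_3, "factor1")  # factor 3 only
inputs_4 = determine_input_factors(space_4, "factor1")  # factors 2 and 3
```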

In decision tree analysis, factors that appear near the top of the tree can be judged to be important. Input factors can therefore be selected from the factors at the upper levels of the decision tree. Conversely, variables that never appear as branching conditions in the resulting tree can be excluded from the input factors.

(3) Modeling step
Step S3 is the modeling step, which models the input/output relationship for each divided data space using the multivariate data corresponding to the input factors and output factors.
A representative modeling method, such as the use of past cases, a multiple regression equation, or a neural network, is selected.
At this point the input factors of the model are determined from the factors contained in the divided data space. Applicable approaches include having the developer or user of the model choose them on the basis of experience, and narrowing them down further with an information criterion such as AIC (Akaike Information Criterion).

When past cases are used, one method is, for example, to take the mean of the multivariate data contained in the divided data space as the output value of the model.
When a multiple regression equation is used, multiple regression analysis is performed on the multivariate data contained in the divided data space to construct the equation.
When a neural network is used, the network is likewise trained on the multivariate data contained in the divided data space.
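Two of the modeling options named above, the past-case mean and the multiple regression equation, can be sketched for a single divided data space as follows. This is an illustrative sketch only: the function names are hypothetical, and the regression is fitted by solving the normal equations with plain Gauss-Jordan elimination, which is one standard way to do least squares and not necessarily what the inventors used.

```python
def fit_mean_model(outputs):
    """Past-case model: always predict the mean output of the divided data space."""
    mean = sum(outputs) / len(outputs)
    return lambda inputs: mean


def fit_linear_model(inputs, outputs):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations.

    Raises ZeroDivisionError if the inputs are collinear (singular system).
    """
    rows = [[1.0, *x] for x in inputs]  # prepend the intercept column
    n = len(rows[0])
    # Augmented system (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, outputs))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    coef = [a[i][n] / a[i][i] for i in range(n)]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Made-up data following y = 1 + 2*x1 + 3*x2 exactly.
model = fit_linear_model([(0, 0), (1, 0), (0, 1), (1, 1)], [1, 3, 4, 6])
prediction = model((2, 2))
```

One such model would be fitted per divided data space, using only the input factors determined for that space.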

(4) Model accuracy evaluation step
Step S4 is the model accuracy evaluation step, which evaluates the accuracy of the integrated model obtained by integrating the plurality of models constructed for the individual divided data spaces.
The accuracy of the integrated model is evaluated using Equation 1, which gives the output error EE.

If the output error EE of Equation 1 is less than a predetermined value, it is judged that the model need not be rebuilt; if it is equal to or greater than the predetermined value, it is judged that the model must be rebuilt.
A weighting coefficient wi can be assigned to each model i so that the evaluation reflects how much importance is placed on the accuracy of each model. If wi is restricted to the two values 0 and 1, the accuracy evaluation can be limited to specific models only.
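The role of the weighting coefficients and the rebuild decision can be illustrated with a short sketch. Equation 1 itself is not reproduced in this text, so the weighted sum of per-model errors used below is an assumed form of EE, chosen only because it is consistent with the description of wi; the function names and the error values are likewise hypothetical.

```python
def integrated_error(per_model_errors, weights):
    """Assumed form of the output error EE: a weighted sum of per-model errors."""
    return sum(w * e for w, e in zip(weights, per_model_errors))


def needs_rebuild(ee, threshold):
    """Rebuild when EE is equal to or greater than the predetermined value."""
    return ee >= threshold


# Binary weights restrict the evaluation to models 1 and 3 (made-up errors).
ee = integrated_error([0.2, 0.5, 0.1], [1, 0, 1])
rebuild = needs_rebuild(ee, threshold=0.5)
```

With weights of 0 and 1, the error of the second model does not contribute to EE at all, which is the "specific models only" case mentioned above.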

Two kinds of multivariate data can be used for the accuracy evaluation (error evaluation) of the model. One method evaluates the modeling error using the same multivariate data that was used for modeling. The other sets aside part of the multivariate data during modeling and reserves it for evaluating model accuracy.
The former method evaluates how well the model fits the multivariate data given for modeling.
The latter method evaluates the generalization ability of the model.
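The second evaluation method, reserving part of the data for accuracy evaluation, amounts to a hold-out split. A minimal sketch follows; the function name, the 25% default fraction, and the fixed random seed are assumptions for illustration and do not come from the source.

```python
import random


def holdout_split(records, eval_fraction=0.25, seed=0):
    """Shuffle a copy of the records and reserve a fraction for accuracy evaluation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    # Returns (data used for modeling, data reserved for evaluation).
    return shuffled[n_eval:], shuffled[:n_eval]


modeling_data, evaluation_data = holdout_split(range(20))
```

Errors measured on `evaluation_data` indicate generalization ability, whereas errors measured on `modeling_data` indicate only how well the model fits the data it was built from.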

(5) Divided data space correction step
Step S5 is the divided data space correction step, which corrects the data space so that the evaluated accuracy of the integrated model satisfies a predetermined condition.
More specifically, the divided data spaces can be corrected by merging a plurality of divided data spaces, by further separating a divided data space to generate new divided data spaces, or by exchanging data among divided data spaces.

Merging a plurality of divided data spaces involves a merge-target determination step, which determines the divided data spaces to be merged, and a divided data space merging step, which merges them.
Further separating a divided data space involves a separation-target determination step, which determines the divided data space to be separated, and a divided data space separation step, which determines which multivariate data in that space form each new data space.
Exchanging multivariate data involves a replacement-target data determination step, which determines the multivariate data to be exchanged, and a relocation step, which exchanges and relocates that data.

(6) Model reconstruction step
Steps S2 to S5 constitute the model reconstruction step, which repeats the divided data space correction step, the factor determination step, the modeling step, and the accuracy evaluation step until the evaluated accuracy of the integrated model satisfies the predetermined condition, and thereby constructs the final model.
That is, when the output error EE of Equation 1 is not below the predetermined value, changing the divided data spaces and constructing models from the factors and multivariate data they contain are repeated until EE falls below that value.
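The repetition of steps S2 to S5 can be sketched as a generic loop that takes the individual steps as callables. The function name `build_model`, the argument names, and the `max_rounds` cap are assumptions introduced for the example; the source only states that the steps repeat until EE falls below the predetermined value.

```python
def build_model(spaces, determine_factors, fit, evaluate, correct, threshold, max_rounds=20):
    """Repeat factor determination, modeling, evaluation, and correction (S2 to S5).

    `max_rounds` (at least 1) is an assumed safeguard against endless looping.
    """
    for round_no in range(max_rounds):
        # S2 and S3: choose factors and fit one model per divided data space.
        models = [fit(space, determine_factors(space)) for space in spaces]
        # S4: evaluate the integrated model.
        ee = evaluate(models, spaces)
        if ee < threshold or round_no == max_rounds - 1:
            break
        # S5: correct the divided data spaces and try again.
        spaces = correct(spaces, models)
    return spaces, models, ee


# Toy stand-ins: mean models, squared-error evaluation, halving as the correction.
spaces, models, ee = build_model(
    spaces=[[1, 2, 9, 10]],
    determine_factors=lambda space: None,
    fit=lambda space, factors: sum(space) / len(space),
    evaluate=lambda ms, ss: sum(sum((v - m) ** 2 for v in s) for m, s in zip(ms, ss)),
    correct=lambda ss, ms: [h for s in ss for h in (s[: len(s) // 2], s[len(s) // 2 :]) if h],
    threshold=2.0,
)
```

In this toy run the single space is split once, after which the error falls below the threshold and the loop stops.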

According to the embodiment described above, the data space formed by the factors of the multivariate data is divided to generate divided data spaces, a data space containing an output factor and the input factors related to it is extracted, and a model of those output and input factors is constructed from the multivariate data related to that output factor. A highly accurate model can therefore be constructed.

Next, the model construction system according to claim 7 is described. FIG. 12 is a system configuration diagram of the prediction model construction system.
As shown in FIG. 12, the prediction model construction system of the first embodiment comprises a data input screen 1, data input means 2, data storage means 3, a database 4, data output means (display device) 5, a prediction model 6, divided data space generation means 7, factor determination means 8, modeling means 9, model accuracy evaluation means 10, and divided data space correction means 11.
A network 100 such as the Internet, an intranet, or a LAN is connected to the data input means 2. The model construction system is a computer on which these means operate.

Multivariate data entered on the data input screen 1 of the computer, data stored on other computers in the LAN to which the system is connected, and data stored on other computers reachable over the Internet are taken into the system through the data input means 2 and accumulated in the database 4 by the data storage means 3. The data accumulated in the database in this way is then used. Data can of course also be input by other means, for example from an electronic medium on which the data is recorded.

(1) Divided data space generation means (initial division) 7
This means divides the multidimensional data space spanned by some or all of the factors of the multivariate data into a plurality of data spaces to generate divided data spaces, and has the function corresponding to step S1 in FIG. 1. As described with reference to FIG. 2, the divided data space generation means 7 divides the multidimensional data space into a plurality of data spaces, each containing input factors and output factors that have an input/output relationship together with the associated multivariate data.

(2) Factor determination means 8
This means determines, for each divided data space, the input factors and output factors of the model from the factors contained in that divided data space, and has the function corresponding to step S2.
The input factors used in the modeling of (3), described later, are determined by evaluating the result of generating the divided data spaces and the information obtained during the division process.

(3) Modeling means 9
This means models the input/output relationship for each divided data space using the multivariate data corresponding to the input factors and output factors, and has the function corresponding to step S3.
A representative modeling method, such as the use of past cases, a multiple regression equation, or a neural network, is selected.

(4) Model accuracy evaluation means 10
This means evaluates the accuracy of the integrated model obtained by integrating the plurality of models constructed for the individual divided data spaces, and has the function corresponding to step S4.

(5) Divided data space correction means 11
This means corrects the data space so that the evaluated accuracy of the integrated model satisfies a predetermined condition, and has the function corresponding to step S5.
More specifically, the divided data spaces can be corrected by merging a plurality of divided data spaces, by further separating a divided data space to generate new divided data spaces, or by exchanging data among divided data spaces.

Merging a plurality of divided data spaces uses merge-target determination means, which determines the divided data spaces to be merged, and divided data space merging means, which merges them.
Further separating a divided data space uses separation-target determination means, which determines the divided data space to be separated, and divided data space separation means, which determines which multivariate data in that space form each new data space.
Exchanging multivariate data uses replacement-target data determination means, which determines the multivariate data to be exchanged, and relocation means, which exchanges and relocates that data.

(6) Model reconstruction means
This means causes the divided data space correction means 11, the factor determination means 8, the modeling means 9, and the model accuracy evaluation means 10 to operate repeatedly until the evaluated accuracy of the integrated model satisfies the predetermined condition, and thereby constructs the final model.
When the evaluated accuracy of the integrated model does not meet the predetermined value, changing the divided data spaces and constructing models from the factors and multivariate data they contain are repeated until the output error EE of Equation 1 falls below that value.

Next, a second embodiment of the model construction method and system, according to claims 2 and 8 of the present invention, is described. The second embodiment is a more specific form of the divided data space correction step and divided data space correction means of the first embodiment.
First, the divided data space correction step of the model construction method according to claim 2 is described. In this embodiment, the divided data space correction step S5 shown in FIG. 1 corrects the divided data spaces by merging and dividing them according to whether a correlation exists, using the correlation coefficient, an index of the correlation between the inputs and outputs of the model.

In the correction method of this embodiment, an index of the input/output correlation, such as the correlation coefficient, is calculated for the multivariate data contained in each divided data space. Table 1 lists the correlation coefficients.

Suppose, as shown in Table 1, that the correlation coefficients of the input/output data in divided data spaces C1, C2, and C3 are 0.6, 0.7, and 0.5, respectively.
Next, combinations of the divided data spaces are generated, and each new divided data space obtained by a combination is treated as a provisional divided data space. In Table 1, three combinations are generated as provisional divided data spaces.
The correlation coefficient is then calculated for the multivariate data contained in each provisional divided data space. In the example of Table 1, four ways of dividing the data, including the initial division, are obtained together with their correlation coefficients.

The correlation coefficients are then used to decide which of the four divisions to select. One method is to select the division that contains the divided data space with the largest correlation coefficient. In Table 1, provisional divided data space 3 is selected because it contains a data space with a correlation coefficient of 0.9.
Another method is to select the division with the highest mean correlation coefficient. In Table 1, provisional divided data space 1 is selected because its mean correlation coefficient is 0.7.
It is also conceivable to give greater weight to the correlation coefficients of divided data spaces that contain multivariate data on a specific factor.
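The two selection rules described above (largest single correlation coefficient, or highest mean correlation coefficient) can be sketched as follows. The helper names are hypothetical. Table 1 is not reproduced in this text, so the coefficients for the provisional divisions are made-up values chosen only to agree with the figures quoted in the prose (0.6, 0.7, and 0.5 for the initial division, a 0.9 in provisional division 3, and a mean of 0.7 for provisional division 1); they should not be read as the entries of Table 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between an input series and an output series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def select_partitioning(candidates, rule="max"):
    """Pick a division by its largest ("max") or mean ("mean") correlation coefficient."""
    if rule == "max":
        score = max
    elif rule == "mean":
        score = lambda coefficients: sum(coefficients) / len(coefficients)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(candidates, key=lambda name: score(candidates[name]))


# Illustrative coefficients per division (not the actual Table 1).
candidates = {
    "initial": [0.6, 0.7, 0.5],
    "provisional 1": [0.8, 0.6],
    "provisional 2": [0.5, 0.7],
    "provisional 3": [0.9, 0.4],
}
by_max = select_partitioning(candidates, rule="max")
by_mean = select_partitioning(candidates, rule="mean")
```

In practice each coefficient would be computed with something like `pearson` from the input and output data of the corresponding divided data space.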

This divided data space correction step is repeated until all the divided data spaces have finally been merged back into the original data space, and the set of divided data spaces, together with its models, that gives the highest overall accuracy of the integrated model is selected from those obtained along the way. (In this specification this kind of correction is defined as bottom-up correction, and the term is used in FIG. 4.)

The divided data space correction means of the model construction system according to claim 8 is described next. The divided data space correction means of the second embodiment corrects the divided data spaces by merging and dividing them according to whether a correlation exists, using the correlation coefficient, an index of the correlation between the inputs and outputs of the model. It has the function corresponding to the divided data space correction step of the second embodiment described above.

Next, a third embodiment of the model construction method and system, according to claims 3 and 9 of the present invention, is described. The third embodiment is a more specific form of the divided data space correction step and divided data space correction means of the first embodiment.
In the model construction method according to claim 3, the divided data space correction step S5 shown in FIG. 1 corrects the divided data spaces by merging and dividing them until the error between the model output, obtained by feeding multivariate data to the input factors of the model determined for each divided data space, and the multivariate data of the output factor satisfies a predetermined condition.

The third embodiment uses the per-model errors calculated during the accuracy evaluation of the overall model. These errors express the modeling accuracy of the current models; a model with a large error has not been modeled appropriately. Data whose error is at or above a certain threshold is therefore separated into a new divided data space.

A specific example of correcting the divided data spaces is described below with reference to the drawings.
FIG. 3 is an explanatory diagram showing the correction of the divided data spaces.
The data space is actually multidimensional, but for ease of explanation FIG. 3 shows it as a two-dimensional data space with only two factors, factor 1 and factor 2.
Suppose that executing the divided data space generation step (initial division), the factor determination step, and the modeling step has divided the whole data space into three divided data spaces: divided data space 1, divided data space 2, and divided data space 3, which contain 6, 10, and 5 multivariate data points, respectively.

Models 1, 2, and 3, which express the input/output relationship of the multivariate data in the respective divided data spaces, are built as neural networks, multiple regression equations, or the like. The model accuracy evaluation means described above feeds each individual data point to model 1, model 2, or model 3, whichever corresponds to its divided data space, and calculates the modeling error. Suppose the data points with a large modeling error (larger than a separately set threshold) are the circled points d1 to d7 in FIG. 3.

The existence of such data indicates that the current per-space models cannot model the data appropriately, so the divided data spaces are corrected. Two correction methods are conceivable: (A) separating a new divided data space from each original divided data space, and (B) pooling the multivariate data separated from all the original divided data spaces into a single new divided data space.

Method (A) generates new divided data spaces such that d1 and d2 belong to a new divided data space 1', d3, d4, and d5 to a new divided data space 2', and d6 and d7 to a new divided data space 3'. The original divided data space 1 without d1 and d2, the original divided data space 2 without d3, d4, and d5, and the original divided data space 3 without d6 and d7 remain.
Method (B) generates a single new divided data space 4 containing d1 to d7. Here too, the original divided data space 1 without d1 and d2, the original divided data space 2 without d3, d4, and d5, and the original divided data space 3 without d6 and d7 remain.
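Methods (A) and (B) can be sketched as one function that moves the data points whose modeling error exceeds the threshold either into one new space per original space or into a single pooled space. The function name, the dictionary layout, the key `"pooled"`, and the error values are all hypothetical; the strict "greater than" comparison follows the FIG. 3 description.

```python
def split_by_error(spaces, errors, threshold, pooled=False):
    """Separate data points whose modeling error exceeds the threshold.

    spaces: {name: [records]}; errors: {name: [error per record]}.
    pooled=False follows method (A); pooled=True follows method (B).
    """
    kept, moved = {}, {}
    for name, records in spaces.items():
        kept[name] = [r for r, e in zip(records, errors[name]) if e <= threshold]
        moved[name] = [r for r, e in zip(records, errors[name]) if e > threshold]
    result = dict(kept)
    if pooled:
        # Method (B): all separated points form one new divided data space.
        pool = [r for records in moved.values() for r in records]
        if pool:
            result["pooled"] = pool
    else:
        # Method (A): each original space yields its own new space (name plus a prime).
        for name, records in moved.items():
            if records:
                result[f"{name}'"] = records
    return result


# Made-up errors; d1 to d3 stand for points with a large modeling error.
spaces = {"1": ["a", "d1", "d2"], "2": ["b", "d3"]}
errors = {"1": [0.1, 0.9, 0.8], "2": [0.2, 0.7]}
method_a = split_by_error(spaces, errors, threshold=0.5)
method_b = split_by_error(spaces, errors, threshold=0.5, pooled=True)
```

In both cases the original spaces remain, minus the separated points, as described for FIG. 3.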

This divided data space correction step S5 is repeated until the number of divided data spaces finally reaches a set number, and the set of divided data spaces, together with its models, that gives the highest overall accuracy of the integrated model is selected from those obtained along the way. (In this specification this kind of correction is defined as top-down correction, and the term is used in FIG. 4.)

The divided data space correction means of the model construction system according to claim 9 is described next. The divided data space correction means of the third embodiment corrects the divided data spaces by merging and dividing them until the error between the model output, obtained by feeding multivariate data to the input factors of the model determined for each divided data space, and the multivariate data of the output factor satisfies a predetermined condition. It has the function corresponding to the divided data space correction step of the third embodiment described above.

Next, a fourth embodiment of the model construction method and system, according to claims 4 to 6 and 10 to 12, is described. The fourth embodiment is a more specific form of the divided data space generation step and divided data space generation means of the first embodiment.
First, the model construction method according to claims 4 to 6 is described.
In this embodiment, the divided data space generation step S1 (initial division) described in the first embodiment divides the data space by a clustering technique to generate the divided data spaces. In what follows a divided data space is called a cluster, the term used in clustering.
As shown in FIG. 4, the divided data space generation step of this embodiment further comprises an initial setting step, a preliminary clustering step, a clustering factor selection step, and a main clustering step.

(1) Initial setting step S10
The initial setting step S10 sets whether clustering variable selection is executed, the clustering method, the modeling method, the data division correction method, and other parameters.
Whether clustering variable selection is executed means choosing whether the user or developer building the model explicitly decides which factors, out of all the factors available for building the model, are used for clustering.

The clustering method setting selects, for example, between a hierarchical method and a non-hierarchical method.
The modeling method setting selects a model such as a neural network or a multiple regression equation.
The data division correction method setting determines whether the correlation coefficient method or the error accuracy analysis described above is used.
The parameter setting receives any other necessary parameters (for example, the number of divided data spaces in the model construction method and system according to claims 3 and 9) and other specific numerical values.

(2) Preliminary clustering step S11
The preliminary clustering step S11 performs clustering on all of the multivariate data to be used and divides it into preliminary clusters.
Preliminary clustering is performed to obtain the preliminary clusters from which the cluster centers and cluster ranges are determined; these are used in the clustering factor selection processing of the next step, (3) clustering factor selection step S12.

(3) Clustering factor selection step S12
The clustering factor selection step S12 calculates the importance of each candidate clustering factor from the preliminary clusters and selects the clustering factors.
A degree of separation is calculated from the clusters obtained by the preliminary clustering of step S11 in (2) above, and the clustering factors are selected using it.

The degree of separation calculated in the clustering factor selection step S12 comes in two kinds: (a) a degree of separation using the cluster centers and (b) a degree of separation using the cluster ranges.
Both are described in turn below.

(a) Selection of clustering factors by the degree of separation using cluster centers
The degree of separation using the cluster centers is described here. Clustering is generally performed on the basis of a multidimensional Euclidean distance rather than on any particular factor, so the clustering result differs depending on the factors used and on how the data are distributed.

The idea here is to find the factors that matter for the cluster division by analyzing how each factor is distributed in each cluster. Specifically, a degree of separation using the cluster center values is defined for each factor, and factors with a large degree of separation are selected as clustering factors (or factors with a small degree of separation are excluded from the clustering factors).
The degree of separation using the cluster centers is defined, for example as in Equation 2, as the variation of the center values of the clusters for each factor.
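Equation 2 itself is not reproduced in this text. As a hedged reconstruction only, assuming that the "variation" is the unbiased sample variance of the K cluster centers (an assumption chosen because it reproduces the value 0.333 that the worked example reports for a flag whose four cluster centers are 0, 0, 1, 1), the degree of separation of factor χ_i could be written as follows. The symbols D_i and K are notation introduced for this sketch.

```latex
D_i = \frac{1}{K-1}\sum_{k=1}^{K}\left(\mathrm{avg}_i(C_k) - \overline{\mathrm{avg}}_i\right)^2,
\qquad
\overline{\mathrm{avg}}_i = \frac{1}{K}\sum_{k=1}^{K}\mathrm{avg}_i(C_k)
```

Here avg_i(C_k) denotes the center of cluster C_k along factor χ_i.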

A specific example follows. FIG. 5 is an explanatory diagram of the degree of separation using the cluster centers. Suppose that clustering with the two factors χ_i and χ_j has divided the data into three clusters C1, C2, and C3. Which of χ_i and χ_j is more important for the clustering is judged from the center of each cluster. Specifically, the wider the spread of the center values avg of a factor χ, the more separated the clusters are. In the example of FIG. 5, χ_i has a larger degree of separation than χ_j and can therefore be judged to be the more important factor for clustering.
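The comparison of χ_i and χ_j by cluster centers can be sketched as follows. This is a minimal illustration, assuming the degree of separation is the sample variance of the cluster centers; the function name `center_separation` and the center values are hypothetical and are not taken from FIG. 5.

```python
from statistics import variance

def center_separation(cluster_centers):
    """Return {factor: sample variance of that factor's cluster centers}.

    cluster_centers: one dict per cluster, mapping factor name -> center value.
    Assumption: "variation" is taken to be the unbiased sample variance.
    """
    factors = cluster_centers[0].keys()
    return {f: variance([c[f] for c in cluster_centers]) for f in factors}

# Hypothetical centers of three clusters C1, C2, C3 (illustrative values only).
centers = [
    {"x_i": 0.1, "x_j": 0.50},
    {"x_i": 0.5, "x_j": 0.55},
    {"x_i": 0.9, "x_j": 0.45},
]
sep = center_separation(centers)
# x_i has widely spread centers, x_j has nearly identical ones,
# so x_i receives the larger degree of separation.
```

With these values the centers of `x_i` are spread far more widely than those of `x_j`, so `x_i` would be the factor kept for clustering.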

(b) Degree of separation using cluster ranges
The degree of separation using the cluster ranges is described here.
This degree of separation takes the overlap between clusters into account: a factor along which the ranges of the divided clusters overlap little has a large degree of separation and is therefore considered an important factor for clustering.

A specific example follows. FIG. 6 is an explanatory diagram of the degree of separation using the cluster ranges. Suppose that clustering with the two factors χ_i and χ_j has divided the data into three clusters C1, C2, and C3. Which of χ_i and χ_j is more important for the clustering is judged from the range of each cluster.

First, for each factor, it is determined whether cluster C1 has a range that overlaps clusters C2 and C3. Factor χ_i is used as the example. The range A_i of the values of χ_i as a whole is obtained first. In the figure, A_i is shown as the range between χ_i^min(C1∪C2∪C3) and χ_i^max(C1∪C2∪C3).

Next, the intersection is obtained for clusters C1 and C2, clusters C1 and C3, and clusters C2 and C3 (that is, for every pair of clusters). The union of these intersections is then taken, the minimum value χ_i^min and maximum value χ_i^max of factor χ_i within that union are found, and the overlap range is computed as L = maximum value − minimum value. In the example of FIG. 6, clusters C1 and C3 and clusters C2 and C3 have no intersection, so L is the intersection of C1 and C2, that is, the range over which C1 and C2 overlap. The degree of separation of the clusters for factor χ_i is defined, for example, by an equation (not reproduced in this text).

The same processing is applied to factor χ_j to obtain its degree of separation DL_j. DL expresses the ratio between the overall value range of a factor and the range in which the clusters do not overlap. A factor with a smaller DL therefore has less overlap between clusters, meaning the clusters are clearly separated, and can be judged to be an important factor for clustering.
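The two quantities that feed this degree of separation, the overall range A and the overlap range L, can be sketched as below. The equation that combines them into DL is not reproduced in the text, so this sketch deliberately stops at A and L. It assumes each cluster's range along one factor is the interval from its minimum to its maximum value; the function name `range_overlap` and the sample values are hypothetical.

```python
from itertools import combinations

def range_overlap(cluster_values):
    """Compute (A, L) for one factor.

    cluster_values: list of lists, the values of the factor in each cluster.
    A is the full value range over all clusters. L is max - min taken over
    the union of all pairwise intersections of the cluster intervals.
    Assumption: a cluster's range is the interval [min, max] of its values.
    """
    intervals = [(min(v), max(v)) for v in cluster_values]
    total_range = max(hi for _, hi in intervals) - min(lo for lo, _ in intervals)
    overlaps = []
    for (lo1, hi1), (lo2, hi2) in combinations(intervals, 2):
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo <= hi:  # the two cluster intervals intersect
            overlaps.append((lo, hi))
    if not overlaps:
        return total_range, 0.0
    overlap_range = max(hi for _, hi in overlaps) - min(lo for lo, _ in overlaps)
    return total_range, overlap_range

# Hypothetical values: C1 and C2 overlap, C3 is separate.
a, l = range_overlap([[0.0, 0.2, 0.4], [0.3, 0.6], [0.8, 1.0]])
```

In this hypothetical layout only C1 and C2 intersect, so L is the width of their common interval.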

(c) Selection of clustering factors
The clustering factors are selected using the degree of separation obtained by either (a) or (b) above. Specifically, a factor whose degree of separation is smaller than a preset threshold cannot be regarded as effective for clustering and is excluded from the clustering factors. The factors that remain become the clustering factors and are used in the main clustering of the next step. Selecting the clustering factors in this way allows more appropriate clustering.
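The threshold rule can be sketched in a few lines. The function name `select_clustering_factors` and the factor labels are hypothetical; the three numerical values are the degrees of separation quoted for Case 1 in the power demand example of this document, and the 0.333 figure is attributed to the Saturday flag on the basis of its cluster centers.

```python
def select_clustering_factors(separation, threshold):
    """Keep the factors whose degree of separation is at or above the threshold."""
    return [name for name, value in separation.items() if value >= threshold]

# Degrees of separation quoted in the Case 1 example; the keys are labels
# chosen for this sketch.
case1 = {
    "max_temperature": 0.001,
    "min_temperature": 0.002,
    "saturday_flag": 0.333,
}
selected = select_clustering_factors(case1, threshold=0.1)
```

Factors below the threshold are dropped, and the remaining list is what the main clustering step would receive.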

(4) Main clustering step S13
The main clustering step S13 performs clustering on the multivariate data of the clustering factors selected in the clustering factor selection step S12 and divides the data into clusters.
The main clustering divides the multidimensional data space represented by some or all of the factors into a plurality of clusters (data spaces).

Steps S14, S15, S16, and S17 in FIG. 4 correspond to steps S2, S3, S4, and S5 in FIG. 1, respectively, and build a highly accurate model using the same techniques.
When the model built in this way is used for prediction, diagnosis, relearning, or the like, the cluster to which the input data belongs is determined (by a distance function to the cluster centers), and the operation (prediction, diagnosis, relearning, and so on) is performed on the model of that cluster.
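Routing an input to the model of its cluster can be sketched as follows. The text only says "a distance function to the cluster centers"; Euclidean distance is an assumption here, and the names `nearest_cluster` and `predict`, the centers, and the stand-in models are all hypothetical.

```python
import math

def nearest_cluster(point, centers):
    """Return the index of the cluster whose center is closest to point.

    Assumption: Euclidean distance is used as the distance function.
    """
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def predict(point, centers, models):
    """Apply the model of the cluster that the input belongs to.

    models[k] can be any callable, e.g. a trained regression or network.
    """
    return models[nearest_cluster(point, centers)](point)

# Hypothetical two-cluster setup with stand-in models.
centers = [(0.2, 0.0), (0.8, 1.0)]
models = [lambda p: "model for C1", lambda p: "model for C2"]
result = predict((0.7, 1.0), centers, models)
```

The same lookup would precede diagnosis or relearning: only the model of the selected cluster is touched.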

The divided data space generating means of the model construction system according to claim 10 will now be described. The divided data space generating means of the fourth embodiment further functions as initial setting means, preliminary clustering means, clustering factor selecting means, and main clustering means. The initial setting means realizes the function corresponding to the initial setting step S10, the preliminary clustering means realizes the function corresponding to the preliminary clustering step S11, the clustering factor selecting means realizes the function corresponding to the clustering factor selection step S12, and the main clustering means realizes the function corresponding to the main clustering step S13.
The factor determining means 8 (corresponding to step S2), the modeling means 9 (corresponding to step S3), the model accuracy evaluating means 10 (corresponding to step S4), and the divided data space correcting means 11 (corresponding to step S5) shown in FIG. 12 are then operated in sequence to build, in the same way, a highly accurate prediction model 6.
Furthermore, the clustering factor selecting means of the model construction system according to claims 11 and 12 makes its selection, as in the clustering factor selection step S12 described above, using (a) the degree of separation using the cluster centers or (b) the degree of separation using the cluster ranges.

When the prediction model 6 built in this way is used for prediction, diagnosis, relearning, or the like, the cluster to which the input data belongs is determined (by a distance function to the cluster centers), and the operation (prediction, diagnosis, relearning, and so on) is performed on the model of that cluster.

Next, an application to power demand prediction is described as a more concrete example of the fourth embodiment of the present invention.
Power demand prediction forecasts the power consumption of a power system managed by an electric power company, of a specific region, of a customer, and so on. The demand to be predicted may be the daily total for the next day or the current day, the maximum power, the hourly power, and so on (this example predicts the maximum power of the current day). Multiple regression equations, neural networks, and similar methods are used as prediction models. The power demand of the current day is predicted from input data such as the day's maximum temperature, its minimum temperature, information indicating whether the day is a weekday, Saturday, or holiday, and the actual power of the previous day.

In general, the characteristics of power demand differ by season. In summer, power demand increases as the temperature rises (cooling demand), whereas in winter it increases as the temperature falls (heating demand). Spring and autumn are called intermediate seasons: cooling and heating demand are low, and power demand changes little even when the temperature changes. Power demand also differs greatly between weekdays, Saturdays, and holidays. In general it tends to be high on weekdays, somewhat lower on Saturdays, and low on holidays.

If a single prediction model is used throughout the year for power demand with these characteristics, one model must predict demand whose characteristics are completely different, and accurate prediction is difficult. A prediction model is therefore usually built for each season, giving a configuration of several prediction models. This raises the problem of how to define the seasons, that is, how to decide the prediction model configuration, and an optimal configuration is difficult to obtain.

An example of the present invention applied to power demand prediction is described below with reference to FIG. 4.
(1) Initial setting step S10
In the initial setting means, for example, the following items are set.
Preliminary clustering method: longest distance method
Main clustering method: Ward's method
Number of clusters to divide into: 4
Separation degree thresholds used for clustering factor selection: 0.1, 0.05
Divided data space correction step: bottom-up correction

(2) Preliminary clustering step S11
For the preliminary clustering, the longest distance method, one of the hierarchical clustering methods, was applied to divide the data into four clusters. A hierarchical clustering method keeps merging clusters until only one cluster remains, so the clustering was stopped at the point where four clusters remained. Table 2 shows the factors used for clustering. Two cases were run: Case 1, which uses the five factors maximum power, maximum temperature, minimum temperature, Saturday flag, and holiday flag, and Case 2, which uses the three factors maximum power, maximum temperature, and minimum temperature.
* The Saturday flag and the holiday flag are binary data taking the value 1 or 0 and are expected to influence the clustering result very strongly. For this reason two cases were run, one that uses the Saturday and holiday flags and one that does not.
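Stopping an agglomerative clustering at a target number of clusters can be sketched as below. This is a naive illustration of the longest distance (complete linkage) rule, assuming Euclidean distance between points; the function name `complete_linkage` and the sample points are hypothetical and it is not meant as an efficient or authoritative implementation.

```python
import math
from itertools import combinations

def complete_linkage(points, n_clusters):
    """Agglomerative clustering with the longest distance (complete linkage) rule.

    Starts from singleton clusters and repeatedly merges the pair of clusters
    whose farthest members are closest, stopping once n_clusters remain.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical 2-D points: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
result = complete_linkage(points, n_clusters=3)
```

Passing `n_clusters=4` would correspond to stopping the merging when four clusters remain, as in this example.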

The preliminary clustering results for the two cases are shown in FIGS. 7 and 8, respectively. FIGS. 7 and 8 are correlation diagrams of maximum temperature and maximum power that show the preliminary clustering results.

(3) Clustering factor selection step S12 (degree of separation of claim 5)
A specific example of calculating the degree of separation using the cluster centers for selecting clustering factors is described with reference to the figures. FIGS. 9 and 10 are explanatory diagrams of the cluster centers for each factor; in each figure, (a) is a bar graph and (b) is a table of numerical values.
FIG. 9(a) shows, as a bar graph, the center value of each factor in each cluster for Case 1. The bars of the Saturday flag for cluster 1 (C1) and cluster 2 (C2) are not visible because their center values are 0.

FIG. 9(b) shows, for each cluster in Case 1, the center value of each factor together with the degree of separation (the variance of the cluster centers). The degree of separation (variance of the cluster centers) is calculated using Equation 2 described above.

FIG. 7, the clustering result, shows that in every cluster the maximum temperature is distributed from near 0 (the smallest maximum temperature) to near 1 (the largest maximum temperature). In other words, the maximum temperature is not important for this clustering result.

Turning to the degree of separation of the present invention shown in FIG. 9(b), the degree of separation of the maximum temperature is 0.001, a very small value (that of the minimum temperature is also very small at 0.002), so the maximum temperature can be judged to be less important for clustering than the other factors. For the Saturday flag, the centers of cluster 1 and cluster 2 are 0, and the centers of cluster 3 and cluster 4 are 1.

Because the Saturday flag takes only the two values 0 and 1, clusters 1 and 2 represent days on which the Saturday flag is 0 (days that are not Saturdays, that is, weekdays or holidays), and clusters 3 and 4 represent Saturdays. The Saturday flag (and likewise the holiday flag) is thus an important factor for this clustering.

The degree of separation of the Saturday flag shown in FIG. 9(b) is 0.333, the largest value among the factors, which indicates that the degree of separation is effective as a way of finding the factors important for clustering.

For Case 2, FIG. 8 indicates that the data are clustered according to the magnitude of the maximum temperature. The degree of separation of the maximum temperature in FIG. 10(b) is 0.085, larger than that of the other factors, which agrees with what FIG. 8 suggests.

A specific example of selecting the clustering factors follows.
In Case 1, a degree of separation of 0.1 was used as the threshold, and factors below 0.1 were excluded from the clustering factors. The three factors maximum temperature, minimum temperature, and maximum power were thus excluded, and only the Saturday flag and the holiday flag remained as clustering factors.

Similarly, in Case 2 the clustering factors were selected with a degree of separation threshold of 0.05, and the two factors maximum temperature and minimum temperature remained as clustering factors. Because this example performed two kinds of preliminary clustering, Case 1 and Case 2, the results of both were considered together, and the four factors Saturday flag, holiday flag, maximum temperature, and minimum temperature were adopted as the clustering factors.

(4) Main clustering step S13
The main clustering was performed with Ward's method, one of the hierarchical clustering methods, using the clustering factors determined in the clustering factor selection step S12 (the four factors maximum temperature, minimum temperature, holiday flag, and Saturday flag).

FIGS. 14 to 18 show the clustering results. In each figure the horizontal axis is the maximum temperature and the vertical axis is the maximum power. FIG. 14 shows all of the data before clustering. FIG. 15 shows cluster 1 (C1), FIG. 16 shows cluster 2 (C2), FIG. 17 shows cluster 3 (C3), and FIG. 18 shows cluster 4 (C4).

(5) Factor determination step S14
For the factors used in the prediction model, this example used the clustering factors from the main clustering as they were (the four factors maximum temperature, minimum temperature, holiday flag, and Saturday flag).

(6) Modeling step S15
A three-layer neural network was used as the modeling means. That is, for each cluster a neural network was built that takes the maximum temperature, minimum temperature, holiday flag, and Saturday flag as inputs and outputs the maximum power.

(7) Divided data space correction step S17
Starting from the four-way division obtained by the main clustering means, the divided data spaces are corrected by bottom-up correction. FIG. 13 shows how the correction of the data space proceeds.
First, two of the four clusters are merged to generate combinations that form a three-way division. In this example, the five three-way combinations shown in FIG. 13 were generated. The right-hand column of FIG. 13 lists the correlation coefficient (taken, in view of the data distribution, to be the coefficient of determination of a quadratic expression relating maximum temperature and maximum power), and the combination with the largest correlation coefficient was selected as the best three-way case. That is, the division {C1}, {C2}, {C3 C4}, with a correlation coefficient of 0.610, is the best case among the three-way divisions.
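The coefficient of determination of a quadratic fit, used above as the "correlation coefficient", can be sketched as follows for a single group of data. This is a minimal least-squares illustration in plain Python, assuming at least three distinct x values and a non-constant y; the function name `quadratic_r2` and the sample data are hypothetical, and how the per-cluster values are combined into the figure listed in FIG. 13 is not specified in this text.

```python
def quadratic_r2(xs, ys):
    """Coefficient of determination R^2 of the least-squares fit y = a + b*x + c*x^2."""
    # Normal equations (X^T X) beta = X^T y for the basis [1, x, x^2].
    rows = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented 3x4 matrix.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    # Back substitution for the coefficients a, b, c.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    fitted = [beta[0] + beta[1] * x + beta[2] * x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data lying exactly on a parabola, so the fit is perfect.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x - 4.0 * x + 5.0 for x in xs]
r2 = quadratic_r2(xs, ys)
```

A U-shaped relation between temperature and demand (high demand at both temperature extremes) is the kind of pattern a quadratic captures and a linear correlation would miss.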

Next, combinations that reduce the three-way division to a two-way division are generated in the same way, and their correlation coefficients are evaluated. As shown in FIG. 13, three two-way combinations were generated. Among them, the combination with the largest correlation coefficient was {C1}, {C2 C3 C4}, with a correlation coefficient of 0.635. This combination has the largest correlation coefficient of all the combinations and is the final data division obtained by bottom-up correction.
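The bottom-up procedure (merge two groups, keep the best candidate at each level, and retain the best partition seen overall) can be sketched as below. This is a hedged illustration: the function name `bottom_up_correction`, the `min_groups=2` stopping point, and the toy scoring function are assumptions, and the sketch enumerates every pair of groups at each level, which may differ from the candidate list shown in FIG. 13.

```python
from itertools import combinations

def bottom_up_correction(clusters, score, min_groups=2):
    """Greedy bottom-up merging of clusters.

    clusters: list of hashable cluster labels, e.g. ["C1", "C2", "C3", "C4"].
    score: callable taking a partition (tuple of frozensets of labels) and
           returning a number where larger is better.
    Returns (best_partition, best_score) over all partitions visited.
    """
    current = tuple(frozenset([c]) for c in clusters)
    best, best_score = current, score(current)
    while len(current) > min_groups:
        candidates = []
        for a, b in combinations(range(len(current)), 2):
            rest = tuple(g for k, g in enumerate(current) if k not in (a, b))
            candidates.append(rest + (current[a] | current[b],))
        current = max(candidates, key=score)  # best merge at this level
        current_score = score(current)
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score

def toy_score(partition):
    # Hypothetical score: rewards large groups and penalizes any group
    # that mixes C1 with other clusters.
    total = sum(len(g) ** 2 for g in partition)
    if any("C1" in g and len(g) > 1 for g in partition):
        total -= 10
    return total

best, best_score = bottom_up_correction(["C1", "C2", "C3", "C4"], toy_score)
```

In practice `score` would be an accuracy measure of the integrated model, such as the correlation coefficient or error criterion chosen in the initial settings.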

FIGS. 19 and 20 show the correlation diagrams for the division with the highest correlation, {C1}, {C2 C3 C4}. Finally, FIG. 21 shows the learning error obtained when a neural network model was trained for each of {C1} and {C2 C3 C4}. For comparison, the case without data division (the conventional method) is plotted together with the case of the present invention. The horizontal axis represents days, and one year of data is shown in total. The vertical axis is the error [%] between the actual value and the predicted value. The effect of the present invention is pronounced where the horizontal axis is in the range 1 to 61, near 121, and in the range 331 to 361. In this example, the prediction error of the present invention was about 8% better than that of the conventional method.

The prediction model construction method and the prediction model construction system have been described above.
The first effect of the present invention is that the modeling work can be automated: the data space is divided to generate divided data spaces, the input factors of the model are determined, modeling is performed using the multivariate data in each divided data space, and the correction of the divided data spaces and the rebuilding of the models in them are repeated until the accuracy evaluation of the model as a whole is satisfactory.

It is also possible, instead of fully automating the processing flow shown here on a computer, to provide an interactive user interface for each phase and thereby assist the person doing the modeling work. This shortens the time the modeling work takes.

The second effect of the present invention is that, whereas modeling work has conventionally had to be done by an expert with a certain degree of specialized knowledge of the field, the present invention allows even a non-expert in the field to build a model.

The third effect of the present invention is that, by performing the above model construction periodically or at arbitrary times, the model can be rebuilt adaptively when the tendencies or trends of the modeled object change.

Methods of adaptively adjusting a model have long been realized in, for example, neural network application systems with a learning function, multiple regression equations for which multiple regression analysis is rerun at arbitrary times, and Kalman filter application systems.
Such methods, however, adaptively adjust only the parameters of the model in use; they do not adaptively adjust the model configuration itself.

As a specific example, in a system that predicts demand with a neural network for each of the four seasons, the designer defines the seasonal divisions at design time and builds a neural network model for each season. Implementing the learning function of the neural network makes it possible to adjust the weight coefficients of the spring, summer, autumn, and winter prediction models, but it cannot adjust the seasonal divisions, for example by ending spring earlier and starting summer earlier.
The present invention determines the model divisions automatically and can adaptively adjust not only the model parameters but also the model divisions themselves, so it is clearly highly flexible with respect to changes in the environment of the application.

A fourth effect of the present invention is that, when divided data spaces are generated by clustering, the factors to be used for clustering are selected automatically, or an index that serves as a basis for selecting them is provided. In general clustering methods, clusters are generated using a similarity or dissimilarity based on the Euclidean distance between data points. The clustering result therefore differs greatly depending on the factors used for clustering.

Moreover, if a factor that is not useful for clustering is used, the similarity is calculated with that factor included, so the result differs from the clustering result that should have been obtained. By using the present invention, factors that adversely affect clustering can be excluded in advance and the data can be divided appropriately, so the accuracy of the finally constructed model can also be expected to improve.
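The effect of an unhelpful factor on a Euclidean-distance similarity can be seen in a small Python sketch. The records, the factor names, and the `noise` factor are hypothetical values chosen for illustration; they do not come from the embodiment.

```python
import math

def euclidean(a, b, factors):
    """Euclidean distance between two records over the chosen factors."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in factors))

# Hypothetical daily records; "noise" stands in for a factor unrelated to demand.
day_a = {"max_temp": 30.0, "max_power": 95.0, "noise": 0.0}
day_b = {"max_temp": 31.0, "max_power": 96.0, "noise": 50.0}  # like day_a except for noise
day_c = {"max_temp": 12.0, "max_power": 70.0, "noise": 1.0}   # a different kind of day

useful = ["max_temp", "max_power"]
all_factors = useful + ["noise"]

# With only the useful factors, day_a is closest to day_b.
d_ab_useful = euclidean(day_a, day_b, useful)
d_ac_useful = euclidean(day_a, day_c, useful)

# With the unhelpful factor included, day_a appears closer to day_c instead.
d_ab_all = euclidean(day_a, day_b, all_factors)
d_ac_all = euclidean(day_a, day_c, all_factors)
```

In this toy case the nearest neighbor of `day_a` changes once `noise` is included, which is the kind of distortion that excluding such factors in advance is meant to avoid.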

FIG. 1 is a flowchart illustrating the flow of the model construction method.
FIG. 2 is an explanatory diagram illustrating division of the data space.
FIG. 3 is an explanatory diagram illustrating correction of the divided data spaces.
FIG. 4 is a flowchart illustrating the flow of the model construction method.
FIG. 5 is an explanatory diagram of the degree of separation using cluster centers.
FIG. 6 is an explanatory diagram of the degree of separation using cluster ranges.
FIG. 7 is a correlation diagram of maximum temperature and maximum power showing a clustering result.
FIG. 8 is a correlation diagram of maximum temperature and maximum power showing a clustering result.
FIG. 9 is an explanatory diagram of the cluster centers for each factor; FIG. 9(a) is a bar graph and FIG. 9(b) is a table of numerical values.
FIG. 10 is an explanatory diagram of the cluster centers for each factor; FIG. 10(a) is a bar graph and FIG. 10(b) is a table of numerical values.
FIG. 11 is a conceptual diagram illustrating a multilayer neural network.
FIG. 12 is a system configuration diagram of the prediction model construction system.
FIG. 13 is an explanatory diagram illustrating the relationship among the number of divisions, the cluster division, and the correlation coefficient.
FIG. 14 is a characteristic diagram showing a clustering result for the relationship between maximum temperature and maximum power.
FIG. 15 is a characteristic diagram showing a clustering result for the relationship between maximum temperature and maximum power.
FIG. 16 is a characteristic diagram showing a clustering result for the relationship between maximum temperature and maximum power.
FIG. 17 is a characteristic diagram showing a clustering result for the relationship between maximum temperature and maximum power.
FIG. 18 is a characteristic diagram showing a clustering result for the relationship between maximum temperature and maximum power.
FIG. 19 is a correlation diagram for the division {C1} {C2 C3 C4}, which has the highest correlation.
FIG. 20 is a correlation diagram for the division {C1} {C2 C3 C4}, which has the highest correlation.
FIG. 21 is a diagram showing the learning errors obtained by training neural network models on {C1} and on {C2 C3 C4}, respectively.

Explanation of reference numerals

1: Data input screen
2: Data input means
3: Data storage means
4: Database
5: Data output means (display device)
6: Prediction model
7: Divided data space generation means
8: Factor determination means
9: Modeling means
10: Model accuracy evaluation means
11: Divided data space correction means
100: Network

Claims

A model construction method of analyzing multivariate data composed of a plurality of factors, finding, from among the plurality of factors, an input factor having an input/output relationship with a certain output factor, and modeling the input/output relationship, the method comprising:
a divided data space generation step of dividing a multidimensional data space represented by some or all of the plurality of factors to generate divided data spaces;
a factor determination step of determining, for each divided data space, an input factor and an output factor of the model using the factors included in that divided data space;
a modeling step of modeling, for each divided data space, the input/output relationship using the multivariate data that are included in that divided data space and correspond to the input factor and the output factor;
a model accuracy evaluation step of evaluating the accuracy of an integrated model obtained by integrating the plurality of models constructed for the respective divided data spaces; and
a divided data space correction step of correcting the divided data spaces so that the evaluation of the accuracy of the integrated model satisfies a predetermined condition,
wherein the model is constructed by sequentially performing the divided data space correction step, the factor determination step, the modeling step, and the model accuracy evaluation step until the evaluation of the accuracy of the integrated model satisfies the predetermined condition.
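The iteration recited above can be sketched in Python under strong simplifying assumptions that are not taken from the specification: there is a single input factor and a single output factor (so factor determination is trivial), each per-space model is a least-squares line, accuracy is a size-weighted mean absolute error, and the correction step simply halves the worst-fitting space. All function names are hypothetical.

```python
from statistics import mean

def fit_line(points):
    """Modeling step (assumed form): least-squares line y = a*x + b."""
    mx = mean([x for x, _ in points])
    my = mean([y for _, y in points])
    sxx = sum((x - mx) ** 2 for x, _ in points)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    return a, my - a * mx

def mean_abs_error(points, model):
    a, b = model
    return mean([abs(y - (a * x + b)) for x, y in points])

def evaluate(spaces):
    """Fit one model per divided data space and score the integrated model."""
    models = [fit_line(s) for s in spaces]
    errors = [mean_abs_error(s, m) for s, m in zip(spaces, models)]
    n = sum(len(s) for s in spaces)
    total = sum(e * len(s) for e, s in zip(errors, spaces)) / n
    return models, errors, total

def build_model(data, tolerance, max_rounds=10):
    spaces = [sorted(data)]                    # generation step: one initial space
    models, errors, total = evaluate(spaces)
    for _ in range(max_rounds):
        if total <= tolerance:                 # predetermined condition satisfied
            break
        worst = errors.index(max(errors))
        if len(spaces[worst]) < 4:             # too few points to split further
            break
        s = spaces.pop(worst)                  # correction step (assumed rule):
        half = len(s) // 2                     # halve the worst-fitting space
        spaces[worst:worst] = [s[:half], s[half:]]
        models, errors, total = evaluate(spaces)
    return spaces, models, total

# V-shaped toy data: one line cannot fit it, two lines fit it exactly.
data = [(x, abs(x)) for x in range(-6, 6)]
spaces, models, total = build_model(data, tolerance=1e-9)
```

On the V-shaped toy data, a single line leaves a residual error, so the loop splits the data once and then stops because each half is fitted exactly.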

The model construction method according to claim 1, wherein
the divided data space correction step corrects the divided data spaces by fusing or dividing them according to the presence or absence of a correlation, using a correlation coefficient that is an index of the correlation between the input and the output of the model.
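One way to picture a correlation-based fusion decision is the Python sketch below. The Pearson coefficient is standard, but the fusion rule (fuse two spaces only if the merged data still show a strong correlation), the threshold of 0.9, and the sample clusters are assumptions made for illustration.

```python
import math

def pearson(points):
    """Pearson correlation coefficient between input x and output y."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def should_fuse(space_a, space_b, threshold=0.9):
    """Assumed rule: fuse only if the merged data keep a strong correlation."""
    return abs(pearson(space_a + space_b)) >= threshold

# Hypothetical (max temperature, max power) clusters.
c1 = [(0, 80), (5, 70), (10, 60)]      # power falls as temperature rises
c2 = [(20, 60), (25, 70), (30, 80)]    # power rises with temperature
c3 = [(35, 90), (40, 100), (45, 110)]  # same trend as c2

fuse_c2_c3 = should_fuse(c2, c3)  # same trend: merged correlation stays high
fuse_c1_c2 = should_fuse(c1, c2)  # opposite trends cancel: correlation vanishes
```

Here `c2` and `c3` would be fused, while `c1` would be kept as its own data space.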

The model construction method according to claim 1, wherein
the divided data space correction step corrects the divided data spaces by separating, into a new data space, the data for which the error between the model output, obtained by inputting the multivariate data for the input factor of the model determined for each divided data space, and the multivariate data for the output factor is equal to or greater than a predetermined threshold.
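A minimal Python sketch of such an error-based separation follows. The linear model, the threshold value, and the data points are hypothetical; only the rule "error at or above the threshold goes to a new data space" reflects the text.

```python
def split_by_error(points, model, threshold):
    """Return (kept, separated): data fitted within the threshold, and data
    whose absolute error is at or above it, which form a new data space."""
    kept, separated = [], []
    for x, y in points:
        error = abs(y - model(x))
        if error >= threshold:
            separated.append((x, y))
        else:
            kept.append((x, y))
    return kept, separated

def model(x):
    # Hypothetical model already fitted to the warm-weather data.
    return 2.0 * x + 20.0

points = [(20, 60), (25, 70), (30, 80), (5, 70), (10, 60)]
kept, separated = split_by_error(points, model, threshold=5.0)
```

The two cold-weather points are poorly explained by the existing model and are moved to a new data space, where a separate model can then be built.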

The model construction method according to any one of claims 1 to 3, wherein
the divided data space generation step employs a clustering method that generates clusters as the divided data spaces, and comprises:
a preliminary clustering step of performing clustering using all of the multivariate data to be used and dividing the data into preliminary clusters;
a clustering factor selection step of calculating the importance of the clustering factor for each preliminary cluster and selecting a clustering factor; and
a clustering step of performing clustering using the multivariate data related to the clustering factor selected in the clustering factor selection step to divide the data into clusters,
whereby the multidimensional data space represented by some or all of the plurality of factors is divided to generate divided data spaces in units of clusters.

The model construction method according to claim 4, wherein
the clustering factor selection step calculates, for each of the preliminary clusters generated by the preliminary clustering, a value representing the center of each factor, calculates the importance of each clustering factor based on the degree of variation in those values, and selects the clustering factor.
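A Python sketch of a center-based importance measure is given below. It assumes that the center is the per-cluster mean, that variation is measured by the population standard deviation of those means, and that factor values are already normalized to a common scale; the factor names, data, and cutoff of 0.1 are hypothetical.

```python
from statistics import mean, pstdev

def center_importance(clusters, factors):
    """For each factor, measure how much the cluster centers vary.
    A factor whose centers barely differ does little to separate clusters."""
    importance = {}
    for f in factors:
        centers = [mean([rec[f] for rec in cluster]) for cluster in clusters]
        importance[f] = pstdev(centers)
    return importance

# Hypothetical preliminary clusters with normalized factor values.
clusters = [
    [{"max_temp": 0.1, "humidity": 0.5}, {"max_temp": 0.2, "humidity": 0.6}],
    [{"max_temp": 0.8, "humidity": 0.6}, {"max_temp": 0.9, "humidity": 0.5}],
]
importance = center_importance(clusters, ["max_temp", "humidity"])

# Assumed selection rule: keep factors whose importance reaches a cutoff.
selected = [f for f, v in importance.items() if v >= 0.1]
```

In this toy data the `max_temp` centers differ strongly between the two preliminary clusters while the `humidity` centers coincide, so only `max_temp` is selected.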

The model construction method according to claim 4, wherein
the clustering factor selection step calculates the importance of each clustering factor according to the degree of overlap, among the preliminary clusters generated by the preliminary clustering, of the ranges from the minimum value to the maximum value of each factor, and selects the clustering factor.
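A range-overlap measure of this kind might be sketched in Python as follows. The specific formula (one minus the mean pairwise ratio of overlap length to combined span) is an assumption for illustration, as are the data and factor names; it requires at least two preliminary clusters.

```python
from itertools import combinations

def range_importance(clusters, factor):
    """Importance of a factor from how little its per-cluster [min, max]
    ranges overlap: 1.0 means fully separated, 0.0 means identical ranges."""
    ranges = [
        (min(rec[factor] for rec in c), max(rec[factor] for rec in c))
        for c in clusters
    ]
    ratios = []
    for (lo1, hi1), (lo2, hi2) in combinations(ranges, 2):
        overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
        span = max(hi1, hi2) - min(lo1, lo2)
        ratios.append(overlap / span if span > 0 else 1.0)
    return 1.0 - sum(ratios) / len(ratios)

# Hypothetical preliminary clusters with normalized factor values.
clusters = [
    [{"max_temp": 0.1, "humidity": 0.5}, {"max_temp": 0.2, "humidity": 0.6}],
    [{"max_temp": 0.8, "humidity": 0.6}, {"max_temp": 0.9, "humidity": 0.5}],
]
temp_importance = range_importance(clusters, "max_temp")
humidity_importance = range_importance(clusters, "humidity")
```

The `max_temp` ranges of the two clusters do not overlap at all, whereas the `humidity` ranges coincide, so `max_temp` scores as the more useful clustering factor.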

A model construction system, such as a computer, that analyzes multivariate data composed of a plurality of factors, finds, from among the plurality of factors, an input factor having an input/output relationship with a certain output factor, and models the input/output relationship, the system comprising:
divided data space generation means for dividing a multidimensional data space represented by some or all of the plurality of factors to generate divided data spaces;
factor determination means for determining, for each divided data space, an input factor and an output factor of the model using the factors included in that divided data space;
modeling means for modeling, for each divided data space, the input/output relationship using the multivariate data that are included in that divided data space and correspond to the input factor and the output factor;
model accuracy evaluation means for evaluating the accuracy of an integrated model obtained by integrating the plurality of models constructed for the respective divided data spaces; and
divided data space correction means for correcting the divided data spaces so that the evaluation of the accuracy of the integrated model satisfies a predetermined condition,
wherein the model is constructed by sequentially operating the divided data space correction means, the factor determination means, the modeling means, and the model accuracy evaluation means until the evaluation of the accuracy of the integrated model satisfies the predetermined condition.

The model construction system according to claim 7, wherein
the divided data space correction means corrects the divided data spaces by fusing or dividing them according to the presence or absence of a correlation, using a correlation coefficient that is an index of the correlation between the input and the output of the model.

The model construction system according to claim 7, wherein
the divided data space correction means corrects the divided data spaces by separating, into a new data space, the data for which the error between the model output, obtained by inputting the multivariate data for the input factor of the model determined for each divided data space, and the multivariate data for the output factor is equal to or greater than a predetermined threshold.

The model construction system according to any one of claims 7 to 9, wherein
the divided data space generation means employs a clustering method that generates clusters as the divided data spaces, and comprises:
preliminary clustering means for performing clustering using all of the multivariate data to be used and dividing the data into preliminary clusters;
clustering factor selection means for calculating the importance of the clustering factor for each preliminary cluster and selecting a clustering factor; and
clustering means for performing clustering using the multivariate data related to the clustering factor selected by the clustering factor selection means to divide the data into clusters,
whereby the multidimensional data space represented by some or all of the plurality of factors is divided to generate divided data spaces in units of clusters.

The model construction system according to claim 10, wherein
the clustering factor selection means calculates, for each of the preliminary clusters generated by the preliminary clustering, a value representing the center of each factor, calculates the importance of each clustering factor based on the degree of variation in those values, and selects the clustering factor.

The model construction system according to claim 10, wherein
the clustering factor selection means calculates the importance of each clustering factor according to the degree of overlap, among the preliminary clusters generated by the preliminary clustering, of the ranges from the minimum value to the maximum value of each factor, and selects the clustering factor.