JP2019101833A

JP2019101833A - Data prediction system by regression analysis applying probability density function to explanatory variable, data prediction method, and data prediction program

Info

Publication number: JP2019101833A
Application number: JP2017233058A
Authority: JP
Inventors: Satomi Koike (小池里美); Minako Hojo (北條美奈子); Masahiro Nakajima (中島正裕)
Original assignee: Silom Partners Tax Accountant Corp
Current assignee: Silom Partners Tax Accountant Corp
Priority date: 2017-12-05
Filing date: 2017-12-05
Publication date: 2019-06-24
Anticipated expiration: 2037-12-05
Also published as: JP6418537B1

Abstract

To provide a data prediction system, a data prediction method, and a data prediction program that allow the input range of each data item subject to prediction to be set appropriately on the basis of a probability density function, that are easy for a user to use, and that improve prediction accuracy. SOLUTION: A data prediction system comprises: means for calculating a probability density function for each data item contained in observation data; means for setting a regression equation in which any one of the data items is the objective variable and one or more of the remaining data items are explanatory variables; means for calculating a regression coefficient for each explanatory variable in the regression equation thus set; means for calculating and displaying, from the probability density function of a data item subject to prediction, the probability density of an input value to be substituted into the corresponding explanatory variable, and for setting, from that input value, a numeric value or numeric value range on which the prediction calculation is performed; and means for calculating the value of the objective variable by substituting the set numeric value or numeric value range into the regression equation. SELECTED DRAWING: Figure 2

Description

The present invention relates to a data prediction system, a data prediction method, and a data prediction program, and more particularly to a data prediction system, data prediction method, and data prediction program based on regression analysis in which a probability density function is applied to explanatory variables.

Making multifaceted predictions about a target event from data on a variety of related items, and evaluating the resulting prediction data, is useful for business improvement, the planning of management strategy, and the like. For example, in restaurant management, if sales data for many stores are collected and accumulated by location, weather, day of the week, and so on, and store sales under desired conditions can be predicted from the accumulated data, the restaurant's manager can operate stores and open new ones appropriately. Likewise, in a beauty salon business, if analyzing the relationship between staff members and beauty courses based on each store's business records makes it possible to predict, for example, what kind of store operation would increase gross profit, the salon can improve its operating profit.

Here, the source data used for prediction (for example, sales data or the business activities actually carried out) are called explanatory variables. The data to be predicted (for example, the store sales or gross profit to be predicted) are called the objective variable. A function that yields the objective variable (predicted value) when arbitrary values are substituted (input) into the explanatory variables is called a model or prediction function. Various multivariate analysis techniques, including multiple regression analysis, are conventionally known as statistical methods for analyzing the relationship between a plurality of explanatory variables and an objective variable.
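For orientation, the prediction function of a multiple regression analysis is commonly written in the linear form below. This is a general textbook form given here as an assumption, not a formula quoted from this publication; the symbols (the predicted objective variable \hat{y}, the explanatory variables x_i, the intercept b_0, and the regression coefficients b_i) are introduced only for illustration.

```latex
\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i
```

Substituting input values for x_1, ..., x_k yields the predicted value of the objective variable.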

For example, Patent Document 1 proposes a system intended to optimize product ordering for a shop located in a hotel. The system obtains in advance the correlation between past guest numbers and the sales information for each product, uses that correlation to forecast each product's sales for each day from the number of guests with reservations for that day, and places each day's product orders based on the forecast. In this technique, a plurality of categories corresponding to guest type (that is, customer segment) are set in advance, and the assortment of products to order is calculated according to the guest types indicated by the reservation information for the day in question.

However, in this technique the input value of each data item corresponding to an explanatory variable used for prediction is a numerical value specified by the user. Consequently, to predict future prospects before reservations open, when no definite input values yet exist, the person in charge must rely on experience and intuition, repeatedly enter assumed values, and estimate a forecast by referring to the values calculated each time. Because nothing concrete supports the choice of which calculated value to adopt as the predicted value, prediction accuracy cannot be guaranteed. In other words, predicting with explanatory variable values that do not reflect reality yields predicted values that are extremely unlikely to occur in practice.

Meanwhile, the data for each event that serves as an explanatory variable in a future prediction are distributed with some degree of variation. Obtaining the data distribution of each such event and narrowing the input range of the explanatory variables from a probabilistic standpoint therefore enables more efficient and more accurate prediction. However, setting appropriate input conditions from distribution characteristics calculated by statistically processing each event (hereinafter, a data item) requires considerable expertise and experience in statistics, spreadsheet software, and the like. There is therefore a need for a data prediction system and data prediction method with which even a person with little expertise or experience can easily set realistic input conditions, and which can be used for business improvement and management strategy.

Patent Document 1: Japanese Patent Application Laid-Open No. H09-26990

The present invention has been made in view of the above conventional problems, and its object is to provide a data prediction system, a data prediction method, and a data prediction program that apply a probability density function to explanatory variables and that, in data prediction by regression analysis, allow the input range of evaluation items to be set appropriately.

A data prediction system according to one aspect of the present invention, made to achieve the above object, is a data prediction system based on regression analysis in which a probability density function is applied to explanatory variables, the system comprising: a storage unit that stores observation data consisting of a plurality of data items and a program used for prediction processing; an input unit that receives, from a user, setting inputs for a regression equation and prediction conditions; a prediction item selection unit that, based on the received setting inputs, sets a regression equation in which any one of the plurality of data items included in the observation data is the objective variable and one or more of the remaining data items are explanatory variables, and creates, from the observation data, a data table composed of the data of the data items corresponding to the objective variable and the explanatory variables of the set regression equation; a regression coefficient calculation unit that uses the created data table to calculate a regression coefficient for each explanatory variable of the set regression equation; a probability density function calculation unit that uses the created data table to calculate a probability density function for each data item corresponding to an explanatory variable of the set regression equation; a prediction condition setting unit that receives input values to be substituted into the explanatory variables of the set regression equation; a prediction processing unit that substitutes the received input values into the explanatory variables of the set regression equation to calculate the value of the objective variable; an output unit that outputs the calculated value of the objective variable as the predicted value for the input values; and a control unit that controls each of the above units and the system as a whole, wherein the prediction condition setting unit generates an input screen on which the distribution of the observation data of the data item corresponding to the explanatory variable into which an input value is substituted is displayed as a probability density distribution graph using the calculated probability density function.
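The roles of the regression coefficient calculation unit and the prediction processing unit can be sketched as follows. This is a minimal illustration only: it assumes a linear regression equation fitted by ordinary least squares, which this passage does not specify, and the data items `visits`, `ads`, and `sales`, their values, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation applies to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_regression(table: Dict[str, Sequence[float]], target: str,
                   explanatory: Sequence[str]) -> List[float]:
    """Return [intercept, b1, ..., bk] by ordinary least squares (assumed method)."""
    n_rows = len(table[target])
    rows = [[1.0] + [table[name][i] for name in explanatory] for i in range(n_rows)]
    y = list(table[target])
    k = len(explanatory) + 1
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], inputs: Sequence[float]) -> float:
    """Substitute input values into the fitted regression equation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], inputs))


# Hypothetical data table: one objective variable and two explanatory variables.
table = {
    "visits": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ads": [2.0, 1.0, 4.0, 3.0, 6.0],
    "sales": [18.0, 17.0, 28.0, 27.0, 38.0],
}
coeffs = fit_regression(table, target="sales", explanatory=["visits", "ads"])
forecast = predict(coeffs, [6.0, 5.0])
```

The hypothetical table is built so that the fitted equation reproduces the data exactly, which makes the sketch easy to check by hand; real observation data would leave residuals.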

A data prediction method according to one aspect of the present invention, made to achieve the above object, is a data prediction method based on regression analysis in which a probability density function is applied to explanatory variables, and is performed in a data prediction system comprising: a storage unit that stores observation data consisting of a plurality of data items and a program used for prediction processing; an input unit that receives setting inputs from a user; a prediction item selection unit that, based on the received setting inputs, sets a regression equation made up of a plurality of data items included in the observation data and creates, from the observation data, a data table corresponding to the objective variable and the explanatory variables of the set regression equation; a regression coefficient calculation unit that uses the created data table to calculate a regression coefficient for each explanatory variable of the set regression equation; a probability density function calculation unit that uses the created data table to calculate a probability density function for each data item corresponding to an explanatory variable of the set regression equation; a prediction condition setting unit that receives input values to be substituted into the explanatory variables of the set regression equation; a prediction processing unit that substitutes the received input values into the explanatory variables of the set regression equation to calculate the value of the objective variable; an output unit that outputs the calculated value of the objective variable as the predicted value for the input values; and a control unit that controls each of the above units and the system as a whole. In the method, the control unit performs the steps of: receiving, at the input unit, setting inputs for a regression equation and prediction conditions from the user; reading the observation data stored in the storage unit; setting, based on the setting inputs received at the input unit, a regression equation in which any one of the plurality of data items included in the observation data is the objective variable and one or more of the remaining data items are explanatory variables; creating, from the observation data, a data table composed of the data of the data items corresponding to the objective variable and the explanatory variables of the set regression equation; calculating, using the created data table, a regression coefficient for each explanatory variable of the set regression equation; calculating, using the created data table, a probability density function for each data item corresponding to an explanatory variable of the set regression equation; causing the prediction condition setting unit to generate an input screen on which the distribution of the observation data of the data item corresponding to each explanatory variable of the set regression equation is displayed as a probability density distribution graph using the calculated probability density function; receiving input values from the user on the input screen of the prediction condition setting unit on which the probability density distribution graph is displayed; substituting the received input values into the corresponding explanatory variables of the set regression equation to calculate the value of the objective variable; and outputting the calculated value of the objective variable as the predicted value for the input values.
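The steps that calculate a probability density function per data item and let the user judge an input value against it can be sketched as follows. This is a minimal illustration under a loud assumption: each data item is modelled as normally distributed, whereas this passage does not state which distribution family or estimation method is used. The data item name `visits`, its values, and all function names are hypothetical, and the screen display is reduced to returned numbers.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple


def fit_density_functions(table: Dict[str, Sequence[float]],
                          explanatory: Sequence[str]) -> Dict[str, NormalDist]:
    """Estimate one density function per explanatory data item (normal fit assumed)."""
    return {name: NormalDist.from_samples(table[name]) for name in explanatory}


def describe_input(dist: NormalDist, value: float) -> Tuple[float, float]:
    """Return (probability density, cumulative probability) at a candidate input value."""
    return dist.pdf(value), dist.cdf(value)


def central_range(dist: NormalDist, coverage: float = 0.68) -> Tuple[float, float]:
    """Return the symmetric input range that covers the given probability mass."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical observation data for a single explanatory data item.
table = {"visits": [1.0, 2.0, 3.0, 4.0, 5.0]}
densities = fit_density_functions(table, ["visits"])

density_at_mean, cumulative_at_mean = describe_input(densities["visits"], 3.0)
density_far_out, _ = describe_input(densities["visits"], 10.0)
low, high = central_range(densities["visits"])
```

An input value whose density is far below the density near the center of the distribution, as with the value 10.0 here, is one the observation data rarely support. Flagging such values is the kind of feedback a probability density distribution graph gives the user visually.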

A data prediction program according to one aspect of the present invention, made to achieve the above object, is a data prediction program that causes a computer to execute arithmetic processing for calculating a predicted value with a regression equation based on observation data, the computer comprising: a storage unit that stores observation data consisting of a plurality of data items and a program used for prediction processing; an input unit that receives setting inputs from a user; a prediction item selection unit that, based on the received setting inputs, sets a regression equation made up of a plurality of data items included in the observation data and creates, from the observation data, a data table corresponding to the objective variable and the explanatory variables of the set regression equation; a regression coefficient calculation unit that uses the created data table to calculate a regression coefficient for each explanatory variable of the set regression equation; a probability density function calculation unit that uses the created data table to calculate a probability density function for each data item corresponding to an explanatory variable of the set regression equation; a prediction condition setting unit that receives input values to be substituted into the explanatory variables of the set regression equation; a prediction processing unit that substitutes the received input values into the explanatory variables of the set regression equation to calculate the value of the objective variable; an output unit that outputs the calculated value of the objective variable as the predicted value for the input values; and a control unit that controls each of the above units and the system as a whole. The program causes the computer to execute the steps of: receiving, at the input unit, setting inputs for a regression equation and prediction conditions from the user; reading the observation data stored in the storage unit; setting, based on the setting inputs received at the input unit, a regression equation in which any one of the plurality of data items included in the observation data is the objective variable and one or more of the remaining data items are explanatory variables; creating, from the observation data, a data table composed of the data of the data items corresponding to the objective variable and the explanatory variables of the set regression equation; calculating, using the created data table, a regression coefficient for each explanatory variable of the set regression equation; calculating, using the created data table, a probability density function for each data item corresponding to an explanatory variable of the set regression equation; causing the prediction condition setting unit to generate an input screen on which the distribution of the observation data of the data item corresponding to each explanatory variable of the set regression equation is displayed as a probability density distribution graph using the calculated probability density function; receiving input values from the user on the input screen of the prediction condition setting unit on which the probability density distribution graph is displayed; substituting the received input values into the corresponding explanatory variables of the set regression equation to calculate the value of the objective variable; and outputting the calculated value of the objective variable as the predicted value for the input values.

According to the data prediction system and data prediction method of the present invention, when data prediction by regression analysis is performed using past observation data, the probability density function of the observation data is calculated, and when prediction conditions are set, the input value for each explanatory variable can be set in a form that lets the user compare it against the probability distribution of the corresponding observation data. Prediction can therefore be performed under more realistic conditions, enabling more efficient and more accurate analysis than before. In addition, because complex prediction conditions can be set through visual pointing operations and numerical input using images and charts output on the display screen, even a person with little expertise or experience can easily use the data prediction technique.

FIG. 1 is a block diagram showing the configuration of a data prediction system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the data prediction system according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the initial screen generated by the input unit of the data prediction system according to the embodiment.
FIG. 4 is a diagram showing an example of the data input screen generated by the observation data acquisition unit of the data prediction system according to the embodiment.
FIG. 5 is a diagram showing an example of the data table generated by the observation data acquisition unit of the data prediction system according to the embodiment.
FIG. 6 is a diagram showing an example of the selection input screen generated by the prediction item selection unit of the data prediction system according to the embodiment.
FIG. 7 is a diagram showing an example of the data table for regression analysis generated by the prediction item selection unit of the data prediction system according to the embodiment.
FIG. 8 is a diagram showing an example of the first input screen for setting prediction conditions, generated by the prediction condition setting unit of the data prediction system according to the embodiment.
FIG. 9 is a diagram showing an example of the second input screen for setting prediction conditions, generated by the prediction condition setting unit of the data prediction system according to the embodiment.
FIG. 10 is a diagram showing an example of the prediction result screen generated by the prediction processing unit of the data prediction system according to the embodiment.
FIG. 11 is a diagram showing another example of the prediction result screen generated by the prediction processing unit of the data prediction system according to the embodiment.
FIG. 12 is a flowchart showing a data prediction method according to an embodiment of the present invention.
FIG. 13 is a flowchart showing an example of the method by which the probability density function calculation unit according to the embodiment calculates a probability density function.

Specific examples of modes for carrying out the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a data prediction system according to an embodiment of the present invention.

The data prediction system 10 according to an embodiment of the present invention, shown in FIG. 1, is a general-purpose computer system comprising a control device 11, a storage device 12, an input device 13, a display device 14, and an output device 15. It further comprises a communication device 16 for network connection to an external data terminal 20 and the like.

The control device 11 has a central processing unit (CPU), ROM, RAM (not shown), and the like. By loading and executing an OS (operating system), predetermined programs, and data, it implements the units (functional units) of the data prediction system 10 described later and controls them.

The storage device 12 stores various data and programs and has storage areas (hereinafter, storage units) organized by data content, as described later. The storage device 12 is preferably a nonvolatile storage device such as an HDD (hard disk drive) or SSD (solid state drive). It need not be a single device; it may be storage distributed across multiple network-connected devices. That is, the storage device 12 is either built into the data prediction system 10 or implemented as an external data server connected over a network.

The input device 13 receives various inputs, including instructions from the user, and consists of a keyboard, a pointing device, a touch pad, or the like.

The display device 14 is a display that presents acquired information to the user visually. The display device 14 may also be configured as a touch panel, combined with a touch pad capable of touch input, so that it shares some functions with the input device 13.

The output device 15 is, for example, a printer that writes to paper media or a data writing device that writes to a computer-readable recording medium. It is configured either as an external device connected to the computer main body or as a component contained within the computer.

The communication device 16 is a communication interface made up of communication devices for network connection. It is, for example, a communication card for wired or wireless LAN (local area network), Bluetooth (registered trademark), or Wi-Fi (registered trademark); it may also be a router for optical communication, a router for ADSL (asymmetric digital subscriber line), or a modem for various kinds of communication.

The data terminal 20 may be an external, network-connected computer terminal or data server that provides the observation data used for prediction in the data prediction system 10.

The data prediction system 10 according to an embodiment of the present invention is not limited to a single computer and may be made up of multiple network-connected computers. Alternatively, the data prediction system 10 may be configured to execute data prediction processing in response to a request from an external client terminal and send the processing result to that client terminal.

A data prediction system according to an embodiment of the present invention is described below with reference to FIGS. 1 and 2. As an example, the case of predicting data for use in the sales strategy of a real estate sales business is described.

FIG. 2 is a block diagram showing the functional configuration of a data prediction system according to an embodiment of the present invention.

As shown in FIG. 2, the data prediction system 10 according to this embodiment is realized by having the control device 11 execute a data prediction program. The data prediction system 10 comprises an input unit 110, a prediction item selection unit 120, a prediction condition setting unit 130, an observation data acquisition unit 140, a probability density function calculation unit 150, a regression coefficient calculation unit 160, a prediction processing unit 170, and an output unit 180. A storage unit 200, comprising an observation data storage unit 210, a probability density function storage unit 220, a regression coefficient storage unit 230, and a prediction data storage unit, is provided in the storage device 12. A control unit 100 controls these components.

The input unit 110 generates an initial screen for receiving input from the user that specifies the processing the system is to execute, outputs that screen to the display device 14, receives from the input device 13 the user's input made on the displayed initial screen, and passes that input to a predetermined destination. Specifically, the input unit 110 receives an instruction to move to one of the functional units that accept the regression equation setting input and the prediction condition setting input described later, and outputs the received instruction to the control unit 100. The control unit 100 runs the functional unit, which is realized by executing the program corresponding to the specified input, and advances the specified processing.

FIG. 3 is a diagram showing an example of the initial screen generated by the input unit of the data prediction system according to the present embodiment.

When the input unit 110 receives, on the initial screen, an instruction to move to the screen for selecting the item to be predicted, that is, an input pointing at the image portion "2. Prediction item selection" shown in FIG. 3, it outputs that instruction to the prediction item selection unit 120.

Similarly, when the input unit 110 receives, on the initial screen, an instruction to move to the screen for setting the range of conditions to be predicted, that is, an input pointing at the image portion "3. Prediction condition setting" shown in FIG. 3, it outputs that instruction to the prediction condition setting unit 130. When it receives an instruction to move to the screen for entering the observation data used for prediction, that is, an input pointing at the image portion "1. Data input" shown in FIG. 3, it outputs that instruction to the observation data acquisition unit 140.

When the observation data acquisition unit 140 receives from the input unit 110 the instruction to move to the screen for entering observation data, it generates a screen for acquiring observation data and outputs it to the display device 14. It then acquires data from the designated source in accordance with the instruction entered on the input device 13 based on the displayed screen. In the following description, the display device 14 is assumed to include a touch pad and to serve the instruction input function of the input device 13.

FIG. 4 is a diagram showing an example of the data input screen generated by the observation data acquisition unit of the data prediction system according to the present embodiment.

When the user designates a desired data file from the "data file list" on the data input screen shown in FIG. 4, the observation data acquisition unit 140 reads the designated file from the storage device where it is stored and outputs it to the observation data storage unit 210 and the probability density function calculation unit 150. Specifically, the observation data acquisition unit 140 arranges the individual data in the designated file into a data table organized by data item, saves the table in the observation data storage unit 210, and outputs it to the probability density function calculation unit 150. In this specification, a data file consists of "observation data", and each piece of "observation data" is a record (that is, row data) in which the measured values of a plurality of data items related to the event (or matter) to be predicted are arranged in a predetermined order.

FIG. 5 is a diagram showing an example of the data table generated by the observation data acquisition unit of the data prediction system according to the present embodiment.

For example, the observation data acquisition unit 140 acquires from the data terminal 20 a CSV data file containing data such as the profit rate, floor area, building age, nearest station, and walking time for each real estate property sold in the past. As shown in FIG. 5, it then generates a data table in which these data items are listed for each property (hereinafter referred to as a general data table). When the data items include a qualitative variable, a technique such as setting dummy variables can be applied.
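The table construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the column names, the sample values, and the `load_table` helper are all invented here, and the dummy-variable scheme (one 0/1 column per category, with the first category as the baseline) is one common choice among several.

```python
import csv
import io

# Hypothetical sample in the spirit of FIG. 5; names and values are invented.
SAMPLE_CSV = """profit_rate,floor_area,building_age,nearest_station,walk_minutes
0.12,62.5,10,StationA,5
0.08,75.0,25,StationB,12
0.15,58.0,3,StationA,8
"""

def load_table(text, categorical=("nearest_station",)):
    """Return a column-oriented table: {data item name: list of floats}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    table = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if name in categorical:
            # One dummy (0/1) column per category except the first (the baseline).
            for level in sorted(set(values))[1:]:
                table[f"{name}={level}"] = [1.0 if v == level else 0.0 for v in values]
        else:
            table[name] = [float(v) for v in values]
    return table

table = load_table(SAMPLE_CSV)
```

Storing the table by column rather than by row keeps the later per-item steps (density estimation, regression) simple, since each works on one data item at a time.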

As a setting input for generating a data table from the acquired CSV data file, the observation data acquisition unit 140 may display, on the data input screen shown in FIG. 4, a field A for selecting the variables of interest, that is, the data items in the CSV data file to be used as variables in the prediction process. In this case, the observation data acquisition unit 140 separately generates a data table composed only of the data items designated on the data input screen shown in FIG. 4 (hereinafter referred to as a selected data table), saves it in the observation data storage unit 210, and outputs it to the probability density function calculation unit 150. This enables a prediction process focused on the factors the user cares about and shortens the prediction processing time. The data table generated when no variables of interest are designated corresponds to the general data table.

The probability density function calculation unit 150 either receives the general data table or the selected data table of observation data directly from the observation data acquisition unit 140 or reads it from the observation data storage unit 210. Based on the acquired data table, it calculates a probability density function for each data item and saves the calculated functions in the probability density function storage unit 220. Each probability density function calculated by the probability density function calculation unit 150 represents the distribution of the probability density with which values of the corresponding data item occur.

To estimate the probability density function of each data item, the probability density function calculation unit 150 may use parametric density estimation, nonparametric density estimation, or a semiparametric estimation method.

Parametric density estimation applies when the collected observation data clearly follows a well-known distribution such as a normal or Poisson distribution: the corresponding distribution function is fitted and its parameter values are determined from the data. Nonparametric density estimation applies when it is not clear which distribution the collected data follows: the shape of the distribution is estimated from the data without assuming a specific functional form. Semiparametric estimation lies between the two. By allowing the number of parameters to be increased systematically in order to represent complex distributions, it can express more general functional forms than parametric density estimation.
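The difference between the first two approaches can be illustrated with a short sketch. It assumes a normal distribution for the parametric case and a Gaussian kernel with a hand-picked bandwidth for the nonparametric case; the text does not prescribe either choice, and the sample values are invented.

```python
import math
import statistics

def fit_normal(data):
    """Parametric estimate: assume a normal distribution and fit its two parameters."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf

def fit_kde(data, bandwidth):
    """Nonparametric estimate: Gaussian kernel density with a caller-chosen bandwidth."""
    n = len(data)
    def pdf(x):
        total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        return total / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return pdf

# Invented floor-area observations in square metres.
floor_area = [52.0, 58.0, 61.0, 64.0, 66.0, 70.0, 73.0, 79.0, 85.0]
normal_pdf = fit_normal(floor_area)
kde_pdf = fit_kde(floor_area, bandwidth=5.0)
```

Both functions return a callable density, so later steps (plotting the curve, integrating over a range) need not know which estimation method produced it.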

Which method the probability density function calculation unit 150 uses may be programmed in advance according to the content (for example, the data items) and the number of observation data collected as explanatory variables for the prediction, or the system may be configured so that the user can specify the method based on the results of separate trials.

When the prediction item selection unit 120 receives from the input unit 110 the instruction to move to the screen for selecting the data item to be predicted (see FIG. 3), it generates a selection input screen for the data item to be predicted (the objective variable) and outputs it to the display device 14. The data items displayed on the selection input screen are preferably limited to those designated as variables on the screen of FIG. 4.

FIG. 6 is a diagram showing an example of the selection input screen generated by the prediction item selection unit of the data prediction system according to the present embodiment.

When the user selects (points at) a desired item, for example "profit rate", from the data items displayed on the selection input screen shown in FIG. 6, the prediction item selection unit 120 outputs to the regression coefficient calculation unit 160 a command that sets the selected data item as the objective variable of the multiple regression equation described later and sets the remaining data items as explanatory variables. The selection input screen may also include a check box for requesting that data items not currently displayed be shown as well.

The prediction item selection unit 120 may further generate a second selection input screen, and output it to the display device 14, for restricting the explanatory variables to only the data items of particular interest among the remaining data items displayed on the selection input screen. Although not illustrated, the second selection input screen may be generated in the same format as FIG. 6.

The prediction item selection unit 120 acquires the general data table or the selected data table from the observation data storage unit 210. Based on the selection input described above, it arranges the relevant data items so that each corresponds to the objective variable or to an explanatory variable of the multiple regression equation, places the individual data in the cells of the arranged data items to form a regression analysis data table, and outputs this table to the regression coefficient calculation unit 160. The regression analysis data table is also saved in the observation data storage unit 210 separately from the general data table or the selected data table.

FIG. 7 is a diagram showing an example of the regression analysis data table generated by the prediction item selection unit of the data prediction system according to the present embodiment.

When the regression coefficient calculation unit 160 receives from the prediction item selection unit 120 the designation of the data items subject to the prediction process, that is, the data items to be set as the objective variable and the explanatory variables together with the corresponding regression analysis data table, it associates each of those data items with the objective variable y or one of the explanatory variables (X_1, ..., X_i) of the multiple regression equation expressed by equation (1). It then substitutes the data in the corresponding cells of the regression analysis data table shown in FIG. 7 into the objective variable y and the explanatory variables (X_1, ..., X_i) of equation (1), and calculates the partial regression coefficients (β_1, ..., β_i) of the explanatory variables (X_1, ..., X_i) and the constant β_0. The partial regression coefficients (β_1, ..., β_i) are calculated by, for example, the least squares method.
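The image of equation (1) is not reproduced in this text. Judging from the symbols used in the paragraph above (objective variable y, explanatory variables X_1 to X_i, partial regression coefficients β_1 to β_i, constant β_0), it presumably has the standard multiple regression form shown here. This is a reconstruction, not the published formula.

```latex
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_i X_i \tag{1}
```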

The regression coefficient calculation unit 160 saves the calculated partial regression coefficients (β_1, ..., β_i) and the constant β_0 in the regression coefficient storage unit 230. The regression coefficient calculation unit 160 may also evaluate multicollinearity and restrict the explanatory variables that can be used in the prediction calculation. In the present embodiment, selecting only one data item as an explanatory variable self-evidently results in simple regression analysis, so its description is omitted.
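As a rough sketch of the least squares step, the coefficients can be obtained by solving the normal equations. The code below is a plain-Python illustration under that assumption; the function names and the data are invented, and a production system would more likely rely on a numerical library.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(columns, y):
    """Return [beta_0, beta_1, ..., beta_i] minimising the sum of squared residuals."""
    rows = [[1.0] + list(values) for values in zip(*columns)]  # leading 1 for beta_0
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)

# Invented data generated exactly from y = 1 + 2*x1 - 3*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
betas = fit_least_squares([x1, x2], y)
```

If two explanatory columns are perfectly collinear, the normal equations become singular and `solve` fails with a division by zero, which is one practical reason to screen for multicollinearity beforehand.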

When the prediction condition setting unit 130 receives from the input unit 110 the instruction to move to the screen for setting prediction conditions (see FIG. 3), it generates an input screen for setting the prediction conditions and outputs it to the display device 14.

FIG. 8 is a diagram showing an example of the first input screen for setting prediction conditions, generated by the prediction condition setting unit of the data prediction system according to the present embodiment.

The first input screen shown in FIG. 8 is used to select the data items that influence the data item the user previously set as the objective variable, in other words, the data items to be set as input values substituted into the explanatory variables of the regression equation (hereinafter referred to as input parameters). For example, to examine the influence of "floor area" on the objective variable "profit rate", the user checks (points at) the "floor area" field among the data items displayed on the screen. When the "Confirm" button displayed on the first input screen is then pressed, the prediction condition setting unit 130 generates a second input screen for setting further prediction conditions for the checked data item.

When the prediction condition setting unit 130 receives the input confirming the data item to be set as an input parameter on the first input screen, it acquires the probability density function corresponding to that data item from the probability density function storage unit 220 and creates a probability density distribution graph covering the distribution range of the data contained in that data item of the previously acquired regression analysis data table. It then generates a second input screen for setting the prediction conditions, which includes the created probability density distribution graph, and outputs it to the display device 14. The layout of the second input screen and the format of the displayed graph or chart may be programmed in advance or made configurable by the user.

FIG. 9 is a diagram showing an example of the second input screen for setting prediction conditions, generated by the prediction condition setting unit of the data prediction system according to the present embodiment.

The second input screen shown in FIG. 9 is used to set the data range over which the prediction process is executed for the data item the user set as an input parameter. For example, when the user sets the data item "floor area" as the input parameter, the prediction condition setting unit 130 creates a graph (distribution curve) with floor area on the horizontal axis and probability density on the vertical axis and displays it in the second input screen. The second input screen also has an input field B for specifying the data range of the input value. Specifically, the input field B may have separate fields for entering the lower limit a and the upper limit b of the input value. Alternatively, the range may be set on the graph by using the pointer to move the axis lines that represent the lower limit a and the upper limit b. If the lower and upper limits in the input field B are set to the same value, that value is used as a single input value.

The prediction condition setting unit 130 may further provide, in the second input screen, a setting confirmation field C for checking the conditions being entered, and display additional information based on the data range being entered. Specifically, based on the lower limit a and the upper limit b set in the input field B, the prediction condition setting unit 130 calculates the probability corresponding to the interval between a and b. It also calculates the height of the probability density distribution curve at the points where the axis lines for a and b intersect it (for example, as a ratio to the peak height), and displays these values in the setting confirmation field C. This gives the user information for judging whether the input values are reasonable.
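The two quantities shown in the setting confirmation field can be sketched as follows. The sketch assumes, purely for illustration, that floor area follows a normal distribution with a mean of 70 m² and a standard deviation of 10 m², and it approximates the interval probability with the trapezoidal rule. The function names and the dictionary keys are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def interval_probability(pdf, a, b, steps=2000):
    """Approximate P(a <= x <= b) with the trapezoidal rule."""
    if a == b:
        return 0.0
    h = (b - a) / steps
    inner = sum(pdf(a + k * h) for k in range(1, steps))
    return h * (0.5 * pdf(a) + inner + 0.5 * pdf(b))

def confirmation_values(pdf, a, b, peak_x):
    """Values a confirmation field could show for the range [a, b]."""
    peak = pdf(peak_x)
    return {
        "probability": interval_probability(pdf, a, b),
        "height_ratio_lower": pdf(a) / peak,
        "height_ratio_upper": pdf(b) / peak,
    }

# Hypothetical density for floor area: normal, mean 70 m^2, standard deviation 10 m^2.
pdf = lambda x: normal_pdf(x, 70.0, 10.0)
info = confirmation_values(pdf, 60.0, 80.0, peak_x=70.0)
```

With these assumed parameters the range of 60 to 80 m² spans one standard deviation on either side of the mean, so the reported probability is roughly 0.68 and both height ratios are about 0.61.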

Because the second input screen is generated with the above configuration, the user can set appropriate prediction conditions for the data item whose influence is to be evaluated while visually checking the range in which data actually exists and how the data is distributed. The system may also be configured so that, by selecting "Reset" on the second input screen, the user can return to the first input screen and start over from the selection of the data item to be set as an input parameter.

Although the present embodiment has been described with a single data item whose influence is to be evaluated, the invention is not limited to this. For example, the system may be configured so that selecting "Next setting" on the second input screen shown in FIG. 9 allows multiple data items to be selected for evaluation. To this end, various modifications are possible, such as generating a separate second input screen for each selected data item or displaying the distribution graphs of multiple data items on the same screen.

When the prediction condition setting unit 130 receives an input selecting "Done" on the second input screen shown in FIG. 9, it sets the input values at that moment as the data range over which the prediction process is executed for the data item, and outputs them to the prediction processing unit 170. For example, in the embodiment shown in FIG. 9, when the data item "floor area" is set as the input parameter and the data range of the input value is set to a lower limit a of 60 m² and an upper limit b of 80 m², the prediction condition setting unit 130 outputs these settings to the prediction processing unit 170. The prediction condition setting unit 130 also generates an input screen (not shown) for accepting an instruction to start the regression analysis under the prediction conditions set in this way, and outputs it to the display device 14.

Based on the prediction condition settings received from the prediction condition setting unit 130, the prediction processing unit 170 acquires from the probability density function storage unit 220 the probability density function f(x) of the data item set as the input parameter. It then integrates f(x) from the lower limit a to the upper limit b set as the data range of the input value, and thereby calculates the probability P(a ≤ x ≤ b) of the interval from a to b.
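Written out, the interval probability described above is the definite integral of the density over the selected range:

```latex
P(a \le x \le b) = \int_{a}^{b} f(x)\,dx
```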

The prediction processing unit 170 also acquires the partial regression coefficients and the constant stored in the regression coefficient storage unit 230 and, as expressed by equation (2), substitutes them into the corresponding terms for the partial regression coefficients (β_1, ..., β_i) and the constant β_0 of the multiple regression equation y(x_1), in which the data item "floor area" is the input parameter x_1.

When x_1 is the only input parameter set, each of the other explanatory variable terms (X_2, ..., X_i) is replaced by a corresponding constant (K_2, ..., K_i), for example the data value at which the probability density distribution of that data item reaches its maximum, or the mean of that data item's data. The variable x_1, as the input value of the explanatory variable term (X_1) of the data item set as the input parameter in equation (2), is then varied from the lower limit a to the upper limit b, and the corresponding value of the objective variable y(x_1) is calculated.
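The image of equation (2) is not reproduced in this text. From the description above, with x_1 as the single input parameter and the remaining explanatory variables fixed at the constants K_2 to K_i, it presumably takes the following form. This is a reconstruction from the surrounding definitions, not the published formula.

```latex
y(x_1) = \beta_0 + \beta_1 x_1 + \beta_2 K_2 + \cdots + \beta_i K_i,
\qquad a \le x_1 \le b \tag{2}
```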

When the prediction process for the set input parameter is complete, the prediction processing unit 170 outputs the calculated values of the objective variable y(x_1) to the output unit 180 and the prediction data storage unit 240 as the predicted values for the variable x_1.
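The single-parameter prediction can be sketched as a sweep of x_1 across the chosen range while the other explanatory variables stay fixed. The coefficients, the observations, the number of sample points, and the `sweep_prediction` helper below are all invented for illustration, and the fixed constant uses the mean, which is one of the two options the description mentions.

```python
import statistics

def sweep_prediction(betas, fixed, lower, upper, points=5):
    """Evaluate y(x1) = b0 + b1*x1 + sum(b_j * K_j) for x1 from lower to upper."""
    b0, b1, rest = betas[0], betas[1], betas[2:]
    offset = b0 + sum(b * k for b, k in zip(rest, fixed))
    if points == 1 or lower == upper:
        return [(lower, offset + b1 * lower)]  # single input value
    step = (upper - lower) / (points - 1)
    xs = [lower + n * step for n in range(points)]
    return [(x, offset + b1 * x) for x in xs]

# Invented coefficients and observations, for illustration only.
betas = [0.20, -0.001, -0.002]          # beta_0, floor area, building age
building_age = [3.0, 10.0, 25.0, 18.0]
k2 = statistics.fmean(building_age)     # K_2: mean of the variable held fixed
result = sweep_prediction(betas, [k2], lower=60.0, upper=80.0)
```

Each element of `result` pairs an input value with its predicted objective value, which is the shape needed either for plotting a line over the density curve or for listing the predictions in a table.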

The output unit 180 generates a predetermined prediction result screen, plots on it in a predetermined format the values of the objective variable y(x_1) received from the prediction processing unit 170, and displays it on the display device 14. The prediction data storage unit 240 stores the calculated values of the objective variable y(x_1) in association with the prediction conditions accepted by the prediction condition setting unit 130 (for example, the data item of the explanatory variable term (X_1) and its input value x_1).

FIG. 10 is a diagram showing an example of the prediction result screen generated by the prediction processing unit of the data prediction system according to the present embodiment.

The prediction result screen shown in FIG. 10 is an example in which the only input parameter set is the explanatory variable X_1 (x_1). In this embodiment the regression equation is linear, so when an upper limit and a lower limit are set as input values, the prediction processing unit 170 composes and outputs a screen in which the prediction result is drawn as a line superimposed on the probability density distribution curve of the data item set as the input parameter, within the graph area of the second input screen shown in FIG. 9. The output format of the prediction result is not limited to this. Although not illustrated, the predicted values for the input parameter may instead be displayed in tabular form.

As another example of the present embodiment, when multiple input parameters are set, the prediction processing unit 170 acquires the partial regression coefficients and the constant stored in the regression coefficient storage unit 230 and, as expressed by equation (3), substitutes them into the corresponding terms for the partial regression coefficients (β_1, ..., β_i) and the constant β_0 of the multiple regression equation y(x_1, ..., x_i), in which each data item is an input parameter.

Because values from its own data range are substituted into each explanatory variable term set as an input parameter, these terms are written for convenience as X_1(x_1), ..., X_i(x_i). Each explanatory variable in equation (3) is then varied from its lower limit to its upper limit, the prediction calculation is executed, and the corresponding value of the objective variable y(x_1, ..., x_i) is calculated.
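The image of equation (3) is not reproduced in this text. Using the notation introduced above, in which each explanatory variable term X_j takes its own input value x_j, it presumably takes the following form. This is a reconstruction from the surrounding definitions, not the published formula.

```latex
y(x_1, \ldots, x_i) = \beta_0 + \beta_1 X_1(x_1) + \cdots + \beta_i X_i(x_i) \tag{3}
```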

The prediction processing unit 170 also acquires the probability density functions (f_1(x_1), ..., f_i(x_i)) corresponding to the data items of the explanatory variables. It integrates each probability density function from the lower limit a to the upper limit b of the individually set data range of the input value, and thereby calculates the probability of each interval (P_1(a_1 ≤ x ≤ b_1), ..., P_i(a_i ≤ x ≤ b_i)).

When the prediction process for the set input parameters is complete, the prediction processing unit 170 saves the calculated values of the objective variable y(x_1, ..., x_i) in the prediction data storage unit 240 as the predicted values for the explanatory variables X_1(x_1), ..., X_i(x_i). The prediction processing unit 170 also generates a predetermined prediction result screen and outputs it to the display device 14.

FIG. 11 is a diagram showing another example of the prediction result screen generated by the prediction processing unit of the data prediction system according to the present embodiment.

The prediction result screen shown in FIG. 11 is an example in which multiple explanatory variables are set as input parameters. In this embodiment, the prediction condition setting unit 130 creates a probability density distribution graph for each of the data items designated as input parameters on the first input screen shown in FIG. 8, and generates a second input screen (corresponding to the prediction result screen shown in FIG. 11) for setting the input values to be substituted into the explanatory variables corresponding to these data items.

The second input screen of this embodiment has an input table for setting the input value to be substituted into each explanatory variable, and the probability density distribution graph corresponding to each explanatory variable is embedded in a predetermined column of this table. When the user, referring to the probability density distribution graph, enters a desired value in the cell of an explanatory variable whose prediction result the user wants to see, the prediction processing unit 170 substitutes the entered value into the corresponding explanatory variable of the regression equation, calculates the predicted value, and displays it in the result display field of the prediction result screen shown in FIG. 11. A predetermined constant calculated in advance is substituted into any explanatory variable for which no value is entered. The predetermined constant is, for example, the mean of the observation data of the data item corresponding to the explanatory variable, or the data value at which its probability density distribution reaches its maximum.
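The fallback behaviour for blank cells can be sketched as below. The helper name, the variable names, the coefficients, and the observations are all hypothetical, and the sketch uses the observed mean as the predetermined constant, which is one of the two options named above.

```python
import statistics

def predict_with_defaults(betas, names, observations, inputs):
    """Predict y, using the observed mean for any explanatory variable left blank."""
    y = betas[0]
    for beta, name in zip(betas[1:], names):
        value = inputs.get(name)
        if value is None:
            value = statistics.fmean(observations[name])  # predetermined constant
        y += beta * value
    return y

names = ["floor_area", "building_age"]
observations = {"floor_area": [60.0, 70.0, 80.0], "building_age": [5.0, 10.0, 15.0]}
betas = [0.20, -0.001, -0.002]  # invented: beta_0, floor area, building age

# The user fills in only the floor area; building age falls back to its mean.
y_hat = predict_with_defaults(betas, names, observations, {"floor_area": 65.0})
```

Precomputing the constants once per data item, rather than on every prediction as done here for brevity, would match the description's note that they are calculated in advance.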

As described above, the data prediction system of the present invention lets the user check the predicted value for a desired explanatory variable while being aware of how likely that value is to occur. The output form of the prediction result is not limited to this, however. Various modifications are possible; for example, any two varying parameters may be selected and the prediction result displayed as a two-dimensional graph with these parameters on the vertical and horizontal axes.

Further, according to the data prediction system of the present invention, when data prediction by regression analysis is performed using observation data consisting of many data items, the probability density function of each data item is estimated in advance, so that the explanatory variables to be substituted and their ranges can be narrowed down from a probabilistic viewpoint. This enables more effective and more accurate prediction of the objective variable.

Hereinafter, the steps of executing data prediction from the observation data used for prediction, applying the data prediction system and data prediction method according to an embodiment of the present invention, will be described in detail with reference to FIGS. 1 to 3 and FIG. 12.

In the data prediction method according to this embodiment, the data prediction system 10 executes the data prediction program and advances the series of steps from step S100 to step S370, described later, based on the user's instruction input. The user's operations are therefore reduced to pointing at the selection icons that the data prediction system 10 displays on the screen and entering numerical values in input fields. As a result, the user can obtain prediction results grounded in statistical theory without needing advanced mathematical knowledge or programming skills. Note that, instead of performing the prediction calculation on all data items contained in the acquired observation data, the data prediction system of this embodiment also allows the data items used for prediction to be designated.

FIG. 12 is a flowchart illustrating a data prediction method according to an embodiment of the present invention.

As shown in FIG. 12, when the data prediction system 10 is started, the control unit 100 of the data prediction system 10 executes the data prediction program, generates an initial screen (see FIG. 3), and displays it on the display device 14 (step S100). In the following description, detailed explanation of the functions and operations of the individual components of the system that execute the processing of each stage is omitted. Screen input is described on the assumption that it is handled by the display device 14, which has a touch panel and also serves as the pointing input function of the input device 13.

When "1. Data input" among the "work items" displayed on the initial screen is selected (step S110), the control unit 100 causes the observation data acquisition unit 140 to generate and display a data input screen (see FIG. 4) (step S120). When the data file to be acquired is designated from the file list displayed on the data input screen, the observation data acquisition unit 140 acquires the data file from the storage device in which it is stored (step S130). For example, when the file is acquired from a network-connected data terminal 20, the data prediction system 10 requests the data terminal 20, via the communication device 16, to output the designated data file.

The observation data acquisition unit 140 creates a data table in which the individual data in the acquired data file are arranged by data item, and displays a list of the data items contained in the data file on the data input screen (step S140). Then, based on the user's selection input, it creates a selection data table composed of the selected data items (step S150) and outputs it to the observation data storage unit 210 and the probability density function calculation unit 150. The probability density function calculation unit 150 calculates a probability density function for each data item based on the selection data table and outputs the calculated probability density functions to the probability density function storage unit 220 (step S160). The control unit 100 then returns to step S100 and displays the initial screen. Note that the probability density function calculation unit 150 may, instead of receiving the selection data table directly from the observation data acquisition unit 140, read it from the observation data storage unit 210 and calculate the probability density function for each data item.
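Steps S140 and S150 can be illustrated with a short sketch. It assumes, purely for illustration, that the data file is CSV text with a header row and numeric values; the function names `build_data_table` and `select_items` are hypothetical and do not appear in the embodiment.

```python
import csv
import io

def build_data_table(csv_text):
    """Arrange the data in a CSV file by data item (column), as in step S140.

    Assumes a header row naming the data items and numeric cell values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    table = {name: [] for name in reader.fieldnames}
    for record in reader:
        for name in reader.fieldnames:
            table[name].append(float(record[name]))
    return table

def select_items(table, selected):
    """Build the selection data table from the chosen data items (step S150)."""
    return {name: table[name] for name in selected}
```

The keys of the table returned by `build_data_table` serve as the data item list shown to the user, and the result of `select_items` is what a density estimator would receive.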

FIG. 13 is a flowchart showing an example of a method by which the probability density function calculation unit according to this embodiment calculates a probability density function.

The following describes an example in which the probability density function calculation unit 150 calculates the probability density function by nonparametric density estimation.

As shown in FIG. 13, on receiving the selection data table from the observation data acquisition unit 140 (step S151), the probability density function calculation unit 150 obtains the minimum and maximum values for each data item contained in the selection data table (step S152). It then sets a predetermined number of bins (intervals) in the region between the minimum and maximum values, regards the value of each datum within each bin as a mean value, and uses the data of the corresponding data item in the selection data table to calculate the value of the probability distribution corresponding to each datum within each bin (step S153). To estimate the probability density function, this embodiment uses a kernel function, specifically a normal (Gaussian) kernel function.

Thereafter, the mean of the probability distribution values calculated in step S153 is taken as the estimate of the function value at each data point within each bin (step S154). A graph is created with the value of each datum within each bin as x and the function value as y, and the probability density function is calculated using the normal kernel function on the basis of the function values calculated in step S154 (step S155). The calculated probability density function is stored in the probability density function storage unit 220 (step S160).
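For orientation, the following is a minimal sketch of a textbook Gaussian kernel density estimate evaluated on a grid between the minimum and maximum of the data. It is not a line-by-line rendering of steps S151 to S155: the bandwidth rule (a normal-reference rule of thumb), the grid size, and the names `gaussian_kde` and `density_on_grid` are assumptions made for illustration, since the text does not specify them.

```python
import math
from statistics import stdev

def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the density of `data` with a normal kernel."""
    n = len(data)
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb; the source gives no rule.
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density

def density_on_grid(data, bins=50):
    """Evaluate the estimate at bins + 1 evenly spaced points from min to max."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins
    f = gaussian_kde(data)
    xs = [lo + i * step for i in range(bins + 1)]
    return xs, [f(x) for x in xs]
```

The pairs returned by `density_on_grid` are the x and y values one would plot as the probability density distribution graph of a data item.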

Next, when "2. Prediction item selection" among the "work items" displayed on the initial screen is selected (step S110), the control unit 100 reads the data table and data item list created in step S140 from the observation data storage unit 210 and causes the prediction item selection unit 120 to generate and display a selection input screen (see FIG. 6) for choosing the data item to be set as the objective variable (step S220). If the preceding step S140 has not been executed, the control unit 100 is configured to issue a warning to that effect and return to the initial screen.

Then, when the data item to be set as the objective variable is designated on the selection input screen, the prediction item selection unit 120 creates a regression analysis data table (see FIG. 7) in which the designated data item is associated with the objective variable and the remaining data items are associated with explanatory variables, and outputs it to the regression coefficient calculation unit 160 (step S230). Based on the data in the regression analysis data table, the regression coefficient calculation unit 160 performs multiple regression analysis on the multiple regression equation composed of these data items and calculates the partial regression coefficient for each explanatory variable (step S240). The control unit 100 then returns to step S100 and displays the initial screen.
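The calculation in step S240 can be sketched as an ordinary least-squares fit. The text does not say how the partial regression coefficients are computed, so the approach below (solving the normal equations by Gaussian elimination) and the names `solve_linear_system` and `fit_multiple_regression` are assumptions for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_multiple_regression(rows, target):
    """Return [intercept, b1, b2, ...] minimizing the squared error.

    rows: one tuple of explanatory-variable values per observation.
    target: the objective-variable value of each observation.
    """
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

For strongly correlated explanatory variables the normal equations become ill-conditioned, in which case a QR-based or SVD-based least-squares routine from a numerical library would be the more robust choice.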

Finally, when "3. Prediction condition setting" among the "work items" displayed on the initial screen is designated (step S110), the control unit 100 reads the regression analysis data table created in step S230 and causes the prediction condition setting unit 130 to generate and display a first input screen (see FIG. 8) for setting the prediction conditions (step S320). On receiving an instruction input that selects the input parameter to be applied to the prediction process, that is, the data item whose input value is to be substituted into an explanatory variable of the regression equation (step S330), the prediction condition setting unit 130 creates a probability density distribution graph of the selected data item, and generates and displays a second input screen (see FIG. 9) for setting the data range of the input parameter corresponding to the selected data item for the prediction process (step S340). On receiving an input that sets the data range of the input values for the prediction process (step S350), the prediction processing unit 170 calculates, based on the received setting input, the probability P of the interval of the set data range and the value of the objective variable, that is, the predicted value, and then generates a prediction result screen (see FIG. 10) and outputs it to the display device 14 (step S360).
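The two quantities produced in step S360 can be sketched as follows: the probability P of the chosen interval, obtained by integrating the estimated density, and the predicted values at the two ends of the interval. The trapezoidal integration, the fixed bandwidth, and the names `kde_density`, `interval_probability`, and `predict_range` are illustrative assumptions; the text does not describe how P is evaluated numerically.

```python
import math

def kde_density(data, bandwidth):
    """Gaussian kernel density estimate with a caller-supplied bandwidth."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density

def interval_probability(density, lower, upper, steps=1000):
    """Approximate the integral of `density` over [lower, upper] (trapezoid rule)."""
    h = (upper - lower) / steps
    total = 0.5 * (density(lower) + density(upper))
    total += sum(density(lower + i * h) for i in range(1, steps))
    return total * h

def predict_range(intercept, coefficients, fixed_inputs, name, lower, upper):
    """Predicted values when explanatory variable `name` is set to lower and upper.

    fixed_inputs holds the values of all other explanatory variables.
    """
    def predict(value):
        inputs = dict(fixed_inputs, **{name: value})
        return intercept + sum(coefficients[k] * inputs[k] for k in coefficients)

    return predict(lower), predict(upper)
```

A small P signals that the chosen input range is rarely observed in the data, which is the cue the prediction result screen is meant to give the user.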

Thereafter, the control unit 100 causes the display device 14 to display an image or icon for selecting whether to end the prediction process or to predict again under different conditions (step S370). If re-prediction under different conditions is selected, the control unit 100 returns to step S100 and displays the initial screen.

The data prediction method described above can be implemented as a computer-executable program written in a legacy programming language, an object-oriented programming language, or the like, such as assembler, C, C++, C#, Java (registered trademark), or R, and can be stored and distributed on a device-readable recording medium such as a ROM, EEPROM, EPROM, flash memory, flexible disk, CD-ROM, CD-RW, DVD, SD card, or MO.

As described above, according to the data prediction system and data prediction method of the present invention, when a target data item is predicted by regression analysis using observation data created in advance by business software or the like, the probability density function of each data item serving as an explanatory variable is calculated, and when the prediction conditions are set, the input value for each explanatory variable is displayed in a form that can be compared against its probability distribution. This helps the user choose conditions that are closer to reality. Compared with substituting numerical values at random, the analysis therefore becomes more efficient and more effective. In addition, because the probability distribution of each explanatory variable is known, the user can verify whether the event being simulated is likely or unlikely to occur.

Further, because the prediction conditions of the data prediction system of the present invention can be set by visually selecting from images and charts output on the display screen, even a person with little specialized knowledge or experience can easily use the data prediction technique. Moreover, because the input data to the system is in a common file format, data can be exchanged easily between different kinds of application software, which makes the system highly convenient for the user.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the present invention is not limited to these embodiments and can be modified in various ways without departing from its technical scope.

DESCRIPTION OF SYMBOLS
10 Data prediction system
11 Control device
12 Storage device
13 Input device
14 Display device
15 Output device
16 Communication device
20 Data terminal
100 Control unit
110 Input unit
120 Prediction item selection unit
130 Prediction condition setting unit
140 Observation data acquisition unit
150 Probability density function calculation unit
160 Regression coefficient calculation unit
170 Prediction processing unit
180 Output unit
200 Storage unit
210 Observation data storage unit
220 Probability density function storage unit
230 Regression coefficient storage unit
240 Prediction data storage unit

Claims

A data prediction system based on regression analysis that applies a probability density function to an explanatory variable, the system comprising:
A storage unit for storing observation data consisting of a plurality of data items and a program used for prediction processing;
An input unit that receives setting input of a regression equation and a prediction condition from a user;
A prediction item selection unit that, based on the received setting input, sets a regression equation in which any one of the plurality of data items included in the observation data is the objective variable and one or more of the remaining data items are explanatory variables, and that creates from the observation data a data table composed of the data of the data items corresponding to the objective variable and the explanatory variables of the set regression equation;
A regression coefficient calculation unit that calculates regression coefficients corresponding to respective explanatory variables of the set regression equation using the created data table;
A probability density function calculation unit that calculates a probability density function for each data item corresponding to each explanatory variable of the set regression equation using the created data table;
A prediction condition setting unit that receives an input value to be substituted into each explanatory variable of the set regression equation;
A prediction processing unit that substitutes the received input value into each explanatory variable of the set regression equation to calculate the value of the target variable;
An output unit that outputs the calculated value of the target variable as a predicted value for the input value;
A control unit that controls the above-described units and the entire system;
wherein the prediction condition setting unit generates an input screen that displays, in the form of a probability density distribution graph using the calculated probability density function, the distribution of the observation data of the data item corresponding to the explanatory variable into which the input value is to be substituted.

The data prediction system according to claim 1, wherein the prediction condition setting unit receives an input designating the maximum value and the minimum value of the input value to be substituted into a specific explanatory variable selected from among the explanatory variables of the set regression equation, and
the prediction processing unit substitutes the designated maximum value and minimum value into the selected specific explanatory variable to calculate the values of the objective variable of the regression equation corresponding to the maximum value and the minimum value, and calculates the probability of the interval between the maximum value and the minimum value using the calculated probability density function.

3. The data prediction system according to claim 1, wherein the prediction processing unit substitutes a predetermined constant into any explanatory variable, among the explanatory variables of the set regression equation, for which the input value has not been entered.

A data prediction method by regression analysis in which a probability density function is applied to an explanatory variable,
performed in a data prediction system including: a storage unit that stores observation data consisting of a plurality of data items and a program used for prediction processing; an input unit that receives setting input from a user; a prediction item selection unit that, based on the received setting input, sets a regression equation composed of a plurality of data items included in the observation data and creates from the observation data a data table corresponding to the objective variable and the explanatory variables of the set regression equation; a regression coefficient calculation unit that calculates, using the created data table, a regression coefficient corresponding to each explanatory variable of the set regression equation; a probability density function calculation unit that calculates, using the created data table, a probability density function for each data item corresponding to each explanatory variable of the set regression equation; a prediction condition setting unit that receives an input value to be substituted into each explanatory variable of the set regression equation; a prediction processing unit that substitutes the received input value into each explanatory variable of the set regression equation to calculate the value of the objective variable; an output unit that outputs the calculated value of the objective variable as a predicted value for the input value; and a control unit that controls the above units and the entire system,
The control unit
Accepting input of setting of regression equation and prediction condition from the user at the input unit;
Reading out the observation data stored in the storage unit;
Based on the setting input received by the input unit, any one of the plurality of data items included in the observation data is used as a target variable, and one or more of the remaining data items are used as explanatory variables. Setting the regression equation,
Creating a data table composed of data of data items corresponding to the objective variable and the explanatory variable of the set regression equation from the observation data;
Calculating a regression coefficient corresponding to each explanatory variable of the set regression equation using the created data table;
Calculating a probability density function for each data item corresponding to each explanatory variable of the set regression equation using the created data table;
Creating, in the prediction condition setting unit, an input screen on which the distribution of the observation data of the data item corresponding to each explanatory variable of the set regression equation is displayed in the form of a probability density distribution graph using the calculated probability density function;
Receiving an input value from a user on the input screen of the prediction condition setting unit on which the probability density distribution graph is displayed;
Substituting the received input value into the corresponding explanatory variable of the set regression equation to calculate the value of the target variable;
Outputting the calculated value of the target variable as a predicted value for the input value;
A data prediction method characterized in that it comprises:

The data prediction method according to claim 4, wherein the step of receiving an input value from the user includes receiving an input designating the maximum value and the minimum value of the input value to be substituted into a specific explanatory variable selected from among the explanatory variables of the set regression equation, and
the step of calculating the value of the objective variable includes substituting the designated maximum value and minimum value into the selected specific explanatory variable to calculate the values of the objective variable of the regression equation corresponding to the maximum value and the minimum value, and calculating the probability of the interval between the maximum value and the minimum value using the calculated probability density function.

The data prediction method according to claim 4 or 5, wherein the step of calculating the value of the objective variable includes substituting a predetermined constant into any explanatory variable, among the explanatory variables of the set regression equation, for which the input value has not been entered.

A data prediction program that causes a computer to execute calculation processing that calculates a predicted value using a regression equation based on observation data, the computer including: a storage unit that stores observation data consisting of a plurality of data items and a program used for prediction processing; an input unit that receives setting input from a user; a prediction item selection unit that, based on the received setting input, sets a regression equation composed of a plurality of data items included in the observation data and creates from the observation data a data table corresponding to the objective variable and the explanatory variables of the set regression equation; a regression coefficient calculation unit that calculates, using the created data table, a regression coefficient corresponding to each explanatory variable of the set regression equation; a probability density function calculation unit that calculates, using the created data table, a probability density function for each data item corresponding to each explanatory variable of the set regression equation; a prediction condition setting unit that receives an input value to be substituted into each explanatory variable of the set regression equation; a prediction processing unit that substitutes the received input value into each explanatory variable of the set regression equation to calculate the value of the objective variable; an output unit that outputs the calculated value of the objective variable as a predicted value for the input value; and a control unit that controls the above units and the entire system, the program causing the following steps to be executed:
On the computer
Accepting input of setting of regression equation and prediction condition from the user at the input unit;
Reading out the observation data stored in the storage unit;
Based on the setting input received by the input unit, any one of the plurality of data items included in the observation data is used as a target variable, and one or more of the remaining data items are used as explanatory variables. Setting the regression equation,
Creating a data table composed of data of data items corresponding to the objective variable and the explanatory variable of the set regression equation from the observation data;
Calculating a regression coefficient corresponding to each explanatory variable of the set regression equation using the created data table;
Calculating a probability density function for each data item corresponding to each explanatory variable of the set regression equation using the created data table;
Creating, in the prediction condition setting unit, an input screen on which the distribution of the observation data of the data item corresponding to each explanatory variable of the set regression equation is displayed in the form of a probability density distribution graph using the calculated probability density function;
Receiving an input value from a user on the input screen of the prediction condition setting unit on which the probability density distribution graph is displayed;
Substituting the received input value into the corresponding explanatory variable of the set regression equation to calculate the value of the target variable;
Outputting the calculated value of the target variable as a predicted value for the input value.