JP2020154890A

JP2020154890A - Correlation extraction method and correlation extraction program

Info

Publication number: JP2020154890A
Application number: JP2019053874A
Authority: JP
Inventors: 知弘米田; Tomohiro Yoneda; 健吉加藤; Kenkichi Kato; 翔太山根; Shota YAMANE
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2020-09-24
Anticipated expiration: 2039-03-20
Also published as: JP6622938B1

Abstract

PROBLEM TO BE SOLVED: To automatically extract a suitable correlation condition even when analysis data includes different types of data or when the analyst does not understand the content of each variable constituting the analysis data. SOLUTION: A correlation extraction program causes a computer to execute the steps of: receiving a specification of two variables out of a plurality of variables constituting analysis data; calculating each straight line passing through the center of gravity of the analysis data in a scatter diagram of the two variables; extracting the data whose deviation from each straight line does not exceed a threshold value; calculating a correlation coefficient from each set of extracted data; calculating a conditional probability for each single variable and/or combination of variables; and displaying the single variable and/or combination of variables on a display unit on the basis of each correlation coefficient and each conditional probability. SELECTED DRAWING: Figure 2

Description

The present invention relates to a correlation extraction method and a correlation extraction program.

In data analysis, it is important to extract variables that correlate with the objective variable. At present, because a wide variety of data is mixed together, analysts perform manual work such as visualization, specify conditions, and look at trends. This manual work relies on past rules of thumb and statistical methods. In manual data analysis, the computer calculates the correlation coefficient and the analyst checks the trend of the data. However, if the conditions are not specified well, for example when data of other types is included, variables that are actually correlated may not be extracted.

To reduce the burden on the analyst, techniques for automatically calculating the correlation of data have been disclosed. For example, the solution described in Patent Document 1 states: "Outliers of the objective variable are removed. The degree of relevance between the objective variable and a plurality of explanatory variables is calculated, a plurality of highly relevant explanatory variables are extracted, and the degree of independence among them is calculated. Based on the degree of relevance and the degree of independence, a plurality of candidate explanatory variables that are likely to have a large influence on the objective variable are selected. Based on the cumulative contribution rate, explanatory variables with a high contribution rate to the objective variable are selected from the candidates, a regression equation is calculated, and a predicted value of the objective variable is obtained. The difference between the predicted value and the measured value of the objective variable is taken as a new objective variable, the explanatory variables remaining after excluding those used to obtain this difference are taken as new explanatory variables, and the same processing is repeated."

Patent Document 1: JP-A-2007-329415

The invention described in Patent Document 1 is effective for data produced under fixed conditions. However, when various kinds of data are mixed in, it is difficult to apply, because one cannot tell whether an effect is due to a normal phenomenon or to the mixing of data.
For example, when analyzing machine usage time and the number of part replacements, if data for different parts is mixed in, the characteristics of the data are buried and the explanatory variables that strongly affect the objective variable may not be extracted correctly. Furthermore, merely extracting the explanatory variables that strongly affect the objective variable does not reveal knowledge hidden in the data, such as the condition that an explanatory variable has a large effect on the objective variable only within a certain range.
In addition, when analyzing data manually, the analyst has had to understand the content of each variable constituting the data.

Therefore, an object of the present invention is to automatically extract suitable correlation conditions even when the analysis data contains different types of data or when the analyst does not understand the content of each variable constituting the analysis data.

To solve the above problem, the correlation extraction method of the present invention is characterized in that a computer performs: a step of accepting a designation of two variables among a plurality of variables constituting analysis data; a step of calculating each regression line passing through the center of gravity of the analysis data in a scatter plot of the two variables; a step of extracting the data whose deviation from each regression line does not exceed a threshold value; a step of calculating a correlation coefficient from each set of extracted data; a step of calculating the conditional probability of each single variable and/or combination of variables; and a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each conditional probability.

The correlation extraction program of the present invention causes a computer to execute: a step of accepting a designation of two variables among a plurality of variables constituting analysis data; a step of calculating each regression line passing through the center of gravity of the analysis data in a scatter plot of the two variables; a step of extracting the data whose deviation from each regression line does not exceed a threshold value; a step of calculating a correlation coefficient from each set of extracted data; a step of calculating the conditional probability of each single variable and/or combination of variables; and a step of displaying the single variable and/or combination of variables on a display unit based on each correlation coefficient and each conditional probability.
Other means will be described in the embodiments for carrying out the invention.

According to the present invention, suitable correlation conditions can be extracted automatically even when the analysis data contains different types of data or when the analyst does not understand the content of each variable constituting the analysis data.

FIG. 1 is a configuration diagram of a computer that executes the correlation extraction method.
FIG. 2 is a flowchart showing the correlation extraction process.
FIG. 3 is a diagram explaining the operation of identifying the center of gravity of the scatter plot of the two selected variables.
FIG. 4 is a diagram explaining the operation of drawing a regression line passing through the center of gravity of the scatter plot of the two selected variables.
FIG. 5 is a diagram explaining the operation of extracting data whose deviation from the regression line does not exceed the threshold value.
FIG. 6 is a diagram explaining the operation of narrowing the extracted data down to those that satisfy a condition.
FIG. 7A is a flowchart (part 1) showing the operation of extracting single variables and/or combinations of variables that satisfy the conditions, together with their conditional probabilities.
FIG. 7B is a flowchart (part 2) showing the operation of extracting single variables and/or combinations of variables that satisfy the conditions, together with their conditional probabilities.
FIG. 8A is a histogram of the variable A.
FIG. 8B is a diagram showing the operation of expanding the range of the variable A.
FIG. 9A is a diagram in which the appearance ratio of the variable A is determined from the mode of the variable A.
FIG. 9B is a diagram in which the appearance ratio of the variable Z is determined from the mode of the variable Z.
FIG. 10 is the initial setting screen for correlation extraction.
FIG. 11 is a diagram showing the result of extracting the correlation.
A flowchart showing the correlation extraction process for clustered analysis data.
A diagram explaining the scatter plot of the two selected variables.
A diagram explaining the operation of clustering the analysis data in the scatter plot of the two selected variables and identifying the center of gravity of each cluster.
A diagram explaining the operation of identifying the regression line in each cluster of the scatter plot of the two selected variables.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.
FIG. 1 is a configuration diagram of a computer that executes the correlation extraction method.
The computer 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, and a storage unit 16. This computer 1 is common to the first and second embodiments described later.

The CPU 11 executes programs stored in the ROM 12, the RAM 13, or the storage unit 16, and processes data stored in the ROM 12, the RAM 13, or the storage unit 16.
The ROM 12 is composed of non-volatile memory and stores, for example, a BIOS (Basic Input/Output System). The RAM 13 is composed of volatile memory and is used for variables and other data that a program stores temporarily. The storage unit 16 is composed of a large-capacity storage device such as a hard disk or an SSD (Solid State Drive), and stores the analysis data 161 and the correlation extraction program 162.

The computer 1 further includes an input unit 14 and a display unit 15.
The input unit 14 is, for example, a keyboard or a mouse, and is used to input various kinds of information into the computer 1.
The display unit 15 is, for example, a liquid crystal display, and is used by the computer 1 to display processing results and the like.

<< First Embodiment >>
Hereinafter, the correlation extraction program 162 of the first embodiment will be described with reference to FIGS. 2 to 11. With this correlation extraction program 162, suitable correlation conditions can be extracted automatically even when the analysis data 161 contains different types of data or when the analyst does not understand the content of each variable constituting the analysis data 161.

FIG. 2 is a flowchart showing the correlation extraction process. This flowchart is described below together with the graphs of FIGS. 3 to 6.
The following steps are executed when the CPU 11 reads and executes the correlation extraction program 162.
The CPU 11 displays on the display unit 15 an initial setting screen that includes a menu of objective variables and a menu of explanatory variables. This initial setting screen is described later with reference to FIG. 10. The user selects the objective variable and the explanatory variable shown in the menus on the display unit 15 by using the input unit 14. The CPU 11 thereby accepts the designation of two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable (S10), and starts the series of operations in steps S11 to S19.

In step S11, the CPU 11 calculates the center of gravity 21 (see FIG. 3) of the analysis data 161 in the scatter plot formed by the two input variables. The center of gravity 21 is described using the graph of FIG. 3.

FIG. 3 is a diagram explaining the operation of identifying the center of gravity 21 in the scatter plot of machine usage time and number of part replacements for the analysis data 161. The horizontal axis of this scatter plot is the machine usage time. The vertical axis is the number of part replacements.

Specifically, the CPU 11 calculates the average machine usage time of the analysis data 161. This gives the horizontal-axis coordinate of the center of gravity 21. Next, the CPU 11 calculates the average number of part replacements of the analysis data 161. This gives the vertical-axis coordinate of the center of gravity 21.
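The center-of-gravity calculation in step S11 can be sketched as follows. This is only an illustration: the function name `centroid` and the sample records are hypothetical and do not come from the patent.

```python
from statistics import mean

def centroid(points):
    """Return (mean of x, mean of y) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return mean(xs), mean(ys)

# Hypothetical (machine usage time, number of part replacements) records.
records = [(100, 1), (200, 2), (300, 3), (400, 6)]
cx, cy = centroid(records)
```

Here `cx` is the horizontal-axis coordinate (average machine usage time) and `cy` is the vertical-axis coordinate (average number of part replacements).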

In step S12, the CPU 11 draws a line passing through the center of gravity 21 and takes it as the regression line 3. In this way, the CPU 11 calculates each regression line passing through the center of gravity in the scatter plot of machine usage time and number of part replacements. Next, in steps S13 to S16, the CPU 11 rotates the regression line 3. This line is described using the graph of FIG. 4.

FIG. 4 is a diagram explaining the operation of drawing a regression line 3 passing through the center of gravity 21 of the scatter plot of the two selected variables. Specifically, the CPU 11 draws a regression line 3 passing through the center of gravity 21. The CPU 11 then rotates this regression line 3 from 0 degrees in 1-degree increments, repeating until it reaches 180 degrees. The rotation increment is not limited to 1 degree, however, and the line may be rotated by any predetermined angle.
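The rotation of the line about the center of gravity might be represented as a list of direction vectors, one per angle. The sketch below is an assumption about representation rather than the patent's implementation: `candidate_directions` is a hypothetical name, and the 180-degree line is omitted because it coincides with the 0-degree line.

```python
import math

def candidate_directions(step_deg=1):
    """Return (angle_deg, (dx, dy)) unit direction vectors for angles in [0, 180)."""
    directions = []
    for angle in range(0, 180, step_deg):
        theta = math.radians(angle)
        directions.append((angle, (math.cos(theta), math.sin(theta))))
    return directions

# One candidate line per degree; each line passes through the center of gravity.
lines = candidate_directions()
```

Passing a larger `step_deg` corresponds to rotating by a predetermined angle other than 1 degree.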

At each rotation, the CPU 11 extracts the data 2 associated with the regression line 3 so that it amounts to a predetermined proportion (for example, 25%) of all the analysis data 161 (S13). This extraction of the data 2 is described using the graph of FIG. 5.

FIG. 5 is a diagram explaining the operation of extracting the data 2 whose deviation from the regression line 3 does not exceed the threshold value. The CPU 11 calculates the deviation between each data 2 and the regression line 3, sets the threshold so that the number of data 2 whose deviation does not exceed it is, for example, 25% of the number of data 2 included in the analysis data 161, and extracts those data 2. Specifically, the CPU 11 associates each data 2 with its deviation from the regression line 3. The CPU 11 then sorts the data 2 in ascending order of deviation from the regression line 3 and extracts the 25% of data 2 with the smallest deviations.
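A minimal sketch of the extraction in step S13 is given below. The patent text does not say how the deviation is measured, so the perpendicular distance used here is an assumption, as are the function names and sample points.

```python
import math

def perpendicular_deviation(point, centroid, angle_deg):
    """Distance from `point` to the line through `centroid` at `angle_deg` (assumed metric)."""
    theta = math.radians(angle_deg)
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    # Magnitude of the 2-D cross product with the unit direction vector of the line.
    return abs(dx * math.sin(theta) - dy * math.cos(theta))

def extract_nearest(points, centroid, angle_deg, fraction=0.25):
    """Return the `fraction` of points closest to the line and the implied threshold."""
    ranked = sorted(points, key=lambda p: perpendicular_deviation(p, centroid, angle_deg))
    count = max(1, int(len(ranked) * fraction))
    kept = ranked[:count]
    threshold = perpendicular_deviation(kept[-1], centroid, angle_deg)
    return kept, threshold

# Hypothetical points and a horizontal line (0 degrees) through the origin.
points = [(1, 0), (2, 0.1), (3, 2), (4, -3), (-1, 1), (-2, -0.5), (-3, 5), (-4, 4)]
kept, threshold = extract_nearest(points, (0, 0), 0)
```

Sorting by deviation and cutting at a fixed share means the threshold is derived from the data rather than chosen in advance, which matches the description of setting the threshold so that 25% of the data is retained.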

From the data 2 extracted in step S13, the CPU 11 calculates the single variables and/or combinations of variables with a high appearance ratio, their ranges, and their conditional probabilities (S14). The processing of step S14 is described in detail later with reference to FIGS. 7A and 7B. As a result, as shown in FIG. 6, the extracted data 2 is further narrowed down to those that satisfy a predetermined condition.

Specifically, in step S14 the CPU 11 extracts the data 2 whose deviation from the regression line 3 does not exceed the threshold value, and extracts the conditions and features common to that data 2. Common conditions and features here are, for example, that the production region is the same, or that both the production region and the region of use are the same. The more data is extracted, the more reliable the derived correlation. The necessary conditions can therefore be extracted automatically by using the conditions from which a highly reliable correlation is derived.

The CPU 11 rotates the regression line 3 by a further 1 degree (S15) and determines whether the rotation has reached 180 degrees (S16). If the regression line 3 has not yet been rotated through 180 degrees (No), the CPU 11 returns to step S13; if it has (Yes), the CPU 11 proceeds to step S17. That is, in steps S13 to S16 the CPU 11 rotates the straight line passing through the center of gravity 1 degree at a time and treats each resulting line as a regression line 3.

In step S17, the CPU 11 calculates a correlation coefficient from the data 2 and, from this correlation coefficient and the conditional probability, calculates an evaluation value for each single variable and/or combination of variables.
The CPU 11 sorts the single variables and/or combinations of variables in descending order of evaluation value (S18), displays the analysis result (see FIG. 11) containing the sorted single variables and/or combinations of variables on the display unit 15 (S19), and ends the process of FIG. 2.
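Steps S17 and S18 can be sketched as below, using the definition given in the description of the analysis result: the evaluation value is the product of the correlation coefficient and the conditional probability. The Pearson coefficient, the dictionary keys, and the sample conditions are assumptions made for illustration; the text does not state which correlation coefficient is used or whether its absolute value is taken.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_conditions(candidates):
    """Attach score = correlation * probability and sort in descending order of score."""
    scored = [dict(c, score=c["correlation"] * c["probability"]) for c in candidates]
    return sorted(scored, key=lambda c: c["score"], reverse=True)

# Hypothetical candidate conditions with made-up values.
candidates = [
    {"condition": "region = X", "correlation": 0.9, "probability": 0.5},
    {"condition": "region = Y", "correlation": 0.8, "probability": 0.8},
]
ranked = rank_conditions(candidates)
```

In this example the second condition ranks first, because a slightly weaker correlation that holds for a larger share of the extracted data yields the larger product.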

FIGS. 7A and 7B are flowcharts showing the operation of extracting the single variables and/or combinations of variables that satisfy the conditions, together with their conditional probabilities. The processing shown in these flowcharts corresponds to step S14 of FIG. 2.

The CPU 11 extracts the data around the regression line 3 (S30). Next, the CPU 11 calculates the appearance ratio of each variable in the extracted data (S31). Here, the appearance ratio of a variable means the share of the mode of that variable, or the share of the most populous classes in the histogram of that variable.
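For a single variable, the appearance ratio of step S31 can be illustrated as the share of the most frequent value. The function name and sample values below are hypothetical.

```python
from collections import Counter

def appearance_ratio(values):
    """Return (most frequent value, its share of all values)."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)

# Hypothetical qualitative variable taken from the extracted data.
regions = ["Tokyo", "Tokyo", "Osaka", "Tokyo"]
mode_value, ratio = appearance_ratio(regions)
```

For a quantitative variable, the same function could be applied to class labels after binning the values into a histogram.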

The CPU 11 repeats the processing of steps S32 to S46 for each variable.
First, the CPU 11 selects one of the variables as the first variable (S32). The CPU 11 determines whether the appearance ratio of this variable exceeds 50% (S33). If the appearance ratio does not exceed 50% (No), the CPU 11 proceeds to step S34; if it exceeds 50% (Yes), the CPU 11 proceeds to step S36. The threshold for the appearance ratio of a variable may, however, be any predetermined value.

In step S34, the CPU 11 determines whether the number of times the range has been expanded is greater than the specified number. If it is not greater (No), the CPU 11 expands the range of this variable (S35) and returns to step S33. If it is greater (Yes), the CPU 11 proceeds to step S36.

The process of expanding the range of a variable in step S34 is described with reference to FIGS. 8A and 8B. FIG. 8A shows a histogram of the variable A. The value ranges of the variable A can be set suitably by using Sturges' formula shown in equation (1).
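Equation (1) is not reproduced in this text. Sturges' formula is commonly written as k = 1 + log2(n), where n is the number of data points and k is the number of histogram classes. The sketch below assumes that form and rounds up to an integer; both the rounding and the function name are assumptions.

```python
import math

def sturges_bin_count(n):
    """Number of histogram classes suggested by Sturges' formula, rounded up (assumed)."""
    return math.ceil(1 + math.log2(n))

k_small = sturges_bin_count(64)    # 1 + 6 = 7 classes
k_large = sturges_bin_count(1000)  # 1 + 9.97 rounds up to 11 classes
```

The class width then follows from dividing the observed value range of the variable by the number of classes.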

The data 63 is the mode of the variable A, namely the data for which the variable A lies in the range 10 to 20. Because the appearance ratio of the data 63 is 50% or less here, the range of the variable is expanded.

FIG. 8B shows that, in addition to the data 63 of the variable A, the next most frequent data 64 has been added to the range. This range expansion works the same way for quantitative and qualitative data. Thus, even for a single variable, expanding the range can raise the appearance ratio to the threshold or above.
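The loop formed by steps S33 to S35 might look like the following sketch, which adds classes in descending order of frequency until the combined share exceeds the threshold or the expansion count exceeds the specified number. The function name, the default limit of three expansions, and the sample histogram are assumptions.

```python
from collections import Counter

def expand_range(bin_labels, threshold=0.5, max_expansions=3):
    """Return (selected classes, combined share) after range expansion."""
    counts = Counter(bin_labels).most_common()
    total = len(bin_labels)
    selected = [counts[0][0]]          # start from the mode
    covered = counts[0][1]
    expansions = 0
    while covered / total <= threshold:            # S33: ratio does not exceed threshold
        if expansions > max_expansions:            # S34: expansion count exceeds the limit
            break
        if len(selected) == len(counts):           # no class left to add
            break
        label, count = counts[len(selected)]       # S35: add the next most frequent class
        selected.append(label)
        covered += count
        expansions += 1
    return selected, covered / total

# Hypothetical class labels of variable A: the mode "10-20" holds only 40% of the data.
labels = ["10-20"] * 4 + ["20-30"] * 3 + ["0-10"] * 2 + ["30-40"]
selected, share = expand_range(labels)
```

With this data one expansion is enough: adding the class "20-30" raises the share from 40% to 70%.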

Returning to FIG. 7A, the description continues. In step S36, the CPU 11 records the appearance ratio of the variable as a conditional probability. The CPU 11 also records the correlation coefficient of the variable (S37) and proceeds to step S38 of FIG. 7B.
The CPU 11 repeats the processing of steps S38 to S44 for the variables other than this one. First, the CPU 11 selects one of the other variables as the second variable (S38) and calculates the appearance ratio of the combination of the first selected variable and the second selected variable (S39).

The CPU 11 determines whether the appearance ratio of the combination of variables exceeds 40% (S40). If it does not exceed 40% (No), the CPU 11 proceeds to step S43; if it exceeds 40% (Yes), the CPU 11 proceeds to step S41. The threshold for the appearance ratio of a combination of variables may, however, be any predetermined value.

In step S41, the CPU 11 records the appearance ratio of the variable as a conditional probability. The CPU 11 also records the correlation coefficient of the variable (S42) and proceeds to step S43.

In step S43, the CPU 11 selects the next variable, other than the first variable, as the second variable. Next, the CPU 11 determines whether processing has finished for all the variables other than the first variable, that is, whether the selection of a next variable has failed (S44). If processing has not finished for all the other variables (No), the CPU 11 returns to step S39; if it has finished (Yes), the CPU 11 proceeds to step S45.

In step S45, the CPU 11 selects the next variable as the first variable. Next, the CPU 11 determines whether processing has finished for all variables, that is, whether the selection of a next first variable has failed (S46). If processing has not finished for all variables (No), the CPU 11 returns to step S33 of FIG. 7A; if it has finished (Yes), the CPU 11 ends the processing of FIG. 7B. The maximum number of variables selected is not limited to two, however, and may be any predetermined value.

The extraction of combinations of variables in steps S39 to S40 is described with reference to FIGS. 9A and 9B. FIG. 9A shows a histogram of the variable A. The data 61 is the mode of the variable A, namely the data for which the variable A lies in the range 10 to 20.

FIG. 9B shows a histogram of the variable Z. The data 62 is the mode of the variable Z, namely the data for which the variable Z lies in the range 20 to 30.

The CPU 11 calculates the data 61, which is the mode of the variable A, and determines whether the appearance ratio of the data 61 exceeds 50%. Because it exceeds 50% here, the process proceeds to step S36 and the extraction of combinations of variables is carried out.

Next, the CPU 11 calculates the modes of the other variables B to Z for the data 61, and calculates the appearance ratio of each combination of variables. Specifically, the appearance ratio of the variable A in the range 10 to 20 together with the variable B in the range 20 to 30 is 45%. The appearance ratio of the variable A in the range 10 to 20 together with the variable C in the range 30 to 40 is 30%. Likewise, the appearance ratio of the variable A in the range 10 to 20 together with the variable Z in the range 20 to 30 is 80%. In this way, the CPU 11 calculates the appearance ratio of the combination of a variable with each other variable. The same applies to the combinations of the variable B with the variables A and C to Z. The CPU 11 sorts these two-variable combinations in descending order of evaluation value and displays them on the display unit. The CPU 11 can thus mechanically extract and display the combination with the highest appearance ratio, even among variables of different types.
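The appearance ratio of a two-variable combination in step S39 could be computed as sketched below. The text does not state the denominator of this ratio; the sketch assumes it is the number of all extracted rows. The function name and the sample rows are also hypothetical.

```python
from collections import Counter

def joint_appearance_ratio(rows, first, second):
    """Return ((mode of first, mode of second within it), share of all rows)."""
    first_mode = Counter(r[first] for r in rows).most_common(1)[0][0]
    # Restrict to the rows in the mode class of the first variable.
    subset = [r for r in rows if r[first] == first_mode]
    second_mode, hits = Counter(r[second] for r in subset).most_common(1)[0]
    return (first_mode, second_mode), hits / len(rows)

# Hypothetical extracted rows, already binned into class labels.
rows = [
    {"A": "10-20", "Z": "20-30"},
    {"A": "10-20", "Z": "20-30"},
    {"A": "10-20", "Z": "20-30"},
    {"A": "10-20", "Z": "30-40"},
    {"A": "20-30", "Z": "20-30"},
]
combo, ratio = joint_appearance_ratio(rows, "A", "Z")
recorded = ratio > 0.4  # S40: compare against the 40% threshold
```

In this example three of the five rows satisfy both conditions, so the combination would be recorded in step S41.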

FIG. 10 shows the initial setting screen 4 for correlation extraction.
The initial setting screen 4 includes a data selection combo box 41, an objective variable combo box 42, an explanatory variable combo box 43, an OK button 44, and a cancel button 45.

The data selection combo box 41 is a combo box (menu) for selecting the analysis data 161 from which correlations are to be extracted; here, "operation log of device A" is selected.
The objective variable combo box 42 is a combo box for selecting the objective variable from the variables included in the analysis data 161; here, "number of part replacements" is selected.

The explanatory variable combo box 43 is a combo box for selecting the explanatory variable from the variables included in the analysis data 161; here, "machine usage time" is selected.
The OK button 44 is a button for executing correlation extraction on the analysis data 161 selected in the data selection combo box 41.

The cancel button 45 is a button for canceling the selections made in the combo boxes and closing the initial setting screen 4.
By operating the initial setting screen 4, the user can set the analysis data, the objective variable, and the explanatory variable.

FIG. 11 shows an analysis result 5 obtained by extracting correlations. The analysis result 5 is displayed in the process of step S19 in FIG. 2.
The analysis result 5 includes a number column, a target variable column, a regression equation column, an evaluation value column, a variable name #1 column and a range #1 column, and a variable name #2 column and a range #2 column. The variable name #n columns and range columns further to the right are omitted.

The number column shows the correlation ranking number.
The target variable column shows the objective variable name and the explanatory variable name; here, "number of part replacements × machine usage time" is shown.

The regression equation column shows the constant term and the slope (first-order coefficient) of the regression line. Although "y = ax + b" is written here, specific numerical values actually appear for a and b.
The evaluation value column shows the evaluation value of the single variable and/or combination of variables. Here, the evaluation value is the product of the correlation coefficient and the conditional probability.
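As a rough illustration of the evaluation value, the sketch below multiplies a Pearson correlation coefficient by a conditional probability. The use of the Pearson coefficient, the function names, and the sample numbers are assumptions of this example; the text only states that the evaluation value is the product of the two quantities.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant data; treated as 0 in this sketch
    return cov / math.sqrt(var_x * var_y)


def evaluation_value(xs, ys, conditional_probability):
    """Product of the correlation coefficient and the conditional probability."""
    return pearson(xs, ys) * conditional_probability


# Made-up data: machine usage time against number of part replacements.
usage_time = [1, 2, 3, 4]
replacements = [2, 4, 6, 8]
score = evaluation_value(usage_time, replacements, 0.8)
print(round(score, 3))
```

Whether a negative correlation should be ranked by its absolute value is not stated in the text, so the sketch leaves the sign as it is.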

Each variable name column shows the single variable and/or combination of variables for that ranking. The range column to the right of each variable name shows the range that gives the mode of that variable. In this way, even when the analyst does not understand the content of each variable constituting the data, the single variables and/or combinations of variables that form suitable correlation conditions can be extracted automatically as a ranking in descending order of evaluation value.
As a result, even when various kinds of data are mixed, suitable conditions can be extracted without the characteristics of the data being buried. Furthermore, knowledge hidden in the data can be extracted, for example a condition under which the explanatory variable has a large influence on the objective variable when it lies in a certain range. In this embodiment, as shown in FIG. 11, the correlation between the number of part replacements and the machine usage time is highest under the condition that the part number is A01 and the humidity is in the range of 20 to 30.

<<Second Embodiment>>
In the second embodiment, the analysis data is clustered, the center of gravity of each cluster is obtained, and the single variables and/or combinations of variables that form suitable correlation conditions are extracted.

FIG. 12 is a flowchart showing the correlation extraction process for clustered analysis data.
The CPU 11 displays a menu of objective variables and a menu of explanatory variables on the display unit 15 so that they can be selected. The user selects the objective variable and the explanatory variable from the menus displayed on the display unit 15 by using the input unit 14 (S50). The CPU 11 then starts the series of operations in steps S51 to S59.
Selecting the objective variable and the explanatory variable determines the two-variable scatter plot in FIG. 13. Each data 2 included in the analysis data 161 is plotted in this scatter plot. In the graphs of FIGS. 13 to 15, the horizontal axis is the machine usage time and the vertical axis is the number of part replacements.

The CPU 11 sets the number of clusters k to an initial value of 2 (S51), proceeds to step S52, and performs clustering by k-means.

The CPU 11 determines whether there is any cluster 22 containing fewer than 30 data (S53). The threshold of 30 data is only an example; any number sufficient for a sample may be used. The number required for a sample may vary depending on the analysis data 161 and the variables.

If every cluster 22 contains 30 or more data (No), the CPU 11 increments the number of clusters k by one (S54) and returns to step S52. If any cluster 22 contains fewer than 30 data (Yes), the CPU 11 proceeds to step S55 and takes the preceding (k-1) clusters 22 as the processing target.
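The loop of steps S51 to S55 can be sketched in Python as below. The k-means routine is a plain Lloyd's algorithm written for this example, with a deterministic seeding rule that is purely an assumption; the function names, the iteration limit, and the test data are likewise hypothetical. Only the control flow (start at k = 2, increase k while every cluster holds at least 30 data, otherwise fall back to the previous k-1 clusters) follows the description.

```python
def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on 2-D points; returns k lists of points."""
    ordered = sorted(points)
    n = len(ordered)
    # Deterministic seeding (an assumption of this sketch): evenly spaced
    # points in sorted order, so repeated runs give the same clusters.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def choose_clusters(points, min_size=30):
    """Grow k from 2 until some cluster is too small, then keep k-1 clusters."""
    previous = [list(points)]            # the k-1 = 1 case: a single cluster
    if len(points) < 2 * min_size:
        return previous                  # two large-enough clusters are impossible
    k = 2                                # S51
    while True:
        clusters = kmeans(points, k)     # S52
        if any(len(c) < min_size for c in clusters):  # S53: Yes
            return previous              # S55: use the previous (k-1) clusters
        previous = clusters
        k += 1                           # S54


# Three well-separated 8 x 5 grids of 40 points each (made-up data).
points = [
    (cx + i % 8, cy + i // 8)
    for cx, cy in [(0, 0), (100, 50), (200, 100)]
    for i in range(40)
]
result = choose_clusters(points)
print([len(c) for c in result])
```

With this data the loop accepts k = 2 and k = 3, finds a cluster with fewer than 30 data at k = 4, and therefore keeps the three clusters from the previous round.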

FIG. 14 shows an example of the clustering result. In FIG. 14, the data are divided into three clusters 22a, 22b, and 22c. The clusters 22a, 22b, and 22c contain centers of gravity 21a, 21b, and 21c, respectively. Hereinafter, when the clusters are not distinguished, they are simply referred to as clusters 22.
The CPU 11 draws regression lines 3a, 3b, and 3c through the centers of gravity 21a, 21b, and 21c of the clusters 22a, 22b, and 22c, respectively. While rotating these regression lines 3a, 3b, and 3c simultaneously, the CPU 11 obtains the correlation coefficient and the conditional probability of each single variable and/or combination of variables (S56). The regression lines 3a, 3b, and 3c are shown in FIG. 15, described later. The process of step S56 corresponds to the processes of steps S13 to S16 in FIG. 2.

Next, the CPU 11 calculates the evaluation value of each single variable and/or combination of variables from the correlation coefficient and the conditional probability (S57). The CPU 11 sorts the single variables and/or combinations of variables in descending order of evaluation value (S58), displays the sorted single variables and/or combinations of variables on the display unit 15 (S59), and ends the process of FIG. 12. The processes of steps S57 to S59 correspond to the processes of steps S17 to S19 in FIG. 2.
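Steps S57 to S59 amount to scoring each candidate condition and sorting the results. A minimal Python sketch follows; the dictionary layout, the function name `rank_conditions`, and all numeric values are hypothetical and chosen only to show the ordering.

```python
def rank_conditions(candidates):
    """Score each candidate by correlation * conditional probability (S57)
    and return the candidates in descending order of that score (S58)."""
    scored = [
        {**c, "evaluation": c["correlation"] * c["probability"]}
        for c in candidates
    ]
    return sorted(scored, key=lambda c: c["evaluation"], reverse=True)


# Hypothetical candidates; the numbers are not taken from the embodiment.
candidates = [
    {"condition": "part number = A01", "correlation": 0.95, "probability": 0.5},
    {"condition": "part number = A01, humidity 20-30", "correlation": 0.9, "probability": 0.8},
    {"condition": "humidity 20-30", "correlation": 0.6, "probability": 0.9},
]

ranking = rank_conditions(candidates)

# S59: show the ranking with its number column.
for number, row in enumerate(ranking, start=1):
    print(number, row["condition"], round(row["evaluation"], 3))
```

Sorting by the correlation coefficient alone, which the modifications also allow, would only require changing the sort key.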

(Modifications)
The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. Other configurations may also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Some or all of the above configurations, functions, processing units, processing means, and the like may be realized in hardware such as an integrated circuit. The above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as programs, tables, and files that implement the functions can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as a flash memory card or a DVD (Digital Versatile Disc).

In each embodiment, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be interconnected.
Modifications of the present invention include, for example, the following (a) to (h).

(a) In the first embodiment, the regression line is drawn through a single point, the center of gravity; however, as in the second embodiment, a plurality of regression lines may be drawn through a plurality of points, without limitation.
(b) The clustering method is not limited to k-means and may be any method.
(c) Any number of variables may be combined for extraction, but empirically up to three is preferable.
(d) The method of extracting conditions after extracting the data near the regression line is not limited to the processes of FIGS. 7A and 7B; an association rule extraction technique may be used to extract conditions under which the objective variable and the explanatory variable are correlated.
(e) The step of rotating the regression line is not limited to a rotation angle of 1 degree and may rotate the line by any predetermined angle.
(f) Although 25% of the data, namely those whose deviation from the regression line does not exceed the threshold value, are extracted, the ratio is not limited to 25% and any ratio may be extracted.
(g) Although the single variables and/or combinations of variables are sorted and displayed as a ranking in descending order of the product of the correlation coefficient and the conditional probability, they may instead be sorted and ranked by the correlation coefficient alone.
(h) The computer accepts a designation of two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable. However, the computer itself may instead select two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable.

1 Computer
11 CPU
12 ROM
13 RAM
14 Input unit
15 Display unit
16 Storage unit
161 Analysis data
162 Correlation extraction program
2 Data
21, 21a to 21c Center of gravity
22a to 22c Cluster
3, 3a to 3c Regression line
4 Initial setting screen
41 Data selection combo box
42 Objective variable combo box
43 Explanatory variable combo box
44 OK button
45 Cancel button
5 Analysis result

To solve the above problem, the correlation extraction method of the present invention is characterized in that a computer carries out: a step of accepting a designation of two variables among a plurality of variables constituting analysis data; a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter plot of the two variables; a step of extracting each data whose deviation from each straight line does not exceed a threshold value; a step of calculating each correlation coefficient from each of the data; a step of retrieving, from each of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and a step of displaying the single variable and/or the combination of variables on a display unit based on each correlation coefficient and each appearance ratio.

The correlation extraction program of the present invention causes a computer to execute: a step of accepting a designation of two variables among a plurality of variables constituting analysis data; a step of calculating each straight line passing through the center of gravity of the analysis data in a scatter plot of the two variables; a step of extracting each data whose deviation from each straight line does not exceed a threshold value; a step of calculating a correlation coefficient from each of the data; a step of retrieving, from each of the extracted data, a single variable and/or combination of variables whose appearance ratio is greater than a predetermined value; and a step of displaying the single variable and/or the combination of variables on a display unit based on each correlation coefficient and each appearance ratio.
Other means are described in the mode for carrying out the invention.

A configuration diagram of a computer that executes the correlation extraction method.
A flowchart showing the correlation extraction process.
A diagram illustrating the operation of identifying the center of gravity of the scatter plot of the two selected variables.
A diagram illustrating the operation of drawing a straight line through the center of gravity of the scatter plot of the two selected variables.
A diagram illustrating the operation of extracting data whose deviation from the straight line does not exceed the threshold value.
A diagram illustrating the operation of narrowing the extracted data down to those that satisfy a condition.
A flowchart (part 1) showing the operation of extracting single variables and/or combinations of variables that satisfy a condition, together with their conditional probabilities.
A flowchart (part 2) showing the operation of extracting single variables and/or combinations of variables that satisfy a condition, together with their conditional probabilities.
A histogram of variable A.
A diagram showing the operation of expanding the range of variable A.
A diagram in which the appearance ratio of variable A is determined from the mode of variable A.
A diagram in which the appearance ratio of variable Z is determined from the mode of variable Z.
The initial setting screen for correlation extraction.
A diagram showing the result of extracting correlations.
A flowchart showing the correlation extraction process for clustered analysis data.
A diagram illustrating the scatter plot of the two selected variables.
A diagram illustrating the operation of clustering the analysis data in the scatter plot of the two selected variables and identifying the center of gravity of each cluster.
A diagram illustrating the operation of identifying the straight line in each cluster of the scatter plot of the two selected variables.

In step S12, the CPU 11 draws a line through the center of gravity 21 and takes it as a straight line 3. The CPU 11 thereby calculates each straight line passing through the center of gravity in the scatter plot of machine usage time and number of part replacements. Next, in steps S13 to S16, the CPU 11 rotates the straight line 3. This line is described with reference to the graph in FIG. 4.

FIG. 4 illustrates the operation of drawing a straight line 3 through the center of gravity 21 of the scatter plot of the two selected variables. Specifically, the CPU 11 draws a straight line 3 passing through the center of gravity 21. The CPU 11 then rotates the straight line 3 from 0 degrees in 1-degree steps, repeating until it reaches 180 degrees. The rotation step is not limited to 1 degree, however, and the line may be rotated by any predetermined angle.
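The rotation of a line about the center of gravity can be illustrated with the short Python sketch below. It assumes that the deviation of a data point is its perpendicular distance to the line, which the text does not specify, and it stops the sweep just before 180 degrees because that line coincides with the one at 0 degrees. The function names and sample points are hypothetical.

```python
import math


def centroid(points):
    """Center of gravity of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def deviation(point, center, angle_deg):
    """Perpendicular distance from point to the line through center
    that makes angle_deg with the horizontal axis (an assumed metric)."""
    theta = math.radians(angle_deg)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Magnitude of the cross product with the line's unit direction vector.
    return abs(dx * math.sin(theta) - dy * math.cos(theta))


def sweep(points, step_deg=1):
    """Yield (angle, deviations) for each rotated line through the centroid."""
    center = centroid(points)
    for angle in range(0, 180, step_deg):
        yield angle, [deviation(p, center, angle) for p in points]


# Made-up data points.
points = [(0, 0), (2, 2), (4, 4), (1, 3)]
lines = list(sweep(points))
print(centroid(points), len(lines))
```

Passing a larger `step_deg` gives the coarser rotation that the description allows as an alternative to 1-degree steps.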

For each rotation, the CPU 11 extracts the data 2 associated with the straight line 3 so that they make up a predetermined ratio (for example, 25%) of all the analysis data 161 (S13). This extraction of the data 2 is described with reference to the graph in FIG. 5.

FIG. 5 illustrates the operation of extracting the data 2 whose deviation from the straight line 3 does not exceed the threshold value. The CPU 11 calculates the deviation of each data 2 from the straight line 3, sets the threshold so that the number of data 2 whose deviation does not exceed it is, for example, 25% of the number of data 2 included in the analysis data 161, and extracts those data 2. Specifically, the CPU 11 associates each data 2 with its deviation from the straight line 3. The CPU 11 then sorts the data 2 in ascending order of deviation from the straight line 3 and extracts the first 25% of the data 2, starting from the smallest deviation.
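The sort-and-take-25% extraction described above can be sketched as follows. As before, the perpendicular distance is an assumed definition of the deviation, and the function name `extract_near_line`, the rounding rule for the number of data to keep, and the sample points are hypothetical.

```python
import math


def extract_near_line(points, center, angle_deg, ratio=0.25):
    """Return the share `ratio` of points closest to the line through `center`
    at `angle_deg`, together with the deviation threshold that this implies."""
    theta = math.radians(angle_deg)

    def deviation(p):
        # Perpendicular distance to the line (an assumption of this sketch).
        return abs((p[0] - center[0]) * math.sin(theta)
                   - (p[1] - center[1]) * math.cos(theta))

    ordered = sorted(points, key=deviation)   # ascending order of deviation
    count = int(len(points) * ratio)          # rounding down is an assumption
    selected = ordered[:count]
    threshold = deviation(selected[-1]) if selected else 0.0
    return selected, threshold


# Made-up points; the line is the horizontal axis through the origin,
# so the deviation of each point is simply |y|.
points = [(1, 0.1), (2, -0.2), (3, 5), (4, -4), (5, 3), (-1, 2), (-2, -6), (-3, 1)]
selected, threshold = extract_near_line(points, center=(0, 0), angle_deg=0)
print(selected, threshold)
```

Here the threshold is derived from the requested ratio rather than fixed in advance, which matches the description of setting the threshold so that 25% of the data are extracted.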

Specifically, in step S14, the CPU 11 extracts the data 2 whose deviation from the straight line 3 does not exceed the threshold value, and extracts conditions and features common to those data 2. Common conditions and features are, for example, that the production region is the same, or that both the production region and the region of use are the same. The more data are extracted, the more reliable the derived correlation. Therefore, by using the conditions that yield a highly reliable correlation, the necessary conditions can be extracted automatically.

The CPU 11 rotates the straight line 3 by a further 1 degree (S15) and determines whether rotation up to 180 degrees has been completed (S16). If the straight line 3 has not yet been rotated to 180 degrees (No), the CPU 11 returns to step S13; if it has (Yes), the CPU 11 proceeds to step S17. That is, in steps S13 to S16, the CPU 11 rotates the straight line passing through the center of gravity in 1-degree steps to obtain each straight line 3.

The CPU 11 extracts the data around the straight line 3 (S30). Next, the CPU 11 calculates the appearance ratio of each variable for the extracted data (S31). Here, the appearance ratio of a variable is the ratio of the mode of that variable, or the ratio of the data with the largest count in the histogram of that variable.
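For a numeric variable, the appearance ratio in step S31 can be read as the share of the extracted data that falls in the most populated histogram bin. The Python sketch below illustrates that reading; the fixed bin width of 10, the function name `mode_bin_ratio`, and the sample values are assumptions made for the example.

```python
from collections import Counter


def mode_bin_ratio(values, bin_width=10):
    """Return the most populated histogram bin as (low, high) and the
    fraction of values that fall in it."""
    if not values:
        return None, 0.0
    # Map each value to the lower edge of its bin, then count per bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    low, count = max(bins.items(), key=lambda item: item[1])
    return (low, low + bin_width), count / len(values)


# Made-up values of one variable among the extracted data.
values = [12, 15, 18, 19, 25, 33, 11, 41]
mode_range, ratio = mode_bin_ratio(values)
print(mode_range, ratio)
```

For a categorical variable such as a part number, the same idea reduces to counting the most frequent value and dividing by the number of extracted data.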

FIG. 11 shows an analysis result 5 obtained by extracting correlations. The analysis result 5 is displayed in the process of step S19 in FIG. 2.
The analysis result 5 includes a number column, a target variable column, a line equation column, an evaluation value column, a variable name #1 column and a range #1 column, and a variable name #2 column and a range #2 column. The variable name #n columns and range columns further to the right are omitted.

The line equation column shows the constant term and the slope (first-order coefficient) of the straight line. Although "y = ax + b" is written here, specific numerical values actually appear for a and b.
The evaluation value column shows the evaluation value of the single variable and/or combination of variables. Here, the evaluation value is the product of the correlation coefficient and the conditional probability.

FIG. 14 shows an example of the clustering result. In FIG. 14, the data are divided into three clusters 22a, 22b, and 22c. The clusters 22a, 22b, and 22c contain centers of gravity 21a, 21b, and 21c, respectively. Hereinafter, when the clusters are not distinguished, they are simply referred to as clusters 22.
The CPU 11 draws straight lines 3a, 3b, and 3c through the centers of gravity 21a, 21b, and 21c of the clusters 22a, 22b, and 22c, respectively. While rotating these straight lines 3a, 3b, and 3c simultaneously, the CPU 11 obtains the correlation coefficient and the conditional probability of each single variable and/or combination of variables (S56). The straight lines 3a, 3b, and 3c are shown in FIG. 15, described later. The process of step S56 corresponds to the processes of steps S13 to S16 in FIG. 2.

(a) In the first embodiment, the straight line is drawn through a single point, the center of gravity; however, as in the second embodiment, a plurality of straight lines may be drawn through a plurality of points, without limitation.
(b) The clustering method is not limited to k-means and may be any method.
(c) Any number of variables may be combined for extraction, but empirically up to three is preferable.
(d) The method of extracting conditions after extracting the data near the straight line is not limited to the processes of FIGS. 7A and 7B; an association rule extraction technique may be used to extract conditions under which the objective variable and the explanatory variable are correlated.
(e) The step of rotating the straight line is not limited to a rotation angle of 1 degree and may rotate the line by any predetermined angle.
(f) Although 25% of the data, namely those whose deviation from the straight line does not exceed the threshold value, are extracted, the ratio is not limited to 25% and any ratio may be extracted.
(g) Although the single variables and/or combinations of variables are sorted and displayed as a ranking in descending order of the product of the correlation coefficient and the conditional probability, they may instead be sorted and ranked by the correlation coefficient alone.
(h) The computer accepts a designation of two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable. However, the computer itself may instead select two of the plurality of variables constituting the analysis data as the objective variable and the explanatory variable.

1 Computer
11 CPU
12 ROM
13 RAM
14 Input unit
15 Display unit
16 Storage unit
161 Analysis data
162 Correlation extraction program
2 Data
21, 21a to 21c Center of gravity
22a to 22c Cluster
3, 3a to 3c Straight line
4 Initial setting screen
41 Data selection combo box
42 Objective variable combo box
43 Explanatory variable combo box
44 OK button
45 Cancel button
5 Analysis result

Claims

A correlation extraction method characterized in that a computer carries out:
a step of accepting a specification of two variables among a plurality of variables constituting analysis data;
a step of calculating each regression line passing through the center of gravity of the analysis data in a scatter plot of the two variables;
a step of extracting each data whose deviation from each regression line does not exceed a threshold value;
a step of calculating each correlation coefficient from each of the data;
a step of calculating each conditional probability of a single variable and/or a combination of variables; and
a step of displaying the single variable and/or the combination of variables on a display unit based on each of the correlation coefficients and each of the conditional probabilities.

The correlation extraction method according to claim 1, characterized in that,
in the step of calculating each regression line, the computer rotates the straight line passing through the center of gravity by a predetermined angle at a time to obtain each regression line.

The correlation extraction method according to claim 1 or 2, characterized in that
the computer, after carrying out a step of clustering the analysis data, carries out the step of accepting the specification of two variables among the plurality of variables constituting the analysis data, and
each of the regression lines passes through the center of gravity of a respective cluster of the analysis data in the scatter plot of the two variables.

The correlation extraction method according to any one of claims 1 to 3, characterized in that,
in the step of displaying the single variable and/or the combination of variables on the display unit, the computer displays the single variables and/or the combinations of variables in descending order of the product of each correlation coefficient and each conditional probability.

The correlation extraction method according to any one of claims 1 to 3, characterized in that,
in the step of displaying the single variable and/or the combination of variables on the display unit, the computer displays the single variables and/or the combinations of variables in descending order of each of the correlation coefficients.

The correlation extraction method according to claim 1 or 2, characterized in that
the computer sets the threshold value so that a predetermined ratio of the analysis data is extracted.

The correlation extraction method according to any one of claims 1 to 6, characterized in that,
in extracting the single variable and/or the combination of variables for which the conditional probability is calculated, the computer carries out a step of combining a further variable if the appearance ratio of the data in the mode range of the single variable and/or the combination of variables is equal to or greater than a predetermined value.

The correlation extraction method according to any one of claims 1 to 7, characterized in that,
in extracting the single variable for which the conditional probability is calculated, the computer carries out a step of expanding the range if the appearance ratio of the data in the mode range of the single variable is less than a predetermined value.

A correlation extraction program for causing a computer to execute:
a step of accepting a specification of two variables among a plurality of variables constituting analysis data;
a step of calculating each regression line passing through the center of gravity of the analysis data in a scatter plot of the two variables;
a step of extracting each data whose deviation from each regression line does not exceed a threshold value;
a step of calculating a correlation coefficient from each of the data;
a step of calculating each conditional probability of a single variable and/or a combination of variables; and
a step of displaying the single variable and/or the combination of variables on a display unit based on each of the correlation coefficients and each of the conditional probabilities.