JP2010153588A

JP2010153588A - Program, apparatus and method for analyzing data

Info

Publication number: JP2010153588A
Application number: JP2008330114A
Authority: JP
Inventors: Hidetaka Tsuda; 英隆津田; Eidai Shirai; 英大白井; Mariko Fukuda; 真理子福田
Original assignee: Fujitsu Semiconductor Ltd
Current assignee: Fujitsu Semiconductor Ltd
Priority date: 2008-12-25
Filing date: 2008-12-25
Publication date: 2010-07-08

Abstract

PROBLEM TO BE SOLVED: To efficiently extract salient features, statistically significant differences, and correlations between data.

SOLUTION: In the post-split graphs of each type of response variable, the horizontal axis is the assigned record number and the vertical axis is the response variable (temperature T1). The groups split in the first layer are indicated by thick dotted ovals. In regression tree analysis, splitting a set in two automatically extracts the explanatory variable that most affects the value of the response variable, together with its range. Here the only explanatory variable is the assigned record number E obtained by sorting in order of the response variable's magnitude, so the analysis shows which assigned record number E gives the clearest grouping when the records are split into a large-value group and a small-value group. That is, the grouping shown in Fig. 9-1 is the set split with the largest significant difference for temperature T1.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a data analysis program, a data analysis apparatus, and a data analysis method for analyzing relationships among data widely handled in industry.

Conventionally, large volumes of many kinds of data have been accumulated in computer systems at many sites, including semiconductor processes. Merely accumulating these data generates no revenue. Data mining, one of the data analysis techniques for efficiently finding the regularities and features hidden in such data, is therefore widely used in industry.

Data mining has long been used successfully in fields such as finance and distribution, and in recent years it has also been applied to process data analysis, including semiconductor processes.

Data mining techniques include association analysis, cluster analysis, decision tree analysis, neural network analysis, and MBR (Memory-Based Reasoning) analysis.

For example, there is data analysis in which all target data are analyzed at once (Conventional Example 1). There is also data analysis in which all records are divided into several record groups, each group is analyzed, and the correlations existing in each record group are extracted (Conventional Example 2) (see, for example, Patent Documents 1 and 2 below).

There is also data analysis in which records are divided into record groups with different conditions, each group is analyzed, and the correlations existing in each record group are extracted (Conventional Example 3). For example, data are analyzed by condition, such as the equipment used in a manufacturing process.

Patent Document 1: JP 2006-086403 A
Patent Document 2: JP 2006-338265 A

However, in the data analysis of Conventional Example 1, data collected under various conditions are mixed together and act as noise for one another, so features that actually exist are not extracted. In the data analysis of Conventional Example 2, the record order can be changed by sorting on some variable, but in principle record groups are created in the order in which the records exist, and the number of records in each record group is fixed rather than adapted to the data content. As a result, significant correlations often go unextracted.

Furthermore, when the explanatory variable is an attribute value of a nominal variable, such as the equipment used in a process, grouping is effective because the criterion for dividing the set is clear. However, when grouping by an explanatory variable that is a numerical variable, as in the data analysis of Conventional Example 3, grouping by every combination of numerical values is inefficient because the number of division patterns becomes large.

To solve the above problems of the prior art, an object of the present invention is to provide a data analysis program, a data analysis apparatus, and a data analysis method capable of efficiently extracting salient features, statistically significant differences, and correlations between data.

To solve the above problems and achieve the object, the data analysis program, data analysis apparatus, and data analysis method acquire analysis target data whose records each consist of a sequence number serving as an explanatory variable and an objective variable corresponding to that sequence number; sort the records of the acquired analysis target data in ascending or descending order of the objective variable; assign new sequence numbers to the sorted records; divide the sorted records, based on regression tree analysis, into two divided groups using as a boundary a specific number selected from the assigned sequence numbers (hereinafter, "assigned sequence numbers"); and output the two divided groups.

According to the data analysis program, data analysis apparatus, and data analysis method, using the assigned sequence number as the sole explanatory variable makes it possible to form groups whose boundary lies where the objective variable changes sharply.

The data analysis program, data analysis apparatus, and data analysis method have the effect that salient features, statistically significant differences, and correlations between data can be extracted efficiently.

Preferred embodiments of the data analysis program, data analysis apparatus, and data analysis method are described in detail below with reference to the accompanying drawings.

FIG. 1 is an explanatory diagram showing an example of analysis target data. The analysis target data has an explanatory variable and multiple types of objective variables. The explanatory variable is a record number (sequence number) representing an order, for example a series of numbers assigned to manufacturing apparatuses of the same type. Here, the record number is denoted e, with e = 1, 2, 3, …, 25; that is, the data represent manufacturing apparatuses of the same type numbered 1 to 25.

The objective variables are parameters obtained for each record number. In this example, four types of objective variables are prepared. They represent, for example, temperatures t1 to t4 of each manufacturing apparatus under different conditions (such as different time periods). Here, a combination of a record number, which is the explanatory variable, and its objective variable is called a record. The set of records for each type of objective variable therefore constitutes analysis target data. FIG. 1 brings together the four types of analysis target data for temperatures t1 to t4.

FIGS. 2-1 to 2-4 are explanatory diagrams showing graphs for each type of objective variable. FIG. 2-1 graphs temperature t1, FIG. 2-2 temperature t2, FIG. 2-3 temperature t3, and FIG. 2-4 temperature t4. In these graphs, the horizontal axis is the record number and the vertical axis is the temperature.

FIGS. 3-1 to 3-6 are explanatory diagrams showing correlation diagrams for each combination of objective variables over all record numbers. FIG. 3-1 is the correlation diagram for temperatures t1 and t2, FIG. 3-2 for t1 and t3, FIG. 3-3 for t1 and t4, FIG. 3-4 for t2 and t3, FIG. 3-5 for t2 and t4, and FIG. 3-6 for t3 and t4. The straight line in each of FIGS. 3-1 to 3-6 is the linear regression line, and R² is the contribution ratio (coefficient of determination). No significant correlation is apparent in any of the graphs in FIGS. 2-1 to 2-4 and FIGS. 3-1 to 3-6.

(Hardware configuration of the data analysis apparatus)
FIG. 4 is a block diagram showing the hardware configuration of the data analysis apparatus according to the present embodiment. In FIG. 4, the data analysis apparatus includes a CPU (Central Processing Unit) 401, a ROM (Read-Only Memory) 402, a RAM (Random Access Memory) 403, a magnetic disk drive 404, a magnetic disk 405, an optical disk drive 406, an optical disk 407, a display 408, an I/F (Interface) 409, a keyboard 410, a mouse 411, a scanner 412, and a printer 413. The components are connected to one another by a bus 400.

The CPU 401 governs overall control of the data analysis apparatus. The ROM 402 stores programs such as a boot program. The RAM 403 is used as a work area for the CPU 401. The magnetic disk drive 404 controls reading and writing of data on the magnetic disk 405 under the control of the CPU 401. The magnetic disk 405 stores data written under the control of the magnetic disk drive 404.

The optical disk drive 406 controls reading and writing of data on the optical disk 407 under the control of the CPU 401. The optical disk 407 stores data written under the control of the optical disk drive 406 and allows a computer to read the data stored on it.

The display 408 displays a cursor, icons, and toolboxes, as well as data such as documents, images, and function information. A CRT, a TFT liquid crystal display, or a plasma display, for example, can be used as the display 408.

The interface (hereinafter abbreviated as "I/F") 409 is connected through a communication line to a network 414 such as a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet, and is connected to other apparatuses via the network 414. The I/F 409 serves as the interface between the network 414 and the inside of the apparatus and controls the input and output of data to and from external apparatuses. A modem or a LAN adapter, for example, can be used as the I/F 409.

The keyboard 410 has keys for entering characters, numerals, and various instructions, and is used to input data. A touch-panel input pad or a numeric keypad may be used instead. The mouse 411 is used to move the cursor, select a range, and move or resize windows. A trackball, a joystick, or the like may be used instead, provided it has equivalent functions as a pointing device.

The scanner 412 optically reads an image and captures the image data into the data analysis apparatus. The scanner 412 may be given an OCR (Optical Character Reader) function. The printer 413 prints image data and document data. A laser printer or an inkjet printer, for example, can be used as the printer 413.

(Functional configuration of the data analysis apparatus)
FIG. 5 is a block diagram showing the functional configuration of the data analysis apparatus. The data analysis apparatus 500 includes an acquisition unit 501, a sort unit 502, an assignment unit 503, a division unit 504, an identification unit 505, a selection unit 506, a correlation information calculation unit 507, a unity degree calculation unit 508, a determination unit 509, and an output unit 510. Specifically, each of these functions is realized, for example, by causing the CPU 401 to execute a program stored in a storage area such as the ROM 402, the RAM 403, the magnetic disk 405, or the optical disk 407 shown in FIG. 4, or by the I/F 409. Each function reads the data generated by the preceding function from a storage device such as the ROM 402, the RAM 403, or the magnetic disk 405, and stores the information it generates in the storage device.

The acquisition unit 501 has a function of acquiring analysis target data whose records each consist of a sequence number serving as an explanatory variable and an objective variable corresponding to that sequence number. Specifically, it acquires, for example, the analysis target data shown in FIG. 1. More specifically, it receives analysis target data from outside or reads analysis target data stored in a storage device such as the ROM 402, the RAM 403, or the magnetic disk 405.

The sort unit 502 has a function of sorting the records of the analysis target data in ascending or descending order of the objective variable. FIGS. 6-1 to 6-4 are explanatory diagrams showing the sort results for each type of objective variable. In FIGS. 6-1 to 6-4, temperatures T1 to T4 are the data sequences obtained by sorting temperatures t1 to t4 in ascending order. The record numbers e, which are the explanatory variable, are rearranged along with this sort.

The assignment unit 503 has a function of assigning new sequence numbers to the records sorted by the sort unit 502. Because the record numbers rearranged by the sort unit 502 are no longer sequential, a new record number E representing the order is introduced, with E = 1, 2, 3, …, 25, as shown in FIGS. 6-1 to 6-4. This record number E is called the assigned record number. The objective variable values are assigned to the assigned record numbers in ascending order starting from E = 1. The combination of the original record number e, the assigned record number E, and the objective variable thus forms one record.
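The sort and assignment steps can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation; the `Record` type and the function name `sort_and_assign` are hypothetical, and ascending order is assumed.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    e: int        # original record number (1-based)
    E: int        # assigned record number after sorting (1-based)
    value: float  # objective variable, e.g. a temperature


def sort_and_assign(values: List[float]) -> List[Record]:
    """Sort by objective variable (ascending) and assign new numbers E.

    values[k] is taken to be the objective variable of original
    record number e = k + 1 (an assumption of this sketch).
    """
    # Pair each value with its original record number e, then sort by value.
    by_value = sorted(enumerate(values, start=1), key=lambda pair: pair[1])
    # Number the sorted records sequentially to obtain E.
    return [Record(e=e, E=E, value=v)
            for E, (e, v) in enumerate(by_value, start=1)]
```

Each returned record carries the original number e, the assigned number E, and the objective variable together, which mirrors the three-part record described above.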

FIGS. 7-1 to 7-4 are explanatory diagrams showing graphs for each type of objective variable after sorting. The horizontal axis is the assigned record number E and the vertical axis is the objective variable (temperatures T1 to T4).

In FIG. 5, the division unit 504 has a function of dividing the sorted objective variables into two divided groups, based on regression tree analysis, using as a boundary a specific number selected from the assigned record numbers. That is, to divide on the basis of the assigned record number E rather than the pre-assignment record number e, regression tree analysis is executed with the assigned record number E as the explanatory variable. In regression tree analysis, let S0 be the sum of squares of the set of objective variables before division, S1 the sum of squares of one set of objective variables after division, and S2 the sum of squares of the other set after division; the set is divided so that ΔS in equation (1) is maximized.

ΔS = S0 − (S1 + S2)   (1)

Specifically, for example, let group G0 be the set of temperatures T1 with assigned record numbers E = 1 to 25 in FIG. 6-1, group G1 the set of temperatures T1 with assigned record numbers E = 1 to i (where 1 ≤ i ≤ 25), and group G2 the set of temperatures T1 with assigned record numbers E = i+1 to 25; the sums of squares S0 to S2 of the respective groups are then calculated.

The records are then grouped so that ΔS is maximized. That is, identifying the assigned record number E = i that serves as the boundary divides the set into two groups. This division process is executed recursively, with each divided group serving as the next set to divide. A regression tree is thereby generated.
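A single division step based on equation (1) might look like the sketch below. It assumes that "sum of squares" means the sum of squared deviations from the group mean, as is usual in regression trees, and it restricts the boundary i so that both groups are non-empty. The function names are hypothetical.

```python
from typing import List, Tuple


def sum_of_squares(xs: List[float]) -> float:
    """Sum of squared deviations from the mean (assumed meaning of S)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_split(sorted_values: List[float]) -> Tuple[int, float]:
    """Return (i, delta_s): the boundary E = i that maximizes delta S.

    sorted_values[k] is the objective variable at assigned record
    number E = k + 1. Group G1 is E = 1..i, group G2 is E = i+1..n.
    """
    s0 = sum_of_squares(sorted_values)
    best_i, best_delta = 0, float("-inf")
    for i in range(1, len(sorted_values)):  # keep both groups non-empty
        s1 = sum_of_squares(sorted_values[:i])
        s2 = sum_of_squares(sorted_values[i:])
        delta = s0 - (s1 + s2)  # equation (1)
        if delta > best_delta:
            best_i, best_delta = i, delta
    return best_i, best_delta
```

This brute-force scan recomputes the sums of squares for every candidate boundary, which is adequate for a 25-record example; running sums would be the natural optimization for larger data.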

FIGS. 8-1 to 8-4 are explanatory diagrams showing the regression trees generated for each objective variable. The layer down to which division is performed is set in advance. The user can set this according to the purpose of the analysis, for example dividing down to the k-th layer, or dividing until the maximum value of ΔS falls to or below a threshold. If the root of the regression tree (group G) is taken as layer 0, the regression trees shown in FIGS. 8-1 to 8-4 are divided down to layer 2.
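The recursive division with the two stopping rules mentioned above (a maximum layer k and a ΔS threshold) can be sketched as follows. The dictionary layout, the parameter names `max_depth` and `min_delta`, and the mean-deviation definition of the sum of squares are assumptions made for illustration.

```python
from typing import Any, Dict, List


def _ss(xs: List[float]) -> float:
    """Sum of squared deviations from the mean (assumed definition)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def build_tree(values: List[float], start: int = 1, depth: int = 0,
               max_depth: int = 2, min_delta: float = 0.0) -> Dict[str, Any]:
    """Recursively divide sorted values; start is E of values[0]."""
    node: Dict[str, Any] = {
        "E_range": (start, start + len(values) - 1),
        "children": [],
    }
    # Stop at the preset layer or when the group cannot be divided.
    if depth >= max_depth or len(values) < 2:
        return node
    s0 = _ss(values)
    candidates = [(s0 - (_ss(values[:i]) + _ss(values[i:])), i)
                  for i in range(1, len(values))]
    delta, i = max(candidates)
    # Stop when the largest delta S is at or below the threshold.
    if delta <= min_delta:
        return node
    node["children"] = [
        build_tree(values[:i], start, depth + 1, max_depth, min_delta),
        build_tree(values[i:], start + i, depth + 1, max_depth, min_delta),
    ]
    return node
```

With `max_depth=2` the root is layer 0 and the leaves are at layer 2, matching the depth described for the trees in FIGS. 8-1 to 8-4.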

In FIG. 8-1, focusing on the first layer, the group G of objective variables with assigned record numbers E = 1 to 25 is divided into groups G10-1 and G10-2. Group G10-1 is the set of objective variables with assigned record numbers E = 1 to 15, and group G10-2 is the set with assigned record numbers E = 16 to 25.

In FIG. 8-2, focusing on the first layer, the group G of objective variables with assigned record numbers E = 1 to 25 is divided into groups G20-1 and G20-2. Group G20-1 is the set of objective variables with assigned record numbers E = 1 to 15, and group G20-2 is the set with assigned record numbers E = 16 to 25.

In FIG. 8-3, focusing on the first layer, the group G of objective variables with assigned record numbers E = 1 to 25 is divided into groups G30-1 and G30-2. Group G30-1 is the set of objective variables with assigned record numbers E = 1 to 20, and group G30-2 is the set with assigned record numbers E = 21 to 25.

In FIG. 8-4, focusing on the first layer, the group G of objective variables with assigned record numbers E = 1 to 25 is divided into groups G40-1 and G40-2. Group G40-1 is the set of objective variables with assigned record numbers E = 1 to 14, and group G40-2 is the set with assigned record numbers E = 15 to 25.

FIGS. 9-1 to 9-4 are explanatory diagrams showing graphs for each type of objective variable after division. The horizontal axis is the assigned record number and the vertical axis is the objective variable (temperatures T1 to T4). In each graph, the groups divided in the first layer are indicated by thick dotted ellipses.

Thus, in regression tree analysis, dividing a set in two automatically extracts the explanatory variable that most affects the value of the objective variable, together with its range. Here, the only explanatory variable is the assigned record number E obtained by sorting in order of the objective variable's magnitude, so the analysis shows at which assigned record number E the records separate most clearly when divided into a large-value group and a small-value group. That is, the groupings shown in FIGS. 9-1 to 9-4 are the set divisions that produce the largest significant difference in each of temperatures T1 to T4.

In FIG. 5, the identification unit 505 has a function of identifying the sequence numbers corresponding to the objective variables that make up each divided group. Specifically, in each record shown in FIGS. 6-1 to 6-4, it reads the original sequence number e corresponding to temperatures T1 to T4. Identifying the original sequence numbers e in this way provides the graphs shown in FIGS. 2-1 to 2-4 with information characterizing the divided groups.

FIGS. 10-1 to 10-4 are explanatory diagrams showing graphs for each type of objective variable with the divided groups assigned. In FIGS. 10-1 to 10-4, the groups divided in the first layer are indicated by thick dotted ellipses. These ellipses are the information characterizing the divided groups.

Taking FIG. 10-1 as an example, the temperatures t1 of the original record numbers e = 1 to 5 and 16 to 25, which correspond to assigned record numbers E = 1 to 15, constitute divided group G10-1. Likewise, the temperatures t1 of the original record numbers e = 6 to 15, which correspond to assigned record numbers E = 16 to 25, constitute divided group G10-2.
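Mapping a boundary in assigned record numbers E back to the original record numbers e, as the identification unit does, could be sketched like this. The function name and the argument layout are hypothetical, and ascending sort order is assumed.

```python
from typing import List, Tuple


def original_numbers_by_group(values: List[float],
                              boundary_E: int) -> Tuple[List[int], List[int]]:
    """Return the original record numbers e in each divided group.

    values[k] is the objective variable of original record e = k + 1.
    The lower group holds assigned numbers E = 1..boundary_E, the
    upper group holds E = boundary_E+1..n.
    """
    # Original record numbers listed in ascending order of their value;
    # position k in this list corresponds to assigned number E = k + 1.
    order = sorted(range(1, len(values) + 1), key=lambda e: values[e - 1])
    lower = sorted(order[:boundary_E])
    upper = sorted(order[boundary_E:])
    return lower, upper
```

For the temperature t1 example above, a boundary of E = 15 would be expected to return e = 1 to 5 and 16 to 25 as the lower group and e = 6 to 15 as the upper group, given the data of FIG. 1.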

There is some difference between these two divided groups G10-1 and G10-2 that may affect the values of other variables, so that analyzing both groups together may have hidden a significant correlation that actually exists. In this case, the selection unit and the correlation information calculation unit described below detect significant correlations for each of the record groups of divided groups G10-1 and G10-2.

In FIG. 5, the selection unit 506 has a function of selecting two types of analysis target data from among multiple types of analysis target data that share a common explanatory variable. Specifically, it selects, for example, the analysis target data whose objective variable is temperature t1 and the analysis target data whose objective variable is temperature t2. The combinations are selected exhaustively over all pairs of objective variable types.

The correlation information calculation unit 507 has a function of calculating information representing the correlation between the objective variables of the two types of analysis target data selected by the selection unit 506, restricted to the sequence numbers identified by the identification unit 505 and the objective variables corresponding to those sequence numbers. Specifically, for example, it extracts from the two selected types of analysis target data the objective variables at the original record numbers corresponding to a divided group, and calculates correlation information such as the linear regression equation, the correlation coefficient R, and the contribution ratio R².
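The pairwise selection and the per-group correlation calculation can be sketched together. This is an illustrative sketch using ordinary least squares and the Pearson correlation coefficient; the function names and the dictionary layout of `data` are assumptions, and each variable is assumed to be non-constant within the group.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def linear_fit(xs: List[float],
               ys: List[float]) -> Tuple[float, float, float, float]:
    """Return (slope, intercept, R, R squared) for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # assumes xs is not constant
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)    # assumes ys is not constant
    return slope, intercept, r, r * r


def group_correlations(data: Dict[str, List[float]],
                       record_numbers: List[int]):
    """Correlation information for every pair of objective variables,
    restricted to the original record numbers e of one divided group.

    data maps a variable name to its values, indexed by e - 1.
    """
    results = {}
    for a, b in combinations(sorted(data), 2):  # exhaustive pairing
        xs = [data[a][e - 1] for e in record_numbers]
        ys = [data[b][e - 1] for e in record_numbers]
        results[(a, b)] = linear_fit(xs, ys)
    return results
```

With four objective variables, `combinations` yields the six pairs that correspond to the six correlation diagrams drawn for each divided group.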

FIGS. 11-1 to 11-6 are explanatory diagrams showing correlation diagrams for record numbers e = 6 to 15: FIG. 11-1 for temperatures t1 and t2, FIG. 11-2 for t1 and t3, FIG. 11-3 for t1 and t4, FIG. 11-4 for t2 and t3, FIG. 11-5 for t2 and t4, and FIG. 11-6 for t3 and t4. These correlation diagrams are specific to divided group G10-2, which corresponds to record numbers e = 6 to 15.

FIGS. 12-1 to 12-6 are explanatory diagrams showing correlation diagrams for record numbers e = 1 to 5 and 16 to 25: FIG. 12-1 for temperatures t1 and t2, FIG. 12-2 for t1 and t3, FIG. 12-3 for t1 and t4, FIG. 12-4 for t2 and t3, FIG. 12-5 for t2 and t4, and FIG. 12-6 for t3 and t4. These correlation diagrams are specific to divided group G10-1, which corresponds to record numbers e = 1 to 5 and 16 to 25.

Among the correlation diagrams in FIGS. 11-1 to 11-6 and FIGS. 12-1 to 12-6, the one in FIG. 11-4 has the linear regression equation y = x and a contribution ratio R² = 1, which shows that temperatures t2 and t3 are significantly correlated for record numbers e = 6 to 15. In this way, correlations may be calculated for all combinations of temperatures and the significant ones extracted.

In FIG. 5, the unity degree calculation unit 508 has a function of calculating, for each of the multiple types of analysis target data sharing a common explanatory variable, a unity degree representing how clearly the set was divided into the two divided groups. Specifically, for example, it calculates the sum of squares S0 for the set of objective variables being divided, the sum of squares S1 for the set of objective variables making up one divided group, and the sum of squares S2 for the set making up the other divided group. The unity degree U is calculated by equation (2).

U = [{S0 − (S1 + S2)} / (S1 + S2)] × 100 ... (2)

The range of the unity degree U is 0 ≤ U ≤ 100, and a larger value indicates a clearer set division. Among the first-layer division groups shown in FIGS. 8-1 to 8-4, the unity degree takes its maximum value (U = 86.7813) for the division into the division groups G30-1 and G30-2 shown in FIG. 8-3.
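As a minimal sketch of equation (2), the following Python computes U for one two-way division. It assumes that "sum of squares" means the sum of squared deviations from the group mean; the function names `sum_of_squares` and `unity_degree` are illustrative and do not come from the source. Evaluated literally, equation (2) divides by S1 + S2 and can therefore exceed 100 for well-separated groups, so the stated range 0 ≤ U ≤ 100 may depend on a definition given elsewhere in the source; the sketch simply follows the equation as written.

```python
from statistics import fmean


def sum_of_squares(values):
    """Sum of squared deviations from the mean (assumed meaning of S0, S1, S2)."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def unity_degree(group1, group2):
    """Unity degree U of equation (2) for a division into group1 and group2."""
    s0 = sum_of_squares(list(group1) + list(group2))  # division source
    s1 = sum_of_squares(group1)                       # one division group
    s2 = sum_of_squares(group2)                       # the other division group
    within = s1 + s2
    if within == 0:
        # Equation (2) is undefined when both groups have zero spread.
        raise ValueError("S1 + S2 is zero; U is undefined")
    return (s0 - within) / within * 100
```

For example, `unity_degree([1, 3], [2, 4])` gives S0 = 5 and S1 + S2 = 4, so U = 25.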

In FIG. 5, the determination unit 509 has a function of determining an output target from among the plurality of types of analysis target data, based on the unity degrees calculated by the unity degree calculation unit 508. Specifically, for the divisions at a particular layer of the regression trees, it determines which type of analysis target data has the clearest set division. For example, the division groups produced by the set division with the largest calculated unity degree are taken as the output target. More specifically, a flag identifying the output target is set for the division groups produced by the set division whose unity degree is the largest.

Taking the regression trees shown in FIGS. 7-1 to 7-4 as an example, dividing all records into two at the first layer by the value of temperature t3 (FIG. 7-3) separates large and small values most clearly, so it is highly likely that a clear cause for the separation also exists. In other words, because clearly different conditions are present, analyzing all of the data together amounts to analyzing a noisy state. Dividing the data before analysis therefore makes hidden correlations easier to extract.

The correlation information calculation unit 507 may also calculate correlation information only for the division groups determined as the output target by the determination unit 509. This removes the need to calculate correlation information for anything other than the output target, which makes the calculation more efficient. Because only the necessary information is obtained, the data analysis itself also becomes more efficient.

FIGS. 13-1 to 13-6 are explanatory diagrams showing correlation diagrams for record numbers e = 1 to 5 and 11 to 25. FIG. 13-1 shows temperatures t1 and t2, FIG. 13-2 shows temperatures t1 and t3, FIG. 13-3 shows temperatures t1 and t4, FIG. 13-4 shows temperatures t2 and t3, FIG. 13-5 shows temperatures t2 and t4, and FIG. 13-6 shows temperatures t3 and t4. These correlation diagrams are specific to the division group G30-1, which corresponds to record numbers e = 1 to 5 and 11 to 25.

FIGS. 14-1 to 14-6 are explanatory diagrams showing correlation diagrams for record numbers e = 6 to 10. FIG. 14-1 shows temperatures t1 and t2, FIG. 14-2 shows temperatures t1 and t3, FIG. 14-3 shows temperatures t1 and t4, FIG. 14-4 shows temperatures t2 and t3, FIG. 14-5 shows temperatures t2 and t4, and FIG. 14-6 shows temperatures t3 and t4. These correlation diagrams are specific to the division group G30-2, which corresponds to record numbers e = 6 to 10.

In FIG. 5, the output unit 510 has a function of outputting various data, and outputs to a display, a printer, or an I/F. The output data includes, for example: the analysis target data shown in FIG. 1; the pre-analysis graphs shown in FIGS. 2-1 to 2-4; the pre-analysis correlation diagrams shown in FIGS. 3-1 to 3-6; the analysis target data after sorting and number assignment shown in FIGS. 6-1 to 6-4; the temperature characteristic graphs by assigned record number shown in FIGS. 7-1 to 7-4; the regression trees and unity degrees shown in FIGS. 8-1 to 8-4; the post-division temperature characteristic graphs by assigned record number shown in FIGS. 9-1 to 9-4; the temperature characteristic graphs by record number with the division groups characterized, shown in FIGS. 10-1 to 10-4; and the correlation diagrams shown in FIGS. 11-1 to 11-6, 12-1 to 12-6, 13-1 to 13-6, and 14-1 to 14-6.

(Data analysis processing procedure)
FIG. 15 is a flowchart showing the data analysis processing procedure performed by the data analysis apparatus 500 according to this embodiment. First, the acquisition unit 501 executes acquisition processing for the analysis target data (step S1501), and analysis processing is executed (step S1502). Next, the determination unit 509 executes determination processing (step S1503), and correlation analysis processing is executed (step S1504). Finally, the output unit 510 executes output processing (step S1505), and the series of processing ends.

FIG. 16 is a flowchart showing the detailed procedure of the analysis processing (step S1502) shown in FIG. 15. First, the sorting unit 502 sorts the records of the analysis target data in ascending (or descending) order of the objective variable (step S1601). Next, the allocation unit 503 assigns assigned record numbers to the sorted records in sort order (step S1602). The division unit 504 then executes division processing (step S1603), and the procedure moves to step S1503.
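Steps S1601 and S1602 can be sketched as follows. This is an illustrative reading only: the tuple layout `(e, value)` for input records and `(E, e, value)` for output, and the function name `sort_and_assign`, are assumptions made for the example rather than structures defined in the source.

```python
def sort_and_assign(records, descending=False):
    """Sort records by objective value and assign new record numbers.

    records: iterable of (e, value), where e is the original record number.
    Returns a list of (E, e, value), where E is the assigned record number
    (1, 2, ...) given in sort order. Keeping e alongside E lets later steps
    map a division group back to the original record numbers.
    """
    ordered = sorted(records, key=lambda r: r[1], reverse=descending)
    return [(E, e, value) for E, (e, value) in enumerate(ordered, start=1)]
```

After this step the assigned record number E is the only explanatory variable, and the objective variable is monotonic in E.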

FIG. 17 is a flowchart showing the detailed procedure of the division processing (step S1603) shown in FIG. 16. First, it is determined whether any unprocessed analysis target data remains (step S1701). For example, it is determined whether the analysis target data shown in FIG. 1 contains an unprocessed objective variable group. If unprocessed analysis target data remains (step S1701: Yes), that data is acquired (step S1702).

Next, it is determined whether there is a divisible, unprocessed objective variable group (step S1703). If there is (step S1703: Yes), one divisible, unprocessed objective variable group is selected (step S1704). For example, for the first division (layer 0), the data group of temperature T1 with assigned record numbers E = 1 to 25 is selected. The sum of squares S0 of the selected objective variable group is then calculated (step S1705). Let i be the variable for the assigned record number E, and set i = m (step S1706), where m is the first record number of the selected objective variable group. For the data group of temperature T1 with assigned record numbers E = 1 to 25, m = 1.

Then, the sum of squares S1 of the first division group, consisting of assigned record numbers E = m to i, is calculated (step S1707). The sum of squares S2 of the second division group, consisting of assigned record numbers E = i + 1 to n, is also calculated (step S1708). Here, n is the last record number of the selected objective variable group. For example, for the data group of temperature T1 with assigned record numbers E = 1 to 25, n = 25. After this, ΔS is calculated and held in the storage device (step S1709), and the procedure returns to step S1703.

If step S1703 finds no divisible, unprocessed record group (step S1703: No), the division groups for which the ΔS calculated so far is largest are identified (step S1710). The data group of temperature T1 with assigned record numbers E = 1 to 25 is divided into group G10-1 and group G10-2, as shown in FIGS. 8-1 and 9-1. The pre-sort record numbers corresponding to the objective variables that make up the identified division groups are then identified (step S1711), and the procedure returns to step S1701.
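The search in steps S1705 to S1711 can be sketched in Python as below. Two assumptions are made that this passage does not state: ΔS is taken to be S0 − (S1 + S2), the usual reduction in sum of squares used for a regression tree split, and every boundary i from m to n − 1 is tried. The names `best_split` and `split_records` are illustrative.

```python
from statistics import fmean


def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(sorted_values):
    """Return (i, delta_s) for the boundary that maximizes delta_s.

    The first group holds assigned record numbers 1..i and the second
    holds i+1..n. delta_s = S0 - (S1 + S2) is an assumed definition.
    """
    n = len(sorted_values)
    if n < 2:
        raise ValueError("need at least two records to divide")
    s0 = sum_of_squares(sorted_values)
    best_i, best_delta = 1, float("-inf")
    for i in range(1, n):
        s1 = sum_of_squares(sorted_values[:i])
        s2 = sum_of_squares(sorted_values[i:])
        delta = s0 - (s1 + s2)
        if delta > best_delta:
            best_i, best_delta = i, delta
    return best_i, best_delta


def split_records(records):
    """Divide (e, value) records in two and return the original record
    numbers e of each division group (the mapping back of step S1711)."""
    ordered = sorted(records, key=lambda r: r[1])
    i, _ = best_split([value for _, value in ordered])
    low = sorted(e for e, _ in ordered[:i])
    high = sorted(e for e, _ in ordered[i:])
    return low, high
```

Recomputing S1 and S2 from scratch for each boundary keeps the sketch short; running sums would avoid the quadratic cost on large record sets.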

FIG. 18 is a flowchart showing the detailed procedure of the determination processing (step S1503) shown in FIG. 15. First, the procedure waits for a regression tree layer to be designated (step S1801: No). When a layer has been designated (step S1801: Yes), it is determined whether any unprocessed analysis target data remains (step S1802).

If unprocessed analysis target data remains (step S1802: Yes), the division groups of the designated layer are selected (step S1803). The unity degree is then calculated (step S1804), and the procedure returns to step S1802. If, on the other hand, step S1802 finds no unprocessed analysis target data (step S1802: No), the division groups to be output are determined from the unity degrees (step S1805). The procedure then moves to step S1504.
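Step S1805 can be sketched as a selection of the largest unity degree followed by flag setting. The dictionary layout and the name `choose_output_target` are assumptions for illustration, and apart from the value 86.7813 reported for temperature t3, the unity degrees in the usage note are hypothetical.

```python
def choose_output_target(unity_by_type):
    """Flag the type of analysis target data whose division is clearest.

    unity_by_type: dict mapping a data type name to its unity degree U
    at the designated layer. Returns a dict mapping each name to True
    (output target) or False.
    """
    if not unity_by_type:
        raise ValueError("no unity degrees to compare")
    best = max(unity_by_type, key=unity_by_type.get)
    return {name: name == best for name in unity_by_type}
```

With `{"t1": 40.0, "t2": 55.5, "t3": 86.7813, "t4": 12.3}` as input, only `"t3"` is flagged, so correlation analysis would then be limited to the division groups of the t3 split.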

FIG. 19 is a flowchart showing the detailed procedure of the correlation analysis processing (step S1504) shown in FIG. 15. First, it is determined whether any of the division groups determined as the output target is still unprocessed (step S1901). If so (step S1901: Yes), an unprocessed division group is selected (step S1902). Next, the pre-sort record numbers of the selected division group are identified (step S1903).

Then, it is determined whether any combination of objective variables remains unprocessed for the identified record numbers (step S1904). If so (step S1904: Yes), the linear regression equation, the correlation coefficient, and the contribution rate are calculated (step S1905), and the procedure returns to step S1904. If step S1904 finds no unprocessed combination (step S1904: No), the procedure returns to step S1901. If step S1901 finds no unprocessed division group (step S1901: No), the procedure moves to the output processing (step S1505).
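The inner loop of steps S1904 and S1905 can be sketched as below, assuming an ordinary least-squares fit and taking the contribution rate R² as the square of the Pearson correlation coefficient. The nested-dictionary data layout and the names `linear_fit` and `correlate_group` are illustrative, not taken from the source.

```python
from itertools import combinations
from statistics import fmean


def linear_fit(xs, ys):
    """Return (slope, intercept, r, r2) for the line y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r, r * r


def correlate_group(data, record_numbers):
    """Fit every pair of objective variables over one division group.

    data: dict mapping a variable name to {record number e: value}.
    record_numbers: pre-sort record numbers e of the division group.
    Returns {(name_a, name_b): (slope, intercept, r, r2)}.
    """
    results = {}
    for a, b in combinations(sorted(data), 2):
        xs = [data[a][e] for e in record_numbers]
        ys = [data[b][e] for e in record_numbers]
        results[(a, b)] = linear_fit(xs, ys)
    return results
```

Two variables with identical values over the group yield slope 1, intercept 0, and R² = 1, the kind of result reported above for temperatures t2 and t3.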

As described above, according to this embodiment, each numerical variable serving as an objective variable is sorted in ascending order of its value, and each record in the sort result is given a record number reflecting the new order (the assigned record number). Regression tree analysis is then executed with the assigned record number as the only explanatory variable. Because regression tree analysis is a process that extracts the influence of explanatory variables on the magnitude of the objective variable, this changes the combination of explanatory variable and objective variable that makes up each record.

As a result, rather than extracting intervals in which the objective variable is large or small, the data analysis of this embodiment can find where the boundary is clearest when the objective variable values are divided into a high group and a low group. It is therefore possible to check whether a remarkable correlation exists.

One effective technique for extracting useful information from data is to visualize as much information as possible in graphs and the like. In particular, to extract information on the correlation between data, the variables concerned are plotted on the vertical and horizontal axes and their relationship is examined. From this two-dimensional distribution, information is obtained on the value of the correlation coefficient and on whether the data distribution is biased. For this purpose there is no need to deliberately break the data apart; the data can be used as is, and no new record names need to be created.

On the other hand, if the records are sorted by the value of the objective variable and the horizontal axis shows the newly created variable, the graph carries information on only one variable; relationships with other variables are ignored, and the amount of information drops drastically. In this embodiment, correlations existing among multiple variables can be extracted on the basis of such single-variable graphs.

When causal relationships are analyzed with regression tree analysis, the explanatory variables (the causes) and the objective variable (the result) are clearly distinguished. If this distinction is mistaken, the analysis no longer examines a causal relationship and becomes meaningless. Analyzing (dividing) an objective variable by means of the objective variable itself is therefore considered meaningless for the purpose of causal analysis. This embodiment deliberately performs that supposedly meaningless analysis in order to extract strongly correlated variables for which no distinction between explanatory variable (cause) and objective variable (result) is needed.

That is, the aim is to extract how the objective variable itself should be divided so that its distribution separates clearly. When the objective variable group that has been sorted and given assigned record numbers is subjected to regression tree analysis, the objective variable values are arranged in order of magnitude along the horizontal axis (the assigned record number). Consequently, the analysis extracts where the explanatory variable should be split in two so that large and small objective variable values are separated most clearly, that is, so that objective variable values close to each other are grouped together.

To find correlations that occur only in part of the data, the analysis target data must be divided in various patterns, but the number of possible division patterns is very large and covering all of them is difficult. Moreover, even if every division pattern were generated and a strong correlation extracted, it would be unclear what to improve unless the meaning of that division pattern were understood.

For this reason, conventional methods obtain a meaningful division by, for example, ordering the records by the value of some variable and cutting them at arbitrary intervals. Regression tree analysis can also be used to create the division pattern by combining conventional techniques, but when regression tree analysis is used as is, the division patterns that can be created are limited to two-way divisions of an explanatory variable that clearly separate the objective variable.

In contrast, this embodiment creates the division pattern from the objective variable itself, which makes it possible to create division patterns that carry the meaning "the objective variable values are close to each other", something that was conventionally difficult. It is therefore possible to detect correlations within an interpretable division, which were conventionally difficult to find.

Furthermore, analyzing a division pattern of an explanatory variable, created from the explanatory variable itself, using the values of a new explanatory variable enables a clearer analysis in which features are emphasized. Using data mining techniques such as regression tree analysis to create division patterns from the variable itself also allows the division patterns to be created efficiently. Because the purpose differs in this way, ways of using and combining analysis methods that were conventionally considered meaningless become effective.

As described above, this embodiment has the effect that remarkable features, statistically significant differences, and correlations between data can be extracted efficiently.

The data analysis method described in this embodiment can be implemented by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program may also be distributed via a network such as the Internet.

An explanatory diagram showing an example of analysis target data.
An explanatory diagram (part 1) showing graphs by type of objective variable.
An explanatory diagram (part 2) showing graphs by type of objective variable.
An explanatory diagram (part 3) showing graphs by type of objective variable.
An explanatory diagram (part 4) showing graphs by type of objective variable.
An explanatory diagram (part 1) showing a correlation diagram for each combination of objective variables over all record numbers.
An explanatory diagram (part 2) showing a correlation diagram for each combination of objective variables over all record numbers.
An explanatory diagram (part 3) showing a correlation diagram for each combination of objective variables over all record numbers.
An explanatory diagram (part 4) showing a correlation diagram for each combination of objective variables over all record numbers.
An explanatory diagram (part 5) showing a correlation diagram for each combination of objective variables over all record numbers.
An explanatory diagram (part 6) showing a correlation diagram for each combination of objective variables over all record numbers.
A block diagram showing the hardware configuration of the data analysis apparatus according to this embodiment.
A block diagram showing the functional configuration of the data analysis apparatus.
An explanatory diagram (part 1) showing sort results by type of objective variable.
An explanatory diagram (part 2) showing sort results by type of objective variable.
An explanatory diagram (part 3) showing sort results by type of objective variable.
An explanatory diagram (part 4) showing sort results by type of objective variable.
An explanatory diagram (part 1) showing graphs by type of objective variable after sorting.
An explanatory diagram (part 2) showing graphs by type of objective variable after sorting.
An explanatory diagram (part 3) showing graphs by type of objective variable after sorting.
An explanatory diagram (part 4) showing graphs by type of objective variable after sorting.
An explanatory diagram (part 1) showing the regression tree generated for each objective variable.
An explanatory diagram (part 2) showing the regression tree generated for each objective variable.
An explanatory diagram (part 3) showing the regression tree generated for each objective variable.
An explanatory diagram (part 4) showing the regression tree generated for each objective variable.
An explanatory diagram (part 1) showing graphs by type of objective variable after division.
An explanatory diagram (part 2) showing graphs by type of objective variable after division.
An explanatory diagram (part 3) showing graphs by type of objective variable after division.
An explanatory diagram (part 4) showing graphs by type of objective variable after division.
An explanatory diagram (part 1) showing graphs by type of objective variable with division groups assigned.
An explanatory diagram (part 2) showing graphs by type of objective variable with division groups assigned.
An explanatory diagram (part 3) showing graphs by type of objective variable with division groups assigned.
An explanatory diagram (part 4) showing graphs by type of objective variable with division groups assigned.
An explanatory diagram showing the correlation diagram of temperatures t1 and t2 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t1 and t3 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t1 and t4 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t2 and t3 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t2 and t4 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t3 and t4 for record numbers e = 6 to 15.
An explanatory diagram showing the correlation diagram of temperatures t1 and t2 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t3 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t4 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t2 and t3 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t2 and t4 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t3 and t4 for record numbers e = 1 to 5 and 16 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t2 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t3 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t4 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t2 and t3 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t2 and t4 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t3 and t4 for record numbers e = 1 to 5 and 11 to 25.
An explanatory diagram showing the correlation diagram of temperatures t1 and t2 for record numbers e = 6 to 10.
An explanatory diagram showing the correlation diagram of temperatures t1 and t3 for record numbers e = 6 to 10.
An explanatory diagram showing the correlation diagram of temperatures t1 and t4 for record numbers e = 6 to 10.
An explanatory diagram showing the correlation diagram of temperatures t2 and t3 for record numbers e = 6 to 10.
An explanatory diagram showing the correlation diagram of temperatures t2 and t4 for record numbers e = 6 to 10.
An explanatory diagram showing the correlation diagram of temperatures t3 and t4 for record numbers e = 6 to 10.
A flowchart showing the data analysis processing procedure performed by the data analysis apparatus 500 according to this embodiment.
A flowchart showing the detailed procedure of the analysis processing (step S1502) shown in FIG. 15.
A flowchart showing the detailed procedure of the division processing (step S1603) shown in FIG. 16.
A flowchart showing the detailed procedure of the determination processing (step S1503) shown in FIG. 15.
A flowchart showing the detailed procedure of the correlation analysis processing (step S1504) shown in FIG. 15.

Explanation of symbols

500 Data analysis apparatus
501 Acquisition unit
502 Sorting unit
503 Allocation unit
504 Division unit
505 Specifying unit
506 Selection unit
507 Correlation information calculation unit
508 Unity degree calculation unit
509 Determination unit
510 Output unit

Claims

A data analysis program that causes a computer to function as:
acquisition means for acquiring analysis target data whose records each have a sequence number serving as an explanatory variable and an objective variable corresponding to that sequence number;
sorting means for sorting the records of the analysis target data acquired by the acquisition means in ascending or descending order of the objective variable;
assigning means for assigning new sequence numbers to the records sorted by the sorting means;
dividing means for dividing the sorted records into two divided groups, using as the boundary a specific number selected, based on regression tree analysis, from among the sequence numbers assigned by the assigning means (hereinafter referred to as "assigned sequence numbers"); and
output means for outputting the two divided groups produced by the dividing means.
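
The sort, assign, and divide elements recited above can be illustrated with a minimal Python sketch. It assumes that the regression tree analysis reduces to a single two-way split chosen to minimize the within-group sum of squared deviations, which is the usual regression tree criterion but is not confirmed by this claim. The function names `sum_sq` and `split_sorted_records` and the `(sequence_number, value)` record layout are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_sorted_records(records):
    """Split records into two groups after sorting by the objective variable.

    records: list of (sequence_number, objective_value) pairs (assumed layout).
    Returns (boundary, low_group, high_group). The boundary is the last
    assigned sequence number of the low group; each group is a list of
    (assigned_number, original_number, objective_value) triples.
    """
    if len(records) < 2:
        raise ValueError("at least two records are needed for a split")
    # Sorting step: ascending order of the objective variable.
    ordered = sorted(records, key=lambda r: r[1])
    # Assigning step: new sequence numbers 1..n for the sorted records.
    assigned = [(i + 1, num, val) for i, (num, val) in enumerate(ordered)]
    values = [val for _, _, val in assigned]
    # Dividing step (assumed criterion): try every boundary and keep the one
    # with the smallest total within-group sum of squares.
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(values)):
        cost = sum_sq(values[:k]) + sum_sq(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, assigned[:best_k], assigned[best_k:]
```

Because the records are sorted before the split, the assigned sequence number is the only explanatory variable, so the search reduces to a scan over the n - 1 possible boundaries.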

The data analysis program according to claim 1, wherein
the computer is further caused to function as specifying means for specifying the sequence number corresponding to the objective variable constituting each of the divided groups, and
the output means outputs information that characterizes the analysis target data, the sequence number specified by the specifying means, and the objective variable corresponding to that sequence number.

The data analysis program according to claim 2, wherein
the computer is further caused to function as:
selection means for selecting two types of analysis target data from among a plurality of types of analysis target data having the same explanatory variable; and
correlation information calculation means for calculating information indicating the correlation between those objective variables, among the objective variables of the two types of analysis target data selected by the selection means, that correspond to the sequence number specified by the specifying means, and
the output means outputs the calculation result calculated by the correlation information calculation means.

The computer is further caused to function as:
unity degree calculation means for calculating, for each of a plurality of types of analysis target data having the same explanatory variable, a unity degree expressing the clarity of the set partitioning into the two divided groups, based on the sum of squares of the objective variables and the sums of squares of the objective variables constituting each of the two divided groups; and
determination means for determining an output target from among the plurality of types of analysis target data, based on the unity degrees calculated by the unity degree calculation means,
The output means includes
3. The analysis target data determined as the output target by the determining unit, and information characterizing the sequence number specified by the specifying unit and an objective variable corresponding to the sequence number are output. The data analysis program described.
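
The unity degree is described here only as being based on the overall sum of squares and the sums of squares of the two divided groups. One plausible reading, assumed purely for illustration, is the fraction of the overall sum of squares that the split removes, 1 - (S1 + S2) / S0. The sketch below uses that assumed formula and picks the data type with the highest value. The names `unity_degree` and `choose_output_target` are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def unity_degree(low_values, high_values):
    """Assumed unity degree: share of the total sum of squares explained
    by dividing the records into the two groups (0.0 if all values equal)."""
    total = sum_sq(list(low_values) + list(high_values))
    if total == 0:
        return 0.0
    within = sum_sq(low_values) + sum_sq(high_values)
    return 1.0 - within / total


def choose_output_target(groups_by_type):
    """groups_by_type: dict mapping a data type name to its
    (low_values, high_values) pair. Returns the name whose split has the
    highest assumed unity degree."""
    return max(groups_by_type, key=lambda name: unity_degree(*groups_by_type[name]))
```

Under this assumption a value close to 1 means the two groups are tight and well separated, while a value close to 0 means the split barely differs from leaving the records undivided.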

The data analysis program according to claim 4, wherein
the computer is further caused to function as:
selection means for selecting two types of analysis target data from among the plurality of types of analysis target data; and
correlation information calculation means for calculating information indicating the correlation between those objective variables, among the objective variables of the two types of analysis target data selected by the selection means, that correspond to the sequence number specified by the specifying means, and
the output means outputs the calculation result calculated by the correlation information calculation means.

A data analysis apparatus comprising:
acquisition means for acquiring analysis target data whose records each have a sequence number serving as an explanatory variable and an objective variable corresponding to that sequence number;
sorting means for sorting the records of the analysis target data acquired by the acquisition means in ascending or descending order of the objective variable;
assigning means for assigning new sequence numbers to the records sorted by the sorting means;
dividing means for dividing the sorted records into two divided groups, using as the boundary a specific number selected, based on regression tree analysis, from among the sequence numbers assigned by the assigning means (hereinafter referred to as "assigned sequence numbers"); and
output means for outputting the two divided groups produced by the dividing means.

A data analysis method in which a computer performs:
an acquisition step of acquiring analysis target data whose records each have a sequence number serving as an explanatory variable and an objective variable corresponding to that sequence number;
a sorting step of sorting the records of the analysis target data acquired in the acquisition step in ascending or descending order of the objective variable;
an assigning step of assigning new sequence numbers to the records sorted in the sorting step;
a dividing step of dividing the sorted records into two divided groups, using as the boundary a specific number selected, based on regression tree analysis, from among the sequence numbers assigned in the assigning step (hereinafter referred to as "assigned sequence numbers"); and
an output step of outputting the two divided groups produced in the dividing step.