JP4180540B2 - Data mining system

Info

Publication number: JP4180540B2
Application number: JP2004093971A
Authority: JP
Inventors: 幹夫吉田
Original assignee: Mitsubishi Electric Information Systems Corp
Current assignee: Mitsubishi Electric Information Systems Corp
Priority date: 2004-03-29
Filing date: 2004-03-29
Publication date: 2008-11-12
Anticipated expiration: 2024-03-29
Also published as: JP2005284424A

Description

The present invention relates to a data mining system that supports data mining of time-series data. More specifically, it relates to a system that manages the transitions between the data extraction, preprocessing, and data conversion phases, together with the parameters used in each phase, so that processing can be freely resumed from any past phase.

FIG. 15 is a diagram showing the procedure of a data mining operation. Time-series data is collected (S1501), the data group to be processed is extracted from it (S1502), the extracted data is converted into clean data by preprocessing (S1503), the data is reshaped by data conversion (S1504), and data mining is performed (S1505).

However, obtaining an ideal result from data mining requires redoing individual phases by trial and error.

Current data mining tools cannot return to a previously executed phase and resume processing from that state.

Japanese Patent Laid-Open No. 11-003360 concerns stopping during the data mining process itself; it does not concern control of the processing that precedes data mining.
Patent documents: JP-A-11-003360; JP 2000-242651 A

The present invention has been made to eliminate the above drawbacks of the prior art. Its object is to provide a data mining system that supports data mining of time-series data, manages the transitions between the data extraction, preprocessing, and data conversion phases together with the parameters used, and allows processing to be freely resumed from any past phase.

The data mining system according to the present invention is characterized by having the following elements:
(1) a time-series data storage unit that stores time-series data groups;
(2) a data selection processing unit that extracts time-series data from the time-series data storage unit;
(3) a preprocessing unit that preprocesses the extracted time-series data;
(4) a data conversion processing unit that converts the extracted time-series data or the preprocessed time-series data;
(5) a data mining processing unit that performs data mining on the extracted, preprocessed, or converted time-series data;
(6) a processing data tree storage unit that stores the transitions between the data selection, preprocessing, data conversion, and data mining phases, associated in a tree structure, and also stores the parameters used in each phase in association with that phase;
(7) a data mining control unit that controls the transitions between the data selection, preprocessing, data conversion, and data mining phases in accordance with instructions from the operator;
(8) a restart processing unit that prompts the operator to designate the phase to resume, reads the parameters used in the designated phase, sets the read parameters, and instructs the data mining control unit to start the designated phase.

The data mining system further includes a checkpoint processing unit that, when instructed by the operator, stores the transition and the parameters of the process immediately preceding the instruction, in association with each other, in the processing data tree storage unit.

The data selection processing unit, the preprocessing unit, the data conversion processing unit, and the data mining processing unit each store the transition and the parameters of their own processing, in association with each other, in the processing data tree storage unit.

The program according to the present invention causes a computer serving as a data mining system having a time-series data storage unit that stores time-series data groups to execute the following procedures:
(1) a procedure for extracting time-series data from the time-series data storage unit in accordance with an instruction from the operator;
(2) a procedure for preprocessing the extracted time-series data in accordance with an instruction from the operator;
(3) a procedure for converting the extracted or preprocessed time-series data in accordance with an instruction from the operator;
(4) a procedure for data mining the extracted, preprocessed, or converted time-series data in accordance with an instruction from the operator;
(5) a procedure for, when instructed by the operator, storing for the process immediately preceding the instruction the transitions between the data selection, preprocessing, data conversion, and data mining phases, associated in a tree structure, and also storing the parameters used in that phase in association with it;
(6) a procedure for prompting the operator to designate the phase to resume, reading the parameters used in the designated phase, setting the read parameters, and instructing the start of processing of the designated phase.

The data mining support method according to the present invention is a method performed by a data mining system having a time-series data storage unit that stores time-series data groups, and is characterized by having the following elements:
(1) a step of extracting time-series data from the time-series data storage unit in accordance with an instruction from the operator;
(2) a step of preprocessing the extracted time-series data in accordance with an instruction from the operator;
(3) a step of converting the extracted or preprocessed time-series data in accordance with an instruction from the operator;
(4) a step of data mining the extracted, preprocessed, or converted time-series data in accordance with an instruction from the operator;
(5) a step of, when instructed by the operator, storing for the process immediately preceding the instruction the transitions between the data selection, preprocessing, data conversion, and data mining phases, associated in a tree structure, and also storing the parameters used in that phase in association with it;
(6) a step of prompting the operator to designate the phase to resume, reading the parameters used in the designated phase, setting the read parameters, and instructing the start of processing of the designated phase.

In the present invention, either on the operator's instruction or automatically, the phase transition of the immediately preceding process is stored in a tree structure, and the parameters used in that phase are stored in association with it. When a phase to resume is later designated, the parameters used in that phase are read and set, and the designated phase is started. Processing can therefore be freely resumed from any past phase.

Embodiment 1.
The present invention is described below based on the embodiment shown in the drawings. FIG. 1 is a diagram showing the configuration of the data mining system. The system has a data mining control unit 1, a data selection processing unit 2, a preprocessing unit 3, a data conversion processing unit 4, a data mining processing unit 5, a checkpoint processing unit 6, a restart processing unit 7, a processing data tree storage unit 8, an execution phase storage unit 9, a time-series data storage unit 10, an extracted data storage unit 11, a preprocessed data storage unit 12, and a data converted data storage unit 13.

FIG. 2 is a diagram showing the overall processing flow. The overall process is controlled by the data mining control unit 1. Data selection processing (S202) by the data selection processing unit 2, preprocessing (S204) by the preprocessing unit 3, data conversion processing (S206) by the data conversion processing unit 4, data mining processing (S208) by the data mining processing unit 5, checkpoint processing (S210) by the checkpoint processing unit 6, and restart processing (S212) by the restart processing unit 7 each display their own screen, as described later, acquire the events generated by operations on that screen, and perform the processing that corresponds to each event. In particular, when the acquired event is the pressing of a process selection button, the module returns the type of that button as its status. The data mining control unit 1 then branches to the next process according to that status.
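The status-driven branching described for FIG. 2 can be sketched as a small dispatch loop. This is a minimal illustration, not the patented implementation: the status strings, the `EXIT` status, and `run_control_loop` are hypothetical names, and each handler stands in for a module that would show its screen and return the type of the pressed process selection button.

```python
from typing import Callable, Dict, List

# Hypothetical status values; the source only says that each module returns
# the type of the pressed process selection button as its status.
DATA_SELECTION = "data_selection"
PREPROCESSING = "preprocessing"
DATA_MINING = "data_mining"
EXIT = "exit"  # assumed terminating status, not named in the source


def run_control_loop(handlers: Dict[str, Callable[[], str]], first: str) -> List[str]:
    """Run the module selected by the current status until EXIT is returned."""
    executed: List[str] = []
    status = first
    while status != EXIT:
        executed.append(status)
        # Each handler plays the role of one processing module:
        # it does its work and returns the next status to branch on.
        status = handlers[status]()
    return executed


# Stand-in modules that simply report which button was "pressed" next.
handlers = {
    DATA_SELECTION: lambda: PREPROCESSING,
    PREPROCESSING: lambda: DATA_MINING,
    DATA_MINING: lambda: EXIT,
}
order = run_control_loop(handlers, DATA_SELECTION)
```

Keeping the branching in one loop means the individual modules never call each other directly, which is what lets a restart jump to any phase.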

The processing data tree storage unit 8, the execution phase storage unit 9, the time-series data storage unit 10, the extracted data storage unit 11, the preprocessed data storage unit 12, and the data converted data storage unit 13 are used to pass data between the processing modules and to store state. The processing data tree in particular concerns the whole system, because it is the data that manages the flow and history of operations.

FIG. 3 is a diagram showing an example of a processing data tree.

Labels other than the user name (Dxxx and Sxxx) are checkpoint names. Dxxx denotes a checkpoint that the user explicitly requested through checkpoint processing, and Sxxx, drawn with a broken line, denotes a checkpoint that the system took automatically. Checkpoints created by the system are deleted automatically when the system terminates. In the figure, the preprocessing phase between D211 and D231 shows no checkpoint name because the system created a checkpoint automatically in an earlier run and that checkpoint has since been deleted. A process whose checkpoint name is not displayed therefore cannot be restarted. A new entry is added to the tree whenever checkpoint processing is executed. Restart processing is executed by left-clicking the box of the relevant data in the relevant phase of the tree. For each entry, detailed information (processing result data name, parameters at execution time, comments at execution time, and post-execution comments giving interpretation and evaluation) can be displayed, and the entry can be deleted. The tree also shows the current processing phase (drawn with a bold line in the figure).
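One way to picture the processing data tree is as a set of checkpoint entries that each point at the entry they were derived from. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and only the IDs D211 and D231 come from the figure description (S212 is invented here to stand for an automatic checkpoint).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    checkpoint_id: str                 # e.g. "D211" (user) or "S212" (system)
    phase: str                         # phase that produced this entry
    parameters: Dict[str, object]      # parameters used when the phase ran
    parent_id: Optional[str] = None    # entry this phase took its input from
    by_user: bool = False              # True for Dxxx, False for Sxxx


@dataclass
class ProcessingDataTree:
    user_name: str
    entries: Dict[str, Checkpoint] = field(default_factory=dict)

    def add(self, cp: Checkpoint) -> None:
        """Add an entry; its parent, if any, must already be in the tree."""
        if cp.parent_id is not None and cp.parent_id not in self.entries:
            raise KeyError(f"unknown parent checkpoint: {cp.parent_id}")
        self.entries[cp.checkpoint_id] = cp

    def lineage(self, checkpoint_id: str) -> List[str]:
        """Return the checkpoint IDs from the root down to the given entry."""
        path: List[str] = []
        current: Optional[str] = checkpoint_id
        while current is not None:
            path.append(current)
            current = self.entries[current].parent_id
        return path[::-1]


tree = ProcessingDataTree(user_name="user1")
tree.add(Checkpoint("D211", "data_selection", {"vehicle": "A"}, by_user=True))
tree.add(Checkpoint("S212", "preprocessing", {"interval_s": 1}, parent_id="D211"))
tree.add(Checkpoint("D231", "data_mining", {"tool": "clustering"},
                    parent_id="S212", by_user=True))
path = tree.lineage("D231")
```

Storing a parent ID rather than nested child lists keeps each entry flat, so a second analysis branching from D211 is just another entry with `parent_id="D211"`.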

Next, the operation of each processing module is described by following an assumed series of operations.

In this example, time-series data measured on vehicles is collected and analyzed. Several hundred kinds of data can be collected from a vehicle as time-series data; typical examples are vehicle speed, engine speed, gear position, inter-vehicle distance, and steering wheel angle.

These data are collected as time-series data at a predetermined sampling interval, and the sampling interval can be specified freely.

FIGS. 4 to 13 show a scenario in which various sensing data collected from vehicles as time-series data is analyzed. The data collected from the vehicles is assumed to be stored in advance in the time-series data storage unit 10.

In the initial state, the main menu screen shown in FIG. 4 is displayed. A group of process selection buttons is displayed in the bar at the top. When one of these buttons is pressed, the type of the button becomes the status, and the corresponding processing module is started according to the flow of FIG. 2.

Here, assume that the "Data selection" button is selected. Data selection processing (S202) by the data selection processing unit 2 first displays the data selection screen shown in FIG. 5. On this screen, the data for the current analysis is extracted from the collected data. For that purpose, the vehicle, the data collection period, the time of day, and so on are specified as data extraction parameters.

The specific processing is as follows. When the acquired event relates to a parameter, the parameter is stored. When "Execute" is pressed, the unit operates as follows.
(1) The processing data tree is read from the processing data tree storage unit 8, and a new checkpoint ID is assigned.
(2) Data is extracted from the time-series data storage unit 10 based on the stored parameters.
(3) The extracted data is saved in the extracted data storage unit 11.
(4) The file name includes the user name and the checkpoint ID for identification.
(5) Unless a checkpoint has already been taken, the system automatically creates one at this point. The assigned checkpoint ID is added to the processing data tree and recorded, together with the input parameters, in the processing data tree storage unit 8.
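Steps (1) to (5) can be sketched as a single function. This is only an illustration under stated assumptions: the in-memory `tree` and `store` dictionaries stand in for the processing data tree storage unit 8 and the extracted data storage unit 11, and the ID numbering scheme, the record fields, and the function name are all hypothetical.

```python
from typing import Dict, List


def run_data_selection(records: List[dict], params: dict, tree: Dict[str, dict],
                       store: Dict[str, List[dict]], user_name: str) -> str:
    # (1) Assign a new checkpoint ID (this numbering scheme is an assumption).
    checkpoint_id = f"S{len(tree) + 1:03d}"
    # (2) Extract the rows that match the stored parameters.
    extracted = [
        r for r in records
        if r["vehicle"] == params["vehicle"]
        and params["start"] <= r["t"] <= params["end"]
    ]
    # (3)(4) Save under a name carrying the user name and the checkpoint ID.
    file_name = f"{user_name}_{checkpoint_id}"
    store[file_name] = extracted
    # (5) Record an automatic checkpoint together with the input parameters.
    tree[checkpoint_id] = {
        "phase": "data_selection",
        "parent": None,
        "parameters": dict(params),
        "file": file_name,
        "automatic": True,
    }
    return checkpoint_id


records = [
    {"vehicle": "A", "t": 0, "speed": 40.0},
    {"vehicle": "A", "t": 10, "speed": 42.0},
    {"vehicle": "B", "t": 5, "speed": 55.0},
    {"vehicle": "A", "t": 99, "speed": 0.0},
]
tree: Dict[str, dict] = {}
store: Dict[str, List[dict]] = {}
cp = run_data_selection(records, {"vehicle": "A", "start": 0, "end": 20},
                        tree, store, "user1")
```

Because the parameters are copied into the tree entry, a later restart can reproduce the same extraction without relying on whatever is currently shown on the screen.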

Next, assume that the "Preprocessing" button is selected. Preprocessing (S204) by the preprocessing unit 3 displays the preprocessing screen shown in FIG. 6. This process performs data interpolation. For example, when the extracted data have different sampling intervals, interpolation is performed to align the intervals. Noise and abnormal values are also removed.

The specific processing is as follows. When the acquired event relates to a parameter, the parameter is stored. When "Execute" is pressed, the unit operates as follows.
(1) The processing data tree is read, and a new checkpoint ID is assigned that is linked to the checkpoint ID of the data selection executed immediately before. If there is no data selection checkpoint ID, an error is raised, because preprocessing cannot be performed when no data has been selected.
(2) Interpolation is executed, according to the stored parameters, on the data in the extracted data storage unit 11 that corresponds to the data selection checkpoint ID.
(3) The processing result is saved in the preprocessed data storage unit 12. The file name includes the user name and the checkpoint ID for identification.
(4) Unless a checkpoint has already been taken, the system automatically creates one at this point. The assigned checkpoint ID is added to the processing data tree and recorded, together with the input parameters, in the processing data tree storage unit 8.
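The preprocessing steps above can be sketched as follows. The source does not say which interpolation method is used, so linear interpolation onto a regular grid is an assumption here, as are the function names, the dictionary-based storage, and the use of integer timestamps that are strictly increasing.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp, value); integer timestamps assumed


def resample_linear(samples: List[Sample], interval: int) -> List[Sample]:
    """Linearly interpolate samples onto a grid with the given interval."""
    if len(samples) < 2:
        return list(samples)
    out: List[Sample] = []
    i = 0
    t = samples[0][0]
    last_t = samples[-1][0]
    while t <= last_t:
        # Advance to the segment [samples[i], samples[i + 1]] that contains t.
        while i < len(samples) - 2 and samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += interval
    return out


def run_preprocessing(tree: Dict[str, dict], store: Dict[str, List[Sample]],
                      selection_id: str, interval: int, user_name: str) -> str:
    # (1) Preprocessing needs a data selection checkpoint to attach to.
    entry = tree.get(selection_id)
    if entry is None or entry["phase"] != "data_selection":
        raise LookupError("no data selection checkpoint; nothing to preprocess")
    checkpoint_id = f"S{len(tree) + 1:03d}"  # hypothetical numbering scheme
    # (2) Interpolate the extracted data that belongs to that checkpoint.
    result = resample_linear(store[entry["file"]], interval)
    # (3) Save the result under a name with the user name and checkpoint ID.
    file_name = f"{user_name}_{checkpoint_id}"
    store[file_name] = result
    # (4) Record an automatic checkpoint linked to the data selection entry.
    tree[checkpoint_id] = {"phase": "preprocessing", "parent": selection_id,
                           "parameters": {"interval": interval},
                           "file": file_name, "automatic": True}
    return checkpoint_id


tree = {"S001": {"phase": "data_selection", "parent": None,
                 "parameters": {}, "file": "user1_S001", "automatic": True}}
store = {"user1_S001": [(0, 0.0), (2, 20.0), (4, 40.0)]}
new_id = run_preprocessing(tree, store, "S001", 1, "user1")
```

Noise and abnormal-value removal, which the text also mentions, is left out of this sketch.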

Next, assume that the "Checkpoint" button is selected. Checkpoint processing (S210) by the checkpoint processing unit 6 saves the result of the immediately preceding process. To do so, the assigned checkpoint ID is added to the processing data tree and recorded, together with the input parameters, in the processing data tree storage unit 8. While this is done, the checkpoint window shown in FIG. 7 is displayed.

Next, assume that the "Data mining" button is selected. Data mining processing (S208) by the data mining processing unit 5 displays the data mining screen shown in FIG. 8.

When the acquired event relates to a parameter (the selection of an analysis tool), the selection is stored. When "Execute" is pressed, the unit operates as follows.
(1) The processing data tree is read, and the process executed immediately before (data conversion, preprocessing, or data selection) is identified. A new checkpoint ID linked to the checkpoint ID of that process is then assigned. If none of data conversion, preprocessing, or data selection has been performed beforehand, an error is raised.
(2) The result of the process executed immediately before (a file in the extracted data storage unit 11, the preprocessed data storage unit 12, or the data converted data storage unit 13) is read, and the selected analysis tool is started.
(3) Unless a checkpoint has already been taken, the system automatically creates one at this point. The assigned checkpoint ID is added to the processing data tree and recorded, together with the input parameter (the analysis tool name), in the processing data tree storage unit 8.
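Step (1) amounts to finding which earlier phase supplies the input. The sketch below is one hedged reading of "the process executed immediately before": it walks an execution history backwards and takes the latest entry whose phase can feed the requested phase. The `UPSTREAM` table, the history format, and the function name are assumptions made for illustration.

```python
from typing import Dict, List

# Phases whose output each phase may consume, following the steps in the text.
UPSTREAM = {
    "preprocessing": {"data_selection"},
    "data_conversion": {"data_selection", "preprocessing"},
    "data_mining": {"data_selection", "preprocessing", "data_conversion"},
}


def find_input_checkpoint(history: List[Dict[str, str]], phase: str) -> Dict[str, str]:
    """Return the most recently executed entry that can feed `phase`."""
    for entry in reversed(history):
        if entry["phase"] in UPSTREAM[phase]:
            return entry
    # Mirrors the error case: nothing has been selected, preprocessed,
    # or converted yet, so there is no data to work on.
    raise LookupError(f"no earlier phase can supply input to {phase}")


history = [
    {"id": "S001", "phase": "data_selection"},
    {"id": "S002", "phase": "preprocessing"},
]
source = find_input_checkpoint(history, "data_mining")
```

The new checkpoint would then be linked to `source["id"]` as its parent in the processing data tree.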

Next, assume that this analysis shows that new data items need to be calculated from the collected data and the analysis redone. The "Data selection" button is therefore selected next.

Data is extracted again on the data selection screen shown in FIG. 9. In this example, the extracted data all share the same sampling interval, so preprocessing (data interpolation and the like) is skipped.

Next, assume that the "Data conversion" button is selected. Data conversion processing (S206) by the data conversion processing unit 4 first displays the data conversion screen shown in FIG. 10. New data items (horsepower, torque, and so on) are derived from the extracted data and added to it.
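Adding a derived data item can be sketched as below. The source names horsepower and torque as examples but gives neither formulas nor input columns, so everything specific here is an assumption: the column names `engine_rpm` and `torque_nm`, the helper names, and the choice to compute shaft power in kilowatts from torque and engine speed using the standard relation power = torque × angular velocity.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]


def add_data_item(rows: List[Row], name: str,
                  formula: Callable[[Row], float]) -> List[Row]:
    """Return copies of the rows with one derived data item added."""
    return [{**row, name: formula(row)} for row in rows]


def power_kw(row: Row) -> float:
    # Shaft power = torque * angular velocity; rpm to rad/s is 2*pi/60,
    # and dividing by 1000 converts watts to kilowatts.
    return row["torque_nm"] * row["engine_rpm"] * 2 * math.pi / 60 / 1000


rows = [{"t": 0.0, "engine_rpm": 3000.0, "torque_nm": 100.0}]
converted = add_data_item(rows, "power_kw", power_kw)
```

Returning new rows instead of editing them in place leaves the earlier phase's data untouched, which matters if that phase is later used as a restart point.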

When the acquired event relates to a parameter, the parameter is stored. When "Add" is pressed, the unit operates as follows.
(1) The processing data tree is read, and the process executed immediately before (preprocessing or data selection) is identified. A new checkpoint ID linked to the checkpoint ID of that process is then assigned. If neither preprocessing nor data selection has been performed beforehand, an error is raised.
(2) The result of the process executed immediately before (a file in the extracted data storage unit 11 or the preprocessed data storage unit 12) is read, and the conversion is executed.
(3) The processing result is saved in the data converted data storage unit 13. The file name includes the user name and the checkpoint ID for identification.
(4) Unless a checkpoint has already been taken, the system automatically creates one at this point. The assigned checkpoint ID is added to the processing data tree and recorded, together with the input parameters (data item names and so on), in the processing data tree storage unit 8.

Then assume that the "Data mining" button is selected. On the data mining screen shown in FIG. 11, data analysis is performed using the converted data described above, which is stored in the data converted data storage unit 13.

Next, assume that the "Restart" button is selected. Restart processing (S212) by the restart processing unit 7 displays the restart screen shown in FIG. 12. A restart returns the system to a previously saved state so that the data can be analyzed from a different viewpoint.
(1) The processing data tree is obtained from the processing data tree storage unit 8 and displayed.
(2) When the acquired event is the selection of a checkpoint, the parameters stored in association with the selected checkpoint ID are obtained from the processing data tree storage unit 8.
(3) The parameters are set (S213), and the data mining control unit 1 is instructed to start the process, with the phase of the process to be restarted as the status.

The data mining control unit 1 starts the designated process using the parameters set by the restart processing.
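The restart path can be sketched as a lookup followed by a dispatch. This is an illustration only: the dictionary-shaped tree entry, the `launchers` mapping that stands in for the data mining control unit starting a phase, and the function name are hypothetical.

```python
from typing import Callable, Dict, Tuple


def restart(tree: Dict[str, dict], checkpoint_id: str,
            launchers: Dict[str, Callable[[dict], object]]) -> Tuple[str, object]:
    """Read the parameters saved for a checkpoint and start its phase with them."""
    entry = tree.get(checkpoint_id)
    if entry is None or entry.get("parameters") is None:
        # A checkpoint whose record was deleted cannot be restarted.
        raise LookupError(f"checkpoint {checkpoint_id} cannot be restarted")
    # Copy so the relaunched phase cannot alter the stored history.
    params = dict(entry["parameters"])
    phase = entry["phase"]
    return phase, launchers[phase](params)


tree = {"D211": {"phase": "data_selection",
                 "parameters": {"vehicle": "A", "start": 0, "end": 20}}}
launchers = {
    "data_selection": lambda p: f"selection screen preset for vehicle {p['vehicle']}",
}
phase, result = restart(tree, "D211", launchers)
```

Returning the phase alongside the result mirrors the text, where the phase to restart is handed to the control unit as the status.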

Here, assume that "D211" is selected. FIG. 13 is a diagram showing the screen after D211 is clicked. In this way, processing can be restarted.

When the entire process is terminated, the processing data tree is searched, and if it contains checkpoints created by the system, the files created at those checkpoints and the records of those checkpoints are deleted.
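The cleanup at termination can be sketched as below. How the real system represents a deleted checkpoint is not stated; this sketch assumes the node stays in the tree with its file and parameters cleared, which would match the unnamed preprocessing box described for FIG. 3. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, List


def purge_automatic_checkpoints(tree: Dict[str, dict],
                                store: Dict[str, object]) -> List[str]:
    """Delete files and restart records of system-created checkpoints."""
    purged: List[str] = []
    for checkpoint_id, entry in tree.items():
        if entry["automatic"] and entry["parameters"] is not None:
            if entry["file"] is not None:
                store.pop(entry["file"], None)  # remove the saved data file
            # Assumption: keep the node so the tree shape survives,
            # but drop what a restart would need.
            entry["file"] = None
            entry["parameters"] = None
            purged.append(checkpoint_id)
    return purged


tree = {
    "D211": {"phase": "data_selection", "automatic": False,
             "file": "user1_D211", "parameters": {"vehicle": "A"}},
    "S212": {"phase": "preprocessing", "automatic": True,
             "file": "user1_S212", "parameters": {"interval": 1}},
}
store = {"user1_D211": [1, 2, 3], "user1_S212": [1, 2, 3]}
purged = purge_automatic_checkpoints(tree, store)
```

User-requested checkpoints such as D211 are left alone, so they remain available as restart points in the next session.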

The data mining system described above is a computer, and each element can execute its processing by means of a program. The program can also be stored on a storage medium and read by the computer from that medium.

FIG. 14 is a diagram showing the hardware configuration of the data mining system. An arithmetic device 1401, a data storage device 1402, and a memory 1403 are connected to a bus. The data storage device 1402 is, for example, a ROM (Read Only Memory) or a hard disk. The memory 1403 is usually a RAM (Random Access Memory).

The program is normally stored in the data storage device 1402 and, once loaded into the memory 1403, is read sequentially into the arithmetic device 1401 and executed.

FIG. 1 is a diagram showing the configuration of the data mining system.
FIG. 2 is a diagram showing the overall processing flow.
FIG. 3 is a diagram showing an example of a processing data tree.
FIG. 4 is a diagram showing the main menu screen.
FIG. 5 is a diagram showing the data selection screen.
FIG. 6 is a diagram showing the preprocessing screen.
FIG. 7 is a diagram showing the checkpoint window.
FIG. 8 is a diagram showing the data mining screen.
FIG. 9 is a diagram showing the data selection screen.
FIG. 10 is a diagram showing the data conversion screen.
FIG. 11 is a diagram showing the data mining screen.
FIG. 12 is a diagram showing the restart screen.
FIG. 13 is a diagram showing the screen after D211 is clicked.
FIG. 14 is a diagram showing the hardware configuration of the data mining system.
FIG. 15 is a diagram showing the procedure of a data mining operation.

Explanation of symbols

1: data mining control unit, 2: data selection processing unit, 3: preprocessing unit, 4: data conversion processing unit, 5: data mining processing unit, 6: checkpoint processing unit, 7: restart processing unit, 8: processing data tree storage unit, 9: execution phase storage unit, 10: time-series data storage unit, 11: extracted data storage unit, 12: preprocessed data storage unit, 13: data converted data storage unit.

Claims

A data mining system comprising:
(1) a time-series data storage unit that stores time-series data groups;
(2) a data selection processing unit that extracts time-series data from the time-series data storage unit;
(3) a preprocessing unit that preprocesses the extracted time-series data;
(4) a data conversion processing unit that converts the extracted time-series data or the preprocessed time-series data;
(5) a data mining processing unit that performs data mining processing on the extracted, preprocessed, or converted time-series data;
(6) a processing data tree storage unit that stores the transitions between the data selection, preprocessing, data conversion, and data mining phases as a processing data tree in which checkpoints are associated in a tree structure, and also stores the parameters used in each phase in association with that phase; and
(7) a data mining control unit that controls the transitions between the data selection, preprocessing, data conversion, and data mining phases in accordance with instructions from the operator,
wherein the data selection processing unit, the preprocessing unit, the data conversion processing unit, and the data mining processing unit each store the transition and the parameters of their own processing, in association with each other, in the processing data tree storage unit and add an automatic checkpoint to the processing data tree,
the data mining system further comprising:
(8) a checkpoint processing unit that, when instructed by the operator, stores the transition and the parameters of the process immediately preceding the instruction, in association with each other, in the processing data tree storage unit and adds an operator-instructed checkpoint to the processing data tree; and
(9) a restart processing unit that displays the processing data tree with operator-instructed checkpoints distinguished from automatic checkpoints, prompts the operator to designate the phase to resume by selecting a checkpoint, reads the parameters used in the designated phase, sets the read parameters, and instructs the data mining control unit to start the designated phase,
wherein the data mining control unit, when the entire process is terminated, searches the processing data tree and deletes any automatic checkpoint it finds.

A time-series data storage unit that stores time-series data groups and a data tree that stores transitions for each phase of data selection processing, preprocessing, data conversion processing, and data mining processing in a checkpoint tree structure Further, a program for causing a computer to be a data mining system having a processing data tree storage unit that also associates and stores parameters used in the phase to execute the following procedure
(1) Data selection processing procedure for extracting time-series data from the time-series data storage unit
(2) Preprocessing procedure for preprocessing the extracted time-series data
(3) Data conversion processing procedure for converting the extracted time-series data or the preprocessed time-series data
(4) Data mining processing procedure for performing data mining processing on the extracted time-series data, the preprocessed time-series data, or the converted time-series data
(5) Data mining control procedure for controlling transition of each phase of data selection processing, preprocessing, data conversion processing, and data mining processing in accordance with an operator's instruction
(6) Procedure for adding an automatic checkpoint to the processing data tree by storing, for the processing of each of the data selection processing procedure, the preprocessing procedure, the data conversion processing procedure, and the data mining processing procedure, the transition and the parameters in association with each other in the processing data tree storage unit
(7) Checkpoint processing procedure for, upon an instruction from the operator, storing the transition and the parameters of the processing immediately preceding the instruction, in association with each other, in the processing data tree storage unit, and adding an operator-specified checkpoint to the processing data tree
(8) Restart processing procedure for displaying the processing data tree with operator-specified checkpoints distinguished from automatic checkpoints, prompting the operator to designate the phase to resume by selecting a checkpoint, reading the parameters used in the designated phase, setting the read parameters, and instructing the data mining control procedure to start the designated phase
(9) Procedure for searching the processing data tree when the entire processing ends and deleting any automatic checkpoints found.
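
As a rough illustration of procedures (1) through (6), the following Python sketch runs the phases in an operator-chosen order and logs each one as an automatic checkpoint together with its parameters. Everything concrete here is an assumption made for the example: the phase implementations (range selection, dropping missing values, trailing moving average, threshold search), the function and key names, and the flat list used as the processing data tree are not taken from the patent.

```python
def select(series, start, end):
    """(1) Extract the samples whose timestamp lies in [start, end]."""
    return [(t, v) for t, v in series if start <= t <= end]


def preprocess(series):
    """(2) Drop samples with a missing value (one possible cleaning rule)."""
    return [(t, v) for t, v in series if v is not None]


def convert(series, window):
    """(3) Replace each value by a trailing moving average of width `window`."""
    out = []
    for i, (t, _) in enumerate(series):
        chunk = [v for _, v in series[max(0, i - window + 1): i + 1]]
        out.append((t, sum(chunk) / len(chunk)))
    return out


def mine(series, threshold):
    """(4) Toy stand-in for mining: timestamps whose value exceeds `threshold`."""
    return [t for t, v in series if v > threshold]


PHASES = {"select": select, "preprocess": preprocess, "convert": convert, "mine": mine}


def run(series, plan, tree):
    """(5)+(6) Run the phases in the given order, logging each as an automatic checkpoint."""
    data, parent = series, None
    for phase, params in plan:
        data = PHASES[phase](data, **params)
        tree.append({"id": len(tree), "parent": parent, "phase": phase,
                     "params": dict(params), "automatic": True})
        parent = tree[-1]["id"]     # the next phase transitions from this node
    return data


# Hypothetical data and plan.
series = [(1, 10.0), (2, None), (3, 14.0), (4, 30.0), (5, 2.0)]
plan = [("select", {"start": 1, "end": 4}),
        ("preprocess", {}),
        ("convert", {"window": 2}),
        ("mine", {"threshold": 15.0})]
tree = []
result = run(series, plan, tree)
```

Because procedure (4) accepts extracted, preprocessed, or converted data, a plan that omits "preprocess" or "convert" is equally valid in this sketch.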

A data mining support method performed by a data mining system, the data mining system having a time-series data storage unit that stores time-series data groups, and a processing data tree storage unit that stores, as a data tree with a checkpoint tree structure, the transitions of each phase of data selection processing, preprocessing, data conversion processing, and data mining processing, together with the parameters used in each phase in association with those transitions, the data mining support method comprising the following steps:
(1) Data selection processing step in which the data selection processing unit of the data mining system extracts time-series data from the time-series data storage unit
(2) A preprocessing step in which the preprocessing unit of the data mining system preprocesses the extracted time-series data.
(3) Data conversion processing step in which the data conversion processing unit of the data mining system converts the extracted time-series data or pre-processed time-series data.
(4) Data mining processing step in which the data mining processing unit of the data mining system performs data mining processing on the extracted time-series data, pre-processed time-series data, or converted time-series data.
(5) Data mining control step in which the data mining control unit of the data mining system controls the transition between the phases of data selection processing, preprocessing, data conversion processing, and data mining processing in accordance with an instruction from the operator.
(6) Step in which the data selection processing unit, the preprocessing unit, the data conversion processing unit, and the data mining processing unit of the data mining system each add an automatic checkpoint to the processing data tree by storing, for their own processing in the data selection processing step, the preprocessing step, the data conversion processing step, and the data mining processing step, the transition and the parameters in association with each other in the processing data tree storage unit
(7) Checkpoint processing step in which the checkpoint processing unit of the data mining system, upon an instruction from the operator, stores the transition and the parameters of the processing immediately preceding the instruction, in association with each other, in the processing data tree storage unit, and adds an operator-specified checkpoint to the processing data tree
(8) Restart processing step in which the restart processing unit of the data mining system displays the processing data tree with operator-specified checkpoints distinguished from automatic checkpoints, prompts the operator to designate the phase to resume by selecting a checkpoint, reads the parameters used in the designated phase, sets the read parameters, and instructs the data mining control step to start the designated phase
(9) Step in which the data mining control unit of the data mining system searches the processing data tree when the entire processing ends and deletes any automatic checkpoints found.
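
The restart processing step (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary layout of the tree, the `render` and `restart` functions, the `*` and `.` display marks, and the `start_phase` callback standing in for the data mining control unit are all hypothetical and do not come from the patent.

```python
def render(tree, node_id=0, depth=0):
    """Return display lines; '*' marks operator-specified, '.' automatic checkpoints."""
    node = tree[node_id]
    mark = "." if node["automatic"] else "*"
    lines = [f"{'  ' * depth}{mark} [{node_id}] {node['phase']} {node['params']}"]
    for child in node["children"]:
        lines.extend(render(tree, child, depth + 1))
    return lines


def restart(tree, node_id, start_phase):
    """Read the parameters stored at the selected checkpoint, set them, start the phase."""
    node = tree[node_id]
    params = dict(node["params"])       # copy so the stored checkpoint stays unchanged
    return start_phase(node["phase"], params)


# Hypothetical tree: two alternative conversions branch off the same preprocessing.
tree = {
    0: {"phase": "select", "params": {"start": 1, "end": 4}, "automatic": True, "children": [1]},
    1: {"phase": "preprocess", "params": {}, "automatic": True, "children": [2, 3]},
    2: {"phase": "convert", "params": {"window": 2}, "automatic": False, "children": []},
    3: {"phase": "convert", "params": {"window": 5}, "automatic": True, "children": []},
}

started = []


def start_phase(phase, params):
    """Stand-in for the data mining control unit: just record what was requested."""
    started.append((phase, params))
    return phase


lines = render(tree)
for line in lines:
    print(line)

restart(tree, 2, start_phase)           # the operator selects checkpoint 2
```

In this sketch, selecting checkpoint 2 asks the control unit to start the "convert" phase with the stored parameter `window = 2`.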