JP2021039523A

JP2021039523A - Data preparation support system for data utilization and its method

Info

Publication number: JP2021039523A
Application number: JP2019159980A
Authority: JP
Inventors: Hidenori Yamamoto (山本 秀典); Takashi Tsuno (津野 高志); Motonobu Saito (齊藤 元伸)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-09-02
Filing date: 2019-09-02
Publication date: 2021-03-11
Anticipated expiration: 2039-09-02
Also published as: JP7247060B2; KR20210027024A; KR102345302B1

Abstract

PROBLEM TO BE SOLVED: To provide a data preparation support system for data utilization, and a method therefor, that promote the utilization of data collected from a plurality of business systems. SOLUTION: The system determines a recommended analysis step for an analysis purpose (222), extracts from target data the combinations of data usable for the recommended analysis step, evaluates each of the extracted data combinations, creates a list in which the data combinations are ranked based on the evaluation results (223), and allows a user to select a given data combination. SELECTED DRAWING: Figure 2

Description

The present invention relates to a system for supporting the utilization of data, and in particular to a system for supporting data preparation for data utilization. For example, it relates to a system suited to selecting and extracting data that matches the purpose of a data analysis from the many types and large volumes of data held by a plurality of business systems.

In recent years, companies have been using the large and varied volumes of data they accumulate to improve a range of business issues. For example, Japanese Patent Application Laid-Open No. 2004-29971 proposes a data analysis method that easily extracts only the data needed for a desired analysis from production-equipment and processing-time data in a production process, in order to obtain analysis results effective for improving yield. The method appends to each explanatory variable an additional character string identifying the category of the data item, performs data cleansing that replaces abnormal values with a specific value or deletes them, obtains feature information based on the variation of the objective variable, recognizes the data category during analysis processing, and automatically and efficiently executes the data analysis using condition settings and an analysis procedure corresponding to that category.

Japanese Patent Application Laid-Open No. 2016-181150 proposes a data processing system that can adapt its data processing to changes in the required analysis. The system has a data warehouse that stores all input data; an integration layer that integrates the input data to generate integrated data and stores it; an aggregation layer that, for each combination of one or more non-additive items, aggregates at least the quantity of additive items or the number of non-additive items in the integrated data to generate a plurality of aggregated data sets and stores them; and an analysis layer that, based on the conditions required for generating analysis data as set in a setting unit, selects one aggregated data set from the plurality, extracts analysis data from it, and stores the analysis data.

Japanese Patent Application Laid-Open No. 2016-181150

In recent years, in many fields such as transportation, electric power, and industry, there has been a demand to make use of business data collected across departments and operations in order to solve increasingly diverse and multifaceted problems. However, handling the many types and large volumes of data that differ among business systems, each built and operated separately, requires users to understand the data and to have experience and knowledge of the operations. The level of that understanding varies from person to person, which hinders analysis of issues using the data. For example, a user may know the data of their own department and work thoroughly yet often have no grasp of the data of other departments or operations. As a result, preparing for analysis across departments and operations, such as selecting the data needed for the desired analysis and processing that data, becomes difficult.

It is therefore desirable to reduce the workload of data preparation so that users can begin analyzing data smoothly even when they lack understanding or knowledge of the differences among operations and among the data and systems of each operation.

The invention disclosed in Patent Document 1 presupposes that the user understands the target data. The user must therefore organize in advance the analysis purpose, the analysis means, and the content of the data to be prepared for the analysis, and the data can be used only for a specific type of data and for a purpose the user has anticipated. The data usable by the invention disclosed in Patent Document 2, on the other hand, is limited to data that can become integrated data, and the wide variety of data from a plurality of business systems cannot always be integrated uniformly. Moreover, to create analysis data suited to a purpose from the integrated and aggregated data, the user must understand all of the original data.

An object of the present invention is to provide a system, and a method, that support data preparation for data utilization in order to promote the utilization of data collected from a plurality of business systems.

To achieve this object, the present invention is a system that supports data preparation for data utilization and comprises a processing device and a storage device. By executing a program recorded in the storage device, the processing device collects business data from each of a plurality of business systems; stores the business data in the storage device, at least temporarily, as target data; receives from a user terminal an analysis purpose for the target data; determines, based on the analysis purpose and the target data, an analysis step recommended for the analysis purpose; extracts from the target data the combinations of data usable for the recommended analysis step; evaluates each of the extracted data combinations; creates, based on the evaluation results, a list in which the data combinations are ranked; and outputs the list to the user terminal so that the user can select a given data combination from the list.

According to the present invention, the user's load in data preparation work is reduced when the many and varied, large-volume data of a plurality of business systems is put to use for analysis such as problem solving, so that data utilization can proceed quickly and with high quality.

A block diagram showing an example of a system configuration that supports data preparation for data utilization.
An example of a sequence by which the server supports data preparation for data utilization.
An example of the module configuration of the server.
An example of the structure of the analysis purpose information created by the user and of the reference information held by the server.
An example of the data preparation table that the server creates when the user specifies "grasping the overall picture" as the analysis step.
An example of the data preparation table that the server creates when the user specifies "singular event extraction" as the analysis step.
An example of the data preparation table that the server creates when the user specifies "factor analysis / prediction" as the analysis step.
An example of a management table for managing the data combinations presented or proposed to the user by the server.
A flowchart of an example of the operation by which the server proposes an analysis step to the user.
A first example of a matrix for judging the suitability of an analysis step registered by the user.
A second example of a matrix for judging the suitability of an analysis step registered by the user.
A third example of a matrix for judging the suitability of an analysis step registered by the user.
A flowchart showing an example of the operation by which the server presents to the user the data combinations usable for grasping the overall picture as the analysis step.
A flowchart showing an example of the operation by which the server presents to the user the data combinations usable for singular event extraction as the analysis step.
A flowchart showing an example of the operation by which the server presents to the user the data combinations usable for factor analysis / prediction as the analysis step.
An example of a graphical user interface, provided by the server to the user terminal, for supporting data utilization.
Another example of a graphical user interface, provided by the server to the user terminal, for supporting data utilization.
Yet another example of a graphical user interface, provided by the server to the user terminal, for supporting data utilization.
A table for explaining combinations of a plurality of columns in the flowchart of FIG. 8.
A table for explaining combinations of a plurality of columns in the flowchart of FIG. 10.

Next, an embodiment of the present invention will be described. FIG. 1 is a configuration diagram of an example of a computer system for implementing the present invention. The computer system includes a server 101 (a computer) that supports data preparation for utilizing data collected from a plurality of business systems. The server 101 collates the purpose of data analysis entered by a user of the system with the data of the plurality of business systems that the server 101 has collected, proposes an optimal analysis step to the user, and also proposes highly useful data that can be used for that analysis step, thereby supporting data preparation for data utilization.

The user only has to register an analysis purpose with the system and then carry out analysis, for example to solve a problem, based on the data the system proposes or provides. The user can thus perform the intended data analysis without being forced into data preparation work such as collecting data personally or extracting useful data from a large body of data.

A plurality of business systems 104, 105, 106 and a plurality of user terminals 102, 103 are connected to the server 101. A user terminal registers the analysis purpose desired by the user with the server 101, then receives from the server 101 a proposal of an analysis step for achieving that purpose and of the data usable for that analysis step. After the user confirms these, the user terminal can perform the data analysis or have the server 101 perform it.

Each of the plurality of business systems provides the server 101 with the data that the server 101 refers to as the subject of the data analysis desired by the user (hereinafter, "target data"). Data need not be provided from the business systems to the server 101 only by communication over the network 108; it may also be stored in the server 101 without using the network 108, for example manually. Taking the railway field as an example, the plurality of business systems may include a train operation management system, a station management system, and a maintenance management system.

The data sent from a business system to the server 101 need not be restricted in any particular way; the type and format of the data are not limited and may be, for example, data tables or sensor information in binary form. The user registers a purpose of data analysis with the server 101 in order to investigate comprehensively, across the plurality of business systems, the causes of a given issue or problem. Taking the transportation industry as an example, one such purpose is "train delay": identifying the causes of delays across the business systems so as to help prevent and reduce them.

The server 101 and each of the user terminals 102, 103, ... are connected via a network 107. The server 101 and each of the business systems 104, 105, 106, ... are interconnected via a network 108.

The main hardware components of the server 101, which serves as the platform for data utilization and data preparation, are a storage device 111 (main memory, hard disk, or a storage area of an external storage system), a processing device (CPU) 112, and a communication device 113. The user terminals 102 and 103 are configured similarly. The processing device 112 records the target data, at least temporarily, in the non-transitory recording medium (storage device) 111.

FIG. 2 is an example of a sequence by which the server 101 supports data preparation for data utilization. The user 201 may be a person who intends to discover problems, devise solutions, and so on by applying predetermined analysis steps to various data collected across a plurality of departments, or may be an application for that purpose.

Each of the business systems 104 to 106 registers business data with the server 101 (211). The server 101 accesses the storage area of the business data, refers to the business data, and sets up a data catalog and data definitions (221). Here, setting up the data catalog and the like includes creating or updating them. As described later, the server 101 creates an analysis mode (analysis step and candidate data list) based on the analysis purpose entered by the user and the reference information (data catalog and data definitions) it has created, and presents this to the user.

The reference information, such as the data catalog and data definitions, can be understood as an example of the criteria or indices used to analyze the business data, for example to classify and evaluate it, so that the server 101 can propose to the user 201 an optimal analysis step, and highly useful data usable for that step, based on the analysis purpose intended by the user 201 and the business data. The data catalog and data definitions are described in detail later.

The user 201 registers the desired purpose of data analysis with the server 101 via the user terminal 102 (231). Based on the business system data and the reference information (data catalog and data definitions), the server 101 evaluates the analysis purpose registered by the user, settles on the analysis step to recommend to the user (by judging, discriminating, deciding, selecting, or the like), and notifies the user terminal 102 of it (222).

The user 201 finalizes the analysis step, for example by confirming the recommended analysis step presented by the server 101 or by selecting a given analysis step from among a plurality of analysis steps, and notifies the server 101 of it (232).

The server 101 extracts, from the data collected from the plurality of business systems, a plurality of candidate data suited to the analysis step and presents them to the user 201 (223). Referring to the list of candidate data sent from the server 101, the user 201 sets the data to be analyzed, by deciding on or selecting it, and analyzes that data according to the finalized analysis step (232). The user 201 also notifies the server 101 of this (233). The server 101 updates the reference information (data catalog / data definitions) based on the results of steps 222, 223, and 233 (224).

The reference information, such as the data catalog and data definitions, may be created or updated by an administrator or operator of the server 101. The server 101 may also create or update the data catalog and the like by machine learning based on users' analysis purposes and the business data.

The analysis purpose (401 in FIG. 4, described later) may be information that concisely indicates the target of the data analysis, for example "train delay". The user 201 can describe the analysis purpose by, for example, an "analysis step", a "KPI", and "data items of interest" and register it with the server. The "analysis step" and the "KPI" may be mandatory and the "data items of interest" optional. The KPI is the element the user focuses on in the data analysis, for example the "train delay" mentioned above. The KPI may be defined by a keyword or by a calculation formula.

A data item of interest is, for example, an element to be used as an axis for evaluating the KPI, or the name of an item to be monitored together with the KPI. For example, if the KPI is "train delay", the data items of interest are at least one of "departure time", "arrival time", "operation date and time", and "delay time".

An analysis step represents a stage in carrying out an analysis to solve a business problem or the like. There are several types, chiefly: a mode for grasping the overall picture of the target data, a mode for extracting singular events from the target data, and a mode for analyzing factors in the target data and predicting outcomes. These are hereinafter referred to as (1) grasping the overall picture, (2) singular event extraction, and (3) factor analysis / prediction.

The user 201 can include a plurality of analysis steps in the analysis purpose. The steps may be distinguished by priority, order, or the like.
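
The analysis purpose described above can be pictured as a small record with two mandatory fields and one optional field. The following is a minimal sketch only: the class name `AnalysisPurpose`, its field names, and the step identifiers are assumptions made for illustration and do not appear in the publication.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the three analysis step types named in the text.
STEP_TYPES = {
    "overall_picture",
    "singular_event_extraction",
    "factor_analysis_prediction",
}


@dataclass
class AnalysisPurpose:
    analysis_steps: List[str]  # mandatory; list order is taken as priority order
    kpi: str                   # mandatory; a keyword or a calculation formula
    items_of_interest: List[str] = field(default_factory=list)  # optional

    def __post_init__(self):
        # Enforce the "mandatory" fields described in the text.
        if not self.analysis_steps:
            raise ValueError("at least one analysis step is required")
        unknown = [s for s in self.analysis_steps if s not in STEP_TYPES]
        if unknown:
            raise ValueError(f"unknown analysis step(s): {unknown}")
        if not self.kpi.strip():
            raise ValueError("a KPI is required")


purpose = AnalysisPurpose(
    analysis_steps=["overall_picture", "factor_analysis_prediction"],
    kpi="train delay",
    items_of_interest=["departure time", "arrival time"],
)
```

Treating the list order as the priority of the steps is one simple way to express the "priority, order, or the like" mentioned above; the publication does not prescribe a representation.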

Calculation methods the user 201 can use for each mode of analysis step include, for example: trend analysis for grasping the overall picture; outlier detection and/or change-point extraction for singular event extraction; and regression analysis / multiple regression analysis and/or classification / cluster analysis for factor analysis / prediction.
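
The correspondence between analysis steps and calculation methods listed above could be held as a simple lookup table. This is an illustrative sketch: the method names come from the description, while the step identifiers and the function `methods_for` are hypothetical.

```python
# Step identifiers are hypothetical; the method names come from the description.
METHODS_BY_STEP = {
    "overall_picture": ["trend analysis"],
    "singular_event_extraction": ["outlier detection", "change-point extraction"],
    "factor_analysis_prediction": [
        "regression analysis",
        "multiple regression analysis",
        "classification",
        "cluster analysis",
    ],
}


def methods_for(steps):
    """Return the candidate methods for each requested step, preserving step order."""
    return {step: METHODS_BY_STEP[step] for step in steps}


plan = methods_for(["overall_picture", "singular_event_extraction"])
```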

The server 101 holds the data catalog and data definitions (221) in the storage device 111 as reference information (which may also be called "definition information"). The data catalog (402 in FIG. 4) comprises a data catalog for the target data that the server 101 has collected from the plurality of business systems 104 and a data catalog for processed data obtained by processing the target data. The data catalog for processed data may include information on the calculation formula used to obtain the processed data from the target data.

The catalog for target data consists of items such as "column name", "name/meaning", and "supplementary information". Examples are {arrival time, arrival time (到着時刻), ...} and {KRT, kilometrage (キロ程), ...}. The column name is the name recorded in a column of the target data (data table).

The catalog for processed data consists of items such as "data name", "calculation formula", "name/meaning", and "supplementary information". An example is {delay, "actual time" - "planned time", delay time (遅延時分), ...}.
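
To make the two catalogs concrete, the sketch below stores the example entries from the text and applies the "delay" formula to one record. The dictionary layout, the function `derive`, and the sample timestamps are assumptions for illustration; only the entry contents follow the description.

```python
from datetime import datetime

# Catalog for target data: column name -> name/meaning (entries from the text).
TARGET_CATALOG = {
    "arrival time": "arrival time",
    "KRT": "kilometrage",
}

# Catalog for processed data: data name -> calculation formula and meaning.
# The formula is held as a callable here; the storage format is an assumption.
PROCESSED_CATALOG = {
    "delay": {
        "formula": lambda row: row["actual time"] - row["planned time"],
        "meaning": "delay time",
    },
}


def derive(row, data_name):
    """Compute one processed value from a target-data record using the catalog formula."""
    return PROCESSED_CATALOG[data_name]["formula"](row)


# Sample record with made-up timestamps.
row = {
    "planned time": datetime(2019, 9, 2, 8, 0),
    "actual time": datetime(2019, 9, 2, 8, 7),
}
delay_minutes = derive(row, "delay").total_seconds() / 60
print(delay_minutes)  # 7.0
```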

The data definition 221 described above includes, for example, a 4W dictionary (403 in FIG. 4) and data transition patterns (404 in FIG. 4). The 4W dictionary is used to grasp the tendencies of the overall picture of the target data from the viewpoint of the 4Ws (When, Where, What, Who) and contains keywords for each 4W category. For example, 4W dictionary data consists of "keyword" and "category" pairs, such as {keyword, category} = {時刻 (time), when}, {date, when}, {キロ程 (kilometrage), where}, {Kilometrage, where}, and {駅コード (station code), where}.
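
One way to use such a dictionary is to look up the 4W category of a column from its name or meaning. The sketch below uses the keyword pairs from the text; the function `categorize` and the case-insensitive substring matching rule are assumptions, since the publication does not state how keywords are matched.

```python
# Keyword -> 4W category pairs taken from the example in the text.
FOUR_W_DICTIONARY = {
    "時刻": "when",
    "date": "when",
    "キロ程": "where",
    "Kilometrage": "where",
    "駅コード": "where",
}


def categorize(label):
    """Return the 4W category of a column label, or None if no keyword matches.

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    lowered = label.lower()
    for keyword, category in FOUR_W_DICTIONARY.items():
        if keyword.lower() in lowered:
            return category
    return None


print(categorize("到着時刻"))        # when
print(categorize("operation date"))  # when
print(categorize("driver ID"))       # None
```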

A data transition pattern adds the transition of values to the 4W dictionary and consists of, for example, "data name/type", "category", "data type", "value change width", "lower limit", and "upper limit". Examples are {キロ程 (kilometrage), where, integer, 1, 0, 300} and {到着時刻(分) (arrival time in minutes), when, integer, 5, 0, 1440}.
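
A data transition pattern could be used to guess the category of a column from its values when the column name alone is not informative. The sketch below encodes the two example patterns and tests a column against them. It rests on loudly stated assumptions: the publication does not define how "value change width" is applied, so this sketch treats it as the granularity of differences between distinct values, and the names `TransitionPattern`, `matches`, and `infer_category` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionPattern:
    name: str
    category: str
    data_type: type
    change_width: int
    lower: int
    upper: int


# The two example patterns from the text.
PATTERNS = [
    TransitionPattern("kilometrage", "where", int, 1, 0, 300),
    TransitionPattern("arrival time (minutes)", "when", int, 5, 0, 1440),
]


def matches(values, pattern):
    """Check type, bounds, and (as an assumption) step granularity of a column."""
    if not values or not all(isinstance(v, pattern.data_type) for v in values):
        return False
    if min(values) < pattern.lower or max(values) > pattern.upper:
        return False
    distinct = sorted(set(values))
    steps = [b - a for a, b in zip(distinct, distinct[1:])]
    return all(step % pattern.change_width == 0 for step in steps)


def infer_category(values):
    """Return the category of the first matching pattern, or None."""
    for pattern in PATTERNS:
        if matches(values, pattern):
            return pattern.category
    return None
```

Because the first matching pattern wins, a column that fits more than one pattern is assigned by list order in this sketch; a real system would need a tie-breaking rule that the text does not specify.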

FIG. 3 is an example of the module configuration of the server 101. The server 101 implements middleware 301 that realizes the functions described with reference to FIG. 2. The middleware 301 comprises a plurality of functional modules realized by the processing device 112 executing programs stored in the storage device 111 of the server 101. The middleware 301 holds, in the storage device 111, target data 311 from the business systems 104 and processed data 312 produced by processing the target data 311, for example by extracting feature quantities from it.

The middleware 301 includes a data management module 321 that manages the target data 311 and the processed data 312, and holds in the storage device 111 the data catalog 402, the data definitions 315, and a matrix 313 for judging the suitability of the analysis purpose registered by the user 201.

The middleware 301 includes a reference information management module 322 that manages the reference information (data catalog and data definitions) described above; a user management module 323 that manages the users who access the server 101 to carry out data preparation work; and an analysis step proposal module 324 that refers to the definition information and, based on the user's analysis purpose information 401, proposes analysis steps and/or work items to the user.

The middleware 301 further includes: an analysis data proposal module 325 that refers to the data definitions 315 and proposes data for analysis based on the user's analysis purpose information 401; a proposal execution management module 326 that updates and manages the information proposed to the user as a data preparation table 314; an interface module 327 that provides the user terminals 102 and 103 with an interface for accessing the functions of the middleware 301; a data communication module 328 that communicates with the user terminals 102, 103 and the business systems 104, ... via the networks 107 and 108; an overall-picture module 331 that executes processing, such as trend analysis, for proposing analysis data when the analysis step is grasping the overall picture; a singular event extraction module 332 that executes processing for proposing analysis data when the analysis step is singular event extraction, such as extracting singular points (for example, outliers, large change points, and large biases); and a factor analysis / prediction module 333 that executes processing, such as regression analysis and multiple regression analysis, for proposing analysis data when the analysis step is factor analysis / prediction.

The reference information management module 322 manages the reference information (data catalog 402, data definitions 315, and matrix 313). This information may exist separately for each industrial field (railway, electric power, and so on); the reference information management module 322 may manage the reference information per industrial field and select the reference information for a given field according to the business data, the analysis purpose, or the like.

なお、既述のモジュールとは、処理装置がプログラムを実行することにより実現される機能であって、例えば、部、手段、要素、回路、又は、ユニットと言い換えられてもよい。 The module described above is a function realized by the processing device executing a program, and may be paraphrased as, for example, a part, a means, an element, a circuit, or a unit.

Next, the candidate list of analysis data that is created by the server 101 and presented to the user will be described. Table 501 in FIG. 5A is an example of the data preparation table 314 (candidate list) that the server 101 creates when the user specifies "overall picture grasp" as the analysis step. The analysis data proposal module 325 creates or updates this table. The table includes rank 511, identification information 512, KPI 513, data of interest 1 (When) 514, data of interest 2 (Where) 515, data of interest 3 (What) 516, data of interest 4 (Who) 517, number of records 518, and output file 519.

Rank 511 stores the rank assigned, based on priority, to the data combination (row) specified by the identification information 512. A data combination is a combination of: the records of the column that a table of the target data 311 contains under the "KPI data" column name; the records of the column that a table of the target data 311 contains under the name of data of interest 1 corresponding to "when" (for example, the "time" described above); the corresponding records for data of interest 2 corresponding to "where" (for example, the "kilometrage" described above); the corresponding records for data of interest 3 corresponding to "what" (for example, the "train number" described above); and the corresponding records for data of interest 4 corresponding to "who" (for example, "driver ID").

Each of data of interest 1 to 4 (514 to 517) records any item that is identical or similar to a "data item of interest" entered by the user as part of the analysis purpose information 401. A field among data of interest 1 to 4 (514 to 517) is left blank when nothing applies, that is, when there is no entry in the 4W dictionary and no data item of interest is specified.

ＫＰＩ５１３には、分析目的情報４０１を構成する、ユーザ入力の“ＫＰＩ”と、同一、又は、類似のものが格納される。 The KPI 513 stores the same or similar as the user-input "KPI" that constitutes the analysis purpose information 401.

レコード数５１８には、識別情報５１２により特定されるデータ組合せにおける、有効状態にあるレコード数が格納される。出力ファイル５１９には、識別情報５１２により特定されるデータ組合せのレコードを出力するファイルのパスが格納される。 The number of records 518 stores the number of records in the valid state in the data combination specified by the identification information 512. The output file 519 stores the path of the file that outputs the record of the data combination specified by the identification information 512.

Table 502 in FIG. 5B is an example of the data preparation table 314 that the server 101 creates when the user specifies "singular event extraction" as the analysis step. In table 502, which also serves as a singular event list, items with the same reference numerals as in table 501 have the same meaning. Table 502 includes singular point (outlier) count 520, singular point (change point) count 521, and singular point (bias) count 522.

Singular point (outlier) count 520 stores the number of singular points extracted from the KPI whose type is "outlier" in the data combination specified by the identification information 512.

Singular point (change point) count 521 stores the number of singular points extracted from the KPI whose type is "large change point" in the data combination specified by the identification information 512. Singular point (bias) count 522 stores the number of singular points extracted from the KPI whose type is "large bias" in the data combination specified by the identification information 512.

図５Ｃの５０３は、分析ステップとして、“要因分析／予測”をユーザが指定すると、サーバ１０１が作成する、データ準備テーブル３１４の一例である。目的変数・説明変数組合せリストでもあるテーブル５０３において、テーブル５０１と同一の符号についての説明は同じである。 503 of FIG. 5C is an example of the data preparation table 314 created by the server 101 when the user specifies "factor analysis / prediction" as the analysis step. In Table 503, which is also a combination list of objective variables and explanatory variables, the description of the same code as that of Table 501 is the same.

The objective variable/explanatory variable combination list 503 includes objective variable 530, explanatory variable 531, number of records 518, determination value 1 (533), determination value 2 (534), determination value 3 (535), and determination value 4 (536). Objective variable 530 stores the KPI included in the analysis purpose. Explanatory variable 531 records the names of the columns selected by the user from among the columns other than the one containing the KPI name.

Determination value 1 (533) stores the first determination value used to calculate the rank 511 of the data combination specified by the identification information 512; determination values 2 (534), 3 (535), and 4 (536) store the second, third, and fourth determination values, respectively. These determination values are described in detail with reference to FIG. 10.

FIG. 5D shows an example of a management table 504 for managing the data combinations that the server 101 presents or proposes to the user. This table is created or updated by the data management module 321. The management table 504 stores information on the data that the server 101 proposes to the user based on the analysis purpose information 401 from the user, the reference information 221, and the target data 311.

提案実行管理モジュール３２６は、ユーザからの分析目的情報４０１を受け付けて、分析ステップを推奨し、そして、分析に利用されるデータを提案する際に、テーブル５０４を作成、又は、更新する。このテーブル５０４は、識別情報５４１、ＫＰＩ（５１３）、データ組合せ５４３、利用回数５４４、利用人数５４５、更新日時５４６を含む。 The proposal execution management module 326 receives the analysis purpose information 401 from the user, recommends the analysis step, and creates or updates the table 504 when proposing the data to be used for the analysis. This table 504 includes identification information 541, KPI (513), data combination 543, number of uses 544, number of users 545, update date and time 546.

識別情報５４１には、ユーザへのデータ提案を識別するための情報が格納される。ＫＰＩ（５１３）には、識別情報５４１によって特定されるデータ提案におけるＫＰＩデータが格納される。データ組合せ５４３には、識別情報５４１により特定されるユーザに提案されるデータの組合せが、分析ステップが全体像把握、又は、特異事象抽出である場合に格納され、分析ステップが要因分析・予測である場合、説明変数の組合せが格納される。 The identification information 541 stores information for identifying a data proposal to the user. The KPI (513) stores the KPI data in the data proposal specified by the identification information 541. In the data combination 543, the combination of data proposed to the user specified by the identification information 541 is stored when the analysis step is grasping the whole picture or extracting a peculiar event, and the analysis step is factor analysis / prediction. If so, the combination of explanatory variables is stored.

利用回数５４４は、識別情報５４１によって特定されるデータ提案におけるデータ組合せの利用回数に関する情報が格納され、ユーザによる利用の都度更新される。 The usage count 544 stores information regarding the usage count of the data combination in the data proposal specified by the identification information 541, and is updated each time the user uses the data combination.

利用人数５４５には、識別情報５４１によって特定されるデータ提案におけるデータ組合せの利用人数に関する情報が格納される。この利用人数は、延べ人数でもよいし、異なるユーザの人数でもよい。この情報は、識別情報５４１により特定されるデータ提案が、ユーザに利用される都度更新される。更新日時５４６には、候補リスト５０１のレコードが更新された日時が格納される。 The number of users 545 stores information regarding the number of users of the data combination in the data proposal specified by the identification information 541. The number of users may be the total number of users or the number of different users. This information is updated each time the data proposal specified by the identification information 541 is used by the user. The update date and time 546 stores the date and time when the record in the candidate list 501 was updated.
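As a rough illustration of how one row of the management table 504 might be held and updated, the following Python sketch models the fields described above. The class name, field names, and the `record_use` method are hypothetical, and refreshing the update date and time on every use is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set, Tuple


@dataclass
class ProposalRecord:
    """One row of a table like management table 504 (names are illustrative)."""
    proposal_id: str                       # identification information 541
    kpi: str                               # KPI 513
    data_combination: Tuple[str, ...]      # data combination 543
    use_count: int = 0                     # number of uses 544
    distinct_users: Set[str] = field(default_factory=set)  # basis for number of users 545
    updated_at: Optional[datetime] = None  # update date and time 546

    def record_use(self, user_id: str, now: Optional[datetime] = None) -> None:
        """Update the counters each time a user uses this proposal."""
        self.use_count += 1
        self.distinct_users.add(user_id)
        # Assumption: the timestamp is refreshed on every use.
        self.updated_at = now or datetime.now()


rec = ProposalRecord("P001", "delay", ("time", "kilometrage"))
for user in ("alice", "alice", "bob"):
    rec.record_use(user)
```

Keeping a set of user IDs supports the distinct-user variant of the number of users; under the assumption that each use is made by one user, the cumulative variant equals `use_count`.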

図６は、サーバ１０１がユーザに分析ステップを推奨するための動作の一例に係るフローチャートである。サーバ１０１は、ユーザの入力情報２３１と、基準情報２２１、対象データ３１１とに基づいて、ユーザの分析目的が適切か否かを判定し、判定結果を、ユーザに推奨する分析ステップとして出力する。 FIG. 6 is a flowchart relating to an example of the operation for the server 101 to recommend the analysis step to the user. The server 101 determines whether or not the user's analysis purpose is appropriate based on the user's input information 231 and the reference information 221 and the target data 311 and outputs the determination result as an analysis step recommended to the user.

The server 101 starts the flowchart when the user registers (231) the analysis purpose information 401. In step 601, the analysis step proposal module 324 compares the names of the KPI and of each data item of interest included in the analysis purpose information 401 with the column names of the tables of the target data 311, and calculates the result of the comparison as a degree of agreement. Identical names naturally count as a "match," and similar names may also be counted as a "match." "Similar" refers to, for example, synonyms. The "degree of agreement" may be a cumulative count of matches, a proportion of matches, the result of a comparison with a threshold, or the like. When there are multiple sets of target data, the module 324 may calculate the degree of agreement for each set and, for example, sum them to obtain the degree of agreement for the multiple sets.
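The name comparison of step 601 could be sketched in Python as follows. This is a minimal illustration that takes the "proportion of matches" reading of the degree of agreement; the function name `match_degree`, the synonym dictionary, and the sample column names are assumptions for the example and do not appear in the specification.

```python
def match_degree(names, column_names, synonyms=None):
    """Return the fraction of names that match a column name.

    A name matches when it, or one of its registered synonyms, equals a
    column name (case-insensitive).
    """
    synonyms = synonyms or {}
    columns = {c.lower() for c in column_names}
    hits = 0
    for name in names:
        candidates = {name.lower()} | {s.lower() for s in synonyms.get(name, ())}
        if candidates & columns:
            hits += 1
    return hits / len(names) if names else 0.0


# KPI and data-item-of-interest names taken from the analysis purpose (sample values).
purpose_names = ["delay", "time", "kilometrage"]
columns = ["Time", "train_no", "delay_sec", "kilometrage"]
degree = match_degree(purpose_names, columns, {"delay": ["delay_sec"]})
```

The same function could be reused for steps 602 and 603 by passing the data item names of the data catalog or the data definition instead of the table column names.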

次いで、分析ステップ提案モジュール３２４は、ステップ６０２において、ＫＰＩ、そして、着目データ項目夫々の名称と、データカタログ４０２のデータ項目名とを比較して既述の一致度を計算する。 Next, in step 602, the analysis step proposal module 324 compares the names of the KPIs and the data items of interest with the data item names of the data catalog 402, and calculates the degree of agreement described above.

次いで、分析ステップ提案モジュール３２４は、ステップ６０３において、ＫＰＩ、そして、着目データ項目夫々の名称と、データ定義３１５（４Ｗ辞書、データ推移パタン辞書）にあるデータ項目名とを比較し、一致度を求める。 Next, in step 603, the analysis step proposal module 324 compares the names of the KPIs and the data items of interest with the data item names in the data definition 315 (4W dictionary, data transition pattern dictionary), and determines the degree of agreement. Ask.

A high degree of agreement in step 601 indicates that the amount of target data is not insufficient for the user's analysis purpose and/or that the quality of the target data is high. A high degree of agreement in steps 602 and 603 indicates that the information used to sort, classify, discriminate, or evaluate the target data suits the user's analysis purpose. The degree of agreement may also be called, for example, the degree of fit, affinity, or relevance.

分析ステップ提案モジュール３２４は、ステップ６０４において、ユーザからの分析目的情報４０１にある分析ステップが、“要因分析／予測”であるか否かを判定する。分析ステップ提案モジュール３２４が、ステップ６０４を否定判定すると、ステップ６０５において、基準情報（データカタログ４０２、データ定義３１５）の充実度を算出する。充実度とは、基準情報の分析目的に対する有効性、有用性、又は、信頼性を表す指標、例えば、情報の豊富さの程度を示す指標であり、これは、例えば、データカタログ４０２、データ定義３１５への情報登録件数、そして、参照回数、等のアクセス頻度に基づいて決定されてよい。充実度を有効度等と言い換えてもよい。 In step 604, the analysis step proposal module 324 determines whether or not the analysis step in the analysis purpose information 401 from the user is “factor analysis / prediction”. When the analysis step proposal module 324 determines that step 604 is negative, in step 605, the degree of fulfillment of the reference information (data catalog 402, data definition 315) is calculated. The degree of fulfillment is an index showing the effectiveness, usefulness, or reliability of the reference information for the purpose of analysis, for example, an index showing the degree of abundance of information, which is, for example, data catalog 402, data definition. It may be determined based on the number of information registrations in 315 and the access frequency such as the number of references. The degree of fulfillment may be rephrased as the degree of effectiveness.

When the analysis step proposal module 324 makes an affirmative determination in step 604, it proceeds to step 606 and calculates the degree of fulfillment of the processed data 312, which provides the explanatory variable candidates for "factor analysis/prediction" as the analysis step. This degree of fulfillment may be calculated from the number of column names in the processed data and the number of valid records.

分析ステップ提案モジュール３２４は、ステップ６０７において、ステップ６０５〜６０６の結果に基づいて、図７Ａ等に示す、ユーザが登録した分析ステップ２３１の適否を判定するためのマトリックス３１３を参照して、ユーザに推奨すべき、分析ステップ、及び／又は、作業項目を、判定、判別等をすることによって決定する。 In step 607, the analysis step proposal module 324 refers to the matrix 313 for determining the suitability of the analysis step 231 registered by the user, which is shown in FIG. 7A or the like, based on the results of steps 605 to 606, to the user. The analysis steps and / or work items that should be recommended are determined by making judgments, discriminations, and the like.

分析ステップ提案モジュール３２４は、ステップ６０７の決定内容をステップ６０８において、ユーザに提示して、ユーザの確認、選択等を求める。以上によって、サーバ１０１は、フローチャートを終了する。なお、分析ステップ、そして、作業項目を纏めて、例えば、分析態様、又は、分析手法等と呼んでよい。 The analysis step proposal module 324 presents the determined content of step 607 to the user in step 608, and requests confirmation, selection, and the like of the user. As a result, the server 101 ends the flowchart. The analysis steps and work items may be collectively referred to as, for example, an analysis mode or an analysis method.

As the matrix serving as the criterion for determining the analysis step to propose to the user, the analysis step proposal module 324 adopts matrix 701 (FIG. 7A) when the user has selected "overall picture grasp" as the analysis purpose information (analysis step), matrix 702 (FIG. 7B) when the user has selected "singular event extraction," and matrix 703 (FIG. 7C) when the user has selected "factor analysis/prediction."

The determination matrix 701 (FIG. 7A) defines the relationship between the degree of agreement with the analysis purpose information from steps 601 to 603 and the degree of fulfillment of the data catalog and data definition from step 605. The analysis step proposal module 324 may decide whether each of the degree of agreement and the degree of fulfillment is high or low by comparing it with a predetermined threshold.

判定マトリックス７０１の“分析目的情報との一致度”は、ステップ６０１−６０３夫々の一致度を、例えば、加算したもの、平均したもの等でよい。ステップ６０１−６０３夫々の一致度のうち、所定のステップの一致度を優先させるようにしてもよい。判定マトリックス７０１の“データカタログ／データ定義の充実度”は、夫々の充実度、例えば、加算したもの、平均したもの等でよい。一方の充実度を優先させるようにしてもよい。 The “degree of agreement with the analysis purpose information” of the determination matrix 701 may be, for example, an addition or an average of the degree of agreement in each of steps 601-603. Of the matching degrees of each of the steps 601-603, the matching degree of a predetermined step may be prioritized. The “data catalog / data definition fulfillment” of the determination matrix 701 may be each fulfillment, for example, an addition, an average, or the like. One of the fulfillment levels may be prioritized.

一致度が“高い”、かつ、充実度が“高い”場合、そして、一致度が“高い”、かつ、充実度が“低い”場合、ユーザの分析目的に適する、対象データは十分量存在し得るから、分析ステップ提案モジュール３２４は、ユーザがサーバ１０１に登録した分析ステップとしての“全体像把握”を、そのまま推奨してユーザに提示する。 If the degree of agreement is "high" and the degree of fulfillment is "high", and if the degree of agreement is "high" and the degree of fulfillment is "low", there is a sufficient amount of target data suitable for the user's analysis purpose. Therefore, the analysis step proposal module 324 recommends the "overall picture grasp" as the analysis step registered in the server 101 by the user as it is and presents it to the user.

On the other hand, when the degree of agreement is "low" and the degree of fulfillment is "high," the amount of target data may be insufficient, so it is not desirable to carry out "overall picture grasp" immediately. The analysis step proposal module 324 therefore keeps "overall picture grasp" as the analysis step but first recommends "data addition" as the work item needed to realize it.

また、一致度が“低い”、かつ、充実度が“低い”場合、ユーザの分析目的に適するデータは存在していたとしても、そもそも、データカタログ４０２、そして、データ定義３１５の質が十分でない可能性があるため、分析ステップ提案モジュール３２４は、分析ステップとして“全体像把握”の実施は好ましくないと判定して、作業項目として、統計によるデータ理解促進、データカタログ４０２、データ定義３１５の拡充を推奨する。 Further, when the degree of agreement is "low" and the degree of fulfillment is "low", the quality of the data catalog 402 and the data definition 315 is not sufficient in the first place even if there is data suitable for the user's analysis purpose. Since there is a possibility, the analysis step proposal module 324 determines that it is not preferable to carry out "overall picture grasp" as an analysis step, and promotes data understanding by statistics, expands data catalog 402, and expands data definition 315 as work items. Is recommended.
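The four cases of the determination matrix 701 can be summarized in a small Python sketch. The threshold values, the function name `recommend_for_overview`, and the `Recommendation` type are assumptions for illustration; the specification only states that each score is compared with a predetermined threshold.

```python
from typing import NamedTuple, Optional, Tuple


class Recommendation(NamedTuple):
    analysis_step: Optional[str]   # None means the step is not recommended as-is
    work_items: Tuple[str, ...]


def recommend_for_overview(agreement: float, fulfillment: float,
                           agreement_threshold: float = 0.5,
                           fulfillment_threshold: float = 0.5) -> Recommendation:
    """Apply a matrix-701-style decision; the 0.5 thresholds are placeholders."""
    agreement_high = agreement >= agreement_threshold
    fulfillment_high = fulfillment >= fulfillment_threshold
    if agreement_high:
        # High agreement: recommend the step as registered, whatever the fulfillment.
        return Recommendation("overall picture grasp", ())
    if fulfillment_high:
        # Low agreement, high fulfillment: keep the step but add data first.
        return Recommendation("overall picture grasp", ("data addition",))
    # Low agreement, low fulfillment: improve understanding and reference information.
    return Recommendation(None, ("promote data understanding through statistics",
                                 "expand data catalog 402 and data definition 315"))
```

Returning the work items separately from the analysis step mirrors the text, where a step can be kept while preparatory work is recommended first.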

In the determination matrix 702 (FIG. 7B), "degree of agreement with the analysis purpose information," "high," and "low" have the same meaning as in the determination matrix 701. The "degree of fulfillment of data matching the analysis purpose information" indicates the degree of fulfillment of the data in the target data 311 that is determined in steps 601 to 603 to have a high degree of agreement with the analysis purpose information 401 from the user, or of the data in the target data 311 (or the processed data 312) that corresponds to the data items of the data catalog 402 and the data definition 315.

この充実度は、これらデータ（テーブル）の有効なレコードの数の多さから算出されてよい。一致度が、“高い”、かつ、充実度が“高い”場合、分析ステップ提案モジュール３２４は、ユーザの分析目的に適する、対象データ量は十分であり得るから、ユーザが登録した、分析ステップとしての“特異事象抽出”を、そのまま推奨する。 This degree of fulfillment may be calculated from the large number of valid records of these data (tables). When the degree of agreement is "high" and the degree of fulfillment is "high", the analysis step proposal module 324 is suitable for the analysis purpose of the user, and the target data amount may be sufficient. "Extraction of peculiar events" is recommended as it is.

一方、一致度が“高い”、かつ、充実度が“低い”場合、ユーザの分析目的に適するデータが十分に存在し得ても、特異事象としてのデータレコード数が相対的に不足している可能性があるため、分析ステップ提案モジュール３２４は、分析ステップとして“特異事象抽出”の実施は困難であると判定して、作業項目として、対象データを追加することを推奨する。 On the other hand, when the degree of agreement is "high" and the degree of fulfillment is "low", the number of data records as a peculiar event is relatively insufficient even if there may be sufficient data suitable for the user's analysis purpose. Since there is a possibility, the analysis step proposal module 324 determines that it is difficult to carry out "specific event extraction" as an analysis step, and recommends that the target data be added as a work item.

また、一致度が“低い”、かつ、データの充実度が“高い”場合、又は、一致度が“低い”、かつ、充実度が“低い”場合、分析ステップ提案モジュール３２４は、ユーザが望む分析目的に適する、対象データが不足している、もしくは、ユーザが、十分に、データを理解できていない可能性があると判断して、分析ステップとして“特異事象抽出”を実施することは困難であり、ユーザに、先ず、“全体像把握”に変更して、“全体像把握”に戻って、これからデータ分析を始めることを推奨する。この場合、分析ステップ提案モジュール３２４は、判定マトリックス７０１（図７Ａ）に基づいて、“全体像把握”としての適否を判定する。 Further, when the degree of agreement is "low" and the degree of data enrichment is "high", or when the degree of agreement is "low" and the degree of enrichment is "low", the analysis step proposal module 324 is desired by the user. It is difficult to perform "specific event extraction" as an analysis step, judging that the target data is insufficient or the user may not fully understand the data, which is suitable for the analysis purpose. Therefore, it is recommended that the user first change to "Overview grasp", return to "Overview grasp", and start data analysis from now on. In this case, the analysis step proposal module 324 determines the suitability as "overall image grasp" based on the determination matrix 701 (FIG. 7A).

Regarding the "degree of agreement with the analysis purpose information" in the determination matrix 703 (FIG. 7C), "high" and "low" have the same meaning as in the determination matrix 701. The "degree of fulfillment of processed data" in the determination matrix 703 applies to the processed data; the meaning of "degree of fulfillment" and of its "high" and "low" values is as described above.

一致度が“高い”、かつ、充実度が“高い”場合、ユーザの分析目的に適する対象データ量は十分であるから、分析ステップ提案モジュール３２４は、ユーザが指定した分析ステップとしての“要因分析／予測”をそのまま推奨する。 When the degree of agreement is “high” and the degree of fulfillment is “high”, the amount of target data suitable for the user's analysis purpose is sufficient. Therefore, the analysis step proposal module 324 uses “factor analysis” as the analysis step specified by the user. / Prediction is recommended as it is.

When the degree of agreement is "high" and the degree of fulfillment is "low," target data suited to the user's analysis purpose may exist in sufficient quantity, but the number of records of processed data, which supplies the explanatory variables, may be insufficient for factor analysis/prediction. The analysis step proposal module 324 therefore determines that "factor analysis/prediction" is difficult to carry out as the analysis step and recommends that the user expand the processed data as a work item.

一致度が“低い”、かつ、充実度が“高い”場合、分析ステップ提案モジュール３２４は、ユーザの分析目的に適する、対象データ自体が不足している可能性があるから、ユーザに、“全体像把握”に戻って検討することを推奨する。 If the degree of agreement is "low" and the degree of fulfillment is "high", the analysis step proposal module 324 may lack the target data itself, which is suitable for the user's analysis purpose. It is recommended to go back to "image grasping" and consider it.

When the degree of agreement is "low" and the degree of fulfillment is "low," the target data suited to the user's analysis purpose may itself be insufficient or the user's understanding of the data may be inadequate, and the number of records of data supplying the explanatory variables may be insufficient for factor analysis/prediction. The analysis step proposal module 324 therefore recommends that the user return to "overall picture grasp" and, as a work item, expand the processed data.
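The determination matrices 702 and 703 described above lend themselves to a lookup-table representation. The sketch below keys each table on (agreement is high, fulfillment is high); the dictionary names, the `recommend` helper, and the use of `None` for "not recommended as-is" are illustrative assumptions.

```python
# Each entry maps (agreement_high, fulfillment_high) to
# (recommended analysis step or None, work items).
MATRIX_702 = {
    (True, True): ("singular event extraction", ()),
    (True, False): (None, ("add target data",)),
    (False, True): ("overall picture grasp", ()),
    (False, False): ("overall picture grasp", ()),
}

MATRIX_703 = {
    (True, True): ("factor analysis/prediction", ()),
    (True, False): (None, ("expand processed data",)),
    (False, True): ("overall picture grasp", ()),
    (False, False): ("overall picture grasp", ("expand processed data",)),
}


def recommend(matrix, agreement_high, fulfillment_high):
    """Look up the recommendation for one cell of a determination matrix."""
    return matrix[(agreement_high, fulfillment_high)]


step, work_items = recommend(MATRIX_703, False, False)
```

For matrix 702, a result of "overall picture grasp" is then checked for suitability against the determination matrix 701, as stated in the description of FIG. 7B.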

図８は、サーバ１０１がユーザに、分析ステップとしての“全体像把握”に利用可能なデータの組合せを、ユーザに提示するための動作の一例を示すフローチャートである。サーバ１０１は、判定マトリックス７０１に基づいて、分析ステップとして、“全体像把握”を決定すると、図８のフローチャートをスタートさせる。 FIG. 8 is a flowchart showing an example of an operation for the server 101 to present to the user a combination of data that can be used for “grasping the whole picture” as an analysis step. When the server 101 determines "overall image grasp" as an analysis step based on the determination matrix 701, the server 101 starts the flowchart of FIG.

分析用データ提案モジュール３２５は、ステップ８０１において、ユーザ２０１から受け付けた分析目的情報４０１（２３１）からＫＰＩと着目データ項目とを抽出する。次いで、分析用データ提案モジュール３２５は、ステップ８０２において、対象データ３１１のテーブルから前記ＫＰＩに該当するカラム、即ち、ＫＰＩの名称と同一又は類似名のカラムを抽出する。図１２において、対象データ３１１は２つのデータ（テーブル）３１１Ａ，３１１Ｂを備え、テーブル３１１ＡのカラムＢがＫＰＩと同一名のカラムである。 In step 801 the analysis data proposal module 325 extracts the KPI and the data item of interest from the analysis purpose information 401 (231) received from the user 201. Next, in step 802, the analysis data proposal module 325 extracts a column corresponding to the KPI, that is, a column having the same or similar name as the KPI name from the table of the target data 311. In FIG. 12, the target data 311 includes two data (tables) 311A and 311B, and the column B of the table 311A is a column having the same name as the KPI.

In step 803, the analysis data proposal module 325 extracts from the target data 311 the columns corresponding to the data items of interest. In step 804, the analysis data proposal module 325 extracts from the target data 311 the columns that correspond to (are identical or similar to) the keywords for each of the 4Ws (When, Where, What, Who) in the 4W dictionary 221. If a column already extracted in step 802 and/or step 803 is extracted again in step 804, the analysis data proposal module 325 excludes it.

分析用データ提案モジュール３２５は、ステップ８０５において、対象データ３１１における各カラムのレコードを参照して、データ推移パタン定義４０４にある４Ｗ辞書２２１の４Ｗの夫々のキーワードのデータ値の推移パタンに該当するカラムを対象データ３１１から抽出する。なお、分析用データ提案モジュール３２５は、ステップ８０２、及び／又は、ステップ８０３において、抽出されたカラムが、ステップ８０５で再度抽出された場合には、これを除く。 In step 805, the data proposal module 325 for analysis refers to the record of each column in the target data 311 and corresponds to the transition pattern of the data value of each of the 4W keywords of the 4W dictionary 221 in the data transition pattern definition 404. The column is extracted from the target data 311. The analysis data proposal module 325 excludes the columns extracted in step 802 and / or step 803 when they are extracted again in step 805.

分析用データ提案モジュール３２５は、ステップ８０６において、ステップ８０３、ステップ８０４、又は、ステップ８０５で抽出されたカラムの優先度を設定する。分析用データ提案モジュール３２５は、例えば、前記３つのステップにて抽出されたカラムの優先度を２つのステップもしくは１つのステップのみで抽出されたカラムより優先度を高くする。 In step 806, the analysis data proposal module 325 sets the priority of the columns extracted in step 803, step 804, or step 805. The analysis data proposal module 325 sets the priority of the column extracted in the above three steps higher than that of the column extracted in only two steps or one step, for example.
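One way to picture the priority rule of step 806 is to count how many of the three extraction steps returned each column and sort by that count. The following Python sketch does this; the function name `rank_columns`, the alphabetical tie-break, and the sample column names are assumptions for illustration.

```python
from collections import Counter


def rank_columns(by_item, by_keyword, by_pattern):
    """Order columns by how many extraction steps returned them.

    by_item, by_keyword, and by_pattern stand for the columns extracted in
    steps 803, 804, and 805. Ties are broken alphabetically (an assumption).
    """
    counts = Counter()
    for extracted in (by_item, by_keyword, by_pattern):
        counts.update(set(extracted))
    return sorted(counts, key=lambda column: (-counts[column], column))


ranked = rank_columns(
    by_item=["time", "kilometrage"],
    by_keyword=["time", "train_no"],
    by_pattern=["time", "kilometrage"],
)
```

Here `time` was found by all three steps, `kilometrage` by two, and `train_no` by one, so they are ranked in that order.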

分析用データ提案モジュール３２５は、ステップ８０７において、ステップ８０４−８０５のカラム（４Ｗ候補カラム）のリストを、ステップ８０６に係る優先度が分かるように、ユーザに提示し、ステップ８０８において、ユーザからの要求があれば、４Ｗ候補カラムの絞込みを実施して、一部のカラムをデータ分析対象から除外することができる。 In step 807, the data proposal module 325 for analysis presents a list of columns (4W candidate columns) of steps 804-805 to the user so that the priority related to step 806 can be known, and in step 808, the user presents the list. If requested, 4W candidate columns can be narrowed down and some columns can be excluded from the data analysis target.

In step 809, the analysis data proposal module 325 creates combinations consisting of the column corresponding to the KPI extracted in step 802 (KPI column), the columns corresponding to the data items of interest extracted in step 803 (data-item-of-interest columns), and the columns resulting from steps 804 to 808 (4W candidate columns). Hereinafter, such a combination is called a "composite column combination."

ステップ８０３で抽出した着目データ項目に該当するカラムと、ステップ８０４−８０８に係る４Ｗ候補カラムと、を纏めて、ＫＰＩカラムに対して軸候補となる“軸候補カラム”と称することとする。 The column corresponding to the data item of interest extracted in step 803 and the 4W candidate column according to step 804-808 are collectively referred to as an "axis candidate column" which is an axis candidate for the KPI column.

図１２において、テーブル３１１ＡのカラムＣが軸候補カラム１であり、テーブル３１１ＡのカラムＤが軸候補カラム２であり、テーブル３１１ＢのカラムＧが軸候補カラム２であり、テーブル３１１ＡのカラムＨが軸候補カラム３である（カラムＤとカラムＧの名称は同一または類似）。テーブル３１１Ｃが合成カラム組合せである。 In FIG. 12, column C of table 311A is axis candidate column 1, column D of table 311A is axis candidate column 2, column G of table 311B is axis candidate column 2, and column H of table 311A is axis. Candidate column 3 (the names of column D and column G are the same or similar). Table 311C is a composite column combination.

When KPI columns with the same or similar names appear in multiple data tables, or when axis candidate columns with the same or similar names appear in multiple data tables, the analysis data proposal module 325 creates multiple composite column combinations by swapping the duplicated columns. In the example of FIG. 12, the analysis data proposal module 325 creates two versions of table 311C as composite column combinations: one in which axis candidate column 2 is column D of table 311A, and one in which axis candidate column 2 is column G of table 311B. That is, the analysis data proposal module 325 defines composite column combination 1 (columns B, C, D, H) and composite column combination 2 (columns B, C, G, H). If the number of KPI columns with the same name is k, the number of columns with the same name for axis candidate 1 is m1, for axis candidate 2 is m2, ..., and for axis candidate n is mn, the number of composite column combinations is k × m1 × m2 × ... × mn. Alternatively, a composite column combination is created by joining tables 311A and 311B using the axis candidate columns with the same or similar names as the key.
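The enumeration of composite column combinations is a Cartesian product over the interchangeable columns, which the following Python sketch reproduces for the FIG. 12 example. The function name `composite_combinations` and the representation of columns by their letters are assumptions for illustration.

```python
from itertools import product


def composite_combinations(kpi_columns, axis_candidates):
    """Enumerate every composite column combination.

    kpi_columns: interchangeable KPI columns (same or similar name).
    axis_candidates: one list of interchangeable columns per axis candidate.
    """
    return list(product(kpi_columns, *axis_candidates))


# FIG. 12 example: KPI column B; axis candidate 1 is C, axis candidate 2 is
# D (table 311A) or G (table 311B), axis candidate 3 is H.
combos = composite_combinations(["B"], [["C"], ["D", "G"], ["H"]])
```

With k = 1, m1 = 1, m2 = 2, and m3 = 1, the product yields 1 × 1 × 2 × 1 = 2 combinations, matching composite column combinations 1 and 2 in the text.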

In step 810, the analysis data proposal module 325 excludes, from the multiple composite column combinations, any column combination whose number of valid records is zero. A column combination with zero valid records is one in which no data is recorded in any of its records.
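The exclusion in step 810 can be sketched as a filter on a per-combination count of valid records. The text does not define a valid record precisely, so this sketch assumes that a record is valid when every column of the combination holds a value; the function names and sample data are hypothetical.

```python
def valid_record_count(records):
    """Count records in which every column holds a value (assumed definition)."""
    return sum(
        1
        for record in records
        if all(value is not None and value != "" for value in record.values())
    )


def drop_empty_combinations(combinations):
    """Keep only the combinations that have at least one valid record."""
    return {
        name: records
        for name, records in combinations.items()
        if valid_record_count(records) > 0
    }


# Hypothetical sample data keyed by combination name.
sample = {
    "combination 1": [
        {"B": 10, "C": "line-1", "D": "2019-01", "H": "a"},
        {"B": None, "C": "line-2", "D": "2019-02", "H": "b"},
    ],
    "combination 2": [
        {"B": None, "C": None, "G": None, "H": None},
    ],
}

kept = drop_empty_combinations(sample)
print(list(kept))
```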

Next, in step 811, the analysis data proposal module 325 determines, for each composite column combination, the characteristics or attributes of its records, for example the variation pattern of the data values contained in the records. The analysis data proposal module 325 reflects this determination in creating the analysis data candidates.

The data value variation pattern is, for example, "continuous value", "continuous repeating value", or "discrete value". A "continuous value" is numerical data in which nearly every record has a different value. A "continuous repeating value" is numerical data in which, across all records, the value repeatedly rises and falls within a fixed range. A "discrete value" is character string data or numerical data in which the number of unique values is at most a fixed proportion of the number of records.
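The three patterns can be approximated with a simple heuristic on the ratio of unique values to records. This is only a sketch under stated assumptions: the thresholds `discrete_ratio` and `distinct_ratio` are hypothetical, and treating every remaining numerical column as a "continuous repeating value" is a crude stand-in for actually detecting rise-and-fall cycles, which the text does not specify.

```python
def classify_variation_pattern(values, discrete_ratio=0.2, distinct_ratio=0.9):
    """Classify a column's values into one of the three named patterns.

    Both thresholds are assumptions chosen for illustration only.
    """
    if not values:
        raise ValueError("values must not be empty")
    unique_ratio = len(set(values)) / len(values)
    # Strings or numbers with few unique values relative to the record count.
    if unique_ratio <= discrete_ratio:
        return "discrete value"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return None  # the text names no pattern for this case
    # Numerical data in which nearly every record has a different value.
    if unique_ratio >= distinct_ratio:
        return "continuous value"
    # Remaining numerical data: assumed here to repeat within a range.
    return "continuous repeating value"


print(classify_variation_pattern(["A", "B"] * 10))
print(classify_variation_pattern([x * 0.5 for x in range(40)]))
print(classify_variation_pattern([i % 20 for i in range(40)]))
```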

In step 812, the analysis data proposal module 325 determines the data value variation pattern for each composite column combination. When the pattern is "continuous value", the module proceeds to step 813, where, for each composite column combination, it joins all the composite column combinations and calculates the total number of records.

When the pattern determined in step 812 is "continuous repeating value", the analysis data proposal module 325 proceeds to step 814 and classifies the records of each composite column combination by repetition range of the data. It then proceeds to step 815, where it joins the composite column combinations for each repetition range and calculates the number of records.

When the pattern determined in step 812 is "discrete value", the analysis data proposal module 325 proceeds to step 816 and selects one or more other axis candidate columns that fall under the same 4W classification as the axis candidate column selected in step 811 and that have also been determined to be "discrete value". In step 817, it joins the KPI column with the multiple axis candidate columns selected in step 816 and calculates the number of records for each classification of character string data and numerical data.

In step 818, the analysis data proposal module 325 totals the number of records for each data value variation pattern based on the results of steps 811 to 817. In step 819, it determines whether processing has been completed for all composite column combinations; if not, it returns to step 811, and if so, it proceeds to step 820.

In step 820, the analysis data proposal module 325 evaluates the usefulness of every combination for each data value variation pattern, for example using the number of valid records (the amount of data) as the key. It sorts all combinations in descending order of evaluation value, creates a candidate list (501 in FIG. 5A), and presents it to the user.
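The ranking in step 820 can be sketched as a descending sort on the number of valid records. The record layout, field names, and sample counts below are hypothetical; only the sort key and the sort direction come from the text.

```python
def build_candidate_list(combinations):
    """Rank combinations by valid record count, largest first."""
    ranked = sorted(combinations, key=lambda c: c["valid_records"], reverse=True)
    return [{"rank": i, **c} for i, c in enumerate(ranked, start=1)]


# Hypothetical evaluation results for three composite column combinations.
evaluated = [
    {"name": "combination 1", "pattern": "continuous value", "valid_records": 1200},
    {"name": "combination 2", "pattern": "discrete value", "valid_records": 4800},
    {"name": "combination 3", "pattern": "continuous value", "valid_records": 300},
]

candidate_list = build_candidate_list(evaluated)
for row in candidate_list:
    print(row["rank"], row["name"], row["valid_records"])
```

Because Python's sort is stable, combinations with equal record counts keep their input order; the text does not say how ties are resolved.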

The analysis data proposal module 325 stores the rank determined in step 820 in rank 511 of candidate list 501, the KPI in KPI 513, the data items of interest in data of interest 1-4 (514-517), the number of valid records in record count 518, and the file output destination of the data combination in output file 519.

In step 821, the analysis data proposal module 325 performs a trend analysis on each column combination in candidate list 501, renders the results as graphs, and presents them to the user. The user can specify and select a given combination by referring to the graphs. The user may also select a desired combination from the candidate list; in this case, a graph may be created only for the selected combination.

In step 822, the analysis data proposal module 325 outputs the records of the columns in the combination specified by the user as a CSV file or the like. The user can use the CSV file when performing a trend analysis of the target data again.

In step 823, the analysis data proposal module 325 updates or adds to data catalog 402 and data definition 221 based on candidate list 501 created in step 820 and the combination specified in step 822. As a result, the data catalog and the data definition improve as data analysis progresses.

FIG. 9 is a flowchart showing an example of the operation by which the server 101 presents to the user the combinations of data available for "singular event extraction" as an analysis step. When the server 101 determines "singular event extraction" as the analysis step based on determination matrix 702, it starts the flowchart of FIG. 9.

In the flowchart of FIG. 9, steps 801 to 819 are the same as in the flowchart of FIG. 8. In step 903, the analysis data proposal module 325 ranks all column combinations and extracts the combinations whose rank is at or above a threshold. The ranking may be based on the number of valid records in each combination.

In step 904, the analysis data proposal module 325 determines whether singular points are present in the data of one of the combinations extracted in step 903. If the determination in step 905 is affirmative, it calculates the number of singular points in step 906. A singular point may be, for example, an outlier, a large change point, or a large bias. The analysis data proposal module 325 adds the information on the singular points to singular event list 502.

If the determination in step 905 is negative, the analysis data proposal module 325 skips step 906 and proceeds to step 907, where it determines whether steps 904, 905, and 906 have been applied to all of the combinations described above. If not, it returns to step 904; if so, it proceeds to step 908.

In step 908, the analysis data proposal module 325 ranks the usefulness of all combinations, for example by sorting them in ascending order based on the number of singular points, creates singular event list 502, and presents it to the user. The number of singular points may be, for example, the sum of the number of outliers, the number of change points, and the number of biases, or their average.
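Steps 904 to 908 can be sketched as counting singular points per combination and sorting on the total. The text does not say how outliers or change points are detected, so the 3-sigma rule, the fixed jump threshold, and all function names below are assumptions made for illustration; biases are left out of this sketch.

```python
from statistics import mean, pstdev


def count_outliers(values, z=3.0):
    """Count values more than z standard deviations from the mean (assumed rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0
    return sum(1 for v in values if abs(v - mu) > z * sigma)


def count_change_points(values, jump=10.0):
    """Count consecutive differences larger than a fixed jump (assumed rule)."""
    return sum(1 for a, b in zip(values, values[1:]) if abs(b - a) > jump)


def build_singular_event_list(series_by_combination):
    """Total the singular points per combination and rank in ascending order."""
    totals = {
        name: count_outliers(vals) + count_change_points(vals)
        for name, vals in series_by_combination.items()
    }
    ordered = sorted(totals.items(), key=lambda item: item[1])
    return [
        {"rank": i, "combination": name, "singular_points": total}
        for i, (name, total) in enumerate(ordered, start=1)
    ]


# Hypothetical KPI series for two combinations.
series = {
    "spiky": [5.0] * 19 + [100.0],
    "steady": [5.0] * 20,
}
singular_event_list = build_singular_event_list(series)
for row in singular_event_list:
    print(row)
```

The ascending sort follows the example given for step 908; a caller who wants the most anomalous combination first would pass `reverse=True` to `sorted`.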

The analysis data proposal module 325 stores the rank of each combination in rank 511 of singular event list 502, the KPI in KPI 513, the data items of interest in data of interest 1-4 (514-517), the number of valid records of each combination in record count 518, the numbers of singular points of each combination (outliers, large change points, large biases) in singular point counts 520-522, and the output destination of each combination's data file in output file 519.

In step 909, the analysis data proposal module 325 creates a singular event calculation graph for every combination and presents it to the user. The remaining processing is the same as steps 822 and 823 of the flowchart of FIG. 8.

FIG. 10 is a flowchart showing an example of the operation by which the server 101 presents to the user, as a list in the same way as in the flowcharts of FIGS. 8 and 9, the combinations of data available for factor analysis/prediction as an analysis step. When the server 101 determines "factor analysis/prediction" as the analysis step based on determination matrix 703, it starts the flowchart of FIG. 10.

On starting the flowchart of FIG. 10, the analysis data proposal module 325 extracts, in step 1001, the KPI (KPI name and KPI calculation formula) based on analysis purpose information 401 received from the user.

In step 1002, the analysis data proposal module 325 extracts the column name of target data 311 and/or processed data 312 that corresponds to the KPI name and uses that column as the objective variable column. The analysis data proposal module 325 may instead create the objective variable column from the KPI calculation formula. In FIG. 13, column B of target data 311C is the objective variable column corresponding to the KPI.

In step 1003, the analysis data proposal module 325 selects a predetermined number of candidate explanatory variable columns from the table of target data 311 or the table of processed data 312. The candidate explanatory variable columns are the columns of those tables other than the objective variable column selected in step 1002.

The analysis data proposal module 325 creates every combination of one or more explanatory variable columns for the objective variable column. FIG. 13 shows that all columns of target data 311 (table 311A) other than column B, and all columns of processed data 312 (table 311D), can be explanatory variable columns. For example, if two explanatory variable columns are to be selected, the analysis data proposal module 325 creates every two-column combination drawn from all columns of target data 311A other than column B and all columns of processed data 311D, and combines each such combination with the objective variable column to form multiple tables, each consisting of composite columns made up of the objective variable column and explanatory variable columns. Each of 311E-1 to 311E-4 and so on in FIG. 13 is such a table of composite columns.
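Building the objective/explanatory column sets can be sketched with `itertools.combinations`. The column labels below are hypothetical placeholders in a `table.column` form, since the actual columns of tables 311A and 311D appear only in FIG. 13; the function name is likewise an assumption.

```python
from itertools import combinations


def explanatory_combinations(columns, objective, size):
    """Pair the objective column with every `size`-column explanatory set."""
    candidates = [c for c in columns if c != objective]
    return [(objective, *combo) for combo in combinations(candidates, size)]


# Hypothetical column labels drawn from two tables.
all_columns = ["311A.A", "311A.B", "311A.C", "311D.X", "311D.Y"]
tables = explanatory_combinations(all_columns, objective="311A.B", size=2)
for table in tables:
    print(table)
```

With four candidate columns and two explanatory variables per table, this yields 6 composite tables, each beginning with the objective variable column.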

In step 1004, the analysis data proposal module 325 calculates the correlation coefficients between all columns of the target data table and the processed data table. In step 1005, it performs learning such as regression analysis or multiple regression analysis on every combination of the objective variable column extracted in step 1002 and the candidate explanatory variable columns, in the number selected in step 1003.
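Steps 1004 and 1005 can be illustrated for the simplest case of one explanatory variable. The sketch below computes a Pearson correlation coefficient and the coefficient of determination of a least-squares line in plain Python. The choice of Pearson correlation and of R-squared as the quality measure are assumptions, as are the sample values; the text only says "correlation coefficient" and "regression analysis".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either column is constant


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical explanatory and objective columns.
explanatory = [1.0, 2.0, 3.0, 4.0, 5.0]
objective = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(explanatory, objective)
r2 = r_squared(explanatory, objective)
print(round(r, 4), round(r2, 4))
```

For a single explanatory variable, R-squared equals the square of the correlation coefficient, which gives a quick consistency check on both functions.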

In step 1006, the analysis data proposal module 325 calculates the usefulness of every combination of the objective variable column and explanatory variable columns and, based on the results, creates a list in which all combinations are ranked. The user can select a desired combination.

The usefulness of each combination may be judged by at least one of the following: (1) the absolute value of the correlation coefficient between the objective variable and the explanatory variables (step 1004); (2) the quality of the learning result in step 1005 (for example, the coefficient of determination or the accuracy rate); (3) the total number of records containing valid values in the objective variable column and the explanatory variable columns; and (4) the absolute value of the correlation coefficient between the explanatory variables (step 1004). The analysis data proposal module 325 ranks the combinations in ascending or descending order of usefulness.

The analysis data proposal module 325 determines an overall rank by combining the ranks for usefulness (1) through usefulness (4). It stores the overall rank in rank 511 of objective variable/explanatory variable combination list 503, the KPI in objective variable (KPI) 530, the names of the one or more explanatory variables in explanatory variable 531, the number of valid records of every combination in record count 518, usefulness (1) to (4) in usefulness 1-4 (533-536), and the output destination of each combination's record file in output file 519.
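One way to combine the four per-criterion ranks into an overall rank is to sum them and sort on the total. The text does not state how the ranks are combined, so the rank-sum rule is an assumption, as is treating criterion (4) as "lower is better" on the grounds that weakly correlated explanatory variables are usually preferred. All scores below are hypothetical.

```python
def rank_positions(scores, higher_is_better=True):
    """Map each combination name to its 1-based rank for one criterion."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: i for i, name in enumerate(ordered, start=1)}


def overall_ranking(criteria):
    """Sum per-criterion ranks (assumed rule) and order by the smallest total."""
    totals = {}
    for scores, higher_is_better in criteria:
        for name, position in rank_positions(scores, higher_is_better).items():
            totals[name] = totals.get(name, 0) + position
    return sorted(totals, key=lambda name: (totals[name], name))


# Hypothetical usefulness scores for three composite tables.
corr_with_objective = {"311E-1": 0.9, "311E-2": 0.6, "311E-3": 0.3}  # (1)
learning_quality = {"311E-1": 0.8, "311E-2": 0.7, "311E-3": 0.2}     # (2)
valid_records = {"311E-1": 100, "311E-2": 500, "311E-3": 300}        # (3)
corr_between_inputs = {"311E-1": 0.1, "311E-2": 0.5, "311E-3": 0.9}  # (4)

ranking = overall_ranking([
    (corr_with_objective, True),
    (learning_quality, True),
    (valid_records, True),
    (corr_between_inputs, False),  # assumed: lower correlation ranks higher
])
print(ranking)
```

Here the rank totals are 6, 7, and 11, so the overall order is 311E-1, 311E-2, 311E-3. Ties on the total are broken alphabetically by name, which is also an assumption.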

In step 1007, the analysis data proposal module 325 creates a learning result graph for each objective variable/explanatory variable combination in the list created in step 1006 and presents the graphs to the user so that the user can select a desired combination. Graphs may also be created only for a specified number of combinations from the top of the list.

Then, in step 822, the analysis data proposal module 325 outputs, to a CSV file or the like, the data records for the explanatory variable columns and objective variable column specified by the user with reference to the results of step 1007. The user uses this file when performing factor analysis/prediction on their own.

Based on the result of creating list 503 in step 1006 and the result of the user's selection from that list, the analysis data proposal module 325 modifies the data catalog and data definition 315 and appends information to them.

FIGS. 11A to 11C are examples of graphical interfaces that the server 101 provides to user terminals 102 and 103 to support data utilization. Based on the analysis data proposal module 325 and the analysis method proposal module 324, the proposal execution management module 326 causes the user terminal that requested the analysis purpose to display the graphical interface screen.

On screen 1101 of FIG. 11A, the first area 1111 shows "grasping the overall picture" as the analysis step specified by the user and, in response, the server 101's recommendation of "data addition".

In addition, a second area 1112 and a third area 1113 are displayed as supplementary information. The second area 1112 shows the degree of match between analysis purpose information 401 and the target data and the degree of completeness of the data catalog and data definition information. The third area 1113 shows the missing data that should be added, the storage location of the data catalog, and the storage location of the data definition.

Screen 1102 of FIG. 11B is an example of candidate list 501 when the recommended analysis step is grasping the overall picture. The multiple data combinations, each combining several columns (step 820), are listed in order of usefulness (1121). Area 1122 is an example of the graphs obtained from the trend analysis of each data combination (step 821).

Screen 1103 of FIG. 11C is an example of objective variable/explanatory variable combination list 503 when the recommended analysis step is factor analysis/prediction. The multiple data combinations, each combining several columns (step 1006), are listed in order of usefulness (1131). Area 1132 is a graph showing the learning result for each data combination.

According to the embodiment described above, when user 201 performs problem analysis or the like using the diverse, large-volume data of multiple business systems, the server 101 can present the user with a list of data useful for the analysis, while reducing the user's workload, once the user registers the analysis purpose in the server. The user can thus carry out the data preparation that precedes data utilization quickly and easily.

When presenting data related to the analysis purpose to a user who is about to execute the recommended analysis step, the server 101 combines columns related to the analysis purpose (KPI, data items of interest) with columns related to reference information that is compatible with the analysis purpose and highly complete, and presents the records of both sets of columns to the user. The user can therefore obtain a broad range of data combinations related to the analysis purpose merely by registering the analysis purpose in the server.

101 Server
102, 103 User terminals
104-106 Business systems

Claims

A system that supports data preparation for data utilization, comprising:
a processing device; and
a storage device,
wherein the processing device, by executing a program recorded in the storage device:
collects business data from each of a plurality of business systems and stores the business data, at least temporarily, in the storage device as target data;
receives an analysis purpose for the target data from a user terminal;
determines, based on the analysis purpose and the target data, a recommended analysis step for the analysis purpose;
extracts, from the target data, combinations of data available for the recommended analysis step;
evaluates each of the plurality of extracted data combinations;
creates, based on the result of the evaluation, a list in which the plurality of data combinations are ranked;
outputs the list to the user terminal; and
allows the user to select a predetermined data combination from the list.

The system according to claim 1, wherein
the storage device holds reference information for the target data, and
the processing device:
determines the recommended analysis step based on the reference information; and
determines the recommended analysis step by determining, for the target data, one of a plurality of types of analysis implementation stages.

The system according to claim 2, wherein the processing device:
calculates the degree to which the analysis purpose conforms to the target data, or to the target data and the reference information;
calculates the degree of effectiveness of the reference information for the analysis purpose; and
determines, based on the results of both calculations, one of the plurality of types of analysis implementation stages.

The system according to claim 3, wherein
the analysis purpose includes an analysis implementation stage desired by the user from among the plurality of types of analysis implementation stages, and
the processing device:
evaluates, based on the results of both calculations, the analysis implementation stage desired by the user, the evaluation including maintaining that stage or changing it to another analysis implementation stage; and
presents the recommended analysis implementation stage to the user based on the result of the evaluation.

The system according to claim 3, wherein
the analysis purpose includes a KPI,
the reference information includes dictionary data for the target data, and
the processing device:
calculates the degree to which the analysis purpose conforms to the target data by comparing the KPI with the column names of the target data; and
calculates the degree to which the analysis purpose conforms to the reference information by comparing the KPI with the dictionary data.

The system according to claim 1, wherein
the storage device holds reference information for the target data, and
the processing device:
extracts the combinations of data available for the recommended analysis step from the target data by extracting data corresponding to the analysis purpose from the target data and extracting data corresponding to the reference information from the target data;
configures a plurality of data combinations based on the data corresponding to the analysis purpose and the data corresponding to the reference information; and
evaluates each of the plurality of data combinations based on its amount of data.

The system according to claim 6, wherein the processing device:
extracts the data corresponding to the analysis purpose by extracting, from the target data, a column corresponding to the analysis purpose;
extracts the data corresponding to the reference information by extracting, from the target data, a column corresponding to the reference information;
configures, as the data combinations, a plurality of combinations of a column corresponding to the analysis purpose and a column corresponding to the reference information; and
calculates, as the amount of data, the number of records for each of the plurality of combinations.

The system according to claim 2, wherein the processing device evaluates each of the plurality of extracted data combinations for each of the plurality of types of analysis implementation stages.

The system according to claim 7, wherein the processing device:
determines, for each of the plurality of combinations, which of a plurality of data value variation patterns the combination exhibits; and
calculates the number of records for each data value variation pattern.

The system according to claim 5, wherein the processing device extracts columns of the target data based on each item of 4W information in the dictionary data.

The system according to claim 1, further comprising an output device that presents to the user information on the recommended analysis step or work items in response to the user's registration of analysis purpose information, together with information on the combinations of available data related to the KPI and data items of interest specified by the user in each analysis step.

A method for supporting data preparation for data utilization, in which a computer:
collects business data from each of a plurality of business systems and stores the business data, at least temporarily, in a storage device as target data;
receives an analysis purpose for the target data from a user terminal;
determines, based on the analysis purpose and the target data, a recommended analysis step for the analysis purpose;
extracts, from the target data, combinations of data available for the recommended analysis step;
evaluates each of the plurality of extracted data combinations;
creates, based on the result of the evaluation, a list in which the plurality of data combinations are ranked;
outputs the list to the user terminal; and
allows the user to select a predetermined data combination from the list.