JP5584917B2

JP5584917B2 - Data analysis system and data analysis method

Info

Publication number: JP5584917B2
Application number: JP2011107212A
Authority: JP
Inventors: 孝介柳井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-05-12
Filing date: 2011-05-12
Publication date: 2014-09-10
Anticipated expiration: 2031-05-12
Also published as: JP2012238207A

Description

The present invention relates to a data analysis system that analyzes data, and more particularly to a data analysis system that carries the analysis forward automatically.

With the development of information technology (IT), large amounts of data such as business logs and sensor data are being collected. Companies and local governments want to analyze the large volumes of such data they hold in order to optimize their operations.

When business logs, sensor data, and similar data are used to optimize operations, the data is first tabulated to grasp its overall tendencies. Here, tabulation analysis means, for example, calculating the frequency distribution of each item that makes up the data, or cross-tabulating several of those items.
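As a minimal illustration of the two tabulation operations named above, the Python sketch below computes the frequency distribution of one item and a cross tabulation of two items over a list of records. The record fields (`month`, `category`) and the sample values are hypothetical and only serve the example.

```python
from collections import Counter

def frequency_distribution(records, item):
    """Count how often each value of `item` occurs across the records."""
    return Counter(r[item] for r in records)

def cross_tabulation(records, row_item, col_item):
    """Count how often each (row value, column value) pair occurs."""
    return Counter((r[row_item], r[col_item]) for r in records)

# Hypothetical sales records; each dict is one sale.
records = [
    {"month": 1, "category": "A"},
    {"month": 1, "category": "A"},
    {"month": 1, "category": "B"},
    {"month": 2, "category": "A"},
]

freq = frequency_distribution(records, "month")
cross = cross_tabulation(records, "month", "category")
print(dict(freq))   # {1: 3, 2: 1}
print(dict(cross))  # {(1, 'A'): 2, (1, 'B'): 1, (2, 'A'): 1}
```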

Such tabulation analysis is usually performed in the early stage of data analysis work, and 100 or more tabulations are typically run. Experts who understand the meaning of the data and statistics experts must carry the work forward by trial and error.

For example, suppose the monthly sales of a product category are tabulated and January sales turn out to be twice those of the other months. The analysis could then be taken further by analyzing January sales for each product in the category. Alternatively, the category's sales could be cross-tabulated on two axes, fiscal year and month, to check whether the same pattern appears every year.
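The drill-down reasoning in this example can be sketched in Python: find a month whose sales stand far above the others, then emit descriptions of the two follow-up analyses mentioned above (per-product sales for that month, and a fiscal-year by month cross tabulation). The "at least twice the median of the other months" threshold, the tuple format of the follow-up analyses, and the sample figures are all assumptions made for illustration.

```python
from statistics import median

def find_outlier_months(monthly_sales, ratio=2.0):
    """Return months whose sales are at least `ratio` times the median of the other months."""
    outliers = []
    for month, sales in monthly_sales.items():
        others = [v for m, v in monthly_sales.items() if m != month]
        if others and sales >= ratio * median(others):
            outliers.append(month)
    return outliers

def follow_up_analyses(category, month):
    """Describe the two follow-up analyses from the text (hypothetical format)."""
    return [
        ("frequency", {"category": category, "month": month, "group_by": "product"}),
        ("cross", {"category": category, "rows": "fiscal_year", "cols": "month"}),
    ]

# Hypothetical monthly sales for one product category: January is twice the others.
sales = {1: 200, 2: 100, 3: 100, 4: 100}
for m in find_outlier_months(sales):
    for analysis in follow_up_analyses("category A", m):
        print(analysis)
```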

In this way, tabulation analysis work devises a new analysis method based on earlier results and then executes it. The trial-and-error cycle is therefore repeated many times, eventually yielding 100 or more tabulation results.

Such tabulation analysis is costly in expert labor and time-consuming. Moreover, once 100 or more tabulations have been run, even the person who ran them finds it difficult to fully understand and interpret the many results.

A technique is known that divides the items constituting data into a plurality of groups, performs cross tabulation, and generates a graph (see, for example, Patent Document 1).

A technique is also known in which a questionnaire survey is conducted when information transmission logs are collected and aggregated, features showing large changes over time are extracted from the survey's analysis results, and the scope and items of the questionnaire are determined based on the extracted features (see, for example, Patent Document 2).

A technique is known that automatically generates visual plots suited to the items constituting data (see, for example, Patent Document 3).

Patent Document 1: JP 2007-11468 A
Patent Document 2: JP 2006-18357 A
Patent Document 3: JP 2009-508210 A (published Japanese translation of a PCT application)

The techniques described in Patent Documents 1 to 3 do not perform tabulation analysis automatically, so they cannot reduce its labor and time costs.

A technique is therefore desired in which a computer automatically performs the tabulation analysis that people otherwise carry out by trial and error. This can be expected to reduce the labor cost of tabulation analysis and shorten the time it takes. In addition, if the computer presents to the user the criteria it uses to decide whether to analyze a result further, the user can use those criteria to interpret and rank the analysis results, which helps in understanding a large number of them.

However, having a computer perform tabulation analysis automatically raises the following technical problems.

First, if the computer picks analysis results to analyze further at random, the number of analyses diverges and computer resources are wasted. In particular, if many analyses of low-value results are run first, the process merely takes a long time without producing valuable results.

Second, to analyze a result further, the computer needs an algorithm that examines the result in detail and decides which part of it to drill into.

Third, because the computer must run a large number of tabulation analyses on a large amount of data, an algorithm for executing the analyses efficiently is needed. In particular, when a computer analyzes a large amount of data, the file input/output caused by writing out many low-value results becomes a bottleneck and slows down analysis processing, so this file input/output must be made more efficient. The approach is to have the computer execute as many analyses as possible, evaluate the results after execution, and present the highly rated results to the user first. In this way only high-value results are presented, improving the precision of the results the user sees. Because this requires the computer to execute as many analyses as possible, improving the execution speed of analysis processing is an important issue.
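The strategy described above (execute as many analyses as possible, evaluate each result afterwards, and present the highest-rated ones first) can be sketched with a bounded top-k selection. `heapq.nlargest` accepts any iterable, so scores could be streamed in without keeping every result in memory. The result identifiers, scores, and the value of `k` are hypothetical.

```python
import heapq

def top_k_results(scored_results, k):
    """Return the k (result_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_results, key=lambda pair: pair[1])

# Hypothetical evaluated results: (result identifier, evaluation value).
scored = [("r1", 0.2), ("r2", 0.9), ("r3", 0.5), ("r4", 0.7)]
print(top_k_results(scored, 2))  # [('r2', 0.9), ('r4', 0.7)]
```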

The present invention has been made in view of the above, and its object is to provide a data analysis system that performs tabulation analysis automatically.

A representative example of the present invention is a data analysis system that analyzes data and includes a processor that executes programs and a storage area that stores the programs executed by the processor. The system comprises: an analysis execution unit that executes analysis processing on the data and stores the analysis results in the storage area; a plurality of inference rule units that evaluate the analysis results and generate new analysis processing from them; an inference rule control unit that controls the plurality of inference rule units; and an analysis processing management unit that manages the analysis processing not yet executed by the analysis execution unit. Each inference rule unit includes an analysis result evaluation unit that evaluates analysis results and an analysis processing generation unit that generates new analysis processing from analysis results. The analysis result evaluation unit reads an analysis result stored in the storage area, evaluates it by calculating a feature value that indicates its characteristics, and stores the analysis result and the feature value in the storage area in association with each other. The inference rule control unit selects, from the analysis results stored in the storage area, those whose feature values satisfy a predetermined condition as candidate sources for new analysis processing. The analysis processing generation unit generates new analysis processing from the analysis results selected by the inference rule control unit, and the analysis processing management unit causes the analysis execution unit to execute the analysis processing generated by the analysis processing generation unit.
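The cooperation among the units in this representative example can be pictured as a loop. The Python sketch below is one possible reading, not the claimed implementation: each hypothetical `InferenceRule` pairs an evaluation function with a generation function, a score threshold stands in for the "predetermined condition", and a simple queue stands in for the analysis processing management unit. All names, the threshold, and the toy rule are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class InferenceRule:
    """Hypothetical inference rule unit: evaluates results and generates new analyses."""
    evaluate: Callable[[dict], float]      # plays the analysis result evaluation unit
    generate: Callable[[dict], List[str]]  # plays the analysis processing generation unit

def run_analysis_loop(initial_scripts, execute, rules, threshold, max_runs=100):
    """Execute scripts, score every result with every rule, and queue new scripts
    generated from results whose score meets the threshold."""
    pending = deque(initial_scripts)  # unprocessed analyses (management unit)
    seen = set(initial_scripts)       # avoid queuing the same script twice
    store = []                        # (script, result, rule index, score) records
    runs = 0
    while pending and runs < max_runs:  # cap keeps the number of analyses from diverging
        script = pending.popleft()
        result = execute(script)        # plays the analysis execution unit
        runs += 1
        for i, rule in enumerate(rules):
            score = rule.evaluate(result)
            store.append((script, result, i, score))
            if score >= threshold:      # control unit selects candidate sources
                for new_script in rule.generate(result):
                    if new_script not in seen:
                        seen.add(new_script)
                        pending.append(new_script)
    return store

# Toy usage: a "script" is a path; results shallower than depth 2 are worth drilling into.
rule = InferenceRule(
    evaluate=lambda r: 1.0 if r["depth"] < 2 else 0.0,
    generate=lambda r: [r["script"] + "/a", r["script"] + "/b"],
)
execute = lambda s: {"script": s, "depth": s.count("/")}
store = run_analysis_loop(["root"], execute, [rule], threshold=0.5)
print(len(store))  # 7
```

The `seen` set and the `max_runs` cap are design choices added here to address the divergence problem described earlier; the source does not say how the system bounds the search.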

According to the present invention, a data analysis system that performs tabulation analysis automatically can be provided.

FIG. 1 is an explanatory diagram of the data analysis system according to the embodiment of the present invention.
FIG. 2 is a functional block diagram of the data analysis system according to the embodiment of the present invention.
FIG. 3 is an explanatory diagram of an example of the analysis target data according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram of an example of the analysis target data according to the embodiment of the present invention.
FIG. 5 is an explanatory diagram of an example of the analysis target data according to the embodiment of the present invention.
FIG. 6 is an explanatory diagram of an example of an analysis script according to the embodiment of the present invention.
FIG. 7 is an explanatory diagram of an example of an analysis script according to the embodiment of the present invention.
FIG. 8 is an explanatory diagram of an example of an analysis result according to the embodiment of the present invention.
FIG. 9 is an explanatory diagram of an example of a graph of an analysis result according to the embodiment of the present invention.
FIG. 10 is an explanatory diagram of the table schema of the analysis script table according to the embodiment of the present invention.
FIG. 11 is an explanatory diagram of the table schema of the evaluation value table according to the embodiment of the present invention.
FIG. 12 is a sequence diagram showing the data flow of the analysis script generation process according to the embodiment of the present invention.
FIG. 13 is a flowchart of the analysis script generation process according to the embodiment of the present invention.
FIG. 14 is a sequence diagram showing the data flow of the analysis execution process according to the embodiment of the present invention.
FIG. 15 is a flowchart of the analysis execution process according to the embodiment of the present invention.
FIG. 16 is an explanatory diagram of merging analysis scripts according to the embodiment of the present invention.
FIG. 17 is an explanatory diagram of the analysis result list display screen displayed by the analysis result display unit according to the embodiment of the present invention.
FIG. 18 is an explanatory diagram of the analysis result display screen displayed by the analysis result display unit according to the embodiment of the present invention.
FIG. 19 is an explanatory diagram of the sort parameter setting screen displayed by the analysis result display unit according to the embodiment of the present invention.
FIG. 20 is an explanatory diagram of the analysis script input screen displayed by the analysis user interface unit according to the embodiment of the present invention.
FIG. 21 is a block diagram of a data analysis system 100 composed of a plurality of computers according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. In this specification, a function that a data processing device such as a computer implements with a program in its processing unit may be referred to as a "process", "unit", or "means".

This embodiment relates to a data analysis system 100 (see FIG. 1) that analyzes data automatically.

FIG. 1 is an explanatory diagram of the data analysis system 100 according to the embodiment of this invention.

In FIG. 1, the data analysis system 100 is described as a single general-purpose computer; however, the invention is not limited to this, and the data analysis system 100 may be composed of a plurality of general-purpose computers.

The data analysis system 100 includes an input device 101, a network device 102, a central processing unit (processor, or CPU) 103, a main storage unit 104, an auxiliary storage unit 105, and a display device 107.

The input device 101, network device 102, CPU 103, main storage unit 104, auxiliary storage unit 105, and display device 107 are connected via a bus 106.

The input device 101 is an input unit that receives input from a user, such as a keyboard and a mouse. The network device 102 is a network interface unit for connecting to a network such as the Internet.

The CPU 103 executes programs stored in the main storage unit 104. The main storage unit 104 is a storage area that the CPU 103 can access and stores the programs executed by the CPU 103; it is, for example, a memory. The auxiliary storage unit 105 is a storage area that the CPU 103 cannot access directly and stores various data, including the programs executed by the CPU 103; it is, for example, an HDD (Hard Disk Drive).

The input device 101 and the display device 107 constitute a user interface unit.

FIG. 2 is a functional block diagram of the data analysis system 100 according to the embodiment of this invention.

The data analysis system 100 includes an analysis result database 200, an inference rule control unit 201, inference rule units 202A and 202B (hereinafter collectively referred to as inference rule units 202), a job control unit 208, an analysis execution unit 209, an analysis result display unit 213, an analysis user interface unit 214, and a graph generation unit 215.

The inference rule unit 202A includes an analysis result evaluation unit 203A and an analysis script generation unit 204A. Likewise, the inference rule unit 202B includes an analysis result evaluation unit 203B and an analysis script generation unit 204B. The analysis result evaluation units 203A and 203B are collectively referred to as analysis result evaluation units 203, and the analysis script generation units 204A and 204B as analysis script generation units 204.

The analysis execution unit 209 includes an analysis script interpretation unit 210, an analysis target data management unit 211, and analysis target data 212.

Some or all of the inference rule control unit 201, the inference rule units 202, the job control unit 208, the analysis execution unit 209, the analysis result display unit 213, the analysis user interface unit 214, and the graph generation unit 215 are implemented by the CPU 103 shown in FIG. 1 executing programs.

The analysis result display unit 213 and the analysis user interface unit 214 include the hardware of the input device 101 and the display device 107 shown in FIG. 1.

The analysis result database 200 and the analysis target data 212 are stored in a storage area such as the main storage unit 104 or the auxiliary storage unit 105 shown in FIG. 1.

The units shown in FIG. 2, the analysis result database 200, and the analysis target data 212 may be implemented on a single computer or distributed across a plurality of computers.

In addition to the input device 101 and the display device 107, the analysis result display unit 213 and the analysis user interface unit 214 may include components connected to the input device 101 and the display device 107.

Each component of the data analysis system 100 is described below.

The analysis result database 200 includes an analysis script table 1000 (see FIG. 10), in which analysis results (the execution results of analysis scripts describing analysis processing) and related information are registered, and an evaluation value table 1100 (see FIG. 11), in which the evaluation values of the analysis results are registered. Here, an analysis script is an instruction for executing analysis processing, written in a DSL (Domain Specific Language) or a programming language. An evaluation value is also called a feature value, since it indicates the characteristics of an analysis result.

The analysis script table 1000 is described in detail with reference to FIG. 10, and the evaluation value table 1100 with reference to FIG. 11.

The inference rule control unit 201 controls the inference rule units 202 and operates in cooperation with the plurality of inference rule units.

Specifically, the inference rule control unit 201 acquires analysis results stored in the analysis script table 1000 of the analysis result database 200, has the analysis result evaluation unit 203 calculate feature values (evaluation values) indicating the characteristics of those results, and stores the calculated evaluation values in the evaluation value table 1100. The inference rule control unit 201 also acquires, from the analysis results registered in the evaluation value table 1100, those whose evaluation values satisfy a predetermined condition, has the analysis script generation unit 204 generate new analysis scripts from them, and stores the generated scripts in the analysis script table 1000.

The inference rule unit 202 evaluates analysis results and generates new analysis scripts based on an inference rule. An inference rule is the criterion for the evaluation and generation processing, for example frequency tabulation or cross tabulation.

An inference rule unit 202 whose inference rule is frequency tabulation generates analysis scripts that perform frequency tabulation, and evaluates analysis results so that results containing outstanding values receive high evaluation values.

In contrast, an inference rule unit 202 whose inference rule is cross tabulation generates analysis scripts that perform cross tabulation, and evaluates analysis results so that results with uniform values receive high evaluation values.
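The uniformity criterion for cross tabulation is likewise not specified here. One assumed measure is the normalized Shannon entropy of the cell counts, which is 1.0 when all cells are equal and falls toward 0 as the counts concentrate in a few cells. The sketch below is illustrative only.

```python
import math

def uniformity_score(cell_counts):
    """Hypothetical score: Shannon entropy of the cell distribution divided by log(number of cells)."""
    total = sum(cell_counts)
    n = len(cell_counts)
    if total == 0 or n < 2:
        return 0.0
    entropy = sum(-(c / total) * math.log(c / total) for c in cell_counts if c > 0)
    return entropy / math.log(n)

print(uniformity_score([5, 5, 5, 5]))   # approximately 1.0 (perfectly uniform)
print(uniformity_score([20, 0, 0, 0]))  # 0.0 (all counts in one cell)
```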

Different inference rules are set in different inference rule units 202. Because the data analysis system 100 includes a plurality of inference rule units 202, a wide variety of analyses is possible.

The job control unit 208 acquires unexecuted analysis scripts from the analysis script table 1000 of the analysis result database 200 and places them, as analysis jobs, in the queue of the analysis execution unit 209. The job control unit 208 monitors the progress of the analysis scripts in the analysis execution unit 209 and stores the analysis results produced by executing them in the analysis script table 1000 and the evaluation value table 1100 of the analysis result database 200. The job control unit 208 thus functions as an analysis processing management unit that manages the analysis processing of the analysis execution unit 209.
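The job control flow described above (fetch unexecuted scripts, queue them, run them, write the results back) can be sketched as follows. The in-memory dictionary standing in for the analysis script table, its hypothetical `status`, `script`, and `result` fields, and the `run` callback are assumptions; the actual table schema is the one shown in FIG. 10, and storing evaluation values is omitted here.

```python
from queue import Queue

def dispatch_unexecuted(script_table, job_queue):
    """Put every script whose status is 'unexecuted' onto the job queue."""
    for script_id, row in script_table.items():
        if row["status"] == "unexecuted":
            row["status"] = "queued"
            job_queue.put(script_id)

def drain_queue(script_table, job_queue, run):
    """Execute queued jobs in order and write each result back into the table."""
    while not job_queue.empty():
        script_id = job_queue.get()
        row = script_table[script_id]
        row["result"] = run(row["script"])
        row["status"] = "done"
        job_queue.task_done()

# Hypothetical table contents keyed by script ID.
table = {
    1: {"script": "logs: histogram(0, 20, 1)", "status": "unexecuted", "result": None},
    2: {"script": "already run", "status": "done", "result": "earlier result"},
}
q = Queue()
dispatch_unexecuted(table, q)
drain_queue(table, q, run=lambda s: f"result of {s!r}")
print(table[1]["status"])  # done
```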

The analysis execution unit 209 interprets analysis scripts and executes analysis processing on the analysis target data 212. It includes the analysis script interpretation unit 210, the analysis target data management unit 211, and the analysis target data 212.

The analysis script interpretation unit 210 interprets an analysis script and converts it into a form that the CPU 103 can execute.
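As a hedged sketch of the first step of interpretation, the code below parses the two directives that appear in the FIG. 6 script (`logs: histogram(0, 20, 1)` and `target: sqrt(x*x + y*y + z*z)`) into a Python dictionary. The DSL's grammar is not given in this section, so the `key: value` line format and the `histogram(start, stop, step)` call form are assumptions drawn only from that one example.

```python
import re

# Assumed call form: histogram(start, stop, step) with integer arguments.
HISTOGRAM = re.compile(r"histogram\(\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(\d+)\s*\)")

def parse_script(text):
    """Parse 'key: value' lines into a dict; expand histogram(...) into its numeric arguments."""
    spec = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        match = HISTOGRAM.fullmatch(value)
        if match:
            start, stop, step = map(int, match.groups())
            spec[key] = {"op": "histogram", "start": start, "stop": stop, "step": step}
        else:
            spec[key] = value  # e.g. the target expression, kept as text
    return spec

script = """
logs: histogram(0, 20, 1)
target: sqrt(x*x + y*y + z*z)
"""
print(parse_script(script))
```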

The analysis target data management unit 211 manages the analysis target data 212 and performs the processing required on it.

The analysis execution unit 209 corresponds to, for example, the data processing device described in Japanese Unexamined Patent Application Publication No. 2011-13758, an RDBMS (Relational Database Management System), or the like.

When executing analysis processing on a large amount of data, the analysis execution unit 209 may be a distributed computing system built on many computers.

The analysis result display unit 213 displays the analysis results and other contents stored in the analysis result database 200 on the display device 107. Based on user input, it also switches what is displayed on the display device 107 and reorders the displayed analysis results.

The analysis user interface unit 214 accepts an analysis script that the user has created on the analysis script input screen 2000 shown in FIG. 20 and has the analysis execution unit 209 execute it. The analysis user interface unit 214 also displays the analysis result of the user's script on the display device 107 and, at the user's instruction, stores the analysis result obtained by the analysis execution unit 209 in the analysis script table 1000 of the analysis result database 200.

The analysis user interface unit 214 thus functions as an analysis script input reception unit that accepts analysis scripts created by the user.

The inference rule control unit 201 can also have the inference rule units 202 generate new analysis scripts based on the analysis results of user-created scripts accepted by the analysis user interface unit 214. This lets the user, through the analysis user interface unit 214, adjust the direction of the tabulation analysis that the data analysis system 100 carries out automatically.

The graph generation unit 215 generates graph data such as bar graphs and scatter plots from analysis results and displays the graph data on the display device 107 via the analysis result display unit 213, the analysis user interface unit 214, or the like.

Next, examples of the analysis target data 212 are described with reference to FIGS. 3 to 5.

FIG. 3 is an explanatory diagram of an example of the analysis target data 212 according to the embodiment of this invention.

In FIG. 3, XML-tagged data 300 is used as the analysis target data 212; the analysis target data 212 may, of course, be data other than the data 300 shown in FIG. 3.

The analysis target data 212 shown in FIG. 3 is a log from an acceleration sensor worn on the user's body. In this log, "user ID", "activity", "date", "time", "x-direction acceleration", "y-direction acceleration", and "z-direction acceleration" are defined by XML tags, and their values are recorded.

FIG. 3 shows the analysis target data 212 for only one user, but in practice there is analysis target data 212 for a plurality of users.

FIG. 4 is an explanatory diagram of an example of the analysis target data 212 according to the embodiment of this invention.

The analysis target data 212 may be data other than the XML-tagged data 300 shown in FIG. 3; FIG. 4 shows data 400 obtained by removing the XML tags from the data 300.

FIG. 5 is an explanatory diagram of an example of the analysis target data 212 according to the embodiment of this invention.

FIG. 5 shows data 500 obtained by converting the data 300 shown in FIG. 3 or the data 400 shown in FIG. 4 into table format.

Next, examples of analysis scripts are described with reference to FIGS. 6 and 7.

FIG. 6 is an explanatory diagram of an example of an analysis script according to the embodiment of this invention.

The analysis script shown in FIG. 6 is a frequency tabulation analysis script 600 that calculates the frequency distribution of the absolute value of acceleration from the analysis target data 212, the acceleration sensor log shown in FIG. 3. Here, the absolute value of acceleration is the square root of the sum of the squares of the x-, y-, and z-direction accelerations.

In FIG. 6, "logs: histogram(0, 20, 1)" means that a frequency distribution is calculated over the "logs" tag level of the data 300 shown in FIG. 3, and its arguments mean that values from 0 to 20 are counted in steps of 1. "target: sqrt(x*x + y*y + z*z)" means that the value counted is the absolute value of acceleration rounded to the nearest integer.
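The computation that the FIG. 6 script describes can be sketched in Python: compute the magnitude of each (x, y, z) sample, round it to an integer, and count how many samples fall in each bin from 0 to 20. The sample values are made up, and treating 20 as an inclusive upper bound is an assumption. Python's built-in `round` rounds halves to even, so the sketch rounds halves up explicitly to match ordinary rounding.

```python
import math
from collections import Counter

def acceleration_histogram(samples, start=0, stop=20, step=1):
    """Frequency of rounded |a| = sqrt(x^2 + y^2 + z^2) over bins start..stop (inclusive).
    Values outside the range are ignored."""
    counts = Counter()
    for x, y, z in samples:
        magnitude = math.sqrt(x * x + y * y + z * z)
        value = math.floor(magnitude + 0.5)  # round half up (magnitude is never negative)
        if start <= value <= stop:
            counts[start + (value - start) // step * step] += 1
    return {b: counts[b] for b in range(start, stop + 1, step)}

# Hypothetical (x, y, z) accelerometer samples.
samples = [(0.1, 0.2, 0.1), (0.6, 0.6, 0.3), (3.0, 4.0, 0.0), (0.0, 0.0, 25.0)]
hist = acceleration_histogram(samples)
print(hist[0], hist[1], hist[5])  # 1 1 1
```

The last sample has magnitude 25, which lies outside 0 to 20 and is therefore not counted.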

FIG. 7 is an explanatory diagram of an example of an analysis script according to the embodiment of this invention.

FIG. 7 shows an analysis script 700 that expresses the analysis script 600 of FIG. 6 in SQL.
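The SQL of FIG. 7 is not reproduced in this text. A query of the same general shape might look like the one below, run through Python's built-in `sqlite3` module against a hypothetical `logs(user_id, x, y, z)` table. Not every SQLite build ships a `sqrt` function, so the sketch registers one with `create_function`. The table name, columns, and data are assumptions; SQLite's `ROUND` rounds halves away from zero, which matches ordinary rounding for these non-negative magnitudes.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("sqrt", 1, math.sqrt)  # make sqrt() available in SQL
conn.execute("CREATE TABLE logs (user_id TEXT, x REAL, y REAL, z REAL)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [("u1", 0.1, 0.2, 0.1), ("u1", 0.6, 0.6, 0.3), ("u1", 3.0, 4.0, 0.0), ("u1", 0.0, 0.0, 25.0)],
)

# Frequency of the rounded acceleration magnitude, restricted to 0..20.
QUERY = """
SELECT v, COUNT(*) AS n
FROM (SELECT CAST(ROUND(sqrt(x * x + y * y + z * z)) AS INTEGER) AS v FROM logs)
WHERE v BETWEEN 0 AND 20
GROUP BY v
ORDER BY v
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(0, 1), (1, 1), (5, 1)]
```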

The analysis scripts of this embodiment are not limited to those shown in FIGS. 6 and 7; other analysis scripts may be used.

FIG. 8 is an explanatory diagram illustrating an example of an analysis result according to the embodiment of this invention.

The analysis result shown in FIG. 8 is analysis result data 800, obtained by executing the analysis script 600 of FIG. 6 on the data 300 (analysis target data 212) of FIG. 3.

The first line of FIG. 8 indicates that there are 21,240 logs whose absolute acceleration, rounded off, is 0. The second line indicates that there are 1,246 logs whose rounded absolute acceleration is 1. The third and subsequent lines are read in the same way, so their description is omitted.

FIG. 9 is an explanatory diagram illustrating an example of a graph of an analysis result according to the embodiment of this invention.

The analysis result graph shown in FIG. 9 is a graph 900 of the analysis result data 800 shown in FIG. 8.

The horizontal axis is the rounded absolute value of acceleration, and the vertical axis is the number of logs.

Next, the analysis result database 200 will be described with reference to FIGS. 10 and 11.

As described with reference to FIG. 2, the analysis result database 200 includes an analysis script table 1000, which stores analysis scripts, analysis results, and related data, and an evaluation value table 1100, which stores the evaluation values of analysis results.

The analysis result database 200 is managed by a known technique such as an RDBMS (Relational Database Management System).

FIG. 10 is an explanatory diagram of the table schema of the analysis script table 1000 according to the embodiment of this invention.

The analysis script table 1000 includes analysis_id 1001, data_id 1002, created_at 1003, updated_at 1004, parent_analysis_ids 1005, rule_id 1006, title 1007, script 1008, expected_fitness 1009, and result 1010.

analysis_id 1001 stores an identifier that uniquely identifies the record within the analysis script table 1000 (hereinafter, the analysis ID). data_id 1002 stores an identifier indicating the type of analysis target data 212 that the analysis processes.

created_at 1003 stores the date and time the record was added. updated_at 1004 stores the date and time the record was last updated.

parent_analysis_ids 1005 stores the analysis IDs of the analysis results from which the analysis script registered in script 1008 was created.

rule_id 1006 stores the unique identifier (rule ID) of the analysis script generation unit 204 that created the analysis script registered in script 1008. In FIG. 10, rule_id 1006 contains “cross” and “scale”: “cross” is the rule ID of the inference rule unit 202 for cross tabulation, and “scale” is the rule ID of the inference rule unit 202 for frequency distributions.

title 1007 stores the title of the aggregate analysis performed by the analysis script registered in script 1008. script 1008 stores the analysis script generated by the analysis script generation unit 204. expected_fitness 1009 stores a predicted value of the evaluation value of the record's analysis script; the method of calculating this predicted value is described in detail with reference to FIG. 13. result 1010 stores the analysis result of the analysis script registered in script 1008 of the record.
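As a concrete illustration, the schema of FIG. 10 could be declared in SQLite as follows. Only the column names come from the text; the table name `analysis_script`, the column types, and the choice to store parent_analysis_ids as a text list are assumptions made for this sketch.

```python
import sqlite3

# Column names follow FIG. 10; table name and types are assumptions.
ANALYSIS_SCRIPT_DDL = """
CREATE TABLE analysis_script (
    analysis_id         INTEGER PRIMARY KEY,  -- unique analysis ID
    data_id             TEXT NOT NULL,        -- type of analysis target data 212
    created_at          TEXT NOT NULL,        -- when the record was added
    updated_at          TEXT NOT NULL,        -- when the record was last updated
    parent_analysis_ids TEXT,                 -- source analysis IDs, e.g. '3,7' (assumed encoding)
    rule_id             TEXT,                 -- generating rule, e.g. 'cross' or 'scale'
    title               TEXT,                 -- title of the aggregate analysis
    script              TEXT NOT NULL,        -- the generated analysis script
    expected_fitness    REAL,                 -- predicted evaluation value
    result              TEXT                  -- NULL until the script has been executed
)
"""


def create_analysis_script_table(conn):
    conn.execute(ANALYSIS_SCRIPT_DDL)


conn = sqlite3.connect(":memory:")
create_analysis_script_table(conn)
print([row[1] for row in conn.execute("PRAGMA table_info(analysis_script)")])
```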

FIG. 11 is an explanatory diagram of the table schema of the evaluation value table 1100 according to the embodiment of this invention.

The evaluation value table 1100 includes fitness_id 1101, analysis_id 1102, created_at 1103, updated_at 1104, rule_id 1105, measure 1106, and value 1107.

fitness_id 1101 stores an identifier that uniquely identifies the record within the evaluation value table 1100. analysis_id 1102 stores the analysis ID of the evaluated analysis result; through this analysis ID, the evaluation value table 1100 is linked to the analysis script table 1000. Because a single analysis result is evaluated by the analysis result evaluation units 203 of multiple inference rule units 202, one record in the analysis script table 1000 corresponds to multiple records in the evaluation value table 1100, all linked by the same analysis ID.

created_at 1103 stores the date and time the record was added to the evaluation value table 1100.

updated_at 1104 stores the date and time the record of the evaluation value table 1100 was last updated.

rule_id 1105 stores the inference rule set in the analysis result evaluation unit 203 that calculated the evaluation value. measure 1106 stores the unique identifier of the evaluation criterion that the analysis result evaluation unit 203 used to calculate the evaluation value. value 1107 stores the evaluation value itself.
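The one-to-many link between the two tables can be sketched the same way. Again, only the column names come from FIG. 11; the table name `evaluation_value`, the types, and the index on analysis_id are assumptions.

```python
import sqlite3

# Column names follow FIG. 11; table name, types, and index are assumptions.
EVALUATION_VALUE_DDL = """
CREATE TABLE evaluation_value (
    fitness_id  INTEGER PRIMARY KEY,  -- unique ID of this evaluation record
    analysis_id INTEGER NOT NULL,     -- analysis ID of the evaluated result
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL,
    rule_id     TEXT NOT NULL,        -- rule whose evaluation unit 203 scored it
    measure     TEXT NOT NULL,        -- identifier of the evaluation criterion
    value       REAL NOT NULL         -- the evaluation value
)
"""


def create_evaluation_value_table(conn):
    conn.execute(EVALUATION_VALUE_DDL)
    # One analysis result may be scored by several rules, so analysis_id
    # is not unique; an index speeds up lookups by analysis ID.
    conn.execute("CREATE INDEX idx_eval_analysis ON evaluation_value (analysis_id)")


conn = sqlite3.connect(":memory:")
create_evaluation_value_table(conn)
conn.executemany(
    "INSERT INTO evaluation_value (analysis_id, created_at, updated_at, rule_id, measure, value)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "t0", "t0", "scale", "peak", 3.0), (1, "t0", "t0", "cross", "uniformity", 1.5)],
)
```

Here the single analysis result with analysis ID 1 has two evaluation records, one per rule.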

Next, the analysis script generation process will be described with reference to FIGS. 12 and 13.

FIG. 12 is a sequence diagram illustrating the data flow of the analysis script generation process according to the embodiment of this invention.

The data used in the analysis script generation process is exchanged among the inference rule control unit 201, the analysis result database 200, and the inference rule units 202.

Each inference rule unit 202 has a unique identifier (rule ID). To simplify the description, FIG. 12 assumes that the inference rule unit 202A has rule ID “0” and the inference rule unit 202B has rule ID “1”.

The inference rule unit 202A (rule ID “0”) executes the analysis script generation process first, followed by the inference rule unit 202B (rule ID “1”).

Although FIG. 12 shows two inference rule units 202, the same applies when there are three or more: after one inference rule unit 202 completes the analysis script generation process, the next inference rule unit 202 executes it.

First, the inference rule control unit 201 acquires, from the analysis script table 1000 of the analysis result database 200, the records containing analysis results that have not yet been evaluated by the inference rule unit 202A (rule_id=0), which is about to execute the analysis script generation process (1201).

Next, the inference rule control unit 201 passes the records of the analysis script table 1000 acquired in step 1201 to the inference rule unit 202A. The inference rule unit 202A calculates an evaluation value for each record according to its own evaluation criterion and returns the evaluation value, together with the criterion used to calculate it, to the inference rule control unit 201 (1202).

Next, the inference rule control unit 201 registers the evaluation values received from the inference rule unit 202A in the evaluation value table 1100 of the analysis result database 200 (1203).

Next, the inference rule control unit 201 refers to the evaluation value table 1100 of the analysis result database 200 and acquires a predetermined number of records that satisfy a predetermined condition as candidate generation sources for analysis scripts (1204). The predetermined condition is described in detail in connection with step 1305 of FIG. 13.

Next, the inference rule control unit 201 causes the inference rule unit 202A to generate analysis scripts based on the generation source candidates acquired in step 1204 (1205).

Next, the inference rule control unit 201 registers the analysis scripts generated by the inference rule unit 202A in the analysis script table 1000 of the analysis result database 200 (1206).

Having completed the analysis script generation process for the inference rule unit 202A, the inference rule control unit 201 then executes it for the inference rule unit 202B (rule_id=1).

The analysis script generation process for the inference rule unit 202B is the same as that for the inference rule unit 202A, so the same step numbers are used and its description is omitted.

FIG. 13 is a flowchart of the analysis script generation process according to the embodiment of this invention.

The analysis script generation process is executed by the CPU 103, which runs the programs implementing the inference rule control unit 201 and the inference rule units 202.

The inference rule control unit 201 sets a variable (rule_id), which holds the rule ID of the inference rule unit 202 that is to execute the analysis script generation process, to 0 (1301).

Next, as described for step 1201 in FIG. 12, the inference rule control unit 201 acquires, from the analysis script table 1000 of the analysis result database, the records containing analysis results not yet evaluated by the inference rule unit 202 that is to execute the process, and treats them as evaluation target analysis results (1302).

The processing of step 1302 is described in detail below.

The inference rule control unit 201 acquires the records of the analysis script table 1000 whose result 1010 is not NULL (analyzed records).

From these analyzed records, the inference rule control unit 201 acquires, as evaluation target analysis results, the records that have not yet been evaluated by any inference rule unit 202. Specifically, it acquires the analyzed records whose analysis ID in analysis_id 1001 is not registered in analysis_id 1102 of the evaluation value table 1100.

The inference rule control unit 201 also acquires, as evaluation target analysis results, analyzed records that have already been evaluated, but not yet by the inference rule unit 202 identified by the current value of the variable (rule_id). Specifically, even if the evaluation value table 1100 contains records whose analysis_id 1102 matches the analysis ID in analysis_id 1001 of an analyzed record, when none of those records has, in rule_id 1105, the rule ID currently set in the variable (rule_id), the inference rule control unit 201 acquires the analysis script table 1000 record with that analysis ID as an evaluation target analysis result.
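Both selection cases of step 1302 reduce to one test: keep an analyzed record unless an evaluation with the same (analysis ID, rule ID) pair already exists. The sketch below assumes records are plain dictionaries keyed by the column names of FIGS. 10 and 11; the function name `evaluation_targets` is hypothetical.

```python
def evaluation_targets(analysis_records, evaluation_records, rule_id):
    """Analyzed records not yet evaluated by `rule_id` (step 1302).

    Assumes records are dicts keyed by the column names of FIGS. 10 and 11.
    """
    # (analysis_id, rule_id) pairs that already have an evaluation value.
    evaluated = {(e["analysis_id"], e["rule_id"]) for e in evaluation_records}
    return [
        r for r in analysis_records
        if r["result"] is not None                        # analyzed records only
        and (r["analysis_id"], rule_id) not in evaluated  # not yet scored by this rule
    ]


analysis = [
    {"analysis_id": 1, "result": "r1"},
    {"analysis_id": 2, "result": None},  # not executed yet, so never a target
    {"analysis_id": 3, "result": "r3"},
]
evaluations = [{"analysis_id": 1, "rule_id": 0}]
print([r["analysis_id"] for r in evaluation_targets(analysis, evaluations, 1)])  # [1, 3]
```

Record 1 is skipped for rule 0, which has already scored it, but returned for rule 1; record 3 has not been scored by any rule and is returned in both cases.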

Next, the inference rule control unit 201 causes the analysis result evaluation unit 203 of the inference rule unit 202 identified by the variable (rule_id) to evaluate the evaluation target analysis results acquired in step 1302 (1303).

Specifically, for each analysis script table 1000 record that is an evaluation target analysis result, the inference rule control unit 201 passes the analysis ID registered in analysis_id 1001, the analysis script registered in script 1008, and the analysis result registered in result 1010 to the analysis result evaluation unit 203.

The analysis result evaluation unit 203 calculates an evaluation value from the analysis result received from the inference rule control unit 201, according to the evaluation criterion set in the analysis result evaluation unit 203, and returns the calculated evaluation value and the criterion to the inference rule control unit 201.

The method by which the analysis result evaluation unit 203 evaluates analysis results is outlined here; details are given later.

Each analysis result evaluation unit 203 is paired with an analysis script generation unit 204, and the evaluation method of the analysis result evaluation unit 203 corresponds to the kind of analysis performed by the analysis scripts that its paired analysis script generation unit 204 generates.

For example, when the analysis script generation unit 204 generates analysis scripts that perform frequency tabulation, the corresponding analysis result evaluation unit 203 evaluates how strongly the analysis result protrudes. Specifically, it calculates, as the evaluation value, the maximum value of the analysis result minus the average of the remaining values (all values except the maximum).
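The protrusion measure for frequency tabulations is short enough to write out directly. This sketch takes the histogram counts as a flat list; the function name and the behavior for fewer than two values (returning 0.0) are assumptions.

```python
def protrusion_score(values):
    """Maximum value minus the mean of the other values (frequency-rule evaluation)."""
    if len(values) < 2:
        return 0.0  # assumption: with nothing to compare against, no protrusion
    peak = max(values)
    rest = list(values)
    rest.remove(peak)  # drop one occurrence of the maximum
    return peak - sum(rest) / len(rest)


print(protrusion_score([10, 2, 4]))  # 7.0
```

A histogram with one dominant bin, like the bin at 0 in FIG. 8, scores high, which is what makes it a candidate for the finer re-tabulation described later.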

When the analysis script generation unit 204 generates analysis scripts that perform cross tabulation, the corresponding analysis result evaluation unit 203 evaluates the degree of uniformity of the analysis result. Specifically, it calculates, as the evaluation value, the absolute difference between each value of the analysis result and the average of those values.
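The text gives one absolute difference per cell but does not say how the differences are combined into a single evaluation value. The sketch below assumes the cells of the cross table are flattened into a list and that the per-cell differences are summed; both the summation and the function names are assumptions.

```python
def absolute_deviations(cells):
    """|cell - mean| for every cell of a flattened cross table."""
    mean = sum(cells) / len(cells)
    return [abs(c - mean) for c in cells]


def uniformity_score(cells):
    # Assumption: the per-cell deviations are summed into one evaluation value,
    # so a perfectly uniform table scores 0.
    return sum(absolute_deviations(cells))


print(absolute_deviations([1, 3, 5]))  # [2.0, 0.0, 2.0]
print(uniformity_score([1, 3, 5]))     # 4.0
```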

Next, the inference rule control unit 201 registers the evaluation value received from the analysis result evaluation unit 203 in the evaluation value table 1100 of the analysis result database 200 (1304).

Specifically, the inference rule control unit 201 registers the analysis ID corresponding to the evaluation value in analysis_id 1102, the rule ID set in the variable (rule_id) in rule_id 1105, the evaluation criterion received from the analysis result evaluation unit 203 in measure 1106, and the evaluation value itself in value 1107. It also generates an identifier unique within the evaluation value table 1100 and registers it in fitness_id 1101, and registers the current date and time in created_at 1103 and updated_at 1104.
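Step 1304 amounts to assembling one evaluation record. In this sketch, a process-local counter stands in for the unique fitness_id generator and timestamps are ISO 8601 strings in UTC; both are assumptions, as is the function name.

```python
import itertools
from datetime import datetime, timezone

# Hypothetical ID source; a real system would let the database assign IDs.
_fitness_ids = itertools.count(1)


def make_evaluation_record(analysis_id, rule_id, measure, value):
    """Build one evaluation value table 1100 row as a dict (step 1304)."""
    now = datetime.now(timezone.utc).isoformat()  # assumed timestamp format
    return {
        "fitness_id": next(_fitness_ids),  # unique within the table
        "analysis_id": analysis_id,        # links to analysis_id 1001
        "created_at": now,
        "updated_at": now,                 # same as created_at on insert
        "rule_id": rule_id,                # current value of the rule_id variable
        "measure": measure,                # criterion returned by unit 203
        "value": value,                    # the evaluation value
    }


record = make_evaluation_record(7, 0, "peak", 3.5)
```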

Next, the inference rule control unit 201 acquires, from the evaluation value table 1100 of the analysis result database 200, a predetermined number of records that satisfy a predetermined condition as candidate generation sources for analysis scripts (1305).

The predetermined condition is as follows.

The inference rule control unit 201 refers to the evaluation value table 1100 and acquires a predetermined number of records (evaluation value records) in descending order of the evaluation values calculated by the analysis result evaluation unit 203 of the inference rule unit 202 identified by the rule ID in the variable (rule_id). Specifically, it acquires a predetermined number (for example, 20) of the records whose rule_id 1105 matches the rule ID in the variable (rule_id), in descending order of the evaluation value in value 1107.

The inference rule control unit 201 then refers to the analysis script table 1000 and keeps, as generation source candidates, only those evaluation value records whose evaluated analysis result did not come from an analysis script generated by the analysis script generation unit 204 of the inference rule unit 202 identified by the rule ID in the variable (rule_id). Specifically, for each evaluation value record, it looks up the analysis script table 1000 record whose analysis_id 1001 matches the analysis ID in the record's analysis_id 1102, and determines whether the rule ID in rule_id 1006 of that record matches the rule ID in the variable (rule_id). If it matches, the inference rule control unit 201 removes the evaluation value record with that analysis ID from the acquired evaluation value records; the remaining evaluation value records become the generation source candidates.
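Following the order described above, the sketch first takes the top records for the current rule and then discards those whose analysis came from the same rule, so fewer than `limit` candidates may remain. Records are assumed to be dictionaries, and an analysis that was not produced by any rule is assumed to have `rule_id` set to None.

```python
def generation_candidates(evaluation_records, analysis_by_id, rule_id, limit=20):
    """Step 1305: top-`limit` evaluations by `rule_id`, minus self-generated results."""
    # Evaluation value records scored by this rule, highest value first.
    scored = sorted(
        (e for e in evaluation_records if e["rule_id"] == rule_id),
        key=lambda e: e["value"],
        reverse=True,
    )[:limit]
    # Drop results whose script this same rule generated (rule_id 1006).
    return [e for e in scored if analysis_by_id[e["analysis_id"]]["rule_id"] != rule_id]


evals = [
    {"analysis_id": 1, "rule_id": "scale", "value": 5.0},
    {"analysis_id": 2, "rule_id": "scale", "value": 9.0},
    {"analysis_id": 3, "rule_id": "cross", "value": 7.0},
    {"analysis_id": 4, "rule_id": "scale", "value": 1.0},
]
# Assumption: rule_id None marks an analysis not generated by any rule.
analyses = {1: {"rule_id": None}, 2: {"rule_id": "scale"}, 3: {"rule_id": None}, 4: {"rule_id": "cross"}}
print([e["analysis_id"] for e in generation_candidates(evals, analyses, "scale")])  # [1, 4]
```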

The analysis script generation unit 204 of the inference rule unit 202 identified by the rule ID in the variable (rule_id) is the unit that will generate the new analysis scripts. The filtering above therefore eliminates the wasteful case in which that analysis script generation unit 204 regenerates the same analysis scripts from the results of scripts it generated itself.

In step 1305, the inference rule control unit 201 may additionally require that the evaluation value of an analysis result be equal to or greater than a predetermined value.

Next, the inference rule control unit 201 passes the generation source candidates to the analysis script generation unit 204 of the inference rule unit 202 identified by the rule ID in the variable (rule_id). Based on these candidates, the analysis script generation unit 204 generates a predetermined number of new analysis scripts, creates a title for each, and calculates a predicted value for each (1306).

The generation source candidates passed from the inference rule control unit 201 to the analysis script generation unit 204 include the information registered in analysis_id 1102, created_at 1103, updated_at 1104, rule_id 1105, measure 1106, and value 1107 of the evaluation value table 1100, as well as the analysis script registered in script 1008 of the analysis script table 1000 record corresponding to the analysis ID in analysis_id 1102.

Examples of analysis scripts generated by the analysis script generation units 204 are as follows. The analysis script generation unit 204 with one rule ID (“scale” in FIG. 10) generates an analysis script that re-tabulates a frequency distribution with a smaller bin width. The analysis script generation unit 204 with another rule ID (“cross” in FIG. 10) generates an analysis script that takes an existing analysis result, adds time as a new axis, and cross-tabulates on the two axes.

As one example of calculating the predicted value, the analysis script generation unit 204 may compute the average of the evaluation values of the analysis results from which the new analysis script was generated and use that average as the predicted value. As another example, it may use the maximum of those evaluation values as the predicted value.
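Both options for the predicted value (expected_fitness 1009) are one-liners. The `method` parameter used to switch between them is a hypothetical convenience, not something the text specifies.

```python
from statistics import mean


def predicted_fitness(parent_values, method="mean"):
    """expected_fitness of a new script from its parents' evaluation values.

    `method` is a hypothetical switch between the two options in the text.
    """
    if method == "mean":
        return mean(parent_values)
    if method == "max":
        return max(parent_values)
    raise ValueError(f"unknown method: {method}")


print(predicted_fitness([2.0, 4.0]))         # 3.0
print(predicted_fitness([2.0, 4.0], "max"))  # 4.0
```

The mean is more conservative, while the maximum favors scripts derived from at least one very promising result.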

The number of analysis scripts generated in step 1306 is calculated by Equation 1.

$$P_i - W_i \times Q_i \qquad \text{(Equation 1)}$$

where $P_i$, $W_i$, and $Q_i$ are defined as follows.

$P_i$ is the number of analysis scripts that the analysis script generation unit 204 with a given rule ID generates when none of the analysis scripts it has generated are waiting for job execution in the analysis execution unit 209.

$W_i$ is a weighting value that the user can set for each rule ID.

$Q_i$ is the number of analysis scripts, among those generated by the analysis script generation unit 204 with a given rule ID, that are waiting for job execution.
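Equation 1 can be evaluated directly. Two details are not specified in the text and are assumptions here: a fractional result is rounded down, and a negative result (many queued jobs, heavy weight) is clamped to zero so that no scripts are generated.

```python
import math


def scripts_to_generate(p_i, w_i, q_i):
    """Equation 1: P_i - W_i * Q_i.

    Assumptions: fractional results are rounded down and negative results
    are clamped to zero (the text does not cover either case).
    """
    return max(0, math.floor(p_i - w_i * q_i))


print(scripts_to_generate(10, 0.5, 4))  # 8: a short queue leaves room for new scripts
print(scripts_to_generate(10, 2, 8))    # 0: a long queue suppresses generation
```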

As a result, when many analysis scripts are waiting for job execution, the analysis script generation unit 204 generates fewer analysis scripts, which reduces the processing load. Conversely, when few analysis scripts are waiting, it generates more, which allows more detailed aggregate analysis.

Furthermore, because the user can set $W_i$ for each rule ID, the number of analysis scripts generated by each analysis script generation unit 204 can be set according to the processing load of the analysis scripts that unit generates.

In step 1306, the analysis script generation unit 204 passes each newly generated analysis script, together with its title and predicted value, to the inference rule control unit 201.

The inference rule control unit 201 registers the analysis script, its title, and its predicted value received from the analysis script generation unit 204 in the analysis script table 1000 of the analysis result database 200 (1307).

Specifically, the inference rule control unit 201 registers an identifier indicating the type of analysis target data 212 that the new analysis script processes in data_id 1002, the analysis IDs of the analysis results from which the new analysis script was generated in parent_analysis_ids 1005, the title received from the analysis script generation unit 204 in title 1007, the analysis script in script 1008, and the predicted value in expected_fitness 1009. It also generates an analysis ID unique within the analysis script table 1000, registers it in analysis_id 1001, and registers the current date and time in created_at 1003 and updated_at 1004. Because the analysis result of the new analysis script has not yet been calculated, the inference rule control unit 201 registers a NULL value in result 1010. When the analysis execution unit 209 later executes the new analysis script, the job control unit 208 registers its analysis result in result 1010.

Next, the inference rule control unit 201 increments the rule ID held in the variable (rule_id) and stores the result back in the variable (rule_id) (1308).

The inference rule control unit 201 then determines whether the variable (rule_id) updated in step 1308 is equal to or less than N, the total number of inference rule units 202 in the data analysis system 100 (1309).

If step 1309 determines that the variable (rule_id) is N or less, the process returns to step 1302, and steps 1302 to 1308 are executed for the inference rule unit 202 with the rule ID in the variable (rule_id).

一方、ステップ１３０９の処理で、変数（rule＿id）がＮよりも大きいと判定された場合、すべての推論ルール部２０２に対して処理が終了したので、推論ルール制御部２０１は、一定時間スリープした後（１３１０）、ステップ１３０１の処理から再度処理を実行する。なお、分析スクリプト生成処理は、ユーザからの強制終了等がない限り、無限ループで繰り返し実行される。分析スクリプト生成処理では、以前の分析結果に基づいて深堀分析を実行する分析スクリプトが生成されるので、分析スクリプト生成処理が繰り返されるたびに新たな分析スクリプトが生成され続ける。 If, on the other hand, step 1309 determines that rule_id is greater than N, all inference rule units 202 have been processed. The inference rule control unit 201 therefore sleeps for a fixed period (1310) and then restarts from step 1301. The analysis script generation process repeats in an infinite loop unless, for example, the user forcibly terminates it. Each pass generates scripts that drill down further into earlier analysis results, so every repetition keeps producing new analysis scripts.
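The control flow of steps 1308 to 1310 can be sketched as a pass over rule IDs 1 to N followed by a sleep. Several details are hypothetical: keying the rules by rule ID, the `register` callback, and the `max_passes` parameter (added only so the loop can be bounded in tests). Each rule here simply takes the current analysis results and returns proposed script records.

```python
import time
from typing import Callable, Dict, List, Optional

# Hypothetical rule signature: current results in, proposed script records out.
Rule = Callable[[List[dict]], List[dict]]

def generation_pass(rules: Dict[int, Rule], results: List[dict],
                    register: Callable[[dict], None]) -> int:
    """Apply every inference rule once, with rule_id running from 1 to N."""
    n = len(rules)                                  # N: number of inference rule units
    generated = 0
    rule_id = 1
    while rule_id <= n:                             # 1309: continue while rule_id <= N
        for record in rules[rule_id](results):      # evaluate results, propose scripts
            register(record)                        # e.g. insert into the script table
            generated += 1
        rule_id += 1                                # 1308: increment rule_id
    return generated

def generation_loop(rules: Dict[int, Rule], fetch_results: Callable[[], List[dict]],
                    register: Callable[[dict], None], sleep_seconds: float = 60.0,
                    max_passes: Optional[int] = None) -> None:
    """Repeat passes forever (or max_passes times), sleeping after each one."""
    passes = 0
    while max_passes is None or passes < max_passes:
        generation_pass(rules, fetch_results(), register)  # restart from 1301
        passes += 1
        time.sleep(sleep_seconds)                   # 1310: sleep before the next pass
```

Because `fetch_results` is called anew on every pass, results that the job side stored in the meantime feed the next round of rules.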

以下、推論ルール部２０２による評価値算出及び分析スクリプト生成の例を詳細に説明する。 Hereinafter, an example of evaluation value calculation and analysis script generation by the inference rule unit 202 will be described in detail.

推論ルール部２０２による評価値算出及び分析スクリプト生成の例として、頻度の細分化が考えられる。 As an example of evaluation value calculation and analysis script generation by the inference rule unit 202, frequency subdivision can be considered.

例えば、図９に示す分析結果のグラフにおいて、加速度の絶対値が「０」のレコードが２０万件以上と他のレコードから突出している。このため、次の集計分析として、加速度の絶対値が「０」付近のデータを０．１刻みで頻度を詳細に計数し、グラフ化することが考えられる。このようなルールにおいて、分析結果評価部２０３は、グラフの突出度合いを測る尺度を用いて、分析結果を評価する。例えば、分析結果評価部２０３は、頻度の最大値から、最大値を除いた他の頻度値の平均を引いた値を評価値として、分析結果を評価できる。分析スクリプト生成部２０４は、図６に示す「histogram(0, 20, 1)」の引数を、「histogram(0, 1, 0.1)」と変更した新たな分析スクリプトを生成できる。また、分析スクリプト生成部２０４は、「ＡのＸからＹにおけるＺ刻みの頻度分布」というひな型の文字列を保持し、ひな型の「Ａ」を「sqrt(x*x + y*y + z*z)」に、ひな型の「Ｘ」を「０」に、ひな型の「Ｙ」を「１」に、ひな型の「Ｚ」を「０．１」に置換することによってタイトルを生成する。 For example, in the analysis-result graph shown in FIG. 9, the bin where the absolute value of acceleration is “0” contains more than 200,000 records and stands out sharply from the other bins. A natural next aggregation analysis is therefore to count the data whose absolute acceleration is near “0” in finer 0.1 increments and graph it. Under such a rule, the analysis result evaluation unit 203 evaluates the analysis result with a measure of how strongly the graph protrudes. For example, the evaluation value can be the maximum frequency minus the mean of the remaining frequencies (all frequencies except the maximum). The analysis script generation unit 204 can then generate a new analysis script in which the arguments of “histogram(0, 20, 1)” shown in FIG. 6 are changed to “histogram(0, 1, 0.1)”. The analysis script generation unit 204 also holds the template string “frequency distribution of A from X to Y in steps of Z”. It generates the title by replacing “A” with “sqrt(x*x + y*y + z*z)”, “X” with “0”, “Y” with “1”, and “Z” with “0.1”.
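The "frequency subdivision" rule can be sketched as two small functions: one computes the protrusion score described above, and the other builds the refined histogram clause and title. The function names, the refinement factor of 10 and the choice of zooming into the tallest bin are assumptions, chosen to reproduce the FIG. 6 example. The clause string only mimics the DSL and does not parse it.

```python
from typing import List, Tuple

def protrusion_score(freqs: List[float]) -> float:
    """Evaluation value: maximum frequency minus the mean of the other frequencies."""
    if len(freqs) < 2:
        return 0.0
    top = max(freqs)
    rest = list(freqs)
    rest.remove(top)                      # exclude one occurrence of the maximum
    return top - sum(rest) / len(rest)

def refine_peak_bin(freqs: List[float], target: str, start: float, step: float,
                    factor: int = 10) -> Tuple[str, str]:
    """Zoom into the tallest bin with a step `factor` times finer (assumed policy)."""
    peak = freqs.index(max(freqs))
    lo = start + peak * step              # lower edge of the tallest bin
    hi = lo + step                        # upper edge of the tallest bin
    fine = step / factor
    logs = f"logs: histogram({lo:g}, {hi:g}, {fine:g})"
    # Template: "frequency distribution of A from X to Y in steps of Z"
    title = (f"Frequency distribution of {target} "
             f"from {lo:g} to {hi:g} in steps of {fine:g}")
    return logs, title
```

For frequencies `[200000, 10, 20, 30]` from `histogram(0, 20, 1)`, the score is 199980.0. The refined clause is `logs: histogram(0, 1, 0.1)`.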

推論ルール部２０２による評価値算出及び分析スクリプト生成の他の例として、クロス集計が考えられる。 As another example of evaluation value calculation and analysis script generation by the inference rule unit 202, cross tabulation can be considered.

例えば、図９に示すグラフでは、加速度の絶対値の「１」から「８」は一様分布に近い分布である。このため、他の軸を加えて、加速度の絶対値の軸と他の軸（例えば、時刻の軸）との２軸でクロス集計することによって、加速度の絶対値の頻度分布に特徴が現れる可能性がある。このため、次に集計分析する場合、時刻を横軸、加速度の絶対値を縦軸とし、クロス集計を行うことが考えられる。なお、クロス集計の分析スクリプトの生成元となる分析結果が二つ以上であってもよい。 For example, in the graph shown in FIG. 9, the frequencies for absolute acceleration values “1” through “8” are close to a uniform distribution. Adding another axis, such as time, and cross-tabulating the absolute acceleration against it may reveal features of the frequency distribution that the one-dimensional histogram hides. A possible next aggregation analysis is therefore a cross tabulation with time on the horizontal axis and the absolute value of acceleration on the vertical axis. A cross-tabulation analysis script may also be generated from two or more analysis results.

クロス集計が分析スクリプト生成部２０４によって新たに生成される場合、分析結果評価部２０３は、グラフの一様度合いを測る尺度を用いて、分析結果を評価する。このようなグラフの一様度合いを測る尺度を示す評価値は、例えば、各頻度と頻度の平均値との差の絶対値を合計することによって算出できる。 When the analysis script generation unit 204 is to generate a cross-tabulation script, the analysis result evaluation unit 203 evaluates the analysis result with a measure of how uniform the graph is. Such an evaluation value can be calculated, for example, by summing the absolute differences between each frequency and the mean frequency.
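The uniformity measure described above, the sum of |f_i − mean(f)| over all bins, is a one-line computation. A flat histogram scores 0.0, and the value grows as the bins diverge from the mean. How a rule maps this raw value to a higher or lower evaluation value is not specified in the text, so this sketch (function name hypothetical) returns only the raw deviation.

```python
from typing import Sequence

def uniformity_deviation(freqs: Sequence[float]) -> float:
    """Sum of absolute deviations from the mean frequency; 0.0 means perfectly flat."""
    if not freqs:
        return 0.0
    mean = sum(freqs) / len(freqs)
    return sum(abs(f - mean) for f in freqs)
```

For example, `[10, 0, 10, 0]` has mean 5 and deviation 20.0, whereas `[5, 5, 5, 5]` scores 0.0.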

また、クロス集計の分析スクリプトを生成する分析スクリプト生成部２０４は、図６に示す「logs: histogram(0, 20, 1)」を「logs: histogram(1, 9, 1), histogram(0, 24, 1)」に変更し、「target: sqrt(x*x + y*y + z*z)」を「target: sqrt(x*x + y*y + z*z), hour(time)」に変更することによって、新たなスクリプトを生成できる。ここで、hour(time)はtimeタグから時刻の情報を取得させる関数である。また、図６に示すＤＳＬでは、カンマで区切ることでクロス集計ができることと仮定している。タイトルに関しては、分析スクリプト生成部２０４において、「Ａと時刻のクロス集計」というひな型の文字列を保持しておき、Ａを「sqrt(x*x + y*y + z*z)」に置き換えることでタイトルを生成する。 To generate a cross-tabulation analysis script, the analysis script generation unit 204 can make two changes to the script shown in FIG. 6:

- Change “logs: histogram(0, 20, 1)” to “logs: histogram(1, 9, 1), histogram(0, 24, 1)”.
- Change “target: sqrt(x*x + y*y + z*z)” to “target: sqrt(x*x + y*y + z*z), hour(time)”.

Here, hour(time) is a function that extracts the hour from the time tag. The DSL shown in FIG. 6 is assumed to express a cross tabulation by separating items with commas. For the title, the analysis script generation unit 204 holds the template string “cross tabulation of A and time” and generates the title by replacing A with “sqrt(x*x + y*y + z*z)”.
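The following sketch builds the rewritten `logs`/`target` clauses and title for the cross tabulation. It also shows what the resulting two-axis count could look like. The histogram semantics are assumed: `histogram(start, stop, step)` is taken to mean bins of width `step` over `[start, stop)`, and values outside either range are dropped. The function names are hypothetical, and the code does not interpret the actual DSL of FIG. 6.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Hist = Tuple[float, float, float]   # (start, stop, step), as in histogram(start, stop, step)

def cross_tab_clauses(target: str, target_hist: Hist,
                      time_hist: Hist = (0, 24, 1)) -> Tuple[str, str, str]:
    """Build the logs/target clauses and title for a target-by-hour cross tabulation."""
    logs = "logs: " + ", ".join(f"histogram({a:g}, {b:g}, {c:g})"
                                for a, b, c in (target_hist, time_hist))
    tgt = f"target: {target}, hour(time)"
    title = f"Cross tabulation of {target} and time"
    return logs, tgt, title

def bin_index(value: float, hist: Hist) -> Optional[int]:
    """Assumed semantics: bins of width step covering [start, stop); None if outside."""
    start, stop, step = hist
    if not start <= value < stop:
        return None
    return int((value - start) // step)

def cross_tabulate(pairs: Iterable[Tuple[float, float]],
                   hist_a: Hist, hist_b: Hist) -> Counter:
    """Count (bin_a, bin_b) cells, skipping values outside either range."""
    cells: Counter = Counter()
    for a, b in pairs:
        i, j = bin_index(a, hist_a), bin_index(b, hist_b)
        if i is not None and j is not None:
            cells[(i, j)] += 1
    return cells
```

Calling `cross_tab_clauses("sqrt(x*x + y*y + z*z)", (1, 9, 1))` reproduces the two clauses quoted above.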

本発明の推論ルール部２０２は上記の例のルールに限られず、様々なルールが考えられる。多様な推論ルールを本発明に組み込むことによって、有用な集計分析が自動的に実行できるようになる。 The inference rule unit 202 of the present invention is not limited to the rules in the above example, and various rules can be considered. By incorporating various inference rules into the present invention, useful aggregation analysis can be performed automatically.

図１４は、本発明の実施形態の分析実行処理のデータの流れを示すシーケンス図である。 FIG. 14 is a sequence diagram illustrating a data flow of the analysis execution process according to the embodiment of this invention.

分析実行処理で使用するデータは、ジョブ制御部２０８、分析結果データベース２００、及び分析実行部２０９の間でやりとりされる。 Data used in the analysis execution process is exchanged among the job control unit 208, the analysis result database 200, and the analysis execution unit 209.

まず、ジョブ制御部２０８は、分析結果データベース２００の分析スクリプトテーブル１０００からジョブ実行待ちの分析スクリプトのレコードを取得する（１４０１）。 First, the job control unit 208 acquires a record of an analysis script waiting for job execution from the analysis script table 1000 of the analysis result database 200 (1401).

次に、ジョブ制御部２０８は、ステップ１４０１の処理で取得したレコードによって特定される分析スクリプトを分析実行部２０９に渡し、分析処理実行命令を分析実行部２０９に渡す（１４０２）。分析実行部２０９は、ジョブ制御部２０８から分析処理実行命令を渡されると、ジョブ制御部２０８から渡された分析スクリプトを実行する。 Next, the job control unit 208 passes the analysis script specified by the record acquired in step 1401 to the analysis execution unit 209 and passes an analysis process execution command to the analysis execution unit 209 (1402). When the analysis execution command is passed from the job control unit 208, the analysis execution unit 209 executes the analysis script passed from the job control unit 208.

ジョブ制御部２０８は、ステップ１４０２の処理で分析処理実行命令を分析実行部２０９に渡すと、待機状態に移行し、周期的に分析スクリプトの実行が完了したか否かを分析実行部２０９に問い合わせる（１４０３、１４０４）。 When the job control unit 208 passes the analysis execution instruction to the analysis execution unit 209 in the process of step 1402, the job control unit 208 shifts to a standby state and periodically inquires the analysis execution unit 209 whether or not the execution of the analysis script is completed. (1403, 1404).

分析実行部２０９が分析スクリプトの実行を完了すると、ジョブ制御部２０８は、分析実行部２０９から分析結果を取得する（１４０５）。そして、ジョブ制御部２０８は、ステップ１４０５の処理で取得した分析結果を分析スクリプトテーブル１０００に格納する（１４０６）。 When the analysis execution unit 209 completes the execution of the analysis script, the job control unit 208 acquires the analysis result from the analysis execution unit 209 (1405). Then, the job control unit 208 stores the analysis result acquired in step 1405 in the analysis script table 1000 (1406).
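The hand-off and polling in steps 1402 to 1406 can be sketched with the standard `concurrent.futures` module. This is an assumed stand-in: an executor plays the role of the analysis execution unit 209 and a callback plays the role of writing to the analysis script table. The names `execute_and_store`, `run_script`, `store` and `poll_seconds` are hypothetical.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable

def execute_and_store(executor: Executor, run_script: Callable[[str], str],
                      script: str, store: Callable[[str], None],
                      poll_seconds: float = 0.5) -> str:
    """Hand a script to an executor, poll until it finishes, then store the result."""
    future = executor.submit(run_script, script)   # 1402: pass the script and start it
    while not future.done():                       # 1403-1404: periodic completion check
        time.sleep(poll_seconds)
    result = future.result()                       # 1405: obtain the analysis result
    store(result)                                  # 1406: write it to the script table
    return result
```

Polling with `future.done()` mirrors the periodic inquiry in the sequence diagram. Simply blocking on `future.result()` would behave the same, but would not show the separate inquiry steps.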

図１５は、本発明の実施形態の分析実行処理のフローチャートである。 FIG. 15 is a flowchart of analysis execution processing according to the embodiment of this invention.

分析実行処理は、ジョブ制御部２０８、及び分析実行部２０９を実現するプログラムを実行するＣＰＵ１０３によって実行される。 The analysis execution process is executed by the CPU 103 that executes a program that implements the job control unit 208 and the analysis execution unit 209.

まず、ジョブ制御部２０８は、分析結果データベース２００の分析スクリプトテーブル１０００を参照し、予測値（expected＿fitness１００９）が高い順にジョブ実行待ちの分析スクリプトを取得する（１５０１）。なお、ジョブ制御部２０８は、分析スクリプトテーブル１０００のresult１０１０がＮＵＬＬであれば当該レコードのscript１００８に登録された分析スクリプトをジョブ実行待ちの分析スクリプトとして判定する。 First, the job control unit 208 refers to the analysis script table 1000 of the analysis result database 200, and acquires analysis scripts waiting for job execution in descending order of predicted value (expected_fitness 1009) (1501). Note that if the result 1010 of the analysis script table 1000 is NULL, the job control unit 208 determines the analysis script registered in the script 1008 of the record as an analysis script waiting for job execution.
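Step 1501 amounts to a single query: select rows whose result is still NULL and order them by predicted value. The sketch assumes the table is a SQLite table named `analysis_scripts` (a hypothetical name) with the columns of the analysis script table 1000. The optional `limit` is also an assumption, since the text does not say how many scripts are fetched per iteration. In SQLite, NULLs sort as the smallest values, so scripts with no prediction (such as user-entered ones) come last in descending order.

```python
import sqlite3
from typing import List, Optional, Tuple

def fetch_pending_scripts(conn: sqlite3.Connection,
                          limit: Optional[int] = None) -> List[Tuple[int, str]]:
    """(analysis_id, script) for scripts not yet executed, highest expected_fitness first."""
    sql = ("SELECT analysis_id, script FROM analysis_scripts "
           "WHERE result IS NULL "                       # NULL result = waiting for a job
           "ORDER BY expected_fitness DESC, analysis_id")
    params: tuple = ()
    if limit is not None:                                # batch size is an assumption
        sql += " LIMIT ?"
        params = (limit,)
    return conn.execute(sql, params).fetchall()
```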

次に、ジョブ制御部２０８は、ステップ１５０１の処理で取得した分析スクリプトをマージして一つの分析スクリプトに合成する（１５０２）。これによって、ＣＰＵ１０３は、分析スクリプトの実行が完了するごとに、次の分析スクリプトを読み込まなくてもよくなり、ファイルの入出力回数を減少させることができ、高速に分析処理を実行できる。なお、ステップ１５０２の処理の詳細は、図１６で詳細を説明する。また、ステップ１５０２の処理では、ジョブ制御部２０８でなく分析実行部２０９によって実行されてもよい。 Next, the job control unit 208 merges the analysis scripts acquired in step 1501 into a single analysis script (1502). As a result, the CPU 103 no longer needs to load the next analysis script each time one finishes executing. This reduces the number of file input/output operations and speeds up the analysis process. Step 1502 is described in detail with reference to FIG. 16. Step 1502 may also be executed by the analysis execution unit 209 instead of the job control unit 208.

次に、分析実行部２０９は、ステップ１５０２の処理で合成された分析スクリプトを実行する（１５０３）。具体的には、分析実行部２０９の分析スクリプト解釈部２１０が、分析スクリプトを、分析対象データ２１２に対して実行する。 Next, the analysis execution unit 209 executes the analysis script synthesized in step 1502 (1503). Specifically, the analysis script interpretation unit 210 of the analysis execution unit 209 executes the analysis script on the analysis target data 212.

ジョブ制御部２０８は、分析実行部２０９で分析スクリプトの実行が完了するまで処理を待機し（１５０４）、分析実行部２０９で分析スクリプトの実行が完了すると、分析実行部２０９から分析結果を取得する（１５０５）。 The job control unit 208 waits for processing until the analysis execution unit 209 completes execution of the analysis script (1504). When the analysis execution unit 209 completes execution of the analysis script, the job control unit 208 acquires the analysis result from the analysis execution unit 209. (1505).

次に、ジョブ制御部２０８は、ステップ１５０５の処理で取得した分析結果を分析結果データベース２００の分析スクリプトテーブル１０００のresult１０１０に登録する（１５０６）。 Next, the job control unit 208 registers the analysis result acquired in step 1505 in the result 1010 of the analysis script table 1000 of the analysis result database 200 (1506).

次に、ジョブ制御部２０８は、一定時間スリープした後（１５０７）、ステップ１５０１の処理に戻って分析実行処理を再開する。ジョブ制御部２０８は、一定時間のスリープを挟んで、ユーザからの強制終了がない限り、無限ループで分析実行処理を繰り返し実行する。推論ルール制御部２０１が新しい分析スクリプトを分析結果データベース２００の分析スクリプトテーブル１０００に登録し続けるため、ジョブ制御部２０８も生成された分析スクリプトを分析実行部２０９に実行させ、分析結果を分析結果データベース２００の分析スクリプトテーブル１０００に登録し続けなければならない。 Next, the job control unit 208 sleeps for a fixed period (1507) and then returns to step 1501 to resume the analysis execution process. With a fixed sleep between iterations, it repeats the process in an infinite loop unless the user forcibly terminates it. The inference rule control unit 201 keeps registering new analysis scripts in the analysis script table 1000 of the analysis result database 200. The job control unit 208 must therefore keep having the analysis execution unit 209 execute those scripts and keep registering their analysis results in the same table.

図１６は、本発明の実施形態の分析スクリプトのマージの説明図である。 FIG. 16 is an explanatory diagram of merging analysis scripts according to the embodiment of this invention.

図１６で１６００及び１６０１は、マージされる分析スクリプトを示し、１６０２は、マージ後の一つに合成された分析スクリプトを示す。 In FIG. 16, reference numerals 1600 and 1601 denote analysis scripts to be merged, and reference numeral 1602 denotes an analysis script synthesized into one after merging.

１６０２の上方の下線部が分析スクリプト１６００に対応する部分であり、１６０２の下方の下線部が分析スクリプト１６０１に対応する部分である。 The underlined portion above 1602 is a portion corresponding to the analysis script 1600, and the underlined portion below 1602 is a portion corresponding to the analysis script 1601.

図１６に示すように、下線部を除外した部分のプログラムがテンプレートとして予め用意されており、ジョブ制御部２０８又は分析実行部２０９は、分析スクリプトを参照し、下線部をテンプレートに挿入することによって、複数の分析スクリプトをマージした一つの分析スクリプトを生成する。 As shown in FIG. 16, a program of a part excluding the underlined portion is prepared in advance as a template, and the job control unit 208 or the analysis execution unit 209 refers to the analysis script and inserts the underlined portion into the template. A single analysis script is generated by merging a plurality of analysis scripts.

分析実行部２０９は、図３に示す分析対象データ２１２に対して１６０２のmapper関数を呼び出し、分析結果を取得する。分析実行部２０９は、取得した分析結果に対してreducer関数を呼び出す。これによって、分析実行部２０９は、分析スクリプト１６００に記述された分析処理、及び分析スクリプト１６０１に記述された分析処理を同時に実行できる。 The analysis execution unit 209 calls the mapper function of the merged script 1602 on the analysis target data 212 shown in FIG. 3 and obtains its output. It then calls the reducer function on that output. In this way, the analysis execution unit 209 can execute the analysis processing described in analysis script 1600 and the processing described in analysis script 1601 at the same time.
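The effect of merging can be sketched in plain Python. This is an analogue of the FIG. 16 template, not the template itself. Each script's mapper is wrapped into one combined mapper that tags every key with the script's analysis ID, and a single reducer splits the totals back out per script. Count-style values, the tagging scheme and all function names are assumptions. The analysis data is iterated only once.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# A per-script mapper turns one record into (key, count) pairs.
Mapper = Callable[[dict], Iterable[Tuple[Hashable, int]]]

def merge_mappers(mappers: Dict[int, Mapper]) -> Callable[[dict], List[tuple]]:
    """One mapper that applies every script's mapper to the same record,
    tagging each key with that script's analysis ID."""
    def merged(record: dict) -> List[tuple]:
        return [((aid, key), value)
                for aid, mapper in mappers.items()
                for key, value in mapper(record)]
    return merged

def reducer(pairs: Iterable[tuple]) -> Dict[int, Dict[Hashable, int]]:
    """Sum counts per (analysis ID, key) and group the totals back per script."""
    per_script: Dict[int, Dict[Hashable, int]] = defaultdict(dict)
    for (aid, key), value in pairs:
        per_script[aid][key] = per_script[aid].get(key, 0) + value
    return dict(per_script)

def run_merged(records: Iterable[dict], mappers: Dict[int, Mapper]):
    merged = merge_mappers(mappers)
    # The analysis target data is read once; every script sees each record.
    return reducer(pair for record in records for pair in merged(record))
```

Usage: with one mapper that bins `int(r["v"])` and another that bins `r["h"]`, `run_merged` returns both histograms from a single scan of the records.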

また、分析対象データ２１２の容量が膨大である場合、当該分析対象データ２１２のファイル入出力が原因で分析実行処理の処理速度が低下してしまう。また、分析スクリプト１６００に記述された分析、及び分析スクリプト１６０１に記述された分析は、同じ分析対象データ２１２を分析するので、分析スクリプト１６００及び１６０１を一つの分析スクリプト１６０２にマージすることによって、分析対象データ２１２の一度読み込みによって分析が可能となり、分析実行時間を短縮できる。これによって、実行できる分析スクリプトの数を増やすことができるため、評価が高いものを優先的にユーザに表示することで、分析結果の有用性という観点での精度を高めることができる。 When the analysis target data 212 is very large, file input/output on that data slows down the analysis execution process. The analyses described in analysis scripts 1600 and 1601 both operate on the same analysis target data 212. Merging the two scripts into the single analysis script 1602 therefore lets both analyses run on a single read of the data, which shortens analysis execution time. More analysis scripts can then be executed in the same time. By preferentially displaying the highest-rated results to the user, the system can present results that are more likely to be useful.

図１７は、本発明の実施形態の分析結果表示部２１３が表示する分析結果一覧表示画面１７００の説明図である。 FIG. 17 is an explanatory diagram of an analysis result list display screen 1700 displayed by the analysis result display unit 213 according to the embodiment of this invention.

分析結果一覧表示画面１７００は、検索キーワード入力フィールド１７０１、検索ボタン１７０２、分析結果一覧表示フィールド１７０３、及びソートボタン１７０９を含む。 The analysis result list display screen 1700 includes a search keyword input field 1701, a search button 1702, an analysis result list display field 1703, and a sort button 1709.

検索キーワード入力フィールド１７０１は、検索キーワードが入力されるテキストフィールドである。検索ボタン１７０２は、分析結果データベース２００に登録された分析結果から検索キーワード入力フィールド１７０１に入力された検索キーワードと一致する分析結果を分析結果表示部２１３に検索させるために操作されるボタンである。なお、分析結果の検索方法は、例えば、分析結果データベース２００の分析スクリプトテーブル１０００のtitle１００７に登録されたタイトルが検索キーワードと一致する分析結果を検索する。 The search keyword input field 1701 is a text field in which a search keyword is entered. The search button 1702 makes the analysis result display unit 213 search the analysis results registered in the analysis result database 200 for those that match the keyword entered in the search keyword input field 1701. For example, the search may retrieve the analysis results whose title, registered in title 1007 of the analysis script table 1000, matches the search keyword.
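A title search like the one described above could be a single SQLite query. The text says only that the title "matches" the keyword. This sketch assumes a case-sensitive substring match using SQLite's built-in `instr`, which, unlike `LIKE`, needs no escaping of `%` or `_` in the keyword. The table name `analysis_scripts` and the function name are hypothetical.

```python
import sqlite3
from typing import List, Tuple

def search_by_title(conn: sqlite3.Connection, keyword: str) -> List[Tuple[int, str, str]]:
    """(analysis_id, data_id, title) rows whose title contains keyword (assumed matching rule)."""
    return conn.execute(
        "SELECT analysis_id, data_id, title FROM analysis_scripts "
        "WHERE instr(title, ?) > 0 "          # rows with a NULL title are excluded
        "ORDER BY analysis_id",
        (keyword,),
    ).fetchall()
```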

分析結果一覧表示フィールド１７０３は、検索キーワード入力フィールド１７０１に入力された検索キーワードと一致する分析結果の一覧を表示するフィールドである。 The analysis result list display field 1703 is a field for displaying a list of analysis results that match the search keyword input in the search keyword input field 1701.

分析結果一覧表示フィールド１７０３は、グラフボタン１７０４、分析ＩＤ１７０５、データＩＤ１７０６、タイトル１７０７、推論ルール１７０８を含む。 The analysis result list display field 1703 includes a graph button 1704, an analysis ID 1705, a data ID 1706, a title 1707, and an inference rule 1708.

グラフボタン１７０４は、当該レコードの分析結果をグラフ表示した分析結果表示画面１８００（図１８参照）を分析結果表示部２１３に表示させるために操作されるボタンである。分析ＩＤ１７０５には分析ＩＤが表示される。データＩＤ１７０６には分析処理の対象となる分析対象データ２１２の種類を示す識別子が表示される。タイトル１７０７には分析結果を取得した分析スクリプトのタイトルが表示される。推論ルール１７０８には分析結果を取得した分析スクリプトを生成した推論ルール部２０２のルールＩＤが表示される。 The graph button 1704 makes the analysis result display unit 213 display the analysis result display screen 1800 (see FIG. 18), which shows the analysis result of that row as a graph. The analysis ID 1705 shows the analysis ID. The data ID 1706 shows an identifier indicating the type of analysis target data 212 that was analyzed. The title 1707 shows the title of the analysis script that produced the analysis result. The inference rule 1708 shows the rule ID of the inference rule unit 202 that generated that analysis script.

なお、分析結果一覧表示フィールド１７０３には、分析結果が所定の順序（ソートパラメタ）でソートされて表示される。図１７では、所定の順序は、分析結果の評価値の高い順であるとする。このソートに用いられる評価値は総合評価値といい、具体的には、ある分析結果の異なる推論ルールで算出された評価値の平均値である。 The analysis result list display field 1703 displays the analysis results sorted in a predetermined order (sort parameter). In FIG. 17, it is assumed that the predetermined order is the order from the highest evaluation value of the analysis result. The evaluation value used for this sort is called a comprehensive evaluation value, and specifically, an average value of evaluation values calculated by different inference rules for a certain analysis result.
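The default ordering (the mean of each analysis result's evaluation values across rules, sorted in descending order) can be sketched as follows. The input rows are assumed to mirror the evaluation value table 1100, with fields analysis_id 1102, rule_id 1105, measure 1106 and value 1107. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def rank_by_mean(evaluations: Iterable[Tuple[int, str, str, float]]) -> List[Tuple[int, float]]:
    """evaluations: (analysis_id, rule_id, measure, value) rows.
    Returns (analysis_id, comprehensive value) pairs, highest first."""
    values: Dict[int, List[float]] = defaultdict(list)
    for analysis_id, _rule_id, _measure, value in evaluations:
        values[analysis_id].append(value)
    means = {aid: sum(vs) / len(vs) for aid, vs in values.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```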

ソートボタン１７０９は、分析結果一覧表示フィールド１７０３に表示された分析結果のソートパラメタを設定するためのソートパラメタ設定画面１９００（図１９参照）を分析結果表示部２１３に表示させるために操作されるボタンである。 The sort button 1709 makes the analysis result display unit 213 display the sort parameter setting screen 1900 (see FIG. 19). On that screen, the user sets the sort parameters for the analysis results shown in the analysis result list display field 1703.

図１８は、本発明の実施形態の分析結果表示部２１３が表示する分析結果表示画面１８００の説明図である。 FIG. 18 is an explanatory diagram of an analysis result display screen 1800 displayed by the analysis result display unit 213 according to the embodiment of this invention.

分析結果表示画面１８００は、グラフ表示フィールド１８０１、評価値表示フィールド１８０２、ユーザ名入力フィールド１８０３、推論ルール入力フィールド１８０４、評価尺度入力フィールド１８０５、評価値入力フィールド１８０６、及び評価ボタン１８０７を含む。 The analysis result display screen 1800 includes a graph display field 1801, an evaluation value display field 1802, a user name input field 1803, an inference rule input field 1804, an evaluation scale input field 1805, an evaluation value input field 1806, and an evaluation button 1807.

グラフ表示フィールド１８０１は、グラフボタン１７０４が操作された分析結果をグラフ表示するためのフィールドである。なお、分析結果表示部２１３がグラフボタン１７０４が操作されたことを検出すると、グラフ生成部２１５にグラフボタン１７０４が操作された分析結果のグラフデータを生成させ、グラフ生成部２１５によって生成されたグラフデータをグラフ表示フィールドに表示する。 The graph display field 1801 displays, as a graph, the analysis result whose graph button 1704 was operated. When the analysis result display unit 213 detects that the graph button 1704 has been operated, it has the graph generation unit 215 generate graph data for that analysis result. It then displays the generated graph data in the graph display field 1801.

評価値表示フィールド１８０２は、当該分析結果の各推論ルール部２０２の分析結果評価部２０３によって算出された評価値を表示するためのフィールドである。評価値表示フィールド１８０２には、評価値を算出した推論ルール部２０２のルールＩＤ、評価値を算出した評価尺度、及び評価値が表示される。 The evaluation value display field 1802 displays the evaluation values calculated for the analysis result by the analysis result evaluation unit 203 of each inference rule unit 202. For each evaluation value, it shows the rule ID of the inference rule unit 202 that calculated it, the evaluation scale used, and the value itself.

ユーザ名入力フィールド１８０３、推論ルール入力フィールド１８０４、評価尺度入力フィールド１８０５、及び、評価値入力フィールド１８０６は、当該分析結果の評価値をユーザが変更する場合に値を入力するためのフィールドである。 A user name input field 1803, an inference rule input field 1804, an evaluation scale input field 1805, and an evaluation value input field 1806 are fields for inputting values when the user changes the evaluation value of the analysis result.

ユーザ名入力フィールド１８０３は、ユーザ名を入力するためのテキストフィールドである。推論ルール入力フィールド１８０４は、変更する評価値を算出した推論ルール部２０２のルールＩＤを入力するためのフィールドである。評価尺度入力フィールド１８０５は、変更する評価値を算出した評価尺度を入力するためのフィールドである。評価値入力フィールド１８０６は、変更する評価値を入力するためのフィールドである。 The user name input field 1803 is a text field for entering a user name. The inference rule input field 1804 is for entering the rule ID of the inference rule unit 202 that calculated the evaluation value to be changed. The evaluation scale input field 1805 is for entering the evaluation scale with which that evaluation value was calculated. The evaluation value input field 1806 is for entering the new evaluation value.

評価ボタン１８０７は、グラフ表示フィールド１８０１にグラフ表示されている分析結果の評価値のうち推論ルール入力フィールド１８０４に入力されたルールＩＤ及び評価尺度入力フィールド１８０５に入力された評価尺度によって特定される評価値を、評価値入力フィールド１８０６に入力された評価値に変更するために操作されるボタンである。 The evaluation button 1807 is an evaluation specified by the rule ID input in the inference rule input field 1804 and the evaluation scale input in the evaluation scale input field 1805 among the evaluation values of the analysis results displayed in the graph display field 1801. The button is operated to change the value to the evaluation value input in the evaluation value input field 1806.

具体的には、分析結果表示部２１３は、分析結果データベース２００の評価値テーブル１１００のレコードのうち、analysis＿id１１０２に登録された分析ＩＤがグラフ表示フィールド１８０１にグラフ表示されている分析結果の分析ＩＤと一致するレコードを取得する。そして、分析結果表示部２１３は、取得したレコードのうち、rule＿id１１０５に登録されたルールＩＤが推論ルール入力フィールド１８０４に入力されたルールＩＤと一致し、かつ、measure１１０６に登録された評価尺度が評価尺度入力フィールド１８０５に入力された評価尺度と一致するレコードのvalue１１０７を評価値入力フィールド１８０６に入力された評価値に変更する。この場合、分析結果表示部２１３は、value１１０７を変更したレコードのupdated＿at１１０４に現在の日時を登録する。また、分析結果表示部２１３は、分析結果データベース２００の分析スクリプトテーブル１０００のanalysis＿id１００１に登録された分析ＩＤがグラフ表示フィールド１８０１にグラフ表示されている分析結果の分析ＩＤと一致するレコードのtitle１００７にユーザ名入力フィールド１８０３に入力されたユーザ名を追加し、updated＿at１００４に現在の日時を登録する。 Specifically, the analysis result display unit 213 first retrieves, from the evaluation value table 1100 of the analysis result database 200, the records whose analysis_id 1102 matches the analysis ID of the result graphed in the graph display field 1801. Among those records, it selects the one whose rule_id 1105 matches the rule ID entered in the inference rule input field 1804 and whose measure 1106 matches the evaluation scale entered in the evaluation scale input field 1805. It changes value 1107 of that record to the evaluation value entered in the evaluation value input field 1806 and registers the current date and time in its updated_at 1104. The analysis result display unit 213 also locates the record in the analysis script table 1000 whose analysis_id 1001 matches the graphed result. It appends the user name entered in the user name input field 1803 to that record's title 1007 and registers the current date and time in its updated_at 1004.
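The two updates described above can be sketched as SQLite statements. Several elements are hypothetical: the table names `evaluation_values` (for table 1100) and `analysis_scripts` (for table 1000), and the `" [user]"` format used to append the user name to the title (the text does not specify a format). `COALESCE` keeps a NULL title from swallowing the appended name.

```python
import sqlite3
from datetime import datetime

def override_evaluation(conn: sqlite3.Connection, analysis_id: int, rule_id: str,
                        measure: str, new_value: float, user_name: str) -> int:
    """Overwrite one evaluation value and tag the script's title with the user name.
    Returns the number of evaluation rows changed."""
    now = datetime.now().isoformat(timespec="seconds")
    cur = conn.execute(
        "UPDATE evaluation_values SET value = ?, updated_at = ? "
        "WHERE analysis_id = ? AND rule_id = ? AND measure = ?",
        (new_value, now, analysis_id, rule_id, measure),
    )
    # Append the user name to title; the " [user]" format is an assumption.
    conn.execute(
        "UPDATE analysis_scripts SET title = COALESCE(title, '') || ?, updated_at = ? "
        "WHERE analysis_id = ?",
        (f" [{user_name}]", now, analysis_id),
    )
    conn.commit()
    return cur.rowcount
```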

これによって、ユーザが評価値を設定することができ、ユーザがさらに分析を進めたい分析結果の評価値を高く設定することができ、分析が発散してしまうことも防止できる。 This lets the user set evaluation values. For example, the user can raise the evaluation value of an analysis result they want to investigate further, which also helps keep the automatic analysis from diverging.

図１９は、本発明の実施形態の分析結果表示部２１３が表示するソートパラメタ設定画面１９００の説明図である。 FIG. 19 is an explanatory diagram of a sort parameter setting screen 1900 displayed by the analysis result display unit 213 according to the embodiment of this invention.

ソートパラメタ設定画面１９００は、ソートボタン１９０１、及び重み付け指定フィールド１９０２を含む。 The sort parameter setting screen 1900 includes a sort button 1901 and a weight designation field 1902.

ソートボタン１９０１は、重み付け指定フィールド１９０２で指定された重み付けで評価値を再度計算して、図１７に示す分析結果一覧表示フィールド１７０３に表示された分析結果を再度計算した評価値順にソートするために操作されるボタンである。 The sort button 1901 recalculates the evaluation values with the weights specified in the weight designation field 1902. It then re-sorts the analysis results shown in the analysis result list display field 1703 of FIG. 17 by the recalculated values.

重み付け指定フィールド１９０２は、各推論ルール部２０２の各評価尺度ごとに重みの指定を受け付けるためのフィールドであり、推論ルール１９０３、評価尺度１９０４、及び重み１９０５を含む。 The weight designation field 1902 is a field for accepting designation of weight for each evaluation scale of each inference rule unit 202, and includes an inference rule 1903, an evaluation scale 1904, and a weight 1905.

推論ルール１９０３には、分析結果一覧表示フィールド１７０３に表示された分析結果を評価したすべての推論ルール部２０２のルールＩＤが表示される。評価尺度１９０４には、分析結果一覧表示フィールド１７０３に表示された分析結果を評価したすべての推論ルール部２０２の評価尺度が表示される。 The inference rule 1903 column shows the rule IDs of all inference rule units 202 that evaluated the analysis results displayed in the analysis result list display field 1703. The evaluation scale 1904 column shows the evaluation scales used by those inference rule units 202.

重み１９０５は、ユーザが各推論ルール部２０２の評価尺度の重み付けを調整（変更）するためのスライドバーである。ユーザが重み１９０５のスライドバーを調整することによって、ユーザが所望する重み付けを指定できる。 The weight 1905 is a slide bar with which the user adjusts (changes) the weight of each evaluation scale of each inference rule unit 202. By moving these slide bars, the user can specify the desired weighting.

ソートボタン１９０１が操作された場合について詳細に説明する。 A case where the sort button 1901 is operated will be described in detail.

分析結果表示部２１３は、ソートボタン１９０１が操作されたことを検出すると、重み付け指定フィールド１９０２で指定された各推論ルール部２０２の各評価尺度の重み付けを適用して、各分析結果の総合評価値を算出する。総合評価値は、分析結果の算出されている評価値を、当該評価値を算出した推論ルール部２０２及び評価基準の重み付けを乗算した値を算出し、重み付けをしたすべての評価値を加算した値である。 When the analysis result display unit 213 detects that the sort button 1901 has been operated, it calculates the comprehensive evaluation value of each analysis result. It does so by applying the weights specified in the weight designation field 1902 for each evaluation scale of each inference rule unit 202. Each evaluation value calculated for the analysis result is multiplied by the weight assigned to the inference rule unit 202 and evaluation scale that produced it. The weighted values are then added together. In the example below, this sum is also divided by the number of evaluation values, consistent with the average described for FIG. 17.

例えば、ある分析結果は、推論ルール「cross」の評価尺度「一様」で評価値Ｅ１が算出され、推論ルール「scale」の評価尺度「極値」で評価値Ｅ２が算出され、推論ルール「scale」の評価尺度「最大値」で評価値Ｅ３が算出されており、推論ルール「cross」の評価尺度「一様」の重み付けＷ１と指定され、推論ルール「scale」の評価尺度「極値」の重み付けＷ２と指定され、推論ルール「scale」の評価尺度「最大値」の重み付けＷ３と指定されていた場合、総合評価値は、（Ｗ１×Ｅ１＋Ｗ２×Ｅ２＋Ｗ３×Ｅ３）／３によって算出される。 For example, suppose three evaluation values were calculated for some analysis result:

- E1 with the evaluation scale “uniform” of the inference rule “cross”, weighted W1.
- E2 with the evaluation scale “extreme value” of the inference rule “scale”, weighted W2.
- E3 with the evaluation scale “maximum value” of the inference rule “scale”, weighted W3.

The comprehensive evaluation value is then (W1 × E1 + W2 × E2 + W3 × E3) / 3.

そして、分析結果表示部２１３は、総合評価値の降順（総合評価値の高い順）に分析結果をソートして、ソートした順で分析結果を分析結果一覧表示フィールド１７０３に表示する。 The analysis result display unit 213 sorts the analysis results in descending order of the comprehensive evaluation values (in descending order of the comprehensive evaluation values), and displays the analysis results in the analysis result list display field 1703 in the sorted order.
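The weighted comprehensive value and the re-sort can be sketched as two functions that follow the (W1 × E1 + W2 × E2 + W3 × E3) / 3 example, that is, the mean of the weighted values. Three details are assumptions: keying evaluations by a (rule ID, evaluation scale) pair, the function names, and a default weight of 1.0 for pairs the user never adjusted.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]   # (rule ID, evaluation scale), e.g. ("scale", "maximum value")

def weighted_total(evals: Dict[Key, float], weights: Dict[Key, float],
                   default_weight: float = 1.0) -> float:
    """(W1*E1 + ... + Wn*En) / n over the evaluation values of one analysis result."""
    if not evals:
        return 0.0
    return sum(weights.get(k, default_weight) * e for k, e in evals.items()) / len(evals)

def sort_by_weighted_total(results: Dict[int, Dict[Key, float]],
                           weights: Dict[Key, float]) -> List[int]:
    """Analysis IDs in descending order of their weighted comprehensive value."""
    return sorted(results, key=lambda aid: weighted_total(results[aid], weights),
                  reverse=True)
```

With E = (3, 6, 9) and W = (1, 0.5, 2), the comprehensive value is (3 + 3 + 18) / 3 = 8.0.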

以上によって、ユーザの観点に合致した評価基準で分析結果の一覧をソートできる。 As described above, the list of analysis results can be sorted according to the evaluation criteria that matches the user's viewpoint.

図２０は、本発明の実施形態の分析ユーザインタフェース部２１４が表示する分析スクリプト入力画面２０００の説明図である。 FIG. 20 is an explanatory diagram of an analysis script input screen 2000 displayed by the analysis user interface unit 214 according to the embodiment of this invention.

分析スクリプト入力画面２０００は、ユーザが所定の入力をした場合に分析ユーザインタフェース部２１４が表示する画面であり、ユーザが作成した分析スクリプトの入力を受け付けるための画面である。 The analysis script input screen 2000 is a screen displayed by the analysis user interface unit 214 when the user makes a predetermined input, and is a screen for accepting input of an analysis script created by the user.

分析スクリプト入力画面２０００は、分析スクリプト入力フィールド２００１、テスト実行ボタン２００２、分析実行ボタン２００３、ログ表示フィールド２００４、グラフ表示フィールド２００５、ユーザ名入力フィールド２４０５、タイトル入力フィールド２４０６、及び保存ボタン２４０７を含む。 The analysis script input screen 2000 includes an analysis script input field 2001, a test execution button 2002, an analysis execution button 2003, a log display field 2004, a graph display field 2005, a user name input field 2405, a title input field 2406, and a save button 2407. .

分析スクリプト入力フィールド２００１は、ユーザが分析スクリプトを入力するためのテキストフィールドである。 An analysis script input field 2001 is a text field for the user to input an analysis script.

テスト実行ボタン２００２は、分析スクリプト入力フィールド２００１に入力された分析スクリプトを分析実行部２０９にテスト実行させるために操作されるボタンである。 The test execution button 2002 is a button operated to cause the analysis execution unit 209 to execute a test on the analysis script input in the analysis script input field 2001.

分析実行ボタン２００３は、分析スクリプト入力フィールド２００１に入力された分析スクリプトを分析実行部２０９に実行させるために操作されるボタンである。 The analysis execution button 2003 is a button operated to cause the analysis execution unit 209 to execute the analysis script input in the analysis script input field 2001.

ログ表示フィールド２００４には、テスト実行ボタン２００２が操作された場合、又は分析実行ボタン２００３が操作された場合の分析スクリプトの分析結果、及び分析スクリプトの分析処理の進捗に関するログ等が表示される。 In the log display field 2004, an analysis result of the analysis script when the test execution button 2002 is operated or an analysis execution button 2003 is operated, a log regarding the progress of the analysis processing of the analysis script, and the like are displayed.

グラフ表示フィールド２００５には、分析スクリプト入力フィールド２００１に入力された分析スクリプトの分析結果がグラフ表示される。 In the graph display field 2005, the analysis result of the analysis script input in the analysis script input field 2001 is displayed in a graph.

ユーザ名入力フィールド２４０５は、分析スクリプトを作成したユーザのユーザ名が入力されるテキストフィールドである。タイトル入力フィールド２４０６は、ユーザが作成した分析スクリプトのタイトルが入力されるテキストフィールドである。 The user name input field 2405 is a text field for inputting the user name of the user who created the analysis script. A title input field 2406 is a text field into which a title of an analysis script created by the user is input.

保存ボタン２４０７は、ユーザが作成した分析スクリプトの分析結果を分析ユーザインタフェース部２１４が分析結果データベース２００に登録するために操作するボタンである。 The user operates the save button 2407 to have the analysis user interface unit 214 register the analysis result of the user-created analysis script in the analysis result database 200.

保存ボタン２４０７が操作された場合について詳細に説明する。 A case where the save button 2407 is operated will be described in detail.

保存ボタン２４０７が操作された場合、分析ユーザインタフェース部２１４は、ユーザが作成した分析結果を分析結果データベース２００の分析スクリプトテーブル１０００に登録する。具体的には、分析ユーザインタフェース部２１４は、分析スクリプトテーブル１０００に新たなレコードを追加し、analysis＿id１００１に一意な分析ＩＤを登録し、data＿id１００２に当該分析スクリプトの分析の対象となった分析対象データ２１２の種類を示す識別子が登録され、created＿at１００３及びupdated＿at１００４に現在の日時を登録し、parent＿analysis＿ids１００５にはＮＵＬＬを登録する。また、分析ユーザインタフェース部２１４は、当該新たなレコードのtitle１００７にタイトル入力フィールド２４０６に入力されたタイトル及びユーザ名入力フィールド２４０５に入力されたユーザ名を登録し、script１００８に分析スクリプト入力フィールド２００１に入力された分析スクリプトを登録し、expected＿fitness１００９及びresult１０１０にＮＵＬＬを登録する。 When the save button 2407 is operated, the analysis user interface unit 214 registers the user's analysis in the analysis script table 1000 of the analysis result database 200. Specifically, it adds a new record to the analysis script table 1000 and fills it as follows:

- analysis_id 1001: a unique analysis ID.
- data_id 1002: an identifier indicating the type of analysis target data 212 analyzed by the script.
- created_at 1003 and updated_at 1004: the current date and time.
- parent_analysis_ids 1005: NULL.
- title 1007: the title entered in the title input field 2406, together with the user name entered in the user name input field 2405.
- script 1008: the analysis script entered in the analysis script input field 2001.
- expected_fitness 1009 and result 1010: NULL.

Because the analysis result of the user-created analysis script is registered in the analysis script table 1000, new analysis scripts are generated from that analysis result. By creating an analysis script of their own, the user can therefore correct the course of the analysis performed by the data analysis system 100 when it drifts in a direction the user did not intend.

FIG. 21 is a block diagram of the data analysis system 100 according to the embodiment of this invention, configured with a plurality of computers.

As an example of distributing the units of the data analysis system 100 among a plurality of computers, FIG. 21 shows a case in which the data analysis system 100 includes a management computer 2100, inference rule computers 2110A and 2110B, and an analysis execution computer 2120. Components in FIG. 21 that are identical to those in FIG. 2 are given the same reference numerals, and their descriptions are omitted.

The management computer 2100 includes the analysis result database 200, the inference rule control unit 201, the job control unit 208, the analysis result display unit 213, the analysis user interface unit 214, and the graph generation unit 215.

The inference rule computer 2110A includes an inference rule unit 202A, and the inference rule computer 2110B includes an inference rule unit 202B.

The analysis execution computer 2120 includes the analysis execution unit 209.

Thus, the units of the data analysis system 100 according to this embodiment may be distributed among a plurality of computers.

The present invention relates to data analysis apparatuses and is applicable in particular to techniques for automatically executing tabulation analysis.

DESCRIPTION OF SYMBOLS
101 Input device
102 Network device
103 CPU
104 Main memory unit
105 Auxiliary memory unit
106 Bus
200 Analysis result database
201 Inference rule control unit
202 Inference rule unit
203 Analysis result evaluation unit
204 Analysis script generation unit
208 Job control unit
209 Analysis execution unit
210 Analysis script interpretation unit
211 Analysis target data management unit
212 Analysis target data
213 Analysis result display unit
214 Analysis user interface unit
215 Graph generation unit

Claims

A data analysis system that comprises a processor for executing a program and a storage area for storing the program executed by the processor, and that analyzes data, the system comprising:
an analysis execution unit that performs an analysis process on the data and stores an analysis result of the analysis process in the storage area;
a plurality of inference rule units that evaluate the analysis result and generate a new analysis process from the analysis result;
an inference rule control unit that controls the plurality of inference rule units; and
an analysis processing management unit that manages analysis processes not yet executed by the analysis execution unit, wherein
each inference rule unit includes an analysis result evaluation unit that evaluates the analysis result and an analysis process generation unit that generates the new analysis process from the analysis result,
the analysis result evaluation unit reads the analysis result stored in the storage area, evaluates the analysis result by calculating a feature amount indicating a characteristic of the analysis result, and stores the analysis result and the feature amount in the storage area in association with each other,
the inference rule control unit selects, from the analysis results stored in the storage area, an analysis result whose feature amount satisfies a predetermined condition as a candidate generation source of the new analysis process,
the analysis process generation unit generates the new analysis process from the analysis result selected by the inference rule control unit, and
the analysis processing management unit causes the analysis execution unit to execute the analysis process generated by the analysis process generation unit.
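The cycle of execution, evaluation, selection, and generation in claim 1 can be illustrated with a toy, single-process Python sketch built on the monthly-sales drill-down example from the description. Everything concrete here is an assumption for illustration: analysis processes are modeled as Python callables, a plain queue stands in for the analysis processing management unit, the "predetermined condition" is a per-unit threshold, and `InferenceRuleUnit`, `Storage`, `run_cycle`, `spike_ratio`, and `drill_down` are hypothetical names, not the patent's implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model: an analysis process is a zero-argument callable that
# returns an analysis result (a dict with an "axis" and tabulated "values").
AnalysisProcess = Callable[[], dict]


@dataclass
class InferenceRuleUnit:
    name: str
    evaluate: Callable[[dict], float]                  # analysis result evaluation unit
    generate: Callable[[dict], list[AnalysisProcess]]  # analysis process generation unit
    threshold: float                                   # assumed "predetermined condition"


@dataclass
class Storage:
    results: list[dict] = field(default_factory=list)
    # Feature amounts keyed by (result index, name of the evaluating unit).
    features: dict[tuple[int, str], float] = field(default_factory=dict)


def run_cycle(units: list[InferenceRuleUnit], storage: Storage, queue: deque) -> None:
    """One round: execute pending processes, evaluate results, generate new processes."""
    # Analysis execution unit: run every pending process and store its result.
    while queue:
        storage.results.append(queue.popleft()())
    for unit in units:
        for i, result in enumerate(storage.results):
            key = (i, unit.name)
            if key in storage.features:
                continue  # already evaluated (and, if selected, expanded) by this unit
            # Evaluation unit: compute and store the feature amount.
            storage.features[key] = unit.evaluate(result)
            # Control unit: select the result if its feature amount meets the
            # condition; the same unit's generation unit then derives new processes.
            if storage.features[key] >= unit.threshold:
                queue.extend(unit.generate(result))


# Toy data: (month, product) -> sales.
SALES = {("Jan", "A"): 30, ("Jan", "B"): 10, ("Feb", "A"): 5,
         ("Feb", "B"): 5, ("Mar", "A"): 6, ("Mar", "B"): 4}


def tabulate_by_month() -> dict:
    totals: dict[str, int] = {}
    for (month, _), value in SALES.items():
        totals[month] = totals.get(month, 0) + value
    return {"axis": "month", "values": totals}


def tabulate_products(month: str) -> AnalysisProcess:
    def process() -> dict:
        return {"axis": f"product@{month}",
                "values": {p: v for (m, p), v in SALES.items() if m == month}}
    return process


def spike_ratio(result: dict) -> float:
    """Feature amount: peak value divided by the mean value."""
    values = list(result["values"].values())
    return max(values) / (sum(values) / len(values))


def drill_down(result: dict) -> list[AnalysisProcess]:
    """Rule: break a monthly tabulation down by product for its peak month."""
    if result["axis"] != "month":
        return []
    peak = max(result["values"], key=result["values"].get)
    return [tabulate_products(peak)]
```

Calling `run_cycle` twice with `InferenceRuleUnit("spike", spike_ratio, drill_down, 1.5)` and a queue seeded with `tabulate_by_month` first tabulates sales by month (feature amount 2.0, so January is flagged) and then tabulates January sales by product. Keying feature amounts by unit means each unit selects only from its own evaluations, which is the arrangement claim 8 describes.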

The data analysis system according to claim 1, further comprising:
an input unit that receives input from a user; and
an analysis result display unit that outputs an analysis result display screen showing one analysis result, wherein
the analysis result display unit outputs the analysis result display screen including a feature amount designation area in which a value of the feature amount of the output analysis result can be designated from the input unit, and
a value designated in the feature amount designation area is stored in the storage area as the feature amount of the output analysis result.

The data analysis system according to claim 1, further comprising:
an input unit that receives input from a user; and
an analysis result display unit that extracts, from the analysis results stored in the storage area, analysis results matching a predetermined condition to generate a list of analysis results, and outputs an analysis result list display screen showing the generated list, wherein
the analysis result display unit outputs, for each analysis result evaluation unit that has calculated the feature amount, a weighting designation area in which a weighting can be designated from the input unit,
the analysis result display unit weights the feature amounts based on the weightings designated in the weighting designation areas, and
the analysis result display unit rearranges the analysis results displayed on the analysis result list display screen in order of the weighted feature amounts.
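The weighted re-ordering of claim 3 might look like the following Python sketch. The claim does not fix how weighted feature amounts are combined, so this assumes a weighted sum over evaluation units, treats a missing feature amount or an undesignated weight as contributing nothing, and sorts in descending order; `rank_results` and the unit names are hypothetical.

```python
from typing import Mapping


def rank_results(features: Mapping[str, Mapping[str, float]],
                 weights: Mapping[str, float]) -> list[str]:
    """Return result IDs sorted by weighted feature amount, highest first.

    features maps result ID -> {evaluation unit name -> feature amount};
    weights maps evaluation unit name -> user-designated weight.
    """
    def score(result_id: str) -> float:
        per_unit = features[result_id]
        # Assumed combination rule: weighted sum. A unit with no designated
        # weight gets weight 0; a unit that did not score a result adds nothing.
        return sum(weights.get(unit, 0.0) * value for unit, value in per_unit.items())

    # sorted() is stable, so results with equal scores keep their input order.
    return sorted(features, key=score, reverse=True)
```

For example, with `features = {"r1": {"spike": 2.0, "trend": 0.1}, "r2": {"spike": 1.2, "trend": 0.9}}`, weighting only `"spike"` ranks `r1` first, while weighting only `"trend"` ranks `r2` first. Changing the weights in the designation areas simply re-sorts the same stored feature amounts.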

The data analysis system according to claim 1, further comprising an analysis script input receiving unit that receives, from a user, input of an analysis script describing the analysis process, wherein
the analysis script input receiving unit
causes the analysis execution unit to execute the analysis process described by the received analysis script, and
stores the analysis result of the received analysis script in the storage area.

The data analysis system according to claim 1, wherein, when the analysis process generation unit generates the new analysis process, the analysis process generation unit sets the feature amount of the analysis result from which the new analysis process is generated as a predicted value of the feature amount of the analysis result of the new analysis process, and
the analysis processing management unit determines a processing order of the analysis processes generated by the analysis process generation unit based on the predicted value.
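Ordering pending analysis processes by their predicted values, as in claim 5, maps naturally onto a priority queue. The Python sketch below uses the standard `heapq` module and assumes that a higher predicted value means "run sooner" (the claim only says the order is based on the predicted value); `PendingQueue` and its methods are hypothetical names.

```python
import heapq
import itertools


class PendingQueue:
    """Pending analysis processes, highest predicted feature amount first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        # Insertion counter breaks ties so equal predictions stay first-in, first-out.
        self._counter = itertools.count()

    def push(self, process_id: str, parent_feature: float) -> None:
        # The parent result's feature amount serves as the child's predicted value.
        # heapq is a min-heap, so the value is negated to pop the largest first.
        heapq.heappush(self._heap, (-parent_feature, next(self._counter), process_id))

    def pop(self) -> str:
        _, _, process_id = heapq.heappop(self._heap)
        return process_id

    def __len__(self) -> int:
        return len(self._heap)
```

Pushing processes `a`, `b`, `c`, `d` with predicted values 1.0, 3.0, 3.0, 2.0 pops them in the order `b`, `c`, `d`, `a`. The prediction is cheap because it reuses a value that was already computed for the parent, before the child has run.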

The data analysis system according to claim 1, wherein the analysis processing management unit or the analysis execution unit merges analysis scripts describing a plurality of analysis processes into one script, and
the analysis execution unit executes the plurality of analysis processes by executing the merged script.
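Claim 6 does not say which language analysis scripts are written in or how they are merged, so the Python sketch below is only an illustration under loud assumptions: each script is a top-level Python snippet that reads a shared `data` variable and assigns its output to `result`, and the merged text is run once with the built-in `exec`. `merge_scripts` is a hypothetical helper; one plausible benefit of merging is paying interpreter start-up and data loading costs once rather than per script, though this section does not state the motivation.

```python
def merge_scripts(scripts: dict[str, str]) -> str:
    """Merge several analysis scripts into a single script text.

    Assumed convention: each script is top-level Python that reads a shared
    `data` variable and assigns its output to `result`. The merged script
    runs them in order and collects each output under its analysis ID.
    """
    lines = ["results = {}"]
    for analysis_id, body in scripts.items():
        lines.append(f"# --- analysis {analysis_id} ---")
        lines.append(body)
        # Record this script's output before the next script overwrites `result`.
        lines.append(f"results[{analysis_id!r}] = result")
    return "\n".join(lines)


# Usage: a single exec call runs every merged analysis against data loaded once.
namespace = {"data": [3, 1, 4, 1, 5]}
exec(merge_scripts({"a1": "result = sum(data)", "a2": "result = max(data)"}), namespace)
print(namespace["results"])  # {'a1': 14, 'a2': 5}
```

Because the merged scripts share one namespace, a script that forgot to assign `result` would silently record the previous script's output, and one failing script stops the rest; a production merger would need to isolate the scripts from each other.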

The data analysis system according to claim 1, wherein the analysis process generation unit determines the number of analysis processes to generate based on a weight set for each inference rule unit and the number of analysis processes not yet executed by the analysis execution unit.
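Claim 7 names the two inputs (a weight per inference rule unit and the number of not-yet-executed processes) but not the formula. One plausible reading, sketched in Python below, keeps the pending queue near a target length and splits the free slots among units in proportion to their weights. The target length `target_pending`, the proportional split, the largest-remainder rounding, and the function name `generation_quota` are all assumptions.

```python
import math


def generation_quota(weights: dict[str, float], pending: int,
                     target_pending: int) -> dict[str, int]:
    """Split free queue capacity among inference rule units by weight.

    Uses largest-remainder apportionment so that the quotas sum exactly
    to the free capacity (target_pending - pending, floored at zero).
    """
    capacity = max(0, target_pending - pending)
    total = sum(weights.values())
    if capacity == 0 or total <= 0:
        return {name: 0 for name in weights}
    shares = {name: capacity * w / total for name, w in weights.items()}
    quota = {name: math.floor(s) for name, s in shares.items()}
    leftover = capacity - sum(quota.values())
    # Give the remaining slots to the units with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda n: shares[n] - quota[n], reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1
    return quota
```

With weights `{"spike": 3, "trend": 1}`, 2 pending processes, and a target of 10, the 8 free slots are split 6 and 2. When the backlog already meets the target, every unit's quota is 0, so generation pauses until the analysis execution unit catches up.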

The data analysis system according to claim 1, wherein the analysis result evaluation unit calculates the feature amount based on a rule corresponding to the analysis process generated by the analysis process generation unit belonging to the same inference rule unit, and
the inference rule control unit selects, from the analysis results stored in the storage area, an analysis result whose feature amount, as calculated by the analysis result evaluation unit belonging to the same inference rule unit as the analysis process generation unit that generates the new analysis process, satisfies a predetermined condition, as a candidate generation source of the new analysis process.

A data analysis method for analyzing data in a computer system comprising a processor for executing a program and a storage area for storing the program executed by the processor, wherein
the computer system comprises:
an analysis execution unit that performs an analysis process on the data and stores an analysis result of the analysis process in the storage area;
a plurality of inference rule units that evaluate the analysis result and generate a new analysis process from the analysis result;
an inference rule control unit that controls the plurality of inference rule units; and
an analysis processing management unit that manages analysis processes not yet executed by the analysis execution unit,
each inference rule unit includes an analysis result evaluation unit that evaluates the analysis result and an analysis process generation unit that generates the new analysis process from the analysis result, and
the method comprises:
a step in which the analysis result evaluation unit reads the analysis result stored in the storage area, evaluates the analysis result by calculating a feature amount indicating a characteristic of the analysis result based on a rule corresponding to the analysis process generated by the analysis process generation unit belonging to the same inference rule unit, and stores the analysis result and the feature amount in the storage area in association with each other;
a step in which the inference rule control unit selects, from the analysis results stored in the storage area, an analysis result that was evaluated by the analysis result evaluation unit belonging to the same inference rule unit as the analysis process generation unit that generates the new analysis process and whose feature amount satisfies a predetermined condition, as a candidate generation source of the new analysis process;
a step in which the analysis process generation unit generates the new analysis process from the analysis result selected by the inference rule control unit; and
a step in which the analysis processing management unit causes the analysis execution unit to execute the analysis process generated by the analysis process generation unit.

The data analysis method, wherein the computer system further comprises:
an input unit that receives input from a user; and
an analysis result display unit that outputs an analysis result display screen showing one analysis result, and
the method comprises:
a step in which the analysis result display unit outputs the analysis result display screen including a feature amount designation area in which the user can designate a value of the feature amount of the output analysis result; and
a step in which the analysis result display unit stores the value designated in the feature amount designation area in the storage area as the feature amount of the output analysis result.

The data analysis method according to claim 9, wherein the computer system further comprises:
an input unit that receives input from a user; and
an analysis result display unit that extracts, from the analysis results stored in the storage area, analysis results matching a predetermined condition to generate a list of analysis results, and outputs an analysis result list display screen showing the generated list, and
the method comprises:
a step in which the analysis result display unit outputs, for each analysis result evaluation unit that has calculated the feature amount, a weighting designation area in which a weighting can be designated from the input unit;
a step in which the analysis result display unit weights the feature amounts based on the weightings designated in the weighting designation areas; and
a step in which the analysis result display unit rearranges the analysis results displayed on the analysis result list display screen in order of the weighted feature amounts.

The data analysis method according to claim 9, wherein the computer system further comprises an analysis script input receiving unit that receives, from a user, input of an analysis script describing the analysis process, and
the method further comprises:
a step in which the analysis script input receiving unit causes the analysis execution unit to execute the analysis process described by the received analysis script; and
a step in which the analysis script input receiving unit stores the analysis result of the received analysis script in the storage area.

The data analysis method according to claim 9, further comprising:
a step in which, when the analysis process generation unit generates the new analysis process, the feature amount of the analysis result from which the new analysis process is generated is set as a predicted value of the feature amount of the analysis result of the new analysis process; and
a step in which the analysis processing management unit determines a processing order of the analysis processes generated by the analysis process generation unit based on the predicted value.

The data analysis method according to claim 9, further comprising:
a step in which the analysis processing management unit or the analysis execution unit merges analysis scripts describing a plurality of analysis processes into one script; and
a step in which the analysis execution unit executes the plurality of analysis processes by executing the merged script.

The data analysis method according to claim 9, further comprising a step in which the analysis process generation unit determines the number of analysis processes to generate based on a weight set for each inference rule unit and the number of analysis processes not yet executed by the analysis execution unit.

The data analysis method according to claim 9, wherein, in the step in which the analysis result evaluation unit calculates the feature amount, the feature amount is calculated based on a rule corresponding to the analysis process generated by the analysis process generation unit belonging to the same inference rule unit, and
in the step in which the inference rule control unit selects a candidate generation source of the new analysis process, an analysis result whose feature amount, as calculated by the analysis result evaluation unit belonging to the same inference rule unit as the analysis process generation unit that generates the new analysis process, satisfies a predetermined condition is selected from the analysis results stored in the storage area as a candidate generation source of the new analysis process.