JP2017037511A

JP2017037511A - Analyzer, analyzing method, and analyzing program

Info

Publication number: JP2017037511A
Application number: JP2015158857A
Authority: JP
Inventors: 健史小沢; Tsuyoshi Ozawa; 真鬼塚; Makoto Onizuka
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2015-08-11
Filing date: 2015-08-11
Publication date: 2017-02-16

Abstract

PROBLEM TO BE SOLVED: To provide an analyzer that can carry out a series of analysis processes at high speed by reducing the number of accesses to the table that stores the records to be analyzed, even when the attributes of a plurality of selection conditions differ from one another. SOLUTION: A multiple-query generation unit 51 receives the specification of a plurality of selection conditions having mutually different attributes and generates a query for each selection condition. A query tree processing unit 522 determines, for each record stored in a storage unit 4, whether the record matches each of the selection conditions corresponding to the queries generated by the multiple-query generation unit 51. For each selection condition, it then analyzes the records that match that condition. SELECTED DRAWING: Figure 1

Description

The present invention relates to an analysis apparatus, an analysis method, and an analysis program.

In recent years, techniques for extracting useful information from large volumes of data, known as big data, have become increasingly important. Consider, for example, a sales strategy that must account for the influence of region and period on sales data. For an analysis instruction expressed by a grouping operation (group-by) and an aggregation function (aggregation), the most useful partial data is often the partial data whose analysis result deviates greatly from the analysis result of the entire data.

Here, partial data means the group of records extracted from the entire data by a query that performs a selection with the selection condition A = l, where l is a value of attribute A. Partial data that produces an analysis result with a large deviation is defined as follows. Let q(D) denote the result of executing an analysis instruction q on table D, the entire data. Let q*v(D) denote the result of executing q on arbitrary partial data v(D) of table D. The partial data that produces an analysis result with a large deviation is then the v(D) that maximizes the deviation between q(D) and q*v(D). It is defined by the following equation (1).
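Equation (1) does not appear in this text. The following reconstruction is built from the definitions above. It assumes that U is the deviation function introduced in the next paragraph and that v(D) ranges over the candidate partial data, for example the selections A = l. Read it as a sketch of the definition, not the patent's exact notation:

```latex
v^{*}(D) = \operatorname*{arg\,max}_{v(D)} \; U\bigl(q(D),\, q{*}v(D)\bigr)
```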

Here, U(A, B) is a function that calculates the deviation between A and B. U(A, B) is assumed to support incremental (difference) computation with respect to the input data D. Take the Euclidean distance as an example, with input data D and differential input data Δd. As shown in the following equation (2), it distributes over D and Δd; that is, the commutative and associative laws hold for it. It can therefore be used as the function that calculates the deviation.
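Equation (2) is also missing from this text. The property it relies on can be illustrated with a minimal sketch. The sketch assumes the analysis instruction is a per-month sum of sales, as in the example used later in this description. Partial aggregates over D and Δd merge in any order, so q(D ∪ Δd) can be obtained from q(D) and q(Δd) without rescanning D, and U can then be recomputed from the merged aggregates.

The following are all hypothetical choices for illustration: the names `q`, `merge`, and `U`, and the sample records. U is taken here as the sum of per-month absolute differences, which matches how this description later totals monthly distances. A strict Euclidean distance would instead take the square root of the sum of squared differences.

```python
from collections import Counter

# Hypothetical records (order month, sales amount in yen); not from the patent.
D = [("2015-04", 8000), ("2015-04", 4000), ("2015-03", 6000)]
delta_d = [("2015-03", 10000), ("2015-04", 2000)]


def q(records):
    """Assumed analysis instruction: total sales amount per order month."""
    totals = Counter()
    for month, sales in records:
        totals[month] += sales
    return totals


def merge(a, b):
    """Combine two partial aggregates; the result does not depend on order."""
    out = Counter(a)
    out.update(b)  # Counter.update adds values rather than replacing them
    return out


def U(a, b):
    """Deviation: sum over months of the absolute difference of the aggregates."""
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))


full = q(D + delta_d)
incremental = merge(q(D), q(delta_d))
assert full == incremental                     # q distributes over D and Δd
assert incremental == merge(q(delta_d), q(D))  # merging is commutative
print(U(full, q(D)))  # 12000
```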

Identifying the partial data that produces an analysis result with a large deviation, as expressed above, requires calculating the degree of deviation between q(D) and q*v(D) for the analysis instruction q over every possible partial data v(D). This takes an enormous amount of time.

Conventionally, when queries that each perform a selection process are executed, every query accesses table D. As a result, the execution time grows linearly with the number of queries. A known technique addresses this when the selection conditions of the selection processes share the same attribute A: the selection process is shared, which reduces the number of accesses to the table. For example, multiple selection queries whose selection conditions specify the same attribute A are equivalently converted into a single query using an order by clause, and that query is processed with a database management system (DBMS) (see Non-Patent Document 1).

Non-Patent Document 1: Yohei Mizuno, Yoshihiro Kishida, Yuki Arase, Toshimori Honjo, Makoto Onizuka, "Efficient search of partial data that produces highly useful analysis results," DEIM Forum 2015, D6-4, 2015.

However, this approach fails when the selection conditions specified in the selection processes have different attributes A. In that case the selection processes cannot be shared, and the number of accesses to the table cannot be reduced.

The present invention has been made in view of the above. Its object is to speed up analysis processing by reducing the number of accesses to the table that stores the records to be analyzed, even when the attributes of a plurality of selection conditions differ.

To solve the above problem and achieve the object, an analysis apparatus according to the present invention includes a generation unit and an analysis unit. The generation unit accepts the specification of a plurality of selection conditions with different attributes and generates a query for each selection condition. The analysis unit determines, for each record stored in a storage unit, whether the record matches each of the selection conditions corresponding to the queries generated by the generation unit. For each selection condition, it then analyzes the records that match that condition.

According to the present invention, analysis processing can be sped up by reducing the number of accesses to the table that stores the records to be analyzed, even when the attributes of a plurality of selection conditions differ.

FIG. 1 is a schematic diagram showing the configuration of an analysis apparatus according to an embodiment of the present invention.
FIG. 2 illustrates the data structure of the table targeted by the analysis processing of the present embodiment.
FIG. 3 is a schematic diagram illustrating the query tree of the present embodiment.
FIG. 4 is an explanatory diagram of the multiple selection process of the present embodiment.
FIG. 5 is an explanatory diagram of the functions of the query tree processing unit of the present embodiment.
FIG. 6 is an explanatory diagram of the degree of deviation between the analysis result of partial data and the analysis result of the entire data in the present embodiment.
FIG. 7 illustrates a histogram of sales amounts for the record group not yet processed by the multiple selection process of the present embodiment.
FIG. 8 illustrates a histogram of customer ages for the record group not yet processed by the multiple selection process of the present embodiment.
FIG. 9 is a flowchart of the multiple selection processing procedure of the present embodiment.
FIG. 10 is a flowchart of the deviation determination processing procedure of the present embodiment.
FIG. 11 illustrates a computer that executes the analysis program.

An embodiment of the present invention is described in detail below with reference to the drawings. The present invention is not limited by this embodiment. In the drawings, identical parts are denoted by the same reference numerals.

[Analysis Apparatus]
FIG. 1 is a schematic diagram showing the configuration of the analysis apparatus according to the present embodiment. The analysis apparatus 1 is implemented on a general-purpose computer such as a workstation or a personal computer. It includes an input unit 2, an output unit 3, a storage unit 4, and a control unit 5.

The input unit 2 is implemented with input devices such as a keyboard and a mouse. In response to input operations by a data analyst, it passes various instructions to the control unit 5, such as an instruction to start processing. The output unit 3 is implemented with a display device such as a liquid crystal display, a printing device such as a printer, an information communication device, or the like. It outputs to the data analyst the results of the analysis processing described later, among other things.

The storage unit 4 is implemented with a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, or with a storage device such as a hard disk or an optical disk. It stores table D, the target of the analysis processing described later. The storage unit 4 may also communicate with the control unit 5 over a telecommunication line such as a LAN or the Internet.

FIG. 2 illustrates the data structure of table D. As shown in FIG. 2, table D specifies, for each record, an attribute value l for each of a plurality of attributes A. FIG. 2 shows an order table D whose attributes A are order number, sales amount, order year and month, store name, product genre, customer age, and customer gender. For example, the record with order number 1 has the following values:
- sales amount: 8,000 yen
- order year and month: April 2015
- store name: Osaka
- product genre: clothing
- customer age: 45
- customer gender: male

The control unit 5 is implemented by an arithmetic processing unit, such as a CPU (Central Processing Unit), executing a processing program stored in memory. As illustrated in FIG. 1, it functions as a multiple-query generation unit 51 and a multiple-query execution unit 52.

The multiple-query generation unit 51 accepts the specification of a plurality of selection conditions with different attributes and generates a query for each selection condition. Specifically, it works from the attributes A of table D, which is the target of the analysis instruction entered by the data analyst. From these it generates a plurality of queries, each performing a selection process that retrieves the partial data specified by a selection condition on an attribute.

For example, a plurality of queries are generated for the selection conditions illustrated in the following equation (3). In the present embodiment, the generated queries include queries that retrieve the partial data matching each of the following conditions:
- selection condition 1: customer gender is male
- selection condition 2: customer age is in the 40s
- selection condition 3: product genre is clothing
- selection condition 4: product genre is miscellaneous goods
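The per-condition query generation can be sketched as follows. This is a minimal illustration, not the patent's implementation. The following are hypothetical:
- the `SelectionCondition` type
- the column names, including a derived `customer_age_band` column that stands in for "customer age is in the 40s"
- the table name `orders`
- the rendering of each query as SQL text

Note that conditions 3 and 4 share an attribute, while the four conditions together span three different attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionCondition:
    cond_id: int
    attribute: str  # attribute A
    value: str      # attribute value l


# Selection conditions 1-4 of the embodiment; the column names are hypothetical.
CONDITIONS = [
    SelectionCondition(1, "customer_gender", "male"),
    SelectionCondition(2, "customer_age_band", "40s"),
    SelectionCondition(3, "product_genre", "clothing"),
    SelectionCondition(4, "product_genre", "miscellaneous goods"),
]


def generate_queries(conditions, table="orders"):
    """Generate one selection query of the form A = l per condition.

    SQL text is used only to make the queries readable; a real system should
    use parameterized queries instead of string formatting.
    """
    return {
        c.cond_id: f"SELECT * FROM {table} WHERE {c.attribute} = '{c.value}'"
        for c in conditions
    }


queries = generate_queries(CONDITIONS)
print(queries[1])  # SELECT * FROM orders WHERE customer_gender = 'male'
print(len({c.attribute for c in CONDITIONS}))  # 3
```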

The multiple-query execution unit 52 includes a query tree construction unit 521 and a query tree processing unit 522. The query tree construction unit 521 takes the queries generated by the multiple-query generation unit 51 and constructs a query tree whose nodes are the attributes A and attribute values l specified by the selection conditions. Specifically, it stores the queries in cache memory as a query tree, using the specified selection conditions as key information.

FIG. 3 illustrates the structure of a query tree. As shown in FIG. 3, each selection condition in the query tree is represented by a path connecting an attribute A to its attribute value l. FIG. 3 shows a query tree built from the four selection conditions:
- selection condition 1: customer gender is male
- selection condition 2: customer age is in the 40s
- selection condition 3: product genre is clothing
- selection condition 4: product genre is miscellaneous goods
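A query tree like the one in FIG. 3 can be sketched as a two-level mapping: attribute node, then value node, then the ids of the conditions whose path ends there. This is a minimal sketch under the following assumptions:
- the nested-dict representation
- the attribute names
- the helper `matching_conditions`, which is hypothetical

A record needs one lookup per attribute node to find every condition it satisfies, including two conditions that share the `product_genre` attribute.

```python
from collections import defaultdict

# (condition id, attribute A, value l) for selection conditions 1-4;
# attribute names are hypothetical.
CONDITIONS = [
    (1, "customer_gender", "male"),
    (2, "customer_age_band", "40s"),
    (3, "product_genre", "clothing"),
    (4, "product_genre", "miscellaneous goods"),
]


def build_query_tree(conditions):
    """Attribute node -> value node -> ids of the conditions on that path."""
    tree = defaultdict(dict)
    for cond_id, attribute, value in conditions:
        tree[attribute].setdefault(value, []).append(cond_id)
    return dict(tree)


def matching_conditions(tree, record):
    """Follow at most one path per attribute node and collect matching ids."""
    hits = []
    for attribute, values in tree.items():
        hits.extend(values.get(record.get(attribute), []))
    return hits


tree = build_query_tree(CONDITIONS)
print(tree["product_genre"])  # {'clothing': [3], 'miscellaneous goods': [4]}
```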

The query tree processing unit 522 performs analysis processing that includes a multiple selection process and a deviation determination process. In the multiple selection process, the query tree processing unit 522 checks each record of table D stored in the storage unit 4 against every selection condition corresponding to the queries generated by the multiple-query generation unit 51. For each selection condition, it executes the analysis instruction q on the records that match it. This yields the analysis result of the partial data, that is, the set of records matching each selection condition.

Specifically, in the multiple selection process, the query tree processing unit 522 reads table D, the target of the analysis instruction q, from the storage unit 4. For each record, it checks whether the record has an attribute value l of an attribute A that matches a selection condition in the query tree. If so, it executes the analysis instruction q on that record together with the already processed records having the same attribute value l, and stores the resulting analysis result in an appropriate memory area. The query tree processing unit 522 repeats this for every record in table D.

Referring to FIGS. 2 to 5, consider the case in which the analysis instruction q is expressed by the following equation (4). That is, q calculates the total sales amount for each order year and month in the order table D.

In this case, the record with order number 1 in the order table D of FIG. 2 matches selection conditions 1 to 3 of the query tree in FIG. 3. The query tree processing unit 522 therefore executes the analysis instruction q for selection conditions 1 to 3, as shown in the following equation (5). It adds 8,000 yen to the sales amount for April 2015, giving a total sales amount of 8,000 yen. Then, as illustrated in FIG. 4, it stores this analysis result in association with each corresponding path of the query tree, that is, selection conditions 1 to 3.

Next, the query tree processing unit 522 checks the values l of the attributes A of the record with order number 2. The record matches none of selection conditions 1 to 4 of the query tree in FIG. 3, so the query tree processing unit 522 excludes it and moves on to the next record.

Next, it checks the values l of the attributes A of the record with order number 3. This record matches only selection condition 1 of the query tree in FIG. 3. The query tree processing unit 522 therefore executes the analysis instruction q only for selection condition 1, as shown in the following equation (6). It adds 4,000 yen to the April 2015 sales amount, bringing the total to 12,000 yen. Then, as illustrated in FIG. 5, it stores this analysis result in association with the corresponding path of the query tree, that is, selection condition 1. For selection conditions 2 and 3, the total sales amount for April 2015 remains 8,000 yen.
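The walkthrough above can be reproduced with a minimal single-scan sketch. Table D is read once, each record is routed through the query tree, and q (total sales per order month) is updated only for the conditions the record matches. Record 1 is the FIG. 2 record. Records 2 and 3 are hypothetical apart from the stated facts that record 2 matches no condition and record 3 is a 4,000-yen April 2015 order matching only condition 1. The field names, the decade-band encoding of age, and the function names are also assumptions.

```python
from collections import defaultdict

# Query tree for selection conditions 1-4 (attribute -> value -> condition ids).
# Attribute names and the "40s" age-band encoding are hypothetical.
QUERY_TREE = {
    "customer_gender": {"male": [1]},
    "customer_age": {"40s": [2]},
    "product_genre": {"clothing": [3], "miscellaneous goods": [4]},
}

# Record 1 follows FIG. 2; records 2 and 3 are hypothetical illustrations.
TABLE_D = [
    {"order_no": 1, "sales_amount": 8000, "order_month": "2015-04",
     "product_genre": "clothing", "customer_age": 45, "customer_gender": "male"},
    {"order_no": 2, "sales_amount": 5000, "order_month": "2015-04",
     "product_genre": "food", "customer_age": 28, "customer_gender": "female"},
    {"order_no": 3, "sales_amount": 4000, "order_month": "2015-04",
     "product_genre": "food", "customer_age": 33, "customer_gender": "male"},
]


def attribute_value(record, attribute):
    """Value used for matching; ages are mapped to decade bands (45 -> '40s')."""
    if attribute == "customer_age":
        return f"{record['customer_age'] // 10 * 10}s"
    return record.get(attribute)


def multiple_selection(table, tree):
    """Scan table D once; for every condition a record matches, update
    q = total sales amount per order month for that condition."""
    results = defaultdict(lambda: defaultdict(int))  # cond id -> month -> total
    for record in table:
        for attribute, values in tree.items():
            for cond_id in values.get(attribute_value(record, attribute), []):
                results[cond_id][record["order_month"]] += record["sales_amount"]
    return {cond_id: dict(by_month) for cond_id, by_month in results.items()}


print(multiple_selection(TABLE_D, QUERY_TREE))
# {1: {'2015-04': 12000}, 2: {'2015-04': 8000}, 3: {'2015-04': 8000}}
```

Condition 4 never appears in the result because no record in this sample matches it.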

In the multiple selection process, then, the query tree processing unit 522 handles each record of table D against all the selection queries at once, even though their conditions are on different attributes A. It then executes the analysis instruction q and obtains the analysis result of the partial data matching each selection condition. Because table D is scanned once rather than once per query, the analysis processing is sped up.

The deviation determination process runs before the per-record multiple selection process has finished all records of table D. In it, the query tree processing unit 522 calculates the degree of deviation between the analysis result of the partial data matching each selection condition and the analysis result of the entire data. For any selection condition whose degree of deviation cannot be the maximum, it stops the multiple selection process on subsequent records.

The degree of deviation between the analysis result of partial data and that of the entire data is described with reference to FIG. 6. For the partial data corresponding to selection conditions 1 to 3 of equation (3), FIG. 6 shows the result of executing the analysis instruction q of equation (4). It also shows the degree of deviation of that result from the analysis result of the entire data of table D, namely the Euclidean distance. The x-axis of FIG. 6 indicates the order year and month, and the y-axis indicates the normalized sales amount.

Consider selection condition 1. The Euclidean distance between the analysis result of the entire data and that of its partial data is the sum of the per-month Euclidean distances (see equation (2)). The Euclidean distance for each month is the difference between the normalized sales amounts for that month.

In FIG. 6, take the difference between the normalized sales amount of all products and that of male customers in each month. This is the monthly Euclidean distance U(all products, male). Suppose it is 0.1 for January 2015, 0.04 for February, 0.19 for March, and 0.02 for April. The Euclidean distance U(all products, male) between the analysis result of the entire data and that of the partial data for selection condition 1 is then 0.1 + 0.04 + 0.19 + 0.02 = 0.35. Similarly, the Euclidean distance U(all products, 40s) for selection condition 2 is calculated as 0.30. The Euclidean distance U(all products, clothing) for selection condition 3 is calculated as 0.74. Therefore, the partial data with the maximum degree of deviation, which satisfies the following equation (7), is the clothing partial data of selection condition 3.
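The arithmetic of this example can be checked with a short sketch. It assumes that equation (7), which does not appear in this text, selects the condition with the largest total deviation (an argmax). Condition 1's total is summed from the monthly values given above, and conditions 2 and 3 use the totals stated in the text. The dictionary keys are illustrative labels only.

```python
# Monthly differences U(all products, male) as given for FIG. 6.
monthly_u_male = {"2015-01": 0.1, "2015-02": 0.04, "2015-03": 0.19, "2015-04": 0.02}

# Totals per selection condition: condition 1 is summed from the monthly
# values; conditions 2 and 3 use the totals stated in the text.
u_total = {
    "condition 1 (male)": sum(monthly_u_male.values()),
    "condition 2 (40s)": 0.30,
    "condition 3 (clothing)": 0.74,
}

# Assumed form of equation (7): the condition whose partial data deviates most.
best = max(u_total, key=u_total.get)
print(round(u_total["condition 1 (male)"], 2))  # 0.35
print(best)  # condition 3 (clothing)
```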

To calculate the degree of deviation, the query tree processing unit 522 of the present embodiment combines two sets of statistics:
- the statistics of all records
- the statistics of the processed records matching each selection condition, obtained before the per-record multiple selection process has finished all records of table D

From these it derives the statistics of the unprocessed records matching each selection condition, and it calculates the degree of deviation using those derived statistics.

Specifically, before the deviation determination process, the query tree processing unit 522 creates two kinds of histograms over the whole record group of table D, that is, the entire data. The first shows the distribution of the values l of the attribute A targeted by the analysis instruction q. The second shows the distribution of the values l of each attribute A specified in the selection conditions of the partial data. For example, it creates a histogram of the sales amount, which is the attribute targeted by the analysis instruction q of equation (4). It also creates histograms of the attribute values l for the attributes of the selection conditions of equation (3): customer gender (selection condition 1), customer age (selection condition 2), and product genre (selection condition 3).

The deviation determination process begins with the record group D1, which the multiple selection process has already handled. For D1, the query tree processing unit 522 calculates the degree of deviation between two results: the analysis result of the partial data matching each selection condition (hereinafter also written q*selection condition(D1)) and the analysis result q(D1) of D1. Assume that the Euclidean distance has been calculated as this degree of deviation, as shown for example in the following equation (8).

The following equation (9) holds for the function that calculates the degree of deviation.

The query tree processing unit 522 then turns to the record group D2, which the per-record multiple selection process has not yet handled. Using the histograms of the attribute values l of D2, it calculates upper and lower bounds on the degree of deviation between the analysis result of the partial data matching each selection condition and the analysis result of the entire data, as they would be once D2 is processed. For simplicity, the attribute values l targeted by the analysis instruction q are assumed to be positive. If they can be negative, they can be handled in the same way by converting the value range to positive values.

First, the query tree processing unit 522 creates histograms of the values l of each attribute A for the record group D1. It then derives the histogram of each attribute value l of the unprocessed record group D2 by subtracting the histogram of D1 from the histogram of the entire data. For example, FIG. 7 is a histogram of the sales amount for D2, the attribute A targeted by the analysis instruction q. FIG. 8 is a histogram of the customer age values for D2.
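The histogram subtraction above can be sketched with `collections.Counter`, assuming a histogram is represented as a value-to-count mapping. `Counter.subtract` keeps zero and negative counts, and unary `+` then drops them, so only the values still present in D2 remain. The sales amounts are hypothetical and are not taken from FIG. 7.

```python
from collections import Counter


def unprocessed_histogram(hist_all, hist_processed):
    """Histogram of D2 = histogram of D minus histogram of D1."""
    remaining = Counter(hist_all)
    remaining.subtract(hist_processed)  # keeps zero and negative counts
    return +remaining                   # unary plus drops non-positive counts


# Hypothetical sales amounts (yen) of all records and of the processed group D1.
hist_all = Counter([8000, 5000, 4000, 4000, 4000, 6000, 10000, 10000])
hist_d1 = Counter([8000, 5000, 4000])
hist_d2 = unprocessed_histogram(hist_all, hist_d1)
print(sorted(hist_d2.items()))  # [(4000, 2), (6000, 1), (10000, 2)]
```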

From the histograms of D2, the query tree processing unit 522 obtains the number of records that have the value l of attribute A for each selection condition. For example, FIG. 8 shows that two unprocessed records satisfy selection condition 2, customer age in the 40s.

The upper bound arises when these unprocessed records take the largest possible values. In that case, the degree of deviation between the analysis result of D2 and that of the partial data of D2 matching each selection condition reaches its maximum. The query tree processing unit 522 therefore takes the histogram of the values l of the analyzed attribute A of D2 and extracts values in descending order, as many as there are such unprocessed records.

For example, extracting two attribute values l from FIG. 7 in descending order of sales amount gives two values of 10,000 yen. The upper bound of the degree of deviation between the analysis result of D2 and that of the partial data of D2 matching selection condition 2 is therefore 10,000 + 10,000 = 20,000. Referring to the Euclidean distance of equation (8), the upper bound of the degree of deviation between the analysis result of table D and that of the partial data matching selection condition 2 is then calculated as shown in the following equation (10): 120,000 + (10,000 + 10,000) = 140,000.

Similarly, the lower bound arises when the unprocessed records having the value l of attribute A for each selection condition take the smallest possible values. In that case, the degree of deviation between the analysis result of D2 and that of the partial data of D2 matching each selection condition reaches its minimum. The query tree processing unit 522 therefore takes the histogram of the values l of the analyzed attribute A of D2 and extracts values in ascending order, as many as there are such unprocessed records.

For example, suppose three records in D2 have the value l of attribute A for selection condition 1. Extracting three attribute values l from FIG. 7 in ascending order of sales amount gives two values of 4,000 yen and one of 6,000 yen. The lower bound of the degree of deviation between the analysis result of D2 and that of the partial data of D2 matching selection condition 1 is therefore 4,000 + 4,000 + 6,000 = 14,000. Referring to the Euclidean distance of equation (8), the lower bound of the degree of deviation between the analysis result of table D and that of the partial data matching selection condition 1 is then calculated as shown in the following equation (11): 250,000 + (4,000 + 4,000 + 6,000) = 264,000.
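The bound arithmetic of equations (10) and (11) can be sketched as follows. The sketch mirrors the example: each bound is the value already computed for D1 plus the sum of the k largest (upper) or k smallest (lower) values remaining in the histogram of D2, where k is the number of unprocessed records matching the condition. The histogram represents D2 as a value-to-count mapping. The text does not give the full contents of FIG. 7, so only the values it quotes (4,000 ×2, 6,000, 10,000 ×2) are used here; that histogram, and the function names, are assumptions.

```python
def extreme_sum(hist, k, largest):
    """Sum of the k largest (or smallest) values in a value -> count histogram."""
    total = 0
    for value in sorted(hist, reverse=largest):
        if k == 0:
            break
        take = min(k, hist[value])
        total += value * take
        k -= take
    return total


def deviation_bounds(value_on_d1, hist_d2, k):
    """(lower, upper): the value computed on D1 plus the k smallest or the
    k largest remaining values, following equations (10) and (11)."""
    lower = value_on_d1 + extreme_sum(hist_d2, k, largest=False)
    upper = value_on_d1 + extreme_sum(hist_d2, k, largest=True)
    return lower, upper


hist_d2 = {4000: 2, 6000: 1, 10000: 2}  # assumed histogram of sales amounts in D2
_, upper_2 = deviation_bounds(120000, hist_d2, k=2)  # condition 2: 2 unprocessed records
lower_1, _ = deviation_bounds(250000, hist_d2, k=3)  # condition 1: 3 unprocessed records
print(upper_2, lower_1)  # 140000 264000
```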

The query tree processing unit 522 identifies selection conditions whose degree of divergence cannot be the maximum by comparing the lower and upper limits of the degree of divergence between the analysis result of the partial data matching each of the plurality of selection conditions and the analysis result of the whole data. For example, if the lower limit of the degree of divergence of the partial data matching one selection condition is greater than the upper limit of the degree of divergence of the partial data matching another selection condition, the latter degree of divergence cannot be the maximum. In the above example, the upper limit for the partial data matching selection condition 2 (140000) is below the lower limit for the partial data matching selection condition 1 (264000), so the degree of divergence of the partial data matching selection condition 2 cannot be the maximum. In this case, the query tree processing unit 522 stops processing the identified selection condition 2 in the multiple selection process for subsequent records.
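The pruning rule can be written as a small function. A minimal sketch, assuming each selection condition already has (lower, upper) limits such as those obtained in equations (10) and (11); the function name `prunable_conditions` and the condition labels are hypothetical. In the example, 264000 (lower limit for selection condition 1) and 140000 (upper limit for selection condition 2) come from the text, while 276000 and 128000 are illustrative placeholders.

```python
def prunable_conditions(bounds: dict[str, tuple[int, int]]) -> set[str]:
    """Return the selection conditions whose degree of divergence cannot be the maximum.

    bounds maps each condition to its (lower, upper) limits. A condition is
    pruned when the lower limit of some other condition exceeds its upper limit.
    """
    pruned = set()
    for cond, (_, upper) in bounds.items():
        others_lower = [low for other, (low, _) in bounds.items() if other != cond]
        if others_lower and max(others_lower) > upper:
            pruned.add(cond)
    return pruned


bounds = {
    "condition 1": (264000, 276000),  # lower limit from equation (11)
    "condition 2": (128000, 140000),  # upper limit from equation (10)
}
print(prunable_conditions(bounds))  # {'condition 2'}
```

The conditions in the returned set are the ones whose multiple selection processing would be skipped for subsequent records.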

As described above, the divergence determination process makes it possible to determine, before the per-record multiple selection process has been completed for all the records in table D, whether the degree of divergence between the analysis result of the partial data matching each selection condition and the analysis result of the whole data could still be the maximum once the multiple selection process is performed on the unprocessed record group D2. The query tree processing unit 522 therefore stops the multiple selection process of subsequent records for any selection condition whose degree of divergence cannot be the maximum. This speeds up the analysis process while still searching for a selection condition that selects partial data whose analysis result diverges strongly from that of the whole data.

[Analysis processing]
Next, the analysis procedure in the analysis apparatus 1 will be described with reference to the flowcharts of FIGS. 9 and 10. FIG. 9 is a flowchart showing the multiple selection processing procedure. The flowchart in FIG. 9 starts, for example, when the data analyst inputs an instruction to start analysis via the input unit 2.

The query tree processing unit 522 first reads the query tree stored in the cache memory (step S1), refers to table D in the storage unit 4 targeted by the analysis instruction q, and checks whether any record remains that has not yet undergone the multiple selection process (step S2). If there is an unprocessed record (step S2, Yes), it reads that record (step S3). If the values l of the attributes A of this record match a path of the query tree representing a selection condition, the query tree processing unit 522 executes the analysis instruction on the group of processed records having those values l of the attributes A (step S4). The query tree processing unit 522 also stores the analysis result, associated with the path, in an appropriate memory area.

If, on the other hand, there is no unprocessed record in step S2 (step S2, No), the query tree processing unit 522 ends the series of multiple selection processes. In this way, the query tree processing unit 522 performs the multiple selection process for all the records in table D.
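A single pass over table D that evaluates every selection condition for each record, as in steps S2 to S4, might look like the following. This is a minimal sketch under several assumptions: each selection condition is a single (attribute, value) pair, the query tree is flattened to two levels (attribute, then value), and the analysis instruction q is taken to be a plain sum of one target attribute rather than a general group-by aggregation. The function names, attribute names (`region`, `month`, `sales`) and the optional `skip` set, which stands in for conditions pruned by the divergence determination process, are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def build_query_tree(conditions: dict[str, tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Arrange conditions {name: (attribute, value)} as attribute -> value -> name."""
    tree: dict[str, dict[str, str]] = defaultdict(dict)
    for name, (attribute, value) in conditions.items():
        tree[attribute][value] = name
    return dict(tree)


def multi_select(records: Iterable[dict], tree: dict[str, dict[str, str]],
                 target: str, skip: frozenset = frozenset()) -> dict[str, int]:
    """Scan the records once, summing the target attribute into every
    selection condition (tree path) that each record matches."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:                       # steps S2-S3: read each record once
        for attribute, branches in tree.items():
            name = branches.get(record.get(attribute))
            if name is not None and name not in skip:
                totals[name] += record[target]   # step S4: update the analysis result
    return dict(totals)


conditions = {"condition 1": ("region", "Kanto"), "condition 2": ("month", "March")}
table_d = [
    {"region": "Kanto", "month": "March", "sales": 10000},
    {"region": "Kansai", "month": "March", "sales": 4000},
    {"region": "Kanto", "month": "April", "sales": 6000},
]
tree = build_query_tree(conditions)
print(multi_select(table_d, tree, "sales"))  # {'condition 1': 16000, 'condition 2': 14000}
```

Because every condition is checked against the same record, the table is read once regardless of how many selection conditions there are, which is the reduction in table accesses the embodiment aims for.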

FIG. 10 is a flowchart showing the divergence determination processing procedure. The flowchart in FIG. 10 starts, for example, when the data analyst inputs a divergence determination instruction via the input unit 2 before the multiple selection process described above has been completed for all the records retrieved from table D.

The query tree processing unit 522 first reads the histograms for the whole data (step S11). Specifically, it reads a histogram showing the distribution of the values l of the attribute A targeted by the analysis instruction q, and histograms showing the distribution of the values l of the attributes A specified in the selection conditions of the partial data.

Next, the query tree processing unit 522 derives histograms of the values l of each attribute A for the record group D2 that has not yet undergone the multiple selection process (step S12). Specifically, the query tree processing unit 522 first creates histograms showing the distribution of the values l of each attribute A for the record group D1 that has already undergone the per-record multiple selection process. It then derives the histograms for the unprocessed record group D2 by subtracting the histograms of the processed record group D1 from the histograms of the whole data.
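Step S12 amounts to a per-value subtraction of counts. A minimal sketch, assuming Python's `collections.Counter` as the histogram type (the source does not prescribe a data structure); the function name `unprocessed_histogram` and the counts in the example are hypothetical.

```python
from collections import Counter


def unprocessed_histogram(whole: Counter, processed: Counter) -> Counter:
    """Histogram of D2 = histogram of the whole data minus histogram of D1."""
    remaining = Counter(whole)
    remaining.subtract(processed)  # keeps zero and negative counts
    if any(count < 0 for count in remaining.values()):
        raise ValueError("processed records are not a subset of the whole data")
    return +remaining  # unary plus drops values whose count reached zero


whole = Counter({10000: 3, 6000: 2, 4000: 2})  # hypothetical whole-data counts
processed = Counter({10000: 1, 6000: 1})       # hypothetical counts for D1
print(unprocessed_histogram(whole, processed))
```

`Counter.subtract` is used instead of the `-` operator because `-` silently discards negative counts, which would hide an inconsistency between the processed records D1 and the whole data.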

Next, using the histograms of the values l of each attribute A for the unprocessed record group D2, the query tree processing unit 522 calculates upper and lower limits of the degree of divergence between the analysis result of the partial data matching each selection condition and the analysis result of the whole data. It then skips the multiple selection process of subsequent records for any selection condition whose degree of divergence cannot be the maximum (step S13).

As described above, in the analysis apparatus 1 of the present embodiment, the multiple query generation unit 51 accepts the specification of a plurality of selection conditions having different attributes and generates a query for each selection condition. In addition, the query tree processing unit 522 performs a multiple selection process in which it determines, for each record stored in the storage unit 4, whether the record matches each of the selection conditions corresponding to the plurality of queries generated by the multiple query generation unit 51, and analyzes, for each selection condition, the records that match that condition.

In this way, for each record of table D, the query tree processing unit 522 executes at once the plurality of selection queries specified by selection conditions on different attributes A, then executes the analysis instruction q, and obtains the analysis result of the partial data matching each selection condition. This reduces the number of accesses to table D and speeds up the analysis process.

In addition, before the per-record multiple selection process has been completed for all the records in table D, the query tree processing unit 522 performs a divergence determination process: it calculates the degree of divergence between the analysis result of the partial data matching each selection condition and the analysis result of the whole data D, and stops the multiple selection process of subsequent records for any selection condition whose degree of divergence cannot be the maximum.

When calculating the degree of divergence, the query tree processing unit 522 derives the statistical information of the unprocessed records D2 that match each selection condition from the statistical information of the whole data D and the statistical information of the processed records D1 that match each selection condition, the latter having been acquired before the per-record multiple selection process was completed for all the records in table D. It then calculates the degree of divergence using the statistical information of the unprocessed records D2.

In this way, the query tree processing unit 522 identifies selection conditions for which the degree of divergence between the analysis result of the matching partial data and the analysis result of the whole data cannot be the maximum, and stops the multiple selection process of subsequent records for those conditions. This speeds up the analysis process while still searching for a selection condition that selects partial data whose analysis result diverges strongly from that of the whole data.

Thus, with the analysis apparatus 1 of the present embodiment, the number of accesses to table D, which stores the data to be analyzed, can be reduced and the analysis process sped up, even when the selection conditions specified in the plurality of selection processes for acquiring partial data from table D have different attributes A.

[Other Embodiments]
[Program]
It is also possible to create a program that describes, in a computer-executable language, the processing executed by the analysis apparatus 1 according to the above embodiment. In this case, the same effects as those of the above embodiment can be obtained by having a computer execute the program. Furthermore, processing similar to that of the above embodiment may be realized by recording the program on a computer-readable recording medium and causing a computer to read and execute the program recorded on that medium. An example of a computer that executes an analysis program implementing the same functions as the analysis apparatus 1 is described below.

As shown in FIG. 11, a computer 1000 that executes the analysis program includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041, into which a removable storage medium such as a magnetic disk or an optical disk is inserted, for example. A mouse 1051 and a keyboard 1052, for example, are connected to the serial port interface 1050, and a display 1061, for example, is connected to the video adapter 1060.

As shown in FIG. 11, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each table described in the above embodiment is stored in, for example, the hard disk drive 1031 or the memory 1010.

The analysis program is stored in the hard disk drive 1031 as, for example, a program module 1093 describing instructions executed by the computer 1000. Specifically, a program module describing each process executed by the analysis apparatus 1 described in the above embodiment is stored in the hard disk drive 1031.

Data used for information processing by the analysis program is stored as program data 1094 in, for example, the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as needed and executes each of the procedures described above.

The program module 1093 and the program data 1094 related to the analysis program are not limited to being stored in the hard disk drive 1031; they may, for example, be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, they may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.

Although embodiments to which the invention made by the present inventors is applied have been described above, the present invention is not limited by the description and drawings that form part of its disclosure according to these embodiments. That is, other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of these embodiments are all included in the scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Analysis apparatus
2 Input unit
3 Output unit
4 Storage unit
5 Control unit
51 Multiple query generation unit
52 Multiple query execution unit
521 Query tree construction unit
522 Query tree processing unit

Claims

An analysis apparatus comprising:
a generation unit that accepts a specification of a plurality of selection conditions having different attributes and generates a query for each of the selection conditions; and
an analysis unit that determines, for each record stored in a storage unit, whether the record meets each of the selection conditions corresponding to the plurality of queries generated by the generation unit, and analyzes, for each selection condition, the records that meet that selection condition.

The analysis apparatus according to claim 1, further comprising a construction unit that constructs, from the queries generated by the generation unit, a query tree whose nodes are the attributes and attribute values specified by the selection conditions, wherein the analysis unit determines, for each record stored in the storage unit, whether the record meets the selection conditions of the query tree constructed by the construction unit, and analyzes, for each selection condition, the records that meet that selection condition.

The analysis apparatus according to claim 1, wherein, before the per-record processing has been completed for all the records in the storage unit, the analysis unit calculates a degree of divergence between the analysis result of the records that meet each selection condition and the analysis result of all the records in the storage unit, and stops the processing of subsequent records for any selection condition whose degree of divergence cannot be the maximum.

The analysis apparatus according to claim 1, wherein the analysis unit derives statistical information of the unprocessed records that meet each selection condition using statistical information of all the records in the storage unit and statistical information of the processed records that meet each selection condition, the latter acquired before the per-record processing has been completed for all the records in the storage unit, and calculates the degree of divergence using the statistical information of the unprocessed records.

An analysis method executed by an analysis apparatus, the method comprising:
a generation step of accepting a specification of a plurality of selection conditions having different attributes and generating a query for each of the selection conditions; and
an analysis step of determining, for each record stored in a storage unit, whether the record meets each of the selection conditions corresponding to the plurality of queries generated in the generation step, and analyzing, for each selection condition, the records that meet that selection condition.

An analysis program for causing a computer to execute:
a generation step of accepting a specification of a plurality of selection conditions having different attributes and generating a query for each of the selection conditions; and
an analysis step of determining, for each record stored in a storage unit, whether the record meets each of the selection conditions corresponding to the plurality of queries generated in the generation step, and analyzing, for each selection condition, the records that meet that selection condition.