JP2014157510A

JP2014157510A - Stream data processing system, method, and program

Info

Publication number: JP2014157510A
Application number: JP2013028404A
Authority: JP
Inventors: Tatsuhiro Chiba; 立寛千葉; Haruki Imai; 晴基今井; Yasushi Negishi; 康根岸; Satoshi Koseki; 聰古関; Hideaki Komatsu; 秀昭小松
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2013-02-15
Filing date: 2013-02-15
Publication date: 2014-08-28

Abstract

PROBLEM TO BE SOLVED: To speed up the generation of response reports for queries in a system that processes stream data arriving in real time. SOLUTION: When a query definition including a time resolution is set in advance, a system for processing real-time stream data generates a pre-aggregation table according to the query definition, updates (adds or deletes) the aggregation results at each time resolution, and uses the created pre-aggregation table to query data generated within a given time range.

Description

The present invention relates to techniques for processing stream data arriving in real time, and more particularly to techniques for obtaining query results over stream data arriving in real time.

For example, road traffic simulation calls for obtaining query results quickly from a large volume of data that arrives moment by moment, such as vehicle speeds and areas.

An example of a query on real-time data, written in SQL-like pseudocode, is as follows.
SELECT MAX(speed) FROM "TABLE"
TIME
FROM time.now()
TO time.now() - time.MIN(10)
GroupBy AREA
Here, "TABLE" is the table of real-time data accumulated moment by moment on the hard disk. This query computes the maximum speed for each area over the last 10 minutes of data. When the query is issued, all of the data must be scanned, which incurs costs such as disk access. As a result, generating a response report for the query takes a long time.
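To make that cost concrete, here is a minimal Python sketch of the naive approach. Every stored row is visited at query time and filtered by timestamp before the per-area maximum is taken. The row layout (`ts`, `area`, `speed`) and the function name are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def max_speed_by_area_naive(rows, now, minutes=10):
    """Naive evaluation: scan *all* rows and keep only those in the last `minutes`."""
    start = now - timedelta(minutes=minutes)
    result = {}
    for row in rows:  # cost grows with the total number of stored rows
        if start <= row["ts"] <= now:
            area = row["area"]
            result[area] = max(result.get(area, float("-inf")), row["speed"])
    return result


now = datetime(2012, 1, 1, 10, 10)
rows = [
    {"ts": datetime(2012, 1, 1, 9, 0), "area": "tokyo", "speed": 130.0},  # too old
    {"ts": datetime(2012, 1, 1, 10, 5), "area": "tokyo", "speed": 80.5},
    {"ts": datetime(2012, 1, 1, 10, 9), "area": "osaka", "speed": 60.0},
]
print(max_speed_by_area_naive(rows, now))  # {'tokyo': 80.5, 'osaka': 60.0}
```

The loop touches the old 9:00 row even though it can never contribute to the answer. This is the cost that a pre-aggregation table is meant to avoid.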

If the computation range (time window) were fixed in advance, a pre-aggregation table could be aggregated ahead of time, which would shorten the time needed to generate query results. For data accumulated by road traffic simulation, however, the computation range generally changes from moment to moment and cannot be uniquely determined, so a pre-aggregation table cannot be prepared.

JP 2006-338432 A discloses two mechanisms. The first replicates part or all of the stream data and archives it on a nonvolatile storage medium, so that real-time data and archived data can be used seamlessly. The second links a plurality of stream data processing systems to improve query processing performance.

JP 2010-108073 A discloses a technique that keeps the memory usage of a query containing a time window constant while still computing accurately over all input data. In a stream data processing server, a query time resolution changer divides the time window into smaller sub-windows. When stream data is received, a query processing engine aggregates it per sub-window and generates aggregate tuples. The result of the query containing the time window is then computed by aggregating these aggregate tuples.

JP 2010-217968 A discloses a computer system that receives time-stamped data in chronological order and processes it with a plurality of pre-registered queries. The system includes an active computer that processes the received data and a standby computer that takes over processing if the active computer fails. The active computer processes the received data with each query in a predefined order. At predetermined timings it sends each query's processing results to the standby computer as intermediate data. If the active computer fails, the standby computer restores the data by processing the received data together with the intermediate data.

JP 2011-59967 A discloses a method of generating stream data in a computer system that generates stream data with time-series timestamps and processes it according to registered queries. The system is provided with three units:
- a storage unit that holds query information, derived from a query and from a stream definition of the component types making up the stream data, indicating which components the query uses;
- a data generation unit that generates and transmits stream data;
- a stream data processing unit that processes the stream data transmitted from the data generation unit.
Based on the query information, the data generation unit produces a reduced amount of stream data to send to the stream data processing unit.

None of these prior art techniques, however, addresses the problem of reducing the time needed to generate response reports for queries.

Patent literature: JP 2006-338432 A; JP 2010-108073 A; JP 2010-217968 A; JP 2011-59967 A

An object of the present invention is to speed up the generation of response reports for queries in a system that processes stream data arriving in real time.

According to the present invention, the above problem is solved by how a system for processing real-time stream data under time constraints is configured. When a query definition that includes a time resolution is set in advance, the system:
- generates a pre-aggregation table according to the query definition;
- updates (adds and deletes) the aggregation results at each time resolution;
- uses the resulting pre-aggregation table to answer queries on data generated within a given time range.

The user creates a query definition in advance that contains a time resolution and time window information. From it, the system according to the present invention analyzes the pre-aggregation table request, registers a pre-aggregation table update scheduler, and automatically generates a pre-aggregation table update program.

For each time resolution, the pre-aggregation table update scheduler generates a data extraction filter covering that time-resolution range. The pre-aggregation table update program applies this filter to stream data arriving in real time from sensors and the like. It generates Key/Value pairs containing only the aggregated results and adds them to the pre-aggregation table.
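A minimal Python sketch of this update step follows. It assumes stream records are dicts whose `area` and `age` fields have already been hashed into group values, and that each bucket is labelled by the end of its range. The helper names `make_range_filter` and `aggregate_window` are hypothetical. Only `(sum, cnt, max, min)` is kept per key, so raw records never enter the table.

```python
from datetime import datetime, timedelta


def make_range_filter(start, end):
    """Return a predicate that keeps records with start <= ts < end."""
    return lambda rec: start <= rec["ts"] < end


def aggregate_window(records, keep, bucket_time):
    """Apply the filter, then reduce each (area, age) group to one Key/Value pair."""
    table = {}
    for rec in filter(keep, records):
        key = (bucket_time, rec["area"], rec["age"])
        s, c, mx, mn = table.get(key, (0.0, 0, float("-inf"), float("inf")))
        v = rec["speed"]
        table[key] = (s + v, c + 1, max(mx, v), min(mn, v))
    return table  # key -> (sum, cnt, max, min)


end = datetime(2012, 1, 1, 10, 1)
keep = make_range_filter(end - timedelta(seconds=60), end)
records = [
    {"ts": datetime(2012, 1, 1, 10, 0, 10), "area": "tokyo", "age": 10, "speed": 40.0},
    {"ts": datetime(2012, 1, 1, 10, 0, 50), "area": "tokyo", "age": 10, "speed": 50.0},
    {"ts": datetime(2012, 1, 1, 10, 1, 5), "area": "tokyo", "age": 10, "speed": 99.0},  # next bucket
]
result = aggregate_window(records, keep, end)
print(result)
```

The third record falls outside `[10:00:00, 10:01:00)`, so only two speeds are folded into the single tokyo/10 entry.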

When the system according to the present invention receives a query from a client system, it applies a reduction process to the pre-aggregation table. The reduction uses keys that filter the data to the query's time range, so the system can quickly generate a report for the query.
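The query-side reduction can be sketched in Python as follows. Each pre-aggregation table row is assumed to hold a partial aggregate `(sum, cnt, max, min)` keyed by `(time, area, age)`. Selecting the rows whose time falls in the requested range and merging them per area answers "average and maximum speed per area over a time range" without touching raw data. The function name and key layout are assumptions. A sorted key-value store such as HBase would typically use a key-range scan here instead of the full iteration shown.

```python
from datetime import datetime, timedelta


def query_range(pat, start, end):
    """Merge partial aggregates with start <= time <= end, grouped by area."""
    merged = {}
    for (t, area, _age), (s, c, mx, mn) in pat.items():
        if start <= t <= end:
            ms, mc, mmx, mmn = merged.get(area, (0.0, 0, float("-inf"), float("inf")))
            merged[area] = (ms + s, mc + c, max(mmx, mx), min(mmn, mn))
    # avg is derived from the merged sum and count, never by averaging averages
    return {area: {"avg": s / c, "max": mx, "min": mn, "cnt": c}
            for area, (s, c, mx, mn) in merged.items()}


t0 = datetime(2012, 1, 1, 10, 0)
pat = {
    (t0, "tokyo", 10): (423.0, 10, 120.2, 0.0),
    (t0 + timedelta(minutes=1), "tokyo", 20): (177.0, 2, 100.0, 77.0),
    (t0 - timedelta(hours=2), "tokyo", 10): (999.0, 1, 999.0, 999.0),  # outside range
}
report = query_range(pat, t0, t0 + timedelta(minutes=10))
print(report["tokyo"])
```

Storing `sum` and `cnt` rather than `avg` is what makes the merge exact: (423 + 177) / (10 + 2) = 50.0.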

The pre-aggregation table update program of the system according to the present invention deletes data older than the time window from the end of the pre-aggregation table. This prevents the table from growing without bound.
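The bounded-size behaviour can be sketched with a Python `collections.deque` standing in for the pre-aggregation table. New per-resolution results are pushed at the head, and entries that are a full window old are popped from the tail. The deque is an illustrative stand-in, not the HBase-backed table of the embodiment. The sketch assumes results are added in time order.

```python
from collections import deque
from datetime import datetime, timedelta


class PreAggregationTable:
    """Newest entries at the left (head), oldest at the right (tail)."""

    def __init__(self, window):
        self.window = window
        self.rows = deque()  # items: (time, key, value)

    def add(self, t, key, value):
        self.rows.appendleft((t, key, value))  # add the new result at the head
        self.evict(now=t)

    def evict(self, now):
        # Drop tail entries that are `window` old or older.
        while self.rows and self.rows[-1][0] <= now - self.window:
            self.rows.pop()


pat = PreAggregationTable(window=timedelta(minutes=60))
start = datetime(2012, 1, 1, 10, 0)
for minute in range(90):  # 90 one-minute results
    pat.add(start + timedelta(minutes=minute), "tokyo_10", minute)
print(len(pat.rows))  # 60
```

With `resolution = 60` s and `window = 3600` s, the table settles at exactly 60 time buckets per key, however long the stream runs.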

As described above, the present invention generates a pre-aggregation table that holds aggregated entries. Entries are deleted so that the table's size stays within the time window, and reports for queries are generated from this table. Reusing the aggregated entries streamlines the aggregation work. Because only a pre-aggregation table of bounded size needs to be scanned, a report for a query can be generated in a short time.

FIG. 1 shows an overview of a hardware configuration for implementing the present invention.
FIG. 2 shows an overview of a functional configuration for implementing the present invention.
FIG. 3 shows a functional configuration for generating a pre-aggregation table.
FIG. 4 is a flowchart for generating a pre-aggregation table.
FIG. 5 shows a schema for generating a pre-aggregation table and example entries of the table.
FIG. 6 shows entries that fall outside the time window being deleted from the pre-aggregation table.
FIG. 7 shows a functional configuration of query processing.
FIG. 8 is a flowchart of query processing.
FIG. 9 schematically shows an example of query processing.

Embodiments of the present invention are described below with reference to the drawings. These embodiments illustrate preferred aspects of the invention and are not intended to limit its scope to what is shown here. Throughout the drawings, the same reference numerals denote the same objects unless otherwise noted.

FIG. 1 shows the schematic configuration of an example system for carrying out the present invention. In FIG. 1, vehicles 104 and 106 transmit probe car data to server 110a over packet communication network 102. In practice, probe car data is received from thousands or tens of thousands of vehicles; only two are shown here as an example.

Techniques for collecting traffic data from probe car data are known; see, for example, JP 2004-110458 A, JP 2004-156982 A, JP 2007-193705 A, and JP 2004-241987 A. Using such well-known techniques, the system receives data such as vehicle speed, position, and acceleration moment by moment.

Server 110a is connected to network 108, which may use any connection configuration such as LAN, WAN, or FHHT. Servers 110b, 110c, ..., 110m and servers 112, 116, 118a, ..., 118k are also connected to network 108. These servers may be, without limitation, models such as IBM(R) System X, System i, and System p, available from International Business Machines Corporation. Operating systems usable on these servers include AIX(TM), UNIX(TM), Linux(TM), and Windows(TM) 2003 Server; this embodiment assumes Linux(TM).

IBM(R) InfoSphere Streams is installed on server 110a and servers 110b, 110c, ..., 110m, which together constitute a stream server.

Hadoop is installed on server 116. Server 116 implements columnar table 116a using HBase, one of the Hadoop projects.

Hadoop is also installed on server 112, which implements query server 112a using Hive, another Hadoop project. Client computer 114 is connected to server 112. It runs a program for generating, through GUI operations, a query definition file containing the time resolution and time window information described later.

Servers 118a, ..., 118k form a MapReduce cluster by assigning each of the roles JobTracker, NameNode, TaskTracker, and DataNode to individual servers. Alternatively, a single server can take on all of these roles.

Next, an overview of the functional configuration of the system of the present invention is described with reference to FIG. 2. In FIG. 2, vehicle 104 has a sensor 104a and transmits probe car data through an alerter 104b to stream system 200, which consists of server 110a and servers 110b, 110c, ..., 110m. Stream system 200 has a connection manager 202, an alerter 204, and a materialized-view ETL 206. Connection manager 202 forwards the received probe car data to alerter 204 and materialized-view ETL 206.

Based on the data received from materialized-view ETL 206, server 116 uses HBase to build columnar table 116a in main memory.

Server 116 generates a pre-aggregation table (PAT) 116b in main memory by means of MapReduce 208, which runs on a predetermined group of servers among servers 118a, ..., 118k. Server 116 also uses MapReduce 210, which runs on a predetermined group of servers among servers 118a, ..., 118k, to extract data from pre-aggregation table 116b according to the query received from query server 112a and create a report.

The program executed on MapReduce 208 is generated from the contents of the query definition file that the user creates on client computer 114.

An example of the contents of the query application definition file is as follows.
Application: {
    AggFunction: {avg, sum, cnt, max, min},
    GroupBy: {
        "Area": Hash1(gps),    // 47 types
        "Age": Hash2(age)      // 6 types
    },
    Measures: {
        "speed": float,        // per hour
        "LineChange": int,     // per intr
        "quickBrake": int      // per intr
    },
    resolution: 60,            // 1 min
    window: 3600               // 60 mins
}
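Because pre-aggregation table rows are keyed by time bucket plus group keys, the definition already puts an upper bound on the table's size. The following Python sketch works out that arithmetic. The definition is transcribed into a dict (the dict form is an assumption); the 47 area types and 6 age types come from the comments in the file above.

```python
definition = {
    "agg_functions": ["avg", "sum", "cnt", "max", "min"],
    "group_by": {"Area": 47, "Age": 6},  # number of distinct hash values
    "measures": ["speed", "LineChange", "quickBrake"],
    "resolution": 60,  # seconds per time bucket
    "window": 3600,    # seconds of buckets kept in the table
}

buckets = definition["window"] // definition["resolution"]
groups = 1
for n in definition["group_by"].values():
    groups *= n
max_rows = buckets * groups
print(buckets, groups, max_rows)  # 60 282 16920
```

At most 16,920 row keys can exist at any moment, regardless of how many probe cars report. This bound is what keeps query-time scans cheap.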

In this example, AggFunction lists the aggregate functions to use: avg is the average, sum the total, cnt the count, max the maximum, and min the minimum.

GroupBy specifies the keys used for aggregation. Here, aggregation by Area and by Age is specified. For example, Hash1(gps) is a function that takes a GPS value and returns a prefecture name, as in the following code:
Hash1(gps) {
    if gps.lt > 10 and gps.lg < 100:
        return "tokyo"
    elif gps.lt > 12 and gps.lg > 10:
        return "osaka"
    ...
}

Hash2(age) is likewise defined as appropriate, as a function that stratifies ages into groups.
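The following is a runnable Python rendering of the two hash functions. The two branches of `hash1` copy the thresholds from the listing above. The fallback value `"other"` and the age bands in `hash2` are hypothetical placeholders: the document elides the remaining prefecture rules and does not define the age strata.

```python
from collections import namedtuple

Gps = namedtuple("Gps", ["lt", "lg"])  # latitude, longitude


def hash1(gps):
    """Map a GPS position to an area name (only the two rules shown in the text)."""
    if gps.lt > 10 and gps.lg < 100:
        return "tokyo"
    elif gps.lt > 12 and gps.lg > 10:
        return "osaka"
    return "other"  # hypothetical fallback standing in for the elided rules


def hash2(age):
    """Hypothetical stratification into 6 age groups: 10, 20, 30, 40, 50, 60 (60+)."""
    decade = (age // 10) * 10
    return min(max(decade, 10), 60)


print(hash1(Gps(11.0, 50.0)), hash2(17))  # tokyo 10
```

Both functions map an unbounded input space onto a small, fixed set of group values. That property is what lets the group-by keys double as row-key components.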

In the pre-aggregation table, each record key is generated from the time plus the aggregation keys. For example, a key looks like 2012/01/01_10:00:00_tokyo_10.
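The row key can be built by formatting the bucket time and appending the group values, as in this Python sketch, which reproduces the example key exactly. The underscore separator and the date format are read off the example; the function name is hypothetical. Because every field in the time prefix is zero-padded, lexicographic key order matches chronological order. This suits range scans in a sorted key-value store.

```python
from datetime import datetime


def pat_row_key(bucket_time, area, age_group):
    """Row key = time + aggregation keys, e.g. 2012/01/01_10:00:00_tokyo_10."""
    return f"{bucket_time:%Y/%m/%d_%H:%M:%S}_{area}_{age_group}"


key = pat_row_key(datetime(2012, 1, 1, 10, 0, 0), "tokyo", 10)
print(key)  # 2012/01/01_10:00:00_tokyo_10
```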

resolution is the time granularity at which data is aggregated in the pre-aggregation table. window is the time span after which records are deleted from the pre-aggregation table.

FIG. 3 shows the functional blocks that, based on the query application definition file created in this way, generate the pre-aggregation table (PAT), register the PAT scheduler, and generate the program that runs on MapReduce 208.

In FIG. 3, the query application definition file 302 is assumed to have been created on client computer 114 and held, for example, on its hard disk or in its main memory.

The hard disk of server 112, which runs the query service, stores the following modules:
- PAT request analysis module 304, which analyzes the requests in query application definition file 302;
- PAT creation module 306, which creates pre-aggregation table 116b based on the analysis result of module 304;
- PAT scheduler registration module 308, which registers a schedule in PAT scheduler 312 based on that analysis result;
- PAT update program creation module 310, which generates, based on that analysis result, the MAPPER/REDUCER program that runs on MapReduce 208.
These modules are called as needed in response to operations from client computer 114 or elsewhere, loaded into main memory, and executed.

In this embodiment, server 112, which runs the query service, also performs PAT request analysis and the related processing. Alternatively, a separate server may be set up to perform PAT request analysis, PAT creation, PAT scheduler registration, and PAT update program creation.

In server 116, MapReduce 208 runs on the schedule set by PAT scheduler 312, using columnar table 116a, which was built in main memory from data received from stream system 200. It creates aggregated data 314 and adds it to pre-aggregation table 116b.

Next, the processing related to generating the pre-aggregation table (PAT) is described with reference to the flowchart of FIG. 4. In step 402 of FIG. 4, PAT request analysis module 304 reads query application definition file 302 and starts PAT generation for query definition D.

In step 404, PAT request analysis module 304 determines whether the PAT exists. If it does not, in step 406 the module defines a (K, V) schema matching the PAT definition and calls PAT creation module 306 to create a PAT for storing the data. Here, the PAT is a Key Value Store type table. Its KEY is the combination of time resolution, (any number of) grouping columns, target data column, and aggregate function. Its VALUE is the scalar obtained by applying that aggregate function to the target data column after grouping. FIG. 5 shows an example of how PAT creation module 306 processes the schema. PAT creation module 306 creates a new table in step 502, generates keys based on query application definition file 302 in step 504, and defines column families (CF) in step 506, thereby producing pre-aggregation table 116b.
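Logically, then, each PAT entry maps a composite KEY to a single scalar VALUE. The Python sketch below flattens one physical row (a row key plus a column family holding sum/cnt/max/min) into that logical view. Two details are assumptions made for illustration: the column family name stands in for the target data column, and `avg` is derived from sum and count.

```python
def to_logical_entries(row_key, cf_name, stats):
    """Expand {sum, cnt, max, min} into (row_key, column, function) -> scalar."""
    entries = {(row_key, cf_name, fn): stats[fn] for fn in ("sum", "cnt", "max", "min")}
    entries[(row_key, cf_name, "avg")] = stats["sum"] / stats["cnt"]
    return entries


logical = to_logical_entries("2012/01/01_10:00:00_tokyo_10", "SP",
                             {"sum": 423.0, "cnt": 10, "max": 120.2, "min": 0.0})
print(logical[("2012/01/01_10:00:00_tokyo_10", "SP", "avg")])  # 42.3
```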

The key consists of the time plus the aggregation fields. In this example, Area + Age are the aggregation fields.

At this stage pre-aggregation table 116b is still empty; an example entry is shown for reference only. Here the time resolution is 60 seconds, so the first entry, 2012/01/01_10:00:00_tokyo_10 {"avg": 42.3, "max": 120.2, "min": 0.0, "cnt": 10, "sum": 423} ..., holds aggregated data for 60 seconds.

Returning to the flowchart of FIG. 4, in step 408 PAT request analysis module 304 determines whether an aggregation program (= P) exists that adds data to the PAT corresponding to query definition D. If not, in step 410 it calls PAT update program creation module 310 to generate such a program. The generated program takes sensor data as input and aggregates it according to the definition in D. It produces records whose KEY is the aggregation time, grouping columns, target data column, and aggregate function, and whose VALUE is the aggregation result.

The following is an example of the Map/Reduce aggregation program code that PAT update program creation module 310 generates from the query application definition file 302 shown above. It is written as pseudocode, but in practice it can be written in any existing programming language, such as Java(R) or Python.
class MAPPER:
    method MAP(time t, row r):
        // key = aggregation time + Area + Age
        str s = String.format("%s-%s-%s", t, Hash1(r.gps), Hash2(r.age))
        Map<str, tuple> set = new Map<str, tuple>()    // CF name -> partial aggregate
        // for CF:Speed (a single row is its own sum, max and min, with count 1)
        set.put("SP", new tuple(sum: r.speed, cnt: 1, max: r.speed, min: r.speed))
        // for CF:LineChange
        set.put("LC", new tuple(sum: r.lineChange, cnt: 1, max: r.lineChange, min: r.lineChange))
        // other measures such as quickBrake are handled in the same way
        EMIT(s, set)
class REDUCER:
    method REDUCE(str s, sets[r1, r2, ...]):
        // accumulators are reset for every key
        float sum1 = 0, sum2 = 0, cnt1 = 0, cnt2 = 0
        float max1 = -INF, max2 = -INF, min1 = +INF, min2 = +INF
        for all set ∈ sets[r1, r2, ...] do
            sum1 = sum1 + set.get("SP").sum;  sum2 = sum2 + set.get("LC").sum
            cnt1 = cnt1 + set.get("SP").cnt;  cnt2 = cnt2 + set.get("LC").cnt
            max1 = max(max1, set.get("SP").max);  max2 = max(max2, set.get("LC").max)
            min1 = min(min1, set.get("SP").min);  min2 = min(min2, set.get("LC").min)
        done
        // avg can be derived at read time as sum / cnt
        writeToIncrementalDB(s, CF="SP", sum1, cnt1, max1, min1)
        writeToIncrementalDB(s, CF="LC", sum2, cnt2, max2, min2)
The aggregation program code generated in this way drives MapReduce system 208. The resolution and window values do not appear in this code; they are set in PAT scheduler 312.
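The map, shuffle, and reduce phases of the listing can be simulated in plain Python to check its logic. This is a single-process sketch, not Hadoop code. `writeToIncrementalDB` is replaced by returning a dict, only the two column families from the listing (SP, LC) are handled, and `area_of`/`age_group` are hypothetical stand-ins for Hash1/Hash2.

```python
from collections import defaultdict


def mapper(t, row, hash1, hash2):
    """Emit (key, {CF: (sum, cnt, max, min)}) for one input row."""
    key = f"{t}-{hash1(row['gps'])}-{hash2(row['age'])}"
    return key, {
        "SP": (row["speed"], 1, row["speed"], row["speed"]),
        "LC": (row["lineChange"], 1, row["lineChange"], row["lineChange"]),
    }


def reducer(key, cf_maps):
    """Merge all partial aggregates for one key, per column family."""
    out = {}
    for cf in ("SP", "LC"):
        parts = [m[cf] for m in cf_maps]
        out[cf] = (sum(p[0] for p in parts), sum(p[1] for p in parts),
                   max(p[2] for p in parts), min(p[3] for p in parts))
    return key, out


def run_job(t, rows, hash1, hash2):
    shuffled = defaultdict(list)  # shuffle: group map output by key
    for row in rows:
        key, value = mapper(t, row, hash1, hash2)
        shuffled[key].append(value)
    return dict(reducer(k, vs) for k, vs in shuffled.items())


def area_of(gps):  # hypothetical stand-in for Hash1
    return {"A": "tokyo", "B": "osaka"}[gps]


def age_group(age):  # hypothetical stand-in for Hash2
    return (age // 10) * 10


rows = [
    {"gps": "A", "age": 17, "speed": 40.0, "lineChange": 1},
    {"gps": "A", "age": 15, "speed": 60.0, "lineChange": 0},
    {"gps": "B", "age": 33, "speed": 30.0, "lineChange": 2},
]
result = run_job("12:01", rows, area_of, age_group)
print(result["12:01-tokyo-10"])
```

Min and max start from the first real value rather than from 0. That is why the corrected listing initialises them to -INF/+INF instead of zero.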

Returning to the flowchart of FIG. 4, in step 412 PAT request analysis module 304 uses PAT scheduler registration module 308 to register the program in PAT scheduler 312, so that it is launched once per time resolution. In practice, PAT scheduler 312 can be handled by a driver that sits outside the MapReduce code above. For example, with the crontab entry
*/1 * * * * hadoop jar PAT-Update-MR.jar
the update program is launched every minute. The driver of the launched program first obtains the current time and decides which point in time to aggregate for. For example, if the current time is 12:01:02, the reference aggregation time is set to 12:01. This is an implementation choice and can be changed freely. Once this 12:01 reference is determined dynamically at run time, going back one resolution period from it uniquely determines the Map input: the data in the range 12:00:00-12:00:59 (= resolution). Because the data in this range is associated with the time 12:01, that time is used as one of the key components when the final Reduce result is written to the PAT.
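The driver-side computation of the reference time and the Map input range can be sketched in Python as follows. Truncating to whole resolution periods follows the example above (12:01:02 becomes 12:01). The function name is hypothetical, and naive (timezone-free) datetimes are assumed.

```python
from datetime import datetime, timedelta


def input_range(now, resolution_s=60):
    """Return (reference_time, first_second, last_second) for one scheduled run."""
    epoch = datetime(1970, 1, 1)
    elapsed = int((now - epoch).total_seconds())
    ref = epoch + timedelta(seconds=elapsed - elapsed % resolution_s)
    start = ref - timedelta(seconds=resolution_s)
    last = ref - timedelta(seconds=1)  # inclusive end, e.g. 12:00:59
    return ref, start, last


ref, start, last = input_range(datetime(2012, 1, 1, 12, 1, 2))
print(ref.time(), start.time(), last.time())  # 12:01:00 12:00:00 12:00:59
```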

In step 414, aggregation program P saves the current time on MapReduce 208 as T1. In step 416, PAT scheduler 312 gives P a rule that restricts the sensor data input to the range from the last execution time (T0) to the current time (T1), runs P, and outputs the result (R). The rule is a filter rule that selects, as input, the stored data from one time resolution before the current time up to the current time. PAT scheduler 312 creates this rule in advance, based on the analysis result of PAT request analysis module 304. MapReduce 208 then takes the data filtered to the time-resolution width as input and runs the aggregation program as a MapReduce job. Scheduling here means arranging for this rule creation and MapReduce execution to repeat each time the time resolution elapses.

More specifically, the Map processing generates, from the input data, Tuples whose key combines the time resolution (the current time) with the GroupBy ID and whose value is the field specified by Measure, and emits these Tuples as the Map output. The Reduce processing takes as input the group of Tuples for each key and applies the aggregation function separately to each value in the Tuples. The aggregation result for each key becomes a new Tuple, which is used as the output of the Reduce processing. The Reduce processing further adds this output to the aggregation table and, at the same time, deletes the data older than the window time before the current time. This corresponds to step 418 in FIG. 4, which adds R to the head of the PAT and deletes from the tail of the PAT the data older than the Time Window set in D. FIG. 6 shows how the data at the tail of the PAT is deleted.
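The Map, Reduce, and PAT-update steps above can be sketched in plain Python as follows. This is not Hadoop code: records are assumed to be `(area, speed)` pairs, MAX is used as the example aggregation function, and the PAT is modeled as a `deque` held newest-first so results enter at the head and expire from the tail. Evicting entries whose reference time is at or before `ref_time - window` (keeping `window / resolution` entries) is an assumption about the boundary.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def map_phase(records, ref_time):
    """Emit ((reference time, GroupBy ID), Measure value) tuples."""
    for area, speed in records:
        yield (ref_time, area), speed


def reduce_phase(mapped, agg=max):
    """Group the mapped tuples by key and aggregate each group's values."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


def update_pat(pat: deque, reduced: dict, ref_time: datetime, window: timedelta) -> None:
    """Add R at the head of the PAT and drop expired entries from the tail.

    `pat` holds (reference time, {(reference time, area): value}) pairs,
    newest first.
    """
    pat.appendleft((ref_time, reduced))
    while pat and pat[-1][0] <= ref_time - window:
        pat.pop()
```

For example, with a two-minute window, after runs at 12:01, 12:02, and 12:03 only the 12:03 and 12:02 entries remain.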

In step 420, the PAT scheduler 312 stores the time T1 in T0 and then pauses until one time resolution has elapsed. In step 422, it determines whether an execution stop request has been received from the client computer 114 or the like. If so, execution ends; otherwise, processing returns to step 414.
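The loop of steps 414 to 422 can be sketched as a function with an injectable clock and sleep, so it can be exercised without waiting. The name `run_pat_scheduler` and its callbacks are hypothetical, and the initial T0 (one resolution before the first run) is an assumption, since the text does not say how T0 is first set.

```python
import time
from typing import Callable


def run_pat_scheduler(resolution_s: float,
                      run_aggregation: Callable[[float, float], None],
                      stop_requested: Callable[[], bool],
                      clock: Callable[[], float] = time.time,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Repeat the PAT update once per time resolution until asked to stop."""
    t0 = clock() - resolution_s      # assumed initial T0
    while True:
        t1 = clock()                 # step 414: save the current time as T1
        run_aggregation(t0, t1)      # steps 416-418: filter [T0, T1), run P, update the PAT
        t0 = t1                      # step 420: store T1 in T0 ...
        sleep(resolution_s)          # ... and pause for one time resolution
        if stop_requested():         # step 422: stop request from a client?
            return
```

In production the defaults use the wall clock; the test below drives it with a simulated clock instead.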

FIG. 7 is a diagram showing the functional blocks of the processing that, according to the present invention, creates a report for an OLAP query 702 created on the client computer 114.

For example, the OLAP query 702 can be written in SQL-like pseudo code as follows.
SELECT AVG(speed) from PAT
TIME
FROM time.now()
TO time.now() - time.MIN(2)
GroupBy area
This query finds the average speed for each area over the two minutes up to the current time.
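Answering an AVG query from the PAT needs care: averaging per-minute averages is exact only when every minute has the same number of readings (two readings averaging 50 and one reading of 60 average to 53.33, not 55). The text does not say how AVG is decomposed; a common approach, assumed in this sketch, is for each PAT entry to store a (sum, count) pair that is combined before dividing. Minutes are integers here for brevity, and `avg_from_partials` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed PAT entry payload: (sum of speeds, number of readings).
Partial = Tuple[float, int]


def avg_from_partials(entries: Iterable[Tuple[int, str, Partial]],
                      start_minute: int, end_minute: int) -> Dict[str, float]:
    """Average speed per area over minutes in [start_minute, end_minute).

    Sums and counts are added across minutes before dividing, so a minute
    with more readings carries proportionally more weight.
    """
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for minute, area, (s, n) in entries:
        if start_minute <= minute < end_minute:
            totals[area][0] += s
            totals[area][1] += n
    return {area: s / n for area, (s, n) in totals.items() if n > 0}
```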

The query server 112a in the server 112 converts the OLAP query 702 into filter conditions, preferably using Hive, one of the Hadoop projects. These filter conditions are added as rules to the filter 706 in the server 116. In response, the filter 706 extracts the data in the selected range from the pre-aggregation table 116b according to the rules that have been set.

Meanwhile, the query server 112a provides the MapReduce program 708 to MapReduce 210, which performs the aggregation specified by the MapReduce program 708 and generates the result report 710. The communication module 712 of the server 116 sends the result report 710 to the display module 704 of the client computer 114, and the display module 704 displays the result report 710 as a graph as appropriate.

Next, query processing is described with reference to the flowchart of FIG. 8. In step 802 of FIG. 8, the query server 112a starts generating the result for a submitted query request. Throughout the flowchart of FIG. 8, D denotes the query application definition file and Q denotes the submitted OLAP query.

In step 804, the query server 112a determines whether the data needed to generate the result of Q is defined in D. If not, in step 806 it returns an error to the query requester and ends processing; if so, it proceeds to step 808.

In step 808, the query server 112a determines whether the time resolution requested in Q is longer than the time resolution defined in D. If not, in step 806 it returns an error to the query requester and ends processing; if so, in step 810 it saves the current time as T0.
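The two admission checks (steps 804 and 808) can be sketched as below. The dataclass fields standing in for D and Q are hypothetical; the real definition file format is not given. Following the text literally, a query whose resolution equals D's is rejected, since step 808 requires it to be longer.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class QueryAppDefinition:   # D, with hypothetical fields
    measures: FrozenSet[str]
    group_by: FrozenSet[str]
    resolution: timedelta


@dataclass(frozen=True)
class OlapQuery:            # Q, with hypothetical fields
    measure: str
    group_by: str
    resolution: timedelta


class QueryRejected(Exception):
    """Signals the error returned to the requester in step 806."""


def validate(q: OlapQuery, d: QueryAppDefinition) -> None:
    # Step 804: the data Q needs must be defined in D.
    if q.measure not in d.measures or q.group_by not in d.group_by:
        raise QueryRejected("data required by Q is not defined in D")
    # Step 808: Q's time resolution must be longer than D's.
    if q.resolution <= d.resolution:
        raise QueryRejected("Q's time resolution is not longer than D's")
```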

In step 812, the query server 112a determines whether the time range required to generate the result of Q includes times in the future relative to T0. If not, it sends a rule to the filter 706, so that in step 814 the time range described in Q is filtered and the matching records are extracted from the PAT. In step 816, the already-aggregated values corresponding to the aggregate function, grouping column, and target data column requested in Q are extracted from these filtered records, the same aggregate function is applied to them again, and the result is output (R). In step 818, the query result is returned to the query requester and processing ends.
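Steps 814 and 816 can be sketched over PAT rows modeled as `(reference time, area, pre-aggregated value)` triples. Re-applying the same function is exact for MAX, MIN, and SUM; COUNT would need the stored counts summed, and AVG needs separately stored sums and counts. Treating the range as (start, end], on the assumption that each key is the end time of its one-resolution window, is a choice made for this sketch.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

PatRow = Tuple[datetime, str, float]   # (reference time, area, pre-aggregated value)


def query_pat(rows: Iterable[PatRow], start: datetime, end: datetime,
              agg: Callable[[List[float]], float]) -> Dict[str, float]:
    """Filter the PAT to Q's time range, then re-aggregate per group."""
    selected: Dict[str, List[float]] = {}
    # Step 814: keep only rows whose time key lies in (start, end].
    for ts, area, value in rows:
        if start < ts <= end:
            selected.setdefault(area, []).append(value)
    # Step 816: apply the same aggregate function again to the stored values.
    return {area: agg(values) for area, values in selected.items()}
```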

Returning to step 812, if the query server 112a determines that the time range required to generate the result of Q includes times in the future relative to T0, then in step 820 it performs process 1 (that is, step 814 followed by step 816), extracting from the PAT the records for times before T0 and outputting the result of Q (R).

Next, in step 822, the query server 112a saves the latest (furthest-future) time requested in Q as T2, and in step 824 it pauses until one time resolution has elapsed.

Then, in step 826, the query server 112a reads the head of the PAT, obtains the aggregation result of the newest entry that corresponds to Q, combines it with R, and outputs the result (R). In step 828, it determines whether the current time T1 is still earlier than T2 (T1 < T2); if so, it returns to step 824. Once T1 reaches T2, the query server 112a returns the query result to the query requester in step 818 and ends processing.
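The waiting loop for a range that extends into the future (steps 824 to 828) can be sketched as follows. `read_pat_head` and `combine` are hypothetical callbacks, and the sketch assumes a new PAT entry has been written by the time each wait ends. `initial` is the R produced in step 820 from records before T0.

```python
from typing import Callable, Dict


def query_with_future_range(initial: Dict[str, float],
                            t2: float,
                            resolution_s: float,
                            read_pat_head: Callable[[], Dict[str, float]],
                            combine: Callable[[float, float], float],
                            clock: Callable[[], float],
                            sleep: Callable[[float], None]) -> Dict[str, float]:
    """Fold each new PAT head entry into R until the current time reaches T2."""
    result = dict(initial)
    while True:
        sleep(resolution_s)                              # step 824
        for key, value in read_pat_head().items():       # step 826: newest entry
            result[key] = combine(result[key], value) if key in result else value
        if not clock() < t2:                             # step 828: loop while T1 < T2
            return result                                # step 818
```

With MAX as `combine`, this keeps a running maximum as each new minute of results lands at the head of the PAT.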

FIG. 9 schematically illustrates an example of the query processing shown in FIGS. 7 and 8. In FIG. 9, when a user submits an OLAP query 702, the query server 112a derives from it a time key filtering 902 and a column-family value filtering 904, as shown. The query server 112a then generates rules from these filterings and applies them to the filter 706, which yields the table data 906 in the selected range.
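A plain-Python sketch of the two filterings, modeled on an HBase-style table whose rows are sorted by row key. This is not the HBase client API: the row key layout (a fixed-width `yyyyMMddHHmm` time followed by `#` and the GroupBy ID) and the family/qualifier names are assumptions. Putting the time first means a time range becomes a contiguous key range.

```python
from typing import Dict, Iterable, Tuple

# (row key, {column family: {qualifier: value}})
Row = Tuple[str, Dict[str, Dict[str, float]]]


def make_row_key(minute: str, area: str) -> str:
    """Hypothetical key layout, e.g. "201302151201#A": time first, then group."""
    return f"{minute}#{area}"


def apply_filters(rows: Iterable[Row], start_key: str, stop_key: str,
                  family: str, qualifier: str) -> Dict[str, float]:
    """Keep rows in [start_key, stop_key) that carry family:qualifier."""
    out: Dict[str, float] = {}
    for key, families in rows:
        if not (start_key <= key < stop_key):           # time key filtering (902)
            continue
        value = families.get(family, {}).get(qualifier)  # column-family value filtering (904)
        if value is not None:
            out[key] = value
    return out
```

Because a key such as `"201302151203#A"` sorts after the bare stop key `"201302151203"`, the stop key excludes every row for that minute.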

The query server 112a performs the final reduction on the table data 906 in the selected range by calling the Map function 908 and the Reduce function 910 on MapReduce 210, and obtains the result report 710. This sequence of processing is called the reduction process. The result report 710 is returned to the user.

The present invention has been described above as an embodiment implemented with MapReduce in the Hadoop(TM) framework, using HBase to generate the pre-aggregation table and Hive for the query server. Besides HBase, however, Cassandra, Memcached, and the like can also be used to generate the pre-aggregation table.

Moreover, the present invention is not limited to Hadoop or MapReduce and can be implemented on a computer platform with any operating system and hardware.

Furthermore, the present invention can be used not only for probe-car data but for any application that receives time-stamped data and aggregates it into a table, such as weather data, financial data, and other simulations.

112 ... Server for queries
112a ... Query server
114 ... Client computer
116a ... Pre-aggregation table
200 ... Stream system
208, 210 ... MapReduce system
302 ... Query application definition file
304 ... PAT analysis request module
306 ... PAT creation module
308 ... PAT scheduler registration module
310 ... PAT update program creation module
312 ... PAT scheduler
702 ... OLAP query

Claims

A method for querying, by computer processing, data generated in real time in a system, the method comprising:
an intermediate result processing step of using a specified time resolution and storage period value to store an intermediate result, used for generating a query processing result, computed from the data within one time resolution of the current time, and of deleting the stored intermediate result after the storage period has elapsed;
a step of scheduling the intermediate result processing step to run once every specified time resolution; and
a step of generating a query result from the intermediate result in response to receiving a query.

The method of claim 1, wherein the intermediate result processing step is processed by a system implementing MapReduce.

The method according to claim 1, further comprising a step of generating, by the processing of the computer and from the specified time resolution and storage period value, a program that executes the intermediate result processing step and runs on the system implementing MapReduce.

The method according to claim 1, further comprising the step of generating a program for executing the scheduling step from the specified time resolution and retention period value by the processing of the computer.

The method of claim 1, wherein the intermediate result has a field keyed by a value including time.

6. The method of claim 5, wherein the query includes a time range, and generating the query result includes aggregating the data whose key, a value including the time, falls within the time range.

A program for querying, by computer processing, data generated in real time in a system, the program causing the computer to execute:
an intermediate result processing step of using a specified time resolution and storage period value to store an intermediate result, used for generating a query processing result, computed from the data within one time resolution of the current time, and of deleting the stored intermediate result after the storage period has elapsed;
a step of scheduling the intermediate result processing step to run once every specified time resolution; and
a step of generating a query result from the intermediate result in response to receiving the query.

The program according to claim 7, wherein the intermediate result processing step is processed by a system that implements MapReduce.

9. The program, further causing the computer to execute a step of generating, from the specified time resolution and storage period value, a program that executes the intermediate result processing step and runs on the system implementing MapReduce.

The program according to claim 7, further comprising a step of generating a program for executing the scheduling step based on the designated time resolution and storage period value by processing of the computer.

The program according to claim 7, wherein the intermediate result has a field having a value including time as a key.

The program according to claim 11, wherein the query includes a time range, and the step of generating the query result includes a step of aggregating the data whose key, a value including the time, falls within the time range.

A system for querying, by computer processing, data generated in real time, the system comprising:
intermediate result processing means for using a specified time resolution and storage period value to store an intermediate result, used for generating a query processing result, computed from the data within one time resolution of the current time, and for deleting the stored intermediate result after the storage period has elapsed;
means for scheduling the intermediate result processing means to run once every specified time resolution; and
means for generating a query result from the intermediate result in response to receiving a query.

The system according to claim 13, wherein the intermediate result processing means is processed by a system that implements MapReduce.

15. The system, further comprising means for generating, by the processing of the computer and from the designated time resolution and storage period values, a program that executes the intermediate result processing means and runs on a system implementing MapReduce.

14. The system of claim 13, further comprising means for generating a program for executing the scheduling step from the specified time resolution and retention period values.

14. The system of claim 13, wherein the intermediate result has a field keyed by a value including time.

18. The system of claim 17, wherein the query includes a time range, and the means for generating the query result aggregates the data whose key, a value including the time, falls within the time range.