JP6319694B2

JP6319694B2 - Data cache method, node device, and program

Info

Publication number: JP6319694B2
Application number: JP2015159144A
Authority: JP
Inventors: 后宏水谷; 修明石; 健介福田; 重雄漆谷
Original assignee: Nippon Telegraph and Telephone Corp; Inter University Research Institute Corp Research Organization of Information and Systems
Current assignee: Nippon Telegraph and Telephone Corp; Inter University Research Institute Corp Research Organization of Information and Systems
Priority date: 2015-08-11
Filing date: 2015-08-11
Publication date: 2018-05-09
Anticipated expiration: 2035-08-11
Also published as: JP2017037532A

Description

The present invention relates to a data cache method used when analyzing data in a structured overlay system, a node device that implements the data cache method, and a program therefor.

A structured overlay is a technique in which a single logical ID space is divided among and managed by a plurality of nodes, and it is used in distributed databases such as key-value stores. To place data in a structured overlay, the name of the data is used as a key, and the data is placed on the node responsible for that key. To retrieve data from the overlay, the name of the desired data is used as a key, the node responsible for that key is located on the overlay, and the data is retrieved from it. This is the data storage method used in large-scale data analysis platforms such as Hadoop (Non-Patent Document 1) and Jubatus (Non-Patent Document 3). These platforms can analyze data quickly and in a distributed manner by applying MapReduce processing (Non-Patent Document 2) to the stored data. An existing method (Non-Patent Document 6) targets a system called Pig (Non-Patent Document 7), which maps analysis procedures written by an analyst in an SQL-like language onto MapReduce processing. By storing every pair of analysis procedure and analysis result, it speeds up the computation of analysis results for new analysis procedures.

Non-Patent Document 1: K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. IEEE MSST, 2010.
Non-Patent Document 2: J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proc. OSDI 2004: 6th Symposium on Operating Systems Design and Implementation, 2004.
Non-Patent Document 3: Masanori Han, Hiroyuki Makino, Hiroki Kumazaki, "Real-time large-scale data processing infrastructure Jubatus and its use cases" (in Japanese), Management Science 57 (12), pp. 689-694, 2012.
Non-Patent Document 4: A. Lakshman and P. Malik, "Cassandra: A Decentralized Structured Storage System," in ACM SIGOPS Operating Systems Review, vol. 44, issue 2, pp. 35-40, 2010.
Non-Patent Document 5: A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, "Hive - A Petabyte Scale Data Warehouse Using Hadoop," in Proc. ICDE, 2010.
Non-Patent Document 6: I. Elghandour and A. Aboulnaga, "ReStore: Reusing Results of MapReduce Jobs," in Proc. VLDB Endowment, vol. 5, issue 6, pp. 586-597, 2012.
Non-Patent Document 7: C. Olston et al., "Pig Latin: A Not-So-Foreign Language for Data Processing," in Proc. SIGMOD 2008, pp. 1099-1110, 2008.
Non-Patent Document 8: Yosuke Takei, Kohei Ota, Nei Kato, Yoshiaki Nemoto, "Unauthorized Access Detection and Tracking Method Using Traffic Patterns" (in Japanese), IEICE Transactions B, vol. J84-B, no. 8, pp. 1464-1473, 2001.

However, an acceleration method such as that of Non-Patent Document 7 stores every pair of analysis procedure and analysis result. When complex and diverse analyses are executed, the volume of stored analysis results (the cache) therefore grows substantially, and hard disk capacity may be exhausted.

To solve the above problem, an object of the present invention is to provide a data cache method, a node device, and a program that can speed up data analysis while reducing the possibility that hard disk capacity is exhausted.

The data cache method according to the present invention records each input analysis procedure and its analysis result and, at predetermined intervals, extracts frequently used partial analysis procedures from the recorded analysis procedures. Thereafter, when an analysis procedure containing an extracted partial analysis procedure is input, the analysis result of that partial analysis procedure is saved, and when another analysis procedure containing the same partial analysis procedure is input, the saved analysis result is reused.

Specifically, the data cache method according to the present invention is a data cache method for a tree-structured structured overlay system comprising a plurality of node devices, the method comprising:
an entire analysis recording step of recording an analysis procedure that is input to the node device and analyzes time-series data accumulated by the structured overlay system, together with the analysis result obtained by that analysis procedure;
a partial operation sequence extraction step of extracting, at predetermined intervals, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the entire analysis recording step, and recording the partial operation sequence;
a partial analysis result recording step of recording, when a new analysis procedure containing the partial operation sequence recorded in the partial operation sequence extraction step is input to the node device, the analysis result obtained by the partial operation sequence as a partial analysis result; and
an analysis result reuse step of performing analysis, when a further new analysis procedure containing the partial operation sequence is input to the node device after the partial analysis result recording step, by using the partial analysis result recorded in the partial analysis result recording step.

A node device according to the present invention is a node device constituting a tree-structured structured overlay system, the node device comprising:
a cache storage;
a data storage that stores time-series data;
a routing control module that searches for specific time-series data by traversing the tree structure on the structured overlay system; and
a controller that
records in the cache storage an input analysis procedure that analyzes time-series data accumulated by the structured overlay system, together with the analysis result obtained by that analysis procedure,
extracts, at predetermined intervals, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the cache storage, and records the partial operation sequence in the cache storage,
records in the cache storage, as a partial analysis result, the analysis result obtained by the partial operation sequence when a new analysis procedure containing the partial operation sequence is input, and
performs analysis, when a further new analysis procedure containing the partial operation sequence is input, by using the partial analysis result recorded in the cache storage.

When analyzing time-series data such as traffic data and sensor data accumulated in a tree-structured structured overlay built from a large number of servers, the present invention records the analysis procedures (partial operation sequences) that are frequently used in past analysis procedures, together with their results. When a new analysis procedure is submitted, the analysis time is shortened by reusing the results of those past analysis procedures (partial operation sequences). In addition, because only the frequently used partial operation sequences and their results are recorded, the cache capacity can be reduced.

Accordingly, the present invention can provide a data cache method and a node device that can speed up data analysis while reducing the possibility that hard disk capacity is exhausted.

The data cache method according to the present invention further comprises an upper transfer step of transferring the partial operation sequence extracted in the partial operation sequence extraction step to an upper node device located above the node device.

The data cache method according to the present invention further comprises:
an inter-node common operation sequence extraction step in which the upper node device extracts, at fixed intervals, an inter-node common operation sequence that is partially common to the plurality of partial operation sequences transferred from a plurality of the node devices in the upper transfer step; and
a lower transfer step of transferring the inter-node common operation sequence from the upper node device to the node device, so that the node device records the inter-node common operation sequence extracted in the inter-node common operation sequence extraction step as the partial operation sequence.

The controller of the node device according to the present invention transfers the extracted partial operation sequence to an upper node device located above its own device.

The controller of the node device according to the present invention further receives an inter-node common operation sequence, which the upper node device extracts at fixed intervals and which is partially common to a plurality of the partial operation sequences, and stores the inter-node common operation sequence in the cache storage as the partial operation sequence.

The present invention takes into account proximity, a characteristic of time-series data (data from nearby times are strongly correlated). By sharing frequently used analysis procedures (partial operation sequences) among servers that store time-series data from nearby times, it improves the reusability of one another's analysis procedures (partial operation sequences).

The program according to the present invention is a program for causing a computer to execute the data cache method. The node device can also be realized by a computer and this program, and the program can be recorded on a recording medium or provided over a network.

The present invention can provide a data cache method, a node device, and a program that can speed up data analysis while reducing the possibility that hard disk capacity is exhausted.

FIG. 1 is a diagram illustrating the configuration of the node device according to the present invention.
FIG. 2 is a diagram illustrating an example of a structured overlay having a tree structure.
FIG. 3 is a diagram illustrating the data cache method according to the present invention.
FIG. 4 is a diagram illustrating the data cache method according to the present invention.
FIG. 5 is a diagram illustrating the data cache method according to the present invention.
FIG. 6 is a diagram illustrating the data cache method according to the present invention.

Embodiments of the present invention will be described with reference to the accompanying drawings. The embodiments described below are examples of the present invention, and the present invention is not limited to them. In this specification and the drawings, components with the same reference numerals are identical to each other.

FIG. 1 is a diagram illustrating the configuration of the node device according to the present embodiment. The node device is a node device constituting a tree-structured structured overlay system, and comprises:
a cache storage 30;
a data storage 40 that stores time-series data;
a routing control module 10 that searches for specific time-series data by traversing the tree structure on the structured overlay system; and
a cache storage controller 20 that
records in the cache storage 30 an input analysis procedure that analyzes time-series data accumulated by the structured overlay system, together with the analysis result obtained by that analysis procedure,
extracts, at predetermined intervals, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the cache storage 30, and records the partial operation sequence in the cache storage 30,
records in the cache storage 30, as a partial analysis result, the analysis result obtained by the partial operation sequence when a new analysis procedure containing the partial operation sequence is input, and
performs analysis, when a further new analysis procedure containing the partial operation sequence is input, by using the partial analysis result recorded in the cache storage 30.

The data cache method performed by the node device is as follows. The data cache method is a data cache method for a tree-structured structured overlay system comprising a plurality of node devices, and comprises:
an entire analysis recording step of recording an analysis procedure that is input to the node device and analyzes time-series data accumulated by the structured overlay system, together with the analysis result obtained by that analysis procedure;
a partial operation sequence extraction step of extracting, at predetermined intervals, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the entire analysis recording step, and recording the partial operation sequence;
a partial analysis result recording step of recording, when a new analysis procedure containing the partial operation sequence recorded in the partial operation sequence extraction step is input to the node device, the analysis result obtained by the partial operation sequence as a partial analysis result; and
an analysis result reuse step of performing analysis, when a further new analysis procedure containing the partial operation sequence is input to the node device after the partial analysis result recording step, by using the partial analysis result recorded in the partial analysis result recording step.

The routing control module 10 is the routing control module of a structured overlay having a tree structure. It has a function of searching the structured overlay for data or the like having a specific ID by traversing the parent-child relationships of the tree. The leaf nodes of the tree have IDs, and the ID of node i (0 ≤ i < ID_Space_Size) is denoted X_i. The ID space is joined at 0 and ID_Space_Size and thus forms a ring. The node nearest to X_i in the clockwise direction is called the Successor of node i (ID: suc_i), and the range for which node i is responsible is [X_i, suc_i). Here, [A, B) means greater than or equal to A and less than B, and (A, B] means greater than A and less than or equal to B.

Each internal node of the tree records the ranges for which the nodes beneath it are responsible. FIG. 2 shows an example of such a tree-structured overlay. In FIG. 2, the devices labeled A to G are node devices. Because this embodiment assumes that time-series data is managed by the structured overlay, the ID space is [0, 86400), and the time-series data is stored in the data storage 40 with time (in seconds) as the key.
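The responsible-range rule can be illustrated with a minimal sketch. It assumes a flat, sorted list of leaf IDs rather than the tree traversal that the routing control module 10 performs, and the name `responsible_node` and the sample IDs are hypothetical.

```python
from bisect import bisect_right

ID_SPACE_SIZE = 86400  # seconds in one day, as in the embodiment


def responsible_node(node_ids, key):
    """Return the ID X_i of the leaf whose range [X_i, suc_i) contains key."""
    ids = sorted(node_ids)
    key %= ID_SPACE_SIZE
    # Index of the last ID <= key. An index of -1 selects the highest ID,
    # whose range wraps around the ring past 0.
    idx = bisect_right(ids, key) - 1
    return ids[idx]


# Hypothetical leaf IDs: four nodes, each covering six hours of the day.
leaf_ids = [0, 21600, 43200, 64800]
owner = responsible_node(leaf_ids, 45000)  # time-series key at 12:30:00
```

A real deployment would reach the same leaf by descending from an internal node that records the ranges of the nodes beneath it; the flat list only keeps the sketch short.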

The cache storage 30 stores analysis procedures and analysis results for data within the node's own responsible range, and the cache storage controller 20 decides whether to store an analysis procedure and its analysis result in the cache storage 30. Because the proposed method is realized by the cache storage 30 and the cache storage controller 20, these are described in detail below.

The cache storage controller 20 comprises a partial operation sequence extraction unit 21, an analysis procedure execution unit 22, and an information sharing unit 23. Consider executing the analysis procedure {A-B-C-D} on data in the time range (p, q], where p ≠ q. Here, A, B, and so on denote operations on data, such as "count a certain feature," and {X-Y-Z} denotes applying the operations X, Y, and Z in that order. The analysis procedure is executed by the cache storage controller 20 of the node whose responsible range contains (p, q].

1. Analysis Procedure Execution Unit
FIG. 3 is a diagram illustrating the operation of the analysis procedure execution unit 22.
First, when the analysis procedure {A-B-C-D} is input, the analysis procedure execution unit 22 determines whether the same analysis procedure has been input in the past (S01). If it has not, the unit reads the data targeted by the analysis procedure from the data storage 40 (S02), performs the analysis, and stores the result in the cache storage 30 as a cache entry of the form [cache entry format: image not reproduced] (S03). Steps S01, S02, and S03 constitute the entire analysis recording step.

Thereafter, the analysis procedure execution unit 22 checks the cache storage 30 for each analysis procedure input to the node device (S04). If the analysis procedure is stored in the cache storage 30, the unit reads the cached entry (S05) and thereby skips the work that duplicates a past analysis (that is, performing the data analysis and obtaining the analysis result).
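The check-then-reuse flow of S01 to S05 can be sketched as follows. This is only an illustration under stated assumptions: the cache is a plain dictionary keyed by the pair (time range, procedure), which is a guess because the actual entry format is not reproduced in the text, and `run_procedure`, `load_data`, and `operations` are hypothetical names.

```python
def run_procedure(time_range, procedure, cache, load_data, operations):
    """Run an analysis procedure, reusing a cached result when one exists.

    time_range: tuple (p, q) standing for the range (p, q]
    procedure:  sequence of operation names, e.g. ("A", "B", "C", "D")
    cache:      dict standing in for the cache storage 30 (assumed layout)
    load_data:  callable that reads a time range from the data storage 40
    operations: dict mapping an operation name to a function on the data
    """
    key = (time_range, tuple(procedure))
    if key in cache:                   # S01 / S04: seen this procedure before?
        return cache[key]              # S05: reuse and skip the analysis
    data = load_data(time_range)       # S02: read the targeted data
    for name in procedure:
        data = operations[name](data)  # apply the operations in order
    cache[key] = data                  # S03: record procedure and result
    return data
```

Calling the function twice with the same range and procedure reads the data storage only once; the second call returns the stored result directly.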

In addition, when an analysis procedure containing a partial operation sequence that the partial operation sequence extraction unit 21 (described later) has judged to be frequent is input, the analysis procedure execution unit 22 writes the result of the analysis by that partial operation sequence (the partial analysis result) to the cache storage 30.

2. Partial Operation Sequence Extraction Unit
FIG. 4 is a diagram illustrating the operation of the partial operation sequence extraction unit 21.
If analysis procedures and analysis results keep being stored, the cache may exhaust the hard disk capacity. In this embodiment, therefore, the partial operation sequence extraction unit 21 extracts, at predetermined intervals, the frequently executed procedures from the stored analysis procedures and analysis results as partial operation sequences (S06) and writes them to the cache storage 30 (S07). Steps S06 and S07 constitute the partial operation sequence extraction step.

Specifically, partial procedures that appear at least k (> 0) times and whose procedure length is at least l are extracted together. The procedure length is the number of data operations in an analysis procedure; for example, the procedure length of the analysis procedure {A-B-C} is 3. FIG. 5 is a diagram illustrating the partial operation sequence extraction method performed by the partial operation sequence extraction unit 21.
Step 1:
Build a trie from all analysis procedures so that shared operations are merged, and record how many times each operation has been performed.
Step 2:
From the trie, obtain the partial operation sequences that appear at least k times and whose procedure length is at least l.
In the example of FIG. 5, the partial operation sequences {A-B} and {C-A} are extracted.
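Steps 1 and 2 can be sketched as a counting trie. The sketch below is one possible reading, not the patented implementation: it assumes that partial operation sequences are prefixes of analysis procedures and that only the longest qualifying prefix on each branch is kept. The names `TrieNode`, `build_trie`, and `frequent_prefixes`, as well as the sample procedures, are hypothetical and are not taken from FIG. 5.

```python
class TrieNode:
    def __init__(self):
        self.count = 0      # how many procedures pass through this node
        self.children = {}  # operation name -> TrieNode


def build_trie(procedures):
    """Step 1: merge shared leading operations and count their occurrences."""
    root = TrieNode()
    for procedure in procedures:
        node = root
        for op in procedure:
            node = node.children.setdefault(op, TrieNode())
            node.count += 1
    return root


def frequent_prefixes(root, k, l):
    """Step 2: longest prefixes seen at least k times with length >= l."""
    found = []

    def walk(node, prefix):
        frequent = [(op, child) for op, child in sorted(node.children.items())
                    if child.count >= k]
        for op, child in frequent:
            walk(child, prefix + (op,))
        # Keep a prefix only when it cannot be extended while staying frequent.
        if not frequent and prefix and len(prefix) >= l:
            found.append(prefix)

    walk(root, ())
    return found


# Hypothetical recorded procedures (not the ones shown in FIG. 5).
recorded = [("A", "B", "C"), ("A", "B", "D"),
            ("C", "A", "D"), ("C", "A", "B"), ("D",)]
extracted = frequent_prefixes(build_trie(recorded), k=2, l=2)
```

With these inputs and k = 2, l = 2, the prefixes {A-B} and {C-A} qualify, while the single occurrence of {D} does not.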

After storing the extracted partial operation sequences in the cache storage 30, the partial operation sequence extraction unit 21 deletes the analysis procedures and analysis results that the analysis procedure execution unit 22 stored in the cache storage 30 during the predetermined period.

When a new analysis procedure is input, the analysis procedure execution unit 22 checks the cache storage 30 for that analysis procedure (S08). If the analysis procedure contains an extracted, frequently executed partial operation sequence (S09), the unit saves the partial analysis result of that partial operation sequence in the cache storage 30 (S10). Steps S08 to S10 constitute the partial analysis result recording step.

For example, when an analysis procedure for a range (r, s] is input and the procedure contains {A-B} or {C-A} starting from its head, the analysis procedure execution unit 22 saves the corresponding analysis result (partial analysis result) in the cache storage 30. For instance, when the analysis procedure {C-A-D} is input, the analysis procedure execution unit 22 saves the cache entry [cache entry: image not reproduced] in the cache storage 30.

With this data cache method, frequently used analysis results are reused and analysis is accelerated (analysis result reuse step). Moreover, because frequently used subsequences of analysis procedures are stored rather than all analysis procedures and their results, the cache reuse rate improves and the possibility of exhausting hard disk capacity is reduced.
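The partial analysis result recording step and the analysis result reuse step can be sketched together. The sketch assumes that a partial result is keyed by the exact time range and the matched leading sequence, which the text does not specify, and the names `run_with_partial_cache`, `load_data`, and `operations` are hypothetical.

```python
def run_with_partial_cache(time_range, procedure, frequent, partial_cache,
                           load_data, operations):
    """Run a procedure, saving or reusing the result of a frequent prefix.

    frequent:      iterable of frequent partial operation sequences
    partial_cache: dict standing in for the cache storage 30 (assumed layout)
    """
    proc = tuple(procedure)
    candidates = [tuple(f) for f in frequent]
    # S08/S09: longest frequent sequence that the procedure starts with.
    prefix = max((f for f in candidates if f and proc[:len(f)] == f),
                 key=len, default=())
    key = (time_range, prefix)
    if prefix and key in partial_cache:
        data = partial_cache[key]          # reuse the partial analysis result
    else:
        data = load_data(time_range)
        for name in prefix:
            data = operations[name](data)
        if prefix:
            partial_cache[key] = data      # S10: record the partial result
    for name in proc[len(prefix):]:        # finish the remaining operations
        data = operations[name](data)
    return data
```

If {C-A} is frequent, running {C-A-D} stores the result of {C-A}, and a later {C-A-B} on the same range starts from that stored result instead of reading the data storage again.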

3. Information Sharing Unit
FIGS. 4 and 6 are diagrams illustrating the operation of the information sharing unit 23.
The cache storage controller 20 transfers the extracted partial operation sequences to the upper node device located above its own device, receives the inter-node common operation sequence that the upper node device extracts at fixed intervals and that is partially common to a plurality of the partial operation sequences, and stores the inter-node common operation sequence in the cache storage as a partial operation sequence.

The data cache method performed by the node device is as follows. The data cache method further comprises:
an upper transfer step of transferring the partial operation sequence extracted in the partial operation sequence extraction step to an upper node device located above the node device;
an inter-node common operation sequence extraction step in which the upper node device extracts, at fixed intervals, an inter-node common operation sequence that is partially common to the plurality of partial operation sequences transferred from a plurality of the node devices in the upper transfer step; and
a lower transfer step of transferring the inter-node common operation sequence from the upper node device to the node device, so that the node device records the inter-node common operation sequence extracted in the inter-node common operation sequence extraction step as the partial operation sequence.

Time-series data is generally considered to show particular correlations between nearby times (Non-Patent Document 8), and similar analyses are likely to be performed on similar data. The information sharing unit 23 therefore shares the frequent partial operation sequences found by each node device with other node devices.

FIGS. 4 and 6 show an example of the sharing method.
The partial operation sequence extraction unit 21 passes the extracted frequent partial operation sequences to the information sharing unit 23 (S11). The information sharing unit 23 of each node device sends these partial operation sequences to the information sharing unit 23 of the upper node device (parent node) located above its own device (S12). Steps S11 and S12 constitute the upper transfer step.

The parent node stores the received partial operation sequences in its own cache storage 30 (S13). At fixed intervals, the parent node's partial operation sequence extraction unit 21 reads these partial operation sequences and identifies the frequent ones among them (S14, S15). Steps S14 and S15 constitute the inter-node common operation sequence extraction step. For example, the partial operation sequence extraction unit 21 finds the partial operation sequences that satisfy the parent node's own l and k conditions.

The parent node's partial operation sequence extraction unit 21 then passes the partial operation sequences it has found to the parent node's information sharing unit 23 (S16), and the information sharing unit 23 shares them among the child nodes beneath it (S17). The information sharing unit 23 of each child node also keeps the partial operation sequences received from the parent node in its cache storage 30. Steps S16 and S17 constitute the lower transfer step.
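The upward and downward exchange of S11 to S17 can be sketched as follows. This is a simplified sketch, not the described extraction unit: it counts how many child nodes reported exactly the same sequence instead of building a trie, and the names `common_sequences` and `share_down` and the node labels are hypothetical.

```python
from collections import Counter


def common_sequences(reports, k, l):
    """Parent side (S13 to S15): keep sequences reported by >= k children.

    reports: dict mapping a child node label to the sequences it sent up
    k, l:    the parent's own frequency and length thresholds
    """
    counts = Counter()
    for sequences in reports.values():
        counts.update({tuple(s) for s in sequences})  # one vote per child
    return sorted(s for s, n in counts.items() if n >= k and len(s) >= l)


def share_down(child_sequences, shared):
    """Parent side (S16, S17): add the common sequences to every child."""
    for known in child_sequences.values():
        known.update(shared)


# Hypothetical children and the sequences each one extracted locally.
children = {
    "D": {("A", "B"), ("C", "A")},
    "E": {("C", "A"), ("K", "A")},
    "F": {("K", "A")},
}
shared = common_sequences(children, k=2, l=2)
share_down(children, shared)
```

After the exchange, child D holds {A-B}, {C-A}, and {K-A}: the two sequences it extracted itself plus one it learned from its siblings through the parent.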

The child node holds both the partial operation sequences it extracted itself and those transferred from the parent node. When an analysis procedure for a range (v, w] held by the child node is input and the procedure contains {A-B}, {C-A}, or {K-A} starting from its head, the child node stores the corresponding partial analysis result in the cache storage 30. For example, when the analysis procedure {C-A-B} is input,

[expression not reproduced in this text]

is stored in the cache storage 30, and when an analysis procedure containing that partial operation sequence is input later, the stored analysis result is used.

In this case, if an analysis procedure such as {C-K-A} is input, which contains {K-A} but does not match it from the head, the analysis procedure execution unit 22 does not store a partial analysis result for the partial operation sequence {K-A}.
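The head-matching rule and the reuse of a stored partial analysis result can be illustrated with a short sketch. The function names `matching_prefix` and `run_procedure`, the modelling of operations A, B, C, and K as plain Python functions, and the cache key (range, partial operation sequence) are all assumptions made for illustration; the embodiment does not prescribe them.

```python
def matching_prefix(procedure, known_sequences):
    """Return the longest known partial operation sequence that matches
    `procedure` from its head, or None if none does."""
    best = None
    for seq in known_sequences:
        if tuple(procedure[:len(seq)]) == tuple(seq):
            if best is None or len(seq) > len(best):
                best = tuple(seq)
    return best


def run_procedure(procedure, data_range, data, operations, known_sequences, cache):
    """Run `procedure` on `data`, reusing or storing a partial result.

    `cache` maps (data_range, prefix) to the result obtained after
    applying the operations in `prefix`.
    """
    prefix = matching_prefix(procedure, known_sequences)
    start, result = 0, data
    if prefix is not None and (data_range, prefix) in cache:
        # Reuse: skip the operations already covered by the cached prefix.
        result, start = cache[(data_range, prefix)], len(prefix)
    for i in range(start, len(procedure)):
        result = operations[procedure[i]](result)
        if prefix is not None and i + 1 == len(prefix):
            cache[(data_range, prefix)] = result  # store the partial result
    return result


# Hypothetical operations; the patent text does not define what A, B, C, K do.
operations = {
    "A": lambda xs: [x for x in xs if x > 0],
    "B": lambda xs: sum(xs),
    "C": lambda xs: [x * 2 for x in xs],
    "K": lambda xs: sorted(xs),
}
known = [("A", "B"), ("C", "A"), ("K", "A")]
cache = {}
data_range = ("v", "w")  # placeholder for the range (v, w]
data = [3, -1, 2]

# {C-A} matches from the head, so its partial result is cached.
total = run_procedure(("C", "A", "B"), data_range, data, operations, known, cache)
# {K-A} appears, but not at the head, so nothing new is cached.
skipped = run_procedure(("C", "K", "A"), data_range, data, operations, known, cache)
```

After both calls the cache holds a single entry, for {C-A} on the range (v, w]. A later procedure beginning with {C-A} on the same range would start from that entry instead of re-running C and A.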

The partial operation sequences extracted by each parent node may in turn be shared with a parent node at a still higher level.

The entire analysis recording step of this data cache method corresponds to S01 to S03 in FIG. 3, the partial operation sequence extraction step corresponds to S06 to S07 in FIG. 4, and the partial analysis result recording step corresponds to S08 to S10 in FIG. 4.
The upper transfer step corresponds to S11 to S12 in FIG. 4, the inter-node common operation sequence extraction step corresponds to S14 to S15 in FIG. 4, and the lower transfer step corresponds to S16 to S17 in FIG. 4.

4. Program
This program causes a computer to execute the data cache method described above. The data cache method can be realized by having a plurality of computers connected via a network or the like execute this program.

(Effects)
The present invention relates to a technique for shortening analysis time when analyzing time-series data, such as traffic data and sensor data, accumulated in a tree-structured structured overlay built from a large number of servers. Frequently used analysis procedures and their results are recorded from past analysis procedures, and when a new analysis procedure is input, the results of similar past analysis procedures are reused. In addition, taking into account the proximity that characterizes time-series data (data from nearby times are strongly correlated), frequently used analysis procedures are shared among servers that store time-series data from nearby times, which improves the reusability of one another's analysis procedures.

10: Path control module
20: Cache storage controller
21: Partial operation sequence extraction unit
22: Analysis procedure execution unit
23: Information sharing unit
30: Cache storage
40: Data storage

Claims

A data cache method for a structured overlay system having a tree structure and comprising a plurality of node devices, the method comprising:
an entire analysis recording step of recording an analysis procedure, input to the node device, for analyzing time-series data stored in the structured overlay system, together with an analysis result obtained by the analysis procedure;
a partial operation sequence extraction step of extracting, every predetermined period, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the entire analysis recording step, and recording the partial operation sequence;
a partial analysis result recording step of recording, as a partial analysis result, the analysis result obtained by the partial operation sequence when a new analysis procedure including the partial operation sequence recorded in the partial operation sequence extraction step is input to the node device; and
an analysis result reuse step of performing analysis using the partial analysis result recorded in the partial analysis result recording step when a further new analysis procedure including the partial operation sequence is input to the node device after the partial analysis result recording step.

The data cache method according to claim 1, further comprising an upper transfer step of transferring the partial operation sequence extracted in the partial operation sequence extraction step to an upper node device located above the node device.

The data cache method according to claim 2, further comprising:
an inter-node common operation sequence extraction step of extracting, in the upper node device and every predetermined period, an inter-node common operation sequence that is partially common to the plurality of partial operation sequences transferred from a plurality of the node devices in the upper transfer step; and
a lower transfer step of transferring the inter-node common operation sequence from the upper node device to the node devices so that the node devices record, as the partial operation sequence, the inter-node common operation sequence extracted in the inter-node common operation sequence extraction step.

A node device constituting a structured overlay system having a tree structure, the node device comprising:
a cache storage;
a data storage that stores time-series data;
a path control module that searches for specific time-series data by following the tree structure of the structured overlay system; and
a controller that records, in the cache storage, an analysis procedure for analyzing time-series data accumulated in the structured overlay system and an analysis result obtained by the analysis procedure; extracts, every predetermined period, a partial operation sequence that is partially common to a plurality of the analysis procedures recorded in the cache storage, and records the partial operation sequence in the cache storage; records, in the cache storage as a partial analysis result, the analysis result obtained by the partial operation sequence when a new analysis procedure including the partial operation sequence is input; and performs analysis using the partial analysis result recorded in the cache storage when a further new analysis procedure including the partial operation sequence is input.

The node device according to claim 4, wherein the controller transfers the extracted partial operation sequence to an upper node device located above its own device.

The node device according to claim 5, wherein the controller receives, from the upper node device, an inter-node common operation sequence that the upper node device has extracted every predetermined period as being partially common to a plurality of the partial operation sequences, and stores the inter-node common operation sequence in the cache storage as the partial operation sequence.

A program for causing a computer to execute the data cache method according to any one of claims 1 to 3.