JP2009009304A

JP2009009304A - Processing method for stream data, and stream data processing system

Info

Publication number: JP2009009304A
Application number: JP2007169256A
Authority: JP
Inventors: Tsuneyuki Imaki; 常之今木; Shinji Fujiwara; 真二藤原; Itaru Nishizawa; 格西澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2007-06-27
Filing date: 2007-06-27
Publication date: 2009-01-15
Anticipated expiration: 2027-06-27
Also published as: JP4990696B2

Abstract

PROBLEM TO BE SOLVED: To divide the tuples on a stream into partial tuple sequences, one per phenomenon, in stream data processing without extra labor or cost.

SOLUTION: In this stream data processing system 110, the tuple sequence on a stream is divided into partial tuple sequences based on the temporal proximity of the time stamps attached to the stream tuples. At each point in time, the set of tuples belonging to the latest partial tuple sequence is extracted as the analysis target for that time and is processed by the data analysis part 130 of the stream data processing system 110. As a result, the set of observation data corresponding to each phenomenon can be extracted without excess or deficiency.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique for processing, in real time, time-series data that is continuously generated at a high rate. In particular, it relates to a technique that is effective, in stream data processing, for extracting the data set corresponding to each individual phenomenon from a series of observation data generated continuously by monitoring the occurrence of a phenomenon, based on proximity relations on the time axis.

With the advent of the ubiquitous information society, time-series data that is continuously generated at a high rate is increasing. Examples include RFID (Radio Frequency IDentification) read data, sensor network observation data, network monitoring, system monitoring, stock price monitoring, and traffic information. New services that analyze such time-series data instantly and feed the results back to users are in demand.

As a data processing method for building such services, a technique called stream data processing has been proposed at database-related academic conferences. With stream data processing, the relational-algebra-based data processing model used in conventional relational databases can also be applied to the analysis of time-series data, and users can write data processing definitions in a declarative query language similar to SQL (Structured Query Language). This greatly streamlines the development of such services. The principles and implementation of stream data processing are disclosed in Non-Patent Document 1.

In the following, as terms for time-series data, a data series that continues without end is called a stream, and each data item on a stream is called a stream tuple or simply a tuple. Every stream tuple carries a time stamp, and the stream tuples on a stream are arranged in ascending order of time stamp.

With stream data processing, the various operations that relational algebra defines on data sets can be applied to streams: joins, set union, difference, and intersection, aggregation, grouping, and duplicate elimination.

However, these operations are defined on data sets with a finite number of elements. To apply them to a stream, on which tuples keep arriving (that is, the set keeps growing without bound), the set of tuples to be analyzed must be restricted at each point in time at which processing is performed.

One proposed method for restricting the set of tuples to be analyzed is the sliding window method, which continuously cuts out the tuple set to be analyzed at each point in time based on the order of the tuples on the time axis.

This method is disclosed in Non-Patent Document 1. Existing sliding windows fall into two types: time-based windows and count-based windows.

A time-based window limits the period during which each tuple is an analysis target to the period that starts at the tuple's time stamp and ends a fixed length of time later. The user can set this length to any value. A count-based window limits the tuples that are analysis targets at the same time to the latest fixed number of tuples. The user can set this number to any value.
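As a rough illustration of the two window types just described, the following Python sketch selects the analysis target at a given time. The function names, the `(timestamp, value)` tuple layout, and the treatment of the period end as exclusive are assumptions made for this example and are not taken from the cited document.

```python
def time_window(stream, now, span):
    """Time-based window: a tuple is analyzed from its time stamp until
    `span` time units later (end treated as exclusive, an assumption)."""
    return [(ts, v) for ts, v in stream if ts <= now < ts + span]


def count_window(stream, now, n):
    """Count-based window: only the latest `n` tuples that have arrived
    by `now` are analyzed. `stream` must be in ascending time-stamp order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    arrived = [(ts, v) for ts, v in stream if ts <= now]
    return arrived[-n:]


stream = [(0, "a"), (1, "b"), (5, "c")]
print(time_window(stream, now=2, span=3))   # [(0, 'a'), (1, 'b')]
print(count_window(stream, now=5, n=2))     # [(1, 'b'), (5, 'c')]
```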

Another proposed method for restricting the set of tuples to be analyzed is to insert into the stream a delimiter signal that marks the end of the set. This method is disclosed in Non-Patent Document 2.

As described above, by providing a function for limiting the set to be analyzed within time-series data, stream data processing makes it possible to flexibly define processes that are important in time-series analysis, such as computing the amount of change in data, computing moving averages, and detecting the order in which data occurs.

Non-Patent Document 1: B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, "Models and issues in data stream systems", In Proc. of PODS 2002, pp. 1-16 (2002).
Non-Patent Document 2: T. Johnson, S. Muthukrishnan, V. Shkapenyuk and O. Spatscheck, "A Heartbeat Mechanism and its Application in Gigascope", In Proc. of VLDB 2005, pp. 1079-1088 (2005).

As the technology of ubiquitous information devices develops, services such as more efficient goods inspection through bulk RFID reading (reading many tags at once), environmental monitoring with sensor networks, and more advanced telematics based on more capable in-vehicle sensors will become feasible.

Realizing these services requires data processing technology that analyzes phenomena occurring in the real world from the observation data generated by various devices. Stream data processing is considered well suited to such services.

To analyze phenomena from device-generated observation data, it is essential to extract, from the time-ordered sequence of observation data, the partial data sequence corresponding to each phenomenon.

The need for this processing is illustrated with the incoming-goods inspection example of FIG. 4, which uses bulk RFID reading. When cardboard boxes 1303 and 1304 pass through reading gate 1302, data such as that in table 1310 is generated, representing information on each product packed in the boxes.

Each of rows 1311 to 1317 is the data for one product and consists of an id column holding the product ID, a weight column holding the product weight, and a date column holding the manufacturing date. Reading cardboard box 1303 generates data 1311 to 1313, and reading cardboard box 1304 generates data 1314 to 1317. The stream data processing system 110 receives this data, calculates the number of packed products, the total weight, and the manufacturing period for each box, and reports them to an inspector or other user.

The timing of the inspector's work and the timing at which read data is generated are shown in timing window 1320 below table 1310. At 9:15'02 the inspector passes cardboard box 1303 (box 1) through reading gate 1302, and the data 1321 for each product packed in that box is generated between 9:15'02 and 9:15'04.

At 9:15'06 the inspector checks the result that the stream data processing system 110 has aggregated from this data. The same applies to box 2 (1304). In this process, the stream data processing system must produce an aggregate result for each box separately.

If the reading of each box is regarded as one phenomenon, stream data processing must extract, from the time series of product data, the partial data sequence corresponding to each box (that is, to each phenomenon).

As another example, FIG. 7 shows environmental monitoring with a sensor network. Sensors 1401 (IDs S01 to S23) scattered through a space measure physical quantities such as sound, light intensity, temperature, humidity, atmospheric pressure, wind pressure, wind speed, and vibration, and phenomena such as earthquakes, fires, explosions, and accidents are analyzed based on the measurement data.

A sensor network can be applied to this kind of phenomenon analysis regardless of the physical quantities observed, the types of phenomena, or the combination of the two. Here, phenomena 1402, 1403, and 1404 occur at 10:25, 10:28, and 10:32, respectively, and the physical quantities caused by each phenomenon are detected and measured by the groups of sensors enclosed by lines 1405, 1406, and 1407, respectively.

Table 1410 shows the measurement data. Each of rows 1411 to 1422 is one measurement and consists of an id column holding the ID of the measuring sensor and a value column holding the measured value. Phenomenon 1402 generates data 1411 to 1414, phenomenon 1403 generates data 1415 to 1417, and phenomenon 1404 generates data 1418 to 1422. The closer a sensor is to the location of the phenomenon, the larger its measured value.

Window 1430 shows the timing at which the phenomena occur and the timing at which the sensors generate measurement data. To analyze each phenomenon individually from such a data series, stream data processing must extract, from the time series of measurement data, the partial data sequence corresponding to each phenomenon.

As described above, applying stream data processing to phenomenon analysis based on observation data generated by ubiquitous devices requires dividing the tuples on a stream by phenomenon.

The example of FIG. 4 shows that this processing cannot be achieved with the existing sliding window methods disclosed in Non-Patent Document 1, namely the time-based window and the count-based window.

Time-based window 1330 shows the case where a time-based window is applied to the stream shown in timing window 1320. Window 1331 represents the period during which each item of product data 1321 is an analysis target as a line segment extending from a black dot to the right (toward the future on the time axis).

All line segments have the same length as the analysis target period 1333 specified by the user. Window 1332 shows the total product count calculated from the stream tuples having the analysis target periods shown in window 1331. When the inspector checks the result at 9:15'06, the product count of cardboard box 1303, which is 3, is reported correctly.

When the inspector checks the result at 9:15'18, however, the read data of cardboard box 1303 is still within its analysis target period, so the incorrect value 7 is reported instead of 4, the product count of cardboard box 1304.

Window 1340 shows the case where a time-based window with a shorter analysis target period is applied to the stream shown in timing window 1320. Reference numeral 1343 indicates the length of the analysis target period. Window 1341 shows the period during which each data item is an analysis target, and window 1342 shows the calculated total product count.

When the analysis target period is short, the results calculated from it also persist only briefly, so the inspector sees an incorrect value at both 9:15'06 and 9:15'18.

Because the inspector's working speed is not constant, these failures cannot be avoided completely whether the analysis target period is made long or short.

Count-based window 1350 shows the case where a count-based window with three simultaneously analyzed tuples is applied to the stream shown in timing window 1320. Window 1351 shows the period during which each data item is an analysis target, and window 1352 shows the calculated total product count. With a count-based window, only the latest three data items ever remain as analysis targets, so the total product count is 3 regardless of how many products a box contains, and the correct result is not obtained.
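The failure modes described for FIG. 4 can be reproduced with a small numerical sketch. Times are seconds after 9:15'00. The read times of box 1 and the two check times follow the text; the read times of box 2, the two window lengths (20 s and 3 s), and the exclusive window end are assumptions chosen only for illustration.

```python
# Seconds after 9:15'00. Box 1 times follow the text; box 2 times are assumed.
BOX1 = [2, 3, 4]
BOX2 = [13, 14, 15, 16]
STREAM = BOX1 + BOX2
CHECKS = {6: len(BOX1), 18: len(BOX2)}  # check time -> expected product count


def count_time_window(stream, now, span):
    """Number of tuples whose analysis period [ts, ts + span) covers `now`."""
    return sum(1 for ts in stream if ts <= now < ts + span)


def count_count_window(stream, now, n):
    """Number of tuples kept by a count-based window of size `n` at `now`."""
    return min(n, sum(1 for ts in stream if ts <= now))


for now, expected in CHECKS.items():
    long_w = count_time_window(STREAM, now, span=20)
    short_w = count_time_window(STREAM, now, span=3)
    last3 = count_count_window(STREAM, now, n=3)
    print(f"t={now}s expected={expected} long={long_w} short={short_w} last3={last3}")
```

Under these assumptions the long window reports 3 and then 7, the short window reports 1 at both check times, and the count-based window reports 3 at both, which matches the behavior described for windows 1332, 1342, and 1352.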

Likewise, in the sensor network example of FIG. 7, the interval between phenomena and the number of measurements per phenomenon are not fixed, so neither the time-based window nor the count-based window can divide the measurement data by phenomenon.

The method disclosed in Non-Patent Document 2, which inserts delimiter signals into the stream, also has problems when dividing time-series data by phenomenon, as the examples of FIG. 4 and FIG. 7 show.

When the delimiter method is applied to the example of FIG. 4, the RFID read data can be divided by box by placing a partition bar between boxes and inserting a delimiter signal into the stream when the partition bar is read.

Alternatively, the boundary between boxes can be detected by adding an infrared sensor to reading gate 1302. The inspector can also mark the boundary explicitly, for example by pressing a read-complete button each time a box has passed through reading gate 1302.

All of these methods, however, require extra labor and cost: placing partition bars, adding an infrared sensor, or operating a read-complete button.

In the example of FIG. 7, inserting into the stream a delimiter signal that marks the end of the measurement data sequence for one phenomenon requires determining, when a phenomenon occurs, that every sensor that detected it has finished generating its measurement data.

In a sensor network, however, the distributed sensors generate measurement data independently of one another, which makes such a determination difficult to implement.

In summary, applying stream data processing to phenomenon analysis based on observation data generated by ubiquitous devices requires dividing the tuples on a stream by phenomenon. The conventional time-based and count-based windows cannot do this, and the delimiter insertion method imposes extra labor on the user, incurs cost, or is in some cases difficult to implement at all.

An object of the present invention is to provide a technique that, in stream data processing, divides the tuples on a stream into partial tuple sequences, one per phenomenon, without requiring labor or cost from the user.

The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

A representative one of the inventions disclosed in this application is briefly outlined as follows.

An outline of the present invention is given below.

The present invention divides the tuple sequence on a stream into partial tuple sequences based on the temporal proximity of the time stamps attached to the stream tuples, and at each point in time passes the set of tuples belonging to the latest partial tuple sequence to the data analysis unit of the stream data processing system as the analysis target for that time.

In the following, such a partial tuple sequence is called a data cluster, and the condition on tuple time stamps that is consulted when forming a data cluster is called a cluster configuration condition.

More specifically, the invention provides a stream data processing system whose data input port is equipped with an analysis target data cutout unit that performs the following processes.
(a) Extracting the cluster configuration condition from the user's data processing definition.
(b) When a new tuple is input to the system, checking its time stamp against the cluster configuration condition and determining whether the new tuple is added to the data cluster currently under analysis.
(c) If the new tuple is not added to the data cluster currently under analysis, notifying the data analysis unit to exclude all tuples belonging to that data cluster from the analysis target, defining a new data cluster as the analysis target, and adding the new tuple to the new data cluster.
(d) Passing the new tuple to the data analysis unit as an analysis target.

The determination based on the cluster configuration condition is one of the following two types.

(1) When a new tuple arrives, its time stamp is compared with the time stamp of the oldest tuple in the data cluster currently under analysis. If the difference between the two time stamps is within the time threshold specified by the user, the new tuple is added to that data cluster; otherwise, process (c) above is performed.

(2) When a new tuple arrives, its time stamp is compared with the time stamp of the latest tuple in the data cluster currently under analysis. If the difference between the two time stamps is within the time threshold specified by the user, the new tuple is added to that data cluster; otherwise, process (c) above is performed.
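The difference between the two determination types can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name and the list-of-lists representation are assumptions, and the mode labels "WITHIN" for type (1) and "INTERVAL" for type (2) are the names the description assigns to these conditions in the embodiment.

```python
def split_clusters(timestamps, threshold, mode):
    """Split ascending time stamps into data clusters.

    mode="WITHIN":   compare with the oldest tuple of the current cluster.
    mode="INTERVAL": compare with the latest tuple of the current cluster.
    """
    if mode not in ("WITHIN", "INTERVAL"):
        raise ValueError(f"unknown mode: {mode}")
    clusters = []
    for ts in timestamps:
        if clusters:
            current = clusters[-1]
            reference = current[0] if mode == "WITHIN" else current[-1]
            if ts - reference <= threshold:
                current.append(ts)   # joins the cluster under analysis
                continue
        clusters.append([ts])        # process (c): start a new data cluster
    return clusters


ts = [0, 2, 4, 6]
print(split_clusters(ts, 3, "WITHIN"))    # [[0, 2], [4, 6]]
print(split_clusters(ts, 3, "INTERVAL"))  # [[0, 2, 4, 6]]
```

With the same threshold, type (1) bounds the total span of a cluster, while type (2) lets a cluster keep growing as long as consecutive tuples stay close together.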

In this way, a group of data items whose occurrence times are close together in the time-series data is cut out as the analysis target of stream data processing.

The effects obtained by a representative one of the inventions disclosed in this application are briefly described as follows.

(1) In stream data processing, the set of observation data corresponding to each phenomenon can be extracted without excess or deficiency.

(2) Because the extraction relies only on the times at which data occurs, the labor and cost of generating data delimiter signals are eliminated.

(3) Furthermore, the extraction can be achieved even when the data sources are distributed across multiple locations and generating a data delimiter signal is therefore difficult.

Embodiments of the present invention are described in detail below with reference to the drawings. In all drawings used to describe the embodiments, the same members are in principle given the same reference numerals, and repeated descriptions of them are omitted.

FIG. 1 is a block diagram showing a configuration example of a stream data processing system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of data cluster cutout processing by the stream data processing system of FIG. 1.
FIG. 3 is an explanatory diagram of stream data processing by the stream data processing system of FIG. 1.
FIG. 4 is an explanatory diagram showing an example of incoming-goods inspection by bulk RFID reading using the stream data processing system of FIG. 1.
FIG. 5 is a flowchart showing another example of data cluster cutout processing by the stream data processing system of FIG. 1.
FIG. 6 is an explanatory diagram showing another example of stream data processing by the stream data processing system of FIG. 1.
FIG. 7 is an explanatory diagram showing an example of data processing by a sensor network according to an embodiment of the present invention.
FIG. 8 is a block diagram showing a configuration example of a stream data processing system that implements data cutout processing according to an embodiment.
FIG. 9 is a flowchart showing an example of data cluster cutout processing by the stream data processing system of FIG. 8.
FIG. 10 is an explanatory diagram of stream data processing by a sensor network using the stream data processing system of FIG. 8.
FIG. 11 is an explanatory diagram showing an example of network monitoring according to an embodiment of the present invention.
FIG. 12 is an explanatory diagram of data cutout processing using the network monitoring of FIG. 11.
FIG. 13 is a flowchart showing an example of data cutout processing by the network monitoring of FIG. 12.
FIG. 14 is a block diagram showing a configuration example of a stream data dividing system according to an embodiment of the present invention.
FIG. 15 is an explanatory diagram showing an example of stream data processing by the network monitoring of FIG. 12.

In this embodiment, as shown in FIG. 1, the stream data processing system 110 consists of a data processing definition registration user interface 111 that accepts a user-defined data processing definition 101, an analysis target data cutout unit 120, and a data analysis unit 130.

The analysis target data cutout unit 120 extracts the data cluster to be analyzed from the input stream 141 according to the cluster configuration condition contained in definition 101. The data analysis unit 130 performs analysis processing on that data cluster according to definition 101 and outputs the result streams 143 and 144 to clients or other recipients.

The data processing definition registration user interface 111 interprets the received definition 101, extracts the definition of the cluster configuration condition from it, and notifies the analysis target data cutout unit 120. It also builds a data analysis processing execution tree 131 according to the contents of definition 101 and places it on the analysis unit 130.

In this embodiment, the clause "[CLUSTER WITHIN 3 seconds]" in definition 101 is the definition of the cluster configuration condition. "WITHIN" means that the condition is of type (1) described above.

If the string in that position is "INTERVAL", the condition is of type (2) described above. In the following, (1) is called the WITHIN condition and (2) is called the INTERVAL condition. "3 seconds" means that the time threshold is 3 seconds. Any length of time can be specified as the time threshold.
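How a registration interface might pull the cluster configuration condition out of a definition string can be sketched as follows. Only the clause "[CLUSTER WITHIN 3 seconds]" appears in the description; the regular expression, the accepted unit names, and the function name `parse_cluster_clause` are assumptions made for this illustration.

```python
import re
from datetime import timedelta

# Only "[CLUSTER WITHIN 3 seconds]" appears in the text; the grammar and the
# unit table below are assumptions made for this sketch.
_CLAUSE = re.compile(r"\[\s*CLUSTER\s+(WITHIN|INTERVAL)\s+(\d+)\s*([A-Za-z]+)\s*\]")
_UNITS = {"seconds": "seconds", "minutes": "minutes", "hours": "hours"}


def parse_cluster_clause(definition):
    """Return (condition type, time threshold) found in a definition string."""
    match = _CLAUSE.search(definition)
    if match is None:
        raise ValueError("no cluster clause found")
    kind, amount, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    return kind, timedelta(**{_UNITS[unit]: int(amount)})


kind, threshold = parse_cluster_clause("... [CLUSTER WITHIN 3 seconds] ...")
print(kind, threshold.total_seconds())
```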

The configuration and operation of the analysis target data cutout unit 120 are described below with reference to FIG. 1 and FIG. 2. Note that FIG. 2 shows the operation of the analysis target data determination unit 121 when it implements the WITHIN condition.

The analysis target data cutout unit 120 first stores the time threshold notified by the data processing definition registration user interface 111 in the time threshold storage area 128. Then, in initialization process S501, it stores the time Time.MIN in the cluster time storage area 126 and -1 in the cluster number storage area 127.

Time.MIN is a time that is judged to be in the past when compared with any real time. When initialization S501 is complete, processing moves to S502, which waits for new input data.

When the analysis target data determination unit 121 receives new input data on the input stream 141 from outside the system 110, it stores the time stamp of that data in the input data time storage area 122. Then, in process S503, it uses the data time comparison unit 123 to compare the time stamp stored in area 122 with the cluster time stored in the cluster time storage area 126.

If the comparison shows that the time stamp of the new input data is later than the cluster time, then in process S504 the time obtained by adding the time threshold to that time stamp is stored in the cluster time storage area 126 as the updated cluster time.

さらに、処理Ｓ５０５に従って、その時点でクラスタ番号格納領域１２７に格納されているクラスタ番号を、解析対象外データ削除部１２５に通知する。続いて、処理Ｓ５０６に従って、クラスタ番号格納領域１２７に格納されたクラスタ番号を、インクリメントする。なお、初期化処理Ｓ５０１の後、最初に入力データを受信した際は、必ず、処理Ｓ５０４〜Ｓ５０６を実行することになる。これら処理の後、処理Ｓ５０７に移行する。 Further, according to the process S505, the cluster number currently stored in the cluster number storage area 127 is notified to the non-analysis target data deletion unit 125. Subsequently, in accordance with the processing S506, the cluster number stored in the cluster number storage area 127 is incremented. When input data is first received after the initialization process S501, the processes S504 to S506 are always executed. After these processes, the process proceeds to process S507.

On the other hand, if the comparison in process S503 shows that the time stamp of the new input data is at or before the cluster time, the flow proceeds directly to process S507. In process S507, the cluster number stored in the cluster number storage area 127 at that point is attached to the new input data, which is then passed to the analysis target data insertion unit 124. When process S507 ends, the flow returns to process S502 and waits for new input data.
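
The flow S501 to S507 described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class name `WithinClusterer`, the method `assign`, and the use of plain numbers of seconds as time stamps are assumptions made for the example, and `float("-inf")` merely stands in for Time.MIN.

```python
from dataclasses import dataclass

TIME_MIN = float("-inf")  # assumed stand-in for Time.MIN: earlier than any real time


@dataclass
class WithinClusterer:
    """Hypothetical model of determination unit 121 under the WITHIN condition."""
    threshold: float                 # time threshold in seconds (storage area 128)
    cluster_time: float = TIME_MIN   # cluster time (storage area 126), set in S501
    cluster_no: int = -1             # cluster number (storage area 127), set in S501

    def assign(self, timestamp):
        """Return (cluster number for this tuple, cluster number notified for deletion).

        The second element is None when no new cluster is started. On the very
        first tuple it is -1, a number that no buffered data carries.
        """
        expired = None
        if timestamp > self.cluster_time:                    # S503
            self.cluster_time = timestamp + self.threshold   # S504
            expired = self.cluster_no                        # S505
            self.cluster_no += 1                             # S506
        return self.cluster_no, expired                      # S507


# Time stamps are seconds past 9:15'00; only a (2) and d (12) come from the
# description, the others are assumed values inside each 3-second window.
clusterer = WithinClusterer(threshold=3)
results = [clusterer.assign(t) for t in [2, 3, 4, 12, 13, 14, 15]]
```

In this sketch the cluster time is fixed when a cluster starts and is not touched again until a later tuple falls outside it, which is what distinguishes the WITHIN condition from the INTERVAL condition.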

データ挿入部１２４は、該新規入力データを、データ解析部１３０が保持する解析対象データバッファ１３２に挿入する。一方、データ削除部１２５は、通知されたクラスタ番号を付与された全てのデータを、バッファ１３２から削除する。 The data insertion unit 124 inserts the new input data into the analysis target data buffer 132 held by the data analysis unit 130. On the other hand, the data deletion unit 125 deletes all data to which the notified cluster number is assigned from the buffer 132.

Each time data is inserted into or deleted from the buffer 132, the data analysis unit 130 runs the analysis defined by the data analysis processing execution tree 131 over the set of data 142 present in the buffer 132 and sends the result to the outside. A method for realizing the data analysis unit 130 is disclosed, for example, in Non-Patent Document 1.

The following uses FIG. 3 to show the result of applying the stream data processing method realized by the configuration of FIG. 1 and the operation of FIG. 2 to the incoming goods inspection by bulk RFID reading shown in FIG. 4.

FIG. 3 shows the result of applying this processing method when the stream data processing system 110 receives the data shown in table 1310 (FIG. 4) at the timing shown in timing window 1320 and processes it according to the data processing definition 101.

定義１０１は、表１３１０のデータ（ｒｆｉｄ）を、時間しきい値３秒のＷＩＴＨＩＮ条件によりデータクラスタに分割し、各データクラスタについて、梱包製品数、総重量、製造期間を算出することを意味する。 Definition 101 means that the data (rfid) in Table 1310 is divided into data clusters according to the WITHIN condition with a time threshold of 3 seconds, and the number of packed products, the total weight, and the production period are calculated for each data cluster. .

In FIG. 3, window 601 shows the analysis target period of each data item, and window 603 shows the length of the time threshold, which is, for example, about 3 seconds.

９：１５’０２にデータａ（１３１１）が読み込まれると、データクラスタが新規に生成される。クラスタ番号は０となる。この際、クラスタ時刻は、データａのタイムスタンプ９：１５’０２に該時間しきい値３秒を加算した９：１５’０５に定まる。 When data a (1311) is read at 9: 15'02, a data cluster is newly generated. The cluster number is 0. At this time, the cluster time is determined to be 9: 15'05 obtained by adding the time threshold 3 seconds to the time stamp 9: 15'02 of the data a.

Because the time stamps of data a to c (1311 to 1313) are at or before the cluster time 9:15'05, these data are added to data cluster 0, the same cluster as data a. (Hereinafter, the data cluster made up of the data assigned cluster number N is written as data cluster N.)

After this, when data d (1314) is received at 9:15'12, its time stamp is later than the cluster time, so the analysis target period of data a to c, which carry cluster number 0 at that point, ends.

Meanwhile, a new data cluster 1 is created and data d is added to it. The cluster time becomes 9:15'15, obtained by adding the 3-second time threshold to the time stamp 9:15'12 of data d. Because the time stamps of data e to g (1315 to 1317) are at or before the cluster time 9:15'15, they are added to data cluster 1, the same cluster as data d.

Window 602 shows the length of the time threshold and the result of calculating the total number of products from the stream tuples within each analysis target period. When the inspector checks this result at 9:15'06 and 9:15'18, both the product count of 3 for cardboard box 1 and the product count of 4 for cardboard box 2 are confirmed correctly. The correct values are likewise confirmed for the total weight and the production period.

表６１０は、処理結果データを示す。データ１３１１〜１３１７の入力により、それぞれデータ６１１〜６１７を出力する。検査員は、９：１５’０６にデータ６１３を、９：１５’１８にデータ６１７を、それぞれ確認することになる。これらは、それぞれダンボール１およびダンボール２の集計値を正しく表している。 Table 610 shows the processing result data. Data 611 to 617 are output in response to the input of data 1311 to 1317, respectively. The inspector confirms the data 613 at 9: 15'06 and the data 617 at 9: 15'18. These correctly represent the total values of cardboard 1 and cardboard 2, respectively.

以上のように、図１の構成と図２の動作を用いることで、ユーザに余計な手間やコストを強いることなく、正しい現象解析を実現することが可能となる。 As described above, by using the configuration of FIG. 1 and the operation of FIG. 2, it is possible to realize a correct phenomenon analysis without imposing extra effort and cost on the user.

次に、図５を用いて、ＩＮＴＥＲＶＡＬ条件を実現する、解析対象データ判定部１２１の動作を示す。 Next, the operation of the analysis target data determination unit 121 that realizes the INTERVAL condition will be described with reference to FIG.

The operation in FIG. 5 is identical to that in FIG. 2 except for the position at which process S504 is executed. With process S504 at this position, the cluster time is always updated to the time stamp of the new input data plus the time threshold, regardless of whether the new input data is added to the current analysis target data cluster.

これにより、新規入力データが到着した際には、必ず、解析対象データクラスタにおける最新のデータのタイムスタンプと比較した、時間しきい値内か否かの判定が実施されることになる。 As a result, when new input data arrives, a determination is always made as to whether it is within the time threshold compared with the time stamp of the latest data in the analysis target data cluster.
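
As a rough illustration of this variant, the sketch below records the cluster number and the cluster time after every tuple under the INTERVAL condition. The function name `interval_trace` is hypothetical, time stamps are plain seconds past 9:15'00, and `float("-inf")` stands in for Time.MIN. The values for data a, b, e, and f follow the FIG. 6 walkthrough in this description; those for c and d are assumed.

```python
def interval_trace(timestamps, threshold):
    """Return (cluster number, cluster time) after each tuple under INTERVAL."""
    cluster_time = float("-inf")   # assumed stand-in for Time.MIN (S501)
    cluster_no = -1                # S501
    trace = []
    for ts in timestamps:
        if ts > cluster_time:      # S503: gap since the previous tuple exceeds the threshold
            cluster_no += 1        # S506 (the S505 notification is omitted in this sketch)
        cluster_time = ts + threshold   # S504, executed for every tuple
        trace.append((cluster_no, cluster_time))
    return trace


# a=2, b=3, c=5 (assumed), d=6 (assumed), e=7, f=13, with a 4-second threshold
trace = interval_trace([2, 3, 5, 6, 7, 13], threshold=4)
# trace == [(0, 6), (0, 7), (0, 9), (0, 10), (0, 11), (1, 17)]
```

The cluster time slides forward with each tuple, so a cluster stays open for as long as consecutive tuples are no more than the threshold apart.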

The following uses FIG. 6 to show the result of applying the stream data processing method realized by the configuration of FIG. 1 and the operation of FIG. 5 to an incoming goods inspection by bulk RFID reading.

この場合、ストリームデータ処理システム１１０が、図６のタイミングウィンドウ８１０で示されるタイミングでデータを受信して、データ処理定義８０１に従って処理する際に、該処理方法を適用した結果を示している。 In this case, when the stream data processing system 110 receives data at the timing indicated by the timing window 810 in FIG. 6 and processes the data according to the data processing definition 801, the result of applying the processing method is shown.

定義８０１は、データ（ｒｆｉｄ）を、時間しきい値４秒のＩＮＴＥＲＶＡＬ条件によりデータクラスタに分割し、各データクラスタについて、梱包製品数、総重量、製造期間を算出することを意味する。 The definition 801 means that the data (rfid) is divided into data clusters according to the INTERVAL condition with a time threshold value of 4 seconds, and the number of packed products, the total weight, and the production period are calculated for each data cluster.

９：１５’０２にデータａが読み込まれると、データクラスタが新規に生成される。クラスタ番号は０となる。この際、クラスタ時刻は、データａのタイムスタンプ９：１５’０２に該時間しきい値８２３（４秒）を加算した９：１５’０６に定まる。 When data a is read at 9: 15'02, a data cluster is newly generated. The cluster number is 0. At this time, the cluster time is determined to be 9: 15'06, which is obtained by adding the time threshold value 823 (4 seconds) to the time stamp 9: 15'02 of the data a.

データｂのタイムスタンプ９：１５’０３は該クラスタ時刻９：１５’０６以前であるので、データａと同一のデータクラスタ０に追加される。同時に、クラスタ時刻は、データｂのタイムスタンプ９：１５’０３に該時間しきい値４秒を加算したクラスタ時刻は９：１５’０７に更新される。 Since the time stamp 9: 15'03 of the data b is before the cluster time 9: 15'06, it is added to the same data cluster 0 as the data a. At the same time, the cluster time is updated to 9: 15'07 by adding the time threshold 4 seconds to the time stamp 9: 15'03 of the data b.

データｃ〜ｅについても同様に、該時間しきい値４秒以内で連続しているため、全てデータクラスタ０に追加される。データｅ入力後のクラスタ時刻は、データｅのタイムスタンプ９：１５’０７に該時間しきい値４秒を加算したクラスタ時刻は９：１５’１１に更新される。 Similarly, since the data c to e are continuous within the time threshold of 4 seconds, they are all added to the data cluster 0. The cluster time after the input of data e is updated to 9: 15'11 by adding the time threshold of 4 seconds to the time stamp 9: 15'07 of data e.

After this, when data f is received at 9:15'13, its time stamp is later than the cluster time, so the analysis target period of data a to e (shown as analysis target period 821), which carry cluster number 0 at that point, ends.

一方、新たなデータクラスタ１が生成され、データｆは該クラスタに追加される。また、クラスタ時刻は、データｆのタイムスタンプ９：１５’１３に該時間しきい値８２３の４秒を加算した９：１５’１７に定まる。データｇ〜ｉは、該時間しきい値８２３の４秒以内で連続しているため、全てデータクラスタ１に追加される。 On the other hand, a new data cluster 1 is generated and data f is added to the cluster. The cluster time is determined to be 9: 15'17, which is obtained by adding 4 seconds of the time threshold value 823 to the time stamp 9: 15'13 of the data f. Since the data g to i are continuous within 4 seconds of the time threshold value 823, they are all added to the data cluster 1.

When the inspector checks, at 9:15'09 and 9:15'19, the result 822 of calculating the total number of products from the stream tuples within the analysis target period 821, both the product count of 5 for cardboard box 1 and the product count of 4 for cardboard box 2 are confirmed correctly. The correct values are likewise confirmed for the total weight and the production period.

As described above, using the configuration of FIG. 1 and the operation of FIG. 5 makes correct phenomenon analysis possible without imposing extra effort or cost on the user. In addition, because it is difficult for a large cardboard box to finish passing through the gate within a fixed time threshold, the WITHIN condition realized by the operation of FIG. 2 cannot correctly group the data corresponding to one such box.

このような場合でも、ＲＦＩＤ読み取りデータが途切れることなく生成されている間は、一つのダンボールの通過中と見做すことができるので、図５の動作で実現されるＩＮＴＥＲＶＡＬ条件を用いることで、該期間に生成されたデータ集合を、一つのダンボールに対応するデータとして抽出することが可能である。 Even in such a case, while the RFID read data is generated without interruption, it can be considered that one cardboard is passing, so by using the INTERVAL condition realized by the operation of FIG. The data set generated during the period can be extracted as data corresponding to one cardboard.
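
The difference between the two conditions for a large box can be illustrated with a small comparison. This is a sketch under assumptions: the function `cluster_numbers` and its `refresh_every_tuple` flag are hypothetical names, and the read times (one tag every 2 seconds for 8 seconds, with a 3-second threshold) are invented for illustration.

```python
def cluster_numbers(timestamps, threshold, refresh_every_tuple):
    """Assign cluster numbers; refresh_every_tuple=True models INTERVAL, False models WITHIN."""
    cluster_time = float("-inf")   # assumed stand-in for Time.MIN
    cluster_no = -1
    numbers = []
    for ts in timestamps:
        starts_new = ts > cluster_time
        if starts_new:
            cluster_no += 1
        if starts_new or refresh_every_tuple:
            cluster_time = ts + threshold
        numbers.append(cluster_no)
    return numbers


long_box = [0, 2, 4, 6, 8]   # assumed tag read times for one large box, in seconds
within = cluster_numbers(long_box, threshold=3, refresh_every_tuple=False)
interval = cluster_numbers(long_box, threshold=3, refresh_every_tuple=True)
# within   == [0, 0, 1, 1, 2]  (one box is split into three clusters)
# interval == [0, 0, 0, 0, 0]  (all reads stay in one cluster)
```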

この例のように、一つの現象に対応するデータが、一定時間に収まらないものの、連続して生成され続ける場合は、本方法を適用できる。 As in this example, this method can be applied when data corresponding to one phenomenon does not fit in a certain period of time but is continuously generated.

次に、図７のセンサネットの例において想定するデータ処理の内容を示す。 Next, the contents of data processing assumed in the example of the sensor network of FIG. 7 are shown.

図７の例では、各センサの計測値の大小による、現象の発生頻度が高い位置を推定する。図では、センサＳ１７の周辺で発生頻度が高いことが示されている。 In the example of FIG. 7, the position where the occurrence frequency of the phenomenon is high is estimated based on the magnitude of the measurement value of each sensor. In the figure, it is shown that the occurrence frequency is high around the sensor S17.

該推定処理として、各現象に対応するデータクラスタの中で、最も大きい値を示すデータを抽出し、該値を計測したセンサＩＤを抽出することが、本例で想定するデータ処理である。 As the estimation processing, extracting the data indicating the largest value in the data cluster corresponding to each phenomenon and extracting the sensor ID that measured the value is the data processing assumed in this example.

In this kind of estimation, however, processing only the measurement data set for a single phenomenon does not yield an accurate value; data from multiple phenomena must be analyzed. To address this, a data cutout method is provided that treats multiple data clusters as the analysis target simultaneously.

以下、図８を用いて、該データ切り出し方法を実現するための構成を示す。 Hereinafter, a configuration for realizing the data cutout method will be described with reference to FIG.

The description “[CLUSTER 2 WITHIN 3 seconds]” in the data processing definition 301 defines this cutout method. It means that the two most recent data clusters cut out under the WITHIN condition with a time threshold of 3 seconds are analyzed together.

データ処理定義登録ユーザインタフェース１１１は、“ＣＬＵＳＴＥＲ”文字列の後に続く数字を、同時に解析対象とするデータクラスタ数と解釈し、解析対象データ切り出し部１２０に通知する。 The data processing definition registration user interface 111 interprets the number following the “CLUSTER” character string as the number of data clusters to be analyzed simultaneously, and notifies the analysis target data cutout unit 120 of the number.

解析対象データ切り出し部１２０は、該値を同時対象クラスタ数保存領域３１１に保存する。該構成は、領域３１１が追加されていることを除き、図１に示したストリームデータ処理システム１１０の構成と同一である。 The analysis target data cutout unit 120 stores the value in the simultaneous target cluster number storage area 311. This configuration is the same as that of the stream data processing system 110 shown in FIG. 1 except that the area 311 is added.

以下、図９を用いて、該データ切り出し方法を実現するための動作を示す。 The operation for realizing the data cutout method will be described below with reference to FIG.

This operation is identical to that in FIG. 2 except that process S901 takes the place of process S505. In process S505, when the new input data is judged to belong to a new data cluster, the cluster number of the data cluster that was the analysis target up to that point is notified to the non-analysis target data deletion unit 125.

In process S901 of FIG. 9, by contrast, the value obtained by subtracting the simultaneous target cluster count from the current cluster number and then adding 1 is notified to the data deletion unit 125 as the cluster number of the data cluster to delete. As a result of this process, the number of data clusters held as the analysis target, including the newly created one, equals the simultaneous target cluster count. When the simultaneous target cluster count is 1, process S901 is identical to process S505.
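
The arithmetic of process S901 can be checked with a short sketch. The function names `cluster_to_delete` and `retained_clusters` are hypothetical, and returning `None` for a negative result models the case, described later for FIG. 10, in which nothing is deleted.

```python
def cluster_to_delete(current_cluster_no, simultaneous_clusters):
    """S901: cluster number to drop, or None when the result is negative."""
    target = current_cluster_no - simultaneous_clusters + 1
    return target if target >= 0 else None


def retained_clusters(clusters_started, simultaneous_clusters):
    """Simulate which cluster numbers remain buffered after a number of cluster starts."""
    buffered = set()
    current = -1                                   # initial cluster number (S501)
    for _ in range(clusters_started):
        doomed = cluster_to_delete(current, simultaneous_clusters)   # S901
        if doomed is not None:
            buffered.discard(doomed)               # deletion unit 125 drops that cluster
        current += 1                               # S506
        buffered.add(current)                      # S507 inserts data of the new cluster
    return sorted(buffered)


# With a simultaneous target cluster count of 2:
#   cluster_to_delete(0, 2) is None   (cluster 0 is kept when cluster 1 starts)
#   cluster_to_delete(1, 2) == 0      (cluster 0 is dropped when cluster 2 starts)
# With a count of 1 the rule reduces to S505: cluster_to_delete(3, 1) == 3
```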

以上は、ＷＩＴＨＩＮ条件に対して、同時対象クラスタ数の指定を追加したデータ切り出し方法に関する説明である。 The above is the description regarding the data extraction method in which the designation of the number of simultaneous target clusters is added to the WITHIN condition.

ＩＮＴＥＲＶＡＬ条件に対しても、同様にして、同時対象クラスタ数の指定を追加したデータ切り出し方法を実現可能であることは自明である。該切り出し方法を実現するための構成は、図８と同一であり、動作は、図５の処理Ｓ５０５を処理Ｓ９０１に置き換えたもので実現できる。 It is self-evident that a data cutout method in which designation of the number of simultaneous target clusters is added can also be realized in the same manner for the INTERVAL condition. The configuration for realizing the cutout method is the same as that in FIG. 8, and the operation can be realized by replacing the process S505 in FIG. 5 with a process S901.

以下、図１０を用いて、図８の構成と図９の動作によって実現される、ストリームデータ処理方法を、図７に示すセンサネットによる環境監視に適用した場合の結果を示す。 Hereinafter, the results when the stream data processing method realized by the configuration of FIG. 8 and the operation of FIG. 9 is applied to the environment monitoring by the sensor network shown in FIG. 7 will be described using FIG.

図１０は、ストリームデータ処理システム１１０が、表１４１０に示すデータを、図１０のウィンドウ１４３０のタイミングで受信して、データ処理定義３０１，１００１に従って処理する際に、該処理方法を適用した結果を示している。 FIG. 10 shows the result of applying the processing method when the stream data processing system 110 receives the data shown in Table 1410 at the timing of the window 1430 in FIG. 10 and processes the data according to the data processing definitions 301 and 1001. Show.

Definition 301 means that the data (sensor) 1410 is divided into data clusters under the WITHIN condition with a time threshold of 3 seconds, that the two most recent data clusters are analyzed together, and that the measured values are summed for each sensor. Definition 1001 means that the ID of the sensor with the largest sum is extracted. Window 1010 shows the analysis target period of each data item, and within window 1010 the time thresholds 1011, 1012, and 1013 each show the threshold length (for example, 3 seconds).
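
The two definitions amount to a per-sensor sum followed by an arg-max, which might be sketched as below. The function names and the measured values are invented for illustration and are not taken from table 1410; the point is only that a single cluster can favor a different sensor than two clusters combined.

```python
from collections import defaultdict


def value_sum_by_sensor(buffered):
    """Definition 301 (sketch): sum measured values per sensor over the buffered tuples."""
    sums = defaultdict(int)
    for sensor_id, value in buffered:
        sums[sensor_id] += value
    return dict(sums)


def sensor_with_max_sum(sums):
    """Definition 1001 (sketch): ID of the sensor whose summed value is largest."""
    return max(sums, key=sums.get)


# Hypothetical readings as (sensor ID, value) pairs
cluster_0 = [("S17", 3), ("S18", 5)]
cluster_1 = [("S17", 4), ("S16", 1)]

one_cluster = sensor_with_max_sum(value_sum_by_sensor(cluster_0))
two_clusters = sensor_with_max_sum(value_sum_by_sensor(cluster_0 + cluster_1))
# one_cluster == "S18", two_clusters == "S17"
```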

データ１４１１〜１４１４は、データクラスタ０に、データ１４１５〜１４１７は、データクラスタ１に、データ１４１８〜１４２２は、データクラスタ２に、それぞれ属することになる。 Data 1411 to 1414 belong to data cluster 0, data 1415 to 1417 belong to data cluster 1, and data 1418 to 1422 belong to data cluster 2, respectively.

With data 1411 to 1414 already input, when data 1415 arrives at 10:28, the conditional branch of process S503 takes the No branch, and processes S504, S901, and S506 are executed.

但し、処理Ｓ９０１では、その時点でのクラスタ番号０から同時対象クラスタ数２を減じて１を加えたクラスタ番号−１が負であるため、何も実行されない。そのため、データクラスタ０に属するデータは解析対象として残ることになる。 However, in the process S901, nothing is executed because the cluster number −1 obtained by subtracting the simultaneous target cluster number 2 from the cluster number 0 at the time and adding 1 is negative. Therefore, data belonging to the data cluster 0 remains as an analysis target.

On the other hand, with data 1411 to 1417 already input, when data 1418 arrives at 10:32, the conditional branch of process S503 takes the No branch, and processes S504, S901, and S506 are executed.

このとき、その時点でのクラスタ番号１から同時対象クラスタ数２を減じて１を加えたクラスタ番号が０となるため、データクラスタ０に属するデータ１４１１〜１４１４が解析対象から外れることになる。 At this time, the cluster number obtained by subtracting the number 2 of simultaneously targeted clusters from the cluster number 1 at that time and adding 1 is 0, so that the data 1411 to 1414 belonging to the data cluster 0 are excluded from the analysis target.

Meanwhile, a new data cluster 2 is created and data 1418 is added to it. As a result, data clusters 1 and 2, two clusters as specified by the simultaneous target cluster count, are cut out as the analysis target.

表１０２０は、定義３０１による処理結果データを示す。データ１４１１〜１４２２の入力により、それぞれデータ１０２１〜１０３２を算出する。データ１０２６は、データクラスタ０、および１に属する、センサＳ１７による計測データの計測値を加算した値が、ｖａｌｕｅ＿ｓｕｍカラムの値となる。 A table 1020 shows processing result data according to the definition 301. Data 1021 to 1032 are calculated by inputting data 1411 to 1422, respectively. In the data 1026, a value obtained by adding the measurement values of the measurement data by the sensor S17 belonging to the data clusters 0 and 1 is the value in the value_sum column.

Similarly, in data 1027 the value_sum column holds the sum of the measured values from sensor S18 in data clusters 0 and 1, and in data 1032 it holds the sum of the measured values from sensor S17 in data clusters 1 and 2.

Table 1040 shows the processing result data produced by definition 1001. As data 1021 to 1032 are calculated, data 1041 to 1052 are output respectively. Data 1044, 1047, and 1052 are the analysis results after phenomena 1402, 1403, and 1404 occur, respectively. Data 1047 and 1052, which result from analyzing two phenomena together, contain the expected answer, sensor ID S17.

以上のように、図８の構成と図９の動作を用いることで、複数のデータクラスタを解析対象とするデータ切り出し方法が実現され、複数の現象を対象とした解析が可能となる。該データ切り出し方法は、解析精度の向上などを目的に利用することができる。 As described above, by using the configuration of FIG. 8 and the operation of FIG. 9, a data extraction method for analyzing a plurality of data clusters is realized, and analysis for a plurality of phenomena is possible. The data cutout method can be used for the purpose of improving analysis accuracy.

次に、図１１を用いて、ネットワークモニタリングの例を示す。 Next, an example of network monitoring will be described with reference to FIG.

As illustrated, the monitored network consists of a network router 1501 and computers 1502, 1503, and 1504.

ルータ１５０１をパケットが通過することで、表１５１０のようなデータが生成される。各行１５１１〜１５２２が、各パケットに関するデータであり、送信元を示すｓｒｃカラム、送信先を示すｄｓｔカラム、およびデータ長を示すｌｅｎｇｔｈカラムからなる。 As the packet passes through the router 1501, data as shown in Table 1510 is generated. Each row 1511 to 1522 is data relating to each packet, and includes a src column indicating a transmission source, a dst column indicating a transmission destination, and a length column indicating a data length.

ストリームデータ処理システム１１０は、該データを受け取り、発生時間が近接する複数のパケットを、一つのメッセージとして切り出し、メッセージ毎のメッセージ長を算出する。 The stream data processing system 110 receives the data, extracts a plurality of packets whose generation times are close to each other as one message, and calculates a message length for each message.

Timing window 1530 in FIG. 11 shows the timing at which the information about each packet is generated. Because data about packets sent from the computers 1502, 1503, and 1504 is interleaved in a single stream, the cutout cannot be realized from the proximity of the data time stamps alone.

Looking at each source computer individually, however, the packets that make up one message are sent in succession, and the interval between different messages is much longer by comparison. Applying the INTERVAL condition therefore makes it possible to cut out the packet data message by message.

図１１におけるウィンドウ１５４０，１５５０，１５６０は、タイミングウィンドウ１５３０に示したデータ系列から、それぞれ計算機１５０２，１５０３，１５０４を送信元とするパケットに関するデータのみを抽出した図である。 Windows 1540, 1550, and 1560 in FIG. 11 are diagrams in which only data related to packets having computers 1502, 1503, and 1504 as transmission sources are extracted from the data series shown in the timing window 1530, respectively.

With this classification, the data for packets sent from computer 1502 can be separated into data clusters 1541 and 1542, the data for packets sent from computer 1503 into data clusters 1551 and 1552, and the data for packets sent from computer 1504 into data cluster 1561. The classification can be based on the src column of data 1511 to 1522.

以下、図１２を用いて、該データ切り出し方法を実現するための構成を示す。 Hereinafter, a configuration for realizing the data cutout method will be described with reference to FIG.

データ処理定義４０１における記述“［ＰＡＲＴＩＴＩＯＮＢＹｓｒｃＣＬＵＳＴＥＲＩＮＴＥＲＶＡＬ１ｓｅｃｏｎｄｓ］”が、該切り出し方法の定義であり、該記述は、ｓｒｃカラムの値同一性に基づいてデータをグループ分けし、グループごとに、時間しきい値１秒のＩＮＴＥＲＶＡＬ条件で切り出されるデータクラスタを解析対象とすることを意味する。 The description “[PARTITION BY src CLUSTER INTERVAL 1 second]” in the data processing definition 401 is the definition of the extraction method. The description groups data based on the value identity of the src column, This means that a data cluster cut out under the INTERVAL condition with a time threshold of 1 second is to be analyzed.

データ処理定義登録ユーザインタフェース１１１は、“ＰＡＲＴＩＴＩＯＮＢＹ”文字列の後に続く文字列を、データのグループ分けに利用するカラムの名称と解釈し、解析対象データ切り出し部１２０に通知する。以降、該カラムをグループ識別子と呼ぶ。 The data processing definition registration user interface 111 interprets the character string that follows the “PARTITION BY” character string as the name of a column used for data grouping, and notifies the analysis target data cutout unit 120. Hereinafter, this column is referred to as a group identifier.
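
To make the three clause forms seen so far concrete, the sketch below parses them with a regular expression. It is only an illustration of how interface 111 might interpret the clause: the grammar is inferred from the three examples in this description, and the names `ClusterSpec` and `parse_cluster_spec`, the whitespace-separated tokens, the seconds-only unit, and the default count of 1 when no number follows CLUSTER are all assumptions.

```python
import re
from typing import NamedTuple, Optional


class ClusterSpec(NamedTuple):
    partition_by: Optional[str]   # column after PARTITION BY, or None
    simultaneous_clusters: int    # number after CLUSTER (assumed to default to 1)
    condition: str                # "WITHIN" or "INTERVAL"
    threshold_seconds: int        # time threshold


_CLAUSE = re.compile(
    r"\[\s*(?:PARTITION\s+BY\s+(\w+)\s+)?"          # optional grouping column
    r"CLUSTER\s+(?:(\d+)\s+)?"                      # optional simultaneous cluster count
    r"(WITHIN|INTERVAL)\s+(\d+)\s*seconds?\s*\]"    # condition type and threshold
)


def parse_cluster_spec(text):
    """Parse a cluster clause such as '[CLUSTER 2 WITHIN 3 seconds]'."""
    match = _CLAUSE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cluster clause: {text!r}")
    column, count, condition, threshold = match.groups()
    return ClusterSpec(column, int(count) if count else 1, condition, int(threshold))


spec_101 = parse_cluster_spec("[CLUSTER WITHIN 3 seconds]")
spec_301 = parse_cluster_spec("[CLUSTER 2 WITHIN 3 seconds]")
spec_401 = parse_cluster_spec("[PARTITION BY src CLUSTER INTERVAL 1 seconds]")
```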

この構成は、図１に示したストリームデータ処理システム１１０の構成を、グループ別のデータ切り出し処理に対応するように拡張した形となる。クラスタ時刻格納領域１２６、およびクラスタ番号格納領域１２７は、グループごとに必要である。両領域をまとめて、グループ別格納領域と呼ぶ。 This configuration is obtained by extending the configuration of the stream data processing system 110 shown in FIG. 1 so as to correspond to the data extraction processing for each group. The cluster time storage area 126 and the cluster number storage area 127 are necessary for each group. Both areas are collectively referred to as a group-specific storage area.

Exactly one group-specific storage area must exist for each group. To manage this, a group-specific storage area management table 411 is provided, which holds the correspondence between group identifier values and group-specific storage areas.

グループ別格納領域４１２〜４１４は、それぞれ送信元の計算機１５０２，１５０３，１５０４のパケットに関するデータを、データクラスタに切り分けるために利用する。管理表４１１には、グループ識別子の値Ａ〜Ｃと、グループ別格納領域４１２〜４１４との対応関係を保持する。 The group-specific storage areas 412 to 414 are used to separate data relating to packets of the transmission source computers 1502, 1503, and 1504 into data clusters. The management table 411 holds the correspondence between the group identifier values A to C and the group storage areas 412 to 414.

以下、図１３を用いて、該データ切り出し方法を実現するための動作を示す。 Hereinafter, the operation for realizing the data cutout method will be described with reference to FIG.

図１３の動作では、図２の動作に、処理Ｓ１１０１，Ｓ１１０２を追加し、処理Ｓ５０１，Ｓ５０５を、それぞれ処理Ｓ５０１Ｇ，Ｓ５０５Ｇに置き換えた形となる。該動作は新規入力データの受信待ち処理Ｓ５０２から開始する。 In the operation of FIG. 13, processing S1101 and S1102 are added to the operation of FIG. 2, and processing S501 and S505 are replaced with processing S501G and S505G, respectively. This operation starts from the new input data reception waiting process S502.

When new input data is received, process S1101 refers to the group identifier value of that data and retrieves the corresponding group-specific storage area from the group-specific storage area management table 411.

Next, in process S1102, if no group-specific storage area was found, the flow proceeds to process S501G. Process S501G creates a new group-specific storage area, registers the correspondence between the group identifier value and that area in the management table 411, and initializes the area. The initialization is the same as in process S501.

以上の処理によって得られたグループ別格納領域を利用して、処理Ｓ５０３〜Ｓ５０７の処理を実行する。但し、処理Ｓ５０５では、解析対象外データ削除部に、解析対象外となったデータクラスタのクラスタ番号と、グループ識別子の値を併せて通知する。 Using the group-specific storage area obtained by the above processing, the processing in steps S503 to S507 is executed. However, in process S505, the non-analysis target data deletion unit is notified of the cluster number of the data cluster that has not been analyzed and the value of the group identifier.

以上は、ＷＩＴＨＩＮ条件に対して、グループ分け指定を追加したデータ切り出し方法に関する説明である。ＩＮＴＥＲＶＡＬ条件に対しても、同様にして、グループ分け指定を追加したデータ切り出し方法を実現可能であることは自明である。 The above is the description regarding the data cutout method in which the grouping designation is added to the WITHIN condition. It is self-evident that a data extraction method with grouping designation added can be realized in the same manner for the INTERVAL condition.

該切り出し方法を実現するための構成は、図１２と同一であり、動作は、図１３の処理Ｓ５０４を、処理Ｓ５０７の直前に移したもので実現できる。 The configuration for realizing the clipping method is the same as that in FIG. 12, and the operation can be realized by moving the process S504 in FIG. 13 immediately before the process S507.
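
The grouped operation, for both conditions, might be sketched as follows. This is an illustration under assumptions rather than the patented implementation: `GroupState` and `PartitionedClusterer` are hypothetical names, a Python dict stands in for management table 411, time stamps are plain seconds, and the packet times in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    """Group-specific storage area: cluster time (126) and cluster number (127)."""
    cluster_time: float = float("-inf")   # assumed stand-in for Time.MIN
    cluster_no: int = -1


class PartitionedClusterer:
    def __init__(self, threshold, interval=False):
        self.threshold = threshold
        self.interval = interval   # True models INTERVAL, False models WITHIN
        self.table = {}            # stand-in for management table 411

    def assign(self, group_key, timestamp):
        """Return ((group, cluster number), expired (group, cluster number) or None)."""
        state = self.table.get(group_key)                  # S1101
        if state is None:                                  # S1102
            state = self.table[group_key] = GroupState()   # S501G
        expired = None
        starts_new = timestamp > state.cluster_time        # S503
        if starts_new:
            expired = (group_key, state.cluster_no)        # S505G: number plus group value
            state.cluster_no += 1                          # S506
        if starts_new or self.interval:
            state.cluster_time = timestamp + self.threshold   # S504
        return (group_key, state.cluster_no), expired         # S507


# Hypothetical interleaved packets as (time in seconds, src)
packets = [(0.0, "A"), (0.3, "B"), (0.5, "A"), (0.9, "B"), (1.2, "A"), (3.0, "A"), (3.1, "C")]
clusterer = PartitionedClusterer(threshold=1.0, interval=True)
labels = [clusterer.assign(src, ts)[0] for ts, src in packets]
```

Keeping one state object per group value means a tuple from one source never closes a cluster belonging to another source, which is the property the grouping is meant to provide.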

以下、図１５を用いて、図１２の構成と図１３の動作によって実現されるストリームデータ処理方法を、図１１に示すネットワークモニタリングに適用した場合の結果を示す。 Hereinafter, the results when the stream data processing method realized by the configuration of FIG. 12 and the operation of FIG. 13 is applied to the network monitoring shown in FIG. 11 will be described using FIG.

図１５は、ストリームデータ処理システム１１０が、表１５１０に示すデータを、タイミングウィンドウ１５３０に示すタイミングで受信して、データ処理定義４０１に従って処理する際に、該処理方法を適用した結果を示している。 FIG. 15 shows a result of applying the processing method when the stream data processing system 110 receives the data shown in Table 1510 at the timing shown in the timing window 1530 and processes the data according to the data processing definition 401. .

Definition 401 has the following meaning: the data (nlog) 1510 is grouped by identical values of the src column; for each group, the data cluster cut out under an INTERVAL condition with a time threshold of 1 second is taken as the analysis target; and the packet length is summed for each source computer.
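The semantics of such a definition can be sketched as a per-source running sum that restarts whenever the gap to the previous packet from the same source exceeds the threshold. This is an illustrative sketch only: the function name `interval_sum_by_src`, the tuple layout `(timestamp, src, length)`, and the sample rows are assumptions and do not reproduce the data of FIG. 15.

```python
def interval_sum_by_src(rows, threshold=1.0):
    """Yield (timestamp, src, running packet-length sum of the current cluster).

    rows: iterable of (timestamp, src, length) in ascending time stamp order.
    """
    state = {}  # src -> (time stamp of newest datum, sum over the current cluster)
    for ts, src, length in rows:
        prev = state.get(src)
        if prev is not None and ts - prev[0] <= threshold:
            # Gap to the newest datum of this source is within the threshold:
            # the datum joins the current cluster.
            total = prev[1] + length
        else:
            # First datum of this source, or the gap exceeded the threshold:
            # the old cluster is discarded and a new one starts.
            total = length
        state[src] = (ts, total)
        yield ts, src, total


# Made-up sample input for illustration.
sample = [(0.0, "A", 100), (0.5, "B", 40), (0.8, "A", 50),
          (1.6, "A", 10), (3.0, "A", 7), (3.2, "B", 5)]
results = list(interval_sum_by_src(sample))
```

One result is produced per input datum, which matches the one-to-one relation between input and result data described for table 1240.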

Windows 1210, 1220, and 1230 show the analysis target periods for the data whose src column values are A, B, and C, respectively. Window 1210 also shows the length of the time threshold 1201 (for example, 1 second).

Table 1240 shows the result data of processing under definition 401. Data 1241 to 1252 are calculated in response to the input of data 1511 to 1522, respectively. Data 1244, 1251, 1245, 1252, and 1248 show the message lengths calculated from data clusters 1541, 1542, 1551, 1552, and 1561, respectively.

As described above, by using the configuration of FIG. 12 and the operation of FIG. 13, even when multiple phenomena are mixed in a single stream, the data sequence can be separated and the data can then be cut out in units of phenomena.

Next, FIG. 14 is used to show the configuration of a stream data division system that divides time-series data into data clusters, assigns a cluster number to each datum, and stores the data in a database.

The configuration shown in FIG. 14 is a variation of the analysis target data cutout unit 120 in the configuration of FIG. 1. The stream data division system 210 consists of a cluster division method definition registration user interface 211, which accepts a cluster division method definition 201 from the user, and a cluster determination unit 212.

The configuration and operation of the cluster determination unit 212 are the same as the configuration of FIG. 1 and the operation of FIG. 2, with two differences: the data insertion unit 124C inserts pairs of data 221 and cluster number 222 into a relational table in the database 220, and the non-analysis-target data deletion unit 125 is absent.
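Storing each datum together with its cluster number can be sketched as below. This is a hedged illustration, not the system of FIG. 14: it uses Python's standard `sqlite3` module as a stand-in for database 220, and the table name `observations`, its columns, and the function name `store_with_cluster_numbers` are all hypothetical. The WITHIN condition is assumed.

```python
import sqlite3


def store_with_cluster_numbers(conn, rows, threshold):
    """Insert (ts, payload) rows together with a WITHIN-condition cluster number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(ts REAL, payload TEXT, cluster_no INTEGER)"
    )
    cluster_no = 0
    cluster_time = None  # oldest time stamp of the current cluster + threshold
    for ts, payload in rows:
        if cluster_time is None or ts > cluster_time:
            # Time stamp is after the cluster time: increment the cluster number.
            cluster_no += 1
            cluster_time = ts + threshold
        # Nothing is deleted here; every datum is kept with its cluster number.
        conn.execute(
            "INSERT INTO observations VALUES (?, ?, ?)", (ts, payload, cluster_no)
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_with_cluster_numbers(
    conn, [(0.0, "a"), (0.4, "b"), (0.9, "c"), (1.2, "d"), (5.0, "e")], 1.0
)
counts = conn.execute(
    "SELECT cluster_no, COUNT(*) FROM observations "
    "GROUP BY cluster_no ORDER BY cluster_no"
).fetchall()
```

Once the cluster number is stored as an ordinary column, per-phenomenon aggregation reduces to a plain `GROUP BY cluster_no`, as the query at the end of the sketch shows.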

The above describes the case where the WITHIN condition is specified as the cluster division method. By switching the configuration and operation of the cluster determination unit, the INTERVAL condition and a grouping specification can evidently also be specified.

With the above configuration, time-series data can be divided into units of phenomena before being stored in a database. Even if the data were stored in the database together with their time stamps, it would be difficult to define in SQL a post hoc analysis that separates the data into units of phenomena on the basis of those time stamps. Applying this configuration makes such processing unnecessary.

The present invention can also be applied as an extension of the delimiter signal scheme described in Non-Patent Document 2, by placing the configuration of FIG. 14 at the entrance of that scheme and adding a function whereby the data insertion unit 124C also sends a delimiter signal when the cluster number is incremented.
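A delimiter-emitting front end of this kind might look like the following sketch. The sentinel `DELIMITER` and the generator `with_delimiters` are hypothetical names, the WITHIN condition is assumed, and the sketch additionally assumes that no delimiter is sent before the very first datum, a detail this passage does not specify.

```python
DELIMITER = object()  # hypothetical sentinel standing in for the delimiter signal


def with_delimiters(timestamps, threshold):
    """Pass time stamps through, yielding DELIMITER before each new cluster."""
    cluster_time = None  # oldest time stamp of the current cluster + threshold
    for ts in timestamps:
        if cluster_time is not None and ts > cluster_time:
            # The cluster number would be incremented here, so the delimiter
            # is sent first and the new datum follows it.
            yield DELIMITER
        if cluster_time is None or ts > cluster_time:
            cluster_time = ts + threshold
        yield ts


out = list(with_delimiters([0.0, 0.5, 2.0, 2.3, 4.0], 1.0))
```

A downstream operator that understands delimiter signals can then close its current data set whenever it receives the sentinel, without knowing anything about time thresholds.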

The invention made by the present inventors has been described above specifically on the basis of embodiments. Needless to say, the present invention is not limited to those embodiments, and various modifications are possible without departing from its gist.

The present invention relates to sliding window processing, which extracts from time-series data the data to be analyzed at a given point in time, within stream data processing technology for real-time processing of time-series data continuously generated at a high rate, such as RFID read information, sensor network observation data, network monitoring, system monitoring, stock price monitoring, and traffic information. It is particularly suited to techniques that extract, from a series of observation data continuously generated by monitoring the occurrence of a phenomenon, the data set corresponding to each individual phenomenon on the basis of proximity on the time axis.

FIG. 1 is a block diagram showing a configuration example of a stream data processing system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of data cluster cutout processing by the stream data processing system of FIG. 1.
FIG. 3 is an explanatory diagram of stream data processing by the stream data processing system of FIG. 1.
FIG. 4 is an explanatory diagram showing an example of incoming goods inspection processing based on congested RFID reading, using the stream data processing system of FIG. 1.
FIG. 5 is a flowchart showing another example of data cluster cutout processing by the stream data processing system of FIG. 1.
FIG. 6 is an explanatory diagram showing another example of stream data processing by the stream data processing system of FIG. 1.
FIG. 7 is an explanatory diagram showing an example of data processing by a sensor network according to an embodiment of the present invention.
FIG. 8 is a block diagram showing a configuration example of a stream data processing system that realizes data cutout processing according to an embodiment of the present invention.
FIG. 9 is a flowchart showing an example of data cluster cutout processing by the stream data processing system of FIG. 8.
FIG. 10 is an explanatory diagram of stream data processing by a sensor network using the stream data processing system of FIG. 8.
FIG. 11 is an explanatory diagram showing an example of network monitoring according to an embodiment of the present invention.
FIG. 12 is an explanatory diagram of data cutout processing using the network monitoring of FIG. 11.
FIG. 13 is a flowchart showing an example of data cutout processing by the network monitoring of FIG. 12.
FIG. 14 is a block diagram showing a configuration example of a stream data division system according to an embodiment of the present invention.
FIG. 15 is an explanatory diagram showing an example of stream data processing by the network monitoring of FIG. 12.

Explanation of symbols

110 Stream data processing system
111 Data processing definition registration user interface
120 Analysis target data cutout unit
121 Analysis target data determination unit
122 Input data time storage area
123 Data time comparison unit
124 Analysis target data insertion unit
124C Data insertion unit
125 Non-analysis-target data deletion unit
126 Cluster time storage area
127 Cluster number storage area
128 Time threshold storage area
130 Data analysis unit
131 Data analysis processing execution tree
132 Analysis target data buffer
141 Input stream
143, 144 Result streams
210 Stream data division system
211 Cluster division method definition registration user interface
212 Cluster determination unit
214C Data insertion unit
220 Database
411 Group-specific storage area management table
412 to 414 Group-specific storage areas
601 to 603 Windows
810 Timing window
1320 Timing window
1501 Network router
1502, 1503, 1504 Computers
1530 Timing window
1541, 1542 Data clusters
1551, 1552, 1561 Data clusters

Claims

A stream data processing method for continuously analyzing a stream, the stream being a flow of time-series data in which a plurality of time-stamped data arrive continuously, arranged in a line in ascending order of the time stamps, the method comprising:
a step of calculating, when new data is input, the time difference between the time stamp of the new data and the time stamp of the oldest data among the data belonging to an analysis target cluster, the analysis target cluster being, among data clusters each consisting of a partial sequence of data that arrived consecutively on the time axis of the stream, the data cluster that is subject to analysis processing at a given point in time; and
a step of adding the new data to the analysis target cluster when the calculated time difference is within an arbitrarily set time threshold, deleting all data belonging to the analysis target cluster and generating a data cluster consisting only of the new data as the analysis target cluster when the calculated time difference exceeds the time threshold, and, each time the analysis target cluster is updated, analyzing the data set belonging to the analysis target cluster to process the stream data.

The stream data processing method according to claim 1,
When new data is input, the newest data cluster in the analysis target cluster group consisting of the time stamp of the new data and the number of data clusters as many as the number of simultaneous target clusters specified from the new one. A step of calculating a time difference from a time stamp of the oldest data among data belonging to a certain latest analysis target cluster;
When the calculated time difference is within the time threshold, the new data is added to the latest analysis target cluster, and when the calculated time difference exceeds the time threshold, the analysis target cluster Delete all data belonging to the oldest analysis target cluster, which is the oldest one data cluster in the group, generate a data cluster composed only of the new data and set it as the latest analysis target cluster, and the analysis target cluster group A stream data processing method comprising: performing an analysis on a data set belonging to the analysis target cluster group each time an update occurs, and processing the stream data.

The stream data processing method according to claim 1,
When an arbitrary column is specified, each data is grouped in units of a data set having the same column value, and when new data is input, the data column value is referred to. The analysis target cluster for each group that is the analysis target cluster for each group to which the data belongs is extracted, and the time stamp of the new data and the time stamp of the oldest data among the data belonging to the group analysis target cluster Calculating a time difference;
When the calculated time difference is within the time threshold, the new data is added to the group-by-group analysis target cluster, and when the calculated time difference exceeds the time threshold, the group Delete all data belonging to another analysis target cluster, generate a data cluster consisting only of the new data, and make it a group analysis target cluster, and whenever the analysis target cluster group is updated, the analysis target cluster A stream data processing method comprising: analyzing a data set belonging to a cluster group and processing the stream data.

A stream data processing method for continuously analyzing a stream, the stream being a flow of time-series data in which a plurality of time-stamped data arrive continuously, arranged in a line in ascending order of the time stamps, the method comprising:
a step of calculating, when an arbitrary time threshold is specified and new data is input, the time difference between the time stamp of the new data and the time stamp of the oldest data among the data belonging to an analysis target cluster, the analysis target cluster being, among data clusters each consisting of a partial sequence of data that arrived consecutively on the time axis of the stream, the data cluster that is subject to analysis at a given point in time; and
a step of adding the new data to the analysis target cluster when the calculated time difference is within the time threshold, deleting all data belonging to the analysis target cluster and generating a data cluster consisting only of the new data as the analysis target cluster when the calculated time difference exceeds the time threshold, and, each time the analysis target cluster is updated, analyzing the data set belonging to the analysis target cluster to process the stream data.

The stream data processing method according to claim 4, wherein:
When new data is input, if the time stamp of the new data and a specific number of simultaneous target clusters are specified by the user, an analysis consisting of the specified number of data clusters counted from the newer one A step of calculating a time difference from a time stamp of the newest data in the data belonging to the latest analysis target cluster which is the newest one data cluster among the target cluster group;
When the calculated time difference is within the time threshold, the new data is added to the latest analysis target cluster, and when the calculated time difference exceeds the time threshold, the analysis target Delete all data belonging to the oldest analysis target cluster to be the oldest one of the cluster group, generate a data cluster composed only of the new data, and set as the latest analysis target cluster, A stream data processing method, comprising: performing analysis on a data set belonging to the analysis target cluster group each time the analysis target cluster group is updated, and processing stream data.

The stream data processing method according to claim 4, wherein:
When new data is input, refer to the value of the specified column in the data, and when an arbitrary column is specified, each data is a unit of a data set having the same column value. Among the clusters grouped as follows, the analysis target cluster by group consisting of the analysis target cluster to which the data belongs is extracted, and the time stamp of the new data and the newest data among the data belonging to the analysis target cluster by group are extracted. Calculating a time difference from the time stamp;
When the calculated time difference is within the time threshold, the new data is added to the group-by-group analysis target cluster, and when the calculated time difference exceeds the time threshold, the group Delete all data belonging to another analysis target cluster, generate a data cluster composed only of the new data, and set it as a group analysis target cluster, each time an update of the group analysis target cluster group occurs, A stream data processing method comprising: performing analysis on a data set belonging to the analysis target cluster group and processing stream data.

A stream data processing system that continuously analyzes a stream, the stream being a flow of time-series data in which a plurality of time-stamped data arrive continuously, arranged in a line in ascending order of the time stamps, the system comprising:
a data analysis unit that performs analysis processing on a data set belonging to a data cluster consisting of a partial sequence of data that arrived consecutively on the time axis of the stream;
a time threshold storage area that stores a time threshold;
a data time comparison unit that, when new data arrives, compares the time stamp of the new data with a cluster time and determines which of the two times is earlier, the cluster time being the time obtained by adding the time threshold to the time stamp of the oldest or newest data among the data belonging to an analysis target cluster, which is the data cluster subject to analysis at a given point in time; and
an analysis target data determination unit that, when the time stamp is before the cluster time, adds the new data to the analysis target cluster at that time, and, when the time stamp is after the cluster time, deletes from the data analysis unit all data belonging to the analysis target cluster at that time, makes a data cluster consisting only of the new data the new analysis target cluster, and inserts the new data into the data analysis unit.

The stream data processing system according to claim 7, wherein
It has a simultaneous analysis target cluster number storage area to store any specified number of simultaneous target clusters,
The data time comparison unit
When new data arrives, the time stamp of the new data is the time obtained by adding the time threshold to the oldest or newest data time stamp among the data belonging to the latest analysis target cluster. Compare with the cluster time to determine before and after both times,
When the time stamp is before the cluster time, the analysis target data determination unit adds the new data to the latest analysis target cluster at that time, and when the time stamp is after the cluster time, The oldest one data cluster among the latest analysis target clusters consisting of the newest one data cluster in the analysis target cluster group, which is the number of data clusters as many as the specified number of simultaneous target clusters, counting from the newest one. Delete all data belonging to the oldest analysis target cluster from the data analysis unit, make the data cluster consisting only of the new data a new latest analysis target cluster, and insert the new data into the data analysis unit And a stream data processing system.

The stream data processing system according to claim 7, wherein
When an arbitrary column is specified, each data is grouped in units of data sets in which the column values are the same, and the analysis target cluster for each group is classified by group. A storage area for each group that stores a cluster time obtained by adding the time threshold to the time stamp of the oldest or newest data in the data belonging to the analysis target cluster;
A storage area management unit for each group that stores a storage area management table for each group that holds the correspondence between the value of the column and the storage area for each group;
The data time comparison unit
When new data is input to the stream data processing system, the storage area for each group to which the data belongs is extracted from the storage area management table for each group to which the data belongs by referring to the column value of the data. Compare the time stamp and the cluster time stored in the group storage area to determine before and after both times,
The analysis target data determination unit
When the time stamp is before the cluster time, the new data is added to the group analysis target cluster at that time, and when the time stamp is later than the cluster time, analysis by group at the time is performed. Deleting all data belonging to the target cluster from the data analysis unit, setting a data cluster composed only of the new data as a new group analysis target cluster, and inserting the new data into the data analysis unit. A featured stream data processing system.

The stream data processing system according to claim 7, further comprising:
a cluster number storage area that stores the number of a data cluster consisting of a partial sequence of data that arrived consecutively on the time axis in a stream, the stream being a flow of time-series data in which a plurality of time-stamped data arrive continuously, arranged in a line in ascending order of the time stamps,
wherein, if the time stamp is before the cluster time, the cluster number stored in the cluster number storage area at that time is attached to the data and the data is stored in a database, and, if the time stamp is after the cluster time, 1 is added to the cluster number stored in the cluster number storage area at that time, the resulting cluster number is stored in the cluster number storage area, and the cluster number after the addition is attached to the data and the data is stored in the database, and
wherein the data time comparison unit, when new data arrives, compares the time stamp of the new data with a cluster time and determines which of the two times is earlier, the cluster time being the time obtained by adding the time threshold to the time stamp of the oldest or newest data in the latest cluster, which is the newest data cluster at a given point in time.

The stream data processing system according to claim 10, wherein
the cluster determination unit,
when new data arrives, compares the time stamp of the new data with the cluster time, outputs the new data if the time stamp is before the cluster time, and, if the time stamp is after the cluster time, outputs the new data after sending a signal indicating a delimiter between data sets.