JP5377897B2

JP5377897B2 - Stream data ranking query processing method and stream data processing system having ranking query processing mechanism

Info

Publication number: JP5377897B2
Application number: JP2008174086A
Authority: JP
Inventors: 格西澤; 常之今木; 俊彦樫山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2007-10-29
Filing date: 2008-07-03
Publication date: 2013-12-25
Anticipated expiration: 2028-07-03
Also published as: JP2009134689A

Abstract

PROBLEM TO BE SOLVED: (1) To provide a ranking calculation mechanism that maintains consistency both when stream data is inserted into a window and when data expires from it, and (2) to provide a general-purpose interface for delivering ranking calculation results to an application, together with an output mechanism that follows the interface, so as to secure the versatility of the stream data processing system.

SOLUTION: (1) A mechanism is provided that manages ranking information using the sign of a stream tuple generated when stream data is inserted into, or deleted from, a window. (2) A mechanism for generating only the differential information of ranking calculation results, a mechanism for adding rank information on request, an interface for generating and outputting all ranking information from the differential information, a mechanism for generating all ranking calculation results, and an interface for using these mechanisms are provided.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a ranking calculation method in a stream data processing system that processes continuously arriving stream data in real time, and to a stream data processing system that implements the calculation method.

Conventionally, a database management system (hereinafter, DBMS) has been at the center of data management in enterprise information systems. A DBMS stores the data to be processed in storage and provides highly reliable processing, typified by transaction processing, on the stored data. In contrast, there is growing demand for data processing systems that process large volumes of continuously arriving data in real time. For example, in a financial application that supports stock trading, how quickly the system can react to stock price fluctuations is one of its most important issues. In a system that, like a conventional DBMS, first stores the stock data in a storage device and then searches the stored data, data storage and the subsequent search processing cannot keep up with the speed of stock price fluctuations, and business opportunities may be lost. For example, US Pat. No. 5,495,600 (Patent Document 1) discloses a mechanism in which stored queries are executed periodically, but even with this mechanism it is important to execute a query at the moment data arrives, as with the stock prices mentioned above. That is, because a lag between the query execution cycle and the timing of data processing cannot be tolerated, the mechanism has been difficult to apply to real-time data processing as typified by the financial application. The approach of building each real-time application individually in a programming language such as Java (R) suffers from long development periods, high development costs, and difficulty in responding quickly to changes in the business operations that use the application, so a general-purpose real-time data processing mechanism has been sought.

A stream data processing system has been proposed as a data processing system suited to such real-time data processing. For example, the stream data processing system STREAM is disclosed in R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma: "Query Processing, Resource Management, and Approximation in a Data Stream Management System", In Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), January 2003 (Non-Patent Document 1).

In a stream data processing system, unlike a conventional DBMS, a query is first registered in the system and is then executed continuously as data arrives. Stream data here does not mean a single large, logically continuous piece of data such as a video stream; it means a large volume of relatively small, logically independent time-series data, such as stock price feeds in financial applications, POS data in retail, probe car data in traffic information systems, error logs in computer system management, and sensing data generated by ubiquitous devices such as sensors and RFID. Because stream data keeps arriving at the system, real-time processing is impossible if processing starts only after the stream ends. In addition, data that has arrived at the system must be processed in arrival order, regardless of the data processing load. To process continuously arriving stream data in real time while cutting out part of it, by specifying either a time width such as the latest 10 minutes or a count such as the latest 1000 items, STREAM introduces a concept called a sliding window (hereinafter simply a window). A suitable example of a query description language that includes window specification is CQL (Continuous Query Language), disclosed in Non-Patent Document 1. CQL extends the FROM clause of SQL (Structured Query Language), which is widely used in DBMSs, so that a window can be specified in brackets following a stream name. For details of SQL, see C. J. Date, Hugh Darwen: "A Guide to SQL Standard (4th Edition)", Addison-Wesley Professional; 4 edition (November 8, 1996), ISBN: 0201964260 (Non-Patent Document 2).

Query 1201 in FIG. 12 is an example of a CQL query shown in Section 2.1 of Non-Patent Document 1. The query calculates, at a certain Web proxy server, the total number of accesses from the domain stanford.edu over the past day up to the present. Requests is the Web access data that keeps arriving at the Web proxy server; it is not static data like a table handled by a conventional DBMS but unbroken stream data. The total number of accesses therefore cannot be calculated unless the window specification "[Range 1 Day Preceding]" indicates which part of the stream data is the target. The stream data cut out by the window is held in memory and used for query processing. Typical window specifications are the Range window, which specifies the window width as a time, and the Row window, which specifies it as a number of data items. For example, with a Range window, [Range 10 minutes] makes the latest 10 minutes of data the target of query processing; with a Row window, [Rows 10] makes the latest 10 items the target.
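The difference between the two window types can be illustrated with a minimal Python sketch. The function names, the `(timestamp, payload)` tuple layout, and the choice of a half-open interval for the Range window are assumptions made for illustration; they are not taken from CQL or from this patent.

```python
from collections import deque

def range_window(tuples, now, width):
    """Range window: keep tuples whose timestamp lies in (now - width, now].
    The half-open boundary is an assumption of this sketch."""
    return [(ts, v) for ts, v in tuples if now - width < ts <= now]

def rows_window(tuples, n):
    """Row window: keep the latest n tuples (input assumed to be in arrival order)."""
    return list(deque(tuples, maxlen=n))

# Hypothetical stream of (arrival time in minutes, payload).
stream = [(0, "a"), (4, "b"), (9, "c"), (12, "d"), (15, "e")]

in_range = range_window(stream, now=15, width=10)  # analogue of [Range 10 minutes]
in_rows = rows_window(stream, 2)                   # analogue of [Rows 2]
```

Here the Range window content depends only on the clock, while the Row window content depends only on how many tuples have arrived.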

Stream data processing systems are expected to be applied to applications that require real-time processing, typified by financial applications, retail sales monitoring, traffic information systems, and computer system management. Hereinafter, an application that requires real-time processing is called a real-time application. Real-time applications often need a ranking calculation at a given instant in order to extract highly important information from a huge amount of information immediately. In financial applications, for example, ranking information that draws attention to stocks with large price movements or trading volumes is important, and in retail sales monitoring, rankings of sales amounts and unit sales from various angles, such as by store or by product, attract attention. Likewise, traffic information systems need ranking information to identify areas with heavy congestion or high traffic volume, and computer system management needs ranking information, such as the number of serious errors or the number of accesses, to prioritize the managed targets.

When the data subject to ranking calculation is static, that is, when the data to be ranked does not change, it suffices to sort the data by the key used for ranking (hereinafter, ranking key) and output the data in the order of the sorted result. For example, to calculate the top 10 stocks by sales amount from data stored in a database, the sales amount for the day is aggregated for each stock issue, the aggregated results are sorted with the sales amount as the ranking key, and the top 10 items are selected and output. US Pat. No. 7,251,648 (Patent Document 2) discloses a method for automatically determining the ranking key (the sales amount in the above example) from a query submitted by a user. US Published Patent Application US2006/0259457 (Patent Document 3) discloses a method for reducing unnecessary data processing cost by adding a condition during query processing when a DBMS query specifies that only the first n rows are to be output. SQL also provides the GROUP BY clause for classification by stock issue, the aggregate function SUM for calculating total sales, and the ORDER BY clause for sorting by the aggregated value. By combining these, ranking results in descending (or ascending) order of sales amount can be generated from one day of stock trading data stored in the database.
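The static case described above (classify, aggregate, sort, then take the top n) can be sketched in a few lines of Python. The function name, the `(symbol, amount)` record layout, and the sample trades are hypothetical; the comments indicate which SQL construct each step corresponds to.

```python
from collections import defaultdict

def top_n_by_sales(trades, n):
    """Rank static trade data: group by symbol, sum amounts, sort descending, keep n."""
    totals = defaultdict(int)
    for symbol, amount in trades:
        totals[symbol] += amount                      # GROUP BY symbol, SUM(amount)
    ranked = sorted(totals.items(),
                    key=lambda kv: kv[1],
                    reverse=True)                     # ORDER BY total DESC
    return ranked[:n]                                 # first n rows only

# Hypothetical one-day trade data: (stock symbol, sales amount).
trades = [("A", 100), ("B", 250), ("A", 300), ("C", 50), ("B", 10)]
top2 = top_n_by_sales(trades, 2)
```

Every call rescans all trades, which is acceptable for static data but is the cost that becomes prohibitive when new data keeps arriving.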

In real-time applications, however, new data (stream data) keeps arriving, so it is difficult to treat the data as static. To perform ranking calculation for a real-time application with a DBMS, each item of stream data must be stored in the DBMS as it arrives, and the DBMS must then perform the classification, aggregation, and sorting described above. These operations basically require access to a large amount of data in the database, so their processing cost is high. Consequently, when stream data from a real-time application arrives at high speed, that is, when the interval between arrivals is short, the processing cannot be completed within that interval, and it has been difficult to implement ranking calculation for real-time applications with a DBMS.

As described above, a stream data processing system processes an endless stream by cutting out the data to be processed with a window. Only the data currently in the window is subject to processing, and data pushed out of the window must be removed from the ranking. The timing at which data is pushed out depends on whether the window is specified by time (the Range window described above) or by count (the Row window described above). With a count specification, the time at which a data item is pushed out of the window cannot be determined at the moment the item enters the window; it is determined by subsequent stream data. With a time specification, the time at which a data item is pushed out can be determined at the moment it enters the window, but that removal time is not synchronized with the arrival of subsequent data.

In ranking calculation, the ranking must be recalculated each time stream data is inserted into the window so that the ranking information stays consistent. In addition, the consistency of the ranking information must likewise be maintained when data expires from the window. In particular, when the window is specified by time, the ranking calculation must take into account expirations from the window that are not synchronized with the arrival of subsequent data.

Furthermore, the ranking calculation must be efficient enough to satisfy the real-time constraint, which is one of the main reasons for using a stream data processing system. In addition, because a stream data processing system is a general-purpose data processing platform, it must provide a general-purpose interface that meets the needs of the applications using it, such as passing only the difference information of the ranking results, passing the entire ranking result, or including rank information in the ranking result, together with a mechanism that performs processing according to that interface. No ranking calculation method for stream data processing systems satisfying all of these conditions had been realized so far.

Patent Document 1: US Pat. No. 5,495,600
Patent Document 2: US Pat. No. 7,251,648
Patent Document 3: US Published Patent Application US2006/0259457
Patent Document 4: JP 2006-338432 A
Non-Patent Document 1: R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma: "Query Processing, Resource Management, and Approximation in a Data Stream Management System", In Proc. of the 2003 Conf. on Innovative Data Systems Research (CIDR), January 2003
Non-Patent Document 2: C. J. Date, Hugh Darwen: "A Guide to SQL Standard (4th Edition)", Addison-Wesley Professional; 4 edition (November 8, 1996), ISBN: 0201964260

When a stream data processing system is used to implement the ranking calculation required by real-time applications, the ranking processing must maintain consistency not only when stream data is inserted into the window but also when that stream data expires. Moreover, to obtain ranking results that can be called real-time, the ranking update that is executed each time the contents of the target window change must be made fast.

An object of the present invention is to provide a ranking processing method and system that speed up the ranking update performed each time the contents of the target window change, while maintaining the consistency of the processing results.

Among the inventions disclosed in this application, a representative one is outlined as follows. In the representative embodiment, each time stream data is inserted into or deleted from the window, that is, each time the lifetime of a stream tuple starts and each time the lifetime of a stream tuple ends, the ranking is generated and updated over all stream tuples that are within their lifetimes. The stream tuples kept in the buffer are not limited to the range of ranks specified for output; all stream tuples within their lifetimes are kept.

If only the output of ranking information at a single point in time is considered, it may seem sufficient to keep ranking information only for the stream tuples within the ranks specified for output. However, accepting a new stream tuple causes insertions into and deletions from the set of stream tuples in the window, and the ranking itself changes with each insertion and deletion. To keep performing consistent ranking calculation, the ranking information must therefore be updated on every insertion and deletion, and the stream tuples and their ranking information must be kept for all stream tuples within their lifetimes, beyond the range of ranks specified for output.
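The need to retain tuples beyond the output range can be seen in a small worked example. The sketch below is illustrative only: the tuple identifiers, the values, and the helper `top_n` are hypothetical, and a full re-sort is used for brevity rather than any incremental structure described in this document.

```python
def top_n(live, n):
    """Return the n highest-valued (id, val) pairs among the given live tuples."""
    return sorted(live.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Four tuples are currently within their lifetimes; the query outputs the top 3.
live = {"a": 10, "b": 40, "c": 30, "d": 20}
before = top_n(live, 3)

# A buffer that (incorrectly) keeps only the tuples that were in the output range.
truncated = dict(before)

# Tuple "c" reaches the end of its lifetime and is deleted from both buffers.
del live["c"]
del truncated["c"]

after_full = top_n(live, 3)            # "a" moves up into third place
after_truncated = top_n(truncated, 3)  # third place cannot be filled: "a" was discarded
```

With the full buffer the deletion promotes the fourth-ranked tuple into the output; with the truncated buffer the third rank is simply lost.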

Furthermore, the representative embodiment has a two-stage processing mechanism consisting of a window manager and a ranking processing module. The window manager inserts each received stream tuple into the window as a target of ranking calculation, determines the tuple's lifetime in the window, and deletes the tuple from the window when its lifetime ends; the ranking processing module performs the ranking calculation. A characteristic feature is that the window manager passes to the ranking processing module not the information on all stream tuples in the window but window difference information indicating the part that changes from moment to moment, and the ranking processing module updates the ranking using the received window difference information together with information saved from past ranking calculations. Specifically, a stream tuple tagged with a sign indicating that it has been inserted into the window, or a stream tuple tagged with a sign indicating that it has been deleted from the window, is passed to the ranking processing module as the window difference information. The ranking processing module updates the ranking information based on this difference information, saves it in a ranking information holding buffer, and outputs ranking output information in the specified format.

With the present invention, a stream data processing system that processes large volumes of continuously arriving data in real time can perform fast, efficient ranking calculation while remaining consistent with the input stream data. Applying this ranking calculation method makes it possible to provide a data processing platform that real-time applications can use in common.

FIGS. 1 and 16 show a preferred implementation of the stream data processing system of the present invention. A client computer 101 that executes application 1 (102) and a client computer 103 that executes application 2 (104) are connected to a stream data processing system 115 via a network 107. The network 107 may be a local area network (LAN) connected by Ethernet, optical fiber, or the like, or a wide area network (WAN), including the Internet, that is slower than a LAN. The client computers may be any computer systems, such as personal computers or blade computer systems.

The computer on which the stream data processing system of this embodiment runs is called a stream data processing server. As shown in FIG. 16, the stream data processing server 1601 is a computer comprising a communication interface 1602 such as an Ethernet adapter, a CPU 1603, a memory 1604, and an I/O interface 1605, and may be any computer system such as a blade computer system or a PC server. The stream data processing server accesses the client computers and the data sources described later through the communication interface. When the stream data processing server stores stream data processing results, intermediate results, or the configuration data needed for system operation in nonvolatile storage, it can use a storage device 1606 connected to the server. The storage device 1606 is either connected directly via the I/O interface of the stream data processing server or connected over a network via the communication interface.

The stream data processing system 115 runs on the stream data processing server 1601. FIG. 1 shows the main components of the stream data processing system. An application first registers a query in the stream data processing system (105). The registered query is stored in the query management table (112). A suitable implementation of the detailed query registration procedure, the method and format for storing data inside the stream data processing system, the methods for analyzing and optimizing a query after it is accepted and registering it in the system, the method for registering a stream in the stream data processing system, and the method for holding data in the system is disclosed in JP 2006-338432 A, "Query processing method of stream data processing system" (Patent Document 4). The query management table may be held in the memory 1604 on the stream data processing server or stored in the storage device 1606 connected to the stream data processing server.

Large volumes of data arrive continuously at the stream data processing system via a network 121 from one or more stream data sources, stream data source 1 (122) through stream data source N (123). This data is called stream data. Suitable examples of stream data include stock price feeds in financial applications, POS data in retail, probe car information in traffic information systems, and error logs in computer system management. In the stream data processing system, the stream data that the data flow manager 119 receives via the communication interface 1602 is fed to the query processing engine 113.

As described above, the stream data processing system uses windows to handle stream data, which is a large volume of relatively small, logically independent time-series data that arrives continuously. The window manager 126 in FIG. 1 applies the window operation specified in the query to the incoming stream data to generate stream tuples, and sets the lifetime of each stream tuple in the system. The time at which stream data is inserted into the window is the start of its lifetime, and the time at which it is deleted from the window is the end of its lifetime.

The configuration and operation of the window manager are described with reference to FIG. 15. The window manager 126 comprises a stream data reception interface 1505, a stream tuple holding buffer 1502, a lifetime determination unit 1504, and a difference information generation unit 1506. The stream data reception interface stores the stream tuples that make up the received stream data in the stream tuple holding buffer 1502 and notifies the lifetime determination unit 1506 that they have been stored. The lifetime determination unit 1506 determines the lifetime of each stream tuple by the window operation and erases stream tuples whose lifetimes have ended from the stream tuple holding buffer 1502. The difference information generation unit 1504 generates a plus tuple at the time a stream tuple is stored in the stream tuple holding buffer and outputs it as difference information (1508). Similarly, it generates a minus tuple at the time a stream tuple is erased from the stream tuple holding buffer (the time at which the stream tuple's lifetime ends) and outputs it as difference information in the same way (1508).
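The plus-tuple and minus-tuple behavior described above can be sketched for a time-based (Range) window as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `RangeWindowManager`, the methods `on_arrival` and `on_clock`, the `(sign, time, payload)` output layout, and the rule that a tuple arriving at time `ts` expires at `ts + width` are all hypothetical.

```python
from collections import deque

class RangeWindowManager:
    """Emits signed difference tuples for a time-based window (illustrative sketch)."""

    def __init__(self, width):
        self.width = width
        self.buffer = deque()  # stream tuple holding buffer: (expiry_time, payload)

    def _expire(self, now):
        """Erase tuples whose lifetime has ended and emit a minus tuple for each."""
        diffs = []
        while self.buffer and self.buffer[0][0] <= now:
            expiry, payload = self.buffer.popleft()
            diffs.append(("-", expiry, payload))
        return diffs

    def on_arrival(self, ts, payload):
        """Handle an arriving tuple: expire old tuples first, then emit a plus tuple."""
        diffs = self._expire(ts)
        self.buffer.append((ts + self.width, payload))  # lifetime fixed at insertion
        diffs.append(("+", ts, payload))
        return diffs

    def on_clock(self, now):
        """Handle the passage of time with no arrival: expirations only."""
        return self._expire(now)

wm = RangeWindowManager(width=10)
d1 = wm.on_arrival(0, "a")
d2 = wm.on_arrival(4, "b")
d3 = wm.on_clock(10)        # "a" expires although nothing has arrived
d4 = wm.on_arrival(15, "c") # "b" expired at 14, then "c" is inserted
```

The `on_clock` path reflects the point made earlier in the description that, for a time-based window, expiration is not synchronized with the arrival of later data.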

As shown in FIG. 16, the stream tuple holding buffer 1502 is placed in memory allocated by the memory manager 117 in the query processing engine 113. This memory may be the memory 1604 on the stream data processing server 1601 or, depending on the required performance and reliability, the storage device 1606 connected to the stream data processing server, or memory on a server computer (block 133 in FIG. 1) that is connected to the stream data processing server over a network and has computer resources similar to those of the stream data processing server.

Next, the configuration of the ranking processing module is described with reference to FIG. 2. The ranking processing module 116 comprises a stream tuple reception interface 206, an order/rank generation unit 204, a ranking information holding buffer 202, a ranking information holding table 203, a rank management index 208, and a ranking processing result output interface 207. As shown in FIG. 16, the ranking processing module is placed in memory allocated by the memory manager 117 in the query processing engine 113. This memory may be the memory 1604 on the stream data processing server 1601 or, depending on the required performance and reliability, the storage device 1606 connected to the stream data processing server, or memory on a server computer (block 133 in FIG. 1) that is connected to the stream data processing server over a network and has computer resources similar to those of the stream data processing server.

FIG. 17 shows the sequence of ranking information generation in the stream data processing of the present invention. The window manager 126 accepts stream data arriving from an external data source (1701) and generates difference information (signed stream tuples) (1702). In the ranking processing module 116, the stream tuple reception interface 206 receives the difference information (signed stream tuples) output from the window manager and forwards it to the order/rank generation unit 204. The order/rank generation unit, referring to the ranking information holding buffer 202, either adds the received signed stream tuple at the appropriate position in the ranking buffer or deletes the ranking information at the appropriate position in the ranking buffer, generates the difference in ranking information relative to the previous output (ranking difference information) (1705), and forwards this ranking difference information to the ranking processing result output interface 207 (1706). The ranking processing result output interface outputs the ranking information in the data output format specified by the query.

The ranking calculation method of this embodiment is described below for a specific query and specific stream data. Query 301 shown in FIG. 3(a) is a query that requests ranking processing without rank output (that is, without outputting the rank itself). The SELECT clause on the first line specifies that pairs of s.id and s.val values are to be output; the FROM clause on the second line specifies stream s as the input; the Partition By clause, also on the second line, specifies grouping by the value of s.id and holding the latest one s.val value for each group; and the LIMIT clause on the third line specifies that three items are output in descending order of s.val. In the stream data processing system of this embodiment, a query entered in advance specifies the column on which the ranking is calculated, the ranking direction (ascending or descending), the number of ranking results to output, and whether rank information is attached to the results. In query 301, the LIMIT clause on the third line is this ranking specification. Furthermore, the IDSTREAM clause on the first line means that the query outputs only the difference from the previous output.

The processing performed when stream data arrives at a stream data processing system in which query 301 has been set is described with reference to FIG. 4 and FIG. 5(a). When query 301 is registered in the stream data processing system 115, query analysis, query optimization, and query generation are executed, for example by the method disclosed in Patent Document 4, and the executable form of the query is registered in the query processing engine. Because query 301 includes a ranking calculation specification, the ranking calculation is executed by the ranking processing module 116 in the query processing engine 113 when the executable query runs. As described in Patent Document 1, in a stream data processing system a registered query keeps running on the system, and its state changes each time an item of stream data arrives.

Assume that the stream data shown in FIG. 4 arrives while query 301 is registered. FIG. 4 and FIG. 5(a) plot time on the vertical axis and schematically show how stream data arriving at the system is processed by each processing module as time passes. As shown in legends 402 and 502, this embodiment assumes that incoming stream data has the form {id, val}, drawn as an ellipse. t1 to t7 (401) are the times at which stream data 405 to 411 arrive at the stream data processing system. For example, stream data {a, 50} (405) and stream data {b, 10} (406) arrive at times t1 and t2, respectively. The horizontal axis of FIG. 4 represents the position at which incoming stream data is processed and the data that is generated. The rounded rectangle (1502) on the left of FIG. 4 shows the state, at each time, of the stream tuple holding buffer, which stores the result of the window manager (126) applying the window operation "[Partition By s.id Rows 1]" (403) to the arriving stream data. The rounded rectangle (202) on the right shows the state, at each time, of the ranking information holding buffer, which stores the result of the order/rank generation unit (204) in the ranking processing module (116) applying the ranking specification clause "LIMIT 3 By s.val DESC" (404) to the stream tuples coming from the window manager.

As described above, the window manager 126 applies the window operation specified by the query to incoming stream data to generate stream tuples, and sets the lifetime of each stream tuple within the system. In the example of FIG. 4, the start of a lifetime is drawn as a black circle (426) and the end as a white circle (427); under this window operation, the lifetime of stream tuple {a, 50} (405) begins at time t1 and ends at time t7. In this implementation, at the start of the lifetime of an item of stream data, the system internally generates a tuple carrying a sign that denotes an addition (hereinafter, a plus tuple). When stream data is removed from the window, a tuple carrying a sign that denotes a removal (hereinafter, a minus tuple) is generated, holding a reference to the previously output plus tuple. This processing is executed by the window manager. In FIG. 4, for example, the plus tuple 430 corresponding to stream data {a, 50} (405) is generated at time t1, and the minus tuple 431 for the same stream data is generated at time t7. The query processing that follows the window operation is executed, at the moment a plus or minus tuple arrives, on the difference information arising from that tuple. The concept of plus and minus tuples is introduced in Non-Patent Document 1 mentioned above.
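The generation of plus and minus tuples can be illustrated with a minimal Python sketch. It assumes a "Partition By id, Rows 1" window and represents a signed tuple as a pair of a sign string and the {id, val} data; the function name `partition_by_rows1` and this representation are illustrative assumptions, not the implementation of the window manager 126. Where the text says a minus tuple holds a reference to the earlier plus tuple, the sketch simply reuses the same data object.

```python
from typing import Dict, Iterable, List, Tuple

StreamData = Tuple[str, int]           # (id, val), as in the FIG. 4 legend
SignedTuple = Tuple[str, StreamData]   # ("+" or "-", (id, val)); hypothetical encoding


def partition_by_rows1(arrivals: Iterable[StreamData]) -> List[List[SignedTuple]]:
    """Return, per arrival, the signed tuples a per-id Rows 1 window would emit."""
    window: Dict[str, StreamData] = {}  # latest item held for each id
    emitted: List[List[SignedTuple]] = []
    for item in arrivals:
        id_ = item[0]
        batch: List[SignedTuple] = []
        if id_ in window:
            # The older item with the same id leaves the window: minus tuple.
            batch.append(("-", window[id_]))
        window[id_] = item
        # The new item starts its lifetime: plus tuple.
        batch.append(("+", item))
        emitted.append(batch)
    return emitted


# Arrival sequence t1..t7 described for FIG. 4.
ARRIVALS = [("a", 50), ("b", 10), ("c", 30), ("d", 20),
            ("e", 40), ("e", 15), ("a", 45)]
```

With this input, the seventh arrival yields a minus tuple for {a, 50} followed by a plus tuple for {a, 45}, which corresponds to the lifetime of {a, 50} ending at t7.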

In query 301, the window operation "[Partition By s.id Rows 1]" (403) first ensures that only the latest one item of incoming stream data is held for each value of s.id. Specifically, in FIG. 4, stream data {a, 50} (405) arrives at time t1, {b, 10} (406) at t2, {c, 30} (407) at t3, {d, 20} (408) at t4, and {e, 40} (409) at t5. These five items have distinct s.id values, so each is held in the window. When new stream data {e, 15} (410) arrives at time t6, stream data 409, whose s.id is e and which had been held in the window until then, is pushed out of (deleted from) the window. When {a, 45} (411) arrives at time t7, {a, 50} (405) is likewise deleted from the window. The central part 429 of FIG. 4 illustrates this. For example, stream data {a, 50} (412) arrives at time t1 and is held in the window until {a, 45} (418) arrives at time t7. The lifetime of stream data {a, 50} is then expressed as t1 to t7 (time t7 itself excluded). In FIG. 4, the start of a lifetime is drawn as a black circle (426), the end as a white circle (427), and the lifetime itself as a solid line. Likewise, the lifetime of {e, 40} (416) is t5 to t6. For {b, 10} (413), {c, 30} (414), {d, 20} (415), {e, 15} (417), and {a, 45} (418), by contrast, the end time has not yet been determined at time t7, so they are drawn as solid lines.

How ranking information is generated from stream data with lifetimes is described next, again using FIG. 4. The right side 432 of FIG. 4 shows the result of processing by the ranking specification clause "LIMIT 3 By s.val DESC" (404). As stated above, this clause extracts the three items with the largest s.val values. Stream data {a, 50} (419), {b, 10} (420), and {c, 30} (421) arrive at times t1, t2, and t3, respectively, and each is output as part of the top-three ranking information. When {d, 20} (422) arrives at time t4, its s.val of 20 is greater than the 10 of the currently held {b, 10}, so {b, 10} is removed from the top-three ranking information, its lifetime ends at t4 (428), and {d, 20} is output as a ranking result. When {e, 40} (423) arrives at time t5, its s.val of 40 places it in the top three, so it is output as ranking information, and {d, 20}, which has the smallest value among the three held until then, is removed from the ranking. At time t6, however, the arrival of {e, 15} (417) deletes {e, 40} (416) from the window as described above, so {d, 20} (424) returns to the top three and is output again as a ranking result. At time t7, {a, 50} (412) is replaced by {a, 45} (418); because the s.val of 45 is still within the top three, {a, 45} (425) is output as a ranking result.
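The walk-through above can be checked with a short Python sketch that recomputes the top three after every arrival. It deliberately re-sorts the whole window each time, which is a simplification for illustration and not the incremental buffer maintenance of the embodiment; the function name `top_k_over_time` is hypothetical.

```python
from typing import Dict, List, Tuple

StreamData = Tuple[str, int]  # (id, val)


def top_k_over_time(arrivals: List[StreamData], k: int = 3) -> List[List[StreamData]]:
    """Return the top-k (by val, descending) of the per-id Rows 1 window after each arrival."""
    window: Dict[str, int] = {}   # latest val held for each id
    history: List[List[StreamData]] = []
    for id_, val in arrivals:
        window[id_] = val         # a newer item replaces the older one with the same id
        ranked = sorted(window.items(), key=lambda kv: kv[1], reverse=True)
        history.append(ranked[:k])
    return history


# Arrival sequence t1..t7 described for FIG. 4.
ARRIVALS = [("a", 50), ("b", 10), ("c", 30), ("d", 20),
            ("e", 40), ("e", 15), ("a", 45)]
```

For this input, {d, 20} is in the top three at t4, absent at t5, and present again at t6, matching the description of 422 and 424.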

A preferred way of implementing the ranking calculation above is described with reference to FIG. 2, FIG. 9, FIG. 10, FIG. 18, and FIG. 19. The stream tuple reception interface 206 of the ranking processing module 116 receives a stream tuple, which is difference information (902). The order/rank generation unit 204 then performs buffer maintenance on the ranking information holding buffer 202 (903). The buffer maintenance process is described with reference to FIG. 2 and FIG. 10. On receiving a stream tuple, the order/rank generation unit 204 checks whether its sign is plus or minus (1002). If the sign is plus (Yes at 1002), the received stream tuple is added to the ranking information holding table 203 of the ranking information holding buffer 202 (1003), and the buffer maintenance process ends. The details of adding a stream tuple to the ranking information holding buffer are described using the flowchart of FIG. 18. On receiving a stream tuple to be added (a stream tuple with a plus sign), the order/rank generation unit checks whether a rank management index 208 has been assigned to the destination ranking information holding table 203 (1802). The rank management index may be a B+tree index or a hash index keyed on the column being ranked. If a rank management index exists (Yes at step 1802), the order/rank generation unit uses it to determine where in the ranking information holding table the stream tuple is to be inserted (1803). If no index exists (No at step 1802), the unit searches the ranking information holding table to determine the insertion position (1804). Having determined the insertion position, the unit inserts the stream tuple into the ranking information holding table (1805); if a rank management index exists (Yes at step 1806), it also updates the index, and the process of adding a stream tuple to the ranking information holding buffer ends (1808).
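The addition flow of FIG. 18 can be sketched in Python as follows. This is an illustration under stated assumptions: the table is a list of (id, val) pairs kept in descending order of val, and the rank management index is stood in for by a parallel sorted list of negated keys searched with the standard `bisect` module, rather than the B+tree or hash index the text mentions. All function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

StreamTuple = Tuple[str, int]  # (id, val)


def position_by_scan(table: List[StreamTuple], val: int) -> int:
    """Step 1804 analogue: scan the table (sorted by val, descending) for the insert position."""
    for pos, (_id, held_val) in enumerate(table):
        if held_val < val:
            return pos
    return len(table)


def position_by_index(neg_keys: List[int], val: int) -> int:
    """Step 1803 analogue: binary search over an ascending list of negated keys."""
    return bisect.bisect_right(neg_keys, -val)


def add_plus_tuple(table: List[StreamTuple],
                   neg_keys: Optional[List[int]],
                   tup: StreamTuple) -> int:
    """Insert a plus tuple and return its position; neg_keys is None when no index exists."""
    val = tup[1]
    if neg_keys is not None:                 # step 1802: does an index exist?
        pos = position_by_index(neg_keys, val)
    else:
        pos = position_by_scan(table, val)
    table.insert(pos, tup)                   # step 1805
    if neg_keys is not None:                 # step 1806: keep the index in sync
        neg_keys.insert(pos, -val)
    return pos
```

Both search paths place a tuple after any held tuples with an equal key, so they return the same position for the same input.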

In a preferred implementation of the ranking information holding buffer, the added stream tuples are kept in order of the value of the column designated for ranking. The order is maintained at insertion time because building it lazily in a batch would make it difficult to output results immediately, which real-time applications require. For example, in query 301 of FIG. 3 the column to be ordered (the ranking key mentioned above) is s.val, so the ranking information holding table (203) in the ranking information holding buffer of FIG. 2 holds rank information and stream tuples in descending order of s.val.

Returning to FIG. 10, if the sign of the received stream tuple is minus (No at 1002 in FIG. 10), the plus-signed tuple corresponding to that stream tuple is deleted from the ranking information holding buffer (202) (1004), and the ranking information holding buffer maintenance process ends.

The details of deleting a stream tuple from the ranking information holding buffer are described using the flowchart of FIG. 19. The order/rank generation unit checks whether a rank management index 208 has been assigned to the ranking information holding table 203 (1902). If a rank management index exists (Yes at step 1902), the unit uses it to search the ranking information holding table for the stream tuple to be deleted (1903). If no index exists (No at step 1902), the unit searches the ranking information holding table itself to identify the stream tuple to be deleted (1904). Once the stream tuple to be deleted has been identified, the unit deletes it (1905). If a rank management index exists (Yes at step 1906), the index is also updated, and the process of deleting a stream tuple from the ranking information holding buffer ends.
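The deletion flow of FIG. 19 can be sketched in the same spirit. As an assumption for illustration, the table is a list of (id, val) pairs in descending order of val, and the rank management index is again represented by a parallel ascending list of negated keys searched with `bisect`; the names `locate` and `delete_for_minus_tuple` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

StreamTuple = Tuple[str, int]  # (id, val)


def locate(table: List[StreamTuple],
           neg_keys: Optional[List[int]],
           target: StreamTuple) -> int:
    """Find the held plus tuple that matches the data carried by a minus tuple."""
    if neg_keys is None:
        return table.index(target)           # step 1904: search the table itself
    # Step 1903: jump to the first entry with the same key, then match on the whole tuple.
    pos = bisect.bisect_left(neg_keys, -target[1])
    while pos < len(table) and table[pos][1] == target[1]:
        if table[pos] == target:
            return pos
        pos += 1
    raise ValueError(f"{target!r} is not held in the ranking table")


def delete_for_minus_tuple(table: List[StreamTuple],
                           neg_keys: Optional[List[int]],
                           target: StreamTuple) -> int:
    """Delete the matching tuple (step 1905) and update the index if present (step 1906)."""
    pos = locate(table, neg_keys, target)
    del table[pos]
    if neg_keys is not None:
        del neg_keys[pos]
    return pos
```

Applied to the table state at t5, deleting {e, 40} removes the second entry from the top, as the description of 1412 states.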

FIG. 14 is now used to describe how the ranking information held in the ranking information holding table changes when a stream arrives, following the time chart of FIG. 4, at a stream data processing system in which the query of FIG. 3(a) is registered. In FIG. 14, t0 (1416), t1 (1404), ..., t7 on the left are times; the signed ellipses in the center (1405, 1407, ...) are the difference information (signed stream tuples) generated by the window manager; and the tables on the right (1417, 1406, ..., 1415) are the ranking information held in the ranking information holding table. For convenience, it is assumed that no ranking information existed at t0, before time t1.

When stream tuple {a, 50} (1405) with a plus sign arrives at time t1 (1404), it is registered in the ranking information holding table (1406). When stream tuple {b, 10} (1407) with a plus sign arrives at time t2 (1418), its ranking value s.val of 10 is compared with the s.val of 50 of the stream tuple already held in the table; the insertion position is determined to be immediately after {a, 50} (1405), and {b, 10} is inserted there (1408). In the same way, when {c, 30}, {d, 20}, and {e, 40} arrive at t3, t4, and t5, respectively, these stream tuples are registered in sorted order of s.val.

As shown in FIG. 4, when {e, 15} (410) arrives at time t6, the Rows window query of FIG. 3(a), which specifies one row, deletes {e, 40} from the window (433). Consequently, the stream tuple reception interface 206 of the ranking processing module (116) shown in FIG. 14 receives the stream tuple {e, 40} (1413) with a minus sign and the stream tuple {e, 15} (1414) with a plus sign. The order/rank generation unit 204 searches for and identifies the plus-signed stream tuple corresponding to the minus-signed {e, 40} (second from the top of 1412) and deletes it. It then determines the insertion position of the new plus-signed stream tuple {e, 15} (1414) (fourth from the top of 1415) and adds it to the ranking information holding table. Time t7 is handled in the same way.

This embodiment has shown the case in which the data structure in the ranking information holding buffer is a table, but another preferred implementation is, for example, a binary search tree whose nodes are ranking keys, as shown in FIG. 13. In FIG. 13, a node (ranking key) of the binary search tree is drawn as a square 1301, and the data body (stream tuple) pointed to by the node as a circle-and-ellipse pair 1302. When a binary search tree is used, with ranking keys as nodes, the tree is built so that the ranking keys of a node's left child and all of that child's descendants are smaller than the node's ranking key, and those of the right child and all of its descendants are equal to or greater than the node's ranking key. The binary search tree in the rounded rectangle 1303 of FIG. 13 is an example of how the data held in the ranking information holding buffer at time t5 of FIG. 4 is arranged as a binary search tree; its contents are the same as those of the ranking information holding table 1412 of FIG. 14. Managing the order by ranking key with a binary search tree makes maintaining the order relation more efficient when plus tuples and minus tuples arrive. As a further suitable data structure for holding and managing the order of stream tuples, the B+-Tree used in the data management mechanisms of many DBMSs may also be used.
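The ordering rule for the binary search tree (smaller keys to the left, equal or greater keys to the right) can be sketched in Python as below. This is a minimal unbalanced tree covering insertion and descending traversal only; the class and function names are hypothetical, and the actual tree shape drawn in FIG. 13 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

StreamTuple = Tuple[str, int]  # (id, val)


@dataclass
class Node:
    key: int                        # ranking key (s.val)
    data: StreamTuple               # data body pointed to by the node
    left: Optional["Node"] = None   # subtree with keys smaller than key
    right: Optional["Node"] = None  # subtree with keys equal to or greater than key


def insert(root: Optional[Node], tup: StreamTuple) -> Node:
    """Insert a plus tuple, keeping smaller keys left and equal or greater keys right."""
    key = tup[1]
    if root is None:
        return Node(key, tup)
    if key < root.key:
        root.left = insert(root.left, tup)
    else:
        root.right = insert(root.right, tup)
    return root


def descending(root: Optional[Node]) -> Iterator[StreamTuple]:
    """Yield the held tuples from the largest ranking key to the smallest."""
    if root is not None:
        yield from descending(root.right)
        yield root.data
        yield from descending(root.left)
```

Inserting the five tuples held at t5 and traversing in descending order gives the same sequence as the table form, so the first three items of the traversal are the top-three ranking. A production version would also need deletion for minus tuples and some form of balancing, neither of which is shown.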

The ranking information holding buffer must hold not just the number of items the user asked to have output, but every stream tuple corresponding to stream data that is within its lifetime. For example, query 301 of FIG. 3 instructs the system to output only the top three items, yet when the stream data sequence of FIG. 4 arrives, the stream tuple 433 corresponding to stream data {b, 10} (406), which drops to fourth place when {d, 20} (408) arrives at time t4, must not be deleted from the ranking information holding buffer. The reason is that in stream data processing the ranking changes not only when new stream data arrives but also when the lifetime of stream data ends, so stream data currently outside the range specified by the user may have to be output again as a ranking result once the lifetime of other stream data ends. In FIG. 4, for example, stream data {d, 20} (408), which arrived at time t4 and entered the top three, is pushed out of the ranking by {e, 40} (409) arriving at time t5; but {e, 15} (410) arriving at time t6 removes {e, 40} from the window (433), so {d, 20} must be included in the ranking result again (424). In short, the ranking information holding buffer must hold every stream tuple corresponding to stream data managed in the window, that is, to stream data within its lifetime.
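The need to keep every live tuple can be demonstrated with a small Python sketch that replays the signed tuples up to t6 into two buffers: one that keeps everything and one that is truncated to the top three after each insertion. The truncated variant is a deliberately incorrect strawman used only for comparison; the function names and the `cap` parameter are assumptions for illustration.

```python
import bisect
from typing import List, Optional, Tuple

StreamTuple = Tuple[str, int]          # (id, val)
SignedTuple = Tuple[str, StreamTuple]  # ("+" or "-", (id, val))


def apply_signed(table: List[StreamTuple], signed: SignedTuple,
                 cap: Optional[int] = None) -> None:
    """Apply one signed tuple to a table sorted by val descending; cap truncates the table."""
    sign, tup = signed
    if sign == "+":
        neg_keys = [-val for _id, val in table]
        table.insert(bisect.bisect_right(neg_keys, -tup[1]), tup)
        if cap is not None:
            del table[cap:]            # strawman: forget everything below the cap
    elif tup in table:
        table.remove(tup)


# Signed tuples for t1..t6 of FIG. 4 as produced by the per-id Rows 1 window.
SIGNED_T1_T6 = [("+", ("a", 50)), ("+", ("b", 10)), ("+", ("c", 30)),
                ("+", ("d", 20)), ("+", ("e", 40)),
                ("-", ("e", 40)), ("+", ("e", 15))]


def top3_after_replay(cap: Optional[int]) -> List[StreamTuple]:
    """Replay the sequence and return the first three entries of the resulting table."""
    table: List[StreamTuple] = []
    for signed in SIGNED_T1_T6:
        apply_signed(table, signed, cap)
    return table[:3]
```

With no cap the result at t6 contains {d, 20}; with a cap of three, {d, 20} was discarded at t5 and {e, 15} wrongly takes third place.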

However, if the application has a special condition, for example that a stream tuple not within the top 50 at the moment it arrives is excluded from ranking, the number of stream tuples held in the ranking information holding buffer can be changed according to that condition.

Returning to FIG. 9, the order/rank generation unit checks, based on the result of the buffer maintenance process (903) for the received stream tuple, whether that result affects the ranking (904). The result affects the ranking when maintenance of the ranking information holding buffer for the received stream tuple changes the order within the range specified by the query. For example, query 301 of FIG. 3 specifies the top three as the output range, so the determination of step 904 is Yes whenever the order within the top three changes. In the processing of FIG. 4, the arrival of {d, 20} at time t4 changes the order within the top three and is therefore an example of Yes. If the result affects the ranking (Yes at step 904), the unit checks whether rank output is specified (905). If it is (Yes at step 905), a rank information column is added to the processing result tuple (906), the processing result tuple is output (907), and ranking processing ends (908). If rank output is not specified (No at step 905), the processing result tuple is output (907) and ranking processing ends (908).
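Steps 904 to 907 can be sketched in Python as a comparison of the specified range before and after buffer maintenance. The function name `result_tuples`, the choice to return the whole range, and the position of the rank column are assumptions for illustration; under the IDSTREAM clause the system would output differences rather than the whole range.

```python
from typing import List, Optional, Tuple

StreamTuple = Tuple[str, int]  # (id, val)


def result_tuples(before: List[StreamTuple], after: List[StreamTuple],
                  limit: int, with_rank: bool) -> Optional[list]:
    """Return result tuples if the top `limit` entries changed, else None (step 904: No)."""
    top_before, top_after = before[:limit], after[:limit]
    if top_before == top_after:
        return None                        # maintenance did not affect the ranking
    if with_rank:                          # step 905: rank output specified
        # Step 906: prepend a 1-based rank column to each result tuple.
        return [(rank, id_, val)
                for rank, (id_, val) in enumerate(top_after, start=1)]
    return list(top_after)                 # step 907 without rank information
```

For the arrival of {d, 20} at t4 the function returns the new top three, while the insertion of {e, 15} into fourth place of a table whose top three are unchanged returns None.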

In query 301 of FIG. 3, rank output is not specified, so the processing result tuples output from the ranking processing result output interface are, for example, those shown at 523 in FIG. 5, with no rank information attached. An example in which rank output is specified (Yes at step 905) is described later.

As described above, ranking calculation in a real-time application must instantly extract useful information from an enormous amount of data, so processing must be efficient. The stream data processing system of this embodiment therefore provides an interface that outputs only the differences caused by changes in the ranking, and an interface that outputs all data in the ranking at that point in time. Reference numeral 202 at the center of FIG. 5(a) shows the state, at each time, of the ranking information holding buffer that stores the result of processing the window operation “[Partition By s.id Rows 1]” and the ranking calculation clause “LIMIT 3 By s.val DESC” of query 301 (FIG. 3), as described with reference to FIG. 4. Query 301 specifies the IDSTREAM clause as its final output format. The IDSTREAM clause is an interface that, as a result of ranking calculation, outputs tuples added to the ranking as increase tuples and tuples deleted from the ranking as decrease tuples. In FIG. 5(a), the stream data after processing by the IDSTREAM clause (504) is shown inside the rounded rectangle 523 on the right side of the figure, where black circles represent increases and white circles represent decreases.

The processing of the IDSTREAM clause is as follows. At time t1, {a, 50} (505) is added to the ranking, so the IDSTREAM clause outputs {a, 50} (512) as increase stream data. At times t2 and t3, {b, 10} (506) and {c, 30} (507) are added to the ranking, so {b, 10} (514) and {c, 30} (515) are output as increases. At time t4, {d, 20} (508) is inserted into the ranking and {b, 10} (506) is deleted from it; the IDSTREAM clause therefore outputs {d, 20} (516) as an increase and {b, 10} (517) as a decrease at time t4. Computing only the increase and decrease information and sending it to the client computer that uses it reduces both the processing cost and the communication cost of the stream data processing system. In FIG. 5(a), for example, 11 processing results are sent to the client computer between t1 and t7, whereas sending all of the top n results at every timing would require n × 7 processing results. The benefit of sending only difference information is therefore especially large when n is large.
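As a non-limiting illustration, the difference output of the IDSTREAM clause can be sketched in Python as follows. The names `top_n` and `idstream` are hypothetical and do not appear in the embodiment. For brevity the sketch recomputes the top three from the whole window at each arrival, which is an assumption of this example and not the incremental buffer maintenance of the embodiment. The arrival sequence for t1 to t7 is the one used in the walkthroughs of this description.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (id, val)


def top_n(window: Dict[str, int], n: int) -> Set[Row]:
    """Return the n rows with the largest val (ties broken by id)."""
    ranked = sorted(window.items(), key=lambda r: (-r[1], r[0]))
    return set(ranked[:n])


def idstream(arrivals: List[Row], n: int = 3) -> List[List[Tuple[str, Row]]]:
    """Per arrival, emit ('+', row) for rows entering the top n and ('-', row) for rows leaving it."""
    window: Dict[str, int] = {}  # [Partition By s.id Rows 1]: latest val per id
    previous: Set[Row] = set()
    out: List[List[Tuple[str, Row]]] = []
    for sid, val in arrivals:
        window[sid] = val  # a newer tuple with the same id replaces the older one
        current = top_n(window, n)
        plus = [("+", r) for r in sorted(current - previous)]
        minus = [("-", r) for r in sorted(previous - current)]
        out.append(plus + minus)
        previous = current
    return out


arrivals = [("a", 50), ("b", 10), ("c", 30), ("d", 20),
            ("e", 40), ("e", 15), ("a", 45)]
diffs = idstream(arrivals)
```

Under these assumptions, `diffs[3]` holds the increase {d, 20} and the decrease {b, 10} for time t4, and the total number of emitted tuples over t1 to t7 is 11, in agreement with the count given above.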

However, when the client computer must continuously present ranking information to the user, the client computer has to keep building the overall ranking information from the difference information. This requires the client computer to manage state, which can be difficult depending on the condition of the client computer and the form of the real-time application. To make ranking information usable even in such situations, the stream data processing system of the present application provides an interface that generates the overall ranking information from the generated difference ranking information and outputs it.

Query 302 in FIG. 3(b) is identical to query 301 except for the RSTREAM clause on the first line. The SELECT clause on the first line specifies that pairs of s.id and s.val values are output; the FROM clause on the second line specifies that stream s is the target; the Partition By clause, also on the second line, specifies that tuples are grouped by the value of s.id and that the latest one s.val value is held for each group; and the LIMIT clause on the third line specifies that three items are output in descending order of s.val. Whereas the IDSTREAM clause on the first line of query 301 outputs only the difference from the previous output, the RSTREAM clause on the first line of query 302 outputs, at every output timing, the ranking information of all stream tuples within the output designation range.

FIG. 5(b) shows the ranking calculation output of this interface. The state of the ranking information holding buffer (202) at each time in FIG. 5(b) is the same as in FIG. 5(a). Applying the RSTREAM clause of query 302 shown in FIG. 3(b) after the LIMIT clause is processed yields the stream data shown inside the rounded rectangle 526 on the right side of FIG. 5(b). The output format of the ranking calculation result is described below with reference to FIG. 5(b). At time t1, {a, 50} (505) is added to the ranking, so the RSTREAM clause outputs {a, 50} (527). At time t2, {b, 10} (506) is added to the ranking, and the RSTREAM clause outputs the entire ranking, that is, the two stream data {a, 50} and {b, 10} (528). At time t3, {c, 30} (507) is added to the ranking, and {c, 30} is output in addition to {a, 50} and {b, 10} (529). At time t4, {d, 20} (508) is inserted into the ranking and {b, 10} (506) is deleted from it; the RSTREAM clause therefore outputs {a, 50}, {c, 30}, and {d, 20} at time t4 (530). To realize this processing, the ranking processing result output interface (207) holds the content of the previous output and generates the output information by combining that content with the ranking information newly generated by the order/rank generation unit (204). In this description the RSTREAM clause produces output at the moment input stream data arrives, but to reduce the output generation load and the communication cost, output may instead be produced at fixed intervals such as every second, every n input tuples, or every m output tuples.
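The combination of the previously output content with newly generated difference information, as described above for the RSTREAM clause, might be sketched in Python as follows. The class name `FullRankingOutput` and its method `apply` are hypothetical. Emitting the rows in descending order of val is an assumption of this sketch; the description does not fix the order in which the full ranking is emitted.

```python
from typing import List, Set, Tuple

Row = Tuple[str, int]  # (id, val)


class FullRankingOutput:
    """Holds the previous output and rebuilds the full ranking from each difference."""

    def __init__(self) -> None:
        self.previous: Set[Row] = set()

    def apply(self, added: List[Row], removed: List[Row]) -> List[Row]:
        # Combine the previous output with the increase and decrease tuples.
        self.previous = (self.previous - set(removed)) | set(added)
        # Assumed emission order: descending val, as in "LIMIT 3 By s.val DESC".
        return sorted(self.previous, key=lambda r: (-r[1], r[0]))


out = FullRankingOutput()
t1 = out.apply([("a", 50)], [])
t2 = out.apply([("b", 10)], [])
t3 = out.apply([("c", 30)], [])
t4 = out.apply([("d", 20)], [("b", 10)])
```

With the increases and decreases of t1 to t4 as inputs, `t4` contains {a, 50}, {c, 30}, and {d, 20}, which corresponds to output 530. The same routine could equally run on the client computer when only difference information is delivered.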

Next, an example in which the query designates rank output (Yes is selected in step 905 of FIG. 9) is described using queries 601 and 602 in FIG. 6. In query 601, the RANKING AS rank designation on the first line specifies that the rank at that point in time is output in addition to the s.id and s.val output by query 301. Query 601 outputs only the difference from its previous output, whereas query 602 outputs all of the ranking information at every output timing.

First, the ranking calculation method when rank output is designated is described with reference to FIG. 7. The rounded rectangle (1502) on the left side of FIG. 7 shows the state, at each time, of the stream tuple holding buffer that stores the result of the window operation “[Partition By s.id Rows 1]”; it is the same as in FIG. 4. The ranking calculation including rank output designation is described below for the state at each time. As shown in legend 702 of FIG. 7, a pair of values enclosed in an ellipse represents stream data in the form {id, val}. As shown in legend 703, a triple of values enclosed in a rounded rectangle represents stream data in the form {rank, id, val}, where rank is the rank based on the value of val at the time of output.

When stream data {a, 50} (705) arrives at time t1, this stream tuple is included in the ranking and its rank is first, so the ranking calculation result is {1, a, 50} (712). When stream data {b, 10} (706) arrives at time t2, it is also included in the ranking, so {2, b, 10} (713) is output. When {c, 30} (707) arrives at time t3, it is also included in the ranking; because its val of 30 is larger than that of {b, 10}, its rank is second, and the rank of {b, 10} becomes third. Accordingly, {2, b, 10} is deleted from the ranking calculation result, and {3, b, 10} (714) and {2, c, 30} (715) are added. When stream data {d, 20} (708) arrives at time t4, its val of 20 is larger than the val of 10 of {b, 10}, so {3, b, 10} is deleted from the ranking calculation result and {3, d, 20} (716) is added. When stream data {e, 40} (709) arrives at time t5, its val of 40 is larger than those of {c, 30} and {d, 20} currently in the ranking calculation result, and its rank is second, so {2, e, 40} (718) is included in the ranking calculation result. At the same time, {3, d, 20} is deleted from the ranking calculation result, and because the rank of {c, 30} changes from second to third, {2, c, 30} is deleted and {3, c, 30} (717) is added. The same applies when {e, 15} (710) and {a, 45} (711) arrive at times t6 and t7, respectively. These ranking calculations are executed by the order/rank generation unit (204) of FIG. 2, and the result at each time is stored in the ranking information holding buffer (202).

In the flowchart of FIG. 9, the preferred implementation of the calculation is the same as in the case without rank output designation up to ranking information holding buffer maintenance (903). Next, whether rank output is designated is checked (905), and if it is (Yes is selected at 905), a rank information column is added to the processing result tuple (906). The rank information addition process uses the ranking information holding buffer (202). As described above, in the preferred embodiment the ranking information holding buffer manages the added stream tuples while maintaining the order relation of the values of the column designated for ordering. For query 601, as for query 301, the column to be ordered is s.val, so possible data structures within the ranking information holding buffer include a table such as that shown at 203 in FIG. 2 and a binary search tree keyed on the value of s.val as in FIG. 13. The rank information addition process computes the position of the target stream tuple when keyed on s.val and adds that rank at the column position designated by the query. For query 601, “RANKING AS rank” is designated as the first column of the SELECT clause, so the rank information is output in the first column.
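The rank information addition process, that is, finding the position of a stream tuple when keyed on s.val, might be sketched in Python as follows. The class `RankIndex` is hypothetical. It uses a sorted list with the standard `bisect` module as a stand-in for the table of FIG. 2 or the binary search tree of FIG. 13, and it assumes descending order as in query 601.

```python
import bisect
from typing import List, Tuple


class RankIndex:
    """Sorted list of (-val, id) keys, standing in for the ordered table or search tree."""

    def __init__(self) -> None:
        self.keys: List[Tuple[int, str]] = []

    def insert(self, sid: str, val: int) -> None:
        # Negating val keeps the list in descending order of val.
        bisect.insort(self.keys, (-val, sid))

    def delete(self, sid: str, val: int) -> None:
        i = bisect.bisect_left(self.keys, (-val, sid))
        if i < len(self.keys) and self.keys[i] == (-val, sid):
            del self.keys[i]

    def rank(self, sid: str, val: int) -> int:
        """Return the 1-based position of the tuple when ordered by val descending."""
        return bisect.bisect_left(self.keys, (-val, sid)) + 1


index = RankIndex()
for sid, val in [("a", 50), ("b", 10), ("c", 30)]:
    index.insert(sid, val)

# Rank placed in the first column, as "RANKING AS rank" designates in query 601.
row_c = (index.rank("c", 30), "c", 30)
row_b = (index.rank("b", 10), "b", 10)
```

For the state at time t3 this yields {2, c, 30} and {3, b, 10}. A balanced search tree that tracks subtree sizes would give the same ranks with logarithmic insertion and deletion; the sorted list is chosen here only to keep the example short.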

Query 601 specifies the IDSTREAM clause as its final output format. As described for query 301, the IDSTREAM clause is an interface that, as a result of ranking calculation, outputs stream data added to the ranking as increases and stream data deleted from the ranking as decreases. In FIG. 8(a), the stream data after processing by the IDSTREAM clause (804) is shown on the right side of the figure (819). When {a, 50} (805) arrives at time t1, the rank information calculated by the order/rank generation unit is added as the first column, as designated by the query, and output. When {a, 50} arrives there is only one stream data item and its rank is first, so {1, a, 50} (811) is output. When {b, 10} (806) arrives at time t2, its rank is second, so {2, b, 10} (812) is output. When {c, 30} (807) arrives at time t3, its val of 30 is smaller than the val of 50 of {a, 50} and larger than the val of 10 of {b, 10}, so its rank is second. At time t3, therefore, {3, b, 10} (813) and {2, c, 30} (814) are output as increases and {2, b, 10} (815) is output as a decrease. In the example of query 301 shown in FIG. 5, the arrival of {c, 30} did not otherwise change the set included in the ranking output, so only {c, 30} was output as an increase (515); in the example of query 601 in FIG. 8, the arrival of {c, 30} changes the ranks, so three stream data items are output in total, counting both increases and decreases. Similarly, at times t5 and t6, four stream tuples each are output ((820) and (821)) in response to the rank changes, which is more than in the case of query 301.

Because query 601 specifies the IDSTREAM clause as its final output format, it outputs the increase and decrease stream data. However, just as when rank output is not designated, there are also cases in which the entire ranking at each moment must be output when rank output is designated. Query 602 is an example. Specifying RSTREAM (829) in place of the IDSTREAM of query 601 outputs the entire ranking calculation result at the time of output. FIG. 8(b) shows the execution result of query 602; the outputs at times t1 to t7 are shown at 822 to 828.
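The effect of rank output designation on the amount of difference output can be illustrated with the following Python sketch. The names `ranked_top_n` and `ranked_idstream` are hypothetical, and as a simplifying assumption the top three are recomputed from the whole window at each arrival. Because the rank is part of each output tuple, a tuple whose rank changes appears once as a decrease with its old rank and once as an increase with its new rank.

```python
from typing import Dict, List, Set, Tuple

Ranked = Tuple[int, str, int]  # (rank, id, val)


def ranked_top_n(window: Dict[str, int], n: int) -> Set[Ranked]:
    """Return the top n rows by descending val, each prefixed with its 1-based rank."""
    ordered = sorted(window.items(), key=lambda r: (-r[1], r[0]))
    return {(i + 1, sid, val) for i, (sid, val) in enumerate(ordered[:n])}


def ranked_idstream(arrivals: List[Tuple[str, int]], n: int = 3) -> List[dict]:
    """Per arrival, list the ranked tuples added to and removed from the output."""
    window: Dict[str, int] = {}  # latest val per id
    previous: Set[Ranked] = set()
    out: List[dict] = []
    for sid, val in arrivals:
        window[sid] = val
        current = ranked_top_n(window, n)
        out.append({"plus": sorted(current - previous),
                    "minus": sorted(previous - current)})
        previous = current
    return out


arrivals = [("a", 50), ("b", 10), ("c", 30), ("d", 20),
            ("e", 40), ("e", 15), ("a", 45)]
ranked_diffs = ranked_idstream(arrivals)
counts = [len(d["plus"]) + len(d["minus"]) for d in ranked_diffs]
```

Under these assumptions the sketch emits three tuples at t3 and four tuples at each of t5 and t6, matching the counts stated above.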

In the embodiment above, the ranking is designated in descending order and only a start rank of first is shown, but the ranking may also be designated in ascending order, and the start rank may be any integer value. For example, query 1101 in FIG. 11 specifies that, from stream s, pairs of s.id and s.val values are grouped by the value of s.id with the latest one s.val value held for each group, and that three items are output in ascending order of s.val starting from rank 10. It differs from query 301 in FIG. 3 in the OFFSET clause and the ASC designation on the third line: the former designates the start rank and the latter designates ascending order. When OFFSET is designated, the check in the ranking process of FIG. 9 of whether a received tuple affects the ranking (904) only needs to determine whether the received stream tuple affects the output designation range that begins at the start rank designated by the OFFSET clause and has the extent designated by the LIMIT clause. The holding and management of ranking information can be handled by the ranking processing module (116) shown in FIG. 2 in the same way as when OFFSET is not designated.

Furthermore, in the embodiment above the column to be ranked is explicitly supplied to the system, but the present invention is also applicable to a system that determines the column to be ranked automatically.
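The check of step 904 when OFFSET is designated might be sketched in Python as follows. The functions `output_range` and `affects_output` are hypothetical. The sketch assumes that the OFFSET value is a 1-based start rank, as in the description of query 1101, and it compares the output range before and after the insertion rather than using the incremental check of the embodiment. Replacement of an older tuple with the same id by the window operation is omitted.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (id, val)


def output_range(rows: List[Row], start_rank: int, limit: int,
                 ascending: bool = True) -> List[Row]:
    """Return the rows ranked start_rank .. start_rank + limit - 1 (1-based)."""
    ordered = sorted(rows, key=lambda r: (r[1] if ascending else -r[1], r[0]))
    return ordered[start_rank - 1:start_rank - 1 + limit]


def affects_output(rows: List[Row], new_row: Row, start_rank: int,
                   limit: int, ascending: bool = True) -> bool:
    """Report whether inserting new_row changes the output designation range."""
    before = output_range(rows, start_rank, limit, ascending)
    after = output_range(rows + [new_row], start_rank, limit, ascending)
    return before != after


# Fifteen illustrative tuples with val 1..15; ranks 10 to 12 in ascending order are output.
rows = [(f"r{v}", v) for v in range(1, 16)]
selected = output_range(rows, start_rank=10, limit=3)
low_hit = affects_output(rows, ("y", 5), start_rank=10, limit=3)
high_miss = affects_output(rows, ("x", 100), start_rank=10, limit=3)
```

In this example a tuple ranked before the start rank shifts the output range and so affects the output, while a tuple ranked after the end of the range does not.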

FIG. 1 is a diagram showing the configuration of the stream data processing system of the present invention.
FIG. 2 is a diagram showing the configuration of the ranking processing module of the present invention.
FIG. 3 shows examples of queries including a ranking designation (without rank output designation) in the present invention.
FIG. 4 is a diagram showing the ranking calculation (without rank output designation) in the present invention.
FIG. 5 is a diagram showing the output of the ranking calculation results (without rank output designation) in the present invention.
FIG. 6 shows examples of queries including a ranking designation (with rank output designation) in the present invention.
FIG. 7 is a diagram showing the ranking calculation (with rank output designation) in the present invention.
FIG. 8 is a diagram showing the output of the ranking calculation results (with rank output designation) in the present invention.
FIG. 9 is a flowchart showing the ranking processing procedure of the present invention.
FIG. 10 is a flowchart showing the ranking information holding buffer maintenance procedure of the present invention.
FIG. 11 shows an example of a query including a ranking designation (with offset designation) in the present invention.
FIG. 12 shows an example of a query written in the query processing language CQL.
FIG. 13 shows an example of ranking information represented as a binary search tree in the present invention.
FIG. 14 is a diagram showing how the ranking information holding table in the ranking information holding buffer changes in the present invention.
FIG. 15 is a diagram showing the configuration of the window manager of the present invention.
FIG. 16 is a diagram showing an example implementation of the query processing engine on a computer in the present invention.
FIG. 17 is a sequence diagram showing the ranking calculation method of the present invention.
FIG. 18 is a flowchart showing the method of adding a stream tuple to the ranking information holding buffer in the present invention.
FIG. 19 is a flowchart showing the method of deleting a stream tuple from the ranking information holding buffer in the present invention.

Explanation of symbols

115 ... Stream data processing system
107, 121 ... Network
113 ... Query processing engine
116 ... Ranking processing module
202 ... Ranking information holding buffer
203 ... Ranking information holding table
204 ... Order/rank generation unit
206 ... Stream tuple reception interface
207 ... Ranking processing result output interface
208 ... Rank management index
117 ... Memory manager
126 ... Window manager
1502 ... Stream tuple holding buffer
1504 ... Lifetime determination unit
1505 ... Stream data reception interface
1506 ... Difference information generation unit
301, 302, 601, 602, 1101, 1201 ... Query
1601 ... Stream data processing server
1602 ... Communication interface
1603 ... CPU
1604 ... Memory
1605 ... I/O interface
1606 ... Storage device

Claims

1. A stream data ranking query processing method in a stream data processing system having a control unit that continuously receives stream data composed of a plurality of time-stamped stream tuples and continuously executes query processing on the stream data according to a pre-registered query, wherein
the control unit:
determines, in accordance with the window specification designated by the query and at each timing when a stream tuple arrives, the lifetime of that stream tuple in the window or the end of the lifetime in the window of a stream tuple that arrived in the past, and generates window difference information indicating insertion of a stream tuple into the window and deletion of a stream tuple from the window;
updates, in accordance with the ranking process designated by the query and on the basis of the window difference information, ranking information indicating the ranks among stream tuples, within the range of the stream tuples that are within their lifetime in the window, from first ranking information to second ranking information;
generates ranking difference information that is the difference between the first ranking information and the second ranking information; and
outputs a ranking processing result based on the ranking difference information and an output designation range designated by the query.

2. The stream data ranking query processing method according to claim 1, wherein the control unit stores the updated ranking information of stream tuples that are within the lifetime beyond the specified output range.

3. The stream data ranking query processing method according to claim 1, wherein the control unit outputs, as the ranking processing result, the difference between the first ranking information and the second ranking information on the basis of the ranking difference information.

4. The stream data ranking query processing method according to claim 1, wherein the control unit stores the output ranking processing result and outputs, as the ranking processing result, the entire ranking information included in the output designation range on the basis of the ranking difference information and the previously output ranking processing result.

2. The stream data ranking query processing method according to claim 1, wherein the ranking information is stored over all stream tuples within the lifetime.

2. The ranking query processing method for stream data according to claim 1, wherein the ranking information is stored in the window as a table in which each stream tuple is associated with a rank.

7. A stream data processing system that continuously receives stream data composed of a plurality of time-stamped stream tuples and continuously executes query processing on the stream data according to a pre-registered query, comprising:
a window manager that performs the window operation designated by the query on each arriving stream tuple and determines the lifetime of each stream tuple in the window; and
a ranking processing module that performs the ranking process designated by the query and outputs the set of stream tuples included in the output designation range designated by the query, wherein
the window manager
generates, at each timing when a stream tuple arrives, window difference information indicating insertion of a stream tuple into the window and deletion of a stream tuple from the window, and transmits it to the ranking processing module, and
the ranking processing module updates, on the basis of the window difference information transmitted from the difference information generation unit of the window manager, the ranking information from first ranking information to second ranking information within the range of the stream tuples that are within their lifetime in the window,
stores the second ranking information in the ranking information holding buffer within the range of the stream tuples that are within their lifetime in the window, and
generates ranking difference information that is the difference between the first ranking information and the second ranking information, and outputs a ranking processing result based on the ranking difference information and the output designation range designated by the query.

8. The stream data processing system according to claim 7, wherein the ranking processing module outputs, as the ranking processing result, the difference between the first ranking information and the second ranking information on the basis of the ranking difference information.

9. The stream data processing system according to claim 7, wherein the ranking processing module stores the output ranking processing result and outputs, as the ranking processing result, the entire ranking information included in the output designation range on the basis of the ranking difference information and the previously output ranking processing result.

8. The stream data processing system according to claim 7, wherein the ranking information holding buffer holds the ranking information as a table in which each stream tuple and its rank are associated with each other.

11. The stream data processing system according to claim 7, wherein the difference information generation unit, when a stream tuple arrives, adds to the stream tuple a code indicating insertion into the window, adds to the stream tuple, at the end of its determined lifetime, another code indicating deletion from the window, and transmits the result to the ranking processing module as the window difference information.