JP4801566B2

JP4801566B2 - Data stream monitoring device, data stream monitoring method, program thereof, and recording medium

Info

Publication number: JP4801566B2
Application number: JP2006318752A
Authority: JP
Inventors: 保志櫻井; 雅司山室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-11-27
Filing date: 2006-11-27
Publication date: 2011-10-26
Anticipated expiration: 2026-11-27
Also published as: JP2008134706A

Description

The present invention relates to a technique (stream mining) for analyzing, in real time, data that flows in continuously and in large volume.

In recent years, computer and network technologies have advanced, and large volumes of data often flow continuously over networks. Data that flows continuously and in large volume over a network is called a data stream. Examples include financial data such as stock prices on the Internet, and sensor data (for example, measurements from temperature or illuminance sensors) in a sensor network such as a LAN (Local Area Network).

Unlike the analysis of large volumes of data already stored in a database, stream mining must analyze large amounts of continuously arriving data in real time, so computation must be fast and memory use must be small. Meeting these requirements also allows necessary information to be provided to users promptly.

One form of data stream analysis uses partial sequence matching to find whether the data stream contains a sequence that is similar, to some degree, to a given query sequence (a predetermined data string). Because the data stream and the query sequence may progress at different relative rates, depending for example on the sampling rate of the data stream, it is desirable for the matching to take stretching and shrinking along the time axis into account.

For this purpose, dynamic time warping (DTW) is widely used. DTW is a dynamic programming technique that performs sequence matching while adjusting the alignment along the time axis so as to minimize the distance between the sequences, thereby accounting for stretching and shrinking in the time direction.
A matrix created and stored on the basis of DTW is called a time warping matrix, and the DTW distance, which is the distance between two sequences, can be calculated from it. The smaller the DTW distance, the more similar the two sequences are.

For example, Non-Patent Documents 1 and 2 describe techniques for speeding up the detection of similar partial sequences based on DTW.
Non-Patent Document 3 describes StatStream, a real-time data stream analysis technique that improves accuracy by also taking into account the correlation coefficients between data in the data streams.
Non-Patent Document 4 describes BRAID, an algorithm for detecting lag correlations between data, which is used in real-time analysis of data streams.
Non-Patent Document 1: Eamonn Keogh, Exact Indexing of Dynamic Time Warping, Proceedings of the Twenty-eighth International Conference on VLDB (Very Large Data Bases), China, August 2002, pp. 406-417
Non-Patent Document 2: Yasushi Sakurai, Masatoshi Yoshikawa, Christos Faloutsos, FTW: Fast Similarity Search under the Time Warping Distance, Proceedings of Symposium on PODS (Principles of Database Systems), USA, June 2005, pp. 326-337
Non-Patent Document 3: Yunyue Zhu, Dennis Shasha, StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, Proceedings of the Twenty-eighth International Conference on VLDB (Very Large Data Bases), China, August 2002, pp. 358-369
Non-Patent Document 4: Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos, BRAID: Stream Mining through Group Lag Correlations, Proceedings of ACM SIGMOD (Association for Computing Machinery Special Interest Group on Management of Data), USA, June 2005, pp. 599-610

However, Non-Patent Documents 1 and 2 are techniques applied to stored data sets, not techniques for real-time analysis of data streams.
Non-Patent Documents 3 and 4 perform sequence matching without adjustment along the time axis and do not concern data stream monitoring using DTW.

Moreover, if DTW is used for data stream monitoring, the amount of computation and the memory use grow in proportion to the length of the data stream, which is impractical (for details, see the description of the comparative example in the best mode for carrying out the invention).
The present invention has been made in view of this problem, and its object is to reduce the calculation cost (amount of computation and memory use) of data stream monitoring using DTW.

To solve the above problem, the inventions according to claims 1 and 4 provide a data stream monitoring device that detects, on the basis of the dynamic time warping distance, a partial sequence similar to a query sequence (a predetermined data string) from a data stream (continuously received data). When data x_t at time t in the data stream is received, the storage unit holds a single time warping matrix whose (t−1, i) element contains information d(t−1, i) on the distance between data y_i (i = 1, 2, ..., m) in the query sequence Y = (y_1, y_2, ..., y_m) of length m and data x_{t−1} at time t−1 in the data stream, together with information s(t−1, i) on the start time corresponding to d(t−1, i). The storage unit also holds a predetermined threshold ε on the dynamic time warping distance.
When the processing unit receives data x_t, it uses the time warping matrix stored in the storage unit and the following equations (11) to (14) to calculate information d(t, i) on the distance between data y_i (i = 1, 2, ..., m) in the query sequence and data x_t, calculates by the following equation (15) the information s(t, i) on the start time corresponding to d(t, i), and updates the time warping matrix stored in the storage unit. When the information d(t, m) on the distance between data y_m in the query sequence and data x_t is equal to or less than the threshold ε, the processing unit detects, as a similar partial sequence, the partial sequence X[t_s : t_e] in the data stream whose start time t_s is the start time s(t, m) corresponding to d(t, m) and whose end time t_e is time t.

According to this invention, because a single time warping matrix is used, the calculation cost can be reduced. In addition, when a similar partial sequence is detected, its start time can be identified.

In the inventions according to claims 2 and 5, after detecting a similar partial sequence X[t_s : t_e], the processing unit continues to calculate the values of the elements of the time warping matrix. When it determines that no partial sequence whose time span overlaps, even partly, with X[t_s : t_e] has a smaller dynamic time warping distance, it detects X[t_s : t_e] as the more appropriate partial sequence.

According to this invention, by further examining the partial sequences whose time spans overlap the similar partial sequence, that partial sequence can be judged to be the more appropriate one.

In the inventions according to claims 3 and 6, when the processing unit has detected a similar partial sequence X[t_s : t_e], the reception time of the last data item of X[t_s : t_e] is taken as the end time. The processing unit detects X[t_s : t_e] as the more appropriate partial sequence when, in a column of the time warping matrix after that end time, no matrix element both has a start time at or before the end time and has a value smaller than the dynamic time warping distance of the similar partial sequence.

According to this invention, examining each element in a column of the time warping matrix after the end time of the similar partial sequence in this way allows that partial sequence to be judged the more appropriate one.

The invention according to claim 7 is a program for causing a computer to execute the data stream monitoring method according to any one of claims 4 to 6.

According to this invention, a computer can be made to execute the data stream monitoring method.

The invention according to claim 8 is a recording medium on which the program of claim 7 is recorded.

According to this invention, the program for the data stream monitoring method can be recorded.

According to the present invention, the calculation cost of data stream monitoring using DTW can be reduced.

The best mode for carrying out the data stream monitoring device according to the present invention (hereinafter, the embodiment) is described below with reference to the drawings as appropriate. Drawings other than the one under discussion are also referred to where appropriate.
First, to aid understanding, a comparative example (prior art) of data stream monitoring using DTW is described with reference to FIGS. 8 and 9.

FIG. 8 illustrates query processing in the comparative example of data stream monitoring using DTW.
In FIG. 8, sequence Y is data of fixed length m and serves as the reference against which similarity is judged.

Sequence X, the data stream, grows from moment to moment (its amount of data keeps increasing). Its length is n, and it is the data examined for similarity.
TWM (time warping matrix) 1 to TWM S are the time warping matrices that start at times t = 1 to t = S, respectively. The chain of black cells shown in TWM S is a warping path (the route followed in calculating the DTW distance, described later).

The aim here is to detect, on the basis of the DTW distance, partial sequences of sequence X that are similar to sequence Y. An outline is given below with reference to FIG. 8, followed by a specific example with reference to FIG. 9. There are two methods of detecting similar partial sequences, referred to as the first method and the second method. The first method is described first.

(First method)
The DTW distance between two sequences is the distance between them after they have been stretched or shrunk, in whole or in part, along the time axis. The DTW distance can be calculated from a time warping matrix.
The DTW distance D(X, Y) between a sequence Y = (y_1, y_2, ..., y_m) of length m and a sequence X = (x_1, x_2, ..., x_n) of length n is obtained from equations (1) to (4) below, where t = 1, 2, ..., n and i = 1, 2, ..., m.

$$D(X, Y) = f(n, m) \tag{1}$$
$$f(t, i) = \lVert x_t - y_i \rVert + \min\{\, f(t, i-1),\ f(t-1, i),\ f(t-1, i-1) \,\} \tag{2}$$
$$f(0, 0) = 0 \tag{3}$$
$$f(t, 0) = f(0, i) = \infty \tag{4}$$

Equation (1) defines the DTW distance, and equation (2) is the recurrence used to compute it.
In equation (2), ||x_t − y_i|| denotes the distance between the two values x_t and y_i. The Euclidean distance or the Manhattan distance, for example, can be used. For two points a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) in n-dimensional space, with 1 ≤ j ≤ n, the Euclidean distance is √(Σ (a_j − b_j)^2) and the Manhattan distance is Σ |a_j − b_j|.
In the specific examples below, to simplify the calculation, the square of the Euclidean distance is used as ||x_t − y_i||.
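The distance measures named above can be sketched as follows. This is a minimal illustration, not part of the specification; the function names `euclidean`, `manhattan`, and `squared_euclidean` are chosen here for illustration only, and points are assumed to be given as equal-length tuples of numbers.

```python
import math


def euclidean(a, b):
    """Euclidean distance: sqrt(sum_j (a_j - b_j)^2)."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance: sum_j |a_j - b_j|."""
    return sum(abs(aj - bj) for aj, bj in zip(a, b))


def squared_euclidean(a, b):
    """Square of the Euclidean distance, as used in the worked examples."""
    return sum((aj - bj) ** 2 for aj, bj in zip(a, b))
```

For one-dimensional values such as x_t = 12 and y_i = 11, `squared_euclidean((12,), (11,))` reduces to (12 − 11)^2.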

Also in equation (2), min{ f(t, i−1), f(t−1, i), f(t−1, i−1) } means that the smallest of the three values in the braces is taken. When two or all three values tie for the minimum, a selection rule may be fixed in advance, for example choosing the value that makes the detected partial sequence longer (or shorter), or the value that makes the detected partial sequence closer to the query sequence Y.
Equations (3) and (4) are the boundary conditions of the time warping matrix used when evaluating these three values.
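Equations (1) to (4) can be sketched as a short dynamic program. This is an illustrative sketch under stated assumptions: the function name `dtw_distance` is hypothetical, the sequences are assumed to hold scalar values, and the squared difference is used as ||x_t − y_i||, following the specific examples in this description.

```python
import math


def dtw_distance(x, y):
    """DTW distance D(X, Y) = f(n, m) for scalar sequences x and y."""
    n, m = len(x), len(y)
    # f[t][i] holds f(t, i); row 0 and column 0 are the boundary conditions.
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0  # equation (3); all other boundary cells stay at infinity, equation (4)
    for t in range(1, n + 1):
        for i in range(1, m + 1):
            cost = (x[t - 1] - y[i - 1]) ** 2  # squared difference as the local distance
            # equation (2): local cost plus the smallest of the three neighbours
            f[t][i] = cost + min(f[t][i - 1], f[t - 1][i], f[t - 1][i - 1])
    return f[n][m]  # equation (1)
```

The built-in `min` simply returns one of the tied values; a tie-breaking rule of the kind mentioned above would only matter if the warping path itself were being recorded.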

The time warping matrix holds the values of the DTW function, that is, the values f(t, i) in equation (2). In the comparative example, n time warping matrices are used to find the distance between sequence X of length n and sequence Y of length m, so the calculation takes O(nm) time (proportional to the product of n and m). With this method, therefore, the calculation time becomes too long for practical use as n grows.

The memory usage is O(m). This is because only two columns of the time warping matrix need to be stored, the column for the current time (the set of matrix elements for that time) and the column for the immediately preceding time; older information can be discarded as processing proceeds.
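The two-column storage described above might look like the following sketch. The function name `dtw_distance_two_columns` is hypothetical, and scalar values with a squared-difference local distance are assumed, as in the worked examples.

```python
import math


def dtw_distance_two_columns(x, y):
    """DTW distance keeping only the previous and current columns (O(m) memory)."""
    m = len(y)
    # Column for t = 0: f(0, 0) = 0 and f(0, i) = infinity.
    prev = [0] + [math.inf] * m
    for x_t in x:
        cur = [math.inf] * (m + 1)  # cur[0] is f(t, 0) = infinity
        for i in range(1, m + 1):
            cur[i] = (x_t - y[i - 1]) ** 2 + min(cur[i - 1], prev[i], prev[i - 1])
        prev = cur  # the older column is no longer needed and is dropped
    return prev[m]
```

Because each new column depends only on the column before it, the same loop can be fed one data item at a time as the stream arrives.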

Let X[t_s : t_e] denote the partial sequence from time t_s to time t_e. That is, for X = (x_1, x_2, ..., x_s, ..., x_e, ..., x_n), X[t_s : t_e] = (x_s, ..., x_e). The aim here is to find (detect) an X[t_s : t_e] that is highly similar to the query sequence Y of fixed length m.

In the first method, a threshold ε is set in advance, and any X[t_s : t_e] satisfying the following equation (5) is detected.
$$D(X[t_s : t_e], Y) \le \varepsilon \tag{5}$$

A specific example of the first method is described with reference to FIG. 9, which shows example time warping matrices in the comparative example.
The aim is to detect, in the sequence X that grows over time, a partial sequence highly similar to the sequence Y = (11, 6, 9, 4). FIGS. 9(1) to 9(7) show the state of time warping matrices (TWM) 1 to 7 (each enclosed by a thick line) when the length of sequence X is 7 (X = (5, 12, 6, 10, 6, 5, 1)). The threshold is ε = 20.

Rewriting equations (1) to (4) into a form for actually calculating the value of each element of TWM 1 to 7 gives equations (6) to (9) below.
In the t-th time warping matrix (the one that starts at time t), let f_t(k, i) be the DTW distance at element (k, i), where t = 1, 2, ..., n, i = 1, 2, ..., m, and k = 1, 2, ..., n − t + 1. Explanations that duplicate those for equations (1) to (4) are omitted as appropriate.

$$D(X[t_s : t_e], Y) = f_{t_s}(t_e - t_s + 1, m) = \min\bigl(f_t(k, m)\bigr) \tag{6}$$
$$f_t(k, i) = \lVert x_{t+k-1} - y_i \rVert + \min\{\, f_t(k, i-1),\ f_t(k-1, i),\ f_t(k-1, i-1) \,\} \tag{7}$$
$$f_t(0, 0) = 0 \tag{8}$$
$$f_t(k, 0) = f_t(0, i) = \infty \tag{9}$$

Equation (6) defines the DTW distance, equation (7) is the recurrence used to compute it, and equations (8) and (9) are the boundary conditions of the time warping matrix.

Next, the calculation is worked through with reference to FIG. 9.
Each element of TWM 1 to 7 holds the value calculated according to equations (6) to (9).
Taking TWM2 as an example, equations (6) to (9) give, at time t = 2, f_2(1, 1) = (12 − 11)^2 + min{ f_2(1, 0) = ∞, f_2(0, 1) = ∞, f_2(0, 0) = 0 } = 1 + 0 = 1.
Likewise, f_2(2, 1) = (6 − 11)^2 + min{ f_2(2, 0) = ∞, f_2(1, 1) = 1, f_2(1, 0) = ∞ } = 25 + 1 = 26.

Further, f_2(1, 2) = (12 − 6)^2 + min{ f_2(1, 1) = 1, f_2(0, 2) = ∞, f_2(0, 1) = ∞ } = 36 + 1 = 37.
And f_2(2, 2) = (6 − 6)^2 + min{ f_2(2, 1) = 26, f_2(1, 2) = 37, f_2(1, 1) = 1 } = 0 + 1 = 1.
The values of the remaining elements of TWM 1 to 7 are calculated in the same way.
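The hand calculation above can be checked with a small sketch that fills one time warping matrix according to equations (7) to (9). The function name `time_warping_matrix` and the dictionary layout keyed by (k, i) are assumptions made for illustration; X and Y are the example sequences given for FIG. 9.

```python
import math


def time_warping_matrix(x, y, start):
    """Return {(k, i): f_start(k, i)} for the matrix that starts at time `start` (1-based)."""
    m = len(y)
    n_cols = len(x) - start + 1  # k runs from 1 to n - start + 1
    f = {(0, 0): 0}  # equation (8)
    for k in range(1, n_cols + 1):
        f[(k, 0)] = math.inf  # equation (9)
    for i in range(1, m + 1):
        f[(0, i)] = math.inf  # equation (9)
    for k in range(1, n_cols + 1):
        x_val = x[start + k - 2]  # x_{start+k-1} in 1-based notation
        for i in range(1, m + 1):
            cost = (x_val - y[i - 1]) ** 2  # squared difference
            f[(k, i)] = cost + min(f[(k, i - 1)], f[(k - 1, i)], f[(k - 1, i - 1)])  # equation (7)
    return f


X = [5, 12, 6, 10, 6, 5, 1]
Y = [11, 6, 9, 4]
twm2 = time_warping_matrix(X, Y, start=2)
print(twm2[(1, 1)], twm2[(2, 1)], twm2[(1, 2)], twm2[(2, 2)])  # 1 26 37 1
```

The four printed values correspond to f_2(1, 1), f_2(2, 1), f_2(1, 2), and f_2(2, 2) in the worked example.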

At each time in each of TWM 1 to 7, the DTW distance is the smallest of the element values in the y_4 row up to and including that time. The start time of each DTW distance in TWM 1 to 7 (the time of the first value of sequence X used in the calculation) coincides with the number of the matrix (1 for TWM1, 3 for TWM3).

Under these conditions, a partial sequence of X that is highly similar to Y is detected, that is, one whose DTW distance is 20 or less.
At time t = 1, TWM1 is the only time warping matrix and its DTW distance is 54, so nothing qualifies.

At time t = 2, the DTW distance of TWM1 is 54 and that of TWM2 is 110, so again nothing qualifies.
At time t = 3, the DTW distance of TWM1 is 50, that of TWM2 is 14, and that of TWM3 is 38. TWM2 satisfies the condition (DTW distance of 20 or less), so the partial sequence X[2 : 3] has been found.

Thus, in the comparative example, when the length of sequence X is n, the target similar partial sequence can be found by using n time warping matrices.
As equations (6) to (9) and the example of FIG. 9 show, however, each of TWM 1 to 7 needs to keep in memory only two columns, those for the current time and the immediately preceding time; older information can be discarded as processing proceeds.
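The first method of the comparative example can be sketched as a generator that starts a new time warping matrix at every time step and keeps only the latest column of each. This is an illustrative reading of the description rather than code from the specification: the name `first_method` is hypothetical, and the sketch reports the value of the current column's last element whenever it is at or below ε.

```python
import math


def first_method(x, y, eps):
    """Yield (t_s, t_e, distance) each time some matrix's f(k, m) is <= eps."""
    m = len(y)
    columns = {}  # start time -> latest column [f(k, 0), ..., f(k, m)]
    for t, x_t in enumerate(x, start=1):
        # A new matrix starts at every time step: its k = 0 boundary column.
        columns[t] = [0] + [math.inf] * m
        for start in list(columns):
            prev = columns[start]
            cur = [math.inf] * (m + 1)
            for i in range(1, m + 1):
                cur[i] = (x_t - y[i - 1]) ** 2 + min(cur[i - 1], prev[i], prev[i - 1])
            columns[start] = cur  # only the newest column is kept per matrix
            if cur[m] <= eps:
                yield (start, t, cur[m])


X = [5, 12, 6, 10, 6, 5, 1]
Y = [11, 6, 9, 4]
print(next(first_method(X, Y, eps=20)))  # (2, 3, 14)
```

The number of entries in `columns` grows by one with every arriving value, which is the cost that makes the comparative example impractical for long streams.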

In the first method, a partial sequence is regarded as detected as soon as the DTW distance falls to the threshold ε or below. Depending on the user's needs, however, when a partial sequence of higher similarity follows immediately afterwards, it is often preferable not to report the first partial sequence and to report only the later, more similar one.

(Second method)
In the second method, therefore, after a similar partial sequence is first detected, the partial sequences whose time spans overlap it, even slightly, are examined to see whether any has higher similarity (a smaller DTW distance). In other words, a detected partial sequence is judged to be the matching one (hereinafter, the "optimal partial sequence") only once it is certain that no overlapping partial sequence is more similar.

In the example of FIG. 9, at time t = 3 the DTW distance of TWM2 becomes 14, so the partial sequence X[2 : 3] is the first to satisfy the condition of being at or below the threshold ε. Since this partial sequence ends at t = 3, TWM 4 to 7, which start later, need not be considered.

The question is then whether, in TWM 1 to 3, the DTW distance can fall below 14 after time t = 3. In TWM1, the element values at time t = 3 (from the bottom: 62, 37, 46, 50) are already all greater than 14, so the DTW distance cannot subsequently fall below 14. Likewise, in TWM3 the DTW distance cannot fall below 14 after time t = 3.

In TWM2, however, f_2(2, 2) = 1 and f_2(2, 3) = 10 at time t = 3, so the DTW distance may subsequently fall below 14. Indeed, at time t = 5 the DTW distance is 6. At that time the other element values (from the bottom: 52, 17, 11) are all greater than 6, so the DTW distance cannot subsequently fall below 6.

However, the DTW distance of 6 belongs to the partial sequence X[2 : 5], so TWM4 and TWM5 now become relevant as well. In TWM4, elements with values of 6 or less exist at times t = 5 and t = 6, namely f_4(2, 2) = 1 and f_4(3, 2) = 2, but no such value exists at time t = 7. In TWM5, no element is 6 or less even at time t = 5.

Therefore, at time t = 7 it is established that the partial sequence X[2 : 5], which gives the DTW distance 6 in TWM2 at time t = 5, is the optimal partial sequence.
The route followed in TWM2 to calculate this DTW distance of 6, f_2(1, 1) → f_2(2, 2) → f_2(3, 3) → f_2(4, 4), is an example of a warping path.
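One possible reading of the second method is sketched below. It is an assumption-laden illustration, not the procedure as claimed: the name `second_method` is hypothetical, a candidate is confirmed once no matrix that starts at or before the candidate's end time has any current-column element strictly smaller than the candidate's distance, and after a report all such matrices are discarded.

```python
import math


def second_method(x, y, eps):
    """Yield (t_s, t_e, distance, confirmed_at) for each confirmed partial sequence."""
    m = len(y)
    columns = {}  # start time -> latest column [f(k, 0), ..., f(k, m)]
    best = None   # current candidate as (distance, t_s, t_e)
    for t, x_t in enumerate(x, start=1):
        columns[t] = [0] + [math.inf] * m
        for start in list(columns):
            prev = columns[start]
            cur = [math.inf] * (m + 1)
            for i in range(1, m + 1):
                cur[i] = (x_t - y[i - 1]) ** 2 + min(cur[i - 1], prev[i], prev[i - 1])
            columns[start] = cur
            # Only partial sequences overlapping the candidate may replace it.
            overlaps = best is None or start <= best[2]
            if cur[m] <= eps and overlaps and (best is None or cur[m] < best[0]):
                best = (cur[m], start, t)
        if best is not None:
            dist, t_s, t_e = best
            # A smaller distance is still reachable only if some overlapping
            # matrix holds a current-column element below the candidate's distance.
            can_improve = any(
                min(col[1:]) < dist for s, col in columns.items() if s <= t_e
            )
            if not can_improve:
                yield (t_s, t_e, dist, t)
                columns = {s: c for s, c in columns.items() if s > t_e}
                best = None


X = [5, 12, 6, 10, 6, 5, 1]
Y = [11, 6, 9, 4]
print(list(second_method(X, Y, eps=20)))  # [(2, 5, 6, 7)]
```

With the example data the candidate X[2 : 3] (distance 14) is replaced by X[2 : 5] (distance 6) at t = 5, and the latter is confirmed at t = 7, in line with the walkthrough above.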

Thus, with both the first and the second method of the comparative example, n time warping matrices are needed when the length of sequence X is n, and the calculation takes O(nm) time. As n grows, the calculation time becomes too long to be practical.
The data stream monitoring device of the present embodiment, described below with reference to FIGS. 1 to 7, needs only a single time warping matrix and O(m) calculation time, achieving both faster calculation and lower memory use.

FIG. 1 is a configuration diagram of the data stream monitoring device of the present embodiment. The data stream monitoring device 1 is a computer and includes an input unit 2, a communication unit 3, a storage unit 4, an output unit 5, a memory 6, and a processing unit 7.

The input unit 2 is used to enter data and is, for example, a keyboard or a mouse. A user of the data stream monitoring device 1 can enter a query sequence through the input unit 2.
The communication unit 3 receives data from an external device (not shown) or a sensor (not shown) over the Internet, a LAN, or the like, and is, for example, a communication interface. The communication unit 3 receives the data stream from the external device (not shown) or the like.

The storage unit 4 stores data and is, for example, a hard disk. It stores a query sequence 41 and time warping data 42. Although not shown, it also stores a program describing the data stream monitoring method.
The query sequence 41 is the query sequence entered through the input unit 2.
The time warping data 42 is the data needed for time warping calculations, such as the time warping matrix.

出力部５は、データを出力するものであり、たとえば、ディスプレイやスピーカである。出力部５は、所定の条件を充足する部分シーケンスなどの検出データを出力する。
メモリ６は、処理部７の作業領域であり、たとえば、ＲＡＭ（Random Access Memory）である。 The output unit 5 outputs data, and is, for example, a display or a speaker. The output unit 5 outputs detection data such as a partial sequence that satisfies a predetermined condition.
The memory 6 is a work area of the processing unit 7 and is, for example, a RAM (Random Access Memory).

処理部７は、各種演算処理を行うものであり、たとえば、CPU（Central Processing Unit）である。処理部７は、入力部２が受け付けた問合せシーケンスを記憶部４に問合せシーケンス４１として記憶したり、通信部３が受信したデータストリームに関して記憶部４のタイムワーピングデータ４２を使用して問合せシーケンス４１と類似性の高い部分シーケンスを検出したり、その部分シーケンスを出力部５から出力したりする。 The processing unit 7 performs various arithmetic processes, and is, for example, a CPU (Central Processing Unit). The processing unit 7 stores the query sequence received by the input unit 2 in the storage unit 4 as the query sequence 41 or uses the time warping data 42 of the storage unit 4 for the data stream received by the communication unit 3. The partial sequence having high similarity is detected, or the partial sequence is output from the output unit 5.

Next, an overview of the query processing performed by the data stream monitoring apparatus of the present embodiment is described with reference to FIG. 2, which illustrates that processing.

As shown in FIG. 2, unlike the comparative example (see FIG. 8), the query processing of the data stream monitoring apparatus 1 uses a single TWM to detect, in the sequence X, partial sequences similar to the query sequence Y′ (a slightly modified form of the query sequence Y; details are given later).
In FIG. 2, X[t_s:t_e] (X_1) is detected first as a matching partial sequence, and X_2 is detected after it.

In the following, y_0 is prepended to the sequence Y = (y_1, y_2, ..., y_m) of length m to give Y′ = (y_0, y_1, y_2, ..., y_m). Here y_0 can take any value in (−∞:∞). If x_t denotes the data of sequence X that is received at time t and is about to be used in the time warping matrix calculation, then y_0 = x_t.
Sequence matching is then performed between Y′ and the sequence X = (x_1, x_2, ..., x_n) of length n, and it can be realized by the following equations (10) to (14), where t = 1, 2, ..., n and i = 1, 2, ..., m. Matters that overlap with equations (6) to (9) are not explained again where unnecessary.

$$D(X[t_s:t_e], Y) = d(t_e, m) = \min(d(t, m)) \qquad (10)$$
$$d(t, i) = \lVert x_t - y_i \rVert + d_{best} \qquad (11)$$
$$d_{best} = \min\{d(t, i-1),\ d(t-1, i),\ d(t-1, i-1)\} \qquad (12)$$
$$d(t, 0) = d(0, 0) = 0 \qquad (13)$$
$$d(0, i) = \infty \qquad (14)$$

Of equations (10) to (14) above, equation (13) is one of the key points of the present embodiment. Equation (13) has the same effect as changing the sequence Y into the sequence Y′.
That is, when the processing unit 7 of the data stream monitoring apparatus 1 receives one data item of the data stream at a given time, it could prepend a copy of that item to the query sequence Y to form Y′, calculate the dynamic time warping distance for Y′ with the time warping matrix, and, when that distance is equal to or less than the threshold, detect the corresponding partial sequence as a similar partial sequence. Equation (13) yields the same effect as this change from Y to Y′.

In addition, each element of the time warping matrix of the present embodiment holds not only the distance d(t, i) but also s(t, i), information on the start time of that distance obtained by the following equation (15) (for example, 1 if the start time is t = 1 and 3 if the start time is t = 3).

For i ≥ 2:
$$s(t, i) = \begin{cases} s(t, i-1) & \text{if } d_{best} = d(t, i-1) \\ s(t-1, i) & \text{if } d_{best} = d(t-1, i) \\ s(t-1, i-1) & \text{if } d_{best} = d(t-1, i-1) \end{cases}$$
For i = 1:
$$s(t, i) = s(t, 1) = t \qquad (15)$$

The start time t_s of D(X[t_s:t_e], Y) is obtained by the following equation (16).
$$t_s = s(t_e, m) \qquad (16)$$
The optimal warping path can be obtained from the distance calculation with this time warping matrix, and the start time of a detected similar partial sequence can be identified because it is carried along that warping path.
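The recurrences (11) to (15) can be sketched as a single column update. The following Python is only an illustration under stated assumptions, not the implementation of the embodiment: the function name `update_column`, the list layout (index 0 stands for the virtual row i = 0 of equation (13)), and the use of a squared difference for ||x_t − y_i|| (chosen to match the arithmetic of the FIG. 3 example) are all assumptions. Ties in equation (12) are resolved in the order the three terms are listed, which the text does not specify. The data are the X and Y given for FIG. 3.

```python
import math


def update_column(x_t, t, query, d_prev, s_prev):
    """Return (d, s) for time t given the column for time t-1.

    Lists are indexed 0..m; index 0 is the virtual row of equation (13).
    """
    m = len(query)
    d = [0.0] + [math.inf] * m   # d(t, 0) = 0, equation (13)
    s = [t] * (m + 1)            # a path entering at row 1 starts at time t
    for i in range(1, m + 1):
        cost = (x_t - query[i - 1]) ** 2     # assumed local distance
        # Candidates in the order listed in equation (12).
        candidates = [
            (d[i - 1], s[i - 1]),
            (d_prev[i], s_prev[i]),
            (d_prev[i - 1], s_prev[i - 1]),
        ]
        d_best, s_best = min(candidates, key=lambda c: c[0])
        d[i] = cost + d_best                 # equation (11)
        s[i] = t if i == 1 else s_best       # equation (15)
    return d, s


query = [11, 6, 9, 4]               # Y of the FIG. 3 example
stream = [5, 12, 6, 10, 6, 5, 1]    # X of the FIG. 3 example
d = [0.0] + [math.inf] * len(query)  # d(0, 0) = 0 and d(0, i) = inf
s = [0] * (len(query) + 1)
dtw_row = []                         # (d(t, m), s(t, m)) for each t
for t, x_t in enumerate(stream, start=1):
    d, s = update_column(x_t, t, query, d, s)
    dtw_row.append((d[-1], s[-1]))
```

Only the previous column is passed in, which is what lets the memory use stay independent of the stream length. With this input, `dtw_row[2]` is (14, 2) and `dtw_row[4]` is (6, 2), that is, d(3,4) = 14 and d(5,4) = 6, both with start time 2.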

Next, a specific example of the time warping matrix of the present embodiment is described with reference to FIG. 3, which shows that example. The operation of the data stream monitoring apparatus 1, the hardware shown in FIG. 1, is described later together with FIGS. 4 and 5.

As shown in FIG. 3, the purpose of the time warping matrix TWM is, as in FIG. 9, to detect partial sequences highly similar to the sequence Y = (11, 6, 9, 4) in a sequence X that grows as time passes. Here X = (5, 12, 6, 10, 6, 5, 1), and the threshold is ε = 20.

To go through the calculation: from equations (10) to (14), d(1,1) = (5 − 11)^2 + min{d(1,0) = 0, d(0,1) = ∞, d(0,0) = 0} = 36 + 0 = 36. From equation (15), the start time is s(1,1) = 1, which the TWM of FIG. 3 shows as "(1)" beneath "36".

Likewise, d(2,1) = (12 − 11)^2 + min{d(2,0) = 0, d(1,1) = 36, d(1,0) = 0} = 1 + 0 = 1, with start time s(2,1) = 2.
Unlike TWM1 of the comparative example (see FIG. 9(1)), this TWM has d(2,1) = 1, which is one of the key points of the present embodiment (the corresponding entry of TWM1 in FIG. 9 is 37). As detailed later, each element in the bottom row of the TWM in FIG. 3 is thus calculated independently of the element immediately to its left. In other words, equation (13) achieves the same effect as prepending y_0 to the sequence Y = (y_1, y_2, ..., y_m) to form Y′ = (y_0, y_1, y_2, ..., y_m), and this is why a single time warping matrix suffices.

In the same way, equations (10) to (15) give the value and the start time of every element of the TWM.
In what follows, the method that, like the first method, reports a similar partial sequence as soon as a DTW distance equal to or less than the threshold ε is obtained is called the third method. The method that, like the second method, first detects a similar partial sequence and then declares it the optimal partial sequence only once it is certain that no later partial sequence overlapping it in time, even slightly, is more similar is called the fourth method.

In the third method, the DTW distance at time t = 3 is 14, so X[2:3] is detected as a similar partial sequence. The start time t = 2 is read from the "(2)" beneath d(3,4) = 14, and the end time t = 3 is the current time.

This detection result is the same as that of the comparative example (see FIG. 9(2)). In other words, with a single time warping matrix, the data stream monitoring apparatus 1 of the present embodiment obtains a detection result equivalent to that of the comparative example with less computation and less memory.

In the fourth method, after the similar partial sequence X[2:3] is detected, the apparatus further checks whether any later partial sequence whose time range overlaps it, even slightly, is more similar.
First, if d(3,1), d(3,2) and d(3,3) in the third column from the left were all greater than 14, no partial sequence starting at t = 3 or earlier could still reach a DTW distance of 14 or less, so X[2:3] could be judged the optimal partial sequence.

Here, however, d(3,2) = 1 and d(3,3) = 10, so X[2:3] cannot be judged optimal at time t = 3.
Moving on to time t = 4, the apparatus checks whether every element in that column satisfies the condition "the value is greater than 14" or "the start time is t = 4 or later". If the condition holds, X[2:3] can be judged optimal.
Here, d(4,3) = 2 (start time t = 2) does not satisfy the condition, so X[2:3] cannot be judged optimal.

At time t = 5 the DTW distance is 6, so the candidate for the optimal partial sequence is changed to X[2:5].
Moving on to time t = 6, the apparatus checks whether every element in that column satisfies the condition "the value is greater than 6" or "the start time is t = 6 or later".
Here, d(6,2) = 2 (start time t = 4) does not satisfy the condition, so X[2:5] cannot be judged optimal.

Moving on to time t = 7, the apparatus again checks whether every element in that column satisfies the condition "the value is greater than 6" or "the start time is t = 6 or later".
This time the condition is satisfied, so X[2:5] can be judged optimal.

(Third method)
Next, the operation of the data stream monitoring apparatus when executing the third method is described with reference to FIG. 4, a flowchart of that operation. In what follows, d(t, i) (see equation (11)) is written d_i and d(t−1, i) is written d_i′. Likewise, s(t, i) (see equation (15)) is written s_i and s(t−1, i) is written s_i′.

First, the user enters a query sequence through the input unit 2 of the data stream monitoring apparatus 1, and the entered query sequence is stored in the storage unit 4 as the query sequence 41. The storage unit 4 also stores the time warping data 42, such as the single time warping matrix (see FIG. 3) and the predetermined threshold ε.
The communication unit 3 then starts receiving the data stream from the external device (not shown). From here on, the processing unit 7 loads the data it receives from the communication unit 3 and the storage unit 4 into the memory 6 before processing it, but this loading step is not mentioned again.

The processing unit 7 first starts processing with time t = 1 (step S401) and then receives x_t at that time t (here, 1) (step S402).
Next, the processing unit 7 calculates all d_i and s_i based on equations (10) to (15) (step S403) and updates the time warping matrix. At this point, the data in the time warping matrix for time t−2 and earlier may be deleted (when t ≥ 3), because the update refers only to the values for times t and t−1.

The processing unit 7 then determines whether d_m, that is, the DTW distance at time t, is equal to or less than the threshold ε (step S404).
If the DTW distance at time t is not equal to or less than the threshold ε (No in step S404), the processing unit 7 proceeds to step S408.

If the DTW distance at time t is equal to or less than the threshold ε (Yes in step S404), the processing unit 7 assigns the start time s_m of that DTW distance to the start time variable t_s and the current time t to the end time variable t_e (step S405).
The processing unit 7 then outputs the DTW distance d_m, its start time t_s, and its end time t_e to the output unit 5 (step S406).

By looking at the data output from the output unit 5, the user can learn that the data stream contained a partial sequence similar to the query sequence.
Next, in preparation for the subsequent processing, the processing unit 7 initializes all d_i, that is, assigns ∞ to each d_i (step S407). When the program is written, a value sufficiently large compared with ε and the like is used for ∞; the same applies below.
Then, to move on to the next time, the processing unit 7 copies all d_i and s_i into d_i′ and s_i′, the variables holding the values for the previous time (step S408), sets t = t + 1 (step S409), returns to step S402, and repeats the processing.

In this way, by using a single time warping matrix, the data stream monitoring apparatus 1 of the present embodiment keeps the amount of computation and memory constant even as the data stream grows longer, while still obtaining the same results as the comparative example.
For instance, in the example of FIG. 3, at time t = 3 the data stream monitoring apparatus 1 of the present embodiment can output the DTW distance 14, its start time 2, and its end time 3 from the output unit 5.
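The flow of FIG. 4 can be sketched as a generator that consumes one data item per time step. This is a minimal illustration rather than the program of the embodiment: the name `third_method`, the squared difference used as the local distance, the use of `math.inf` for ∞, and the tie-breaking order in equation (12) are assumptions. Step numbers in the comments refer to the steps described above.

```python
import math


def third_method(stream, query, epsilon):
    """Yield (distance, t_s, t_e) whenever d(t, m) <= epsilon (FIG. 4)."""
    m = len(query)
    d_prev = [0.0] + [math.inf] * m   # d(0, 0) = 0 and d(0, i) = inf
    s_prev = [0] * (m + 1)
    for t, x_t in enumerate(stream, start=1):        # S401, S402, S409
        d = [0.0] + [math.inf] * m                   # d(t, 0) = 0
        s = [t] * (m + 1)
        for i in range(1, m + 1):                    # S403
            candidates = [(d[i - 1], s[i - 1]),
                          (d_prev[i], s_prev[i]),
                          (d_prev[i - 1], s_prev[i - 1])]
            d_best, s_best = min(candidates, key=lambda c: c[0])
            d[i] = (x_t - query[i - 1]) ** 2 + d_best
            s[i] = t if i == 1 else s_best
        if d[m] <= epsilon:                          # S404
            yield d[m], s[m], t                      # S405, S406
            d[1:] = [math.inf] * m                   # S407
        d_prev, s_prev = d, s                        # S408


# X, Y and epsilon of the FIG. 3 example.
detections = list(third_method([5, 12, 6, 10, 6, 5, 1], [11, 6, 9, 4], 20))
```

On this input the first reported triple is (14, 2, 3), the distance, start time and end time given for time t = 3. Only two columns (`d`, `s` and their previous-time copies) are kept, so the state does not grow with the stream.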

(Fourth method)
Next, the operation of the data stream monitoring apparatus when executing the fourth method is described with reference to FIG. 5, a flowchart of that operation. Explanations that duplicate those for FIG. 4 are omitted where appropriate. As in the case of FIG. 4, the storage unit 4 of the data stream monitoring apparatus 1 stores the query sequence 41 and the time warping data 42, such as the single time warping matrix (see FIG. 3) and the predetermined threshold ε. In addition, d_min is used as the variable that holds the candidate for the optimal partial sequence.
The communication unit 3 then starts receiving the data stream from the external device (not shown).

The processing unit 7 first starts processing with time t = 1 (step S501) and then receives x_t at that time t (here, 1) (step S502).
Next, the processing unit 7 calculates all d_i and s_i based on equations (10) to (15) (step S503) and updates the time warping matrix. At this point, the data in the time warping matrix for time t−2 and earlier may be deleted (when t ≥ 3), because the update refers only to the values for times t and t−1.

Next, the processing unit 7 determines whether d_min, that is, the DTW distance at time t, is equal to or less than the threshold ε (step S504).
If the DTW distance at time t is not equal to or less than the threshold ε (No in step S504), the processing unit 7 determines in step S514 whether both "d_min ≤ ε" and "d_m < d_min" hold. Because "d_min ≤ ε" does not hold, the result of step S514 is No, and the processing proceeds to step S516.

If the DTW distance d_min at time t is equal to or less than the threshold ε (Yes in step S504), the processing unit 7 determines whether the partial sequence that gives the DTW distance d_min is optimal. That is, for all elements (d_i and s_i) at a time later than that partial sequence, it determines whether the condition "the value d_i is greater than d_min" or "the start time s_i is later than t_e" is satisfied (step S505). If this condition is satisfied, the partial sequence that gives the DTW distance d_min can be judged optimal.

In other words, when the processing unit 7 has detected a partial sequence whose DTW distance d_min is equal to or less than the threshold ε, it goes on calculating the values of the elements of the time warping matrix, and when it determines that no partial sequence overlapping that partial sequence in time, even in part, can have a smaller DTW distance, it detects that partial sequence as the optimal (more appropriate) partial sequence.

If the condition of step S505 is not satisfied (No), the processing unit 7 determines in step S514 whether both "d_min ≤ ε" and "d_m < d_min" hold.
If there is a better candidate for the optimal partial sequence, the condition of step S514 is satisfied (Yes), and the processing unit 7 assigns the DTW distance d_m to the variable d_min, its start time s_m to the variable t_s, and its end time t to the variable t_e (step S515).

If the condition of step S505 is satisfied (Yes), the partial sequence that gives the DTW distance d_min can be judged optimal. The processing unit 7 therefore assigns the start time s_m of that DTW distance to the start time variable t_s and the current time t to the end time variable t_e, and then outputs the DTW distance d_min, its start time t_s, and its end time t_e to the output unit 5 (step S506).
By looking at the data output from the output unit 5, the user can learn which of the partial sequences in the data stream that resemble the query sequence is the optimal one.

Next, in preparation for the subsequent processing, the processing unit 7 assigns ∞ to d_min (step S507), assigns 0 to i (step S508), and then performs steps S509 to S513.
In steps S509 to S513, for each value of i from 1 to m (see steps S510 and S513), the processing unit 7 assigns ∞ to d_i (step S512) if s_i ≤ t_e holds (Yes in step S511). The purpose is to rule out any later output in step S506 of a partial sequence whose time range overlaps, even slightly, the optimal partial sequence already output in step S506. To that end, among the elements at that time, the values of those whose start time is at or before the end time t_e of the output optimal partial sequence are set to infinity.

Then, to move on to the next time, the processing unit 7 copies all d_i and s_i into d_i′ and s_i′, the variables holding the values for the previous time (step S516), sets t = t + 1 (step S517), returns to step S502, and repeats the processing.

In this way, the data stream monitoring apparatus 1 of the present embodiment can output the optimal partial sequence even though it uses only a single time warping matrix.
For instance, in the example of FIG. 3, the data stream monitoring apparatus 1 of the present embodiment can output from the output unit 5, as the DTW distance of the optimal partial sequence, not the DTW distance 14 found at time t = 3 but the DTW distance 6 found at time t = 5 (together with its start time and end time).
Owing to the nature of its algorithm, the fourth method always detects the optimal partial sequence, as does the method of the comparative example.
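The fourth method can be sketched in the same style. This is an illustration under several assumptions, not the program of the embodiment: `fourth_method` is a hypothetical name, the local distance is a squared difference, d_min is assumed to start at ∞, a new candidate is accepted when d_m ≤ ε and d_m < d_min (the description of step S514 writes the first test as d_min ≤ ε), and the reported start and end times are the ones stored with the candidate in step S515, which is what yields X[2:5] in the FIG. 3 walkthrough.

```python
import math


def fourth_method(stream, query, epsilon):
    """Yield (d_min, t_s, t_e) once no overlapping better match can appear."""
    m = len(query)
    d_prev = [0.0] + [math.inf] * m   # d(0, 0) = 0 and d(0, i) = inf
    s_prev = [0] * (m + 1)
    d_min, t_s, t_e = math.inf, 0, 0  # assumed: no candidate at the start
    for t, x_t in enumerate(stream, start=1):        # S501, S502, S517
        d = [0.0] + [math.inf] * m                   # d(t, 0) = 0
        s = [t] * (m + 1)
        for i in range(1, m + 1):                    # S503
            candidates = [(d[i - 1], s[i - 1]),
                          (d_prev[i], s_prev[i]),
                          (d_prev[i - 1], s_prev[i - 1])]
            d_best, s_best = min(candidates, key=lambda c: c[0])
            d[i] = (x_t - query[i - 1]) ** 2 + d_best
            s[i] = t if i == 1 else s_best
        if d_min <= epsilon:                         # S504
            # S505: every element is worse than the candidate or starts after it.
            if all(d[i] > d_min or s[i] > t_e for i in range(1, m + 1)):
                yield d_min, t_s, t_e                # S506
                d_min = math.inf                     # S507
                for i in range(1, m + 1):            # S508 to S513
                    if s[i] <= t_e:
                        d[i] = math.inf
        if d[m] <= epsilon and d[m] < d_min:         # S514 (assumed test on d_m)
            d_min, t_s, t_e = d[m], s[m], t          # S515
        d_prev, s_prev = d, s                        # S516


# X, Y and epsilon of the FIG. 3 example.
optimal = list(fourth_method([5, 12, 6, 10, 6, 5, 1], [11, 6, 9, 4], 20))
```

On the FIG. 3 data the sketch reports a single triple, (6, 2, 5), at time t = 7: the candidate X[2:3] with distance 14 is replaced by X[2:5] with distance 6 at t = 5, and the condition of step S505 first holds at t = 7. A candidate still pending when the stream ends is not reported by this sketch.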

The third and fourth methods can be realized on a computer apparatus such as the data stream monitoring apparatus 1 by writing programs that execute the flowcharts of FIGS. 4 and 5. These programs can also be stored on a recording medium such as a hard disk, a flash memory, a CD-ROM (Compact Disk Read Only Memory), or a DVD (Digital Versatile Disk).

(First example)
Next, a first example of the data stream monitoring apparatus of the present embodiment is described with reference to FIG. 6. In FIG. 6, (a) shows the query sequence and (b) shows how partial sequences are detected.

In the first example, the data stream monitoring apparatus 1 of the present embodiment was run on a Linux (registered trademark) machine with 1 GB of memory and an Intel (registered trademark) Xeon 2.8 GHz CPU. The experiment used air temperatures measured by a temperature sensor.
FIG. 6(a) shows the waveform of the query sequence; the vertical axis is temperature (degrees Celsius) and the horizontal axis is elapsed time. The query sequence is characterized by two occurrences of a pattern in which, because of changes in the weather, the temperature swings widely between about 20 degrees and about 31 degrees.

FIG. 6(b) shows the waveform of the data stream; as in (a), the vertical axis is temperature (degrees Celsius) and the horizontal axis is elapsed time. The data stream consists of measurements taken by the temperature sensor about once a minute and received by the machine described above.
When the fourth method of the present embodiment was executed on that machine, it detected, as shown in FIG. 6(b), both of the partial sequences (Subseq) in the data stream that resemble the query sequence of (a), Sub1 and Sub2, without missing either.

(Second example)
Next, a second example of the data stream monitoring apparatus of the present embodiment is described with reference to FIG. 7, which shows the relationship between the sequence length of the data stream and the calculation time.

In the second example, the query sequence is artificial data of length 256. The data stream is also artificial data, and its length (sequence length) was varied from 10^3 to 10^6.
In FIG. 7, the vertical axis is the calculation time (ms) and the horizontal axis is the length of the data stream (sequence length). For both the method of the comparative example and the method of this example, the calculation time is the average of the total time taken to update the time warping matrix and to detect similar partial sequences.

As shown in FIG. 7, with the method of the comparative example (L2) the calculation time increases as the data stream grows longer. With the method of this example (L1), the calculation time stays constant even as the data stream grows longer. The reason is that, as the data stream grows longer, the number of time warping matrices increases in the method of the comparative example, whereas the method of this example always uses a single time warping matrix.

As the first and second examples show, the data stream monitoring apparatus 1 of the present embodiment reduces the calculation cost compared with the comparative example, that is, it achieves both faster calculation and lower memory use.

This concludes the description of the embodiment, but the aspects of the present invention are not limited to it.
For example, the present embodiment has been described for one-dimensional data in the query sequence and the data stream, but it can be applied in the same way to n-dimensional data (n being a natural number of 2 or more).
Besides temperature sensors, the present invention is applicable to many fields, such as video and music distribution, bioinformatics, and various kinds of robots.
The specific configuration can also be changed as appropriate without departing from the spirit of the present invention.
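For the n-dimensional case mentioned above, only the local distance term ||x_t − y_i|| of equation (11) has to change. A minimal sketch follows, assuming each data item is a tuple of n numbers and assuming a squared Euclidean distance (the text does not fix a particular norm for n ≥ 2); `local_distance` is a hypothetical helper name.

```python
def local_distance(x_t, y_i):
    """Squared Euclidean distance between two samples of equal dimension."""
    if len(x_t) != len(y_i):
        raise ValueError("samples must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(x_t, y_i))


# Two-dimensional samples: (5 - 11)^2 + (1 - 2)^2
example = local_distance((5, 1), (11, 2))
```

For one-dimensional tuples this reduces to the squared difference used in the FIG. 3 arithmetic, for instance `local_distance((5,), (11,))` gives 36.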

FIG. 1 is a configuration diagram of the data stream monitoring apparatus of the present embodiment.
FIG. 2 is a diagram illustrating the query processing performed by the data stream monitoring apparatus of the present embodiment.
FIG. 3 is a diagram showing a specific example of the time warping matrix of the present embodiment.
FIG. 4 is a flowchart showing the operation of the data stream monitoring apparatus when executing the third method.
FIG. 5 is a flowchart showing the operation of the data stream monitoring apparatus when executing the fourth method.
FIG. 6 is a diagram for explaining the first example of the data stream monitoring apparatus of the present embodiment, in which (a) shows the query sequence and (b) shows how partial sequences are detected.
FIG. 7 is a diagram showing the relationship between sequence length and calculation time for the second example of the data stream monitoring apparatus of the present embodiment.
FIG. 8 is a diagram illustrating the query processing in a comparative example of data stream monitoring using DTW.
FIG. 9 is a diagram showing an example of the time warping matrices in the comparative example.

Explanation of symbols

1 Data stream monitoring device
4 Storage unit
5 Output unit
7 Processing unit
TWM Time warping matrix

Claims

A data stream monitoring device that detects, based on a dynamic time warping distance, a partial sequence similar to a query sequence, which is a predetermined data string, from a data stream, which is continuously received data, the data stream monitoring device comprising:
a storage unit that stores a single time warping matrix and a predetermined threshold $\varepsilon$ relating to the dynamic time warping distance, the time warping matrix holding, at the time data $x_t$ at time $t$ in the data stream is received, as its $(t-1, i)$ element, information $d(t-1, i)$ on the distance between data $y_i$ ($i = 1, 2, \ldots, m$) in the query sequence $(y_1, y_2, \ldots, y_m)$ of length $m$ and data $x_{t-1}$ at time $t-1$ in the data stream, together with information $s(t-1, i)$ on the start time corresponding to $d(t-1, i)$; and
a processing unit that, upon receiving the data $x_t$, uses the time warping matrix stored in the storage unit and equations (11) to (14) below to calculate information $d(t, i)$ on the distance between the data $y_i$ ($i = 1, 2, \ldots, m$) in the query sequence and the data $x_t$, calculates by equation (15) below information $s(t, i)$ on the start time corresponding to $d(t, i)$, updates the time warping matrix stored in the storage unit, and, when the information $d(t, m)$ on the distance between the data $y_m$ in the query sequence and the data $x_t$ is equal to or less than the predetermined threshold $\varepsilon$, detects, as the similar partial sequence, the partial sequence $X[t_s : t_e]$ in the data stream whose start time $t_s$ is the start time $s(t, m)$ corresponding to $d(t, m)$ and whose end time $t_e$ is the time $t$.
$$d(t, i) = \lVert x_t - y_i \rVert + d_{best} \tag{11}$$
$$d_{best} = \min\{\, d(t, i-1),\ d(t-1, i),\ d(t-1, i-1) \,\} \tag{12}$$
$$d(t, 0) = d(0, 0) = 0 \tag{13}$$
$$d(0, i) = \infty \tag{14}$$
(where $\lVert x_t - y_i \rVert$ represents the distance between the two numerical values $x_t$ and $y_i$, and $\min\{\, d(t, i-1),\ d(t-1, i),\ d(t-1, i-1) \,\}$ means that the smallest of the three values in the braces is adopted.)
$$s(t, i) = \begin{cases} s(t, i-1) & (i \geq 2,\ d_{best} = d(t, i-1)) \\ s(t-1, i) & (i \geq 2,\ d_{best} = d(t-1, i)) \\ s(t-1, i-1) & (i \geq 2,\ d_{best} = d(t-1, i-1)) \\ t & (i = 1) \end{cases} \tag{15}$$
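
The update defined by equations (11) to (15) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes one-dimensional numeric data with the absolute difference as the distance, counts time from t = 1, and keeps only the column for time t−1 of the time warping matrix. The names `step` and `monitor` are hypothetical and do not appear in the specification.

```python
import math

def step(query, d_prev, s_prev, x_t, t):
    """Compute column t of the time warping matrix from column t-1."""
    m = len(query)
    d = [0.0] + [math.inf] * m   # d(t, 0) = 0, equation (13)
    s = [t] * (m + 1)            # s(t, 1) = t, equation (15); index 0 is unused
    for i in range(1, m + 1):
        # The three candidates of equation (12), each with its start time.
        candidates = [
            (d[i - 1], s[i - 1]),            # d(t, i-1)
            (d_prev[i], s_prev[i]),          # d(t-1, i)
            (d_prev[i - 1], s_prev[i - 1]),  # d(t-1, i-1)
        ]
        d_best, s_best = min(candidates, key=lambda c: c[0])
        d[i] = abs(x_t - query[i - 1]) + d_best   # equation (11)
        if i >= 2:
            s[i] = s_best                         # equation (15), i >= 2
    return d, s

def monitor(stream, query, eps):
    """Yield (t_s, t_e, distance) whenever d(t, m) is at or below eps."""
    m = len(query)
    d = [0.0] + [math.inf] * m   # column 0: equations (13) and (14)
    s = [0] * (m + 1)
    for t, x_t in enumerate(stream, start=1):
        d, s = step(query, d, s, x_t, t)
        if d[m] <= eps:
            yield s[m], t, d[m]

if __name__ == "__main__":
    for match in monitor([5, 1, 2, 3, 5], [1, 2, 3], eps=1.0):
        print(match)
```

With the query (1, 2, 3), the stream (5, 1, 2, 3, 5), and a threshold of 1, this sketch reports two overlapping partial sequences, X[2 : 3] with distance 1 and X[2 : 4] with distance 0.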

The data stream monitoring device according to claim 1, wherein the processing unit, upon detecting the similar partial sequence $X[t_s : t_e]$, continues to calculate the value of each matrix element of the time warping matrix and, when it determines that no partial sequence overlapping the similar partial sequence $X[t_s : t_e]$, even in part, has a smaller dynamic time warping distance, detects the similar partial sequence $X[t_s : t_e]$ as a more appropriate partial sequence.

The data stream monitoring device according to claim 2, wherein the processing unit, upon detecting the similar partial sequence $X[t_s : t_e]$, takes the reception time of the last data of the similar partial sequence $X[t_s : t_e]$ as an end time and, when a column of the time warping matrix after the end time contains no matrix element whose start time is the end time or earlier and whose value is smaller than the dynamic time warping distance of the similar partial sequence $X[t_s : t_e]$, detects the similar partial sequence as the more appropriate partial sequence.
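
The selection of a more appropriate partial sequence can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the patented implementation: data are one-dimensional numbers compared by absolute difference, time is counted from t = 1, and a candidate is confirmed only once a later column has arrived. Discarding matrix elements that start inside a reported sequence is an assumption added here so that later reports do not overlap it. The names `step` and `best_matches` are hypothetical.

```python
import math

def step(query, d_prev, s_prev, x_t, t):
    """Compute column t of the time warping matrix, equations (11) to (15)."""
    m = len(query)
    d = [0.0] + [math.inf] * m   # d(t, 0) = 0
    s = [t] * (m + 1)            # s(t, 1) = t; index 0 is unused
    for i in range(1, m + 1):
        candidates = [
            (d[i - 1], s[i - 1]),            # d(t, i-1)
            (d_prev[i], s_prev[i]),          # d(t-1, i)
            (d_prev[i - 1], s_prev[i - 1]),  # d(t-1, i-1)
        ]
        d_best, s_best = min(candidates, key=lambda c: c[0])
        d[i] = abs(x_t - query[i - 1]) + d_best
        if i >= 2:
            s[i] = s_best
    return d, s

def best_matches(stream, query, eps):
    """Yield (t_s, t_e, distance) for each confirmed candidate."""
    m = len(query)
    d = [0.0] + [math.inf] * m   # column 0
    s = [0] * (m + 1)
    d_min, t_s, t_e = math.inf, None, None   # current candidate, if any
    for t, x_t in enumerate(stream, start=1):
        d, s = step(query, d, s, x_t, t)
        if d_min <= eps:
            # Confirm the candidate when no element of this column both
            # starts at or before t_e and has a value smaller than d_min.
            if all(d[i] >= d_min or s[i] > t_e for i in range(1, m + 1)):
                yield t_s, t_e, d_min
                d_min = math.inf
                # Assumption: drop paths that start inside the reported match.
                for i in range(1, m + 1):
                    if s[i] <= t_e:
                        d[i] = math.inf
        # Keep the smallest-distance detection as the current candidate.
        if d[m] <= eps and d[m] < d_min:
            d_min, t_s, t_e = d[m], s[m], t

if __name__ == "__main__":
    print(list(best_matches([5, 1, 2, 3, 5, 5], [1, 2, 3], eps=1.0)))
```

For the query (1, 2, 3), the stream (5, 1, 2, 3, 5, 5), and a threshold of 1, the sketch first holds X[2 : 3] (distance 1) as a candidate, replaces it with the overlapping X[2 : 4] (distance 0), and reports only the latter once the next column confirms it.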

A data stream monitoring method performed by a data stream monitoring device that detects, based on a dynamic time warping distance, a partial sequence similar to a query sequence, which is a predetermined data string, from a data stream, which is continuously received data, wherein
the data stream monitoring device comprises a processing unit and a storage unit that stores a single time warping matrix and a predetermined threshold $\varepsilon$ relating to the dynamic time warping distance, the time warping matrix holding, at the time data $x_t$ at time $t$ in the data stream is received, as its $(t-1, i)$ element, information $d(t-1, i)$ on the distance between data $y_i$ ($i = 1, 2, \ldots, m$) in the query sequence $(y_1, y_2, \ldots, y_m)$ of length $m$ and data $x_{t-1}$ at time $t-1$ in the data stream, together with information $s(t-1, i)$ on the start time corresponding to $d(t-1, i)$, and
the processing unit, upon receiving the data $x_t$, uses the time warping matrix stored in the storage unit and equations (11) to (14) below to calculate information $d(t, i)$ on the distance between the data $y_i$ ($i = 1, 2, \ldots, m$) in the query sequence and the data $x_t$, calculates by equation (15) below information $s(t, i)$ on the start time corresponding to $d(t, i)$, updates the time warping matrix stored in the storage unit, and, when the information $d(t, m)$ on the distance between the data $y_m$ in the query sequence and the data $x_t$ is equal to or less than the predetermined threshold $\varepsilon$, detects, as the similar partial sequence, the partial sequence $X[t_s : t_e]$ in the data stream whose start time $t_s$ is the start time $s(t, m)$ corresponding to $d(t, m)$ and whose end time $t_e$ is the time $t$.
$$d(t, i) = \lVert x_t - y_i \rVert + d_{best} \tag{11}$$
$$d_{best} = \min\{\, d(t, i-1),\ d(t-1, i),\ d(t-1, i-1) \,\} \tag{12}$$
$$d(t, 0) = d(0, 0) = 0 \tag{13}$$
$$d(0, i) = \infty \tag{14}$$
(where $\lVert x_t - y_i \rVert$ represents the distance between the two numerical values $x_t$ and $y_i$, and $\min\{\, d(t, i-1),\ d(t-1, i),\ d(t-1, i-1) \,\}$ means that the smallest of the three values in the braces is adopted.)
$$s(t, i) = \begin{cases} s(t, i-1) & (i \geq 2,\ d_{best} = d(t, i-1)) \\ s(t-1, i) & (i \geq 2,\ d_{best} = d(t-1, i)) \\ s(t-1, i-1) & (i \geq 2,\ d_{best} = d(t-1, i-1)) \\ t & (i = 1) \end{cases} \tag{15}$$

The data stream monitoring method according to claim 4, wherein the processing unit, upon detecting the similar partial sequence $X[t_s : t_e]$, continues to calculate the value of each matrix element of the time warping matrix and, when it determines that no partial sequence overlapping the similar partial sequence $X[t_s : t_e]$, even in part, has a smaller dynamic time warping distance, detects the similar partial sequence $X[t_s : t_e]$ as a more appropriate partial sequence.

The data stream monitoring method according to claim 5, wherein the processing unit, upon detecting the similar partial sequence $X[t_s : t_e]$, takes the reception time of the last data of the similar partial sequence $X[t_s : t_e]$ as an end time and, when a column of the time warping matrix after the end time contains no matrix element whose start time is the end time or earlier and whose value is smaller than the dynamic time warping distance of the similar partial sequence $X[t_s : t_e]$, detects the similar partial sequence as the more appropriate partial sequence.

A program for causing a computer to execute the data stream monitoring method according to any one of claims 4 to 6.

A recording medium on which the program according to claim 7 is recorded.