JP2013257712A

JP2013257712A - Device, method and program for detection of similar partial sequence

Info

Publication number: JP2013257712A
Application number: JP2012133152A
Authority: JP
Inventors: Machiko Toyoda; 真智子豊田; Yasushi Sakurai; 保志櫻井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-06-12
Filing date: 2012-06-12
Publication date: 2013-12-26

Abstract

PROBLEM TO BE SOLVED: To accurately detect a similar partial sequence even with a small amount of data, without using all of the time-series data. SOLUTION: When one element x of sampled time-series data is received (S201), a similarity score between a partial sequence of the time-series data that includes the element and a partial sequence of the other time-series data is calculated using the sampling period (S103). A pair of partial sequences whose similarity score is equal to or greater than a predetermined threshold is determined to be a pair of similar partial sequences, and that pair is detected as a pair of matching partial sequences (S104).

Description

The present invention relates to a similar partial sequence detection device, method, and program, and more particularly to a similar partial sequence detection device, method, and program for detecting pairs of similar partial sequences from two sets of time-series data.

With the spread of sensor networks, techniques for efficiently processing and analyzing the sensor data collected from sensor nodes are in demand. In particular, as the recent Big Data trend illustrates, it is considered important to analyze sensor data instantly in real time and present the results, rather than to analyze the data after accumulating it. This requires not only the statistical processing studied so far, such as computing the average, maximum, and minimum of sensor data, but also advanced mining techniques that automatically extract useful information.

One such mining technique is similarity search. Similarity search of sensor data is a technique for identifying similar portions between sets of sensor data; correlation detection (for example, Patent Document 1) and identification of partial patterns similar to a query pattern (for example, Patent Document 2) fall into this category. Patent Document 3 discloses a technique for identifying partial patterns common to two sets of data. It uses the dynamic time warping (DTW) distance as the distance function for pattern detection, and it runs fast and with little memory while guaranteeing the same accuracy as plain DTW. In this technique, sensor data is monitored on a server or the like, and any pattern common to two sensors is detected in real time. FIG. 17 and FIG. 18 illustrate a concrete example of similar pattern detection. FIG. 17 shows humidity data acquired by two different sensors. Each sensor acquires data at intervals of about one minute; the horizontal axis is the sequence number of the acquired data and the vertical axis is its value. The two sets of sensor data show closely similar patterns, although some sections differ. Specifically, the two patterns labeled Subseq are the similar patterns of these sensors. FIG. 18 shows the similar pattern detection result obtained by applying the technique of Patent Document 3 to these data. The horizontal axis is the sequence number of the first sensor data in FIG. 17, and the vertical axis is the sequence number of the second. The bold lines indicate which elements of the two sets of data are matched to each other, and show two similar patterns being identified. In this way, the two sets of sensor data are compared, and if a similar pattern exists, its position (start and end positions) and similarity are reported in real time.

This technique is a general-purpose one for time-series data, but two points need attention when it is applied to sensor data. The first is the volume of data communication. To detect similar patterns, each sensor node must transmit the data it acquires to a server. If the data is acquired at short intervals, its volume becomes enormous, and if every sensor node transmits all of it, the traffic of the whole network increases. The second is the power constraint of the sensor nodes. Sensor nodes are mainly battery-powered and consume power when collecting and transmitting data. Power consumption is known to be highest during data transmission, so transmitting more data than necessary may drain the battery faster. Consequently, when similar patterns are detected from sensor data in real time, data must be transmitted efficiently while retaining the minimum data needed for pattern detection.

Efficient data transmission methods include selecting and transmitting data according to the rate of change of the data (calculated from speed, position, and the like), as in Patent Document 4, and compressing and transmitting multiple sets of sensor data, as in Patent Document 5.

In addition, Non-Patent Document 1 proposes a technique that converts time-series data, in real time, into summary information that condenses the original data. Transmitting only part of the summary information greatly reduces the amount of data.

Patent Document 1: Japanese Patent No. 4452234
Patent Document 2: Japanese Patent No. 4801566
Patent Document 3: JP 2010-198227 A
Patent Document 4: JP 2011-192198 A
Patent Document 5: JP 2012-44449 A

Non-Patent Document 1: Umit Y. Ogras and Hakan Ferhatosmanoglu, “Online summarization of dynamic time series data,” VLDB Journal, vol. 15, no. 1, pp. 84-98, 2006.

The DTW used in Patent Document 3 detects similar patterns using the shape of the data (time-domain information). To guarantee detection accuracy close to that of the original data, the data must therefore be selected so that its shape is preserved while its volume is reduced. The techniques described above, in contrast, select data without regard to its shape, or in a way that destroys its shape, which makes it difficult to detect similar patterns properly.

The present invention has been made in view of the above circumstances, and its object is to provide a similar partial sequence detection device, method, and program that can detect similar partial sequences with high accuracy even from a small amount of data, without using all of the time-series data (the original data).

To achieve the above object, the similar partial sequence detection device according to the present invention detects pairs of similar partial sequences from first time-series data sampled at a first sampling period and second time-series data sampled at a second sampling period, using a single score matrix whose elements are similarity scores that are interconvertible with the DTW (Dynamic Time Warping) distance between two partial sequences, and a position matrix whose elements are the start positions of the partial sequences. The device includes a storage unit that stores the score matrix and the position matrix, and a processing unit. When the processing unit receives one element of either the sampled first time-series data or the sampled second time-series data, it calculates the similarity score between a partial sequence that includes that element in one of the first and second time-series data and a partial sequence in the other; stores the calculated similarity score as the element of the score matrix in the storage unit that corresponds to the end positions of the two partial sequences used in the calculation; stores the start positions of those two partial sequences as the element of the position matrix in the storage unit that corresponds to the same end positions; determines that a pair of partial sequences whose similarity score stored in the score matrix is equal to or greater than a predetermined threshold is a pair of similar partial sequences; and detects that pair as a pair of matching partial sequences.

When calculating any one of the similarity scores, the processing unit calculates three candidate scores. For the first score, it obtains the similarity score that is adjacent to the score being calculated in the score matrix and corresponds to the previous time with respect to the first time-series data, subtracts from it the magnitude of the difference between the corresponding data elements of the two partial sequences multiplied by the first sampling period, and adds a predetermined coefficient multiplied by a value corresponding to the first sampling period. For the second score, it obtains the adjacent similarity score that corresponds to the previous time with respect to the second time-series data, subtracts from it the magnitude of the difference between the corresponding data elements multiplied by the second sampling period, and adds the predetermined coefficient multiplied by a value corresponding to the second sampling period. For the third score, it obtains the adjacent similarity score that corresponds to the previous time with respect to both the first and the second time-series data, subtracts from it the magnitude of the difference between the corresponding data elements multiplied by the larger of the first sampling period and the second sampling frequency, and adds the predetermined coefficient multiplied by a value corresponding to the sum of the first and second sampling periods. The largest of the first, second, and third scores is taken as the similarity score.
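The three-way score update described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `update_score` and its parameter names are hypothetical, the "value corresponding to" a sampling period is assumed to be the period itself, and where the text names the larger of the first sampling period and the second sampling frequency, the sketch assumes the larger of the two periods is meant.

```python
def update_score(s_prev_x, s_prev_y, s_prev_xy, x, y, t1, t2, eps):
    """Return the similarity score for one cell of the score matrix.

    s_prev_x  : adjacent score, one step back in the first series only
    s_prev_y  : adjacent score, one step back in the second series only
    s_prev_xy : adjacent score, one step back in both series
    x, y      : the two data elements being compared
    t1, t2    : first and second sampling periods
    eps       : the predetermined coefficient
    """
    diff = abs(x - y)
    # Assumption: "a value corresponding to the sampling period" is the period itself.
    first = s_prev_x - diff * t1 + eps * t1
    second = s_prev_y - diff * t2 + eps * t2
    # Assumption: the larger of the two periods weights the difference.
    third = s_prev_xy - diff * max(t1, t2) + eps * (t1 + t2)
    return max(first, second, third)
```

With identical elements the difference term vanishes and each candidate only gains its bonus, so the diagonal candidate, which earns the bonus for both periods, wins when the three neighboring scores are equal.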

The similar partial sequence detection method according to the present invention is performed in a similar partial sequence detection device that detects pairs of similar partial sequences from first time-series data sampled at a first sampling period and second time-series data sampled at a second sampling period, using a single score matrix whose elements are similarity scores that are interconvertible with the DTW (Dynamic Time Warping) distance between two partial sequences, and a position matrix whose elements are the start positions of the partial sequences, the device having a storage unit that stores the score matrix and the position matrix. The method includes, by a processing unit: when one element of either the sampled first time-series data or the sampled second time-series data is received, calculating the similarity score between a partial sequence that includes that element in one of the first and second time-series data and a partial sequence in the other; storing the calculated similarity score as the element of the score matrix in the storage unit that corresponds to the end positions of the two partial sequences used in the calculation; storing the start positions of those two partial sequences as the element of the position matrix in the storage unit that corresponds to the same end positions; determining that a pair of partial sequences whose similarity score stored in the score matrix is equal to or greater than a predetermined threshold is a pair of similar partial sequences; and detecting that pair as a pair of matching partial sequences.

When any one of the similarity scores is calculated by the processing unit, three candidate scores are calculated. For the first score, the similarity score that is adjacent to the score being calculated in the score matrix and corresponds to the previous time with respect to the first time-series data is obtained, the magnitude of the difference between the corresponding data elements of the two partial sequences multiplied by the first sampling period is subtracted from it, and a predetermined coefficient multiplied by a value corresponding to the first sampling period is added. For the second score, the adjacent similarity score that corresponds to the previous time with respect to the second time-series data is obtained, the magnitude of the difference between the corresponding data elements multiplied by the second sampling period is subtracted from it, and the predetermined coefficient multiplied by a value corresponding to the second sampling period is added. For the third score, the adjacent similarity score that corresponds to the previous time with respect to both the first and the second time-series data is obtained, the magnitude of the difference between the corresponding data elements multiplied by the larger of the first sampling period and the second sampling frequency is subtracted from it, and the predetermined coefficient multiplied by a value corresponding to the sum of the first and second sampling periods is added. The largest of the first, second, and third scores is taken as the similarity score.

The program according to the present invention is a program for causing a computer to function as each unit of the similar partial sequence detection device described above.

As described above, the similar partial sequence detection device, method, and program of the present invention calculate the similarity score using the sampling period of each set of time-series data, and can thereby detect similar partial sequences with high accuracy even from a small amount of data, without using all of the time-series data.

FIG. 1 is a graph showing the power spectrum of humidity data.
FIG. 2 is a graph showing the humidity data after sampling.
FIG. 3 is a schematic diagram showing the configuration of the sensor node according to the first embodiment of the present invention.
FIG. 4 is a schematic diagram showing the configuration of the similar pattern detection device according to the first embodiment of the present invention.
FIG. 5 is a diagram for explaining DTW.
FIG. 6 is a flowchart showing the processing routine for similar partial sequence pair detection in the similar pattern detection device according to the first embodiment of the present invention.
FIG. 7 is a flowchart showing the processing routine for storage into the List array in the similar pattern detection device according to the first embodiment of the present invention.
FIG. 8 is a flowchart showing the processing routine for storage into the List array in the similar pattern detection device according to the first embodiment of the present invention.
FIG. 9 is a schematic diagram showing the configuration of the sensor node according to the second embodiment of the present invention.
FIG. 10 is a flowchart showing the processing routine for sampling period determination in the sensor node according to the second embodiment of the present invention.
FIG. 11 is a graph showing temperature sensor data.
FIG. 12 is a diagram showing similar pattern detection results obtained with the temperature sensor data.
FIG. 13 is a graph showing sunspot data.
FIG. 14 is a diagram showing similar pattern detection results obtained with the sunspot data.
FIG. 15 is a diagram showing similar pattern detection results obtained with the temperature sensor data.
FIG. 16 is a diagram showing similar pattern detection results obtained with the sunspot data.
FIG. 17 is a diagram for explaining humidity data and the partial patterns they share.
FIG. 18 is a diagram showing an example of similar pattern detection using the prior art.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
<Outline of the invention>
The present invention provides a system for detecting similar patterns between sets of sensor data that transmits data efficiently over the sensor network while keeping accuracy close to that of the conventional method (Patent Document 3). That is, the object of the present invention is to detect similar patterns with high accuracy even from a small amount of data, without using all of the data collected by the sensor nodes (the original data). A sensor node does not transmit all of the data it collects; it selects and transmits only the data needed for similar pattern detection. The server, in turn, performs detection on the transmitted data with accuracy close to that of similar pattern detection using the original data.

To achieve accuracy equivalent to that of the original data, the data must be selected (sampled) so that its shape is preserved. The present invention therefore samples the original data based on the sampling theorem and performs similar pattern detection on the transmitted samples. The sampling theorem states quantitatively how finely an analog signal must be sampled when it is converted to a digital signal: if f is the maximum frequency component contained in the original signal, a signal sampled at a frequency f_Nq higher than 2f allows the original signal to be reconstructed completely. This f_Nq is called the Nyquist frequency. To determine the sampling period, a discrete Fourier transform is applied to the data (using the FFT) and the power spectrum is calculated. The power spectrum indicates the proportion of each frequency component in the data. FIG. 1 shows the power spectrum of Humidity #1. The power in FIG. 1 is normalized so that it sums to 1. FIG. 1 shows frequency components only up to 60, but power is also assigned to higher frequencies, so the data is composed of many different frequency components. To determine the maximum frequency component f of such data, it is common (for example, in signal processing and network analysis) to regard high-frequency components with a small share of the power as noise and to determine f using a threshold. There are two main threshold-based methods. In the first, a ratio to the total power is specified as the threshold, and f is the frequency at which the power accumulated from the low-frequency side exceeds the threshold. In the second, a power value is specified as the threshold, and f is the highest frequency among the components whose power is at or above the threshold. The method of determining the maximum frequency component is not limited to these, and any method may be used. Next, based on f, the Nyquist frequency f_Nq is obtained by equation (1), and the sampling period T is determined according to equation (2).
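The period-selection procedure can be illustrated with a short Python sketch that uses the first threshold method (power accumulated from the low-frequency side). It is only a sketch under assumptions: equations (1) and (2) are not reproduced in this text, so the code assumes that f is counted in cycles per data window of length n and picks the largest integer period, in original samples, whose sampling rate n/T stays strictly above 2f. The function names, the default ratio of 0.9, and the removal of the mean before the transform are all hypothetical choices. A plain DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(data):
    """Return the normalized power of DFT bins 1..n//2 (values sum to 1)."""
    n = len(data)
    mean = sum(data) / n  # assumption: drop the DC offset before the transform
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(data))
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    if total == 0:
        return [0.0] * len(powers)  # constant data carries no spectral power
    return [p / total for p in powers]


def max_frequency(powers, ratio=0.9):
    """Return the first bin at which the accumulated power exceeds `ratio`."""
    cumulative = 0.0
    for k, p in enumerate(powers, start=1):
        cumulative += p
        if cumulative > ratio:
            return k
    return len(powers)


def sampling_period(data, ratio=0.9):
    """Largest integer period T (in samples) with rate n/T strictly above 2f."""
    n = len(data)
    f = max(1, max_frequency(power_spectrum(data), ratio))
    return max(1, (n - 1) // (2 * f))
```

For a 64-sample window that contains exactly two cycles of a sine wave, the accumulated power passes the threshold at f = 2, and the sketch returns a period of 15 samples, which is just under the limit of 16 implied by a rate of 2f.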

Here, n is the length of the data. A single sampling period may be shared by all sensor nodes, or one may be set for each sensor node individually. Because the sampling period depends on the data, however, setting it individually is preferable.

Each sensor node samples the original data according to the sampling period set for it, and transmits only the sampled data to the similar pattern detection device, which acts as the server. FIG. 2 shows the Humidity #1 data after sampling. The shape of the data is preserved, yet its size is greatly reduced.
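The node-side selection can be sketched as follows. The generator name `sample_stream` is hypothetical, the period is assumed to be a whole number of original readings, and sending the original sequence number along with each value is an assumption made here for illustration rather than something the text specifies.

```python
def sample_stream(values, period):
    """Yield (sequence_number, value) for every `period`-th reading."""
    if period < 1:
        raise ValueError("period must be a positive integer")
    for i, value in enumerate(values):
        if i % period == 0:  # keep one reading per sampling period
            yield i, value
```

With a period of 3, seven readings shrink to three transmitted pairs, for example `list(sample_stream([10, 11, 12, 13, 14, 15, 16], 3))`.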

The similar pattern detection device, for its part, uses the sampling period information when detecting similar patterns. That is, rather than using the sampled sequence as is, it calculates the similarity by approximately taking into account the data that sampling left untransmitted. This makes it possible to obtain results close to those obtained with the original sequence.

<System configuration>
As shown in FIG. 3, the sensor node 100 according to the first embodiment of the present invention includes a data sensing unit 10, a data acquisition unit 12, a sampling unit 14, a period storage unit 16, and a wireless communication unit 18. Each piece of data is detected by the data sensing unit 10 and sent to the data acquisition unit 12. The period storage unit 16 stores the information used for sampling the data (the sampling period), and the sampling unit 14 samples the data based on that information. The sensor node 100 obtains the sampling period used for selecting data from the similar pattern detection device 150 described later.

The time-series data sampled by the sampling unit 14 is transmitted from the wireless communication unit 18 to the similar pattern detection device 150.

As shown in FIG. 4, the similar pattern detection device 150 receives the sampled time-series data transmitted from two sensor nodes 100 and detects similar partial sequences. The similar pattern detection device 150 is a server comprising a CPU, a RAM, and a ROM that stores programs for executing the processing routines described later, and is functionally configured as follows. As shown in FIG. 4, it includes a data input unit 60, a processing unit 70, and a data output unit 80.

The data input unit 60 accepts input of the multiple sets of time-series data received by the wireless communication device. It also accepts, from an input device such as a keyboard, input of the similar partial sequence detection conditions used for detecting similar partial sequence pairs (hereinafter also called simply “similar partial sequences”). These conditions are, for example, the lower limit l_min on the length of a matching partial sequence pair (set in advance by the user; that is, pairs for which L(l_x, l_y) is at least l_min are detected) and a predetermined coefficient ε set in advance according to the type of time-series data. The data input unit 60 may also accept input supplied from outside via a network or the like.

The processing unit 70 includes a data acquisition unit 72, a detection condition storage unit 74, a pattern detection unit 76, a period calculation unit 78, a period storage unit 82, and a matrix information storage unit 84.

The data acquisition unit 72 acquires the time-series data accepted by the data input unit 60. The detection condition storage unit 74 stores the similar partial sequence detection conditions accepted by the data input unit 60.

The period calculation unit 78 applies a discrete Fourier transform (FFT) to each of the two time-series data acquired from the two sensor nodes 100, computes the power spectrum, determines the frequency f, and calculates a sampling period for each series according to equations (1) and (2) above. The sampling periods calculated for the two sensor nodes 100 are stored in the period storage unit 82, and the data output unit 80 transmits each calculated sampling period to the corresponding sensor node 100 by wireless communication.
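The frequency-selection step can be sketched as follows. This is a minimal illustration and not the patent's procedure: equations (1) and (2) are not reproduced in this section, so `sampling_period` uses a Nyquist-style stand-in (at least two samples per dominant cycle), and all function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """Return |DFT|^2 for frequency bins 0..N//2 of a real-valued series."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centered))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def dominant_frequency(samples):
    """Frequency (cycles per sample) of the strongest non-DC bin."""
    spectrum = power_spectrum(samples)
    k = max(range(1, len(spectrum)), key=lambda b: spectrum[b])
    return k / len(samples)


def sampling_period(samples):
    """ASSUMPTION: stand-in for equations (1) and (2); keeps at least
    two samples per cycle of the dominant frequency."""
    f = dominant_frequency(samples)
    return max(1, math.floor(1.0 / (2.0 * f)))
```

The naive O(N²) transform keeps the sketch dependency-free; a real deployment would presumably use an FFT routine.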

The pattern detection unit 76 detects similar partial sequences in the two time-series data acquired from the two sensor nodes 100. The working data needed for detection (the score matrix, the position matrix, and the List array) are stored in the matrix information storage unit 84. Ultimately, only the similar partial sequences that meet the similar partial sequence detection conditions are output by the data output unit 80.

Next, the principle of detecting similar partial sequences is described.

The present invention finds partially similar patterns between sequences of sensor data. Similarity between sequences is judged with the dynamic time warping (DTW) distance, a useful distance measure that takes the time scale into account. Because DTW stretches or compresses the sequences along the time axis so as to minimize the distance between them, it can compute the similarity between sequences that differ in length or sampling rate.

The DTW distance between two sequences is computed by dynamic programming and is the sum of the element distances after the two sequences have been optimally aligned. FIGS. 5(A) to 5(C) illustrate the computation. FIG. 5(A) shows the optimal alignment of the sequences that DTW uses for the distance calculation. The calculation uses a time warping matrix such as the one shown in FIG. 5(B). The series of element pairs matched to compute the distance is called the warping path; in FIG. 5(B) it appears as the set of darkly shaded cells.

Consider a sequence X = (x_1, x_2, ..., x_n) of length n and a sequence Y = (y_1, y_2, ..., y_m) of length m. Their DTW distance D(X, Y) is defined by equation (3) below.

Here, ||x_i − y_j|| = (x_i − y_j)^2 is the distance between two numerical values. Another choice, such as the L1 distance ||x_i − y_j|| = |x_i − y_j|, may also be used. FIG. 5(C) shows a concrete DTW distance calculation: the distance is computed in each cell according to equation (3), and the DTW distance between the two sequences is read from the upper-right cell. A warping constraint is introduced to avoid the loss of matching accuracy that occurs when a small range of one sequence is matched to a large range of the other. The constraint removes unnecessary warping paths and limits the extent of warping; the Sakoe-Chiba band is a representative example. The gray cells in FIG. 5(B) show the warping range w of the Sakoe-Chiba band; specifically, the calculation is restricted to |i − j| ≤ w. This also helps to limit the computational cost of similar pattern detection. Sensor data are transmitted continuously from each sensor node and can therefore be regarded as semi-infinite stream data. In analyzing such data, newly received data are considered more important than old data. Setting a warping range therefore focuses detection on similar patterns in relatively recent data. It also keeps the amount of computation for one matrix constant, which leads to efficient processing. The embodiments of the present invention likewise use a warping range.
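As an illustration of the calculation described above, the following sketch computes the DTW distance with the squared element distance and an optional Sakoe-Chiba band. It follows the standard textbook recurrence, which is assumed to correspond to equation (3); the function name and the handling of unreachable cells (infinite distance) are choices made for this example.

```python
import math


def dtw_distance(x, y, w=None):
    """DTW distance between sequences x and y.

    w is the Sakoe-Chiba warping range: only cells with |i - j| <= w
    are evaluated. w=None means no constraint.
    """
    n, m = len(x), len(y)
    if w is None:
        w = max(n, m)
    # f[i][j] holds the DTW distance between x[:i] and y[:j].
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            f[i][j] = cost + min(f[i - 1][j - 1],  # diagonal
                                 f[i - 1][j],      # advance x only
                                 f[i][j - 1])      # advance y only
    return f[n][m]
```

With a band of width w, each row touches at most 2w + 1 cells, which is the constant per-step cost mentioned in the text.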

The sensor data X is a semi-infinite data sequence consisting of the values x_1, x_2, ..., x_n, .... Here x_n is the most recent value, and n increases as time advances. X[i_s:i_e] is the partial sequence (pattern) of X that starts at time i_s and ends at time i_e, and its length is l_x = i_e − i_s + 1. Similarly, Y[j_s:j_e] is the partial sequence of Y that starts at time j_s and ends at time j_e, and its length is l_y = j_e − j_s + 1.

Detecting a partially similar pattern between sensor data sequences, in the sense of the present invention, means detecting a partial sequence pair that satisfies the condition given by equation (4) below.

Here D(X[i_s:i_e], Y[j_s:j_e]) is the DTW distance between the partial sequences X[i_s:i_e] and Y[j_s:j_e], and L is a function giving the length of a partial sequence pair. A partial sequence that satisfies the condition of equation (4) is hereinafter also called a matching partial sequence. The present invention uses the average length of the two partial sequences, L(l_x, l_y) = (l_x + l_y)/2, although L(l_x, l_y) = max(l_x, l_y) or L(l_x, l_y) = min(l_x, l_y) may be used instead. ε is the threshold for the similarity judgment. Because the DTW distance is the sum of the distances between matched elements, it grows as the partial sequences become longer. The similarity threshold should therefore also be proportional to the partial sequence length, and the judgment uses ε(L(l_x, l_y) − l_min). The length threshold l_min for matching partial sequences is supplied by the user, and it guarantees that every detected partial sequence pair has a length L of at least l_min.
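The length-proportional threshold can be written out as a small check. Equation (4) itself is not reproduced in this section; the sketch assumes it has the form D ≤ ε(L(l_x, l_y) − l_min), which is what the description of the threshold suggests. The function names are hypothetical.

```python
def pair_length(l_x, l_y, mode="mean"):
    """Length function L for a partial sequence pair."""
    if mode == "mean":
        return (l_x + l_y) / 2
    if mode == "max":
        return max(l_x, l_y)
    if mode == "min":
        return min(l_x, l_y)
    raise ValueError(f"unknown mode: {mode}")


def is_match(dtw, l_x, l_y, eps, l_min, mode="mean"):
    """ASSUMED form of equation (4): D <= eps * (L - l_min)."""
    return dtw <= eps * (pair_length(l_x, l_y, mode) - l_min)
```

Under this form a pair shorter than l_min can never match, because the right-hand side is negative while a DTW distance is non-negative.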

When the partial sequence pair X[i_s:i_e] and Y[j_s:j_e] matches, many partial sequence pairs that overlap the pair with the locally minimal distance also match. Two pairs overlap when their warping paths intersect, that is, when the two warping paths share at least one element. Overlapping pairs are redundant: several similar patterns would be detected from nearly the same interval. Reporting such unnecessary results may also slow the algorithm down. For this reason, only the pair whose distance is locally minimal is detected among the overlapping pairs. Specifically, a partial sequence pair that satisfies conditions (1) and (2) below is detected as a similar pattern.

(Problem 1) Given two sequences X and Y, a threshold ε, and a length threshold l_min for matching partial sequences, detect the partial sequence pairs X[i_s:i_e] and Y[j_s:j_e] that satisfy the following conditions (1) and (2).

(1) X[i_s:i_e] and Y[j_s:j_e] satisfy equation (4) above.
(2) Within the group of overlapping partial sequence pairs, D(X[i_s:i_e], Y[j_s:j_e]) − ε(L(l_x, l_y) − l_min) takes the minimum value.

In the first embodiment, each sensor node 100 transmits data according to a fixed sampling period, and similar patterns are detected from the reduced (sampled) data. The sampling period of each sensor node 100 is determined on the similar pattern detection device 150. Because the device determines the sampling period, it must first receive all of the data up to the point at which the period is determined. After that, each sensor node 100 transmits data sampled at the determined period to the similar pattern detection device 150. The technique of Patent Document 3 could be applied unchanged to the sampled data. However, the sampled data differ from the original data collected by the sensor node 100, so the results may differ considerably from those of detection on the original data. To obtain results closer to those for the original data, the detection method of Patent Document 3 is improved: information about the data that were not transmitted because of sampling is incorporated approximately into pattern detection. This makes data transmission more efficient while achieving accuracy close to that obtained with the original data.

To realize DTW-based similar pattern detection, Patent Document 3 computes similarity with a scoring function and a position matrix that achieve accuracy equivalent to DTW. The scoring function outputs the similarity of a partial sequence pair between two sequences as a score. The score is computed by dynamic programming using a score matrix. Specifically, given two sequences X = (x_1, ..., x_i, ..., x_n) and Y = (y_1, ..., y_j, ..., y_m), the score V(X[i_s:i_e], Y[j_s:j_e]) of X[i_s:i_e] and Y[j_s:j_e] is computed by equation (5) below.

The series of element pairs matched to compute a score is the warping path in the score matrix. Because the score accumulates the difference between the distance threshold ε and the distance of each element pair, it becomes negative when a partial sequence pair does not satisfy the condition. Non-matching pairs can therefore be recognized from the score. In addition, a score that has become negative is reset to zero, so that score calculation starts afresh from that element. This property prunes non-matching partial sequence pairs and allows only the pairs that are likely to match to be computed efficiently in a single matrix. In equation (5), b_v, b_h, and b_d are weights determined by L. For example, for L(l_x, l_y) = (l_x + l_y)/2, b_v = b_h = 1/2 and b_d = 1, because the sequence length increases by 1/2 when a vertical or horizontal element is inherited and by 1 when a diagonal element is inherited. For L(l_x, l_y) = max(l_x, l_y), b_d = b_h = 1 and b_v = 0 if l_x > l_y; b_d = b_v = 1 and b_h = 0 if l_x < l_y; and b_d = 1 and b_v = b_h = 0 if l_x = l_y. The weights for L(l_x, l_y) = min(l_x, l_y) are determined in the same way.

The weights (b_v, b_h, b_d) are designed so that their sum along each warping path in the score matrix equals L(l_x, l_y), which guarantees that the DTW distance and the score can be converted into each other. The DTW distance of a partial sequence pair is computed by equation (6) below.

The score matrix holds the score and end point of each similar partial sequence pair. The start point, that is, where the similar partial sequence pair began, is held in the position matrix. The start point is expressed as a coordinate pair and is obtained by equation (7) below.

s(i_e, j_e) gives the start point (i_s, j_s) of the match between X[i_s:i_e] and Y[j_s:j_e]. Using the score matrix and the position matrix together allows similar patterns to be detected efficiently.
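The interplay of the score matrix and the position matrix can be sketched as follows. Equations (5) to (7) are not reproduced in this section, so the recurrence is a reading of the prose: each move adds ε times its weight minus the element distance, and negative scores are reset to zero. Two points are assumptions of this sketch rather than statements of the source: "vertical" is taken to mean advancing Y only, and a path that continues from a zero-score cell is treated as starting at the current cell so that the accumulated weights equal the length of the reported pair.

```python
def score_matrices(x, y, eps, b_v=0.5, b_h=0.5, b_d=1.0):
    """Return (v, s): score matrix and start-point matrix (1-based cells).

    Default weights correspond to L(l_x, l_y) = (l_x + l_y) / 2.
    """
    n, m = len(x), len(y)
    v = [[0.0] * (m + 1) for _ in range(n + 1)]
    s = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            moves = [
                (i - 1, j - 1, b_d),  # diagonal
                (i, j - 1, b_v),      # vertical (ASSUMED: Y advances)
                (i - 1, j, b_h),      # horizontal (ASSUMED: X advances)
            ]
            best, start = 0.0, None   # zero score means "reset, no path"
            for pi, pj, b in moves:
                cand = eps * b - d + v[pi][pj]
                if cand > best:
                    best = cand
                    # ASSUMPTION: a zero-score predecessor starts a new path.
                    start = s[pi][pj] if v[pi][pj] > 0 else (i, j)
            v[i][j], s[i][j] = best, start
    return v, s
```

For two identical sequences of length 3 and ε = 1, the cell (3, 3) ends with score 3 and start point (1, 1), which agrees with the conversion score = ε·L − D implied by the weight design (L = 3, D = 0).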

The present invention detects similar patterns from sampled sequences. To achieve accuracy close to that for the original data, the scoring function and the position matrix above are improved. The intuitive idea is that, whenever the score calculation selects the vertical, horizontal, or diagonal element, the distance values contributed by the elements removed from the original sequence are filled in. If the sampling period is T, there are T − 1 hidden cells between the current cell and an adjacent cell when the score between sampled sequences is computed. The distance values of these hidden cells are approximated by the distance value of the current cell. Let T_x and T_y be the sampling periods of a sequence X of length n and a sequence Y of length m, and let X' = (x_1, ..., x_i, ..., x_⌊n/T_x⌋) and Y' = (y_1, ..., y_j, ..., y_⌊m/T_y⌋) be the sampled versions of the original sequences X and Y. The score of partial sequences of X' and Y' is then computed by equation (8) below.

Here ⌊M⌋ denotes M with its fractional part discarded (the floor of M).

To minimize detection error, T_x or T_y distance values are filled in when the score in each direction is selected. The weights b_v, b_h, and b_d in equation (8) are updated as well. For L(l_x, l_y) = (l_x + l_y)/2, selecting the score of the vertical or horizontal cell increases L by T_y/2 or T_x/2, respectively, and selecting the score of the diagonal cell increases L by (T_x + T_y)/2. Hence b_v = T_y/2, b_h = T_x/2, and b_d = (T_x + T_y)/2.

The position matrix is improved along with equation (8), but the operation is almost the same as in equation (7). If one of the three cells (vertical, horizontal, or diagonal) is selected, the start point of that cell is inherited. Otherwise, the start point is set according to the sampling periods T_x and T_y. That is, the start point s(i, j) is determined by equation (9) below.
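A single-cell update for sampled sequences might look like the sketch below. Equation (8) is not reproduced in this section, so the form is inferred from the description: the weights are b_v = T_y/2, b_h = T_x/2, and b_d = (T_x + T_y)/2, and the current cell's distance d stands in for the hidden cells. The vertical and horizontal moves fill in T_y and T_x distance values; the number filled in for the diagonal move, (T_x + T_y)/2, is an assumption of this example, as is the function name. The start-point rule of equation (9) is left out because its exact form is not given here.

```python
def sampled_cell_score(d, v_diag, v_vert, v_horiz, eps, t_x, t_y):
    """Score of one cell for sequences sampled with periods t_x and t_y.

    d is the element distance at the current cell; v_* are the scores of
    the three neighbouring cells. Returns (score, chosen_direction).
    """
    b_v = t_y / 2
    b_h = t_x / 2
    b_d = (t_x + t_y) / 2
    candidates = {
        "reset": 0.0,
        # ASSUMPTION: the diagonal move fills in (t_x + t_y) / 2 distances.
        "diagonal": eps * b_d - b_d * d + v_diag,
        "vertical": eps * b_v - t_y * d + v_vert,
        "horizontal": eps * b_h - t_x * d + v_horiz,
    }
    direction = max(candidates, key=candidates.get)
    return candidates[direction], direction
```

With t_x = t_y = 1 the expression reduces to the unsampled update with weights 1/2, 1/2, and 1, which is a useful sanity check on any reconstruction of equation (8).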

Next, the way similar partial sequence pairs between sampled sequences are detected with the scoring function, the score matrix, and the position matrix is described. Detection starts when data are received from a sensor node subject to pattern detection. Let X and Y be the original data sequences of the two sensor nodes 100 (with n and m the indices of their latest elements), and let X' = (x_1, ..., x_i, ..., x_⌊n/T_x⌋) and Y' = (y_1, ..., y_j, ..., y_⌊m/T_y⌋) be the sampled sequences. When the latest data arrive from either sensor node 100, the elements of the score matrix and the position matrix corresponding to those data are updated. The device then checks whether there is a partial sequence pair whose score satisfies the condition (condition (1) of Problem 1) and, if there is, confirms whether that pair is the optimal one (condition (2) of Problem 1). The List array is used to find the optimal pair. If the List array is empty, the score v_max, the start point (i_s, j_s), and the end point (i_e, j_e) of the similar partial sequence pair that satisfied the condition are stored. If the List array is not empty, the device checks whether the newly detected pair overlaps a pair already stored in the List array; if it does, only the pair with the larger score is kept. If no overlapping pair exists, the new pair is added to the List array. A pair is reported as a similar partial sequence pair once it has been confirmed that the current optimal pair cannot be replaced by a pair that appears later, that is, when the condition of equation (10) below is satisfied.

While each element of the score matrix and the position matrix corresponding to the latest received data is computed, the partial sequence pairs in the List array are checked against the condition of equation (10). Once all of those elements have been computed, the pairs that satisfy equation (10) are reported and then deleted from the List array.

<Operation of the Similar Pattern Detection Device>
Next, the specific processing of the similar pattern detection device 150 is described. The elements of the two time-series data X' = (x_1, ..., x_i, ..., x_⌊n/T_x⌋) and Y' = (y_1, ..., y_j, ..., y_⌊m/T_y⌋) are assumed to arrive one after another. The similar partial sequence pair detection routine shown in FIG. 6 is then executed in the similar pattern detection device 150.

First, in step S101, the data acquisition unit 72 receives the latest data x_⌊n/T_x⌋ via the data input unit 60. Next, in step S102, the processing unit 70 resets the value of j to ⌊(m − w)/T_y⌋.

The pattern detection unit 76 then repeats steps S103 to S105, incrementing j by 1 each time, until j reaches ⌊(m − w)/T_y⌋.

In step S103, the pattern detection unit 76 computes the similarity score v(n/T_x, j) of x_⌊n/T_x⌋ and y_j and the start point s(n/T_x, j) using equations (8) and (9).

In step S104, the pattern detection unit 76 determines whether the condition v(n/T_x, j) − ε·l_min ≥ 0 is satisfied. If it is (Yes), processing proceeds to step S105. If it is not (No), processing returns to step S103 or, once the loop over j is complete, proceeds to step S111.

In step S105, List array storage processing is performed. This processing is implemented by the routine shown in FIG. 7.

In step S120, the pattern detection unit 76 determines whether the List array stored in the matrix information storage unit 84 is empty. If it is empty, then in step S121 the pattern detection unit 76 stores the score v(n/T_x, j) in List[1].v_max, stores s(n/T_x, j) in (List[1].i_s, List[1].j_s), and stores (n, j·T_y) in (List[1].i_e, List[1].j_e). The routine then ends.

If the List array is not empty, the pattern detection unit 76 repeats step S122 for each element l (l = 1, ..., L) of the List array.

In step S122, the pattern detection unit 76 determines whether the start point s(n/T_x, j) coincides with (List[l].i_s, List[l].j_s). If it does not, step S122 is repeated for the next element or, once all elements have been examined, processing proceeds to step S123.

In step S123, the pattern detection unit 76 determines whether the new pair overlaps none of the elements of the List array. If step S122 found that the start point coincides with that of no element, the pair is judged not to overlap, and in step S124 the score v(n/T_x, j) is stored in List[L+1].v_max, s(n/T_x, j) is stored in (List[L+1].i_s, List[L+1].j_s), and (n, j·T_y) is stored in (List[L+1].i_e, List[L+1].j_e). The routine then ends.

If, on the other hand, step S122 determines that the start point s(n/T_x, j) coincides with (List[l].i_s, List[l].j_s), the pair is judged to overlap that element of the List array, and in step S125 the pattern detection unit 76 determines whether the score v(n/T_x, j) is greater than or equal to that element's score List[l].v_max. If it is, then in step S126 the pattern detection unit 76 overwrites List[l].v_max with v(n/T_x, j), (List[l].i_s, List[l].j_s) with s(n/T_x, j), and (List[l].i_e, List[l].j_e) with (n, j·T_y), and the routine ends. If the score is less than List[l].v_max, the routine ends without changing the List array.
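The List array storage routine of steps S120 to S126 can be summarized in a few lines. The sketch below mirrors the steps as described; the `Candidate` record and the function name are hypothetical, and a Python list stands in for the List array.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    v_max: float   # best score seen for this group of overlapping pairs
    start: tuple   # (i_s, j_s)
    end: tuple     # (i_e, j_e)


def store_candidate(candidates, score, start, end):
    """Insert or update a candidate pair (steps S120 to S126)."""
    for entry in candidates:                      # S122: compare start points
        if entry.start == start:                  # same start point: overlap
            if score >= entry.v_max:              # S125
                entry.v_max, entry.end = score, end   # S126: overwrite
            return
    # S121 (empty list) and S124 (no overlapping entry) both append.
    candidates.append(Candidate(score, start, end))
```

Treating "same start point" as the overlap test keeps the check to one comparison per stored entry; an empty list needs no special case because the loop body simply never runs.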

In step S106, when the data acquisition unit 72 receives the latest data y_⌊m/T_y⌋ via the data input unit 60, the pattern detection unit 76 performs steps S107 to S110 in the same way as steps S102 to S105.

Step S110 is implemented by the routine shown in FIG. 8, in which steps S130 to S136 are performed in the same way as steps S120 to S126.

In step S111, the pattern detection unit 76 determines whether each partial sequence pair stored in the List array satisfies the condition of equation (10). If a pair satisfies it, the pair is judged to be safe from replacement by any pair that appears later; its score, start point, and end point are reported to the user via the data output unit 80, and the pair is deleted from the List array.

In step S112, the pattern detection unit 76 determines whether reception of the two time-series data has finished. If not (No), processing returns to step S101 or step S106; if so (Yes), processing ends.

As described above, according to the first embodiment of the present invention, the similarity score is computed using the sampling period of each time-series data, so similar partial sequences can be detected with high accuracy from a small amount of data, without using all of the time-series data.

Furthermore, when similar patterns are detected from data acquired by devices such as sensors, the data transmitted by the sensor nodes can be limited while similar patterns are still detected with accuracy close to that obtained from the original data. The technique is not limited to sensor networks; it can also be applied wherever similar patterns must be detected from time-series data at high speed.

[Second Embodiment]
<System Configuration>
Next, a second embodiment is described. Parts with the same configuration as in the first embodiment are given the same reference numerals, and their description is omitted.

The second embodiment differs from the first in that the sampling period is calculated on the sensor node side and in that it is recalculated periodically.

As shown in FIG. 9, the sensor node 200 according to the second embodiment comprises a data sensing unit 10, a data acquisition unit 12, a sampling unit 14, a period storage unit 16, a wireless communication unit 18, and a period calculation unit 220.

The period calculation unit 220 periodically calculates the sampling period from the information obtained from the data acquisition unit 12. The calculation method is the same as that of the period calculation unit 78.

The sampling period calculated by the period calculation unit 220 is transmitted to the similar pattern detection device 150 by the wireless communication unit 18.

The principle of detecting similar partial sequences in the second embodiment is described next.

In the second embodiment, similar patterns are detected while the sampling period is updated as needed. The first embodiment used a fixed sampling period. However, because that period is determined from data already acquired, there is no guarantee that it remains valid for data acquired afterwards. For example, consider data acquired from a temperature sensor. Suppose the data sequence used to determine the sampling period showed little temperature change, and the temperature later begins to change regularly because of weather or environmental changes. If sampling continues with the period determined before the change, a data sequence that has lost the fine-grained variation is transmitted to the similar pattern detection device 150, which acts as the server. To cope with such changes, it is desirable to determine the sampling period each time and sample the data accordingly.

Therefore, in this embodiment, the sensor node 200 periodically recalculates the sampling period so that it is adjusted appropriately. Let the original sequence acquired by the sensor node 200 be X = (x_1, ..., x_i, ..., x_n), the sequence sampled with the adjusted sampling periods be X' = (x_1, ..., x_i, ..., x_{n'}), and the sampling periods be T_x = (t_{x1}, ..., t_{xi}, ..., t_{xn'}). When the number of data items between the most recently acquired data x_n and the last data x_{n'-1} transmitted from the sensor node 200 to the similar pattern detection device 150 equals t_{xn'-1}, the sensor node 200 transmits x_n as x_{n'} to the similar pattern detection device 150 through the wireless communication unit 18. At the same time, it uses the original data up to x_n to determine the sampling period t_{xn'} for the data to be transmitted next. The determined sampling period is sent to the similar pattern detection device 150 for use in similar pattern detection.

Because the sensor node 200 cannot hold all of the data it acquires, it is desirable to discard past data and keep only the most recent data for calculating the sampling period. For example, a sliding window can be used so that a fixed number of data items is always retained. Algorithms have also been proposed that calculate the sampling period incrementally on the acquired data instead of in batch on accumulated data (http://dl.acm.org/citation.cfm?id=1146470). With such algorithms, the sampling period can be calculated without accumulating data in the sensor node 200.
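The sliding-window option mentioned above can be sketched as follows. This is only an illustration: the class name `RecentWindow` and the window size are assumptions, and the description does not prescribe any particular data structure.

```python
from collections import deque
from typing import List


class RecentWindow:
    """Keeps only the most recent `size` readings (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._buf = deque(maxlen=size)

    def push(self, value: float) -> None:
        self._buf.append(value)

    def values(self) -> List[float]:
        # Snapshot in acquisition order, oldest first.
        return list(self._buf)


window = RecentWindow(size=3)
for reading in [20.1, 20.3, 20.2, 24.8, 25.0]:
    window.push(reading)
latest = window.values()  # only the three newest readings remain
```

Memory use on the node is then bounded by the window size regardless of how long the sensor runs.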

The similar pattern detection device 150 receives the data and the sampling period information from each sensor node 200 and detects similar patterns. Consider a sampled sequence X' = (x_1, ..., x_i, ..., x_{n'}) whose latest element is n', and a sampled sequence Y' = (y_1, ..., y_j, ..., y_{m'}) whose latest element is m'. Let the sampling periods for the original sequence X of X' be T_x = (t_{x1}, ..., t_{xi}, ..., t_{xn'}), and let the sampling periods for the original sequence Y of Y' be T_y = (t_{y1}, ..., t_{yj}, ..., t_{ym'}). The score of a partial sequence pair in the sampled sequences is calculated according to equation (11).

Because the sampling period changes each time a score is calculated, the sequence length changes, and the weights for each direction (B_v, B_h, B_d) are therefore also updated as needed. The position matrix, in turn, is calculated according to equation (12), which incorporates the change in sampling period into the start point information.
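As a rough illustration of a per-cell update with per-step sampling periods, the sketch below follows the wording of claim 1 rather than equations (11) and (12) themselves. Several points are assumptions: the names `Cell` and `update_cell` are hypothetical, the "numerical value corresponding to" a sampling period is taken to be the period itself, the element difference is the squared difference, and the start position is simply inherited from the winning predecessor without any period-dependent adjustment.

```python
from typing import NamedTuple, Tuple


class Cell(NamedTuple):
    score: float            # similarity score stored in the score matrix
    start: Tuple[int, int]  # start positions stored in the position matrix


def update_cell(prev_x: Cell, prev_y: Cell, prev_xy: Cell,
                x: float, y: float, t_x: int, t_y: int, b: float) -> Cell:
    """Combine the three neighbouring cells into the current cell.

    prev_x  : neighbour one step back in the first series only
    prev_y  : neighbour one step back in the second series only
    prev_xy : neighbour one step back in both series
    b       : the predetermined coefficient
    """
    d = (x - y) ** 2  # assumed element difference (squared)
    candidates = [
        Cell(prev_x.score - d * t_x + b * t_x, prev_x.start),
        Cell(prev_y.score - d * t_y + b * t_y, prev_y.start),
        Cell(prev_xy.score - d * max(t_x, t_y) + b * (t_x + t_y),
             prev_xy.start),
    ]
    # The largest candidate becomes the score; its start is inherited.
    return max(candidates, key=lambda c: c.score)


cell = update_cell(Cell(1.0, (0, 0)), Cell(2.0, (1, 0)), Cell(3.0, (2, 2)),
                   x=1.0, y=2.0, t_x=2, t_y=3, b=2.0)
```

A detector built on this would compare `cell.score` against the threshold and read `cell.start` to recover where the matching pair begins.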

<Operation of sensor node>
Next, the operation of the sensor node 200 according to the second embodiment will be described.

The sensor node 200 executes the processing routine shown in FIG. 10.

First, in step S201, the data acquisition unit 12 acquires the data x_n at the current time n from the data sensing unit 10. In step S202, the sampling unit 14 determines whether the interval between the previous sample element n'-1 and the current time n has reached the previously calculated sampling period t_{xn'-1}. If it has, then in step S203 the period calculation unit 220 calculates the sampling period using the data up to x_n and stores it in the period storage unit 16. Then, in step S204, the data x_n acquired in step S201 (that is, the sampled data x_{n'}) and the sampling period calculated in step S203 are transmitted to the similar pattern detection device 150 by the wireless communication unit 18.

If, on the other hand, the interval between the previous sample element n'-1 and the current time n has not reached the previously calculated sampling period t_{xn'-1}, the process proceeds to step S205.

In step S205, it is determined whether reception of data from the data sensing unit 10 has finished. If it has not (No), the process returns to step S201; if it has (Yes), the process ends.
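The routine in steps S201 to S205 can be sketched as a generator. This is a minimal sketch under stated assumptions: `run_sensor_node`, `estimate_period`, `initial_period`, and `window` are hypothetical names, the period estimator is passed in because its algorithm is not restated here, the interval is counted as the number of readings since the last transmission, and yielding a pair stands in for the wireless transmission.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Tuple


def run_sensor_node(
    readings: Iterable[float],
    estimate_period: Callable[[List[float]], int],
    initial_period: int = 1,
    window: int = 64,
) -> Iterator[Tuple[float, int]]:
    """Yield (sampled value, newly calculated period) for each transmission."""
    recent = deque(maxlen=window)  # bounded history of original data
    period = initial_period        # period in force until the next send
    since_last_send = 0
    for x_n in readings:                       # S201: acquire x_n
        recent.append(x_n)
        since_last_send += 1
        if since_last_send >= period:          # S202: interval reached?
            # S203: recalculate the period from the retained data.
            period = max(1, int(estimate_period(list(recent))))
            since_last_send = 0
            yield x_n, period                  # S204: send value and period
        # S205: the loop ends when the readings are exhausted.


sent = list(run_sensor_node([float(i) for i in range(10)],
                            estimate_period=lambda history: 3,
                            initial_period=2))
```

With a constant estimator returning 3 and an initial period of 2, the node sends the second reading and then every third reading after it.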

<Operation of similar pattern detection device>
The similar pattern detection device 150 executes the similar partial sequence pair detection processing routine described in the first embodiment. Here, in step S103, the pattern detection unit 76 calculates the similarity score v(n/T_x, j) between x_{n/T_x} and y_j, and the start point s(n/T_x, j), using equations (11) and (12). The sampling period received from the sensor node 200 by the data input unit 60 is used as the sampling period T_x.

Likewise, the calculation in step S108 uses equations similar to equations (11) and (12).

As described above, according to the second embodiment, the similarity score is calculated using a sampling period that is recalculated as needed for each time-series data. As a result, similar partial sequences can be detected with high accuracy even from a small amount of data, without using all of the time-series data.

<Example>
To verify the effectiveness of the present invention, experiments were conducted using real data. The experiments used a Linux (registered trademark) machine with 4 GB of memory and an Intel(R) Core 2 2.4 GHz CPU. Scatter plots are used here to show the experimental results visually. Each scatter plot draws the warping path of X[i_s:i_e] and Y[j_s:j_e], a pair of similar partial sequences in the data sequences X and Y. In other words, it shows which elements of the similar partial sequence pair were matched to each other during detection. The horizontal axis of the scatter plot represents the elements of X, and the vertical axis represents the elements of Y.

FIG. 11 shows data acquired from temperature sensors. In this data, acquired at intervals of approximately one minute, similar patterns that vary widely between 18 and 32 degrees can be seen. FIG. 12 shows the detection results obtained with Examples 1 and 2, compared with the original similar pattern detection using Patent Document 3. As the figure shows, both examples give results very close to those of the original detection.

FIG. 13 shows data recording the number of sunspots. Sunspots are well known to be periodic and are closely related to solar activity. Many sunspots appear when solar activity is high, and their number decreases when solar activity is low. The number rises and falls with a period of about 11 years. FIG. 14 shows the detected patterns. For this data as well, the results are equivalent to those of the original similar pattern detection.

FIG. 15 shows in detail the similar pattern detection results for the data set acquired from the temperature sensors, and FIG. 16 shows in detail the similar pattern detection results for the data set recording the number of sunspots. The present invention is an approximation method: information about data that was not transmitted to the server because of sampling is filled in from the data that was transmitted. Because the filled-in values differ from the actual ones, the DTW distances of the detected similar patterns differ slightly from the original. Nevertheless, as confirmed in FIGS. 12 and 14, the positions of the similar patterns can be detected with high accuracy.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the squared Euclidean distance was used as ‖x_i − y_j‖, other distances such as the Euclidean distance may be used.

The specific hardware and software configurations may also be changed as appropriate without departing from the gist of the present invention.

Time-series data arises in many fields, such as video, sensor networks, and finance. The present invention is applicable to all of these fields.

Although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

Description of Symbols
10 Data sensing unit
12, 72 Data acquisition unit
14 Sampling unit
16, 82 Period storage unit
18 Wireless communication unit
60 Data input unit
70 Processing unit
74 Detection condition storage unit
76 Pattern detection unit
78, 220 Period calculation unit
80 Data output unit
84 Matrix information storage unit
100, 200 Sensor node
150 Similar pattern detection device

Claims

A similar partial sequence detection device that detects pairs of similar partial sequences from first time-series data sampled at a first sampling period and second time-series data sampled at a second sampling period, using a single score matrix whose matrix elements are similarity scores interconvertible with the DTW (Dynamic Time Warping) distance between two partial sequences, and a position matrix whose matrix elements are the start positions of the partial sequences, the device comprising:
a storage unit that stores the score matrix and the position matrix; and
a processing unit that, when one element of either the sampled first time-series data or the sampled second time-series data is received, calculates a similarity score between a partial sequence, including the element, of that time-series data and a partial sequence of the other time-series data,
stores the calculated similarity score as the matrix element of the score matrix in the storage unit corresponding to the end positions of the two partial sequences used to calculate the similarity score, and stores the start positions of the two partial sequences used to calculate the similarity score as the matrix element of the position matrix in the storage unit corresponding to the end positions, and
determines a pair of partial sequences whose similarity score stored in the score matrix in the storage unit is equal to or greater than a predetermined threshold to be a pair of similar partial sequences, and detects the pair of similar partial sequences as a pair of matching partial sequences,
wherein the processing unit,
when calculating any of the similarity scores,
acquires the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to the first time-series data, subtracts from the acquired similarity score the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the first sampling period, and adds the value obtained by multiplying a predetermined coefficient by a numerical value corresponding to the first sampling period, thereby calculating a first score,
acquires the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to the second time-series data, subtracts from the acquired similarity score the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the second sampling period, and adds the value obtained by multiplying the predetermined coefficient by a numerical value corresponding to the second sampling period, thereby calculating a second score,
acquires the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to both the first time-series data and the second time-series data, subtracts from the acquired similarity score the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the larger of the first sampling period and the second sampling period, and adds the value obtained by multiplying the predetermined coefficient by a numerical value corresponding to the sum of the first sampling period and the second sampling period, thereby calculating a third score, and
uses the largest of the first score, the second score, and the third score as the similarity score.

The similar partial sequence detection device according to claim 1, wherein
the predetermined threshold is a value obtained by multiplying a value predetermined as the lower limit of the length of the two partial sequences of a pair of matching partial sequences by the predetermined coefficient, and
the processing unit, when detecting a pair of matching partial sequences, uses the predetermined threshold to detect a pair of matching partial sequences whose length is equal to or greater than the lower limit.

The similar partial sequence detection device according to claim 2, wherein the processing unit
stores the detected pair of matching partial sequences in the storage unit, and,
when storing the detected pair of matching partial sequences in the storage unit, if any pair of matching partial sequences already stored in the storage unit has a matrix element of the position matrix used in the calculation that matches that of the detected pair, stores in the storage unit the pair of matching partial sequences having the maximum similarity score among the plurality of pairs of matching partial sequences whose matrix elements of the position matrix match.

The similar partial sequence detection device according to claim 1, wherein the processing unit,
when storing in the position matrix in the storage unit the start positions of the two partial sequences used to calculate the similarity score, stores the start positions of the two partial sequences used to calculate the acquired maximum similarity score as the start positions of the two partial sequences used to calculate the similarity score, and,
when detecting the pair of matching partial sequences, refers to the position matrix in the storage unit to identify the start position corresponding to the similarity score that is equal to or greater than the predetermined threshold.

A similar partial sequence detection method in a similar partial sequence detection device that detects pairs of similar partial sequences from first time-series data sampled at a first sampling period and second time-series data sampled at a second sampling period, using a single score matrix whose matrix elements are similarity scores interconvertible with the DTW (Dynamic Time Warping) distance between two partial sequences, and a position matrix whose matrix elements are the start positions of the partial sequences, the device comprising a storage unit that stores the score matrix and the position matrix, the method comprising,
by a processing unit:
when one element of either the sampled first time-series data or the sampled second time-series data is received, calculating a similarity score between a partial sequence, including the element, of that time-series data and a partial sequence of the other time-series data;
storing the calculated similarity score as the matrix element of the score matrix in the storage unit corresponding to the end positions of the two partial sequences used to calculate the similarity score, and storing the start positions of the two partial sequences used to calculate the similarity score as the matrix element of the position matrix in the storage unit corresponding to the end positions; and
determining a pair of partial sequences whose similarity score stored in the score matrix in the storage unit is equal to or greater than a predetermined threshold to be a pair of similar partial sequences, and detecting the pair of similar partial sequences as a pair of matching partial sequences,
wherein, when any of the similarity scores is calculated by the processing unit,
the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to the first time-series data is acquired, the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the first sampling period is subtracted from the acquired similarity score, and the value obtained by multiplying a predetermined coefficient by a numerical value corresponding to the first sampling period is added, thereby calculating a first score,
the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to the second time-series data is acquired, the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the second sampling period is subtracted from the acquired similarity score, and the value obtained by multiplying the predetermined coefficient by a numerical value corresponding to the second sampling period is added, thereby calculating a second score,
the similarity score that is adjacent to that similarity score in the score matrix and corresponds to the previous time with respect to both the first time-series data and the second time-series data is acquired, the value obtained by multiplying the magnitude of the difference between the corresponding data elements of the two target partial sequences by the larger of the first sampling period and the second sampling period is subtracted from the acquired similarity score, and the value obtained by multiplying the predetermined coefficient by a numerical value corresponding to the sum of the first sampling period and the second sampling period is added, thereby calculating a third score, and
the largest of the first score, the second score, and the third score is used as the similarity score.

The similar partial sequence detection method according to claim 5, wherein
the predetermined threshold is a value obtained by multiplying a value predetermined as the lower limit of the length of the two partial sequences of a pair of matching partial sequences by the predetermined coefficient, and,
when the pair of matching partial sequences is detected by the processing unit, the predetermined threshold is used to detect a pair of matching partial sequences whose length is equal to or greater than the lower limit.

The similar partial sequence detection method according to claim 6, further comprising, by the processing unit,
storing the detected pair of matching partial sequences in the storage unit,
wherein, when the detected pair of matching partial sequences is stored in the storage unit, if any pair of matching partial sequences already stored in the storage unit has a matrix element of the position matrix used in the calculation that matches that of the detected pair, the pair of matching partial sequences having the maximum similarity score among the plurality of pairs of matching partial sequences whose matrix elements of the position matrix match is stored in the storage unit.

A program for causing a computer to function as each unit of the similar partial sequence detection device according to any one of claims 1 to 4.