JP6795448B2

JP6795448B2 - Data processing equipment, data processing methods and programs

Info

Publication number: JP6795448B2
Application number: JP2017094048A
Authority: JP
Inventors: 理基鈴木
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-05-10
Filing date: 2017-05-10
Publication date: 2020-12-02
Anticipated expiration: 2037-05-10
Also published as: JP2018190281A

Description

The present invention relates to a data processing device, a data processing method, and a program.

In various fields such as transportation and healthcare, time-series data of a predetermined target is detected. In some cases it is important to discover an abnormality in the detection target from such time-series data immediately.
In addition, the data to be processed is often stream data, in which the number of time-series samples becomes enormous and new data is generated continuously.

Methods of detecting an abnormality are roughly divided into methods that detect an abnormality using prior knowledge and methods that detect an outlier as an abnormality.
A method that uses prior knowledge detects an abnormality when time-series data satisfying a specific condition occurs, but it cannot deal with unknown events. Such a method also often assumes preconditions, so the specific condition must be revised each time the preconditions change. In addition, the prior knowledge has to be learned. Machine learning could be used to compensate for these points, but sufficiently fast processing has not been established.

Under these circumstances, it is important to establish and improve methods for detecting outliers in stream data.
For example, although techniques for detecting outliers have existed for some time, their processing is not always fast enough, especially when stream data is the target. When it is difficult to predict when an abnormality will occur, the presence or absence of an abnormality must be monitored constantly, and the processing must be sufficiently fast compared with the rate at which the data making up the time series is generated.

As an example, Non-Patent Document 1 proposes a technique called t-digest. The t-digest has a data structure that estimates the distribution of a set of values and can, for example, estimate the distribution of stream data.
However, it has sometimes been difficult to achieve sufficiently fast processing with t-digest alone.

Ted Dunning and Otmar Ertl, "Computing Extremely Accurate Quantiles Using t-Digests", [online], [retrieved April 12, 2017], Internet <URL: https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf>

Conventionally, it has sometimes been difficult to perform multidimensional and multiscale data analysis. Note that t-digest does not assume multidimensional or multiscale data.

The present invention has been made in view of these circumstances, and its object is to provide a data processing device, a data processing method, and a program that make it possible to detect an abnormality by performing multidimensional and multiscale data analysis.

One configuration example is a statistic data processing device comprising: a statistic data comparison unit that, for at least one viewpoint and a plurality of scales for each viewpoint, compares reference data and comparison target data that are statistic data based on the data included in each combination of the viewpoint and the scale; and an abnormality range determination unit that determines, based on the result of the comparison by the statistic data comparison unit, a range regarded as abnormal among a plurality of ranges of the viewpoint. The abnormality range determination unit determines, based on the result of the comparison by the statistic data comparison unit, a range candidate of the viewpoint that is regarded as abnormal, and determines, as the range regarded as abnormal, the range for which the LOF representing the difference between the reference data and the comparison target data is largest among the plurality of ranges included in the range candidate of the viewpoint.
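The final selection step of this configuration can be sketched as follows. This is a minimal illustration and not the patented implementation: the `RangeScore` type, the function name `pick_abnormal_range`, and the idea that the LOF values have already been computed elsewhere are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RangeScore:
    """One range inside the range candidate (hypothetical type)."""
    start: float  # lower bound of the range (inclusive)
    end: float    # upper bound of the range (exclusive)
    lof: float    # LOF between reference and comparison data, assumed precomputed


def pick_abnormal_range(candidates: Sequence[RangeScore]) -> Optional[RangeScore]:
    """Return the range with the largest LOF, or None if there are no candidates."""
    if not candidates:
        return None
    # max() returns the first range on a tie; the source does not say how ties are handled.
    return max(candidates, key=lambda r: r.lof)
```

How the LOF itself is calculated is outside the scope of this sketch.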

According to the present invention, an abnormality can be detected by performing multidimensional and multiscale data analysis.

A block diagram showing a schematic configuration of a data processing system according to an embodiment (first embodiment) of the present invention.
A block diagram showing a schematic configuration of a statistic data processing device according to an embodiment of the present invention.
A diagram schematically showing the data structure of an example of a statistic data group according to an embodiment of the present invention.
A diagram showing an outline of abnormality detection processing based on statistic data according to an embodiment of the present invention.
A diagram showing an example of reference statistic data according to an embodiment of the present invention.
A diagram showing another example of reference statistic data according to an embodiment of the present invention.
A diagram showing an example of statistic data to be analyzed according to an embodiment of the present invention.
A diagram showing an example of the result of comparing statistic data for each time range according to an embodiment of the present invention.
A diagram for explaining an example of abnormal period determination processing according to an embodiment of the present invention.
A diagram showing an example of the procedure of the processing performed in the statistic data processing device according to an embodiment of the present invention to detect the presence or absence of an abnormality.
A block diagram showing a schematic configuration of a data processing system according to an embodiment (second embodiment) of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(First Embodiment)
[Data processing system]
FIG. 1 is a block diagram showing a schematic configuration of a data processing system 1 according to an embodiment (first embodiment) of the present invention.
The data processing system 1 includes n terminal devices 11-1 to 11-n (where n is an integer of 2 or more), a statistic data processing device 12, a database 13, and a network 21.
Any wired or wireless network may be used as the network 21; for example, the Internet or a Wi-Fi (registered trademark) network may be used.
In the present embodiment, the database 13 is provided separately from the statistic data processing device 12, but as another example, it may be integrated with the statistic data processing device 12.

The statistic data processing device 12 acquires and analyzes data of a predetermined target relating to the terminal devices 11-1 to 11-n. In the present embodiment, this data is assumed to be stream data: time-series data of the predetermined target whose number of samples becomes enormous (big data) and which is generated continuously.

Any data may be used as the data of the predetermined target; for example, data relating to the IoT (Internet of Things) or other data may be used.
As a specific example, data from any system may be used as the data of the predetermined target: for example, data on a traffic system involving vehicles and the like, data on a healthcare system for people and the like, data on a factory system that produces products, data on a financial system for securities and the like, or data on a wired or wireless communication system. Within any such system, various kinds of data may be used as the data of the predetermined target: for example, data on temperature, humidity, speed, acceleration, images, the concentration of a substance such as oxygen, quality, stock prices, communication signals, the positions of terminal devices (the terminal devices 11-1 to 11-n in the present embodiment), or the areas in which terminal devices are present.

Various methods may be used to detect the data of the predetermined target.
As one example, each of the terminal devices 11-1 to 11-n may detect the data of the predetermined target relating to itself. In this configuration, each of the terminal devices 11-1 to 11-n includes a detection unit that detects the data of the predetermined target, and transmits the data detected by the detection unit to the statistic data processing device 12 via the network 21. The detection unit may be, for example, a sensor or an imaging device (camera). Here, an imaging device (camera) may also be regarded as an example of a sensor.
Each of the terminal devices 11-1 to 11-n may be an IoT terminal device or another kind of terminal device.

As another example, the data processing system 1 may include a detection device (not shown) separate from the terminal devices 11-1 to 11-n. In this configuration, the detection device has a detection unit that detects the data of the predetermined target relating to each of the terminal devices 11-1 to 11-n, and transmits the data detected by the detection unit to the statistic data processing device 12 via the network 21.

The detection device may include, for example, an imaging device (camera) that captures an image of a predetermined area, and may detect, based on the image, data such as the number of terminal devices 11-1 to 11-n present in the predetermined area as the data of the predetermined target.
The detection device may include, for example, a communication unit that communicates wirelessly or by wire with the terminal devices 11-1 to 11-n present in a predetermined area, and may detect, based on the result of this communication, data such as the number of terminal devices 11-1 to 11-n present in the predetermined area as the data of the predetermined target.
The detection device may include, for example, a signal acquisition unit that acquires signals transmitted from the terminal devices 11-1 to 11-n, and may detect, based on the state of the signals, data such as the frequency of occurrence or the degree of delay of the signals as the data of the predetermined target.

In the present embodiment, each of the terminal devices 11-1 to 11-n can communicate with other devices (for example, the statistic data processing device 12) via the network 21. Each of the terminal devices 11-1 to 11-n connects to the network 21 by wire or wirelessly.
As another example, the terminal devices 11-1 to 11-n need not have a communication function.

The terminal devices 11-1 to 11-n may, for example, all have the same configuration, or may include terminal devices with different configurations.
Each of the terminal devices 11-1 to 11-n may, for example, be attached to or mounted on an object, or carried or worn by a person. The object may be anything, for example a vehicle such as an automobile, or an electrical appliance.

A detection device separate from the terminal devices 11-1 to 11-n may, for example, be provided in the statistic data processing device 12. In this configuration, the statistic data processing device 12 acquires and analyzes the data detected by that detection device.

[Statistic data processing device]
FIG. 2 is a block diagram showing a schematic configuration of the statistic data processing device 12 according to an embodiment of the present invention.
The statistic data processing device 12 includes an input unit 111, an output unit 112, a storage unit 113, a communication unit 114, and a control unit 115.
The control unit 115 includes a data acquisition unit 131, a statistic data processing unit 132, and a data output control unit 133.
The statistic data processing unit 132 includes a viewpoint setting unit 151, a scale setting unit 152, a statistic data group generation unit 153, a statistic data comparison unit 154, and an abnormality range determination unit 155.

The input unit 111 receives information from outside. For example, the input unit 111 has an operation unit that accepts operations performed by a user (a person) and inputs information corresponding to the accepted operation. For example, the input unit 111 is connected to an external device (for example, a recording medium) and inputs information output from that external device.
The output unit 112 outputs information. For example, the output unit 112 has a screen and displays (outputs) information on it. For example, the output unit 112 is connected to an external device (for example, a recording medium) and outputs information to that external device.
The storage unit 113 stores information. In the present embodiment, the storage unit 113 and the database 13 may be used selectively in any manner.
The communication unit 114 communicates information. In the present embodiment, the communication unit 114 communicates information with other devices (for example, the terminal devices 11-1 to 11-n or a separate detection device) via the network 21.

The control unit 115 performs various kinds of control in the statistic data processing device 12.
In the present embodiment, the storage unit 113 stores a predetermined control program and information on its parameters. The control unit 115 is configured using a CPU (Central Processing Unit). In the control unit 115, the CPU performs various kinds of control by executing the control program stored in the storage unit 113 using the parameters stored in the storage unit 113.

The configuration of the statistic data processing device 12 with the processing units 111 to 115 shown in FIG. 2 is only an example, and other configurations may be used. For example, the division of functions among the processing units 111 to 115 is for convenience of explanation and is not necessarily limited to the configuration shown in FIG. 2.

The functions of the control unit 115 will now be described.
The data acquisition unit 131 acquires data of a predetermined target as the data to be analyzed.
As one example, the data acquisition unit 131 may acquire, as the data to be analyzed, data that the communication unit 114 has received from another device (for example, the terminal devices 11-1 to 11-n or a separate detection device).
As another example, the data acquisition unit 131 may store previously acquired data sequentially in the database 13 and, when that data is to be processed, acquire it from the database 13 as the data to be analyzed.
As yet another example, a configuration may be used in which the data of the predetermined target is stored in the database 13 without passing through the statistic data processing device 12; in this case, the data acquisition unit 131 may acquire the data from the database 13 as the data to be analyzed when it is to be processed.

The statistic data processing unit 132 performs statistical processing on the data acquired by the data acquisition unit 131 and obtains the resulting data (also referred to as "statistic data" in the present embodiment).
The statistic data processing unit 132 may, for example, process data that the data acquisition unit 131 acquires continuously in real time (a body of data that keeps growing), or may process past data acquired by the data acquisition unit 131 (a body of data that no longer grows).
The statistic data processing unit 132 also detects (determines) the presence or absence of an abnormality based on the statistic data.

The data output control unit 133 controls the output, by the output unit 112, of the data to be output. This output is, for example, a display of characters, figures, or graphs.
Any data may be used as the data to be output, for example the data acquired by the data acquisition unit 131 or the result data obtained through processing by the statistic data processing unit 132. The result data obtained through processing by the statistic data processing unit 132 may be, for example, one or more of the result data obtained through processing by the statistic data group generation unit 153, the result data obtained through processing by the statistic data comparison unit 154, and the result data obtained through processing by the abnormality range determination unit 155.

The statistic data processing unit 132 will now be described.
The viewpoint setting unit 151 sets a viewpoint (item) for the data to be analyzed. The viewpoint setting unit 151 may set two or more viewpoints for the data to be analyzed.
Any viewpoint may be used; for example, time, area (region), or device type may be used.
The viewpoint setting unit 151 may, for example, set a predetermined viewpoint or a viewpoint specified by a user or the like. When the viewpoint is predetermined, information identifying it is stored, for example, in the storage unit 113.

The scale setting unit 152 sets a scale (granularity) for each viewpoint. The scale setting unit 152 may set two or more scales for each viewpoint.
A scale of any size may be used.
For example, a 1-second scale, a 1-minute scale, a 1-hour scale, a 1-day scale, or a scale of any other size may be used as the time scale.
For example, a district scale, a municipality scale, a prefecture scale, a nationwide scale, or a scale of any other size may be used as the area scale.
For example, a model scale, a manufacturer scale, an OS (Operating System) scale, or a scale of any other type (attribute) may be used as the device type scale.
The scale setting unit 152 may, for example, set a predetermined scale for each viewpoint or a scale specified by a user or the like. When the scale is predetermined, information identifying it is stored, for example, in the storage unit 113.
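One possible way to hold the viewpoints and their scales is a simple mapping, sketched below. The dictionary layout, the string labels, and the helper `scale_combinations` are hypothetical and chosen only for illustration; the description does not prescribe a storage format.

```python
from itertools import product
from typing import Dict, List

# Hypothetical configuration: viewpoint name -> scales set for that viewpoint.
VIEWPOINT_SCALES: Dict[str, List[str]] = {
    "time": ["1s", "1min", "1h", "1d"],
    "area": ["district", "municipality", "prefecture", "nationwide"],
    "device_type": ["model", "manufacturer", "os"],
}


def scale_combinations(config: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Enumerate every choice of exactly one scale per viewpoint."""
    names = list(config)
    return [
        dict(zip(names, combo))
        for combo in product(*(config[name] for name in names))
    ]
```

With the values above, `scale_combinations(VIEWPOINT_SCALES)` yields 4 × 4 × 3 = 48 combinations, one for each choice of a time scale, an area scale, and a device type scale.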

The statistic data group generation unit 153 performs statistical processing on the data acquired by the data acquisition unit 131, based on the viewpoints set by the viewpoint setting unit 151 and the scales set by the scale setting unit 152, and thereby generates a set of multiple pieces of statistic data (also referred to as a "statistic data group" in the present embodiment).
In the present embodiment, the statistic data group generation unit 153 generates the data resulting from multidimensional and multiscale data analysis as a statistic data group (also referred to as a "multidimensional multiscale statistic data group" in the present embodiment).
In the present embodiment, a dimension represents a viewpoint, and "multidimensional" means that there are multiple viewpoints.
Likewise, "multiscale" means that there are multiple scales.

The statistic data comparison unit 154 compares reference statistic data with the statistic data to be analyzed. In the present embodiment, the statistic data comparison unit 154 makes this comparison for a plurality of different scales.
The reference statistic data may, for example, be acquired by the data acquisition unit 131, or statistic data included in the statistic data group generated by the statistic data group generation unit 153 may be used.
The statistic data to be analyzed may, for example, be acquired by the data acquisition unit 131; or statistic data generated by the statistic data processing unit 132 (for example, by the statistic data comparison unit 154) from the data acquired by the data acquisition unit 131 may be used; or statistic data included in the statistic data group generated by the statistic data group generation unit 153 may be used.

Based on the result of the comparison by the statistic data comparison unit 154, the abnormality range determination unit 155 determines, for the statistic data to be analyzed, the range in which an abnormality has occurred (in the present embodiment, a range of time, also referred to as an "abnormal period"). This determination may be, for example, an estimate.
In the present embodiment, the existence of an abnormality is detected when the abnormality range determination unit 155 determines an abnormal period for the statistic data to be analyzed.

[Example of data structure of statistic data group]
FIG. 3 is a diagram schematically showing the data structure of an example of a statistic data group 201 according to an embodiment of the present invention.
In the example of FIG. 3, the statistic data group 201 is a multidimensional multiscale statistic data group resulting from the processing of time-series data by the statistic data processing unit 132. The statistic data group 201 has the structure of distribution data (statistic data in the present embodiment) extended to multiple dimensions and multiple scales for time-series data.
In the example of FIG. 3, time, area, and device type are used as the viewpoints, and a plurality of scales (multiscale) is used for each viewpoint.

In the example of FIG. 3, time scales s0, s1, and s2 are used as the scales for time. As an example, the time scale s0 is "10 seconds", the time scale s1 is "20 seconds", and the time scale s2 is "40 seconds".
Area scales A1, A2, and A3 are used as the scales for area. As an example, the area scale A1 is "Tokyo", the area scale A2 is "Kanto", and the area scale A3 is "Japan (nationwide)".
Device type scales D1 and D2 are used as the scales for device type. As an example, the device type scale D1 is "specific model A1" and the device type scale D2 is "specific manufacturer B1".
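Because the example area scales are nested (Tokyo lies within Kanto, which lies within Japan), one piece of data can belong to several area scales at once. The sketch below shows one way to express that membership. The lookup table `PREFECTURE_TO_REGION`, the function name, and the assumption that every record originates in Japan are hypothetical and not taken from the description.

```python
from typing import Dict, List

# Hypothetical lookup with only the entries needed for this illustration.
PREFECTURE_TO_REGION: Dict[str, str] = {
    "Tokyo": "Kanto",
    "Kanagawa": "Kanto",
    "Osaka": "Kansai",
}


def area_scales_for(prefecture: str) -> List[str]:
    """Return the labels of the example area scales (A1 to A3) that contain the prefecture."""
    labels: List[str] = []
    if prefecture == "Tokyo":
        labels.append("A1")  # A1: Tokyo
    if PREFECTURE_TO_REGION.get(prefecture) == "Kanto":
        labels.append("A2")  # A2: Kanto
    labels.append("A3")      # A3: Japan (nationwide); assumes all records are from Japan
    return labels
```

A record from Tokyo would therefore contribute to the statistic data for A1, A2, and A3, while a record from Osaka would contribute only to A3.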

The example of FIG. 3 also shows the direction (arrow) of the axis representing time. Time ranges t0, t1, t2, and t3 are used as the ranges (divisions) of time. The time range t0 is "0 seconds or more and less than 10 seconds", the time range t1 is "10 seconds or more and less than 20 seconds", the time range t2 is "20 seconds or more and less than 30 seconds", and the time range t3 is "30 seconds or more and less than 40 seconds".
Any timing may be used as the initial value of time (0 seconds in the present embodiment).

ここで、１個の時間スケールと、１個の領域スケールと、１個のデバイス種別スケールが特定されて、当該時間スケールに応じた１個の時刻範囲が特定されると、単位となる統計量データ（本実施形態において、「単位統計量データ」ともいう。）が特定される。
１個の時間スケールと、１個の領域スケールと、１個のデバイス種別スケールと、当該時間スケールに応じた１個の時刻範囲によって特定される単位統計量データは、当該時刻範囲の最大の時刻から当該時間スケールだけ過去に遡った時刻までの間に属し、かつ、当該領域スケールに属し、かつ、当該デバイス種別スケールに属するデータについて、統計量データ処理部１３２によって統計的な演算を行うことにより得られた統計量データに相当する。 Here, when one time scale, one area scale, and one device type scale are specified, and one time range corresponding to that time scale is specified, a unit of statistic data (also referred to as "unit statistic data" in this embodiment) is specified.
The unit statistic data specified by one time scale, one area scale, one device type scale, and one time range corresponding to that time scale is the statistic data obtained when the statistic data processing unit 132 performs a statistical calculation on the data that falls between the maximum time of the time range and the time reached by going back from it by the length of the time scale, that belongs to the area scale, and that belongs to the device type scale.
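The definition above can be illustrated with a minimal sketch. The record layout, the sample values, and the function name `unit_statistic` are assumptions made for illustration only and do not appear in the specification; the median is used as one possible statistic.

```python
from statistics import median

# Hypothetical raw records: (time [s], area scales the terminal belongs to,
# device type scales the terminal belongs to, latency [ms]).
RECORDS = [
    (31.0, {"Tokyo", "Kanto", "Japan"}, {"model A1", "maker B1"}, 120.0),
    (35.5, {"Tokyo", "Kanto", "Japan"}, {"model A1", "maker B1"}, 80.0),
    (38.0, {"Kanto", "Japan"}, {"maker B1"}, 200.0),
    (12.0, {"Tokyo", "Kanto", "Japan"}, {"model A1", "maker B1"}, 50.0),
    (39.9, {"Tokyo", "Kanto", "Japan"}, {"model A1", "maker B1"}, 100.0),
]

def unit_statistic(records, time_scale, time_range_end, area, device):
    """Median latency of the records that fall in
    [time_range_end - time_scale, time_range_end) and that belong to the
    given area scale and device type scale. Returns None if no record matches."""
    start = time_range_end - time_scale
    values = [latency for t, areas, devices, latency in records
              if start <= t < time_range_end
              and area in areas and device in devices]
    return median(values) if values else None

# Time scale s0 (10 s), time range t3 (ends at 40 s), Tokyo, model A1.
stat_s0 = unit_statistic(RECORDS, 10, 40, "Tokyo", "model A1")
# Time scale s1 (20 s), same time range, Kanto, maker B1.
stat_s1 = unit_statistic(RECORDS, 20, 40, "Kanto", "maker B1")
```

In this sketch a wider time scale simply widens the window that ends at the maximum time of the time range, which is how the text describes the relationship between time scale and time range.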

図３の例では、時間スケールｓ０かつ時刻範囲ｔ３に該当する６個の単位統計量データ２１１〜２１３、２２１〜２２３と、時間スケールｓ１かつ時刻範囲ｔ３に該当する６個の単位統計量データ３１１〜３１３、３２１〜３２３と、時間スケールｓ２かつ時刻範囲ｔ３に該当する６個の単位統計量データ４１１〜４１３、４２１〜４２３だけに符号を付してあり、他の単位統計量データについては符号を省略してある。 In the example of FIG. 3, reference numerals are given only to the six unit statistic data 211 to 213 and 221 to 223 corresponding to the time scale s0 and the time range t3, the six unit statistic data 311 to 313 and 321 to 323 corresponding to the time scale s1 and the time range t3, and the six unit statistic data 411 to 413 and 421 to 423 corresponding to the time scale s2 and the time range t3; reference numerals are omitted for the other unit statistic data.

一例として、単位統計量データ２１１は、時間スケールＳ０（１０秒）、領域スケールＡ１（東京）、デバイス種別スケールＤ１（特定の機種Ａ１）、時刻範囲ｔ３（３０秒以上４０秒未満）に該当する。そして、当該単位統計量データ２１１は、時刻が３０秒以上４０秒未満に属し、かつ、領域が東京に属し、かつ、デバイス種別が特定の機種Ａ１に属するデータに基づいて得られた統計量データである。すなわち、当該データは、時刻が３０秒以上４０秒未満に発生し、東京に存在する端末装置１１−１〜１１−ｎにおいて発生し、デバイス種別が特定の機種Ａ１である当該端末装置１１−１〜１１−ｎにおいて発生したデータであることを意味する。当該単位統計量データは、このようなデータの集合を用いて得られた統計量データである。 As an example, the unit statistic data 211 corresponds to the time scale s0 (10 seconds), the area scale A1 (Tokyo), the device type scale D1 (specific model A1), and the time range t3 (30 seconds or more and less than 40 seconds). The unit statistic data 211 is statistic data obtained from data whose time falls within 30 seconds or more and less than 40 seconds, whose area is Tokyo, and whose device type is the specific model A1. In other words, the underlying data was generated at a time of 30 seconds or more and less than 40 seconds by those of the terminal devices 11-1 to 11-n that are located in Tokyo and whose device type is the specific model A1. The unit statistic data is statistic data obtained from the set of such data.

ここで、単位統計量データとしては、任意の統計量のデータが用いられてもよく、例えば、順序統計に関する任意の値のデータが用いられてもよく、あるいは、平均値のデータが用いられてもよい。
順序統計に関する値としては、例えば、中央値が用いられてもよい。なお、一般に、処理対象となる複数のデータが同じである場合、平均値を取得（演算）する処理よりも、中央値を取得する処理の方が、処理時間が短くなると考えられる。
また、順序統計に関する値としては、例えば、累積分布関数（ＣＤＦ：ＣｕｍｕｌａｔｉｖｅＤｉｓｔｒｉｂｕｔｉｏｎＦｕｎｃｔｉｏｎ）の値が用いられてもよく、あるいは、確率分布関数（ＰＤＦ：ＰｒｏｂａｂｉｌｉｔｙＤｅｎｓｉｔｙＦｕｎｃｔｉｏｎ）の値が用いられてもよい。 Here, as the unit statistic data, data of an arbitrary statistic may be used, for example, data of an arbitrary value related to order statistics may be used, or data of an average value may be used. May be good.
As the value related to the order statistic, for example, the median value may be used. In general, when a plurality of data to be processed are the same, it is considered that the processing time for acquiring the median value is shorter than the processing for acquiring (calculating) the average value.
Further, as the value related to the order statistics, for example, a value of the cumulative distribution function (CDF) may be used, or a value of the probability density function (PDF) may be used.
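As a rough illustration of the order-statistic values mentioned above, the following sketch computes a percentile (the median is the 50th percentile) and an empirical CDF value from a list of samples. The function names and the linear-interpolation rule are assumptions for illustration; the specification does not prescribe a particular estimator.

```python
def percentile(values, p):
    """Percentile of a non-empty list by linear interpolation, 0 <= p <= 100."""
    xs = sorted(values)
    rank = (len(xs) - 1) * p / 100.0   # fractional position in the sorted list
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

def empirical_cdf(values, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)

latencies_ms = [10.0, 20.0, 30.0, 40.0, 50.0]
median_ms = percentile(latencies_ms, 50)
p90_ms = percentile(latencies_ms, 90)
share_up_to_30 = empirical_cdf(latencies_ms, 30.0)
```

A streaming implementation would replace the full sort with a sketch structure such as the t-digest mentioned in the text; that is outside the scope of this example.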

図３の例では、それぞれの四角（直方体あるいは立方体）の単位が単位統計量データ（単位統計量データ２１１〜２１３、２２１〜２２３、３１１〜３１３、３２１〜３２３、４１１〜４１３、４２１〜４２３など）に相当する。
なお、統計量データ群生成部１５３は、任意の手法を用いて、単位統計量データを取得してもよく、例えば、既存の技術であるｔ−ｄｉｇｅｓｔの技術（例えば、非特許文献１など参照。）を用いて単位統計量データを演算して取得してもよい。本実施形態では、統計量データ群は、複数の単位統計量データを含んで構成される。 In the example of FIG. 3, each box (rectangular parallelepiped or cube) corresponds to one unit statistic data (unit statistic data 211 to 213, 221 to 223, 311 to 313, 321 to 323, 411 to 413, 421 to 423, and so on).
The statistic data group generation unit 153 may acquire the unit statistic data by any method; for example, it may calculate and acquire the unit statistic data using the existing t-digest technique (see, for example, Non-Patent Document 1). In the present embodiment, the statistic data group includes a plurality of unit statistic data.

また、１個の観点について用いられる複数のスケールとしては、例えば、すべてについて互いに包含関係にある複数のスケールが用いられてもよく、あるいは、すべてについて互いに包含関係にない複数のスケールが用いられてもよく、あるいは、一部のみについて包含関係にある複数のスケールが用いられてもよい。
すべてについて互いに包含関係にある複数のスケールとしては、領域のスケールを例とすると、例えば、「東京」、「関東」、「日本」がある。
すべてについて互いに包含関係にない複数のスケールとしては、領域のスケールを例とすると、例えば、「東京」、「千葉」、「茨城」がある。
一部のみについて包含関係にある複数のスケールとしては、領域のスケールを例とすると、例えば、「東京」、「関東」（東京を含む。）、「大阪」がある。 Further, as the plurality of scales used for one viewpoint, scales that all have an inclusion relationship with one another may be used, scales none of which have an inclusion relationship with one another may be used, or scales only some of which have an inclusion relationship may be used.
Taking area scales as an example, "Tokyo", "Kanto", and "Japan" are scales that all have an inclusion relationship with one another.
Taking area scales as an example, "Tokyo", "Chiba", and "Ibaraki" are scales none of which have an inclusion relationship with one another.
Taking area scales as an example, "Tokyo", "Kanto" (which includes Tokyo), and "Osaka" are scales only some of which have an inclusion relationship.
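The three cases above can be checked mechanically if each scale is modeled as a set of members. The following is a small sketch under that assumption; the membership sets are abbreviated and hypothetical, and `inclusion_kind` is an illustrative name.

```python
# Hypothetical, abbreviated membership sets for area scales.
SCALES = {
    "Tokyo": {"Tokyo"},
    "Chiba": {"Chiba"},
    "Ibaraki": {"Ibaraki"},
    "Osaka": {"Osaka"},
    "Kanto": {"Tokyo", "Chiba", "Ibaraki"},
    "Japan": {"Tokyo", "Chiba", "Ibaraki", "Osaka"},
}

def inclusion_kind(names, scales=SCALES):
    """Classify a list of scales as 'all', 'none', or 'partial' according to
    whether every pair, no pair, or only some pairs are in an inclusion relationship."""
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    nested = [scales[a] <= scales[b] or scales[b] <= scales[a] for a, b in pairs]
    if all(nested):
        return "all"
    if not any(nested):
        return "none"
    return "partial"
```

For example, `inclusion_kind(["Tokyo", "Kanto", "Osaka"])` evaluates to `"partial"` because only the Tokyo and Kanto pair is nested.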

［統計量データ処理部において行われる異常検出処理］
図４は、本発明の一実施形態に係る統計量データに基づく異常検出処理の概要を示す図である。
本実施形態では、解析対象（比較対象）のデータおよび基準となるデータとして、複数の端末装置１１−１〜１１−ｎについて、レイテンシー（Ｌａｔｅｎｃｙ）に関するデータが用いられている。レイテンシーとしては、それぞれの端末装置１１−１〜１１−ｎから送信された要求に対して応答が到来するまでの時間（遅延時間）が用いられている。
ここで、それぞれの端末装置１１−１〜１１−ｎはそれぞれ異なる人に所持されているとする。そして、説明の便宜上、平日には決まった時間に通勤ラッシュがあり、休日には通勤ラッシュが無いとする。また、通常、平日には朝の通勤ラッシュと夕方の通勤ラッシュがあるが、本例では、朝の通勤ラッシュのみを示し、夕方の通勤ラッシュを省略する。 [Abnormality detection processing performed in the statistic data processing unit]
FIG. 4 is a diagram showing an outline of an abnormality detection process based on statistical data according to an embodiment of the present invention.
In the present embodiment, data relating to latency is used for a plurality of terminal devices 11-1 to 11-n as data to be analyzed (comparison target) and reference data. As the latency, the time (delay time) until a response arrives in response to the request transmitted from each terminal device 11-1 to 11-n is used.
Here, it is assumed that each terminal device 11-1 to 11-n is possessed by a different person. For convenience of explanation, it is assumed that there is a commuting rush at a fixed time on weekdays and no commuting rush on holidays. In addition, there is usually a morning rush hour and an evening commuting rush on weekdays, but in this example, only the morning commuting rush is shown and the evening commuting rush is omitted.

図４の例では、解析対象のデータ５２１に基づいて、解析対象の統計量データ群５２２が生成されている。本実施形態では、解析対象の統計量データ群５２２は、異常の有無を検出する対象となる１日分のデータに基づいて生成されている。
また、図４の例では、基準となるデータ５１１として、過去の３０日分のそれぞれの日について、１日分のデータに基づいて生成された統計量データ群５１１−１〜５１１−３０が用いられている。ここで、基準となるデータ５１１が取得された３０日については、異常が発生していないとし、正常な基準のデータが取得されたとする。 In the example of FIG. 4, the statistic data group 522 to be analyzed is generated based on the data 521 to be analyzed. In the present embodiment, the statistic data group 522 to be analyzed is generated based on the data for one day to be detected for the presence or absence of abnormality.
Further, in the example of FIG. 4, the statistic data groups 511-1 to 511-30, each generated from one day of data for one of the past 30 days, are used as the reference data 511. Here, it is assumed that no abnormality occurred on any of the 30 days on which the reference data 511 was acquired, so that normal reference data was acquired.

統計量データ比較部１５４は、基準となるデータ５１１と、解析対象となる統計量データ群５２２とを比較する処理（比較処理５２３）を行う。
異常範囲判定部１５５は、このような比較の結果に基づいて、統計量データについて異常範囲を判定する処理（本例では、異常な分布を検出する異常検出処理５２４）を行う。これにより、異常範囲判定部１５５は、解析対象となる統計量データ群について、異常範囲５２５を判定する。
本実施形態では、異常範囲５２５が存在した場合には異常があることが検出（判定）され、異常範囲５２５が存在しない場合には異常が無いことが検出（判定）される。
なお、本例では、異常範囲５２５として、時刻（時間）の範囲である異常期間を検出（判定）する場合について説明する。 The statistic data comparison unit 154 performs a process (comparison process 523) of comparing the reference data 511 with the statistic data group 522 to be analyzed.
Based on the result of such comparison, the abnormality range determination unit 155 performs a process of determining the abnormality range of the statistical data (in this example, the abnormality detection process 524 for detecting an abnormal distribution). As a result, the abnormality range determination unit 155 determines the abnormality range 525 for the statistical data group to be analyzed.
In the present embodiment, when the abnormality range 525 exists, it is detected (determined) that there is an abnormality, and when the abnormality range 525 does not exist, it is detected (determined) that there is no abnormality.
In this example, a case where an abnormal period, that is, a range of time, is detected (determined) as the abnormality range 525 will be described.

図５〜図７を参照して、統計量データの例を示す。なお、図５〜図７に示される統計量データは、説明の便宜上、人為的に作成したものであり、データ分布の時間推移を表す。
図５は、本発明の一実施形態に係る基準となる統計量データの一例を示す図である。
図５の例では、平日（Ｗｅｅｋｄａｙ）における統計量データを示してある。
図５に示されるグラフにおいて、横軸は１日分の２４時間（時刻０ａｍから時刻１２ｐｍまで）について時刻を表わしており、縦軸はそれぞれの端末装置１１−１〜１１−ｎのレイテンシー［ｍｓ］を表わしている。
このグラフには、それぞれの端末装置１１−１〜１１−ｎについて検出されたレイテンシーを示してある。
また、このグラフには、複数の端末装置１１−１〜１１−ｎのレイテンシーについて、順序統計におけるパーセンタイル特性を示してある。具体的には、１０パーセンタイルの特性６１１、２５パーセンタイルの特性６１２、５０パーセンタイルの特性６１３、７５パーセンタイルの特性６１４、９０パーセンタイルの特性６１５を示してある。
また、本例では、平日には８ａｍ前後に通勤ラッシュがあり、傾向としてレイテンシーが１日のなかで最大になるとする。 An example of statistical data is shown with reference to FIGS. 5 to 7. The statistical data shown in FIGS. 5 to 7 are artificially created for convenience of explanation, and represent the time transition of the data distribution.
FIG. 5 is a diagram showing an example of statistical data as a reference according to an embodiment of the present invention.
In the example of FIG. 5, statistical data on weekdays (Weekday) is shown.
In the graph shown in FIG. 5, the horizontal axis represents the time over the 24 hours of one day (from 0 am to 12 pm), and the vertical axis represents the latency [ms] of each of the terminal devices 11-1 to 11-n.
This graph shows the latency detected for each terminal device 11-1 to 11-n.
In addition, this graph shows percentile characteristics in order statistics for the latencies of the plurality of terminal devices 11-1 to 11-n. Specifically, the characteristic 611 of the 10th percentile, the characteristic 612 of the 25th percentile, the characteristic 613 of the 50th percentile, the characteristic 614 of the 75th percentile, and the characteristic 615 of the 90th percentile are shown.
Further, in this example, it is assumed that there is a commuting rush around 8 am on weekdays, and the latency tends to be the maximum in the day.

図６は、本発明の一実施形態に係る基準となる統計量データの他の一例を示す図である。
図６の例では、休日（Ｗｅｅｋｅｎｄ）における統計量データを示してある。
図６に示されるグラフにおいて、横軸は１日分の２４時間（時刻０ａｍから時刻１２ｐｍまで）について時刻を表わしており、縦軸はそれぞれの端末装置１１−１〜１１−ｎのレイテンシー［ｍｓ］を表わしている。
このグラフには、それぞれの端末装置１１−１〜１１−ｎについて検出されたレイテンシーを示してある。
また、このグラフには、複数の端末装置１１−１〜１１−ｎのレイテンシーについて、順序統計におけるパーセンタイル特性を示してある。具体的には、１０パーセンタイルの特性６２１、２５パーセンタイルの特性６２２、５０パーセンタイルの特性６２３、７５パーセンタイルの特性６２４、９０パーセンタイルの特性６２５を示してある。
また、本例では、休日には通勤ラッシュが無く、傾向としてレイテンシーが特に大きくなる期間が観測されないとする。 FIG. 6 is a diagram showing another example of statistical data as a reference according to an embodiment of the present invention.
In the example of FIG. 6, statistical data on holidays (Weekend) is shown.
In the graph shown in FIG. 6, the horizontal axis represents the time over the 24 hours of one day (from 0 am to 12 pm), and the vertical axis represents the latency [ms] of each of the terminal devices 11-1 to 11-n.
This graph shows the latency detected for each terminal device 11-1 to 11-n.
In addition, this graph shows percentile characteristics in order statistics for the latencies of a plurality of terminal devices 11-1 to 11-n. Specifically, the characteristics 621 of the 10th percentile, the characteristic 622 of the 25th percentile, the characteristic 623 of the 50th percentile, the characteristic 624 of the 75th percentile, and the characteristic 625 of the 90th percentile are shown.
Further, in this example, it is assumed that there is no commuting rush on holidays and a period in which the latency is particularly large is not observed as a tendency.

図７は、本発明の一実施形態に係る解析対象となる統計量データの一例を示す図である。
図７の例では、平日（Ｗｅｅｋｄａｙ）における統計量データを示してある。
図７に示されるグラフにおいて、横軸は１日分の２４時間（時刻０ａｍから時刻１２ｐｍまで）について時刻を表わしており、縦軸はそれぞれの端末装置１１−１〜１１−ｎのレイテンシー［ｍｓ］を表わしている。
このグラフには、それぞれの端末装置１１−１〜１１−ｎについて検出されたレイテンシーを示してある。
また、このグラフには、複数の端末装置１１−１〜１１−ｎのレイテンシーについて、順序統計におけるパーセンタイル特性を示してある。具体的には、１０パーセンタイルの特性６３１、２５パーセンタイルの特性６３２、５０パーセンタイルの特性６３３、７５パーセンタイルの特性６３４、９０パーセンタイルの特性６３５を示してある。
また、本例では、このグラフには、８ａｍ前後に通勤ラッシュがあり、傾向としてレイテンシーが１日のなかで最大になるが、基準となるデータ（ここでは、図５の例）と比べて異常なデータ分布になっているとする。具体的には、図７の例では、１０パーセンタイルの特性６３１と２５パーセンタイルの特性６３２とが正常時（基準時）と比べて近くなっている。 FIG. 7 is a diagram showing an example of statistical data to be analyzed according to an embodiment of the present invention.
In the example of FIG. 7, statistical data on weekdays (Weekday) is shown.
In the graph shown in FIG. 7, the horizontal axis represents the time over the 24 hours of one day (from 0 am to 12 pm), and the vertical axis represents the latency [ms] of each of the terminal devices 11-1 to 11-n.
This graph shows the latency detected for each terminal device 11-1 to 11-n.
In addition, this graph shows percentile characteristics in order statistics for the latencies of a plurality of terminal devices 11-1 to 11-n. Specifically, the characteristics 631 of the 10th percentile, the characteristic 632 of the 25th percentile, the characteristic 633 of the 50th percentile, the characteristic 634 of the 75th percentile, and the characteristic 635 of the 90th percentile are shown.
In addition, in this example, the graph shows a commuting rush around 8 am, when the latency tends to reach its maximum for the day, but the data distribution is assumed to be abnormal compared with the reference data (here, the example of FIG. 5). Specifically, in the example of FIG. 7, the characteristic 631 of the 10th percentile and the characteristic 632 of the 25th percentile are closer to each other than in the normal state (reference time).

本例では、基準となる３０日分の統計量データとして、図５に示されるような平日（ここでは、月−金）の統計量データが２２日分取得されており、図６に示されるような休日（ここでは、土−日）の統計量データが８日分取得されている。
そして、本例では、統計量データ処理部１３２は、平日の統計量データを基準として用いるとともに、休日の統計量データを別の基準として用いる。このため、統計量データ比較部１５４は、解析対象となる１日分の統計量データと基準となる２２日分の平日の統計量データとを比較する処理と、解析対象となる１日分の統計量データと基準となる８日分の休日の統計量データとを比較する処理とのうちの１以上を行う。 In this example, as the reference 30-day statistic data, 22 days of weekday (here, Monday-Friday) statistic data as shown in FIG. 5 are acquired, and are shown in FIG. Statistic data for such holidays (here, Saturday-Sunday) has been acquired for eight days.
Then, in this example, the statistic data processing unit 132 uses the weekday statistic data as one reference and the holiday statistic data as another reference. Therefore, the statistic data comparison unit 154 performs one or more of the following: a process of comparing the one day of statistic data to be analyzed with the 22 days of reference weekday statistic data, and a process of comparing the one day of statistic data to be analyzed with the 8 days of reference holiday statistic data.

図８は、本発明の一実施形態に係る時刻範囲ごとにおける統計量データの比較結果の一例を示す図である。
図８の例では、図５に示される基準となる平日の統計量データと、図７に示される解析対象となる統計量データとを比較する。また、基準となる平日の統計量データとしては、２２日分のデータが使用され、それぞれの日ごとに１日分の統計量データが使用される。このような２２日分のデータはクラスタとなっている。また、解析対象となる統計量データとしては、１日分のデータが使用され、その日の１日分の統計量データが使用される。 FIG. 8 is a diagram showing an example of a comparison result of statistical data for each time range according to an embodiment of the present invention.
In the example of FIG. 8, the reference weekday statistic data shown in FIG. 5 and the statistic data to be analyzed shown in FIG. 7 are compared. In addition, as the reference weekday statistic data, 22 days worth of data is used, and one day's worth of statistic data is used for each day. Such 22 days' worth of data is in a cluster. Further, as the statistic data to be analyzed, the data for one day is used, and the statistic data for one day on that day is used.

図８の例では、時刻範囲は、３０分ごとの範囲となっており、具体的には、６時から６時３０分（６：００ｔｏ６：３０）までの範囲、６時３０分から７時（６：３０ｔｏ７：００）までの範囲、７時から７時３０分（７：００ｔｏ７：３０）までの範囲、７時３０分から８時（７：３０ｔｏ８：００）までの範囲、８時から８時３０分（８：００ｔｏ８：３０）までの範囲、８時３０分から９時（８：３０ｔｏ９：００）までの範囲となっている。 In the example of FIG. 8, the time ranges are 30-minute ranges, specifically 6:00 to 6:30, 6:30 to 7:00, 7:00 to 7:30, 7:30 to 8:00, 8:00 to 8:30, and 8:30 to 9:00.

また、図８の例では、観点「時間」のスケール（時間スケール）として、３０分のスケールが用いられている。この３０分の時間スケールの期間は、３０分ごとの時刻範囲の期間と合わせられている。
また、図８の例では、平面的に図示してあるが、時間以外の任意の数の観点が用いられてもよい。この場合、図８に示されるそれぞれの時刻範囲におけるデータは、時間以外のそれぞれの観点のスケールについても所定の大きさのスケールが設定されたときのデータとなる。 Further, in the example of FIG. 8, a scale of 30 minutes is used as a scale (time scale) of the viewpoint "time". The period of this 30-minute time scale is combined with the period of the time range every 30 minutes.
Further, in the example of FIG. 8, although it is shown in a plane, any number of viewpoints other than time may be used. In this case, the data in each time range shown in FIG. 8 is the data when a scale of a predetermined size is set for the scale of each viewpoint other than time.

図８の例では、それぞれの時刻範囲において、基準となる平日の統計量データに相当する値（本実施形態において、「基準値」ともいう。）が２２個示されており、クラスタを形成している。また、解析対象となる統計量データに相当する値（本実施形態において、「対象値」ともいう。）が１個示されている。
なお、図８の例では、時刻範囲（６：００ｔｏ６：３０）における１個の基準値７１１に符号を付してあり、他の基準値については符号を省略してある。また、それぞれの時刻範囲における１個の対象値７２１〜７２６に符号を付してある。 In the example of FIG. 8, 22 values (also referred to as “reference values” in the present embodiment) corresponding to the reference weekday statistical data are shown in each time range to form a cluster. ing. In addition, one value (also referred to as "target value" in the present embodiment) corresponding to the statistical data to be analyzed is shown.
In the example of FIG. 8, a reference numeral is given to one reference value 711 in the time range (6:00 to 6:30), and reference numerals are omitted for the other reference values. The one target value in each time range is given a reference numeral 721 to 726.

統計量データ比較部１５４は、それぞれの時刻範囲において、複数（本例では、２２個）の基準値と１個の対象値（それぞれの時刻範囲における対象値７２１〜７２６）との距離を表す値を演算する。
ここで、当該距離としては、任意の距離が用いられてもよく、例えば、ＣＤＦ−ｂａｓｅｄＪｅｎｓｅｎ−Ｓｈａｎｎｏｎ距離、あるいは、ＰＤＦ−ｂａｓｅｄＪｅｎｓｅｎ−Ｓｈａｎｎｏｎ距離が用いられてもよく、他の例として、それぞれの日のパーセンタイル値を特徴量化して、ユークリッド距離が用いられてもよい。 The statistic data comparison unit 154 is a value representing the distance between a plurality of (22 in this example) reference values and one target value (target values 721 to 726 in each time range) in each time range. Is calculated.
Here, any distance may be used as the distance. For example, a CDF-based Jensen-Shannon distance or a PDF-based Jensen-Shannon distance may be used; as another example, the percentile values of each day may be converted into feature values and the Euclidean distance between them may be used.
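The two kinds of distance named above can be sketched as follows. This is a minimal illustration that treats each distribution as a normalized histogram over common bins (a PDF-based reading) and each day's percentile values as a feature vector; the function names and sample numbers are assumptions, and the specification does not define how the CDF-based variant is computed.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base-2 logs)
    between two discrete probability distributions over the same bins."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; bins with zero mass in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def euclidean(u, v):
    """Euclidean distance between two percentile feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical (p10, p25, p50, p75, p90) latency features in ms for two days.
reference_day = (40.0, 55.0, 80.0, 120.0, 160.0)
target_day = (52.0, 56.0, 82.0, 121.0, 158.0)
gap = euclidean(reference_day, target_day)
```

With base-2 logarithms the Jensen-Shannon distance lies between 0 (identical distributions) and 1 (disjoint support), which makes values from different time ranges directly comparable.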

解析対象となる統計量データが異常を含まない時刻範囲では、当該統計量データは基準となるクラスタ（本例では、基準となる平日のクラスタ、あるいは、基準となる休日のクラスタ）に含まれるが、解析対象となる統計量データが異常を含む時刻範囲では、当該統計量データは基準となるクラスタに含まれない。 In a time range where the statistic data to be analyzed contains no abnormality, the statistic data falls within the reference cluster (in this example, the reference weekday cluster or the reference holiday cluster), whereas in a time range where the statistic data to be analyzed contains an abnormality, the statistic data falls outside the reference cluster.

図９は、本発明の一実施形態に係る異常期間の判定処理の一例を説明するための図である。
図８の例では、時間スケールが３０分である場合の例を示した。異常範囲判定部１５５は、時間スケールが３０分である場合と、さらに、他の任意の数の異なる期間（異なる時間の長さ）を有する時間スケールについても、統計量データ比較部１５４による比較処理を行わせる。本例では、異常範囲判定部１５５は、時間スケールが１分、１０分、３０分、６０分（＝１時間）のそれぞれである場合について、統計量データ比較部１５４による比較処理を行わせる。 FIG. 9 is a diagram for explaining an example of an abnormal period determination process according to an embodiment of the present invention.
FIG. 8 shows an example in which the time scale is 30 minutes. The abnormality range determination unit 155 causes the statistic data comparison unit 154 to perform the comparison processing not only for the 30-minute time scale but also for any number of other time scales having different lengths. In this example, the abnormality range determination unit 155 causes the statistic data comparison unit 154 to perform the comparison processing for time scales of 1 minute, 10 minutes, 30 minutes, and 60 minutes (= 1 hour).

図９に示されるグラフにおいて、横軸は６時（６：００）から１０時（１０：００）までの時刻を表わしており、縦軸はＬＯＦ（ＬｏｃａｌＯｕｔｌｉｅｒＦａｃｔｏｒ）の値を表わしている。ここで、ＬＯＦとしては、それぞれの時刻範囲における２２個の基準値のクラスタと１個の対象値７２１〜７２６とのＬＯＦが用いられている。当該ＬＯＦは、当該対象値７２１〜７２６が当該クラスタから外れている度合いを表す。
図９の例では、時間スケール「１分」が用いられたときにおけるＬＯＦの特性８１１、時間スケール「１０分」が用いられたときにおけるＬＯＦの特性８１２、時間スケール「３０分」が用いられたときにおけるＬＯＦの特性８１３、時間スケール「６０分」が用いられたときにおけるＬＯＦの特性８１４を示してある。なお、これら複数の異なる時間スケールの期間は、例えば、始点（図３の例では、時刻範囲ｔ０の始点）のタイミングが合わせられている。 In the graph shown in FIG. 9, the horizontal axis represents the time from 6:00 (6:00) to 10:00 (10:00), and the vertical axis represents the value of LOF (Local Outlier Factor). Here, as the LOF, a LOF of 22 reference value clusters and one target value 721 to 726 in each time range is used. The LOF represents the degree to which the target values 721 to 726 are out of the cluster.
In the example of FIG. 9, the LOF characteristic 811 when the time scale "1 minute" was used, the LOF characteristic 812 when the time scale "10 minutes" was used, and the time scale "30 minutes" were used. The characteristic 813 of the LOF at the time and the characteristic 814 of the LOF when the time scale "60 minutes" is used are shown. The timing of the start point (in the example of FIG. 3, the start point of the time range t0) is matched to the periods of the plurality of different time scales.
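For readers unfamiliar with LOF, the following is a minimal pure-Python sketch of how the LOF of one target value relative to a cluster of reference values could be computed. It follows the standard LOF definition (k-distance, reachability distance, local reachability density); the neighbourhood size `k`, the Euclidean metric, and the one-dimensional toy data are assumptions, since the specification does not fix them.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _knn(point, refs, k, skip=None):
    """The k nearest reference points as (distance, index), optionally skipping one index."""
    ranked = sorted((_dist(point, r), i) for i, r in enumerate(refs) if i != skip)
    return ranked[:k]

def lof(target, refs, k=3):
    """Local Outlier Factor of `target` relative to the reference cluster `refs`.
    Values near 1 mean the target is as dense as its neighbours; larger values
    mean it lies further outside the cluster."""
    # k-distance of every reference point, measured inside the cluster.
    k_dist = [_knn(r, refs, k, skip=i)[-1][0] for i, r in enumerate(refs)]

    def lrd(point, skip=None):
        # Local reachability density: inverse of the mean reachability distance.
        nbrs = _knn(point, refs, k, skip)
        reach = [max(k_dist[j], d) for d, j in nbrs]
        return len(nbrs) / max(sum(reach), 1e-12)

    nbrs = _knn(target, refs, k)
    return sum(lrd(refs[j], skip=j) for _, j in nbrs) / (len(nbrs) * lrd(target))

# Toy cluster of ten one-dimensional reference values.
reference_values = [(float(i),) for i in range(10)]
inside = lof((4.5,), reference_values)    # close to 1
outside = lof((100.0,), reference_values)  # much larger than 1
```

In the setting of FIG. 9, `refs` would hold the 22 reference values of one time range and `target` the single target value of that range.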

統計量データ処理装置１２では、例えば記憶部１１３に、ＬＯＦに関する所定の閾値Ｑ１が設定されている。
異常範囲判定部１５５は、まず、あらかじめ定められた複数の異なる時間スケールのそれぞれ（本例では、１分、１０分、３０分、６０分）について、演算されたＬＯＦが所定の閾値Ｑ１を超えるか否かを判定する。この判定の結果、異常範囲判定部１５５は、１個以上の時間スケールにおいてＬＯＦが所定の閾値Ｑ１を超える期間の全体を、異常期間の候補として、取得する。図９の例では、異常期間の候補は、７時から９時までとなる。なお、当該異常期間の候補は、例えば、ＬＯＦが所定の閾値Ｑ１を超える厳密な期間が用いられてもよく、あるいは、あらかじめ定められた区切り（例えば、１分単位の区切り、あるいは、５分単位の区切りなど）の期間が用いられて、当該厳密な期間からずれてもよい。 In the statistic data processing device 12, for example, the storage unit 113 is set with a predetermined threshold value Q1 regarding the LOF.
First, the abnormality range determination unit 155 determines, for each of a plurality of predetermined different time scales (1 minute, 10 minutes, 30 minutes, and 60 minutes in this example), whether or not the calculated LOF exceeds the predetermined threshold value Q1. Based on this determination, the abnormality range determination unit 155 acquires, as the candidate for the abnormal period, the entire period in which the LOF exceeds the predetermined threshold value Q1 on one or more time scales. In the example of FIG. 9, the candidate for the abnormal period is from 7:00 to 9:00. As the candidate for the abnormal period, the exact period in which the LOF exceeds the predetermined threshold value Q1 may be used, or a period aligned to predetermined divisions (for example, 1-minute or 5-minute divisions) may be used, in which case it may deviate from the exact period.
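The candidate selection described above might be sketched as follows. The data layout (a mapping from time scale to windows with precomputed LOF values), the sample numbers, and the simplification of returning one enclosing span rather than a set of disjoint intervals are assumptions for illustration.

```python
def candidate_period(lof_by_scale, q1):
    """lof_by_scale maps a time scale in minutes to a list of
    (start_minute, end_minute, lof_value) windows.
    Returns (earliest start, latest end) over every window whose LOF exceeds q1
    on at least one scale, or None when no window exceeds the threshold."""
    hits = [(start, end)
            for windows in lof_by_scale.values()
            for start, end, value in windows
            if value > q1]
    if not hits:
        return None
    return min(s for s, _ in hits), max(e for _, e in hits)

# Hypothetical LOF values; minutes are counted from midnight (420 = 7:00).
lof_values = {
    30: [(390, 420, 1.1), (420, 450, 2.5), (450, 480, 3.0), (480, 510, 1.2)],
    60: [(360, 420, 1.0), (420, 480, 2.8), (480, 540, 1.8)],
}
candidate = candidate_period(lof_values, q1=1.5)  # (420, 540), i.e. 7:00 to 9:00
```

Here the 60-minute scale extends the candidate to 9:00 even though the 30-minute scale alone would stop at 8:00, which mirrors the "one or more time scales" wording.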

次に、異常範囲判定部１５５は、異常期間の候補について、複数の時刻範囲を設定して、統計量データ比較部１５４による比較処理を行わせる。
本例では、異常範囲判定部１５５は、異常期間の候補である７時から９時までの期間について、既に使用された最小の時間スケールである１分を使用して、開始時刻と終了時刻とのそれぞれを１分ずつずらした時刻範囲を設定する。具体的には、異常範囲判定部１５５は、開始時刻を７時から８時５９分まで１分ごとにとり、終了時刻を７時１分から９時まで１分ごとにとり、当該開始時刻と当該終了時刻とのすべての組み合わせの時刻範囲を設定する。つまり、７時から７時１分まで、７時から７時２分まで、７時から７時３分まで、・・・、７時から８時５９分まで、７時１分から７時２分まで、７時１分から７時３分まで、・・・、８時５８分から８時５９分まで、８時５８分から９時まで、８時５９分から９時まで、の時刻範囲が設定される。なお、開始時刻の方が終了時刻よりも早くなる組み合わせだけが使用される。本例では、（開始時刻、終了時刻）の組み合わせの数が、｛１２０×１１９÷２｝となる。 Next, the abnormality range determination unit 155 sets a plurality of time ranges for the candidates of the abnormality period, and causes the statistic data comparison unit 154 to perform comparison processing.
In this example, for the period from 7:00 to 9:00, which is the candidate for the abnormal period, the abnormality range determination unit 155 uses 1 minute, the smallest time scale already used, and sets time ranges whose start time and end time are each shifted in 1-minute steps. Specifically, the abnormality range determination unit 155 takes start times at every minute from 7:00 to 8:59 and end times at every minute from 7:01 to 9:00, and sets the time ranges for all combinations of these start times and end times. That is, the time ranges 7:00 to 7:01, 7:00 to 7:02, 7:00 to 7:03, ..., 7:00 to 8:59, 7:01 to 7:02, 7:01 to 7:03, ..., 8:58 to 8:59, 8:58 to 9:00, and 8:59 to 9:00 are set. Only combinations in which the start time is earlier than the end time are used. In this example, the number of (start time, end time) combinations is {120 × 119 ÷ 2}.

図９の例では、異常範囲判定部１５５は、設定した複数の時刻範囲のうちで、７時から８時３０分までの時刻範囲について演算されたＬＯＦが最大となることを判定する。そして、異常範囲判定部１５５は、当該時刻範囲に相当する期間（７時から８時３０分までの範囲）を異常範囲８１５（ＡＮＯＲＭＡＬＹＦＡＣＴＯＲ）であると判定する。
なお、本実施形態では、異常範囲判定部１５５は、常に（例えば、一定期間ごとに）、複数の時間スケールのそれぞれ（本例では、１分、１０分、３０分、６０分）におけるＬＯＦが所定の閾値Ｑ１を超えるか否かを判定しており、ＬＯＦが所定の閾値Ｑ１を超えた場合に、それ以降の処理として、複数の時刻範囲におけるＬＯＦについて最大となる時刻範囲を判定する処理へ移行する。 In the example of FIG. 9, the abnormality range determination unit 155 determines that the LOF calculated for the time range from 7:00 to 8:30 is the maximum among the plurality of set time ranges. Then, the abnormality range determination unit 155 determines that the period corresponding to the time range (the range from 7:00 to 8:30) is the abnormality range 815 (ANORMALLY FACTOR).
In the present embodiment, the abnormality range determination unit 155 constantly (for example, at regular intervals) determines whether or not the LOF at each of the plurality of time scales (1 minute, 10 minutes, 30 minutes, and 60 minutes in this example) exceeds the predetermined threshold value Q1. When the LOF exceeds the predetermined threshold value Q1, the unit proceeds to the subsequent process of determining the time range with the maximum LOF among the plurality of time ranges.
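The refinement step, in which every (start time, end time) pair inside the candidate period is scored and the pair with the largest LOF is kept, can be sketched as below. The function names are illustrative, and `score` is a stand-in for the LOF computation of one time range. Note that enumerating every pair of the 121 one-minute boundaries from 7:00 to 9:00 in this way gives 121 × 120 / 2 = 7260 ranges, so the exact count depends on how the endpoints are treated.

```python
def candidate_ranges(start_min, end_min, step=1):
    """All (start, end) pairs on a `step`-minute grid with start earlier than end."""
    ticks = range(start_min, end_min + 1, step)
    return [(s, e) for s in ticks for e in ticks if s < e]

def most_anomalous_range(start_min, end_min, score, step=1):
    """Range with the highest score; score(start, end) stands in for the LOF
    of the target value computed over that time range."""
    return max(candidate_ranges(start_min, end_min, step), key=lambda r: score(*r))

def toy_score(start, end):
    # Hypothetical score that peaks at 7:00 to 8:30 (minutes 420 to 510).
    return -(abs(start - 420) + abs(end - 510))

best = most_anomalous_range(420, 540, toy_score)  # (420, 510)
```

An exhaustive search is quadratic in the number of grid points, which is acceptable for a two-hour candidate at one-minute resolution but may call for a coarser `step` on longer candidates.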

このように、異常範囲判定部１５５は、事前に取得された基準となる値（平常値）に対して、解析対象となる値（対象値）が乖離している度合いに基づいて、外れ値（異常値）であるか否かを判定する。
なお、基準となる値（平常値）は、例えば、機械学習などを用いて学習されてもよい。 In this way, the abnormality range determination unit 155 determines whether or not the value to be analyzed (target value) is an outlier (abnormal value) based on the degree to which it deviates from the reference values (normal values) acquired in advance.
The reference values (normal values) may be learned by using, for example, machine learning.

ここで、異常期間の候補を検出するために最初に設定される時間スケールについて、その大きさ、あるいは、その数としては、それぞれ、任意であってもよい。
また、異常範囲判定部１５５は、異常期間の候補を検出するために最初に設定された時間スケールにおいて異常が無いことが判定された場合には、例えば、その異常検出処理を終了する。
また、異常期間の候補が検出された後に、検出された異常期間の候補に対して、さらに時刻範囲を設定して異常期間を判定する場合における当該時刻範囲について、その範囲、あるいは、その数としては、それぞれ、任意であってもよい。 Here, the size or the number of the time scales initially set to detect the candidates for the abnormal period may be arbitrary.
Further, when it is determined that there is no abnormality in the time scale initially set for detecting the candidate of the abnormality period, the abnormality range determination unit 155 ends, for example, the abnormality detection process.
In addition, after a candidate for the abnormal period has been detected, further time ranges are set within the detected candidate in order to determine the abnormal period; the extent and the number of these time ranges may each be arbitrary.

また、本実施形態では、時刻（あるいは、時間）の観点に関する指標（時間スケール、時刻範囲）について異常期間を判定する場合を示したが、他の例として、時刻（あるいは、時間）以外の観点に関する指標（例えば、当該観点のスケール、当該観点の範囲）について異常範囲を判定することが行われてもよい。時刻（あるいは、時間）以外の観点としては、例えば、領域の観点、あるいは、デバイス種別の観点などが用いられてもよい。
また、本実施形態では、１個の観点（時刻（あるいは、時間）の観点）について異常範囲を判定する場合を示したが、他の例として、２個以上の異なる観点の組み合わせについて異常範囲を判定することが行われてもよい。
また、本実施形態では、複数の基準値のクラスタと対象値との距離を演算して用いたが、他の構成例として、複数の基準値のクラスタの代わりに、当該クラスタにおける１個の代表値が用いられてもよい。当該クラスタにおける１個の代表値としては、例えば、当該クラスタに含まれる値の平均値（あるいは、重心値）が用いられてもよい。 Further, in the present embodiment, the case where the abnormal period is determined for the indices related to the viewpoint of time (time scale, time range) has been shown; as another example, the abnormality range may be determined for indices related to a viewpoint other than time (for example, the scale of that viewpoint and the range of that viewpoint). As a viewpoint other than time, for example, the viewpoint of area or the viewpoint of device type may be used.
Further, in the present embodiment, the case where the abnormal range is determined for one viewpoint (time (or time) viewpoint) is shown, but as another example, the abnormal range is determined for a combination of two or more different viewpoints. Judgment may be made.
Further, in the present embodiment, the distances between the clusters of a plurality of reference values and the target values are calculated and used, but as another configuration example, one representative in the cluster is used instead of the clusters of the plurality of reference values. Values may be used. As one representative value in the cluster, for example, the average value (or the center of gravity value) of the values included in the cluster may be used.

なお、基準となる統計量データとしては、基準となる平日の統計量データの代わりに、図６に示される基準となる休日の統計量データが用いられてもよい。
また、他の構成例として、複数種類（例えば、平日と休日）の基準となる統計量データと解析対象となる統計量データとを並列的に比較してもよい。この場合、例えば、比較処理が行われるタイミングごとに、複数種類の基準となる統計量データ（本実施形態の場合、複数種類のクラスタ）のなかで解析対象となる統計量データに最も近いもの（本実施形態の場合、距離が最小となる１個のクラスタ）を比較対象とする手法が用いられてもよい。 As the reference statistic data, the reference holiday statistic data shown in FIG. 6 may be used instead of the reference weekday statistic data.
Further, as another configuration example, a plurality of types of reference statistic data (for example, weekday and holiday) may be compared with the statistic data to be analyzed in parallel. In this case, for example, a method may be used in which, at each timing at which the comparison processing is performed, the reference statistic data closest to the statistic data to be analyzed (in the present embodiment, the one cluster with the smallest distance) among the plurality of types of reference statistic data (in the present embodiment, the plurality of types of clusters) is used as the comparison target.
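Choosing the closest reference cluster could look like the following sketch, which represents each cluster by its centroid (one of the representative values mentioned earlier) and uses the Euclidean distance. The cluster contents, the feature layout, and the function names are assumptions for illustration.

```python
import math

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def nearest_reference(target, clusters):
    """Name of the reference cluster whose centroid is closest to `target`."""
    return min(clusters, key=lambda name: math.dist(target, centroid(clusters[name])))

# Hypothetical (p50, p90) latency features in ms for one time range.
REFERENCE_CLUSTERS = {
    "weekday": [(100.0, 200.0), (110.0, 210.0), (105.0, 190.0)],
    "holiday": [(40.0, 60.0), (50.0, 70.0), (45.0, 65.0)],
}
chosen = nearest_reference((108.0, 195.0), REFERENCE_CLUSTERS)  # "weekday"
```

The target value would then be scored only against the chosen cluster, so that an ordinary holiday is not flagged merely because it differs from the weekday baseline.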

［統計量データ処理装置において行われる処理の例］
図１０は、本発明の一実施形態に係る統計量データ処理装置１２において行われる異常の有無を検出するための処理の手順の一例を示す図である。
本実施形態では、異常範囲判定部１５５は、統計量データ比較部１５４、観点設定部１５１、スケール設定部１５２にそれぞれの処理を行わせて、異常範囲を判定する処理を行う。
なお、本例は一例であり、他の任意の処理手順が用いられてもよい。 [Example of processing performed in the statistic data processing device]
FIG. 10 is a diagram showing an example of a processing procedure for detecting the presence or absence of an abnormality performed in the statistic data processing device 12 according to the embodiment of the present invention.
In the present embodiment, the abnormality range determination unit 155 causes the statistic data comparison unit 154, the viewpoint setting unit 151, and the scale setting unit 152 to perform their respective processes to determine the abnormality range.
Note that this example is an example, and any other processing procedure may be used.

（ステップＳ１）
統計量データ比較部１５４は、異常範囲判定のための基準となるデータ（基準データ）を取得する。
（ステップＳ２）
統計量データ比較部１５４は、異常範囲判定のための対象となるデータ（対象データ）を取得する。
（ステップＳ３）
観点設定部１５１は、異常範囲判定の処理に使用する観点を設定する。
（ステップＳ４）
スケール設定部１５２は、異常範囲判定の処理に使用するスケールを設定する。
（ステップＳ５）
統計量データ比較部１５４は、設定された観点および設定されたスケールを使用して、基準データと対象データとを比較する。
（ステップＳ６）
異常範囲判定部１５５は、統計量データ比較部１５４による比較処理の結果に基づいて、異常範囲を判定し、これにより、異常の有無が検出される。なお、異常範囲判定部１５５は、比較処理の条件を複数種類に変更して、統計量データ比較部１５４による比較処理を行わせて、異常範囲を判定してもよい。比較処理の条件としては、例えば、観点、スケール、範囲（時刻範囲など）のうちの１以上に関する条件であってもよい。 (Step S1)
The statistic data comparison unit 154 acquires data (reference data) as a reference for determining the abnormal range.
(Step S2)
The statistic data comparison unit 154 acquires the data to be evaluated (target data) for determining the abnormal range.
(Step S3)
The viewpoint setting unit 151 sets the viewpoint used for the processing of determining the abnormal range.
(Step S4)
The scale setting unit 152 sets the scale used for the processing of determining the abnormal range.
(Step S5)
The statistic data comparison unit 154 compares the reference data with the target data using the set viewpoint and the set scale.
(Step S6)
The abnormality range determination unit 155 determines the abnormal range based on the result of the comparison process by the statistic data comparison unit 154, and the presence or absence of an abnormality is thereby detected. The abnormality range determination unit 155 may also vary the conditions of the comparison process over a plurality of types, have the statistic data comparison unit 154 perform the comparison under each condition, and then determine the abnormal range. The conditions of the comparison process may be, for example, conditions relating to one or more of the viewpoint, the scale, and the range (a time range or the like).
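Steps S1 to S6 can be sketched as a small pipeline. This is a simplified illustration under stated assumptions, not the embodiment's processing: the statistic is a plain record count per bucket, the viewpoint is a field name, the scale is a bucket width, and a range is flagged when the count differs from the reference by more than a hypothetical `threshold`. All function and field names are invented for the example.

```python
from collections import Counter

def bucket_counts(records, viewpoint, scale):
    """Count records per bucket of width `scale` along the field `viewpoint`."""
    return Counter(int(r[viewpoint] // scale) for r in records)

def detect_abnormal_ranges(reference, target, viewpoint, scale, threshold):
    """Return the (start, end) ranges whose counts deviate from the reference."""
    ref = bucket_counts(reference, viewpoint, scale)  # S1, with S3/S4 applied
    tgt = bucket_counts(target, viewpoint, scale)     # S2, with S3/S4 applied
    abnormal = []
    for b in sorted(set(ref) | set(tgt)):             # S5: compare bucket by bucket
        if abs(tgt[b] - ref[b]) > threshold:          # S6: judge the abnormal range
            abnormal.append((b * scale, (b + 1) * scale))
    return abnormal

# Hypothetical data: one record per observed event, keyed by hour of day.
reference = [{"hour": h} for h in (1, 5, 9, 13, 17, 21)]
target = [{"hour": h} for h in (1, 5, 9, 13, 13, 13, 13, 17, 21)]

# The comparison is repeated under several conditions (here, two scales).
fine = detect_abnormal_ranges(reference, target, "hour", scale=4, threshold=2)
coarse = detect_abnormal_ranges(reference, target, "hour", scale=12, threshold=2)
```

With this data the fine scale flags the range from hour 12 to 16 and the coarse scale flags hour 12 to 24, showing how changing the scale changes the range that is reported.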

［第１実施形態のまとめ］
以上のように、本実施形態に係るデータ処理システム１では、統計量データ処理装置１２において、多次元およびマルチスケールのデータ解析を行うことで異常を検出することを可能とすることができる。
本実施形態に係る統計量データ群のデータ構造では、例えば、複数の観点および複数のスケールで、異常の発生などの事象を監視して検出することなどが可能であり、様々な観点および様々なスケールの事象を並列に監視して検出することなどが可能である。この場合に、本実施形態に係る統計量データ群のデータ構造では、例えば、発生した事象がいずれの観点およびいずれのスケールでの事象であるかを判定することが可能である。具体例として、広い領域のスケールで事象が発生した場合には、広い領域にわたる原因による事象であると推定することができ、また、特定の領域のスケールで事象が発生した場合には、当該特定の領域に限られた原因による事象であると推定することができる。
ここで、本実施形態では、端末装置１１−１〜１１−ｎに関する値について統計量データを処理する構成としたが、他の任意の値について統計量データを処理する構成が実施されてもよい。 [Summary of the first embodiment]
As described above, in the data processing system 1 according to the present embodiment, it is possible to detect an abnormality by performing multidimensional and multiscale data analysis in the statistic data processing device 12.
In the data structure of the statistic data group according to the present embodiment, for example, events such as the occurrence of an abnormality can be monitored and detected from a plurality of viewpoints and at a plurality of scales, and events of various viewpoints and various scales can be monitored and detected in parallel. In this case, with the data structure of the statistic data group according to the present embodiment, it is possible, for example, to determine at which viewpoint and at which scale an event has occurred. As a specific example, when an event occurs at a wide-area scale, it can be estimated that the event has a cause extending over a wide area, and when an event occurs at the scale of a specific area, it can be estimated that the event has a cause limited to that specific area.
Here, in the present embodiment, statistic data is processed for values related to the terminal devices 11-1 to 11-n, but a configuration that processes statistic data for any other values may also be implemented.
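The distinction between a wide-area cause and a cause limited to a specific area can be illustrated with one hypothetical rule. The sketch assumes an area viewpoint with two scales (fine regions nested in coarse regions) and assumes abnormality flags per fine region are already available. The rule itself (all fine regions abnormal means wide-area, some means local) is an assumption made for illustration, as are the names `classify_causes`, `parent_of`, and the region labels.

```python
def classify_causes(fine_abnormal, parent_of):
    """Label each coarse region from the abnormality flags of its fine regions.

    fine_abnormal: {fine_region: bool}, True when that region looks abnormal.
    parent_of:     {fine_region: coarse_region}, the nesting of the two scales.
    """
    children = {}
    for fine, coarse in parent_of.items():
        children.setdefault(coarse, []).append(fine)

    result = {}
    for coarse, fines in children.items():
        hit = sorted(f for f in fines if fine_abnormal.get(f, False))
        if not hit:
            result[coarse] = "normal"
        elif len(hit) == len(fines):
            # Every fine region is affected: treat as a wide-area cause.
            result[coarse] = "wide-area cause"
        else:
            # Only some fine regions are affected: treat as a local cause.
            result[coarse] = "local cause: " + ", ".join(hit)
    return result

# Hypothetical two-scale area hierarchy and detection flags.
parent_of = {"ward-A": "city-X", "ward-B": "city-X",
             "ward-C": "city-Y", "ward-D": "city-Y"}
fine_abnormal = {"ward-A": True, "ward-B": True, "ward-C": True, "ward-D": False}

causes = classify_causes(fine_abnormal, parent_of)
```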

（第２実施形態）
［データ処理システム］
図１１は、本発明の一実施形態（第２実施形態）に係るデータ処理システム１００１の概略的な構成を示すブロック図である。
データ処理システム１００１は、ｎ個の端末装置１１−１〜１１−ｎと、統計量データ処理装置１０１１と、データベース１０１２と、単位統計量データ生成装置１０２１と、ネットワーク２１を備える。ここで、端末装置１１−１〜１１−ｎと、ネットワーク２１は、図１に示されるものと同様であり、説明の便宜上、同じ符号を付してある。
また、データベース１０１２は、図１に示されるデータベース１３と同様に、データを記憶する機能を有する。 (Second Embodiment)
[Data processing system]
FIG. 11 is a block diagram showing a schematic configuration of a data processing system 1001 according to an embodiment (second embodiment) of the present invention.
The data processing system 1001 includes n terminal devices 11-1 to 11-n, a statistic data processing device 1011, a database 1012, a unit statistic data generation device 1021, and a network 21. Here, the terminal devices 11-1 to 11-n and the network 21 are the same as those shown in FIG. 1, and are given the same reference numerals for convenience of explanation.
Further, the database 1012 has a function of storing data in the same manner as the database 13 shown in FIG. 1.

本実施形態に係るデータ処理システム１００１について、図１に示されるデータ処理システム１との相違点について説明する。図１１の例では、図１の例と比べて、単位統計量データ生成装置１０２１を備えている点と、統計量データ処理装置１０１１により行われる処理の一部が、相違する。
単位統計量データ生成装置１０２１は、ｔ−ｄｉｇｅｓｔの技術（例えば、非特許文献１など参照。）により実行することが可能な処理の全部または一部を行う機能を有している。本実施形態では、単位統計量データ生成装置１０２１は、ｔ−ｄｉｇｅｓｔの技術を用いて、解析対象のデータ、観点を特定する情報、および、それぞれの観点のスケールを特定する情報に基づいて、単位統計量データを生成する機能を有する。
なお、単位統計量データ生成装置１０２１としては、例えば、統計量データ処理装置１０１１を管理する者により管理されてもよく、あるいは、他の者によって提供される単位統計量データ生成装置１０２１を利用する構成が用いられてもよい。 The differences between the data processing system 1001 according to the present embodiment and the data processing system 1 shown in FIG. 1 will be described. The example of FIG. 11 differs from the example of FIG. 1 in that the unit statistic data generation device 1021 is provided and in part of the processing performed by the statistic data processing device 1011.
The unit statistic data generation device 1021 has a function of performing all or part of the processing that can be executed by the t-digest technique (see, for example, Non-Patent Document 1). In the present embodiment, the unit statistic data generation device 1021 has a function of generating unit statistic data, using the t-digest technique, based on the data to be analyzed, the information specifying the viewpoints, and the information specifying the scale of each viewpoint.
Note that the unit statistic data generation device 1021 may, for example, be managed by the party that manages the statistic data processing device 1011, or a configuration may be used that utilizes a unit statistic data generation device 1021 provided by another party.

統計量データ処理装置１０１１は、図１に示される統計量データ処理装置１２との相違点として、単位統計量データ生成装置１０２１により行われる処理については当該単位統計量データ生成装置１０２１に当該処理を要求して処理結果を受ける構成としてある。
本実施形態では、統計量データ群生成部１５３は、単位統計量データを生成する処理を要求する信号を、通信部１１４によりネットワーク２１を介して、単位統計量データ生成装置１０２１に送信する。当該信号には、単位統計量データを生成するために必要な情報が含まれ、例えば、解析対象のデータ（または、それを特定する情報）、１個以上の観点を特定する情報、および、それぞれの観点のスケールを特定する情報が含まれる。
単位統計量データ生成装置１０２１は、このような要求の信号を受信した場合、当該要求に応じて単位統計量データを生成し、生成された単位統計量データを含む信号を、ネットワーク２１を介して、統計量データ処理装置１０１１に送信する。
統計量データ群生成部１５３は、単位統計量データ生成装置１０２１から通信部１１４により受信された単位統計量データを使用（利用）して、統計量データ群を生成する。 The statistic data processing device 1011 differs from the statistic data processing device 12 shown in FIG. 1 in that, for the processing performed by the unit statistic data generation device 1021, it requests that processing from the unit statistic data generation device 1021 and receives the processing result.
In the present embodiment, the statistic data group generation unit 153 transmits a signal requesting the generation of unit statistic data to the unit statistic data generation device 1021 via the network 21 by means of the communication unit 114. The signal contains the information necessary to generate the unit statistic data, for example, the data to be analyzed (or information identifying it), information specifying one or more viewpoints, and information specifying the scale of each viewpoint.
When the unit statistic data generation device 1021 receives such a request signal, it generates unit statistic data in response to the request and transmits a signal including the generated unit statistic data to the statistic data processing device 1011 via the network 21.
The statistic data group generation unit 153 uses the unit statistic data received from the unit statistic data generation device 1021 by the communication unit 114 to generate a statistic data group.
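The request and response exchange above can be sketched as two functions passing JSON text, with the network omitted. This is a hedged illustration: the message fields (`records`, `value_field`, `scales`, `unit_statistics`) and function names are hypothetical, and the per-bucket summary is a plain count and median from Python's `statistics` module standing in for a t-digest. It is not the t-digest algorithm.

```python
import json
import statistics
from collections import defaultdict

def build_request(records, value_field, scales_by_viewpoint):
    """Requesting side: pack the data, viewpoints, and scales into one message."""
    return json.dumps({
        "records": records,
        "value_field": value_field,
        "scales": scales_by_viewpoint,  # e.g. {"hour": [12, 24]}
    })

def handle_request(message):
    """Generating side: build one summary per (viewpoint, scale, bucket)."""
    req = json.loads(message)
    groups = defaultdict(list)
    for rec in req["records"]:
        for viewpoint, scales in req["scales"].items():
            for scale in scales:
                bucket = int(rec[viewpoint] // scale)
                groups[(viewpoint, scale, bucket)].append(rec[req["value_field"]])
    units = [
        {"viewpoint": v, "scale": s, "bucket": b,
         "count": len(vals), "median": statistics.median(vals)}  # stand-in summary
        for (v, s, b), vals in sorted(groups.items())
    ]
    return json.dumps({"unit_statistics": units})

# Hypothetical analysis data: a speed value observed at a given hour.
records = [{"hour": 1, "speed": 40}, {"hour": 2, "speed": 60}, {"hour": 13, "speed": 20}]
response = json.loads(handle_request(build_request(records, "speed", {"hour": [12, 24]})))
units = response["unit_statistics"]
```

The requesting side would then assemble the returned unit statistic data into its statistic data group, keyed by viewpoint and scale.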

（以上の実施形態のまとめ）
一構成例として、少なくとも１個の観点および観点ごとの複数のスケールについて、観点とスケールとの組み合わせごとに含まれるデータに基づく統計量データである基準データと比較対象データとを比較する統計量データ比較部（図２の例では、統計量データ比較部１５４）と、統計量データ比較部による比較の結果に基づいて、観点の複数の範囲のなかで異常があるとみなされる範囲を判定する異常範囲判定部（図２の例では、異常範囲判定部１５５）と、を備える統計量データ処理装置（図１、図１１の例では、統計量データ処理装置１２、１０１１）である。
一構成例として、異常範囲判定部は、統計量データ比較部による比較の結果に基づいて、異常があるとみなされる観点の範囲候補（異常範囲の候補）を判定し、異常範囲判定部は、観点の範囲候補に含まれる複数の範囲のなかで、基準データと比較対象データとの差を表すＬＯＦが最大となる範囲を、異常があるとみなされる範囲として、判定する。
一構成例として、統計量データ比較部は、複数の基準データと比較対象データとを比較する。
一構成例として、基準データと比較対象データとのうちの一方または両方は、複数の異なる観点および観点ごとの複数の異なるスケールについて、観点とスケールとの組み合わせごとに含まれるデータに基づく統計量データを有する統計量データ群（図３の例では、統計量データ群２０１）から取得される。
一構成例として、統計量データ比較部が、少なくとも１個の観点および観点ごとの複数のスケールについて、観点とスケールとの組み合わせごとに含まれるデータに基づく統計量データである基準データと比較対象データとを比較し、異常範囲判定部が、統計量データ比較部による比較の結果に基づいて、観点の複数の範囲のなかで異常があるとみなされる範囲を判定する、統計量データ処理方法（図１、図１１の例では、統計量データ処理装置１２、１０１１により行われる処理の方法）である。
一構成例として、統計量データ比較部が、少なくとも１個の観点および観点ごとの複数のスケールについて、観点とスケールとの組み合わせごとに含まれるデータに基づく統計量データである基準データと比較対象データとを比較するステップと、異常範囲判定部が、統計量データ比較部による比較の結果に基づいて、観点の複数の範囲のなかで異常があるとみなされる範囲を判定するステップと、をコンピュータに実行させるためのプログラム（図１、図１１の例では、統計量データ処理装置１２、１０１１を構成するコンピュータ）である。 (Summary of the above embodiments)
As a configuration example, a statistic data processing device (the statistic data processing devices 12 and 1011 in the examples of FIGS. 1 and 11) includes: a statistic data comparison unit (the statistic data comparison unit 154 in the example of FIG. 2) that, for at least one viewpoint and a plurality of scales for each viewpoint, compares reference data with comparison target data, each being statistic data based on the data included in each combination of viewpoint and scale; and an abnormality range determination unit (the abnormality range determination unit 155 in the example of FIG. 2) that determines, based on the result of the comparison by the statistic data comparison unit, a range considered to be abnormal among a plurality of ranges of the viewpoint.
As a configuration example, the abnormality range determination unit determines, based on the result of the comparison by the statistic data comparison unit, a range candidate of the viewpoint considered to be abnormal (a candidate for the abnormal range), and determines, among the plurality of ranges included in the range candidate of the viewpoint, the range in which the LOF representing the difference between the reference data and the comparison target data is largest as the range considered to be abnormal.
As a configuration example, the statistic data comparison unit compares a plurality of reference data with the comparison target data.
As a configuration example, one or both of the reference data and the comparison target data are obtained from a statistic data group (the statistic data group 201 in the example of FIG. 3) having, for a plurality of different viewpoints and a plurality of different scales for each viewpoint, statistic data based on the data included in each combination of viewpoint and scale.
As a configuration example, a statistic data processing method (in the examples of FIGS. 1 and 11, the method of the processing performed by the statistic data processing devices 12 and 1011) is one in which the statistic data comparison unit compares, for at least one viewpoint and a plurality of scales for each viewpoint, reference data with comparison target data, each being statistic data based on the data included in each combination of viewpoint and scale, and the abnormality range determination unit determines, based on the result of the comparison by the statistic data comparison unit, a range considered to be abnormal among a plurality of ranges of the viewpoint.
As a configuration example, a program causes a computer (in the examples of FIGS. 1 and 11, the computer constituting the statistic data processing devices 12 and 1011) to execute: a step in which the statistic data comparison unit compares, for at least one viewpoint and a plurality of scales for each viewpoint, reference data with comparison target data, each being statistic data based on the data included in each combination of viewpoint and scale; and a step in which the abnormality range determination unit determines, based on the result of the comparison by the statistic data comparison unit, a range considered to be abnormal among a plurality of ranges of the viewpoint.
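Selecting the candidate range with the largest LOF can be sketched as follows. This is an illustration under assumptions: it takes LOF to mean the standard Local Outlier Factor, computed for one target point against a set of reference points, which may differ from how the embodiment derives LOF from statistic data. The helper names, the value of `k`, the range labels, and the data are all hypothetical.

```python
import math

def _neighbours(point, pool, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `pool`."""
    pairs = [(math.dist(point, q), j) for j, q in enumerate(pool) if j != skip]
    return sorted(pairs)[:k]

def _k_distance(i, pool, k):
    """Distance from reference point i to its k-th nearest other reference point."""
    return _neighbours(pool[i], pool, k, skip=i)[-1][0]

def _lrd(point, pool, k, skip=None):
    """Local reachability density: inverse of the mean reachability distance."""
    reach = [max(_k_distance(j, pool, k), d)
             for d, j in _neighbours(point, pool, k, skip)]
    return len(reach) / sum(reach)

def lof(query, reference, k=2):
    """LOF of `query` relative to `reference`.

    Assumes at least k + 1 reference points that do not all coincide.
    """
    nbrs = _neighbours(query, reference, k)
    lrd_query = _lrd(query, reference, k)
    lrd_sum = sum(_lrd(reference[j], reference, k, skip=j) for _, j in nbrs)
    return lrd_sum / (len(nbrs) * lrd_query)

# Hypothetical candidate ranges: (target value, reference values) per range.
reference = [(10.0,), (11.0,), (12.0,), (13.0,)]
candidates = {"A": ((11.5,), reference), "B": ((30.0,), reference)}

scores = {name: lof(t, ref) for name, (t, ref) in candidates.items()}
best = max(scores, key=scores.get)  # range regarded as abnormal
```

A score near or below 1 indicates that the target sits inside the reference data, while a score well above 1 indicates that it lies far from it, so range "B" is selected here.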

以上に示した実施形態に係る各装置（例えば、統計量データ処理装置１２、１０１１など）の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体（記憶媒体）に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより、処理を行ってもよい。
なお、ここでいう「コンピュータシステム」とは、オペレーティング・システムあるいは周辺機器等のハードウェアを含むものであってもよい。
また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、フラッシュメモリ等の書き込み可能な不揮発性メモリ、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。
さらに、「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークあるいは電話回線等の通信回線を介してプログラムが送信された場合のサーバあるいはクライアントとなるコンピュータシステム内部の揮発性メモリ（例えばＤＲＡＭ（ＤｙｎａｍｉｃＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ））のように、一定時間プログラムを保持しているものも含むものとする。
また、上記のプログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）あるいは電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。
また、上記のプログラムは、前述した機能の一部を実現するためのものであってもよい。さらに、上記のプログラムは、前述した機能をコンピュータシステムに既に記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 A program for realizing the functions of each device according to the embodiments described above (for example, the statistic data processing devices 12 and 1011) may be recorded on a computer-readable recording medium (storage medium), and the processing may be performed by having a computer system read and execute the program recorded on this recording medium.
The term "computer system" as used herein may include an operating system and hardware such as peripheral devices.
The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), a writable non-volatile memory such as a flash memory, a portable medium such as a DVD (Digital Versatile Disc), or a storage device such as a hard disk built into a computer system.
Further, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, a DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
Further, the above program may be transmitted from a computer system in which this program is stored in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" for transmitting a program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
Further, the above program may be for realizing a part of the above-mentioned functions. Further, the above program may be a so-called difference file (difference program) that can realize the above-mentioned functions in combination with a program already recorded in the computer system.

以上、本発明の実施形態について図面を参照して詳述したが、具体的な構成はこの実施形態に限られるものではなく、本発明の要旨を逸脱しない範囲の設計変更等も含まれる。 Although the embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and design changes and the like within a range not deviating from the gist of the present invention are also included.

１、１００１…データ処理システム、１１−１〜１１−ｎ…端末装置、１２、１０１１…統計量データ処理装置、１３、１０１２…データベース、２１…ネットワーク、１１１…入力部、１１２…出力部、１１３…記憶部、１１４…通信部、１１５…制御部、１３１…データ取得部、１３２…統計量データ処理部、１３３…データ出力制御部、１５１…観点設定部、１５２…スケール設定部、１５３…統計量データ群生成部、１５４…統計量データ比較部、１５５…異常範囲判定部、２０１、５１１−１〜５１１−３０、５２２…統計量データ群、２１１〜２１３、２２１〜２２３、３１１〜３１３、３２１〜３２３、４１１〜４１３、４２１〜４２３…単位統計量データ、５１１…基準となるデータ、５２１…解析対象のデータ、５２３…比較処理、５２４…異常検出処理、５２５…異常範囲、６１１〜６１５、６２１〜６２５、６３１〜６３５、８１１〜８１４…特性、７１１…基準値、７２１〜７２６…対象値、１０２１…単位統計量データ生成装置、Ｑ１…閾値 1, 1001 ... Data processing system, 11-1 to 11-n ... Terminal device, 12, 1011 ... Statistic data processing device, 13, 1012 ... Database, 21 ... Network, 111 ... Input unit, 112 ... Output unit, 113 ... Storage unit, 114 ... Communication unit, 115 ... Control unit, 131 ... Data acquisition unit, 132 ... Statistic data processing unit, 133 ... Data output control unit, 151 ... Viewpoint setting unit, 152 ... Scale setting unit, 153 ... Statistic data group generation unit, 154 ... Statistic data comparison unit, 155 ... Abnormality range determination unit, 201, 511-1 to 511-30, 522 ... Statistic data group, 211 to 213, 221 to 223, 311 to 313, 321 to 323, 411 to 413, 421 to 423 ... Unit statistic data, 511 ... Reference data, 521 ... Data to be analyzed, 523 ... Comparison processing, 524 ... Abnormality detection processing, 525 ... Abnormal range, 611 to 615, 621 to 625, 631 to 635, 811 to 814 ... Characteristics, 711 ... Reference value, 721 to 726 ... Target value, 1021 ... Unit statistic data generation device, Q1 ... Threshold

Claims

A statistic data processing device comprising:
a statistic data comparison unit that, for at least one viewpoint and a plurality of scales for each viewpoint, compares reference data with comparison target data, each being statistic data based on data included in each combination of the viewpoint and the scale; and
an abnormality range determination unit that determines, based on a result of the comparison by the statistic data comparison unit, a range considered to have an abnormality among a plurality of ranges of the viewpoint,
wherein the abnormality range determination unit determines a range candidate of the viewpoint considered to have an abnormality based on the result of the comparison by the statistic data comparison unit, and
the abnormality range determination unit determines, as the range considered to have an abnormality, the range in which the LOF representing the difference between the reference data and the comparison target data is largest among the plurality of ranges included in the range candidate of the viewpoint.

The statistic data processing device according to claim 1,
wherein the abnormality range determination unit, as the process of determining the range candidate of the viewpoint considered to have an abnormality based on the result of the comparison by the statistic data comparison unit,
determines, based on the results of the statistic data comparison unit comparing the reference data and the comparison target data at a plurality of different scales, that the entire range determined to have an abnormality at one or more of the scales is the range candidate of the viewpoint considered to have an abnormality.

The statistic data processing device according to claim 1 or 2,
wherein the statistic data comparison unit compares a plurality of pieces of the reference data with the comparison target data.

The statistic data processing device according to any one of claims 1 to 3,
wherein one or both of the reference data and the comparison target data are obtained from a statistic data group having, for a plurality of different viewpoints and a plurality of different scales for each viewpoint, statistic data based on data included in each combination of the viewpoint and the scale.

A statistic data processing method in which:
a statistic data comparison unit compares, for at least one viewpoint and a plurality of scales for each viewpoint, reference data with comparison target data, each being statistic data based on data included in each combination of the viewpoint and the scale; and
an abnormality range determination unit determines, based on a result of the comparison by the statistic data comparison unit, a range considered to have an abnormality among a plurality of ranges of the viewpoint,
wherein the abnormality range determination unit determines a range candidate of the viewpoint considered to have an abnormality based on the result of the comparison by the statistic data comparison unit, and
the abnormality range determination unit determines, as the range considered to have an abnormality, the range in which the LOF representing the difference between the reference data and the comparison target data is largest among the plurality of ranges included in the range candidate of the viewpoint.

A program for causing a computer to execute:
a step in which a statistic data comparison unit compares, for at least one viewpoint and a plurality of scales for each viewpoint, reference data with comparison target data, each being statistic data based on data included in each combination of the viewpoint and the scale; and
a step in which an abnormality range determination unit determines, based on a result of the comparison by the statistic data comparison unit, a range considered to have an abnormality among a plurality of ranges of the viewpoint,
wherein the abnormality range determination unit determines a range candidate of the viewpoint considered to have an abnormality based on the result of the comparison by the statistic data comparison unit, and
the abnormality range determination unit determines, as the range considered to have an abnormality, the range in which the LOF representing the difference between the reference data and the comparison target data is largest among the plurality of ranges included in the range candidate of the viewpoint.