JP6595884B2

JP6595884B2 - Data evaluation apparatus, data evaluation method, and program

Info

Publication number: JP6595884B2
Application number: JP2015213314A
Authority: JP
Inventors: サフビスミタ; 幸生植松; 基至大木; 済央野本
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2015-10-29
Filing date: 2015-10-29
Publication date: 2019-10-23
Anticipated expiration: 2035-10-29
Also published as: JP2017085417A

Description

The present invention relates to a technique for evaluating and improving the reliability of data acquired from a plurality of data sources.

Quality information about a network (e.g., throughput) can be obtained in various ways. For example, a measuring instrument can be connected to a communication device inside the network, and quality information can be acquired as the measurement result of that instrument. Alternatively, communication speed measurements made by an external research organization can be acquired as quality information.

As described above, quality information (hereinafter called data) can be acquired from various information sources (hereinafter called data sources).

JP 2008-311720 A

However, the data obtained from a data source is not always reliable, so it is necessary to check whether the data obtained from a data source can be trusted. One conceivable reliability check is to compare the data obtained from a plurality of data sources with each other: if the correlation between the data is high, all of the data are judged to be reliable, and if the correlation is low, at least one of the data is judged to be unreliable.

Data such as quality information is generally obtained as time-series data, but the data at every time in the series is not necessarily reliable; for example, only the data in a specific time period may be reliable. As one example, in time periods when the network is congested, such as at night, quality information often cannot be measured stably, and reliable quality information is not always obtained. Conversely, in specific time periods when the network is not so congested, reliable network quality information can be expected.

Consequently, when raw data obtained from a plurality of data sources are compared for a reliability check, the correlation may be low overall and yet high when only a specific time period is examined. However, no conventional technique has been able to evaluate the reliability of data while taking such characteristics of time-series data into account.

Note that the time attached to time-series data is one attribute of that data. The problem described above is not limited to data whose attribute is time; it can also arise for data with other attributes.

The present invention has been made in view of the above points, and its object is to provide a technique that makes it possible to evaluate the reliability of data obtained from a data source while taking into account characteristics relating to the attributes of the data.

According to an embodiment of the present invention, there is provided a data evaluation apparatus comprising:
input means for inputting first network quality information and second network quality information that are obtained from different data sources and that both have signal strength as an overlapping attribute;
calculation means for dividing each of the first network quality information and the second network quality information into a plurality of sections based on the signal strength, and calculating a correlation coefficient between the network quality information of the same section in the first network quality information and the second network quality information; and
evaluation means for determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient that satisfies a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating that section.

Further, according to an embodiment of the present invention, there is provided a data evaluation method executed by a data evaluation apparatus, the method comprising:
an input step of inputting first network quality information and second network quality information that are obtained from different data sources and that both have signal strength as an overlapping attribute;
a calculation step of dividing each of the first network quality information and the second network quality information into a plurality of sections based on the signal strength, and calculating a correlation coefficient between the network quality information of the same section in the first network quality information and the second network quality information; and
an evaluation step of determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient that satisfies a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating that section.

According to the embodiment of the present invention, a technique is provided that makes it possible to evaluate the reliability of data obtained from a data source while taking into account characteristics relating to the attributes of the data.

FIG. 1 is an overall configuration diagram of a system in an embodiment of the present invention.
FIG. 2 is a configuration diagram of the data evaluation apparatus.
FIG. 3 is a flowchart showing the procedure of the processing executed by the data evaluation apparatus.
FIG. 4 is a diagram showing an example of input data.
FIG. 5 is a diagram showing an example of the interval-based time-series data generation processing and the score calculation processing.
FIG. 6 is a diagram showing an example of the constrained time-series data generation processing and the score calculation processing (daily + constraint).
FIG. 7 is a diagram showing an example of the constrained time-series data generation processing and the score calculation processing (weekly + constraint).
FIG. 8 is a diagram showing an example of the constrained time-series data generation processing and the score calculation processing (monthly + constraint).

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is only an example, and the embodiments to which the present invention is applied are not limited to it.

For example, the present embodiment uses data that has time as an attribute, such as quality information, as the data to be evaluated. However, the present invention is not limited to data that has time as an attribute and can also be applied to various kinds of data unrelated to time.

(System configuration)
FIG. 1 shows an overall configuration diagram of the system according to the present embodiment. As shown in FIG. 1, the system in the present embodiment includes a data evaluation apparatus 100 and a plurality of data sources. As an example, FIG. 1 shows data source 1 and data source 2 among the plurality of data sources.

The data evaluation apparatus 100 in the present embodiment evaluates the data obtained from data source 1 and the data obtained from data source 2 by comparing them. The specific processing is described later.

The data to be evaluated in the present embodiment is assumed to be time-series data such as network quality information, but this is only an example, and the data evaluated by the data evaluation apparatus 100 is not limited to a specific type of data. In the present embodiment, "time-series data" includes not only data arranged in time order at equal intervals but also data arranged in time order at unequal intervals.

Data sources 1 and 2 are each, for example, a measuring instrument that measures quality information of a certain network. In this case, data source 1 is one measuring instrument that measures the quality information of the network, and data source 2 is another measuring instrument that measures the quality information of the same network.

Alternatively, for example, data source 1 may be a measuring instrument that measures the quality information of the network from inside the network, and data source 2 may be a research organization external to the network.

The technique of the present embodiment can be applied, for example, when the data of one data source is reliable (e.g., internal measurement data) and the reliability of the data of the other data source is unknown (e.g., data from an external research organization). By evaluating how similar the data of unknown reliability is to the reliable data (that is, how strongly the two are correlated), the reliability of the data of unknown reliability can be evaluated.

However, this is only an example, and the reliability of both sets of data may be unknown. Even in that case, if the correlation in a certain time period is high, the data in that time period can be presumed to be highly reliable, and the data for that time period can be extracted and used for subsequent analysis. The technique of the present embodiment can also be applied when both sets of data are considered highly reliable. Even then, the data is not necessarily reliable at all times, and applying the technique makes it possible to identify, for example, the time periods or aggregation periods in which reliability is high.

(Configuration example of data evaluation apparatus 100)
FIG. 2 shows a configuration example of the data evaluation apparatus 100 in the present embodiment. As shown in FIG. 2, the data evaluation apparatus 100 includes an input unit 101, an interval-based time-series data generation unit 102, a constrained time-series data generation unit 103, a score calculation unit 104, a score evaluation unit 105, an output unit 106, and a data storage unit 107. The function of each unit is described in detail later as part of the processing that the unit executes. An outline of each unit follows.

The input unit 101 inputs data from each data source. The interval-based time-series data generation unit 102 generates time-series data for each interval (e.g., daily, weekly, monthly) from the data input by the input unit 101. An "interval" may also be called a "period".

The constrained time-series data generation unit 103 generates, from the data input by the input unit 101, interval-based time-series data with a constraint applied (e.g., a constraint that only data from a specific time of day is used). In the present embodiment, the constraint is applied based on a "section" such as a specific time period within a day or a day of the week.

The score calculation unit 104 calculates the correlation between data source 1 and data source 2 for each of the interval-based time-series data generated by the interval-based time-series data generation unit 102 and the constrained time-series data generated by the constrained time-series data generation unit 103, and uses the resulting correlation coefficient (the degree of association) as the score.

The score evaluation unit 105 evaluates the scores obtained by the score calculation unit 104 to determine the interval and constraint with high correlation. The output unit 106 outputs information on the interval and constraint determined by the score evaluation unit 105, the data corresponding to that interval and constraint, and the like. The data storage unit 107 temporarily stores data being processed by each unit, processing results, scores, and the like, and also stores the thresholds and other values used in score evaluation.

The data evaluation apparatus 100 according to the present embodiment can be realized, for example, by causing one or more computers to execute a program that describes the processing explained in the present embodiment. That is, the functions of the data evaluation apparatus 100 can be realized by executing a program corresponding to the processing performed by the data evaluation apparatus 100 using hardware resources built into the computer, such as a CPU, memory, and hard disk. The program can be recorded on a computer-readable recording medium (such as portable memory) for storage or distribution. The program can also be provided over a network, for example via the Internet or electronic mail.

(Processing procedure executed by the data evaluation apparatus 100)
FIG. 3 is a flowchart showing the procedure of the processing executed by the data evaluation apparatus 100. The processing executed by the data evaluation apparatus 100 is described in detail below, following the flowchart in FIG. 3.

<Step S101: Data input>
First, the input unit 101 of the data evaluation apparatus 100 inputs data from each of data source 1 and data source 2. Hereinafter, the data of data source 1 is called DATA_1 and the data of data source 2 is called DATA_2.

DATA_1 and DATA_2 are each time-series data that includes, for example, network quality information and time information indicating when that quality information was acquired. The time information may be the actual time at which the quality was measured. When, for example, the data source provides "average quality per hour", the time information may instead be the hour (0:00, 1:00, and so on). Other information may also be used.

FIG. 4 shows an example of input data. It shows a case in which DATA_1 and DATA_2 are both hourly data from January 1 (1/1) to June 30 (6/30). For example, the cell under "2" for 1/1 holds a value such as the average quality from 2:00 to 3:00 on 1/1. To make DATA_1 and DATA_2 easy to tell apart, DATA_1 is drawn as a table with solid lines and DATA_2 as a table with dotted lines. The same applies to the other figures.

The input unit 101 or another functional unit may also remove outliers from the input data and use the input data with the outliers removed in the subsequent processing.

<Step S102: Interval-based time-series data generation>
Next, the interval-based time-series data generation unit 102 generates interval-based time-series data for each of DATA_1 and DATA_2 input in step S101. The intervals used are, for example, day, week, and month. That is, for each of DATA_1 and DATA_2, daily, weekly, monthly, and similar time-series data are generated as the interval-based time-series data. In the weekly case, for example, the average value (or another representative value) of the data in the one week following the start time of DATA_1 is calculated, then the average value for each subsequent week is calculated, and the results form the time-series data. The same is done for DATA_2.
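The aggregation in step S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `aggregate_by_interval`, the `(timestamp, value)` sample layout, and the use of the arithmetic mean as the representative value are all assumptions made for the example. Weeks are counted in 7-day blocks from the first sample, following the text's "one week following the start time".

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_by_interval(samples, interval):
    """Average (timestamp, value) samples into one value per day, week, or month."""
    if not samples:
        return []
    start = min(t for t, _ in samples).date()
    buckets = defaultdict(list)
    for t, value in samples:
        if interval == "day":
            key = (t.date() - start).days
        elif interval == "week":
            # 7-day blocks counted from the first sample, not calendar weeks.
            key = (t.date() - start).days // 7
        elif interval == "month":
            key = (t.year, t.month)
        else:
            raise ValueError(f"unknown interval: {interval}")
        buckets[key].append(value)
    # One mean per bucket, in chronological order.
    return [sum(vs) / len(vs) for _, vs in sorted(buckets.items())]


# Hypothetical input: hourly samples for 14 days, value = day index.
start = datetime(2015, 1, 1)
samples = [(start + timedelta(hours=h), float(h // 24)) for h in range(14 * 24)]
weekly = aggregate_by_interval(samples, "week")  # two weekly means
```

Applying the same function to both DATA_1 and DATA_2 with the same interval gives two series that can be compared position by position, provided both sources cover the same period.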

In the above example there are three types of interval, but the number of interval types is not limited to three. Let N be the number of interval types, k the index indicating the interval type, and ts_k the interval-based time-series data. In step S102, ts_1, ts_2, ..., ts_N are generated for each of DATA_1 and DATA_2. Here these are written DATA_1 ts_1, DATA_1 ts_2, ..., DATA_1 ts_N for DATA_1, and DATA_2 ts_1, DATA_2 ts_2, ..., DATA_2 ts_N for DATA_2.

FIG. 5 shows an example of the data generated when daily, weekly, and monthly time-series data are generated from each of DATA_1 and DATA_2. As shown in FIG. 5, DATA_1 ts_1 and DATA_2 ts_1 are generated as the daily time-series data, DATA_1 ts_2 and DATA_2 ts_2 as the weekly time-series data, and DATA_1 ts_3 and DATA_2 ts_3 as the monthly time-series data.

<Step S103: Score calculation for interval-based time-series data>
Next, for each interval used in step S102, the score calculation unit 104 compares the interval-based time-series data of DATA_1 with the interval-based time-series data of DATA_2 (that is, calculates their correlation) to obtain a correlation coefficient, and uses it as the score. The calculated score is stored in the data storage unit 107. Any general correlation function that calculates the correlation between two time series can be used. The larger the calculated score (for example, a value from -1 to 1), the higher the similarity between the compared time-series data.
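As one concrete choice of "general correlation function", the score could be the Pearson correlation coefficient. The text does not name a specific coefficient, so Pearson is an assumption here, as are the function name `pearson` and the decision to return `None` when one series is constant (the coefficient is undefined in that case).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series, in [-1, 1].

    Returns None when either series has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly series from two data sources.
score = pearson([10.0, 12.0, 9.0, 14.0], [20.0, 25.0, 19.0, 27.0])
```

With N interval types, calling this once per interval on the paired series yields the N scores described in the text.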

More specifically, as shown in FIG. 5, for daily, weekly, and monthly time-series data, the score p1 is calculated by comparing DATA_1 ts_1 and DATA_2 ts_1, the daily time-series data of DATA_1 and DATA_2. The score p2 is calculated by comparing DATA_1 ts_2 and DATA_2 ts_2, the weekly time-series data. The score p3 is calculated by comparing DATA_1 ts_3 and DATA_2 ts_3, the monthly time-series data.

In FIGS. 3 to 5 the number N of interval types is 3, so three scores p1, p2, and p3 are obtained; in general, N scores are obtained.

<Step S104: Constrained time-series data generation>
Next, the constrained time-series data generation unit 103 generates constrained time-series data from the input data. Here, the attributes common to DATA_1 of data source 1 and DATA_2 of data source 2 are determined in advance, and those attributes are used to generate constrained time-series data for each of DATA_1 and DATA_2. For example, if DATA_1 has 35 attributes, DATA_2 has 12 attributes, and three attributes are common to both, constrained time-series data can be generated for each of those three attributes.

A common attribute in the present embodiment is, for example, an attribute relating to time in the time-series data, such as the hour of the day, a time period within the day (e.g., six-hour periods), the day of the week, or weekday versus weekend. An attribute may also be based on the content of the data. For example, when the data includes signal strength as a quality measure, signal strength can be used as the attribute. In this case, for example, the signal strength is divided into levels (sections), and the chunks described later are generated from them. A common attribute may be one already contained in the data obtained from the data source, or one added by processing that data.
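Dividing data into sections by signal strength could look like the sketch below. The text does not say how the levels are chosen or which unit is used, so the bin edges (written here as dBm values), the `(signal_strength, quality)` record layout, and the function name `chunk_by_signal_strength` are all hypothetical.

```python
from bisect import bisect_right


def chunk_by_signal_strength(records, edges):
    """Group quality values into len(edges) + 1 sections by signal strength.

    records: iterable of (signal_strength, quality) pairs.
    edges:   ascending section boundaries; a value equal to an edge falls
             into the section above it.
    """
    chunks = [[] for _ in range(len(edges) + 1)]
    for strength, quality in records:
        chunks[bisect_right(edges, strength)].append(quality)
    return chunks


# Hypothetical boundaries and records (values are illustrative only).
edges = [-100, -85, -70]
records = [(-110, 1.0), (-90, 2.0), (-85, 3.0), (-60, 4.0)]
sections = chunk_by_signal_strength(records, edges)
```

Using the same `edges` for both data sources keeps the sections aligned, so that the data in section j of one source is compared only with section j of the other.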

More specifically, if the common attribute is "hour of the day", the constrained time-series data generation unit 103 divides each of DATA_1 and DATA_2 into hourly segments. In this case the data is divided into 24 segments: the data from 0:00 to 1:00, the data from 1:00 to 2:00, ..., the data from 22:00 to 23:00, and the data from 23:00 to 0:00. If the data as a whole covers, for example, six months, then the "data from 0:00 to 1:00" contains the 0:00 to 1:00 data of every day in those six months.

As another example, if the common attribute is "day of the week", each of DATA_1 and DATA_2 is divided into seven segments: Monday data, Tuesday data, ..., Sunday data. If the data as a whole covers, for example, six months, then the "Monday data" contains the data of every Monday in those six months.
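The two segmentations just described can be sketched with one helper. The function name `chunk_by_attribute`, the attribute labels `"hour"` and `"weekday"`, and the `(timestamp, value)` sample layout are assumptions for illustration. Keys are 0-based (hours 0 to 23; Monday is 0), whereas the text numbers chunks from 1.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Maps an attribute name to a function that extracts the chunk key.
KEY_FUNCTIONS = {
    "hour": lambda t: t.hour,          # 24 chunks
    "weekday": lambda t: t.weekday(),  # 7 chunks, Monday == 0
}


def chunk_by_attribute(samples, attribute):
    """Split (timestamp, value) samples into chunks keyed by a time attribute."""
    key_of = KEY_FUNCTIONS[attribute]
    chunks = defaultdict(list)
    for t, value in samples:
        chunks[key_of(t)].append((t, value))
    return dict(chunks)


# Hypothetical input: hourly samples for 14 days.
start = datetime(2015, 1, 1)
samples = [(start + timedelta(hours=h), float(h)) for h in range(14 * 24)]
by_hour = chunk_by_attribute(samples, "hour")
by_weekday = chunk_by_attribute(samples, "weekday")
```

Each chunk keeps its timestamps, so it can still be aggregated by day, week, or month afterwards.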

In the present embodiment, a segment obtained by dividing the data as described above is called a "chunk" and is denoted by c. For example, when the data is divided into 24 chunks (the data from 0:00 to 1:00, the data from 1:00 to 2:00, ..., the data from 22:00 to 23:00, and the data from 23:00 to 0:00), the chunks are written c_1, c_2, ..., c_24 in order from the earliest time. In general, when the data is divided into m chunks, they can be written c_1, c_2, ..., c_m.

After DATA_1 and DATA_2 have each been divided into chunks as described above, each chunk is aggregated. Alternatively, the correlation between the chunk data may be calculated without aggregation.

Aggregation in the present embodiment is performed by generating the interval-based time-series data described in step S102. For example, suppose the data as a whole covers six months, the attribute is "hour of the day", and the data has been divided into 24 chunks. Then, for each chunk (e.g., the six months of data for 0:00 to 1:00), daily, weekly, monthly, and similar time-series data are generated as described in step S102.

For example, when the attribute is "hour of the day", the daily (ts_1) time-series data in chunk c_1 (0:00 to 1:00) of DATA_1 is written DATA_1 ts_1 c_1. More generally, if there are N interval types and m chunks, then in step S104 the constrained time-series data generation unit 103 generates from DATA_1 the series DATA_1 ts_1 c_1, DATA_1 ts_1 c_2, ..., DATA_1 ts_1 c_m, DATA_1 ts_2 c_1, DATA_1 ts_2 c_2, ..., DATA_1 ts_2 c_m, ..., DATA_1 ts_N c_1, DATA_1 ts_N c_2, ..., DATA_1 ts_N c_m, and from DATA_2 the series DATA_2 ts_1 c_1, DATA_2 ts_1 c_2, ..., DATA_2 ts_1 c_m, DATA_2 ts_2 c_1, DATA_2 ts_2 c_2, ..., DATA_2 ts_2 c_m, ..., DATA_2 ts_N c_1, DATA_2 ts_N c_2, ..., DATA_2 ts_N c_m. The above is the case of a single attribute; when constrained time-series data are generated for a plurality of attributes (say M attributes), M such sets of data are generated.
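The N x m family of series for one data source can be sketched as a dictionary keyed by (interval, chunk). This is an illustration under assumptions: the attribute is fixed to hour of the day, the intervals are day, week, and month, the mean is the aggregate, and the names `constrained_series` and `interval_key` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

INTERVALS = ("day", "week", "month")  # ts_1, ts_2, ts_3 in the text


def interval_key(t, start, interval):
    """Index of the day, 7-day block, or calendar month that t falls in."""
    if interval == "day":
        return (t.date() - start.date()).days
    if interval == "week":
        return (t.date() - start.date()).days // 7
    if interval == "month":
        return (t.year - start.year) * 12 + (t.month - start.month)
    raise ValueError(f"unknown interval: {interval}")


def constrained_series(samples):
    """Return {(interval, hour): [mean per interval, ...]} for hour-of-day chunks."""
    start = min(t for t, _ in samples)
    buckets = defaultdict(lambda: defaultdict(list))
    for t, value in samples:
        for interval in INTERVALS:
            buckets[(interval, t.hour)][interval_key(t, start, interval)].append(value)
    return {
        key: [sum(vs) / len(vs) for _, vs in sorted(groups.items())]
        for key, groups in buckets.items()
    }


# Hypothetical input: hourly samples for 14 days, value = day index.
start = datetime(2015, 1, 1)
samples = [(start + timedelta(hours=h), float(h // 24)) for h in range(14 * 24)]
series = constrained_series(samples)  # 3 intervals x 24 chunks = 72 series
```

The entry `series[("day", 0)]` plays the role of DATA_1 ts_1 c_1 in the text's notation; running the function on the second source gives the matching DATA_2 series under the same keys.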

FIGS. 6 to 8 show examples of constrained time-series data when the attribute is "hour of the day". FIG. 6 shows the time-series data aggregated by day after chunk division. For example, DATA_1 ts_1 c_1 is the time series obtained by averaging, day by day, the data in the 0:00 to 1:00 chunk of DATA_1.

FIG. 7 shows the time-series data aggregated by week after chunk division. For example, DATA_1 ts_2 c_1 is the time series obtained by averaging, week by week, the data in the 0:00 to 1:00 chunk of DATA_1. FIG. 8 shows the time-series data aggregated by month after chunk division. For example, DATA_1 ts_3 c_1 is the time series obtained by averaging, month by month, the data in the 0:00 to 1:00 chunk of DATA_1.

<Step S105: Score calculation for constrained time-series data>
Next, for each constrained time-series data obtained in step S104, the score calculation unit 104 calculates the correlation between the data sources to obtain a score. That is, it calculates the score between DATA_1 ts_1 c_1 and DATA_2 ts_1 c_1, the score between DATA_1 ts_1 c_2 and DATA_2 ts_1 c_2, ..., the score between DATA_1 ts_1 c_m and DATA_2 ts_1 c_m, the score between DATA_1 ts_2 c_1 and DATA_2 ts_2 c_1, ..., and the score between DATA_1 ts_N c_m and DATA_2 ts_N c_m. The correlation is calculated in the same way as in step S103.

Then, for each interval, the chunk with the maximum score is determined, and that score and the information on the chunk from which it was obtained (information indicating which section the chunk covers) are stored in the data storage unit 107. For example, when the attribute is the time of day, the daily data yield 24 scores: the score between DATA_1 ts_1 c_1 and DATA_2 ts_1 c_1, the score between DATA_1 ts_1 c_2 and DATA_2 ts_1 c_2, ..., and the score between DATA_1 ts_1 c_24 and DATA_2 ts_1 c_24. The chunk with the maximum value among these (e.g., the 3 PM to 4 PM chunk) is identified, and its information and score are stored in the data storage unit 107. The same applies to the weekly data and the monthly data.

For example, as shown in FIG. 6, a score is obtained for each chunk of the daily time-series data, and the maximum value p4 is obtained. In the example shown in FIG. 7, a score is obtained for each chunk of the weekly time-series data, and the maximum value p5 is obtained. In the example shown in FIG. 8, a score is obtained for each chunk of the monthly time-series data, and the maximum value p6 is obtained.
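The per-chunk scoring and maximum selection for one interval can be sketched as below. The text only says the correlation is calculated as in step S103, so the use of the Pearson correlation coefficient here is an assumption, as are the function names `pearson` and `best_chunk` and the sample values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def best_chunk(series_a, series_b):
    """Return (chunk, score) for the chunk with the highest score, or None.

    series_a, series_b: {chunk: {period: value}} for the two data sources,
    all aggregated with the same interval (e.g. daily).
    """
    best = None
    for chunk in sorted(set(series_a) & set(series_b)):
        # Compare only periods present in both sources.
        periods = sorted(set(series_a[chunk]) & set(series_b[chunk]))
        xs = [series_a[chunk][p] for p in periods]
        ys = [series_b[chunk][p] for p in periods]
        score = pearson(xs, ys)
        if score is not None and (best is None or score > best[1]):
            best = (chunk, score)
    return best


# Hypothetical daily series for two chunks (hours 0 and 15) over three days.
source_1 = {0: {1: 1.0, 2: 2.0, 3: 3.0}, 15: {1: 1.0, 2: 2.0, 3: 3.0}}
source_2 = {0: {1: 3.0, 2: 1.0, 3: 2.0}, 15: {1: 2.0, 2: 4.0, 3: 6.0}}
chunk, score = best_chunk(source_1, source_2)  # counterpart of the maximum p4
```

Running the same function on the weekly and monthly series would give the counterparts of p5 and p6.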

In FIGS. 6 to 8, the number N of interval types is 3, so three maximum values p4, p5, and p6 are obtained; in general, N maximum values are obtained. N may be 1.

The above example uses a single attribute. When constrained time-series data are generated for a plurality of attributes (M attributes), M sets of scores are generated, so N × M maximum scores are obtained.

The score calculation for the interval-specific time-series data described in step S103 may instead be performed together with the score calculation for the constrained time-series data, after both the interval-specific time-series data and the constrained time-series data have been generated. Alternatively, the score calculation for the interval-specific time-series data described in step S103 may be omitted. In that case, in the example of p1 to p6, the evaluation below is performed using only p4, p5, and p6. If, in addition, N is 1, the evaluation below is performed on a single score (e.g., p4).

<Step S106: Score evaluation>
Next, the score evaluation unit 105 evaluates the scores stored in the data storage unit 107 by the processing so far.

As described above, when the number of intervals is N (e.g., N = 3 for daily, weekly, and monthly) and the number of attributes is M (e.g., M = 1 when only "time of day" is used), N scores are obtained in step S103 and N × M scores are obtained in step S105 as scores to be evaluated, so the total number of scores is N + N × M (= N(1 + M)).

The score evaluation unit 105 selects the maximum score from among the N(1 + M) scores. The selected score, together with the interval and the chunk information (constraint information) from which the score was obtained, becomes the output target.

Alternatively, a threshold (or benchmark) may be set in advance, and the maximum score, the interval, and the chunk information (constraint information) may be determined as the output target only when the maximum score exceeds the threshold.

The N(1 + M) scores may also be sorted in descending order, and all scores exceeding the threshold (benchmark), together with their intervals and chunk information (constraint information), may be determined as the output target. Alternatively, a predetermined number P may be set in advance, and the top P scores among all scores exceeding the threshold (benchmark) may be determined as the output target.

In the example shown in FIGS. 5 to 8 (intervals = (daily, weekly, monthly), attribute = (time of day)), six scores p1, p2, ..., p6 are obtained (N(1 + M) = 3(1 + 1) = 6), so, for example, the maximum of these scores (or the scores exceeding the threshold (benchmark)) is determined as the output target.

The above example assumes that scores with a large positive correlation are output, but this is only an example; scores with a large correlation in the negative direction (that is, negative correlation coefficients with a large absolute value) may be output instead.
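The selection variants described for step S106 (maximum only, threshold filter, descending sort, top P, and the negative-correlation option) can be sketched in one function. The `Score` record, the function name `select_outputs`, its parameters, and the values assigned to p1 to p6 are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Score(NamedTuple):
    value: float           # correlation-based score
    interval: str          # "day", "week", or "month"
    chunk: Optional[str]   # None for interval-only scores (no chunk constraint)


def select_outputs(scores, threshold=None, top_p=None, negative=False):
    """Rank scores and apply the optional threshold and top-P limits.

    With negative=True, strongly negative correlations rank first and the
    threshold is compared against the magnitude in the negative direction.
    """
    sign = -1.0 if negative else 1.0
    ranked = sorted(scores, key=lambda s: sign * s.value, reverse=True)
    if threshold is not None:
        ranked = [s for s in ranked if sign * s.value > threshold]
    if top_p is not None:
        ranked = ranked[:top_p]
    return ranked


# Hypothetical values for p1..p6 (N = 3 intervals, M = 1 attribute).
scores = [
    Score(0.20, "day", None),
    Score(0.35, "week", None),
    Score(0.50, "month", None),
    Score(0.90, "day", "15:00-16:00"),
    Score(0.95, "week", "15:00-16:00"),
    Score(-0.80, "month", "03:00-04:00"),
]
best_only = select_outputs(scores, top_p=1)
above_benchmark = select_outputs(scores, threshold=0.6)
strong_negative = select_outputs(scores, threshold=0.6, negative=True)
```

Each returned record carries its interval and chunk, so the constraint information can be output along with the score.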

<Step S107: Data output>
In step S107, the output unit 106 outputs the score determined as the output target in step S106, together with the interval and the chunk information from which the score was obtained. In addition, the time-series data from which the score was obtained may be output. In that case, for example, if the score of the weekly 3 PM to 4 PM chunk is determined as the output target in the example shown in FIGS. 5 to 8, DATA_1 ts_2 c_16 and DATA_2 ts_2 c_16 are output. The chunk data before weekly aggregation may also be output. Since the input data and the time-series data obtained in the course of processing are stored in the data storage unit 107, they can be output by reading them from there. Data that are not used may, however, be deleted from the data storage unit 107.

In the processing described so far, if a high correlation coefficient (e.g., one exceeding the benchmark) is obtained when the data of one data source (raw data, interval-specific time-series data, or constrained time-series data) are shifted in the time direction before the correlation is taken, the length of the time shift may also be output.

Here, shifting in the time direction means, for example in the case of raw data, that the input unit 101 increases (or decreases) the time (timestamp) of the data acquired from one of the data sources being compared by a predetermined time, and the processing described so far is performed using the data with the shifted timestamps as input data. In the case of interval-specific time-series data, the interval-specific time-series data generation unit 102 increases (or decreases) the time of one of the data sets by a predetermined time, for example in units of the interval (e.g., a day for DATA_1 ts_1 in FIG. 3), and the correlation coefficient is calculated between the shifted data and the other data. As an example, when the time of DATA_1 ts_1 in FIG. 3 is shifted by two days, the data in the 1/1 column before the shift move to the 1/3 column, the data in the 1/2 column move to the 1/4 column, and so on. Alternatively, the times may be changed without moving the data, for example by changing 1/3 to 1/1. The same applies to other intervals and to constrained time-series data.

When the time-shift processing is included, a plurality of signed time lengths (e.g., for data source 1: an increase of A hours, a decrease of A hours, an increase/decrease of 0 hours, an increase of B hours, and a decrease of B hours) are prepared in advance, the processing is performed for each time length (including the 0-hour case, that is, the unshifted processing described so far), and the score evaluation described in step S106 is performed on all the scores obtained from all the runs. The output then includes the time length used for the shift, together with the interval and the chunk information (constraint information) from which the output-target score was obtained.
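The time-shift search can be illustrated with a small sketch on daily series. It assumes the Pearson correlation coefficient as the score, integer day indices as keys, and a simplified evaluation that keeps only the single best shift rather than passing every score to step S106; the function names `shifted` and `best_shift` and the sample data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def shifted(series, days):
    """Move every value of a {day_index: value} series by `days` days."""
    return {day + days: value for day, value in series.items()}


def best_shift(series_a, series_b, candidate_shifts):
    """Try each signed shift of series_a and return (shift, score) of the best."""
    results = []
    for shift in candidate_shifts:
        moved = shifted(series_a, shift)
        common = sorted(set(moved) & set(series_b))  # overlapping days only
        score = pearson([moved[d] for d in common], [series_b[d] for d in common])
        if score is not None:
            results.append((score, shift))
    if not results:
        return None
    score, shift = max(results)
    return shift, score


# Hypothetical data: source 2 reports the same pattern two days later.
source_1 = {0: 1.0, 1: 3.0, 2: 2.0, 3: 5.0, 4: 4.0, 5: 7.0, 6: 6.0, 7: 9.0}
source_2 = shifted(source_1, 2)
shift, score = best_shift(source_1, source_2, [-2, 0, 2])
```

The zero entry in the candidate list corresponds to the unshifted processing, so the unshifted score always takes part in the comparison.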

(Example using three or more data sources)
The case of two data sources has been described so far, but this is only an example, and the number of data sources may be three or more. A processing example for this case is described below.

When there are X data sources, the number of combinations for selecting two of the X (a pair to be compared) is _X C_2 = X! / ((X − 2)! 2!). For example, if X = 4, the number is _4 C_2 = 4! / (2! 2!) = 6. In the following, X = 4 is assumed, and the data of four data sources A, B, C, and D are used.

In this case, when the data evaluation apparatus 100 acquires A, B, C, and D, it recognizes that the combinations of two selected from A, B, C, and D are AB, AC, AD, BC, BD, and CD.

For each combination, the data evaluation apparatus 100 calculates N(M + 1) scores by performing the same processing as described so far. The values of N (the number of intervals) and M (the number of attributes) may be the same for every data source or may differ.

Then, for each combination, the data evaluation apparatus 100 checks whether any of the N(M + 1) scores exceeds a threshold (or benchmark; the same applies below), and extracts the combinations that have a score exceeding the threshold.

For example, if scores exceeding the threshold are obtained for the three combinations AB, CD, and AD out of AB, AC, AD, BC, BD, and CD, the data evaluation apparatus 100 outputs the data already described for each of those combinations. For the combination AB, for example, it outputs the interval and the constraint information (chunk information) for which a score exceeding the threshold was obtained. In addition, the time-series data of the interval/constraint from which the score was obtained may be output.
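The pairwise handling of three or more data sources can be sketched as follows. The enumeration uses the standard library; the function name `pairs_over_threshold`, the score values, and the threshold are hypothetical and chosen so that AB, AD, and CD pass, as in the example above.

```python
from itertools import combinations
from math import comb

sources = ["A", "B", "C", "D"]

# All unordered pairs of data sources: AB, AC, AD, BC, BD, CD.
pairs = ["".join(pair) for pair in combinations(sources, 2)]
pair_count = comb(len(sources), 2)  # X! / ((X - 2)! 2!) with X = 4


def pairs_over_threshold(scores_by_pair, threshold):
    """Keep only pairs with at least one score above the threshold.

    scores_by_pair: {pair_name: list of the N(M + 1) scores for that pair}.
    Returns {pair_name: scores that exceed the threshold}.
    """
    selected = {}
    for pair, scores in scores_by_pair.items():
        hits = [s for s in scores if s > threshold]
        if hits:
            selected[pair] = hits
    return selected


# Hypothetical scores; a real run would hold N(M + 1) scores per pair.
scores_by_pair = {
    "AB": [0.30, 0.85],
    "AC": [0.10, 0.40],
    "AD": [0.75, 0.20],
    "BC": [0.05, 0.15],
    "BD": [0.50, 0.55],
    "CD": [0.90, 0.72],
}
selected = pairs_over_threshold(scores_by_pair, threshold=0.7)
```

In a fuller version each score would carry its interval and chunk, so that the constraint information could be output per pair.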

(Summary of the embodiment)
As described above, the present embodiment provides a data evaluation apparatus comprising: input means for inputting first data and second data that are obtained from different data sources and have an overlapping attribute; calculation means for dividing, based on the attribute, each of the first data and the second data into a plurality of sections and calculating a correlation coefficient between the data of the same section in the first data and the second data; and evaluation means for determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient satisfying a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating the section.

For example, for each of the first data and the second data, the calculation means aggregates, for every predetermined period, the plurality of data included in each section obtained by the division into the plurality of sections, and calculates the correlation coefficient between the aggregated data.

The calculation means may use a plurality of types of periods as the predetermined period and calculate, for each type of period, a correlation coefficient for each of the plurality of sections; the evaluation means may then determine, among the correlation coefficients obtained for each type of period and each section, a correlation coefficient satisfying a predetermined condition, select the type of period and the section corresponding to that correlation coefficient, and output information indicating the period and the section.

The calculation means may also aggregate each of the first data and the second data for every predetermined period without dividing it into the plurality of sections, and calculate a correlation coefficient between the aggregated first data and the aggregated second data; the evaluation means may then determine a correlation coefficient satisfying a predetermined condition from among the correlation coefficients obtained for the respective sections and the correlation coefficient calculated between the aggregated first data and the aggregated second data.

The predetermined condition is, for example, that the correlation coefficient is larger than a predetermined threshold. The first data and the second data are, for example, data having time as an attribute. The evaluation means may output the portions of the first data and the second data that correspond to the section for which a correlation coefficient satisfying the predetermined condition was obtained.

(Item 1)
A data evaluation apparatus comprising:
input means for inputting first data and second data that are obtained from different data sources and have an overlapping attribute;
calculation means for dividing, based on the attribute, each of the first data and the second data into a plurality of sections, and calculating a correlation coefficient between the data of the same section in the first data and the second data; and
evaluation means for determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient satisfying a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating the section.
(Item 2)
The data evaluation apparatus according to item 1, wherein the calculation means, for each of the first data and the second data, aggregates for every predetermined period the plurality of data included in each section obtained by the division into the plurality of sections, and calculates the correlation coefficient between the aggregated data.
(Item 3)
The data evaluation apparatus according to item 2, wherein
the calculation means uses a plurality of types of periods as the predetermined period and calculates, for each type of period, a correlation coefficient for each of the plurality of sections, and
the evaluation means determines, among the correlation coefficients obtained for each type of period and each section, a correlation coefficient satisfying a predetermined condition, selects the type of period and the section corresponding to that correlation coefficient, and outputs information indicating the period and the section.
(Item 4)
The data evaluation apparatus according to any one of items 1 to 3, wherein
the calculation means aggregates each of the first data and the second data for every predetermined period without dividing it into the plurality of sections, and calculates a correlation coefficient between the aggregated first data and the aggregated second data, and
the evaluation means determines a correlation coefficient satisfying a predetermined condition from among the correlation coefficients obtained for the respective sections and the correlation coefficient calculated between the aggregated first data and the aggregated second data.
(Item 5)
The data evaluation apparatus according to any one of items 1 to 4, wherein the evaluation means outputs the portions of the first data and the second data that correspond to the section for which a correlation coefficient satisfying the predetermined condition was obtained.
(Item 6)
A data evaluation method executed by a data evaluation apparatus, comprising:
an input step of inputting first data and second data that are obtained from different data sources and have an overlapping attribute;
a calculation step of dividing, based on the attribute, each of the first data and the second data into a plurality of sections, and calculating a correlation coefficient between the data of the same section in the first data and the second data; and
an evaluation step of determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient satisfying a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating the section.
(Item 7)
A program for causing a computer to function as each means of the data evaluation apparatus according to any one of items 1 to 5.
The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

Description of symbols
1, 2 Data source
100 Data evaluation apparatus
101 Input unit
102 Interval-specific time-series data generation unit
103 Constrained time-series data generation unit
104 Score calculation unit
105 Score evaluation unit
106 Output unit
107 Data storage unit

Claims

1. A data evaluation apparatus comprising:
input means for inputting first network quality information and second network quality information that are obtained from different data sources and have signal strength as an overlapping attribute;
calculation means for dividing, based on the signal strength, each of the first network quality information and the second network quality information into a plurality of sections, and calculating a correlation coefficient between the network quality information of the same section in the first network quality information and the second network quality information; and
evaluation means for determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient satisfying a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating the section.

2. The data evaluation apparatus according to claim 1, wherein the calculation means, for each of the first network quality information and the second network quality information, aggregates for every predetermined period the plurality of pieces of network quality information included in each section obtained by the division into the plurality of sections, and calculates the correlation coefficient between the aggregated network quality information.

3. The data evaluation apparatus according to claim 2, wherein
the calculation means uses a plurality of types of periods as the predetermined period and calculates, for each type of period, a correlation coefficient for each of the plurality of sections, and
the evaluation means determines, among the correlation coefficients obtained for each type of period and each section, a correlation coefficient satisfying a predetermined condition, selects the type of period and the section corresponding to that correlation coefficient, and outputs information indicating the period and the section.

4. The data evaluation apparatus according to any one of claims 1 to 3, wherein
the calculation means aggregates each of the first network quality information and the second network quality information for every predetermined period without dividing it into the plurality of sections, and calculates a correlation coefficient between the aggregated first network quality information and the aggregated second network quality information, and
the evaluation means determines a correlation coefficient satisfying a predetermined condition from among the correlation coefficients obtained for the respective sections and the correlation coefficient calculated between the aggregated first network quality information and the aggregated second network quality information.

5. The data evaluation apparatus according to any one of claims 1 to 4, wherein the evaluation means outputs the portions of the first network quality information and the second network quality information that correspond to the section for which a correlation coefficient satisfying the predetermined condition was obtained.

6. A data evaluation method executed by a data evaluation apparatus, comprising:
an input step of inputting first network quality information and second network quality information that are obtained from different data sources and have signal strength as an overlapping attribute;
a calculation step of dividing, based on the signal strength, each of the first network quality information and the second network quality information into a plurality of sections, and calculating a correlation coefficient between the network quality information of the same section in the first network quality information and the second network quality information; and
an evaluation step of determining, among the correlation coefficients obtained for the respective sections, a correlation coefficient satisfying a predetermined condition, selecting the section corresponding to that correlation coefficient, and outputting information indicating the section.

7. A program for causing a computer to function as each means of the data evaluation apparatus according to any one of claims 1 to 5.