JP6550783B2

JP6550783B2 - Data output method, data output program and data output device

Info

Publication number: JP6550783B2
Application number: JP2015031096A
Authority: JP
Inventors: 河東 孝; 太田 唯子; 稲越 宏弥; 湯上 伸弘
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-02-19
Filing date: 2015-02-19
Publication date: 2019-07-31
Anticipated expiration: 2035-02-19
Also published as: JP2016152039A

Description

The present invention relates to a data output method, a data output program, and a data output device.

In machine learning, a prediction model is generated from learning data, and predictions are made with the generated model. The performance of machine learning therefore depends on the learning data, and techniques exist for selecting data that is effective for learning. For example, one technique analyzes the correlation between the input time-series data of each of several types of feature quantity and the time-series data of an objective variable, and identifies the feature quantities that strongly influence the objective variable.

Japanese Laid-open Patent Publication No. 2012-27880

However, the conventional technique may fail to identify data that is effective for learning. For some events to be predicted, the occurrence principle changes with the time period. The conventional technique identifies feature quantities that correlate with the objective variable over the data as a whole, regardless of the occurrence principle. As a result, a feature quantity that correlates strongly with the objective variable only in a specific time period may not be selected, and data effective for learning may go unidentified.

In one aspect, an object of the present invention is to provide a data output method, a data output program, and a data output device capable of identifying data that is effective for learning.

In a first proposal, a data output method calculates correlation values between time-series data of values indicating the state of a specific monitoring target and each of a plurality of pieces of time-series data of values indicating the states of a plurality of monitoring targets different from the specific monitoring target. The data output method classifies the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the correlation between the calculated correlation values. The data output method extracts one of the pieces of time-series data from each of the plurality of clusters. The data output method outputs information indicating the type of each piece of time-series data extracted from each of the plurality of clusters.

According to one embodiment of the present invention, data that is effective for learning can be identified.

FIG. 1 is a diagram illustrating an example of the functional configuration of the data output device according to the first embodiment.
FIG. 2 is a diagram illustrating an example of the data configuration of the objective variable data.
FIG. 3 is a diagram illustrating an example of the data configuration of the feature quantity data.
FIG. 4 is a diagram illustrating an example of the calculation of correlation values.
FIG. 5A is a diagram illustrating an example of a window.
FIG. 5B is a diagram illustrating an example of a window.
FIG. 5C is a diagram illustrating an example of a window.
FIG. 6 is a diagram illustrating an example of how a score is obtained.
FIG. 7 is a diagram illustrating the feature quantities of each cluster.
FIG. 8 is a diagram illustrating an example of an event whose occurrence principle changes.
FIG. 9 is a diagram illustrating changes in the traffic volume of a road together with changes in time period, number of accidents, and precipitation.
FIG. 10 is a diagram illustrating an example of generating a prediction model for each time period.
FIG. 11 is a diagram illustrating an example of generating fine-grained prediction models matched to the event to be predicted.
FIG. 12 is a diagram illustrating an example of objective variable data and feature quantity data.
FIG. 13 is a diagram illustrating an example of dividing the objective variable data and the data of a plurality of feature quantities.
FIG. 14 is a diagram illustrating an example of correlation calculation.
FIG. 15 is a diagram illustrating an example of classifying the data of a plurality of feature quantities.
FIG. 16 is a diagram illustrating an example of the score for each cluster.
FIG. 17 is a diagram illustrating an example of feature quantity extraction for each cluster.
FIG. 18 is a flowchart illustrating an example of the procedure of the data output process according to the first embodiment.
FIG. 19 is a diagram illustrating an example of the functional configuration of the data output device according to the second embodiment.
FIG. 20 is a diagram illustrating an example of the correlation value of each feature quantity with respect to the objective variable.
FIG. 21 is a diagram illustrating an example in which the feature quantity with the highest score is extracted.
FIG. 22 is a diagram illustrating an example in which similar feature quantities are excluded from the extraction targets.
FIG. 23 is a diagram illustrating an example in which the feature quantity with the highest score is extracted from the remaining feature quantities.
FIG. 24 is a diagram illustrating an example in which similar feature quantities are excluded from the extraction targets.
FIG. 25 is a diagram illustrating an example in which the feature quantity with the highest score is extracted from the remaining feature quantities.
FIG. 26 is a flowchart illustrating an example of the procedure of the data output process according to the second embodiment.
FIG. 27 is a diagram illustrating a computer that executes the data output program.

Embodiments of the data output method, data output program, and data output device according to the present invention are described in detail below with reference to the drawings. The invention is not limited by these embodiments. The embodiments can be combined as appropriate as long as their processing contents do not contradict each other.

[Device configuration]
A data output device 10 according to the present embodiment will be described. The data output device 10 identifies and outputs data that is effective for learning from among the various data available for generating a machine learning prediction model. The data output device 10 is, for example, a computer such as a personal computer or a server computer. The data output device 10 performs learning with the data that is effective for learning to generate a prediction model, and makes predictions with the generated model.

FIG. 1 is a diagram illustrating an example of the functional configuration of the data output device according to the first embodiment. As illustrated in FIG. 1, the data output device 10 includes a communication I/F (interface) unit 20, an input unit 21, a display unit 22, a storage unit 23, and a control unit 24. The data output device 10 may include devices other than those listed above.

The communication I/F unit 20 is an interface that controls communication with other devices. A network interface card such as a LAN card can be used as the communication I/F unit 20.

The communication I/F unit 20 transmits and receives various types of information to and from other devices via a network (not shown). For example, the communication I/F unit 20 receives the various data used to generate a prediction model in machine learning. Specifically, the communication I/F unit 20 receives time-series data of values indicating the state of a specific monitoring target that is to be predicted by machine learning. This time-series data serves as the objective variable data when a prediction model is generated by machine learning. The communication I/F unit 20 also receives time-series data of values indicating the states of a plurality of monitoring targets different from the specific monitoring target. This time-series data provides the candidates for learning data when a prediction model is generated by machine learning.

The input unit 21 is an input device for entering various types of information. Examples of the input unit 21 include input devices that accept operations, such as a mouse and a keyboard. The input unit 21 accepts input of various types of information. For example, the input unit 21 accepts various operation inputs related to machine learning. The input unit 21 accepts an operation input from the user and passes operation information indicating the accepted operation to the control unit 24.

The display unit 22 is a display device that displays various types of information. Examples of the display unit 22 include display devices such as a liquid crystal display (LCD) and a cathode ray tube (CRT). For example, the display unit 22 displays various screens, such as operation screens and a screen showing prediction results.

The storage unit 23 is a storage device that stores various data. For example, the storage unit 23 is a storage device such as a hard disk, a solid state drive (SSD), or an optical disk. The storage unit 23 may instead be a rewritable semiconductor memory such as a random access memory (RAM), a flash memory, or a non-volatile static random access memory (NVSRAM).

The storage unit 23 stores an operating system (OS) and various programs executed by the control unit 24. For example, the storage unit 23 stores various programs, including the programs that execute the processes described later. The storage unit 23 also stores various data used by the programs executed by the control unit 24. For example, the storage unit 23 stores objective variable data 30 and feature quantity data 31.

The objective variable data 30 holds the data of the objective variable for machine learning. In the objective variable data 30, time-series data of values indicating the state of the specific monitoring target to be predicted by machine learning is stored as the data of the objective variable.

FIG. 2 is a diagram illustrating an example of the data configuration of the objective variable data. In the objective variable data 30, a value indicating the state of the specific monitoring target is stored as the data of the objective variable for each measurement time. In the example of FIG. 2, the objective variable x_1 is stored for time t_1, the objective variable x_2 for time t_2, ..., and the objective variable x_m for time t_m.

Returning to FIG. 1, the feature quantity data 31 holds the data of a plurality of feature quantities that are candidates for learning data when a prediction model is generated by machine learning. In the feature quantity data 31, time-series data of values indicating the states of a plurality of monitoring targets different from the specific monitoring target to be predicted by machine learning is stored as the data of the plurality of feature quantities.

FIG. 3 is a diagram illustrating an example of the data configuration of the feature quantity data. In the feature quantity data 31, values indicating the states of the plurality of feature quantities are stored as feature quantity data for each measurement time. In the example of FIG. 3, fa to fz, which indicate the types of the feature quantities, are stored as the type. For the feature quantities fa to fz, the values fa_1, fb_1, ..., fz_1 are stored for time t_1, the values fa_2, fb_2, ..., fz_2 for time t_2, ..., and the values fa_m, fb_m, ..., fz_m for time t_m.

Returning to FIG. 1, the control unit 24 is a device that controls the data output device 10. An electronic circuit such as a central processing unit (CPU) or a micro processing unit (MPU), or an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA), can be used as the control unit 24. The control unit 24 has an internal memory for storing programs that define various processing procedures and control data, and executes various processes with them. The control unit 24 functions as various processing units when the programs run. For example, the control unit 24 includes a reception unit 40, a calculation unit 41, a classification unit 42, an extraction unit 43, an output unit 44, and a prediction unit 45.

The reception unit 40 accepts various inputs. For example, the reception unit 40 accepts various operation instructions. For example, the reception unit 40 causes the display unit 22 to display an operation screen related to machine learning and accepts operation instructions, such as an instruction to start processing, from the input unit 21.

The calculation unit 41 performs various calculations. For example, the calculation unit 41 calculates correlation values between the time-series data of values indicating the state of the specific monitoring target and each piece of time-series data of values indicating the states of the plurality of monitoring targets different from the specific monitoring target. To do so, the calculation unit 41 divides the time-series data of the specific monitoring target and each piece of time-series data of the plurality of monitoring targets into a plurality of periods. For example, the calculation unit 41 divides the data of the objective variable stored in the objective variable data 30 and the data of the plurality of feature quantities stored in the feature quantity data 31 into windows, one for each predetermined time period. The calculation unit 41 then calculates, for each period, the correlation value between the time-series data of the specific monitoring target and each piece of time-series data of the plurality of monitoring targets. That is, for each window, the calculation unit 41 calculates the correlation value between the objective variable and each of the plurality of feature quantities. The correlation value may be calculated by any method as long as it indicates the degree of correlation between the objective variable and the feature quantity. For example, the calculation unit 41 calculates the product-moment correlation coefficient as the correlation value.
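The per-window correlation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `pearson` and `windowed_correlations`, the index-pair representation of windows, the sample values, and the choice to return 0.0 for a constant series are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def windowed_correlations(target, features, windows):
    """Return {feature name: [correlation with the target in each window]}.

    `windows` is a list of (start, end) index pairs, end exclusive.
    """
    return {
        name: [pearson(target[s:e], series[s:e]) for s, e in windows]
        for name, series in features.items()
    }


# Hypothetical data: one objective variable x and two feature quantities.
target = [1, 2, 3, 4, 4, 3, 2, 1]
features = {
    "fa": [2, 4, 6, 8, 1, 2, 3, 4],  # follows x in window 0, opposes it in window 1
    "fb": [5, 5, 6, 7, 9, 7, 5, 3],
}
windows = [(0, 4), (4, 8)]
corr = windowed_correlations(target, features, windows)
```

Here `corr["fa"]` comes out close to `[1.0, -1.0]`: a feature whose relationship to the objective variable flips between windows would look uncorrelated over the whole series, which is the situation the windowing is meant to expose.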

FIG. 4 is a diagram illustrating an example of the calculation of correlation values. In the example of FIG. 4, there is one window for each day of the week. The calculation unit 41 divides the objective variables x_1 to x_m stored in the objective variable data 30 and the feature quantities fa_1 to fa_m, ..., fz_1 to fz_m stored in the feature quantity data 31 into windows for each day of the week. Then, for each window, the calculation unit 41 calculates the correlation value between the objective variable x at the corresponding times and each of the feature quantities fa to fz. In the example of FIG. 4, the correlation values between the feature quantities fa to fz and the objective variable x are shown for each day of the week.

In the example of FIG. 4, the predetermined time period is a day of the week, and the data is divided into a window for each day of the week. However, the division is not limited to this. For example, the window period may be specified by the user. When the timing at which the occurrence principle of the event to be predicted changes is known, the user may specify that timing as the window period. Suppose, for example, that the occurrence principle of the event to be predicted changes with the time of day, the day of the week, or the month. In this case, the user specifies, as the window period, the time of day, day of the week, or month that matches the timing at which the occurrence principle changes. The calculation unit 41 may divide the data of the objective variable and the data of the feature quantities by the specified time of day, day of the week, or month. FIG. 5A is a diagram illustrating an example of a window. In the example of FIG. 5A, the user has specified a window period for each day of the week, and the calculation unit 41 divides the objective variable x and the feature quantities fa to fz into windows for each day of the week.
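As a rough sketch of the day-of-week windowing described for FIG. 4 and FIG. 5A, the measurements can be grouped by the weekday of their timestamp. The function name `split_by_weekday` and the two weeks of daily sample data are hypothetical; the patent does not specify how timestamps are represented.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_by_weekday(dates, values):
    """Group values into one window per day of the week (0 = Monday ... 6 = Sunday)."""
    windows = defaultdict(list)
    for d, v in zip(dates, values):
        windows[d.weekday()].append(v)
    return dict(windows)


# Hypothetical data: two weeks of daily measurements starting on a Monday.
start = date(2015, 2, 16)
dates = [start + timedelta(days=i) for i in range(14)]
values = list(range(14))
by_day = split_by_weekday(dates, values)
```

Applying the same grouping to the objective variable and to every feature quantity keeps the windows aligned, so a correlation can then be computed per weekday.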

The window period may also be determined from change points in the data. For example, depending on the occurrence principle of the event to be predicted, change points may appear in the data of the objective variable x and the feature quantities fa to fz. In this case, the calculation unit 41 may divide the data of the objective variable and the data of the feature quantities at each change point. FIG. 5B is a diagram illustrating an example of a window. In the example of FIG. 5B, change points occur in the data of the objective variable x and the feature quantities fa to fz, and the calculation unit 41 divides the data of the objective variable x and the feature quantities fa to fz into a window for each change point. A change point may be any point that corresponds to a change in the occurrence principle, such as a point where the data crosses a threshold, a local maximum, or a local minimum. In FIG. 5B, the points where the data crosses a threshold are used as change points. By determining the window periods from the change points of the data in this way, the calculation unit 41 can divide the data in line with changes in the occurrence principle.
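A minimal sketch of threshold-crossing change points, as in FIG. 5B, might look like the following. The helper names, the sample series, and the threshold of 5 are assumptions for illustration; the patent also allows local maxima or minima as change points, which this sketch does not cover.

```python
def threshold_crossings(series, threshold):
    """Indices i at which the series moves to the other side of the threshold
    between positions i - 1 and i."""
    return [
        i
        for i in range(1, len(series))
        if (series[i - 1] >= threshold) != (series[i] >= threshold)
    ]


def windows_from_change_points(length, change_points):
    """Turn change-point indices into (start, end) windows, end exclusive."""
    bounds = [0] + list(change_points) + [length]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


# Hypothetical objective-variable series and threshold.
x = [1, 2, 6, 7, 8, 3, 2, 9]
change_points = threshold_crossings(x, 5)
windows = windows_from_change_points(len(x), change_points)
```

With this data the crossings fall at indices 2, 5, and 7, giving the windows `(0, 2)`, `(2, 5)`, `(5, 7)`, and `(7, 8)`.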

The window periods may also overlap each other. For example, the calculation unit 41 may use a fixed period as the window period and divide the data into windows while shifting the window by a period shorter than the fixed period. FIG. 5C is a diagram illustrating an example of a window. In the example of FIG. 5C, the calculation unit 41 divides the data of the objective variable x and the feature quantities fa to fz into windows whose periods overlap. By overlapping the window periods in this way, the calculation unit 41 can divide the feature quantity data even when the change in the occurrence principle is not clear-cut, and the classification unit 42 described later can classify similar feature quantities into the same class.
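The overlapping windows of FIG. 5C can be sketched as a sliding window whose step is shorter than its width. The function name `sliding_windows` and the parameter values are hypothetical.

```python
def sliding_windows(length, width, step):
    """(start, end) windows of fixed width, end exclusive, advanced by `step`.

    Choosing step < width makes consecutive windows overlap.
    """
    return [(s, s + width) for s in range(0, length - width + 1, step)]


# Hypothetical example: 10 samples, windows of 4 samples shifted by 2.
windows = sliding_windows(10, 4, 2)
```

This yields `(0, 4)`, `(2, 6)`, `(4, 8)`, and `(6, 10)`, so each sample away from the edges falls into two windows.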

Returning to FIG. 1, the classification unit 42 performs various classifications. For example, the classification unit 42 classifies the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the correlation between the correlation values. Specifically, the classification unit 42 classifies the time-series data into a plurality of clusters based on how the correlation values calculated for each period are distributed over the plurality of periods. For example, the classification unit 42 places feature quantities whose correlation changes similarly from window to window in the same class, and thereby classifies the feature quantities into a plurality of clusters. For each feature quantity, the classification unit 42 obtains the error between its correlation values and those of the other feature quantities in each period. For example, the classification unit 42 obtains the Euclidean distance between the correlation values over the periods as this error. When the correlation values of the feature quantity fa in periods 1 to m are ta_1 to ta_m and the correlation values of the feature quantity fb are tb_1 to tb_m, the Euclidean distance is calculated, as shown in equation (1) below, by squaring the difference between the correlation values ta and tb for each period, summing the squares, and taking the square root of the sum.

$$\text{Euclidean distance} = \sqrt{(\mathit{ta}_1 - \mathit{tb}_1)^2 + \cdots + (\mathit{ta}_m - \mathit{tb}_m)^2} \qquad (1)$$

The classification unit 42 classifies feature quantities that are close in Euclidean distance into the same class. For example, the classification unit 42 takes one of the feature quantities as a reference, places the feature quantities whose Euclidean distance from it is at or below a threshold in the same class, and repeats this to classify the feature quantities into clusters. In the example of FIG. 4, the feature quantities fa and fb are classified into cluster A, and the feature quantities fc and fd are classified into cluster B.
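One way to read the classification step is as a greedy, reference-based grouping using the distance of equation (1). The sketch below is an interpretation under stated assumptions: the text does not say how the reference feature quantity is chosen, so the first unassigned one is used here, and the correlation vectors and the threshold of 0.3 are made-up values.

```python
import math


def euclidean(a, b):
    """Equation (1): Euclidean distance between two per-window correlation vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cluster_by_threshold(corr, threshold):
    """Repeatedly pick a reference feature and group every unassigned feature
    whose distance from it is at or below the threshold."""
    remaining = list(corr)
    clusters = []
    while remaining:
        ref = remaining[0]  # assumption: first unassigned feature is the reference
        members = [n for n in remaining if euclidean(corr[ref], corr[n]) <= threshold]
        clusters.append(members)
        remaining = [n for n in remaining if n not in members]
    return clusters


# Hypothetical per-window correlation values for four feature quantities.
corr = {
    "fa": [0.9, 0.1, 0.8],
    "fb": [0.8, 0.2, 0.9],
    "fc": [0.1, 0.9, 0.2],
    "fd": [0.2, 0.8, 0.1],
}
clusters = cluster_by_threshold(corr, 0.3)
```

With these values fa and fb land in one cluster and fc and fd in another, mirroring the grouping described for FIG. 4. The result of this kind of greedy grouping can depend on the order in which references are picked.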

The extraction unit 43 performs various extractions. For example, the extraction unit 43 extracts one of the pieces of time-series data from each of the plurality of clusters. For each cluster, the extraction unit 43 obtains a score by applying a predetermined weighting to the per-period correlation values calculated for the time-series data classified into that cluster. Specifically, the extraction unit 43 weights the correlation value of each feature quantity in each window with a predetermined weighting. The extraction unit 43 then obtains a score from the weighted correlation values for each cluster. For example, the extraction unit 43 obtains, as the score, the mean, maximum, or minimum of the weighted correlation values of each feature quantity in the cluster. The extraction unit 43 then extracts, for each cluster, the feature quantity with the largest score. In the example of FIG. 4, the feature quantity fa is extracted from cluster A, and the feature quantity fd is extracted from cluster B.

How the score is obtained is described next. FIG. 6 is a diagram illustrating an example of how a score is obtained. To simplify the description, FIG. 6 uses an example in which three feature quantities fa, fb, and fc are classified into one cluster and correlation values have been calculated for three windows W1, W2, and W3. FIG. 6A shows the correlation values of the feature quantities fa, fb, and fc in the windows W1, W2, and W3. FIG. 6B shows the results of weighting the correlation values with three patterns, A to C. Pattern A weights the correlation values uniformly. For example, in pattern A, every correlation value is given the same weight of 1, so the weighted correlation values are the same as in FIG. 6A. Pattern B gives a larger weight to a larger correlation value. For example, in pattern B, the square of the correlation value is used as the weighted correlation value. In pattern C, a correlation value at or above a predetermined threshold is used unchanged as the weighted correlation value, and a correlation value below the threshold is set to zero. In pattern C shown in FIG. 6B, the threshold is 0.5, so the weighted correlation value is the same as in FIG. 6A when the correlation value is 0.5 or more and is zero when the correlation value is less than 0.5. The weighting patterns are not limited to these.
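The three weighting patterns can be written as small functions applied to every correlation value. This is only a sketch: the function names and the sample correlation values are hypothetical and are not the values shown in FIG. 6.

```python
def weight_uniform(r):
    """Pattern A: the same weight of 1 for every correlation value."""
    return 1.0 * r


def weight_squared(r):
    """Pattern B: larger values weigh more; the square is the weighted value."""
    return r * r


def weight_cutoff(r, threshold=0.5):
    """Pattern C: keep values at or above the threshold, zero out the rest."""
    return r if r >= threshold else 0.0


def apply_weighting(corr, weight):
    """Apply one weighting function to every per-window correlation value."""
    return {name: [weight(r) for r in values] for name, values in corr.items()}


# Hypothetical correlation values for windows W1, W2, W3.
corr = {"fa": [0.6, 0.4, 0.8], "fb": [0.9, 0.2, 0.5]}
pattern_a = apply_weighting(corr, weight_uniform)
pattern_b = apply_weighting(corr, weight_squared)
pattern_c = apply_weighting(corr, weight_cutoff)
```

Note that squaring, as written here, discards the sign of a negative correlation; whether that is desirable depends on how negative correlations are meant to be treated, which the text does not state.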

The extraction unit 43 obtains a score from the weighted correlation values for each cluster. FIG. 6B shows, as scores, the mean, maximum, and minimum of the weighted correlation values for each of patterns A to C. The extraction unit 43 extracts a feature quantity from each cluster based on the score. For example, the extraction unit 43 extracts, for each cluster, the feature quantity with the largest score. When the feature quantity with the largest mean is extracted, fc is extracted in pattern A, fb in pattern B, and fb in pattern C. When the feature quantity with the largest maximum is extracted, fb is extracted in patterns A, B, and C. When the feature quantity with the largest minimum is extracted, fc is extracted in patterns A, B, and C.

スコアの平均値が最大となる特徴量を抽出する場合、重み付けされた相関値が全てのウィンドウで平均的に高い特徴量が抽出される。スコアの最大値が最大となる特徴量を抽出する場合、重み付けされた相関値が何れかのウィンドウで最も高い特徴量が抽出される。すなわち、特定のウィンドウにおいて目的変数に対して影響の大きい特徴量が抽出される。スコアの最小値が最大となる特徴量を抽出する場合、重み付けされた相関値が低いウィンドウがない特徴量が抽出される。すなわち、目的変数に対して影響の小さいウィンドウがない特徴量が抽出される。 When extracting the feature quantity having the maximum score average value, the feature quantity whose weighted correlation value is high on average in all windows is extracted. When extracting the feature quantity that maximizes the maximum score value, the feature quantity having the highest weighted correlation value in any window is extracted. That is, feature quantities that have a large influence on the target variable in a specific window are extracted. When extracting the feature quantity having the maximum minimum score value, the feature quantity without a window having a low weighted correlation value is extracted. That is, a feature amount that does not have a window having a small influence on the objective variable is extracted.
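The effect of choosing the average, the maximum, or the minimum as the score can be sketched as follows. This is a minimal illustration under assumed values; `select_feature` and the numbers in `cluster` are hypothetical and do not reproduce FIG. 6.

```python
from statistics import mean


def select_feature(weighted, aggregate):
    """Return the feature amount whose aggregated score is the largest."""
    scores = {name: aggregate(values) for name, values in weighted.items()}
    return max(scores, key=scores.get)


# Hypothetical weighted correlation values in windows W1 to W3.
cluster = {
    "fa": [0.2, 0.3, 0.4],
    "fb": [0.9, 0.1, 0.5],
    "fc": [0.6, 0.6, 0.6],
}

by_mean = select_feature(cluster, mean)  # high on average over all windows
by_max = select_feature(cluster, max)    # highest in some single window
by_min = select_feature(cluster, min)    # no window with a low value
```

With these assumed values, the average and the minimum both favor the evenly correlated feature amount, while the maximum favors the feature amount that peaks in a single window.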

FIG. 7 is a diagram for explaining the feature amounts of each cluster. To simplify the description, the example of FIG. 7 describes a case in which feature amounts are classified into clusters using the correlations in two windows W1 and W2. In the example of FIG. 7, the vertical axis indicates the correlation between the objective variable and the feature amount in the window W2, and the horizontal axis indicates the correlation between the objective variable and the feature amount in the window W1. Each feature amount is plotted according to its correlation in the window W1 and its correlation in the window W2. The feature amounts are classified into clusters so that feature amounts at a short Euclidean distance from one another fall into the same cluster. The Euclidean distance here is the distance between the plotted points of the feature amounts. In the example of FIG. 7, the feature amounts are classified into four clusters C₁ to C₄. The cluster C₁ contains feature amounts having a high correlation in the window W2 and a low correlation in the window W1. The cluster C₂ contains feature amounts having a high correlation in both the window W1 and the window W2. The cluster C₃ contains feature amounts having a low correlation in both the window W1 and the window W2. The cluster C₄ contains feature amounts having a high correlation in the window W1 and a low correlation in the window W2.
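The description does not name a particular clustering algorithm. As one possible sketch, a simple distance-threshold ("leader") clustering over the per-window correlation vectors could look like the following; the algorithm choice, the `radius` parameter, and the plotted values are all assumptions made for illustration.

```python
import math


def leader_cluster(points, radius):
    """Put each feature amount into the first cluster whose leader is within `radius`."""
    clusters = []  # list of (leader vector, member names)
    for name, vec in points.items():
        for leader, members in clusters:
            if math.dist(leader, vec) <= radius:
                members.append(name)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Hypothetical (correlation in W1, correlation in W2) for each feature amount.
points = {
    "f1": (0.1, 0.9),    # low in W1, high in W2
    "f2": (0.9, 0.8),    # high in both
    "f3": (0.2, 0.1),    # low in both
    "f4": (0.8, 0.1),    # high in W1, low in W2
    "f5": (0.15, 0.85),
    "f6": (0.85, 0.9),
}

clusters = leader_cluster(points, radius=0.3)
```

A production implementation would more likely use an established method such as k-means or hierarchical clustering; the sketch only shows that grouping is driven by Euclidean distance between correlation vectors.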

The extraction unit 43 extracts a feature amount for each cluster based on the score. For example, the extraction unit 43 extracts the feature amount f1 from the cluster C₁, the feature amount f2 from the cluster C₂, the feature amount f3 from the cluster C₃, and the feature amount f4 from the cluster C₄.

The output unit 44 performs various outputs. For example, the output unit 44 outputs information indicating the type of each time-series data extracted from each of the plurality of clusters. For example, the output unit 44 outputs information indicating the types of the feature amounts extracted for each cluster by the extraction unit 43. In the case of FIG. 7, for example, the output unit 44 outputs the feature amounts f1, f2, f3, and f4 as information indicating the types of the extracted feature amounts. Each of the clusters C₁ to C₄ groups feature amounts whose correlations with the objective variable are close to one another. In machine learning, using many feature amounts whose correlations with the objective variable are similar causes only similar patterns to be learned, so prediction accuracy is difficult to improve; it is preferable to learn from data whose correlations with the objective variable are diverse. Therefore, by extracting and outputting a feature amount for each cluster, data having diverse correlations with the objective variable can be extracted and data effective for learning can be identified, so the prediction accuracy of machine learning can be improved.

The prediction unit 45 performs various predictions by machine learning. For example, the prediction unit 45 generates a prediction model using, as learning data, the time-series data of the types output by the output unit 44. The prediction unit 45 then performs prediction using the generated prediction model.

Here, among the events that machine learning targets for prediction, there are some whose occurrence principle changes depending on the time period. FIG. 8 is a diagram showing an example of an event whose occurrence principle changes. FIG. 8 shows changes in the traffic volume of a certain road on weekdays and holidays, together with changes in precipitation. On weekdays, the road carries many commuting vehicles. Commuting vehicles travel on the road regardless of precipitation. For this reason, weekday traffic volume is hardly affected by precipitation even in periods of heavy precipitation, as indicated by reference numeral 60 in FIG. 8. On holidays, by contrast, the road carries many sightseeing (leisure) vehicles. Good weather is preferable for sightseeing. For this reason, sightseeing vehicles are numerous when the weather is good and decrease as precipitation increases. Holiday traffic volume therefore decreases in periods of heavy precipitation, as indicated by reference numeral 61 in FIG. 8. In this way, the occurrence principle of road traffic volume differs between weekdays and holidays.

For example, assume a case in which a conventional technique identifies feature amounts that are correlated with the objective variable over the entire data, regardless of the occurrence principle. In this case, a feature amount having a high correlation with the objective variable only in a specific time period is not selected. FIG. 9 is a diagram showing changes in the traffic volume of a certain road together with changes in time period, number of accidents, and precipitation. In the example of FIG. 9, the traffic volume is correlated with the time period and with the number of accidents over the entire data, so their correlations over the entire data are at a medium level. In contrast, the influence of precipitation on the traffic volume changes depending on whether the day is a weekday or a holiday, and the correlation is low over the data as a whole, so its correlation over the entire data is at a low level. In this case, the conventional technique identifies the time period and the number of accidents as the feature amounts correlated with the objective variable. That is, because the conventional technique selects only feature amounts having a certain correlation regardless of the occurrence principle of the data, a feature amount that is effective for a specific event, such as precipitation, is overlooked.
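The point that a correlation computed over the entire data can hide a strong correlation confined to one period can be illustrated with a small numerical sketch. The data below are synthetic values invented for this illustration (they are not read from FIG. 8 or FIG. 9), and `pearson` is a helper written here under the population-statistics convention.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation using population variance and covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Synthetic precipitation and traffic volume (assumed values).
rain = [0, 5, 10, 0, 5]
weekday_traffic = [101, 101, 100, 99, 99]  # commuting: unrelated to rain
holiday_traffic = [80, 60, 40, 80, 60]     # leisure: falls as rain rises

r_weekday = abs(pearson(rain, weekday_traffic))
r_holiday = abs(pearson(rain, holiday_traffic))
r_overall = abs(pearson(rain + rain, weekday_traffic + holiday_traffic))
```

With these assumed values, the holiday window shows a near-perfect correlation and the weekday window shows none, while the pooled correlation is weak enough that a whole-data selection would likely rank precipitation low.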

On the other hand, the data output device 10 according to the present embodiment classifies feature amounts whose changes in correlation are similar into clusters and extracts a feature amount from each cluster, so that a feature amount that is effective for a specific event, such as precipitation, can also be extracted. Because the data output device 10 can thus identify data effective for learning, the prediction accuracy of machine learning can be improved.

Also, for example, when the occurrence principle of an event targeted for prediction by machine learning changes depending on the time period, assume a case in which learning is performed for each time period using the feature amount data of that time period and a prediction model is generated for each time period. FIG. 10 is a diagram illustrating an example of generating a prediction model for each time period. FIG. 10 shows changes in the traffic volume of a certain road on weekdays and holidays. In the example of FIG. 10, weekday features are learned using the weekday feature amount data to generate a weekday prediction model, and holiday features are learned using the holiday feature amount data to generate a holiday prediction model. In this case, to handle the event to be predicted, prediction models are generated at a fine granularity matched to that event. FIG. 11 is a diagram illustrating an example of generating a fine-grained prediction model matched to the event to be predicted. In the example of FIG. 11, learning is performed using the feature amount data of the weekday daytime period, and a prediction model of weekday daytime traffic volume is generated. When prediction models are generated at a fine granularity matched to the event to be predicted, the feature amount data available to each prediction model decreases. FIG. 11 shows the range of feature amount data available to the prediction model of weekday daytime traffic volume. When the feature amount data available to a prediction model decreases in this way, the prediction accuracy of the prediction model declines.

On the other hand, the data output device 10 according to the present embodiment can perform prediction with a single prediction model even when the occurrence principle of the event targeted for prediction by machine learning changes depending on the time period. In addition, the data output device 10 generates the prediction model using, as learning data, all of the time-series data of the types effective for learning. As a result, the data output device 10 can secure the feature amount data available to the prediction model, so a decline in prediction accuracy due to insufficient data is less likely to occur.

Next, a description will be given using a specific example. The following takes as an example a case in which a prediction model of traffic volume is generated. FIG. 12 is a diagram illustrating an example of objective variable data and feature amount data. FIG. 12 shows traffic volume data for each measurement time as the objective variable data. FIG. 12 also shows precipitation, temperature, communication volume, and electric energy data for each measurement time as the feature amount data. The precipitation is the amount of precipitation that fell in the area where the traffic volume was measured. The temperature is the temperature of the area where the traffic volume was measured. The communication volume is the volume of communication carried over the network of the region that includes the area where the traffic volume was measured. The electric energy is the amount of electric power used in the region that includes the area where the traffic volume was measured. The feature amount data also includes secondary data generated from the measured data. FIG. 12 shows, as such feature amount data, the temperature two unit times earlier and the moving average of the communication volume. The temperature two unit times earlier is the temperature measured two measurements before. The moving average of the communication volume is the average of the communication volume up to a predetermined time before.
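The two kinds of secondary data mentioned above, a lagged value and a moving average, might be derived as in the following sketch. The helper names, the handling of the first few samples, and the window length of the moving average are assumptions; the description only states that the average covers values up to a predetermined time before.

```python
def lag(values, k):
    """Value measured k steps earlier; None where no earlier measurement exists."""
    if k <= 0:
        return list(values)
    return [None] * k + list(values[:-k])


def moving_average(values, n):
    """Mean of the current value and up to n - 1 preceding values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical measurements at consecutive unit times.
temperature = [20, 21, 23, 22]
communication = [10, 20, 30, 40]

temperature_lag2 = lag(temperature, 2)
communication_ma = moving_average(communication, 2)
```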

算出部４１は、目的変数のデータと、複数の特徴量のデータとを所定の時間帯ごとのウィンドウに分割する。図１３は、目的変数のデータと、複数の特徴量のデータの分割の一例を示す図である。図１３の例では、ウィンドウを曜日ごととしている。算出部４１は、目的変数のデータと、複数の特徴量のデータを曜日ごとのウィンドウＷ１〜Ｗ３に分割する。 The calculation unit 41 divides the data of the objective variable and the data of the plurality of feature amounts into windows for each predetermined time zone. FIG. 13 is a diagram illustrating an example of division of objective variable data and a plurality of feature amount data. In the example of FIG. 13, the window is set for each day of the week. The calculation unit 41 divides data of the objective variable and data of a plurality of feature amounts into windows W1 to W3 for each day of the week.
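A minimal sketch of the window division, assuming each record carries a day-of-week label, is shown below. The record layout, the function name `split_into_windows`, and the values are hypothetical.

```python
from collections import defaultdict


def split_into_windows(rows, key):
    """Group time-ordered rows into windows identified by key(row)."""
    windows = defaultdict(list)
    for row in rows:
        windows[key(row)].append(row)
    return dict(windows)


# Hypothetical rows: (day of week, traffic volume, precipitation).
rows = [
    ("Mon", 5, 4), ("Mon", 6, 2),
    ("Tue", 7, 1), ("Tue", 8, 3),
    ("Wed", 4, 0),
]

windows = split_into_windows(rows, key=lambda row: row[0])
traffic_mon = [traffic for _, traffic, _ in windows["Mon"]]
```

Each window then holds the objective variable and the feature amounts for the same period, so that correlation values can be computed window by window.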

算出部４１は、ウィンドウごとに、目的変数のデータと、複数の特徴量のデータそれぞれとの相関値を算出する。例えば、算出部４１は、相関値として、積率相関係数を算出する。図１４は、相関の算出の一例を示す図である。 The calculation unit 41 calculates, for each window, correlation values between the data of the objective variable and the data of the plurality of feature amounts. For example, the calculation unit 41 calculates a product moment correlation coefficient as the correlation value. FIG. 14 is a diagram showing an example of correlation calculation.

Here, an example of calculating the product-moment correlation coefficient will be described. For data $X = (x_1, \ldots, x_n)$, the mean, variance, and standard deviation of $X$ are given by the following equations (2) to (4).

Mean of X:
\[ \mu(X) = \frac{x_1 + \cdots + x_n}{n} \tag{2} \]
Variance of X:
\[ \sigma^2(X) = \frac{(x_1 - \mu(X))^2 + \cdots + (x_n - \mu(X))^2}{n} \tag{3} \]
Standard deviation of X:
\[ \sigma(X) = \left(\sigma^2(X)\right)^{1/2} \tag{4} \]

Also, for data $X = (x_1, \ldots, x_n)$ and data $Y = (y_1, \ldots, y_n)$, the covariance of $X$ and $Y$ is given by the following equation (5).

Covariance of X and Y:
\[ S(X, Y) = \frac{(x_1 - \mu(X))(y_1 - \mu(Y)) + \cdots + (x_n - \mu(X))(y_n - \mu(Y))}{n} \tag{5} \]

Ｘ、Ｙの積率相関係数は、Ｒ（Ｘ，Ｙ）＝Ｓ（Ｘ，Ｙ）／（σ（Ｘ）σ（Ｙ）)とする。 The product moment correlation coefficient of X and Y is R (X, Y) = S (X, Y) / (σ (X) σ (Y)).

例えば、ウィンドウＷ１の目的変数Ｘ＝（５，６，９，４）と降水量Ｙ＝（４，２，５，１）とした場合、Ｘ、Ｙの平均、分散、標準偏差は、以下のように算出される。 For example, when the objective variable X = (5,6,9,4) and precipitation Y = (4,2,5,1) in the window W1, the average, variance, and standard deviation of X and Y are as follows: It is calculated as follows.

\[
\begin{aligned}
\mu(X) &= (5 + 6 + 9 + 4)/4 = 24/4 = 6 \\
\mu(Y) &= (4 + 2 + 5 + 1)/4 = 12/4 = 3 \\
\sigma^2(X) &= \bigl((5-6)^2 + (6-6)^2 + (9-6)^2 + (4-6)^2\bigr)/4 = (1 + 0 + 9 + 4)/4 = 14/4 = 3.5 \\
\sigma^2(Y) &= \bigl((4-3)^2 + (2-3)^2 + (5-3)^2 + (1-3)^2\bigr)/4 = (1 + 1 + 4 + 4)/4 = 10/4 = 2.5 \\
\sigma(X) &= (3.5)^{1/2} \approx 1.87 \\
\sigma(Y) &= (2.5)^{1/2} \approx 1.58
\end{aligned}
\]

よって、Ｘ、Ｙの共分散Ｓ（Ｘ，Ｙ）、積率相関係数Ｒ（Ｘ，Ｙ）は、以下のように算出される。 Therefore, the covariance S (X, Y) of X and Y and the product moment correlation coefficient R (X, Y) are calculated as follows.

\[
\begin{aligned}
S(X, Y) &= \bigl((5-6)(4-3) + (6-6)(2-3) + (9-6)(5-3) + (4-6)(1-3)\bigr)/4 \\
&= (-1 + 0 + 6 + 4)/4 = 9/4 = 2.25 \\
R(X, Y) &\approx 2.25/(1.87 \times 1.58) \approx 2.25/2.95 \approx 0.76
\end{aligned}
\]

相関値を積率相関係数の絶対値とした場合、相関値は、以下のように算出される。 When the correlation value is the absolute value of the product moment correlation coefficient, the correlation value is calculated as follows.

相関値：｜Ｒ（Ｘ，Ｙ）｜≒｜０．７６｜＝０．７６ Correlation value: | R (X, Y) | ≒ | 0.76 | = 0.76
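The calculation above can be checked with a short Python sketch that follows equations (2) to (5) directly. The function names are chosen here for illustration; the data are the window W1 values given in the example.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)                         # equation (2)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # equation (3)


def std(xs):
    return math.sqrt(variance(xs))                   # equation (4)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)                      # equation (5)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation_value(xs, ys):
    """Absolute value of the product-moment correlation coefficient R(X, Y)."""
    return abs(covariance(xs, ys) / (std(xs) * std(ys)))


X = [5, 6, 9, 4]  # objective variable in window W1
Y = [4, 2, 5, 1]  # precipitation in window W1

value = correlation_value(X, Y)
```

Rounded to two decimal places, `value` agrees with the correlation value of 0.76 obtained by hand.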

The classification unit 42 classifies the feature amounts into a plurality of clusters by placing feature amounts whose changes in the correlation calculated for each window are similar into the same class. For example, for each feature amount, the classification unit 42 obtains the error in correlation value with respect to the other feature amounts in each period. The classification unit 42 then classifies feature amounts whose errors are close into the same class. FIG. 15 is a diagram illustrating an example of classifying the data of a plurality of feature amounts. In the example of FIG. 15, the temperature and the electric energy are classified into cluster 1, the precipitation is classified into cluster 2, and the communication volume, the temperature two unit times earlier, and the moving average of the communication volume are classified into cluster 3.

For each cluster, the extraction unit 43 obtains a score by applying a predetermined weighting to the per-period correlation values calculated for the time-series data classified into that cluster. FIG. 16 is a diagram illustrating an example of the scores for each cluster. In the example of FIG. 16, the score is the weighted average of the correlation values of the windows W1 to W3, with a weight of 1 for correlation values of 0.7 or more and a weight of 0 for correlation values less than 0.7.

The extraction unit 43 extracts a feature amount for each cluster based on the score. FIG. 17 is a diagram illustrating an example of extracting a feature amount for each cluster. In the example of FIG. 17, the feature amount having the largest score is extracted from each cluster: the electric energy is extracted from cluster 1, the precipitation from cluster 2, and the moving average of the communication volume from cluster 3.
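The scoring and per-cluster extraction can be sketched as follows. Two points are assumptions: the "weighted average" is read here as the sum of the weighted correlation values divided by the number of windows, and the correlation values are hypothetical numbers chosen for illustration rather than the values of FIG. 16.

```python
def threshold_score(correlations, threshold=0.7):
    """Average of the correlations after zero-weighting those below the threshold."""
    weighted = [c if c >= threshold else 0.0 for c in correlations]
    return sum(weighted) / len(weighted)


def extract_per_cluster(clusters):
    """For each cluster, return the feature amount with the largest score."""
    return {
        cluster_id: max(features, key=lambda name: threshold_score(features[name]))
        for cluster_id, features in clusters.items()
    }


# Hypothetical correlation values in windows W1 to W3, grouped by cluster.
clusters = {
    1: {"temperature": [0.8, 0.6, 0.3], "electric_energy": [0.9, 0.75, 0.2]},
    2: {"precipitation": [0.2, 0.1, 0.9]},
    3: {
        "communication": [0.3, 0.8, 0.2],
        "temperature_lag2": [0.4, 0.7, 0.3],
        "communication_moving_average": [0.3, 0.9, 0.2],
    },
}

extracted = extract_per_cluster(clusters)
```

If the weighted average were instead normalized by the sum of the weights, the ranking within a cluster could differ, so the normalization should be fixed before comparing scores.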

このように、データ出力装置１０は、目的変数との相関に多様性のあるデータを抽出でき、学習に有効なデータを特定できるため、機械学習の予測精度を向上させることができる。 As described above, the data output apparatus 10 can extract data having diversity in correlation with the objective variable, and can specify data effective for learning. Therefore, the prediction accuracy of machine learning can be improved.

[Flow of processing]
The flow of the data output processing in which the data output device 10 according to the first embodiment outputs diverse data will be described. FIG. 18 is a flowchart illustrating an example of the procedure of the data output processing according to the first embodiment. The data output processing is executed at a predetermined timing, for example, when an operation instruction to start the processing is received from the input unit 21.

図１８に示すように、算出部４１は、目的変数データ３０に記憶された目的変数のデータと、特徴量データ３１に記憶された複数の特徴量のデータとをウィンドウに分割する（Ｓ１０）。算出部４１は、ウィンドウごとに、目的変数の値と、複数の特徴量との相関値を算出する（Ｓ１１）。 As illustrated in FIG. 18, the calculation unit 41 divides the objective variable data stored in the objective variable data 30 and the plurality of feature amount data stored in the feature amount data 31 into windows (S10). The calculator 41 calculates, for each window, a correlation value between the value of the objective variable and a plurality of feature quantities (S11).

分類部４２は、ウィンドウごとに算出した相関の変化が類似する特徴量を同じ分類に分類して、特徴量を複数のクラスタに分類する（Ｓ１２）。抽出部４３は、各特徴量のウィンドウごとの相関値をそれぞれ所定の重み付けで重み付けし、クラスタごとに、重み付けされた相関値からスコアを求める（Ｓ１３）。抽出部４３は、クラスタごとに、スコアに基づき、特徴量を抽出する（Ｓ１４）。出力部４４は、複数のクラスタのそれぞれから抽出した特徴量の種別を示す情報を出力し（Ｓ１５）、処理を終了する。 The classification unit 42 classifies the feature quantities having similar changes in correlation calculated for each window into the same class, and classifies the feature quantities into a plurality of clusters (S12). The extraction unit 43 weights the correlation value of each feature amount for each window with a predetermined weight, and obtains a score from the weighted correlation value for each cluster (S13). The extraction unit 43 extracts the feature amount based on the score for each cluster (S14). The output unit 44 outputs information indicating the type of feature amount extracted from each of the plurality of clusters (S15), and ends the process.

[Effect]
As described above, the data output device 10 according to the present embodiment calculates a correlation value between the time-series data of values indicating the state of a specific monitoring target and each of the time-series data of values indicating the states of a plurality of monitoring targets different from the specific monitoring target. The data output device 10 classifies the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the correlation between the calculated correlation values. The data output device 10 extracts one of the time-series data from each of the plurality of clusters. The data output device 10 outputs information indicating the type of each time-series data extracted from each of the plurality of clusters. The data output device 10 can thereby identify data effective for learning.

In addition, the data output device 10 according to the present embodiment divides the time-series data of values indicating the state of the specific monitoring target and each of the time-series data of values indicating the states of the plurality of monitoring targets into a plurality of periods. For each period, the data output device 10 calculates a correlation value between the time-series data of values indicating the state of the specific monitoring target and each of the time-series data of values indicating the states of the plurality of monitoring targets. Based on the distribution, over the plurality of periods, of the correlation values calculated for each period, the data output device 10 classifies the time-series data of values indicating the states of the plurality of monitoring targets into a plurality of clusters. The data output device 10 can thereby classify time-series data whose per-period data are similar into the same cluster.

In addition, for each cluster, the data output device 10 according to the present embodiment extracts representative time-series data from the time-series data classified into that cluster, based on a score obtained by applying a predetermined weighting to the per-period correlation values calculated for that time-series data. The data output device 10 can thereby extract, from each cluster, representative time-series data having similar characteristics.

Next, a second embodiment will be described. FIG. 19 is a diagram illustrating an example of the functional configuration of the data output device according to the second embodiment. Parts that are the same as those of the data output device 10 according to the first embodiment shown in FIG. 1 are denoted by the same reference numerals, and mainly the differing parts are described.

実施例２に係るデータ出力装置１０は、特徴量をクラスタに分類することなく、学習に有効なデータを出力する。 The data output apparatus 10 according to the second embodiment outputs data effective for learning without classifying the feature quantities into clusters.

The extraction unit 43A extracts time-series data based on the dissimilarity of the correlation between the correlation values calculated by the calculation unit 41. For example, based on the correlation between the correlation values, the extraction unit 43A extracts time-series data by repeatedly extracting one of the time-series data from the time-series data of values indicating the states of the plurality of monitoring targets and removing, from the extraction targets, time-series data whose correlation values are similar to those of the extracted time-series data. For example, the extraction unit 43A repeatedly extracts one of the feature amounts and removes, from the extraction targets, feature amounts whose correlation values are similar to those of the extracted feature amount, thereby extracting data having diverse correlations with the objective variable.

出力部４４は、抽出部４３Ａにより抽出した時系列データの種別を示す情報を出力する。例えば、出力部４４は、抽出部４３Ａにより抽出された特徴量の種別を示す情報を出力する。 The output unit 44 outputs information indicating the type of time-series data extracted by the extraction unit 43A. For example, the output unit 44 outputs information indicating the type of the feature amount extracted by the extraction unit 43A.

具体例を用いて説明する。図２０は、目的変数に対する各特徴量の相関値の一例を示す図である。図２０には、降水量、気温、通信量、電力量、２単位時間前の気温、通信量の移動平均値のウィンドウＷ１〜Ｗ３ごとの相関値が示されている。抽出部４３Ａは、特徴量ごとに、ウィンドウごとの相関値を所定の重み付けで重み付け演算してスコアを求める。図２０の例では、相関値が０．７以上の重みを１とし、相関値が０．７未満の重みを０としてウィンドウＷ１〜Ｗ３の相関値を重み付け平均した値をスコアとしている。 This will be described using a specific example. FIG. 20 is a diagram illustrating an example of correlation values of feature amounts with respect to target variables. FIG. 20 shows the correlation values for the windows W1 to W3 of the precipitation, the temperature, the communication amount, the power amount, the temperature two unit time ago, and the moving average value of the communication amount. The extraction unit 43A calculates a score by weighting the correlation value of each window with predetermined weighting for each feature amount. In the example of FIG. 20, a weight having a correlation value of 0.7 or more is set to 1, a weight having a correlation value of less than 0.7 is set to 0, and a value obtained by weighted averaging the correlation values of the windows W1 to W3 is used as a score.

抽出部４３Ａは、何れかの特徴量を抽出する。例えば、抽出部４３Ａは、スコアが最も高い特徴量を抽出する。図２１は、スコアが最も高い特徴量を抽出した一例を示す図である。図２１の例では、降水量が抽出されている。 The extraction unit 43A extracts any feature amount. For example, the extraction unit 43A extracts the feature amount having the highest score. FIG. 21 is a diagram illustrating an example in which the feature amount having the highest score is extracted. In the example of FIG. 21, precipitation is extracted.

The extraction unit 43A removes, from the extraction targets, feature amounts whose correlation values are similar to those of the extracted feature amount. FIG. 22 is a diagram illustrating an example in which similar feature amounts are excluded from the extraction targets. In the example of FIG. 22, feature amounts whose per-period correlation values lie at a Euclidean distance of 0.5 or less from those of the precipitation are excluded.

抽出部４３Ａは、残った特徴量からスコアが最も高い特徴量を抽出する。図２３は、残った特徴量からスコアが最も高い特徴量を抽出した一例を示す図である。図２３の例では、気温が抽出されている。 The extraction unit 43A extracts the feature amount having the highest score from the remaining feature amounts. FIG. 23 is a diagram illustrating an example in which the feature amount having the highest score is extracted from the remaining feature amounts. In the example of FIG. 23, the air temperature is extracted.

The extraction unit 43A removes, from the extraction targets, feature amounts whose correlation values are similar to those of the extracted feature amount. FIG. 24 is a diagram illustrating an example in which similar feature amounts are excluded from the extraction targets. In the example of FIG. 24, no feature amount has per-period correlation values at a Euclidean distance of 0.5 or less from those of the temperature, so none is excluded.

The extraction unit 43A extracts the feature amount having the highest score from the remaining feature amounts. FIG. 25 is a diagram illustrating an example in which the feature amount having the highest score is extracted from the remaining feature amounts. In the example of FIG. 25, the remaining feature amount, the temperature two unit times earlier, is extracted.

The output unit 44 outputs the precipitation, the temperature, and the temperature two unit times earlier extracted by the extraction unit 43A.

[Flow of processing]
The flow of the data output processing in which the data output device 10 according to the second embodiment outputs diverse data will be described. FIG. 26 is a flowchart illustrating an example of the procedure of the data output processing according to the second embodiment. Parts that are the same as those of the data output processing according to the first embodiment shown in FIG. 18 are denoted by the same reference numerals, and their description is omitted.

図２６に示すように、抽出部４３Ａは、何れかの特徴量を抽出する（Ｓ２０）。例えば、抽出部４３Ａは、スコアが最も高い特徴量を抽出する。抽出部４３Ａは、抽出した特徴量と相関値が類似する特徴量を抽出の対象から除外する（Ｓ２１）。抽出部４３Ａは、抽出の対象となる特徴量が存在するか判定する（Ｓ２２）。抽出の対象となる特徴量が存在する場合（Ｓ２２肯定）、上述のＳ２０へ移行し、抽出の対象となる特徴量から何れかの特徴量を抽出する。 As illustrated in FIG. 26, the extraction unit 43A extracts one of the feature amounts (S20). For example, the extraction unit 43A extracts the feature amount having the highest score. The extraction unit 43A excludes, from the extraction targets, feature amounts whose correlation values are similar to those of the extracted feature amount (S21). The extraction unit 43A then determines whether any feature amount remains as an extraction target (S22). If one remains (Yes at S22), the process returns to S20, and one of the remaining feature amounts is extracted.

一方、抽出の対象となる特徴量が存在しない場合（Ｓ２２否定）、出力部４４は、抽出部４３Ａにより抽出された特徴量の種別を示す情報を出力し（Ｓ２３）、処理を終了する。 On the other hand, when there is no feature quantity to be extracted (No at S22), the output unit 44 outputs information indicating the type of feature quantity extracted by the extraction unit 43A (S23), and ends the process.
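The loop of S20 to S22 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `select_diverse_features`, the dictionary layout, and all numeric values are hypothetical. Similarity is judged by a Euclidean distance of 0.5 or less between per-period correlation values, as in the example of FIG. 24.

```python
import math


def select_diverse_features(scores, corr_by_period, threshold=0.5):
    """Greedy selection sketch following S20 to S22.

    scores:         feature name -> score (hypothetical layout)
    corr_by_period: feature name -> list of per-period correlation values
    threshold:      features within this Euclidean distance of an
                    extracted feature are treated as similar
    """
    candidates = list(scores)  # features still eligible for extraction
    selected = []
    while candidates:  # S22: does any extraction target remain?
        # S20: extract the remaining feature with the highest score.
        best = max(candidates, key=lambda name: scores[name])
        selected.append(best)
        # S21: drop the extracted feature and every feature whose
        # per-period correlation values are similar to it.
        candidates = [
            name
            for name in candidates
            if name != best
            and math.dist(corr_by_period[name], corr_by_period[best]) > threshold
        ]
    return selected


# Hypothetical values chosen only to exercise the loop.
scores = {
    "precipitation": 0.9,
    "humidity": 0.8,
    "temperature": 0.7,
    "temperature_lag2": 0.5,
}
corr_by_period = {
    "precipitation": [0.9, 0.1, 0.8],
    "humidity": [0.8, 0.2, 0.7],  # close to precipitation, so excluded
    "temperature": [0.1, 0.9, 0.2],
    "temperature_lag2": [-0.5, 0.3, 0.9],
}
print(select_diverse_features(scores, corr_by_period))
```

With these made-up values, humidity is removed as similar to precipitation, and the loop ends once no extraction target remains.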

［効果］ [Effect]
上述してきたように、本実施例に係るデータ出力装置１０は、特定の監視対象の状態を示す値の時系列データと、特定の監視対象とは異なる複数の監視対象の状態を示す値の時系列データの其々との相関値を算出する。データ出力装置１０は、相関値間の相関の非類似性に基づき、時系列データを抽出する。データ出力装置１０は、抽出した時系列データの種別を示す情報を出力する。これにより、データ出力装置１０は、学習に有効なデータを特定できる。 As described above, the data output device 10 according to the present embodiment calculates correlation values between the time-series data of values indicating the state of a specific monitoring target and each of the time-series data of values indicating the states of a plurality of monitoring targets different from the specific monitoring target. The data output device 10 extracts time-series data based on the dissimilarity of the correlation between the correlation values. The data output device 10 outputs information indicating the type of the extracted time-series data. The data output device 10 can thereby identify data that is effective for learning.

また、本実施例に係るデータ出力装置１０は、算出した相関値間の相関に基づき、前記複数の監視対象の状態を示す値の時系列データから、何れかの時系列データを抽出し、抽出した時系列データと相関値が類似する時系列データを抽出の対象から除くことを繰り返して時系列データを抽出する。これにより、データ出力装置１０は、特定の監視対象の状態との相関に多様性のある時系列データを抽出できる。 Further, based on the correlation between the calculated correlation values, the data output device 10 according to the present embodiment extracts time-series data by repeatedly extracting one of the time-series data of values indicating the states of the plurality of monitoring targets and removing, from the extraction targets, time-series data whose correlation values are similar to those of the extracted time-series data. As a result, the data output device 10 can extract time-series data whose correlations with the state of the specific monitoring target are diverse.
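The per-period correlation values on which this extraction relies could be computed as in the following sketch. It rests on assumptions not stated here: the Pearson correlation coefficient is used as the correlation value, periods are non-overlapping and of equal length, and a constant series is given a correlation of 0. The names `pearson` and `correlation_by_period` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / (norm_x * norm_y)


def correlation_by_period(target, feature, period_length):
    """Split both series into consecutive periods and correlate each one.

    Trailing samples that do not fill a whole period are ignored.
    """
    last_start = len(target) - period_length
    return [
        pearson(
            target[start:start + period_length],
            feature[start:start + period_length],
        )
        for start in range(0, last_start + 1, period_length)
    ]


# Hypothetical series: the feature tracks the target in the first period
# and moves against it in the second.
target = [1, 2, 3, 4, 4, 3, 2, 1]
feature = [2, 4, 6, 8, 1, 2, 3, 4]
print(correlation_by_period(target, feature, period_length=4))
```

A feature such as this one would show little correlation over the whole series even though it is strongly correlated within each period, which is the situation the per-period values are meant to expose.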

さて、これまで開示の装置に関する実施例について説明したが、開示の技術は上述した実施例以外にも、種々の異なる形態にて実施されてよいものである。 Although the embodiments of the disclosed apparatus have been described above, the disclosed technology may be implemented in various different forms other than the above-described embodiments.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的状態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。例えば、受付部４０、算出部４１、分類部４２、抽出部４３（抽出部４３Ａ）、出力部４４および予測部４５の各処理部が適宜統合されてもよい。また、各処理部の処理が適宜複数の処理部の処理に分離されてもよい。さらに、各処理部にて行なわれる各処理機能は、その全部または任意の一部が、ＣＰＵおよび当該ＣＰＵにて解析実行されるプログラムにて実現され、あるいは、ワイヤードロジックによるハードウェアとして実現され得る。 Further, each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the one illustrated, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the processing units of the reception unit 40, the calculation unit 41, the classification unit 42, the extraction unit 43 (extraction unit 43A), the output unit 44, and the prediction unit 45 may be integrated as appropriate. The processing of each processing unit may also be divided among a plurality of processing units as appropriate. Further, all or any part of each processing function performed in each processing unit can be realized by a CPU and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic.

［データ出力プログラム］ [Data output program]
また、上記の実施例で説明した各種の処理は、あらかじめ用意されたプログラムをパーソナルコンピュータやワークステーションなどのコンピュータシステムで実行することによって実現することもできる。そこで、以下では、上記の実施例と同様の機能を有するプログラムを実行するコンピュータシステムの一例を説明する。図２７は、データ出力プログラムを実行するコンピュータを示す図である。 The various processes described in the above embodiments can also be realized by executing a program prepared in advance on a computer system such as a personal computer or a workstation. An example of a computer system that executes a program having the same functions as the above embodiments is therefore described below. FIG. 27 is a diagram illustrating a computer that executes the data output program.

図２７に示すように、コンピュータ３００は、ＣＰＵ（Central Processing Unit）３１０、ＨＤＤ（Hard Disk Drive）３２０、ＲＡＭ（Random Access Memory）３４０を有する。これら３００〜３４０の各部は、バス４００を介して接続される。 As illustrated in FIG. 27, the computer 300 includes a CPU (Central Processing Unit) 310, an HDD (Hard Disk Drive) 320, and a RAM (Random Access Memory) 340. These units 300 to 340 are connected to one another via a bus 400.

ＨＤＤ３２０には上記の受付部４０、算出部４１、分類部４２、抽出部４３（抽出部４３Ａ）、出力部４４および予測部４５と同様の機能を発揮するデータ出力プログラム３２０ａが予め記憶される。なお、データ出力プログラム３２０ａについては、適宜分離しても良い。 The HDD 320 stores in advance a data output program 320a that performs the same functions as the reception unit 40, the calculation unit 41, the classification unit 42, the extraction unit 43 (extraction unit 43A), the output unit 44, and the prediction unit 45. The data output program 320a may be separated as appropriate.

また、ＨＤＤ３２０は、各種情報を記憶する。例えば、ＨＤＤ３２０は、ＯＳや分析に用いる各種データを記憶する。 The HDD 320 also stores various kinds of information. For example, the HDD 320 stores the OS and various data used for analysis.

そして、ＣＰＵ３１０が、データ出力プログラム３２０ａをＨＤＤ３２０から読み出して実行することで、実施例の各処理部と同様の動作を実行する。すなわち、データ出力プログラム３２０ａは、受付部４０、算出部４１、分類部４２、抽出部４３（抽出部４３Ａ）、出力部４４および予測部４５と同様の動作を実行する。 Then, the CPU 310 reads out and executes the data output program 320a from the HDD 320, thereby executing the same operation as each processing unit of the embodiment. That is, the data output program 320a performs the same operations as the reception unit 40, the calculation unit 41, the classification unit 42, the extraction unit 43 (extraction unit 43A), the output unit 44, and the prediction unit 45.

なお、上記したデータ出力プログラム３２０ａは、必ずしも最初からＨＤＤ３２０に記憶させることを要しない。 Note that the data output program 320a described above does not necessarily have to be stored in the HDD 320 from the beginning.

例えば、コンピュータ３００に挿入されるフレキシブルディスク（ＦＤ）、ＣＤ−ＲＯＭ、ＤＶＤディスク、光磁気ディスク、ＩＣカードなどの「可搬用の物理媒体」にプログラムを記憶させておく。そして、コンピュータ３００がこれらからプログラムを読み出して実行するようにしてもよい。 For example, the program is stored in a “portable physical medium” such as a flexible disk (FD), a CD-ROM, a DVD disk, a magneto-optical disk, or an IC card inserted into the computer 300. Then, the computer 300 may read and execute programs from these.

さらには、公衆回線、インターネット、ＬＡＮ、ＷＡＮなどを介してコンピュータ３００に接続される「他のコンピュータ（またはサーバ）」などにプログラムを記憶させておく。そして、コンピュータ３００がこれらからプログラムを読み出して実行するようにしてもよい。 Furthermore, the program is stored in “another computer (or server)” connected to the computer 300 via a public line, the Internet, a LAN, a WAN or the like. Then, the computer 300 may read and execute programs from these.

１０データ出力装置 10 Data output device
２３記憶部 23 Storage unit
２４制御部 24 Control unit
３０目的変数データ 30 Objective variable data
３１特徴量データ 31 Feature amount data
４０受付部 40 Reception unit
４１算出部 41 Calculation unit
４２分類部 42 Classification unit
４３、４３Ａ抽出部 43, 43A Extraction unit
４４出力部 44 Output unit
４５予測部 45 Prediction unit

Claims

A data output method in which a computer executes processing comprising:
dividing each of time-series data of values indicating a state of a specific monitoring target and time-series data of values indicating states of a plurality of monitoring targets different from the specific monitoring target into a plurality of periods;
calculating, for each of the periods, correlation values between the time-series data of values indicating the state of the specific monitoring target and the time-series data of values indicating the states of the plurality of monitoring targets;
classifying the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the distribution, over the plurality of periods, of the correlation values calculated for each period;
obtaining, for each of the time-series data indicating the states of the plurality of monitoring targets, a score by performing a weighting operation with a predetermined weight on the correlation values calculated for that time-series data, and extracting, for each cluster, the time-series data having the maximum score from the time-series data classified into the cluster; and
outputting information indicating the type of each time-series data extracted from each of the plurality of clusters.

A data output program that causes a computer to execute processing comprising:
dividing each of time-series data of values indicating a state of a specific monitoring target and time-series data of values indicating states of a plurality of monitoring targets different from the specific monitoring target into a plurality of periods;
calculating, for each of the periods, correlation values between the time-series data of values indicating the state of the specific monitoring target and the time-series data of values indicating the states of the plurality of monitoring targets;
classifying the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the distribution, over the plurality of periods, of the correlation values calculated for each period;
obtaining, for each of the time-series data indicating the states of the plurality of monitoring targets, a score by performing a weighting operation with a predetermined weight on the correlation values calculated for that time-series data, and extracting, for each cluster, the time-series data having the maximum score from the time-series data classified into the cluster; and
outputting information indicating the type of each time-series data extracted from each of the plurality of clusters.

A data output device comprising:
a calculation unit that divides each of time-series data of values indicating a state of a specific monitoring target and time-series data of values indicating states of a plurality of monitoring targets different from the specific monitoring target into a plurality of periods, and calculates, for each period, correlation values between the time-series data of values indicating the state of the specific monitoring target and the time-series data of values indicating the states of the plurality of monitoring targets;
a classification unit that classifies the time-series data indicating the states of the plurality of monitoring targets into a plurality of clusters based on the distribution, over the plurality of periods, of the correlation values calculated for each period by the calculation unit;
an extraction unit that obtains, for each of the time-series data indicating the states of the plurality of monitoring targets, a score by performing a weighting operation with a predetermined weight on the correlation values calculated for that time-series data, and extracts, for each cluster, the time-series data having the maximum score from the time-series data classified into the cluster; and
an output unit that outputs information indicating the type of each time-series data extracted from each of the plurality of clusters by the extraction unit.