JP6324124B2

JP6324124B2 - Prediction system, prediction method, and prediction program

Info

Publication number: JP6324124B2
Application number: JP2014047431A
Authority: JP
Inventors: 峰野　博史; 博史峰野; 雄也鈴木
Original assignee: Shizuoka University NUC
Current assignee: Shizuoka University NUC
Priority date: 2014-03-11
Filing date: 2014-03-11
Publication date: 2018-05-16
Anticipated expiration: 2034-03-11
Also published as: JP2015172790A

Description

One aspect of the present invention relates to a prediction system, a prediction method, and a prediction program that use machine learning.

Future prediction using machine learning, such as a support vector machine, is conventionally known. For example, Patent Document 1 below describes a method of predicting a nonlinear time series by using exponential smoothing and kernel-based dynamic modeling of seasonal time-series data. The kernel predicts future values of the time series from past values by using a nonlinear function such as least-squares radial basis function regression with a Gaussian kernel or support vector regression.

Patent Document 1: JP 2011-159282 A

It is known that the learning period (the amount of learning) has an optimum value for each individual case (data series), that is, that there is a learning period at which the accuracy of machine learning is best. However, because that learning period has to be determined manually (for example, empirically or by trial and error), determining the optimal learning period for a given case takes time. It is therefore desirable to identify the learning period easily while keeping the accuracy of prediction by machine learning at or above a certain level.

A prediction system according to one aspect of the present invention includes: a subset generation unit that generates, from time-series training data, a plurality of mutually different pieces of subset data; a function generation unit that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and a selection unit that uses each of the plurality of pattern functions to obtain a predicted value at an evaluation time point within the training data, and selects the learning period corresponding to the pattern function for which the error between the measured value at the evaluation time point and the predicted value is smallest.

A prediction method according to one aspect of the present invention is a prediction method executed by a prediction system including a processor, and includes: a subset generation step of generating, from time-series training data, a plurality of mutually different pieces of subset data; a function generation step of generating a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and a selection step of using each of the plurality of pattern functions to obtain a predicted value at an evaluation time point within the training data, and selecting the learning period corresponding to the pattern function for which the error between the measured value at the evaluation time point and the predicted value is smallest.

A prediction program according to one aspect of the present invention causes a computer to function as: a subset generation unit that generates, from time-series training data, a plurality of mutually different pieces of subset data; a function generation unit that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and a selection unit that uses each of the plurality of pattern functions to obtain a predicted value at an evaluation time point within the training data, and selects the learning period corresponding to the pattern function for which the error between the measured value at the evaluation time point and the predicted value is smallest.

In such an aspect, mutually different subset data are generated automatically, a plurality of pattern functions are likewise generated automatically, and the accuracy of those pattern functions is verified using the training data, so that a learning period that can be expected to be optimal can be identified easily.

According to one aspect of the present invention, the learning period can be identified easily while the accuracy of prediction by machine learning is kept at or above a certain level.

FIG. 1 is a diagram showing the hardware configuration of a computer constituting the prediction system according to the embodiment.
FIG. 2 is a block diagram showing the functional configuration of the prediction system according to the embodiment.
FIG. 3 is a diagram explaining the concept of subset data.
FIG. 4 is a flowchart showing the operation of the prediction system according to the embodiment.
FIG. 5 is a flowchart showing the operation of the prediction system according to the embodiment.
FIG. 6 is a diagram showing the configuration of the prediction program according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are denoted by the same reference numerals, and redundant description is omitted.

First, the functions and configuration of the prediction system 10 according to the embodiment will be described with reference to FIGS. 1 to 3. The prediction system 10 shown in FIG. 2 is a computer system that predicts unknown values by machine learning.

Machine learning is processing that generates a pattern function by learning training data, which is a set of known values, and predicts unknown values using that pattern function. In this embodiment, training data that is past time-series data (referred to in this specification as "time-series training data") is used, and a value at a future time point is predicted using the pattern function obtained from that training data. Time-series data is a series of values obtained by measuring the temporal change of a phenomenon either continuously or discontinuously at fixed intervals. Examples of machine learning include artificial neural networks (ANN), support vector machines (SVM), support vector regression (SVR), which adapts the SVM to regression, decision tree learning, association rule learning, and Bayesian networks, but the prediction system 10 may use other algorithms.

The target predicted by the prediction system 10 is not limited. For example, the prediction system 10 may predict weather (or microclimate) such as temperature and humidity, or may predict other natural or social phenomena.

The prediction system 10 includes one or more computers. When it includes a plurality of computers, each functional element of the prediction system 10 described later is realized by distributed processing. The type of each computer is not limited. For example, a stationary or portable personal computer (PC) may be used, a workstation may be used, or a mobile terminal such as a high-function mobile phone (smartphone), a mobile phone, or a personal digital assistant (PDA) may be used. Alternatively, the prediction system 10 may be constructed by combining various types of computers. When a plurality of computers are used, they are connected via a communication network such as the Internet or an intranet.

FIG. 1 shows a general hardware configuration of each computer 100 in the prediction system 10. The computer 100 includes a CPU (processor) 101 that executes an operating system, application programs, and the like, a main storage unit 102 composed of ROM and RAM, an auxiliary storage unit 103 composed of a hard disk, flash memory, or the like, a communication control unit 104 composed of a network card or a wireless communication module, an input device 105 such as a keyboard and a mouse, and an output device 106 such as a display and a printer. Naturally, the installed hardware modules differ depending on the type of the computer 100. For example, stationary PCs and workstations often include a keyboard, a mouse, and a monitor as input and output devices, whereas in a smartphone a touch panel often functions as both the input device and the output device.

Each functional element of the prediction system 10 described later is realized by loading predetermined software onto the CPU 101 or the main storage unit 102, operating the communication control unit 104, the input device 105, the output device 106, and the like under the control of the CPU 101, and reading and writing data in the main storage unit 102 or the auxiliary storage unit 103. The data and databases necessary for the processing are stored in the main storage unit 102 or the auxiliary storage unit 103.

As shown in FIG. 2, the prediction system 10 includes, as functional components, a reception unit 11, a subset generation unit 12, a function generation unit 13, a selection unit 14, a prediction unit 15, and an evaluation unit 16.

The reception unit 11 is a functional element that receives time-series training data. The reception unit 11 accesses the database 20, reads the training data, and outputs that training data to the subset generation unit 12. Here, the database 20 is a device or functional element that stores the training data, and its implementation is not limited. For example, the database 20 may reside in the prediction system 10 or in a system separate from the prediction system 10. The database 20 may be a relational database or a CSV file.

The subset generation unit 12 is a functional element that generates a plurality of pieces of subset data from the training data. Each piece of subset data also serves to indicate a learning period (that is, a window) in machine learning. Any number of pieces of subset data may be generated as long as there are two or more. However, the pieces of subset data must differ from one another, which means that the learning periods (windows) differ from one another. The subset generation unit 12 outputs the generated subset data to the function generation unit 13.

The processing of the subset generation unit 12 will be described with reference to FIG. 3. The training data shown in FIG. 3 is a set of ten actual values a_1 to a_10 measured at 30-minute intervals from 10:00 to 14:30. Ultimately, a pattern function (described in detail later) for obtaining a future predicted value (for example, the predicted value at 15:00) and the learning period (window) corresponding to that function are determined from this training data. In the example of FIG. 3, the subset generation unit 12 generates subset data (learning periods) w_1, w_2, ..., w_n from the training data. When generating the plurality of subset data, the subset generation unit 12 leaves one or more pieces of data that are not included in any subset data (learning period) and that lie at time points later than those learning periods. The data left in this way is used when the selection unit 14, described later, evaluates the pattern functions. In the example of FIG. 3, the actual values a_2 and a_1 at 14:00 and 14:30 correspond to the data that is left.
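As a non-limiting illustration, the way the subset generation unit 12 leaves data for evaluation might be sketched in Python as follows. The function name `generate_subsets`, its parameters, and the policy of ending every window immediately before the held-out data are assumptions made only for this sketch; the embodiment does not restrict how the windows are placed.

```python
from typing import List, Sequence, Tuple


def generate_subsets(
    series: Sequence[float], n_holdout: int, min_len: int = 2
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (windows, holdout_indices) for a time-ordered series.

    Each window is a half-open index range (start, stop). The last
    n_holdout samples are never placed in any window, so they remain
    available as evaluation data that is later than every window.
    """
    if n_holdout < 1:
        raise ValueError("at least one sample must be held out")
    usable = len(series) - n_holdout
    if usable < min_len:
        raise ValueError("series is too short for the requested hold-out")
    # Hypothetical policy: windows share the same end and differ in length.
    windows = [(start, usable) for start in range(usable - min_len + 1)]
    holdout = list(range(usable, len(series)))
    return windows, holdout


# Ten samples, as in FIG. 3 (the values here are placeholders, not measured data).
windows, holdout = generate_subsets([float(i) for i in range(10)], n_holdout=2)
```

With ten samples and two held-out samples, every window in this sketch ends at index 8, and indices 8 and 9 play the role of the data left for the selection unit 14.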

The position and length of the window set when generating each piece of subset data are not limited. For example, the subset generation unit 12 may set a plurality of windows of the same size on the condition that their positions differ from one another. For example, for the training data in FIG. 3, the subset generation unit 12 may generate a window covering 12:30 to 13:30 and a window covering 12:00 to 13:00.
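The case of equally sized windows at different positions might be sketched as follows, again only as an illustration. The function name `sliding_windows`, the window size of three samples, and the calendar date used to build the timestamps are hypothetical choices for this sketch.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def sliding_windows(times: List[str], size: int, n_holdout: int) -> List[Tuple[str, str]]:
    """Return the (first, last) timestamp of every window of `size` samples.

    The last n_holdout samples are excluded so that they stay available
    as evaluation data.
    """
    usable = len(times) - n_holdout
    if size < 1 or size > usable:
        raise ValueError("window size does not fit the usable data")
    return [(times[i], times[i + size - 1]) for i in range(usable - size + 1)]


# Ten timestamps at 30-minute intervals from 10:00 to 14:30, as in FIG. 3.
start = datetime(2014, 3, 11, 10, 0)  # arbitrary date, chosen only for the sketch
times = [(start + timedelta(minutes=30 * i)).strftime("%H:%M") for i in range(10)]
windows = sliding_windows(times, size=3, n_holdout=2)
```

Under these assumptions the result contains, among others, the 12:00 to 13:00 window and the 12:30 to 13:30 window mentioned above, and no window reaches 14:00 or 14:30.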

FIG. 3 shows an example in which both the size of the training data and the measurement period are very limited. In general, however, the training data is large, so each piece of subset data can also be large. For example, from one year of training data, the subset generation unit 12 can generate subset data for each season (spring, summer, autumn, winter), subset data for each month (January through December), and so on.

As described above, the method of setting the windows is not limited in any way as long as a plurality of mutually different pieces of subset data are generated. The subset generation unit 12 may generate each piece of subset data in accordance with user input or may generate it automatically.

The function generation unit 13 is a functional element that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each piece of subset data. The function generation unit 13 outputs the generated pattern functions to the selection unit 14.

As described above, the specific machine learning technique is not limited, but in this embodiment the function generation unit 13 uses support vector regression. For one piece of subset data, the function generation unit 13 transforms that subset data into a space in which a pattern can be found as a linear relationship (kernel function), and thereby finds the pattern as a linear relationship (pattern analysis algorithm). More specifically, the function generation unit 13 converts the subset data into a kernel matrix using a kernel function and generates a pattern function by applying a pattern analysis algorithm to that kernel matrix. Using this pattern function makes it possible to predict future time-series data. The function generation unit 13 generates a plurality of pattern functions by executing this processing for each piece of input subset data. For example, when three pieces of subset data Da, Db, and Dc are input, the function generation unit 13 generates a pattern function Fa from the subset data Da, a pattern function Fb from the subset data Db, and a pattern function Fc from the subset data Dc.
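The two stages described above (conversion to a kernel matrix, then a pattern analysis algorithm applied to that matrix) might be sketched as follows. Note that this sketch deliberately substitutes kernel ridge regression (regularized least squares with a Gaussian kernel) for support vector regression, because it can be written with the standard library alone; it is not the embodiment's SVR. Using the time index as the input variable, and the names and default values of `gamma` and `ridge`, are likewise assumptions.

```python
import math
from typing import Callable, List, Sequence


def gaussian_kernel(x: float, y: float, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-gamma * (x - y) ** 2)


def kernel_matrix(xs: Sequence[float], gamma: float = 0.5) -> List[List[float]]:
    """Kernel function stage: convert the inputs into a kernel matrix."""
    return [[gaussian_kernel(a, b, gamma) for b in xs] for a in xs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_pattern_function(
    xs: Sequence[float], ys: Sequence[float], gamma: float = 0.5, ridge: float = 1e-3
) -> Callable[[float], float]:
    """Pattern analysis stage (stand-in): kernel ridge regression, not SVR."""
    xs = list(xs)
    k = kernel_matrix(xs, gamma)
    for i in range(len(xs)):
        k[i][i] += ridge  # regularization keeps the system well conditioned
    alpha = solve(k, list(ys))

    def pattern(x: float) -> float:
        return sum(a * gaussian_kernel(x, xi, gamma) for a, xi in zip(alpha, xs))

    return pattern
```

Calling `fit_pattern_function` once per piece of subset data would yield one pattern function per window, corresponding to Fa, Fb, and Fc in the example above.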

The selection unit 14 is a functional element that identifies the most accurate pattern function among the plurality of input pattern functions and selects the learning period corresponding to that pattern function for future prediction. Here, selecting the learning period corresponding to a pattern function means that the pattern function itself is also selected. The selection unit 14 outputs the selected pattern function and learning period to the prediction unit 15.

First, the selection unit 14 obtains a plurality of predicted values at time points covered by the training data received by the reception unit 11. Actual values have already been obtained for the time points at which predicted values are obtained in this processing, and those predicted values are used only to select the pattern function and the learning period. Therefore, in this specification, a time point set in the processing for obtaining the plurality of predicted values is called an "evaluation time point". As described above, an evaluation time point is later than any of the learning periods indicated by the plurality of subset data generated by the subset generation unit 12. In the example of FIG. 3, the times 14:00 and 14:30 can be evaluation time points.

The selection unit 14 sets one or more evaluation time points. Subsequently, for each evaluation time point, the selection unit 14 obtains a plurality of predicted values at that evaluation time point using the plurality of pattern functions. The selection unit 14 also acquires the measured value at each evaluation time point from the training data received by the reception unit 11.

Subsequently, the selection unit 14 obtains the error between each calculated predicted value and the measured value. The selection unit 14 then selects the pattern function with the smallest error and the learning period corresponding to that pattern function. The method of calculating the error is not limited; for example, the root mean square error (RMSE) or the mean squared error (MSE) may be used. If there is only one evaluation time point, the difference obtained by simple subtraction may be used. Setting a plurality of evaluation time points can improve the accuracy of the error.
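The two error measures named above might be computed as in the following sketch. The function names `mse` and `rmse` are illustrative and are not part of the embodiment.

```python
import math
from typing import Sequence


def mse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean squared error over the evaluation time points."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, measured))
```

Because the square root is monotonic, ranking pattern functions by RMSE or by MSE selects the same pattern function; the choice matters only for the units in which the error is reported.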

For example, assume that pattern functions Fa, Fb, and Fc are input and that ET_1 and ET_2 exist as evaluation time points. In this case, the selection unit 14 obtains, from the pattern function Fa, a predicted value Va_1 at the evaluation time point ET_1 and a predicted value Va_2 at the evaluation time point ET_2. Further, the selection unit 14 obtains predicted values Vb_1 and Vb_2 at the evaluation time points ET_1 and ET_2 from the pattern function Fb, and obtains predicted values Vc_1 and Vc_2 at the evaluation time points ET_1 and ET_2 from the pattern function Fc. Subsequently, the selection unit 14 reads the actual values VR_1 and VR_2 at the evaluation time points ET_1 and ET_2 from the training data. The selection unit 14 then obtains the error derived from the predicted values (Va_1, Va_2) of the pattern function Fa and the actual values (VR_1, VR_2), the error derived from the predicted values (Vb_1, Vb_2) of the pattern function Fb and the actual values (VR_1, VR_2), and the error derived from the predicted values (Vc_1, Vc_2) of the pattern function Fc and the actual values (VR_1, VR_2). Finally, the selection unit 14 selects the pattern function with the smallest error and the learning period corresponding to that pattern function.
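The comparison of Fa, Fb, and Fc at ET_1 and ET_2 might be sketched as follows. The three lambda functions and the numerical values are invented purely to make the sketch executable, and RMSE is used as the error only as one of the options the embodiment permits.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def select_best(
    pattern_functions: Dict[str, Callable[[float], float]],
    eval_times: Sequence[float],
    actual_values: Sequence[float],
) -> Tuple[str, float]:
    """Return the name and RMSE of the pattern function with the smallest error."""
    best_name, best_error = "", math.inf
    for name, func in pattern_functions.items():
        predicted = [func(t) for t in eval_times]
        error = math.sqrt(
            sum((p - a) ** 2 for p, a in zip(predicted, actual_values)) / len(actual_values)
        )
        if error < best_error:
            best_name, best_error = name, error
    return best_name, best_error


# Hypothetical pattern functions and actual values (VR_1, VR_2) at ET_1, ET_2.
functions = {
    "Fa": lambda t: 20.0,
    "Fb": lambda t: 20.0 + 0.5 * t,
    "Fc": lambda t: 25.0 - t,
}
best_name, best_error = select_best(functions, eval_times=[1.0, 2.0], actual_values=[20.6, 21.1])
```

With these made-up numbers, Fb misses each actual value by about 0.1 and is therefore the one selected; in the embodiment the learning period attached to the selected function is selected along with it.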

The prediction unit 15 is a functional element that obtains a predicted value at a future time point using the selected pattern function and learning period. The prediction unit 15 outputs the obtained predicted value to a device such as a monitor, a memory, a database, or a printer. How accurate the predicted value obtained here is cannot be known until that time point arrives. The prediction unit 15 therefore outputs the obtained predicted value to the evaluation unit 16 for after-the-fact evaluation. When a new pattern function and learning period are subsequently input, the prediction unit 15 obtains predicted values based on that new input.

The evaluation unit 16 is a functional element that determines how accurate the predicted value obtained by the prediction unit 15 actually was. The evaluation unit 16 reads from the database 20 the measured value at the time point predicted by the prediction unit 15, obtains the error between the input predicted value and that measured value, and determines whether the error is less than a predetermined threshold. For this determination, the evaluation unit 16 holds the threshold in advance.

An error equal to or greater than the threshold means that the pattern function currently used by the prediction unit 15 is not accurate enough. In this case, therefore, the evaluation unit 16 outputs to the reception unit 11 an instruction to determine a new pattern function and learning period. On the other hand, an error less than the threshold means that the current pattern function is accurate. In this case, therefore, the evaluation unit 16 ends the processing without outputting that instruction.
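The determination made by the evaluation unit 16 reduces to a single comparison, sketched below. The function name `needs_retraining` and the use of the absolute difference as the error are assumptions for this sketch.

```python
def needs_retraining(predicted: float, measured: float, threshold: float) -> bool:
    """Return True when the error is equal to or greater than the threshold.

    True corresponds to issuing the instruction to determine a new pattern
    function and learning period; False corresponds to ending the processing
    without issuing it.
    """
    return abs(predicted - measured) >= threshold
```

Note that an error exactly equal to the threshold counts as "equal to or greater than" and therefore triggers re-selection.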

Next, the operation of the prediction system 10 and the prediction method according to this embodiment will be described with reference to FIGS. 4 and 5.

FIG. 4 shows the basic processing procedure. First, the reception unit 11 receives input of training data (step S11), and the subset generation unit 12 generates a plurality of pieces of subset data from that training data (step S12, subset generation step). Subsequently, the function generation unit 13 generates a plurality of pattern functions by executing learning processing based on support vector regression on each piece of subset data (step S13, function generation step). Subsequently, the selection unit 14 uses each pattern function to obtain a predicted value at the evaluation time point and selects the learning period corresponding to the pattern function with the smallest error from the measured value (step S14, selection step). The generation of the pattern functions and the calculation of the predicted values at the evaluation time point may be performed in parallel, serially, or with a mixture of parallel and serial processing.

Subsequently, based on the selection result (that is, using the selected pattern function and learning period), the prediction unit 15 obtains a predicted value at a future prediction time point (step S15). The evaluation unit 16 then verifies that predicted value. If the error between the predicted value and the measured value at that time point is less than the threshold (step S16; YES), the prediction unit 15 can use the current pattern function and learning period to obtain a predicted value at another future time point.

On the other hand, if the error between the predicted value and the measured value is equal to or greater than the threshold (step S16; NO), the processing from step S11 onward is executed again. In this case, because the database 20 stores the measured values added since the reception unit 11 last acquired the training data, the prediction system 10 can acquire from the database 20 the latest training data including those new measured values and obtain a new pattern function.
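The flow of steps S11 to S16 might be put together as in the following sketch. To keep it short and dependent only on the standard library, the learner is an ordinary least-squares line over the time index, which is a stand-in and not the support vector regression of the embodiment. All function names, the window policy, and the default of two held-out samples are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Pattern:
    """Stand-in learner (not SVR): least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def choose_window(
    series: Sequence[float], n_holdout: int = 2, min_len: int = 2
) -> Tuple[Tuple[int, int], Pattern]:
    """S12 to S14: build windows, fit one function per window, keep the best."""
    usable = len(series) - n_holdout
    if usable < min_len:
        raise ValueError("series is too short")
    best_err, best_window, best_f = float("inf"), (0, usable), None
    for start in range(usable - min_len + 1):                     # S12
        f = fit_line(list(range(start, usable)), series[start:usable])  # S13
        squared = sum((f(i) - series[i]) ** 2 for i in range(usable, len(series)))
        err = (squared / n_holdout) ** 0.5                         # S14 (RMSE)
        if err < best_err:
            best_err, best_window, best_f = err, (start, usable), f
    return best_window, best_f


def forecast_loop(
    history: List[float], incoming: Sequence[float], threshold: float
) -> List[Tuple[float, float, bool]]:
    """Return (predicted, measured, reselected) for each newly measured value."""
    _, model = choose_window(history)                             # S11 to S14
    results = []
    for measured in incoming:
        predicted = model(len(history))                           # S15
        history.append(measured)       # the database gains the new measured value
        reselect = abs(predicted - measured) >= threshold         # S16
        if reselect:
            _, model = choose_window(history)  # back to S11 with the latest data
        results.append((predicted, measured, reselect))
    return results
```

In this sketch the current pattern function keeps being used for as long as the error stays below the threshold, and selection is repeated on the enlarged history only when a prediction misses by the threshold or more.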

The prediction system 10 may also select the pattern function while narrowing down the subset data step by step by repeating the processing of steps S12 to S14 a plurality of times. This processing will be described with reference to FIG. 5.

First, the reception unit 11 receives input of training data (step S21), and the subset generation unit 12 generates a plurality of pieces of primary subset data from that training data (step S22, subset generation step). Subsequently, the function generation unit 13 generates a plurality of pattern functions by executing learning processing based on support vector regression on each piece of primary subset data (step S23, function generation step). Subsequently, the selection unit 14 uses each pattern function to obtain a predicted value at a specific prediction time point and selects the learning period corresponding to the pattern function with the smallest error from the measured value (step S24, selection step). The processing up to this point is the same as steps S11 to S14 above.

Subsequently, the subset generation unit 12 generates a plurality of pieces of secondary subset data from the primary subset data corresponding to the selected learning period (step S25). For example, when the primary subset data is time-series data for 10 days, the subset generation unit 12 generates ten pieces of one-day time-series data as the secondary subset data. Subsequently, the function generation unit 13 generates a plurality of pattern functions by executing learning processing based on support vector regression on each piece of secondary subset data (step S26). Subsequently, the selection unit 14 uses each pattern function to obtain a predicted value at a specific prediction time point and selects the learning period corresponding to the pattern function with the smallest error from the measured value (step S27).
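The division of a selected primary window into secondary windows (step S25) might be sketched as follows. The function name `split_window`, the half-open index convention, and the figure of 48 samples per day (30-minute sampling) are assumptions for this sketch.

```python
from typing import List, Tuple


def split_window(window: Tuple[int, int], n_parts: int) -> List[Tuple[int, int]]:
    """Split a half-open index range (start, stop) into n_parts equal ranges."""
    start, stop = window
    length = stop - start
    if n_parts < 1 or length % n_parts != 0:
        raise ValueError("window length must be a multiple of n_parts")
    step = length // n_parts
    return [(start + i * step, start + (i + 1) * step) for i in range(n_parts)]


# A 10-day primary window at 48 samples per day becomes ten 1-day windows.
secondary = split_window((0, 480), n_parts=10)
```

Each secondary window would then go through function generation (step S26) and selection (step S27) in the same way as the primary windows.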

The primary subset data can be regarded as part of the time-series training data. Steps S25, S26, and S27 can therefore also be regarded as a subset generation step, a function generation step, and a selection step, respectively.

Also when the subset data is narrowed down, the generation of the pattern functions and the calculation of the predicted values at the evaluation time point may be performed in parallel or serially.

Subsequently, based on the selection result (that is, using the selected pattern function and learning period), the prediction unit 15 obtains a predicted value at a future prediction time point (step S28). The evaluation unit 16 then verifies that predicted value. If the error between the predicted value and the measured value at that time point is less than the threshold (step S29; YES), the prediction unit 15 can use the current pattern function to obtain a predicted value at another future time point. On the other hand, if the error between the predicted value and the measured value is equal to or greater than the threshold (step S29; NO), the processing from step S21 onward is executed again.

In the example of FIG. 5, the subset data is narrowed down only once, but this processing may be repeated two or more times. That is, the subset generation step, the function generation step, and the selection step may be repeated any number of times.
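Repeated narrowing can be pictured as a loop around the three steps. The following sketch is only one possible reading: the constant predictor `fit_mean` is a deliberately simple stand-in for support vector regression, and `split`, `narrow`, the number of parts per round, and the data are hypothetical.

```python
from statistics import mean

def fit_mean(points):
    """Constant predictor (subset mean); a simple stand-in for support vector regression."""
    level = mean(y for _, y in points)
    return lambda t: level

def split(points, parts):
    """Subset generation: cut a time series into at most `parts` consecutive chunks."""
    size = -(-len(points) // parts)          # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]

def narrow(points, eval_t, actual, parts, rounds):
    """Repeat subset generation, function generation, and selection `rounds` times."""
    current = points
    for _ in range(rounds):
        if len(current) < parts:
            break                            # too short to split any further
        candidates = split(current, parts)
        current = min(candidates,
                      key=lambda s: abs(fit_mean(s)(eval_t) - actual))
    return current

series = [(t, float(t)) for t in range(16)]  # hypothetical rising series
narrowed = narrow(series, eval_t=16, actual=16.0, parts=2, rounds=3)
print([t for t, _ in narrowed])
```

Each round halves the selected learning period, so three rounds reduce the sixteen samples to the two most recent ones for this data.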

Next, a prediction program P1 for realizing the prediction system 10 will be described with reference to FIG. 6.

The prediction program P1 includes a main module P10, a reception module P11, a subset generation module P12, a function generation module P13, a selection module P14, a prediction module P15, and an evaluation module P16.

The main module P10 is the part that performs overall control of the prediction function based on machine learning. The functions realized by executing the reception module P11, the subset generation module P12, the function generation module P13, the selection module P14, the prediction module P15, and the evaluation module P16 are the same as the functions of the reception unit 11, the subset generation unit 12, the function generation unit 13, the selection unit 14, the prediction unit 15, and the evaluation unit 16 described above, respectively.

The prediction program P1 may be provided after being fixedly recorded on a tangible recording medium such as a CD-ROM, a DVD-ROM, or a semiconductor memory. Alternatively, the prediction program P1 may be provided via a communication network as a data signal superimposed on a carrier wave.

As described above, a prediction system according to one aspect of the present invention includes: a subset generation unit that generates, from time-series training data, a plurality of subset data that differ from one another; a function generation unit that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and a selection unit that, using each of the plurality of pattern functions, obtains a predicted value at an evaluation time point within the training data and selects the learning period corresponding to the pattern function for which the error between the actually measured value at the evaluation time point and the predicted value is smallest.

This processing can also be regarded as automatically adjusting the learning period (window). Because a longer learning period, which affects how the pattern function is determined, is not necessarily better, it is difficult to specify the optimal learning period manually. In the present embodiment, a learning period considered to be optimal can be obtained automatically.
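The observation that a longer learning period is not necessarily better can be illustrated with a small, entirely hypothetical example. The series below is flat and then starts to rise; windows of three different lengths, all ending at the most recent sample, are fitted with a least-squares line (a stand-in for support vector regression), and the error at a held-out evaluation sample is compared. The window lengths and the names `fit_line`, `series`, and `errors` are assumptions.

```python
from statistics import mean

def fit_line(points):
    """Least-squares line through (t, y) pairs; a stand-in for support vector regression."""
    ts = [t for t, _ in points]
    t_bar = mean(ts)
    y_bar = mean(y for _, y in points)
    var = sum((t - t_bar) ** 2 for t in ts)
    cov = sum((t - t_bar) * (y - y_bar) for t, y in points)
    slope = cov / var if var else 0.0
    return lambda t: y_bar + slope * (t - t_bar)

# Hypothetical series: flat at 5.0 for 20 steps, then rising by 1.0 per step.
series = ([(t, 5.0) for t in range(20)]
          + [(t, 5.0 + (t - 19)) for t in range(20, 30)])
eval_t, actual = 30, 16.0                    # held-out evaluation sample

errors = {}
for length in (5, 15, 30):                   # candidate learning periods (windows)
    window = series[-length:]
    errors[length] = abs(fit_line(window)(eval_t) - actual)

best_length = min(errors, key=errors.get)
for length in sorted(errors):
    print(length, round(errors[length], 3))
print("selected learning period:", best_length)
```

For this data the shortest window gives the smallest error, because the longer windows mix in samples from before the change in behavior. Selecting the window by its error at the evaluation sample removes the need to guess the length by hand.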

In a prediction system according to another aspect, the subset generation unit may generate, from the subset data corresponding to the learning period selected by the selection unit, a plurality of new subset data that differ from one another, and the function generation unit and the selection unit may execute their processing again based on the plurality of new subset data. Narrowing down the learning period in this way can be expected to improve the accuracy of the pattern function.

A prediction system according to another aspect may further include a prediction unit that obtains a predicted value at a future prediction time point using the learning period selected by the selection unit, and an evaluation unit that determines whether the error between the predicted value obtained by the prediction unit and the actually measured value at the prediction time point is less than a predetermined threshold. When the error at the prediction time point is greater than or equal to the threshold, the processing by the subset generation unit, the function generation unit, and the selection unit may be executed again. In this case, the learning period can be reset dynamically while the actual prediction processing proceeds.

In a prediction system according to another aspect, at least one of the generation of the plurality of pattern functions and the calculation of the predicted values at the evaluation time point may be processed in parallel. Parallel processing allows the learning period considered to be optimal to be identified more quickly.
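One way to picture the parallel variant is to hand each subset to a worker that both generates the pattern function and computes its error. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` only to keep the example self-contained; for CPU-bound learning in CPython a process pool would be the more usual choice. The constant predictor `fit_mean` again stands in for support vector regression, and `score`, `select_parallel`, the worker count, and the data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def fit_mean(points):
    """Constant predictor (subset mean); a simple stand-in for support vector regression."""
    level = mean(y for _, y in points)
    return lambda t: level

def score(subset, eval_t, actual):
    """Generate one pattern function and return (error, learning period)."""
    pattern = fit_mean(subset)
    return abs(pattern(eval_t) - actual), (subset[0][0], subset[-1][0])

def select_parallel(subsets, eval_t, actual, workers=4):
    """Score all subsets concurrently, then pick the smallest error."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: score(s, eval_t, actual), subsets))
    return min(results)

series = [(t, float(t)) for t in range(12)]            # hypothetical rising series
subsets = [series[i:i + 4] for i in range(0, 12, 4)]   # three candidate periods
best = select_parallel(subsets, eval_t=12, actual=12.0)
print(best)
```

Because each subset is scored independently, the selection result is the same as in the serial case; only the elapsed time differs.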

In a prediction system according to another aspect, the machine learning may be a support vector machine.

The present invention has been described above in detail based on its embodiment. However, the present invention is not limited to the above embodiment. The present invention can be modified in various ways without departing from its gist.

In the above embodiment, a method of automatically adjusting the learning period in support vector regression (Sliding Window-based Support Vector Regression (SW-SVR)) has been described. However, as described above, the specific machine learning method is not limited to support vector regression.

The prediction unit 15 and the evaluation unit 16 may reside in a system separate from the prediction system 10. In this case, the selection unit 14 transmits the selected pattern function and learning period to that separate system. The evaluation unit 16 may be omitted. In this case, the prediction unit 15 continues to use the selected pattern function until it receives an explicit instruction.

Description of symbols: 10 ... prediction system, 11 ... reception unit, 12 ... subset generation unit, 13 ... function generation unit, 14 ... selection unit, 15 ... prediction unit, 16 ... evaluation unit, 20 ... database, P1 ... prediction program, P10 ... main module, P11 ... reception module, P12 ... subset generation module, P13 ... function generation module, P14 ... selection module, P15 ... prediction module, P16 ... evaluation module.

Claims

A prediction system comprising:
a subset generation unit that generates, from time-series training data, a plurality of subset data that differ from one another, each of the plurality of subset data being time-series data;
a function generation unit that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and
a selection unit that, using each of the plurality of pattern functions, obtains a predicted value at an evaluation time point within the training data and selects the learning period corresponding to the pattern function for which the error between the actually measured value at the evaluation time point and the predicted value is smallest,
wherein the subset generation unit further generates, from the subset data corresponding to the learning period selected by the selection unit, a plurality of new subset data that differ from one another, each of the plurality of new subset data being time-series data, and
the function generation unit and the selection unit further execute their processing again based on the plurality of new subset data.

The prediction system according to claim 1, further comprising:
a prediction unit that obtains a predicted value at a future prediction time point using the learning period selected by the selection unit; and
an evaluation unit that determines whether the error between the predicted value obtained by the prediction unit and the actually measured value at the prediction time point is less than a predetermined threshold,
wherein, when the error at the prediction time point is greater than or equal to the threshold, the processing by the subset generation unit, the function generation unit, and the selection unit is executed again.

The prediction system according to claim 1 or 2,
wherein at least one of the generation of the plurality of pattern functions and the calculation of the predicted values at the evaluation time point is processed in parallel.

The prediction system according to any one of claims 1 to 3,
wherein the machine learning is a support vector machine.

A prediction method executed by a prediction system comprising a processor, the method comprising:
a subset generation step of generating, from time-series training data, a plurality of subset data that differ from one another, each of the plurality of subset data being time-series data;
a function generation step of generating a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and
a selection step of obtaining, using each of the plurality of pattern functions, a predicted value at an evaluation time point within the training data, and selecting the learning period corresponding to the pattern function for which the error between the actually measured value at the evaluation time point and the predicted value is smallest,
wherein, in the subset generation step, a plurality of new subset data that differ from one another are further generated from the subset data corresponding to the learning period selected in the selection step, each of the plurality of new subset data being time-series data, and
the function generation step and the selection step are executed again based on the plurality of new subset data.

A prediction program causing a computer to function as:
a subset generation unit that generates, from time-series training data, a plurality of subset data that differ from one another, each of the plurality of subset data being time-series data;
a function generation unit that generates a plurality of pattern functions corresponding to the plurality of subset data by executing machine learning on each of the plurality of subset data; and
a selection unit that, using each of the plurality of pattern functions, obtains a predicted value at an evaluation time point within the training data and selects the learning period corresponding to the pattern function for which the error between the actually measured value at the evaluation time point and the predicted value is smallest,
wherein the subset generation unit further generates, from the subset data corresponding to the learning period selected by the selection unit, a plurality of new subset data that differ from one another, each of the plurality of new subset data being time-series data, and
the function generation unit and the selection unit further execute their processing again based on the plurality of new subset data.