WO2020184561A1 - Data prediction device, data prediction method, and data prediction program - Google Patents

Data prediction device, data prediction method, and data prediction program

Info

Publication number
WO2020184561A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
regression coefficient
unit
prediction
regression
Prior art date
Application number
PCT/JP2020/010304
Other languages
French (fr)
Japanese (ja)
Inventor
高嶋 洋一
昌宏 湯口
山田 智広
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Publication of WO2020184561A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present disclosure relates to a data prediction device, a data prediction method, and a data prediction program.
  • Equation (1) uses the 12 data points from time t-2 to time t, out of the data recorded per unit time from time t-3 to time t by the four sensors 1 to 4 illustrated in FIG. 1, to predict the data D_{1,t+1} of sensor 1 at time t+1.
  • D_{l,t-m} is a data point drawn as a circle in FIG. 1
  • l is the sensor number
  • t-m is the time
  • m represents a natural number.
  • the data is predicted by the following learning process and prediction process.
  • the past data of each sensor that seems to be related to the data to be predicted is collected to create a linear regression equation, the data of the sample section in which the phenomenon to be predicted (a distinctive movement of the prediction target) occurs is substituted into the variables of the linear regression equation, and the regression coefficients β_k are obtained by solving simultaneous equations (learning process).
  • the predicted value is then calculated continuously from the sensor data using the obtained regression coefficients β_k (prediction process).
  • the data of the sample interval may be insufficient and the solution of the simultaneous equations may not be obtained.
  • for example, when predicting the water level of a river, mesh-shaped rainfall forecast values created by the Japan Meteorological Agency can be used instead of, or in addition to, the data from the sensors.
  • in that case, however, the number of regression coefficients increases further, and the data of the sample section may be insufficient.
  • the regression coefficient of the approximate solution can be obtained by using the method called L1 regularization (Non-Patent Document 1) as illustrated by Eq. (2).
  • L1 regularization has the property of driving regression coefficients that contribute little to the prediction toward zero.
  • y_i is the dependent variable
  • x_{ij} is the explanatory variable
  • t_c is the adjustment parameter.
  • FIG. 2 shows a data prediction device 10 of a related technology configured by using L1 regularization.
  • the data collection unit 11 collects data from each of a large number of sensors and transmits the same data to the prediction unit 12 and the learning section selection unit 13.
  • the learning section selection unit 13 selects data in a data section in which an event to be predicted is likely to appear based on a rule, and transmits the selected data to the regression coefficient calculation unit 14 as data to be learned.
  • the regression coefficient calculation unit 14 obtains a regression coefficient using L1 regularization, and transmits the regression coefficient to the prediction unit 12.
  • the prediction unit 12 continuously calculates the prediction value by substituting the regression coefficient received from the regression coefficient calculation unit 14 and the data received from the data collection unit 11 into the linear regression equation.
  • L1 regularization makes it possible to obtain regression coefficients even when many explanatory variables are used, but the range of sensors that affect the prediction and the range of data measurement times are not known in advance, so both ranges have to be set generously. Therefore, the number of data items used as input becomes very large, which strains the line capacity for collecting and transmitting data and makes the amount of calculation in the learning process enormous.
  • the purpose of this disclosure is to reduce the number of data used when calculating the regression coefficient, reduce the line capacity for collecting and transmitting data, and reduce the amount of calculation processing in the learning process.
  • the data prediction device of the first aspect of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selection data selected from the plurality of data based on received selection information; a first regression coefficient calculation unit that calculates regression coefficients based on the plurality of data received from the data collection unit, using a first calculation method that obtains regression coefficients by L1 regularization, and transmits to the data collection unit selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value; a second regression coefficient calculation unit that calculates regression coefficients based on the selection data received from the data collection unit, using a second calculation method different from the first calculation method; and a prediction unit that outputs a prediction result predicted based on the selection data received from the data collection unit and the regression coefficients calculated by the second regression coefficient calculation unit.
  • the second aspect of the present disclosure is the data prediction device of the first aspect, further including an error monitoring unit that instructs the second regression coefficient calculation unit to recalculate the regression coefficients by the second calculation method when an error between the prediction result and the actually measured value that is equal to or greater than a predetermined value has occurred a predetermined number of times or more.
  • the third aspect of the present disclosure is the data prediction device of the second aspect, in which, when the error again occurs the predetermined number of times or more after the recalculation has been instructed, the error monitoring unit instructs the first regression coefficient calculation unit to recalculate the regression coefficients by the first calculation method and to retransmit the selection information to the data collection unit.
  • the data prediction device of the fourth aspect of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selection data selected from the plurality of data based on received selection information; and a regression coefficient calculation unit that calculates regression coefficients based on the plurality of data received from the data collection unit, using a calculation method that obtains regression coefficients by L1 regularization, and provides selection information for selecting the data corresponding to the non-zero regression coefficients among the calculated regression coefficients.
  • the fifth aspect of the present disclosure is the data prediction device of the fourth aspect, further including an error monitoring unit that instructs the regression coefficient calculation unit to recalculate the regression coefficients by the calculation method when an error between the prediction result and the actually measured value that is equal to or greater than a predetermined value has occurred a predetermined number of times or more.
  • the program of the sixth aspect of the present disclosure causes a computer to: transmit a plurality of collected data, and selection data selected from the plurality of data based on received selection information; calculate regression coefficients based on the plurality of received data, using a first calculation method that obtains regression coefficients by L1 regularization, and transmit selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value; calculate regression coefficients from the received selection data, using a second calculation method different from the first calculation method; and output a prediction result predicted based on the received selection data and the regression coefficients calculated by the second calculation method.
  • the data prediction device 20 of this embodiment is illustrated in FIG. 3.
  • the data collection unit 21 collects data from each of a large number of sensors S1, S2, S3, ..., and transmits the same data to the prediction unit 22 and the learning section selection unit 23.
  • the learning section selection unit 23 selects, based on rules, the data section in which the event to be predicted is expected to appear, and sends the data to be learned to the first regression coefficient calculation unit 25 and the second regression coefficient calculation unit 24. That is, the learning section selection unit 23 removes unnecessary data and data with little change that lie in sections in which the event to be predicted does not appear.
  • the first regression coefficient calculation unit 25 calculates regression coefficients based on the plurality of data transmitted from the data collection unit 21 and selected by the learning section selection unit 23, using the first calculation method, which obtains regression coefficients of an approximate solution by L1 regularization. It then selects, from the calculated regression coefficients, those whose absolute value is equal to or greater than the threshold value, and transmits information indicating the selected regression coefficients to each of the data collection unit 21 and the second regression coefficient calculation unit 24.
  • the data collection unit 21 selects, from the plurality of collected data and based on the information received from the first regression coefficient calculation unit 25, the data corresponding to regression coefficients whose absolute value is equal to or greater than the threshold value, and transmits the selected data as selection data to each of the prediction unit 22, the learning section selection unit 23, and the error monitoring unit 26.
  • the information transmitted from the first regression coefficient calculation unit 25 to the data collection unit 21 is used for selecting data in the data collection unit 21, and is therefore referred to as selection information below.
  • data corresponding to regression coefficients whose absolute value is equal to or greater than the threshold value is selected as selection data, and data corresponding to regression coefficients whose absolute value is less than the threshold value is deleted. Therefore, the selection information transmitted to the data collection unit 21 may indicate the regression coefficients whose absolute value is less than the threshold value, instead of those whose absolute value is equal to or greater than the threshold value; the data collection unit 21 then obtains the selection data by deleting the data corresponding to the regression coefficients whose absolute value is less than the threshold value.
  • the second regression coefficient calculation unit 24 obtains regression coefficients from the selection data transmitted from the data collection unit 21 via the learning section selection unit 23, using L2 regularization or multiple regression instead of L1 regularization.
  • the regression coefficient calculated by the second regression coefficient calculation unit 24 is transmitted to the prediction unit 22 in which prediction is performed by regression calculation using L2 regularization or multiple regression.
  • the data collection unit 21 transmits the selection data to the prediction unit 22 and the learning section selection unit 23. That is, the data collection unit 21 collects data from all of the sensors S1, S2, S3, ..., but transmits the data selectively.
  • the prediction unit 22 outputs the prediction result predicted by the linear regression equation using the selection data received from the data collection unit 21 and the regression coefficient received from the second regression coefficient calculation unit 24.
  • the prediction result is calculated not from the data corresponding to regression coefficients whose absolute value is less than the threshold value, but only from the data corresponding to regression coefficients whose absolute value is equal to or greater than the threshold value.
  • the amount of data used in the learning process can be reduced, and the amount of calculation processing can be reduced.
  • the error monitoring unit 26 monitors the error, which is the difference between the prediction result of the prediction unit 22 and the actually measured value collected by the data collection unit 21, and when an error of a predetermined value or more occurs a predetermined number of times (allowable number of times) or more, it transmits a reselection instruction to the data collection unit 21 and the first regression coefficient calculation unit 25.
  • the data collection unit 21 then transmits all the data, not the selection data, to the subsequent stages for a certain period until reselection of the selection data is completed, and the first regression coefficient calculation unit 25 again selects data by applying L1 regularization to all the data.
  • the selection data is reselected only when the error of the prediction result obtained by L2 regularization or multiple regression is larger than the error of the prediction result obtained by L1 regularization; if it is less than or equal to the error of the prediction result obtained by L1 regularization, reselection is not performed.
  • the error monitoring unit 26 monitors the error, which is the difference between the prediction result of the prediction unit 22 and the actually measured value, and when an error of a predetermined value or more occurs a predetermined number of times (allowable number of times) or more, the second regression coefficient calculation unit 24 may recalculate the regression coefficients. If, after the recalculation, an error of a predetermined value or more again occurs a predetermined number of times (allowable number of times) or more, a reselection instruction may be transmitted to the data collection unit 21 and the first regression coefficient calculation unit 25.
  • FIG. 4 illustrates the hardware configuration of the data prediction device 20.
  • the data prediction device 20 includes a CPU (Central Processing Unit) 51, a primary storage unit 52, a secondary storage unit 53, and an external interface 54, as shown in FIG. 4.
  • the CPU 51 is an example of a hardware processor.
  • the CPU 51, the primary storage unit 52, the secondary storage unit 53, and the external interface 54 are connected to each other via the bus 59.
  • the primary storage unit 52 is, for example, a volatile memory such as a RAM (Random Access Memory).
  • the secondary storage unit 53 is, for example, a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • the secondary storage unit 53 includes a program storage area 53A and a data storage area 53B.
  • the program storage area 53A stores a program such as a data prediction program as an example.
  • the data storage area 53B stores data from the sensor, intermediate data during data prediction processing, and the like.
  • the CPU 51 reads the data prediction program from the program storage area 53A and loads it into the primary storage unit 52. By loading and executing the data prediction program, the CPU 51 operates as the data collection unit 21, the prediction unit 22, the learning section selection unit 23, the second regression coefficient calculation unit 24, the first regression coefficient calculation unit 25, and the error monitoring unit 26.
  • a program such as the data prediction program may be stored on an external server and loaded into the primary storage unit 52 via a network. A program such as the data prediction program may also be stored on a non-transitory recording medium such as a DVD (Digital Versatile Disc) and loaded into the primary storage unit 52 via a recording medium reading device.
  • FIG. 4 shows an example in which the sensor 31A and the danger notification system 31B are connected to the external interface 54.
  • the sensor 31A includes a large number of sensors.
  • the data predicted by the data prediction device 20 may be transmitted to, for example, the danger notification system 31B connected to the external interface 54 and used for the danger notification processing in the danger notification system 31B. The predicted data may also be recorded in an external storage device connected to the external interface 54, or displayed as characters or images on the screen of a display connected to the external interface 54.
  • the data prediction device 20 may be a dedicated device or a general-purpose device such as a workstation, a personal computer, or a tablet.
  • FIGS. 5 to 7 illustrate the data prediction process of this embodiment.
  • FIG. 5 illustrates the flow of the learning phase.
  • in step S101, the data collection unit 21 collects data from, for example, the sensor 31A, which includes a large number of sensors.
  • in step S102, the data collection unit 21 transmits the collected data to the learning section selection unit 23.
  • in step S103, the learning section selection unit 23 selects the data in the data section that appears suitable for prediction and transmits it to the second regression coefficient calculation unit 24 and the first regression coefficient calculation unit 25.
  • in step S104, the first regression coefficient calculation unit 25 calculates regression coefficients using L1 regularization, selects the regression coefficients whose absolute value is equal to or greater than the threshold value, and transmits them to the second regression coefficient calculation unit 24 and the data collection unit 21.
  • in step S105, the second regression coefficient calculation unit 24 obtains regression coefficients from the data selected by the first regression coefficient calculation unit 25 by an ordinary regression calculation, for example L2 regularization or multiple regression, and transmits them to the prediction unit 22.
  • FIG. 6 illustrates the flow of the prediction phase.
  • in step S106, the data collection unit 21 transmits the data selected in step S104 to the prediction unit 22 and the error monitoring unit 26.
  • in step S107, the prediction unit 22 calculates the prediction result from the data and the regression coefficients obtained in step S105, transmits it to the error monitoring unit 26, and stores it in, for example, an external storage device.
  • FIG. 7 illustrates the flow of the error monitoring and reselection phase.
  • in step S108, the error monitoring unit 26 calculates the prediction error between the prediction result and the data (actually measured value), and when a prediction error of a predetermined value or more occurs a predetermined number of times (allowable number of times) or more, it transmits a reselection instruction to the first regression coefficient calculation unit 25. When it transmits the reselection instruction in step S108, the error monitoring unit 26 also instructs the data collection unit 21 to transmit all the data collected from the sensors and the like to the subsequent stages (the learning section selection unit 23, the error monitoring unit 26, and the prediction unit 22) until the reselection is completed.
  • the prediction error tends to be larger than when L1 regularization is used for the prediction, but it can be kept within a predetermined range by reselecting the data when the prediction error becomes large, or at regular intervals.
  • the data prediction device 20 may include a data collection device 32A, a learning device 32B, and a prediction device 32C, as illustrated in FIG. 8.
  • the data collection device 32A includes the data collection unit 21; the learning device 32B includes the learning section selection unit 23, the second regression coefficient calculation unit 24, the first regression coefficient calculation unit 25, and the error monitoring unit 26; and the prediction device 32C includes the prediction unit 22.
  • the sensor 31A and the data collection unit 21, and the prediction unit 22 and the output destination of the prediction result such as the danger notification system 31B, are connected by transmission lines. Further, in the example shown in FIG. 8, the data collection unit 21 and the learning section selection unit 23, and the second regression coefficient calculation unit 24 and the prediction unit 22, are also connected by transmission lines.
  • FIG. 9 illustrates a data prediction device 40 of a modified example of this embodiment.
  • the description of the configuration and operation similar to that of the data prediction device 20 will be omitted as appropriate.
  • the data prediction device 40 is different from the data prediction device 20 of FIG. 3 in that the regression coefficient calculation unit 44 is included in place of the second regression coefficient calculation unit 24 and the first regression coefficient calculation unit 25.
  • the regression coefficient calculation unit 44 of the data prediction device 40 calculates the regression coefficient using L1 regularization.
  • the regression coefficient calculation unit 44 transmits the non-zero regression coefficients to the data collection unit 41 and the prediction unit 42.
  • the data collection unit 41 selects, from the plurality of collected data and based on the information received from the regression coefficient calculation unit 44, the data corresponding to the non-zero regression coefficients, and transmits the selected data as selection data to each of the prediction unit 42, the learning section selection unit 43, and the error monitoring unit 46.
  • in the data prediction device 40, selecting the data corresponding to non-zero regression coefficients and not selecting the data corresponding to zero regression coefficients can be regarded as performing data selection together with the calculation of the regression coefficients.
  • data selection and regression coefficient calculation can be performed at the same time, so the calculation cost can be reduced.
  • the data prediction device of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selection data selected from the plurality of data based on received selection information; a first regression coefficient calculation unit that calculates regression coefficients using a first calculation method that obtains regression coefficients by L1 regularization, and transmits the selection information to the data collection unit; a second regression coefficient calculation unit that calculates regression coefficients based on the selection data received from the data collection unit, using a second calculation method different from the first calculation method; and a prediction unit that outputs a prediction result predicted based on the selection data received from the data collection unit and the regression coefficients calculated by the second regression coefficient calculation unit.
  • this makes it possible to reduce the number of data, reduce the line capacity for collecting and transmitting data, and reduce the amount of calculation processing in the learning process.
  • the data collection unit 21 may reduce the number of sensors from which it receives data.
  • because the calculation by the second regression coefficient calculation unit 24 is an ordinary regression calculation, the calculation cost can be reduced, and processing is possible even in an environment with few computing resources.
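
The two-step escalation described in the list above (recalculate with the second calculation method first, then reselect with L1 regularization if large errors persist) can be sketched as a small counter-based monitor. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `ErrorMonitor`, the parameters `max_error` and `allowed_count`, the returned action strings, and the choice to reset the counter after each escalation are all hypothetical.

```python
class ErrorMonitor:
    """Counts large prediction errors and escalates in two steps.

    max_error and allowed_count stand in for the "predetermined value" and
    the "predetermined number of times"; both names are hypothetical.
    """

    def __init__(self, max_error, allowed_count):
        self.max_error = max_error
        self.allowed_count = allowed_count
        self.count = 0
        self.recalculated = False

    def observe(self, predicted, measured):
        """Return "ok", "recalculate" (second method) or "reselect" (L1 again)."""
        if abs(predicted - measured) >= self.max_error:
            self.count += 1
        if self.count < self.allowed_count:
            return "ok"
        # Assumption: the counter restarts after every escalation.
        self.count = 0
        if not self.recalculated:
            self.recalculated = True
            return "recalculate"
        self.recalculated = False
        return "reselect"


# Hypothetical (prediction, measurement) pairs.
pairs = [(10.0, 10.2), (10.0, 12.0), (10.0, 8.5),
         (10.0, 10.1), (10.0, 11.5), (10.0, 13.0)]
monitor = ErrorMonitor(max_error=1.0, allowed_count=2)
actions = [monitor.observe(p, m) for p, m in pairs]
print(actions)
```

In a fuller system the `"recalculate"` action would be routed to the second regression coefficient calculation unit, and `"reselect"` to the data collection unit and the first regression coefficient calculation unit.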

Abstract

The purpose of the present invention is to reduce the quantity of data used to calculate regression coefficients, reduce the bandwidth needed for collecting and transmitting data, and reduce the amount of computation in a learning process. A data prediction device according to the present disclosure includes: a data collection unit that transmits a collected plurality of data, and selected data that was selected from the plurality of data; a first regression coefficient computation unit that computes regression coefficients on the basis of the received plurality of data, using a first computation method that calculates regression coefficients by L1 regularization, and transmits to the data collection unit selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value; a second regression coefficient computation unit that computes regression coefficients on the basis of the received selected data, using a second computation method different from the first computation method; and a prediction unit that outputs a prediction result predicted on the basis of the received selected data and the regression coefficients computed by the second regression coefficient computation unit.

Description

Data prediction device, data prediction method, and data prediction program
The present disclosure relates to a data prediction device, a data prediction method, and a data prediction program.
When predicting the near-future value of a specific data item from data acquired from a large number of sensors installed at various locations, prediction by linear regression, shown in FIG. 1, is sometimes performed, as illustrated in Eq. (1). Of the data recorded per unit time from time t-3 to time t by the four sensors 1 to 4 illustrated in FIG. 1, Eq. (1) below uses the 12 data points from time t-2 to time t to predict the data $D_{1,t+1}$ of sensor 1 at time t+1.

$$
\begin{aligned}
&\beta_0 + \beta_1 D_{1,t-2} + \beta_2 D_{2,t-2} + \beta_3 D_{3,t-2} + \beta_4 D_{4,t-2} \\
&\quad + \beta_5 D_{1,t-1} + \beta_6 D_{2,t-1} + \beta_7 D_{3,t-1} + \beta_8 D_{4,t-1} \\
&\quad + \beta_9 D_{1,t} + \beta_{10} D_{2,t} + \beta_{11} D_{3,t} + \beta_{12} D_{4,t} = D_{1,t+1}
\end{aligned}
\tag{1}
$$
Here, $\beta_k$ ($k = 0, \dots, 12$) are the regression coefficients, and $D_{l,t-m}$ is a data point drawn as a circle in FIG. 1, where $l$ is the sensor number, $t-m$ is the time, and $m$ is a natural number.
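
As a concrete reading of Eq. (1), the sketch below flattens the last three time steps of four sensors into 12 explanatory values and evaluates the weighted sum. The function name `predict_next`, the `history` layout (oldest step first, one list of readings per step), and the coefficient values are assumptions made only for illustration.

```python
def predict_next(beta, history, lags=3):
    """Evaluate Eq. (1): beta[0] + sum of beta[k] * D[l, t-m].

    history is a list of time steps, oldest first; each step is a list
    of readings, one per sensor. Only the last `lags` steps are used.
    """
    window = history[-lags:]
    # Flatten in the order of Eq. (1): all sensors at t-2, then t-1, then t.
    features = [value for step in window for value in step]
    if len(beta) != len(features) + 1:
        raise ValueError("expected %d coefficients" % (len(features) + 1))
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))


# Four sensors, four time steps (t-3 .. t); the oldest step is ignored.
history = [
    [9.0, 9.0, 9.0, 9.0],   # t-3 (unused)
    [1.0, 2.0, 3.0, 4.0],   # t-2
    [1.5, 2.5, 3.5, 4.5],   # t-1
    [2.0, 3.0, 4.0, 5.0],   # t
]
# Hypothetical coefficients: intercept 0.5 and weight 1.0 on sensor 1 at
# time t (beta_9 in Eq. (1)); every other coefficient is zero.
beta = [0.5] + [0.0] * 8 + [1.0, 0.0, 0.0, 0.0]
prediction = predict_next(beta, history)
print(prediction)
```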
In detail, the data is predicted by the following learning process and prediction process. First, the past data of each sensor that seems to be related to the data to be predicted is collected to create a linear regression equation, the data of the sample section in which the phenomenon to be predicted (a distinctive movement of the prediction target) occurs is substituted into the variables of the linear regression equation, and the regression coefficients $\beta_k$ are obtained by solving simultaneous equations (learning process). Next, the predicted value is calculated continuously from the sensor data using the obtained regression coefficients $\beta_k$ (prediction process).
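
The learning and prediction processes just described can be sketched as follows. The text only says that the coefficients are found "by simultaneous equations"; solving the least-squares normal equations by Gaussian elimination is one common way to do that and is an assumption here, as are the function names and the toy data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few independent samples")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_least_squares(samples, targets):
    """Learning process: solve the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(s) for s in samples]  # leading 1.0 pairs with beta_0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, sample):
    """Prediction process: substitute new data into the regression equation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], sample))


# Toy sample section generated from y = 1 + 2*x1 - 3*x2 (hypothetical data).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in samples]
beta = fit_least_squares(samples, targets)
print(beta, predict(beta, (3.0, 2.0)))
```

With fewer independent samples than coefficients, `fit_least_squares` raises `ValueError` because the system has no unique solution.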
When data covering a long past period from a very large number of sensors is used, the data of the sample section may be insufficient, and the simultaneous equations may have no solution. For example, when predicting the water level of a river, mesh-shaped rainfall forecast values created by the Japan Meteorological Agency can be used instead of, or in addition to, the data from the sensors. In that case, however, the number of regression coefficients increases further, and the data of the sample section may be insufficient.
When the data of the sample section is insufficient, the regression coefficients of an approximate solution can be obtained by using a method called L1 regularization (Non-Patent Document 1), as illustrated in Eq. (2). L1 regularization has the property of driving regression coefficients that contribute little to the prediction toward zero.

[Eq. (2): equation image JPOXMLDOC01-appb-M000001, not reproduced in the text]

Here, $y_i$ is the dependent variable, $x_{ij}$ is the explanatory variable, and $t_c$ is the adjustment parameter.
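
The shrink-to-zero property of L1 regularization can be seen in a small coordinate-descent sketch. Eq. (2) is available here only as an image placeholder, so the code uses the common penalized least-squares form with a penalty weight `lam`, which may differ in notation from the original equation. The function names, `lam`, and the toy data are assumptions.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; results inside [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_lasso(samples, targets, lam, sweeps=200):
    """Coordinate descent for 0.5 * sum(residual^2) + lam * sum(|beta_j|).

    Returns (intercept, coefficients). The intercept is not penalized.
    """
    n, p = len(samples), len(samples[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(sweeps):
        # Refit the intercept as the mean of the current partial residuals.
        intercept = sum(
            y - sum(b * x for b, x in zip(beta, row))
            for row, y in zip(samples, targets)
        ) / n
        for j in range(p):
            rho, norm = 0.0, 0.0
            for row, y in zip(samples, targets):
                fitted_without_j = intercept + sum(
                    b * x for k, (b, x) in enumerate(zip(beta, row)) if k != j
                )
                rho += row[j] * (y - fitted_without_j)
                norm += row[j] * row[j]
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0.0 else 0.0
    return intercept, beta


# Hypothetical data: three explanatory variables, mutually orthogonal and
# centered; the target depends strongly on the first and weakly on the second.
samples = [(-2.0, 1.0, -1.0), (-1.0, -1.0, 2.0), (0.0, 0.0, 0.0),
           (1.0, -1.0, -2.0), (2.0, 1.0, 1.0)]
targets = [3.0 + 2.0 * a + 0.1 * b for a, b, _ in samples]
intercept, coefficients = fit_lasso(samples, targets, lam=1.0)
print(intercept, coefficients)
```

In this toy case the weak and the unrelated variables end up with coefficients of exactly zero, while the strong one is kept but shrunk slightly below its true value of 2.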
FIG. 2 shows a data prediction device 10 of the related art, configured using L1 regularization. The data collection unit 11 collects data from each of a large number of sensors and transmits the same data to the prediction unit 12 and the learning section selection unit 13. The learning section selection unit 13 selects, based on rules, the data in a data section in which the event to be predicted is likely to appear, and transmits the selected data to the regression coefficient calculation unit 14 as the data to be learned. The regression coefficient calculation unit 14 obtains regression coefficients using L1 regularization and transmits them to the prediction unit 12. The prediction unit 12 continuously calculates the predicted value by substituting the regression coefficients received from the regression coefficient calculation unit 14 and the data received from the data collection unit 11 into the linear regression equation.
L1 regularization makes it possible to obtain regression coefficients even when many explanatory variables are used. However, the range of sensors and the range of measurement times that affect the prediction are not known in advance, so both ranges have to be set generously. As a result, the number of data items used as input becomes very large, which strains the capacity of the lines that collect and transmit the data and makes the computational load of the learning process enormous.
An object of the present disclosure is to reduce the number of data items used when obtaining regression coefficients, to reduce the line capacity needed to collect and transmit data, and to reduce the computational load of the learning process.
A data prediction device according to a first aspect of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selected data that is selected from the plurality of data on the basis of received selection information; a first regression coefficient calculation unit that calculates regression coefficients on the basis of the plurality of data received from the data collection unit, using a first calculation method that obtains regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold; a second regression coefficient calculation unit that calculates regression coefficients on the basis of the selected data received from the data collection unit, using a second calculation method different from the first calculation method; and a prediction unit that outputs a prediction result predicted on the basis of the selected data received from the data collection unit and the regression coefficients calculated by the second regression coefficient calculation unit.
A second aspect of the present disclosure is the data prediction device of the first aspect, further including an error monitoring unit that instructs the second regression coefficient calculation unit to recalculate the regression coefficients by the second calculation method when an error between the prediction result and a measured value that is equal to or greater than a predetermined value has occurred a predetermined number of times or more.
A third aspect of the present disclosure is the data prediction device of the second aspect, in which, when the number of occurrences of the error again reaches the predetermined number or more after the recalculation instruction has been issued, the error monitoring unit instructs the first regression coefficient calculation unit to recalculate the regression coefficients by the first calculation method and to retransmit the selection information to the data collection unit.
A data prediction device according to a fourth aspect of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selected data that is selected from the plurality of data on the basis of received selection information; a regression coefficient calculation unit that calculates regression coefficients on the basis of the plurality of data received from the data collection unit, using a calculation method that obtains regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to the non-zero regression coefficients among the calculated regression coefficients; and a prediction unit that outputs a prediction result predicted on the basis of the selected data received from the data collection unit and the non-zero regression coefficients calculated by the regression coefficient calculation unit.
A fifth aspect of the present disclosure is the data prediction device of the fourth aspect, further including an error monitoring unit that instructs the regression coefficient calculation unit to recalculate the regression coefficients by the calculation method when an error between the prediction result and a measured value that is equal to or greater than a predetermined value has occurred a predetermined number of times or more.
A program according to a sixth aspect of the present disclosure causes a computer to execute data prediction processing that includes: transmitting a plurality of collected data, and selected data that is selected from the plurality of data on the basis of received selection information; calculating regression coefficients on the basis of the received plurality of data, using a first calculation method that obtains regression coefficients by L1 regularization, and transmitting selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold; calculating regression coefficients from the received selected data, using a second calculation method different from the first calculation method; and outputting a prediction result predicted on the basis of the received selected data and the regression coefficients calculated using the second calculation method.
In a data prediction method according to a seventh aspect of the present disclosure, a computer: transmits a plurality of collected data, and selected data that is selected from the plurality of data on the basis of received selection information; calculates regression coefficients on the basis of the received plurality of data, using a first calculation method that obtains regression coefficients by L1 regularization, and transmits selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold; calculates regression coefficients from the received selected data, using a second calculation method different from the first calculation method; and outputs a prediction result predicted on the basis of the received selected data and the regression coefficients calculated using the second calculation method.
According to the present disclosure, the number of data items can be reduced, the line capacity needed to collect and transmit data can be reduced, and the computational load of the learning process can be reduced.
FIG. 1 is a schematic diagram explaining the prediction of data by a linear regression equation.
FIG. 2 is a block diagram illustrating a data prediction device of the related art.
FIG. 3 is a block diagram illustrating the data prediction device of the present embodiment.
FIG. 4 is a block diagram illustrating the hardware configuration of the data prediction device of the present embodiment.
FIG. 5 is a schematic diagram illustrating the processing flow of the learning phase of the present embodiment.
FIG. 6 is a schematic diagram illustrating the processing flow of the prediction phase of the present embodiment.
FIG. 7 is a schematic diagram illustrating the processing flow of the reselection phase of the present embodiment.
FIG. 8 is a block diagram illustrating the data prediction device of the present embodiment.
FIG. 9 is a block diagram illustrating a modification of the data prediction device of the present embodiment.
FIG. 3 illustrates the data prediction device 20 of the present embodiment. The data collection unit 21 collects data from each of a large number of sensors S_1, S_2, S_3, ... and transmits the same data to the prediction unit 22 and the learning section selection unit 23. The data that the data collection unit 21 transmits to the prediction unit 22 and the learning section selection unit 23 is either the plurality of data collected from the sensors S_1, S_2, S_3, ..., or selected data that is selected from the collected data on the basis of selection information received from the first regression coefficient calculation unit 25 described later.
The learning section selection unit 23 uses rules to select a data section in which the event to be predicted is expected to appear, and transmits the data to be learned to the first regression coefficient calculation unit 25 and the second regression coefficient calculation unit 24. In other words, the learning section selection unit 23 removes unnecessary data and data with little variation that lie in sections where the event to be predicted is not expected to appear.
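The disclosure does not state the rules used by the learning section selection unit 23. As a minimal sketch under an assumed rule, the following code keeps only fixed-length sections whose value range is at least `min_change`, which discards sections with little variation. The function name, the window length, and the threshold are all hypothetical.

```python
def select_learning_sections(series, window, min_change):
    """Return (start, end) index pairs of non-overlapping windows whose
    value range (max - min) is at least min_change."""
    sections = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        if max(chunk) - min(chunk) >= min_change:
            sections.append((start, start + window))
    return sections


# Hypothetical water-level readings: flat, then a rise and fall, then flat.
levels = [1.0, 1.0, 1.1, 1.0,
          1.0, 2.5, 4.0, 3.0,
          1.2, 1.1, 1.1, 1.0]
sections = select_learning_sections(levels, window=4, min_change=1.0)
print(sections)
```

Only the middle window, where the level actually moves, is kept as learning data.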
The first regression coefficient calculation unit 25 calculates regression coefficients on the basis of the plurality of data transmitted from the data collection unit 21 and selected by the learning section selection unit 23, using a first calculation method that obtains regression coefficients of an approximate solution by L1 regularization. It then selects, from the calculated regression coefficients, those whose absolute value is equal to or greater than a threshold, and transmits information indicating the selected regression coefficients to each of the data collection unit 21 and the second regression coefficient calculation unit 24.
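The first calculation method and the threshold test can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the disclosure: it solves the penalized form of L1-regularized least squares (weight `lam`) by coordinate descent instead of the constrained form with a tuning parameter, omits the intercept, and uses hypothetical data, function names, and threshold.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def make_selection_info(beta, threshold):
    """Indices of the coefficients whose absolute value is >= threshold."""
    return [j for j, b in enumerate(beta) if abs(b) >= threshold]


# Hypothetical data: three explanatory variables, only the first matters much.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0]]
y = [2.05, 1.95, -1.95, -2.05]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selection_info = make_selection_info(beta, threshold=0.5)
print(beta, selection_info)
```

In this example the second and third coefficients are driven exactly to zero, so only index 0 appears in the selection information sent to the data collection side.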
On the basis of the information received from the first regression coefficient calculation unit 25, the data collection unit 21 selects, from the plurality of collected data, the data corresponding to the regression coefficients whose absolute value is equal to or greater than the threshold, and transmits the selected data, as selected data, to each of the prediction unit 22, the learning section selection unit 23, and the error monitoring unit 26. Because the information transmitted from the first regression coefficient calculation unit 25 to the data collection unit 21 is used by the data collection unit 21 to select data, it is referred to below as selection information.
In the data collection unit 21, the data corresponding to regression coefficients whose absolute value is equal to or greater than the threshold is selected as the selected data, and the data corresponding to regression coefficients whose absolute value is less than the threshold is removed. The selection information transmitted to the data collection unit 21 may therefore be information indicating the regression coefficients whose absolute value is less than the threshold, instead of information indicating the regression coefficients whose absolute value is equal to or greater than the threshold. In that case, the data collection unit 21 obtains the selected data by removing the data corresponding to the regression coefficients whose absolute value is less than the threshold.
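The two forms of selection information are complementary, as the following sketch illustrates. The sensor identifiers and function names are hypothetical.

```python
def select_by_keep_list(all_ids, keep_ids):
    """Selection information lists the items to keep (|coefficient| >= threshold)."""
    keep = set(keep_ids)
    return [i for i in all_ids if i in keep]


def select_by_drop_list(all_ids, drop_ids):
    """Selection information lists the items to remove (|coefficient| < threshold)."""
    drop = set(drop_ids)
    return [i for i in all_ids if i not in drop]


all_ids = ["S1", "S2", "S3", "S4"]
kept_a = select_by_keep_list(all_ids, ["S1", "S4"])
kept_b = select_by_drop_list(all_ids, ["S2", "S3"])
print(kept_a, kept_b)
```

Both calls yield the same selected data, so whichever list is shorter could be sent; that choice is a design consideration raised here, not a statement from the disclosure.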
The second regression coefficient calculation unit 24 obtains regression coefficients from the selected data that has been transmitted from the data collection unit 21 and has passed through the learning section selection unit 23, using L2 regularization or multiple regression rather than L1 regularization.
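A minimal sketch of the second calculation method follows, assuming a fit through the normal equations with an optional L2 penalty `alpha`: `alpha = 0` corresponds to ordinary multiple regression and `alpha > 0` to L2 regularization. The intercept is omitted, the data and names are hypothetical, and a singular system would raise a division error in this simple solver.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_selected(X, y, selected, alpha=0.0):
    """Fit coefficients for the selected columns only.
    Solves (Xs^T Xs + alpha*I) b = Xs^T y."""
    cols = [[row[j] for row in X] for j in selected]
    k = len(cols)
    gram = [[sum(u * v for u, v in zip(cols[r], cols[c]))
             + (alpha if r == c else 0.0)
             for c in range(k)] for r in range(k)]
    rhs = [sum(u * t for u, t in zip(cols[r], y)) for r in range(k)]
    return solve_linear_system(gram, rhs)


# Hypothetical data; only column 0 was selected by the first stage.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, 1.0, 1.0], [-1.0, -1.0, 1.0]]
y = [2.05, 1.95, -1.95, -2.05]
ols_coef = fit_selected(X, y, selected=[0])               # multiple regression
ridge_coef = fit_selected(X, y, selected=[0], alpha=4.0)  # L2 regularization
print(ols_coef, ridge_coef)
```

Because only the selected columns enter the system, its size depends on the number of selected data items rather than on the number of sensors.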
The regression coefficients calculated by the second regression coefficient calculation unit 24 are transmitted to the prediction unit 22, which performs prediction by a regression calculation based on L2 regularization or multiple regression. The data collection unit 21 transmits the selected data to the prediction unit 22 and the learning section selection unit 23. That is, the data collection unit 21 collects data from every one of the sensors S_1, S_2, S_3, ..., but transmits the data selectively.
The prediction unit 22 outputs a prediction result obtained from the linear regression equation, using the selected data received from the data collection unit 21 and the regression coefficients received from the second regression coefficient calculation unit 24.
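The prediction step amounts to evaluating a linear regression equation on the selected data. The sketch below assumes a plain weighted sum with an optional intercept; the intercept parameter, the function name, and the numbers are illustrative assumptions.

```python
def predict(coefficients, selected_data, intercept=0.0):
    """Evaluate intercept + sum(coefficient * value) over the selected data."""
    if len(coefficients) != len(selected_data):
        raise ValueError("one coefficient is required per selected data item")
    return intercept + sum(c * d for c, d in zip(coefficients, selected_data))


# Two selected data items and their coefficients (hypothetical values).
prediction = predict([2.0, -0.5], [3.0, 4.0])
print(prediction)  # 2.0*3.0 + (-0.5)*4.0 = 4.0
```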
That is, the data corresponding to regression coefficients whose absolute value is less than the threshold is removed, and the prediction result is calculated from the regression coefficients whose absolute value is equal to or greater than the threshold and the data corresponding to those coefficients. The amount of data used in the learning process is therefore reduced, and the computational load can be reduced.
The error monitoring unit 26 monitors the error, that is, the difference between the prediction result from the prediction unit 22 and the measured value, which is the data collected by the data collection unit 21. When an error equal to or greater than a predetermined value has occurred a predetermined number of times (the allowable count) or more, the error monitoring unit 26 transmits a reselection instruction to the data collection unit 21 and the first regression coefficient calculation unit 25. In response to this instruction, the data collection unit 21 transmits all of the data, rather than the selected data, to the subsequent stages for a certain period until reselection of the selected data is complete, and the first regression coefficient calculation unit 25 again selects data by applying L1 regularization to all of the data. The selected data is reselected only when the error of the prediction result obtained with L2 regularization or multiple regression is larger than the error of the prediction result obtained with L1 regularization; when the former is equal to or smaller than the latter, reselection is not performed.
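The final condition, reselecting only when the second method predicts worse than L1 regularization, can be sketched as a simple comparison. The disclosure does not say how the two errors are measured; the mean absolute error used here, the function names, and the numbers are assumptions.

```python
def mean_absolute_error(predictions, measurements):
    """Average absolute difference between predictions and measured values."""
    return sum(abs(p - m) for p, m in zip(predictions, measurements)) / len(measurements)


def should_reselect(second_method_error, l1_error):
    """Reselect only when the second method is strictly worse than L1."""
    return second_method_error > l1_error


measured = [1.0, 2.0, 3.0]
l1_predictions = [1.1, 2.1, 2.9]      # hypothetical L1-based predictions
second_predictions = [1.5, 2.5, 3.5]  # hypothetical L2 or multiple regression

reselect = should_reselect(
    mean_absolute_error(second_predictions, measured),
    mean_absolute_error(l1_predictions, measured),
)
print(reselect)
```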
Alternatively, the error monitoring unit 26 may monitor the error between the prediction result from the prediction unit 22 and the measured value and, when an error equal to or greater than a predetermined value has occurred a predetermined number of times (the allowable count) or more, cause the second regression coefficient calculation unit 24 to recalculate the regression coefficients. If, after the recalculation, an error equal to or greater than the predetermined value again occurs the predetermined number of times (the allowable count) or more, the error monitoring unit 26 may transmit a reselection instruction to the data collection unit 21 and the first regression coefficient calculation unit 25.
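The two-stage escalation described above can be sketched as a small state machine. The class name, the action strings, and the choice to reset the counter after each instruction are assumptions made for illustration.

```python
class ErrorMonitor:
    """Counts large prediction errors and escalates in two stages:
    first a coefficient recalculation, then a data reselection."""

    def __init__(self, error_limit, allowed_count):
        self.error_limit = error_limit      # the "predetermined value"
        self.allowed_count = allowed_count  # the "predetermined number of times"
        self.count = 0
        self.recalculated = False

    def observe(self, predicted, measured):
        if abs(predicted - measured) >= self.error_limit:
            self.count += 1
        if self.count < self.allowed_count:
            return "ok"
        self.count = 0  # assumed: start counting afresh after each instruction
        if not self.recalculated:
            self.recalculated = True
            return "recalculate"  # second method refits on the same selection
        self.recalculated = False
        return "reselect"         # first method (L1) picks the data again


monitor = ErrorMonitor(error_limit=1.0, allowed_count=2)
pairs = [(0.0, 2.0), (0.0, 2.0), (0.0, 0.5), (0.0, 3.0), (0.0, 3.0)]
actions = [monitor.observe(p, m) for p, m in pairs]
print(actions)
```

Recalculating first is the cheaper step, because it reuses the current selected data; reselection requires all data to be sent again.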
FIG. 4 illustrates the hardware configuration of the data prediction device 20. As an example, the data prediction device 20 includes a CPU (Central Processing Unit) 51, a primary storage unit 52, a secondary storage unit 53, and an external interface 54, as shown in FIG. 4. The CPU 51 is an example of a hardware processor. The CPU 51, the primary storage unit 52, the secondary storage unit 53, and the external interface 54 are connected to one another via a bus 59.
The primary storage unit 52 is a volatile memory such as a RAM (Random Access Memory). The secondary storage unit 53 is a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
The secondary storage unit 53 includes a program storage area 53A and a data storage area 53B. The program storage area 53A stores programs such as the data prediction program. The data storage area 53B stores, for example, data from the sensors and intermediate data generated during the data prediction processing.
The CPU 51 reads the data prediction program from the program storage area 53A and loads it into the primary storage unit 52. By loading and executing the data prediction program, the CPU 51 operates as the data collection unit 21, the prediction unit 22, the learning section selection unit 23, the second regression coefficient calculation unit 24, the first regression coefficient calculation unit 25, and the error monitoring unit 26 of FIG. 3.
Programs such as the data prediction program may instead be stored on an external server and loaded into the primary storage unit 52 via a network. They may also be stored on a non-transitory recording medium such as a DVD (Digital Versatile Disc) and loaded into the primary storage unit 52 via a recording medium reader.
External devices are connected to the external interface 54, which handles the transmission and reception of various information between the external devices and the CPU 51. FIG. 4 shows an example in which a sensor 31A and a danger notification system 31B are connected to the external interface 54. The sensor 31A includes a large number of sensors.
The data predicted by the data prediction device 20 may be transmitted, for example, to the danger notification system 31B connected to the external interface 54 and used for danger notification processing in that system. The predicted data may also be recorded in an external storage device connected to the external interface 54, or displayed as text or images on a display connected to the external interface 54.
The data prediction device 20 may be a dedicated device or a general-purpose device such as a workstation, a personal computer, or a tablet.
FIGS. 5 to 7 illustrate the data prediction processing of the present embodiment. FIG. 5 illustrates the flow of the learning phase.
In step S101, the data collection unit 21 collects data from, for example, the sensor 31A, which includes a large number of sensors. In step S102, the data collection unit 21 transmits the collected data to the learning section selection unit 23.
In step S103, the learning section selection unit 23 selects the data in a data section that appears suitable for prediction and transmits it to the second regression coefficient calculation unit 24 and the first regression coefficient calculation unit 25. The first regression coefficient calculation unit 25 calculates regression coefficients using L1 regularization, selects those whose absolute value is equal to or greater than the threshold, and transmits them to the second regression coefficient calculation unit 24 and the data collection unit 21.
The second regression coefficient calculation unit 24 obtains regression coefficients by an ordinary regression calculation, for example L2 regularization or multiple regression, on the data selected by the first regression coefficient calculation unit 25, and transmits them to the prediction unit 22.
FIG. 6 illustrates the flow of the prediction phase. In step S106, the data collection unit 21 transmits the data selected in step S104 to the prediction unit 22 and the error monitoring unit 26. In step S107, the prediction unit 22 calculates a prediction result on the basis of the data and the regression coefficients obtained in step S105, transmits it to the error monitoring unit 26, and stores it in, for example, an external storage device.
FIG. 7 illustrates the flow of the error monitoring and reselection phase. In step S108, the error monitoring unit 26 calculates the prediction error between the prediction result and the data (measured value) and, when a prediction error equal to or greater than a predetermined value has occurred a predetermined number of times (the allowable count) or more, transmits a reselection instruction to the first regression coefficient calculation unit 25. When it has transmitted the reselection instruction in step S108, the error monitoring unit 26 also instructs the data collection unit 21 to transmit all of the data collected from the sensors and other sources to the subsequent stages (the learning section selection unit 23, the error monitoring unit 26, and the prediction unit 22) until reselection is complete.
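The switching between selective and full transmission on the data collection side can be sketched as follows. The class, its method names, and the sensor identifiers are hypothetical, and the sketch assumes that everything is sent until a first selection exists.

```python
class DataCollector:
    """Forwards either all collected readings or only the selected ones."""

    def __init__(self):
        self.selected_ids = set()
        self.send_all = True  # assumed: no selection exists at start-up

    def start_reselection(self):
        """Reselection instruction received: send everything again."""
        self.send_all = True

    def finish_reselection(self, selected_ids):
        """New selection information received: go back to selective sending."""
        self.selected_ids = set(selected_ids)
        self.send_all = False

    def outgoing(self, readings):
        """Return the readings to transmit to the subsequent stages."""
        if self.send_all:
            return dict(readings)
        return {k: v for k, v in readings.items() if k in self.selected_ids}


readings = {"S1": 0.4, "S2": 1.7, "S3": 0.9}
collector = DataCollector()
collector.finish_reselection(["S1", "S3"])
normal = collector.outgoing(readings)        # selected data only
collector.start_reselection()
during = collector.outgoing(readings)        # all data while reselecting
collector.finish_reselection(["S2"])
after = collector.outgoing(readings)         # new selection applies
print(normal, during, after)
```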
According to the present embodiment, the prediction error may tend to be larger than when L1 regularization itself is used for the prediction. However, the prediction error can be kept within a predetermined range by reselecting the data when the prediction error is large, or periodically.
As illustrated in FIG. 8, the data prediction device 20 may include a data collection device 32A, a learning device 32B, and a prediction device 32C. The data collection device 32A includes the data collection unit 21; the learning device 32B includes the learning section selection unit 23, the second regression coefficient calculation unit 24, the first regression coefficient calculation unit 25, and the error monitoring unit 26; and the prediction device 32C includes the prediction unit 22.
Transmission lines connect the sensor 31A to the data collection unit 21, and the prediction unit 22 to the output destination of the prediction result, such as the danger notification system 31B. In the example shown in FIG. 8, transmission lines also connect the data collection unit 21 to the learning section selection unit 23, and the second regression coefficient calculation unit 24 to the prediction unit 22.
FIG. 9 illustrates a data prediction device 40 according to a modification of the present embodiment. Descriptions of configurations and operations that are the same as those of the data prediction device 20 are omitted as appropriate. The data prediction device 40 differs from the data prediction device 20 of FIG. 3 in that it includes a regression coefficient calculation unit 44 in place of the second regression coefficient calculation unit 24 and the first regression coefficient calculation unit 25. The regression coefficient calculation unit 44 calculates regression coefficients using L1 regularization and transmits the non-zero regression coefficients to the data collection unit 41 and the prediction unit 42.
On the basis of the information received from the regression coefficient calculation unit 44, the data collection unit 41 selects, from the plurality of collected data, the data corresponding to the non-zero regression coefficients, and transmits the selected data, as selected data, to each of the prediction unit 42, the learning section selection unit 43, and the error monitoring unit 46.
In the data prediction device 40, the data corresponding to non-zero regression coefficients is selected and the data corresponding to zero regression coefficients is not. Data selection is thus regarded as being performed together with the calculation of the regression coefficients.
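In this modification, a single L1 result provides both the selection and the coefficients used for prediction. The following sketch shows that dropping the data for zero coefficients leaves the prediction unchanged; the coefficient values, the readings, and the function name are hypothetical.

```python
def nonzero_selection(beta):
    """Return the indices of non-zero coefficients and those coefficients."""
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return selected, [beta[j] for j in selected]


# Hypothetical result of an L1-regularized fit: two coefficients are exactly zero.
beta = [1.9, 0.0, -0.3, 0.0]
selected, coefficients = nonzero_selection(beta)

readings = [1.0, 5.0, 2.0, 7.0]  # one value per original data item
full = sum(b * x for b, x in zip(beta, readings))
reduced = sum(c * readings[j] for c, j in zip(coefficients, selected))
print(selected, full, reduced)
```

Only the readings at the selected indices need to be transmitted, and no second fitting step is required.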
In the modification, data selection and regression coefficient calculation can be performed at the same time, so the computational cost can be reduced.
The data prediction device of the present disclosure includes: a data collection unit that transmits a plurality of collected data, and selected data that is selected from the plurality of data on the basis of received selection information; a first regression coefficient calculation unit that calculates regression coefficients on the basis of the plurality of data received from the data collection unit, using a first calculation method that obtains regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold; a second regression coefficient calculation unit that calculates regression coefficients on the basis of the selected data received from the data collection unit, using a second calculation method different from the first calculation method; and a prediction unit that outputs a prediction result predicted on the basis of the selected data received from the data collection unit and the regression coefficients calculated by the second regression coefficient calculation unit.
In the present disclosure, this makes it possible to reduce the number of data items, to reduce the line capacity needed to collect and transmit data, and to reduce the computational load of the learning process. Instead of reducing the number of data items transmitted from the data collection unit 21 to the subsequent stages, the number of sensors from which the data collection unit 21 receives data may be reduced.
In the present disclosure, the calculation performed by the second regression coefficient calculation unit 24 is an ordinary regression calculation, so the computational cost can be reduced and the processing can be carried out in an environment with limited computing resources.
21 Data collection unit
22 Prediction unit
26 Error monitoring unit
24 Second regression coefficient calculation unit
25 First regression coefficient calculation unit
31A Sensor
51 CPU
52 Primary storage unit
53 Secondary storage unit

Claims (7)

  1.  A data prediction device comprising:
     a data collection unit that transmits a plurality of collected data and, based on received selection information, transmits selected data chosen from the plurality of data;
     a first regression coefficient calculation unit that uses a first calculation method, which obtains regression coefficients by L1 regularization, to calculate regression coefficients from the plurality of data received from the data collection unit, and that transmits to the data collection unit selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold;
     a second regression coefficient calculation unit that uses a second calculation method, different from the first calculation method, to calculate regression coefficients from the selected data received from the data collection unit; and
     a prediction unit that outputs a prediction result obtained from the selected data received from the data collection unit and the regression coefficients calculated by the second regression coefficient calculation unit.
  2.  The data prediction device according to claim 1, further comprising an error monitoring unit that instructs the second regression coefficient calculation unit to recalculate the regression coefficients by the second calculation method when the number of occurrences of an error between the prediction result and a measured value that is equal to or greater than a predetermined value reaches or exceeds a predetermined number.
  3.  The data prediction device according to claim 2, wherein, when the number of occurrences of the error again reaches or exceeds the predetermined number after the recalculation instruction has been issued, the error monitoring unit instructs the first regression coefficient calculation unit to recalculate the regression coefficients by the first calculation method and to retransmit the selection information to the data collection unit.
  4.  A data prediction device comprising:
     a data collection unit that transmits a plurality of collected data and, based on received selection information, transmits selected data chosen from the plurality of data;
     a regression coefficient calculation unit that uses a calculation method, which obtains regression coefficients by L1 regularization, to calculate regression coefficients from the plurality of data received from the data collection unit, and that transmits to the data collection unit selection information for selecting the data corresponding to the non-zero regression coefficients among the calculated regression coefficients; and
     a prediction unit that outputs a prediction result obtained from the selected data received from the data collection unit and the non-zero regression coefficients calculated by the regression coefficient calculation unit.
  5.  The data prediction device according to claim 4, further comprising an error monitoring unit that instructs the regression coefficient calculation unit to recalculate the regression coefficients by the calculation method when the number of occurrences of an error between the prediction result and a measured value that is equal to or greater than a predetermined value reaches or exceeds a predetermined number.
  6.  A program for causing a computer to execute data prediction processing comprising:
     transmitting a plurality of collected data and, based on received selection information, selected data chosen from the plurality of data;
     calculating regression coefficients from the received plurality of data using a first calculation method that obtains regression coefficients by L1 regularization, and transmitting selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold;
     calculating regression coefficients from the received selected data using a second calculation method different from the first calculation method; and
     outputting a prediction result obtained from the received selected data and the regression coefficients calculated using the second calculation method.
  7.  A data prediction method in which a computer:
     transmits a plurality of collected data and, based on received selection information, selected data chosen from the plurality of data;
     calculates regression coefficients from the received plurality of data using a first calculation method that obtains regression coefficients by L1 regularization, and transmits selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold;
     calculates regression coefficients from the received selected data using a second calculation method different from the first calculation method; and
     outputs a prediction result obtained from the received selected data and the regression coefficients calculated using the second calculation method.
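The escalation behaviour of the error monitoring unit in claims 2 and 3 can be sketched as a small state machine. This is an illustration only: the class name, the return strings, and two design choices are assumptions not found in the claims, namely that the occurrence counter is reset after each instruction and that the escalation state is cleared once a first-stage recalculation has been requested.

```python
class ErrorMonitor:
    """Hypothetical sketch of the error monitoring unit of claims 2 and 3."""

    def __init__(self, error_limit, count_limit):
        self.error_limit = error_limit    # the "predetermined value" for an error
        self.count_limit = count_limit    # the "predetermined number" of occurrences
        self.count = 0
        self.second_stage_requested = False

    def observe(self, predicted, measured):
        """Return 'ok', 'recalculate_second' or 'recalculate_first'."""
        if abs(predicted - measured) >= self.error_limit:
            self.count += 1
        if self.count < self.count_limit:
            return "ok"
        self.count = 0  # assumption: counting restarts after each instruction
        if not self.second_stage_requested:
            # Claim 2: first ask only for a refit by the second calculation method.
            self.second_stage_requested = True
            return "recalculate_second"
        # Claim 3: errors piled up again, so redo the L1 selection as well.
        self.second_stage_requested = False
        return "recalculate_first"
```

Trying the cheaper second-stage refit first and repeating the L1-regularized selection only if errors persist keeps the more expensive first calculation method off the normal path.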
PCT/JP2020/010304 2019-03-13 2020-03-10 Data prediction device, data prediction method, and data prediction program WO2020184561A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-045556 2019-03-13
JP2019045556A JP2020149282A (en) 2019-03-13 2019-03-13 Data prediction device, data prediction method, and data prediction program

Publications (1)

Publication Number Publication Date
WO2020184561A1 true WO2020184561A1 (en) 2020-09-17

Family

ID=72426568

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/010304 WO2020184561A1 (en) 2019-03-13 2020-03-10 Data prediction device, data prediction method, and data prediction program

Country Status (2)

Country Link
JP (1) JP2020149282A (en)
WO (1) WO2020184561A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4239427A4 (en) * 2020-10-27 2024-05-08 Jfe Steel Corp Abnormality diagnosing model construction method, abnormality diagnosing method, abnormality diagnosing model construction device, and abnormality diagnosing device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017130041A (en) * 2016-01-20 2017-07-27 富士通株式会社 Information processing device, control method and control program
JP2018010477A (en) * 2016-07-13 2018-01-18 富士通株式会社 Sensor control device, sensor system, sensor control method, and sensor control program
JP2018109876A (en) * 2017-01-04 2018-07-12 株式会社東芝 Sensor design support apparatus, sensor design support method and computer program
JP2018151883A (en) * 2017-03-13 2018-09-27 株式会社東芝 Analysis device, analysis method, and program
JP2019032185A (en) * 2017-08-04 2019-02-28 株式会社東芝 Sensor control support apparatus, sensor control support method, and computer program

Also Published As

Publication number Publication date
JP2020149282A (en) 2020-09-17

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20769788; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20769788; Country of ref document: EP; Kind code of ref document: A1)