WO2020184560A1

WO2020184560A1 - Data prediction device, data prediction method, and data prediction program

Info

Publication number: WO2020184560A1
Application number: PCT/JP2020/010303
Authority: WO
Inventors: 高嶋　洋一; 昌宏湯口; 山田　智広
Original assignee: 日本電信電話株式会社
Priority date: 2019-03-13
Filing date: 2020-03-10
Publication date: 2020-09-17
Also published as: JP2020149281A

Abstract

The purpose of the present invention is to reduce the quantity of data used to calculate regression coefficients, reduce the bandwidth needed for collecting and transmitting data, and reduce the amount of computation in the learning process. A data prediction device according to the present disclosure includes: a first regression coefficient computation unit that computes regression coefficients on the basis of a plurality of data using a first computation method for calculating regression coefficients by L1 regularization, and transmits, to a data collection unit, selection information for selecting the data corresponding to the regression coefficients having an absolute value equal to or greater than a threshold value; a determination unit that determines which of the first computation method and a second computation method to use; a second regression coefficient computation unit that computes regression coefficients using the determined computation method; and a prediction unit that uses the plurality of data if the computation method is determined to be the first computation method and uses the selected data if the computation method is determined to be the second computation method, and outputs a prediction result predicted on the basis of the data used and the regression coefficients computed by the second regression coefficient computation unit.

Description

Data prediction device, data prediction method, and data prediction program

The present disclosure relates to a data prediction device, a data prediction method, and a data prediction program.

When predicting the near-future value of specific data among the data acquired from a large number of sensors installed in various places, prediction by linear regression, shown in FIG. 1, may be performed as illustrated in Eq. (1). Equation (1) predicts the data D_{1,t+1} of sensor 1 at time t+1 from 12 data, namely the data from time t-2 to time t out of the per-unit-time data from time t-3 to time t of the four sensors 1 to 4 illustrated in FIG. 1.
$$
\begin{aligned}
&\beta_0 + \beta_1 D_{1,t-2} + \beta_2 D_{2,t-2} + \beta_3 D_{3,t-2} + \beta_4 D_{4,t-2} \\
&\quad + \beta_5 D_{1,t-1} + \beta_6 D_{2,t-1} + \beta_7 D_{3,t-1} + \beta_8 D_{4,t-1} \\
&\quad + \beta_9 D_{1,t} + \beta_{10} D_{2,t} + \beta_{11} D_{3,t} + \beta_{12} D_{4,t} = D_{1,t+1}
\end{aligned}
\tag{1}
$$

β_k (k = 0, ..., 12) are the regression coefficients, D_{l,t-m} are the data illustrated by the circles in FIG. 1, l is the sensor number, and t-m is the time, where m is a natural number.
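As a minimal sketch of how Eq. (1) is evaluated, the following Python fragment flattens a three-step window of four sensors into the 12 explanatory values and applies the coefficients. The function names, the window values, and the coefficient values are all hypothetical and chosen only for illustration.

```python
# Sketch of Eq. (1): predict sensor 1 at time t+1 from 4 sensors x 3 time steps.
# All names and numbers here are illustrative assumptions, not values from the source.

def build_features(window):
    """Flatten rows ordered [t-2, t-1, t], each holding sensors 1..4,
    into the 12 explanatory values in the order used by Eq. (1)."""
    return [value for row in window for value in row]

def predict(beta, features):
    """beta[0] is the intercept beta_0; beta[1:] pair up with the features."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))

window = [
    [1.0, 2.0, 3.0, 4.0],  # D_{1..4, t-2}
    [1.5, 2.5, 3.5, 4.5],  # D_{1..4, t-1}
    [2.0, 3.0, 4.0, 5.0],  # D_{1..4, t}
]
# Hypothetical coefficients: intercept 0.5, weight 1.0 on D_{1,t} (beta_9), all others 0.
beta = [0.5] + [0.0] * 8 + [1.0] + [0.0] * 3
prediction = predict(beta, build_features(window))
print(prediction)
```

With these made-up coefficients only the intercept and the latest reading of sensor 1 contribute to the prediction.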

In detail, the data are predicted by the following learning process and prediction process. First, past data of each sensor that appear to be related to the data to be predicted are collected to form a linear regression equation. The data of the sample section in which the phenomenon to be predicted (a characteristic movement of the prediction target) occurs are substituted into the variables of the linear regression equation, and the regression coefficients β_k are obtained by solving the resulting simultaneous equations (learning process). Next, predicted values are calculated continuously from the sensor data using the obtained regression coefficients β_k (prediction process).
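The two processes can be sketched as follows, assuming a toy case with as many samples as unknown coefficients so that the simultaneous equations can be solved directly. Gaussian elimination is used here only as one possible solver; the source does not name one, and the data and function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough independent samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Learning process: each sample row is [1, x1, x2]; the leading 1 carries beta_0.
samples = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 3.0]]
targets = [1.0 + 2.0 * x1 - 1.0 * x2 for _, x1, x2 in samples]  # toy data
beta = solve_linear_system(samples, targets)

# Prediction process: apply the learned coefficients to new data.
new_row = [1.0, 4.0, 2.0]
predicted = sum(b * x for b, x in zip(beta, new_row))
```

If the sample rows are not linearly independent, the solver raises an error, which corresponds to the case where the sample data are insufficient to determine the coefficients.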

When data from a very large number of sensors over a long past period are used, the data of the sample section may be insufficient and the simultaneous equations may not be solvable. For example, when predicting the water level of a river, mesh-shaped rainfall forecast values created by the Japan Meteorological Agency can be used instead of, or in addition to, the sensor data. In this case, however, the number of regression coefficients increases further, and the data of the sample section may be insufficient.

When the data of the sample section are insufficient, an approximate solution for the regression coefficients can be obtained by the method called L1 regularization (Non-Patent Document 1), as illustrated by Eq. (2). L1 regularization has the property of driving regression coefficients that contribute little to the prediction toward zero.

Here, y_i is the dependent variable, x_{ij} is the explanatory variable, and t_c is the adjustment parameter.
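Eq. (2) itself is not reproduced in this text. For orientation only, the commonly used constrained form of L1 regularization (the lasso) is sketched below; it is assumed, not confirmed, that Eq. (2) has this form. The sample count N, the number of explanatory variables p, and the reading of the adjustment parameter as t_c are likewise assumptions.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t_c
```

A smaller t_c tightens the constraint and forces more coefficients to exactly zero.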

FIG. 2 shows a data prediction device 10 of a related technology configured by using L1 regularization. The data collection unit 11 collects data from each of a large number of sensors and transmits the same data to the prediction unit 12 and the learning section selection unit 13. The learning section selection unit 13 selects data in a data section in which an event to be predicted is likely to appear based on a rule, and transmits the selected data to the regression coefficient calculation unit 14 as data to be learned. The regression coefficient calculation unit 14 obtains a regression coefficient using L1 regularization, and transmits the regression coefficient to the prediction unit 12. The prediction unit 12 continuously calculates the prediction value by substituting the regression coefficient received from the regression coefficient calculation unit 14 and the data received from the data collection unit 11 into the linear regression equation.

L1 regularization makes it possible to obtain regression coefficients even when many explanatory variables are used. However, the range of sensors that affect the prediction and the range of data measurement times are not known in advance, so both ranges have to be set wide. As a result, the number of data used as input becomes very large, which strains the line capacity for collecting and transmitting data and makes the amount of calculation in the learning process enormous.

The purpose of this disclosure is to reduce the number of data used when calculating the regression coefficient, reduce the line capacity for collecting and transmitting data, and reduce the amount of calculation processing in the learning process.

The data prediction device of the first aspect of the present disclosure includes: a data collection unit that transmits a plurality of collected data and selection data selected from the plurality of data based on received selection information; a first regression coefficient calculation unit that calculates regression coefficients based on the plurality of data received from the data collection unit using a first calculation method for obtaining regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to those of the calculated regression coefficients whose absolute value is equal to or greater than a threshold value; a determination unit that determines which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating the regression coefficients; a second regression coefficient calculation unit that calculates the regression coefficients using the calculation method determined by the determination unit; and a prediction unit that uses the plurality of data received from the data collection unit when the calculation method determined by the determination unit is the first calculation method, uses the selection data received from the data collection unit when the calculation method determined by the determination unit is the second calculation method, and outputs a prediction result predicted based on the data used and the regression coefficients calculated by the second regression coefficient calculation unit.

The second aspect of the present disclosure is the data prediction device of the first aspect, further including an error monitoring unit that monitors the number of occurrences of an error of a predetermined value or more, the error being the difference between the prediction result and the measured value. When the determination unit has determined that the second calculation method is to be used and the number of occurrences of the error monitored by the error monitoring unit exceeds a predetermined number, the determination unit determines that the first calculation method is to be used.

The third aspect of the present disclosure is the data prediction device of the first aspect, further including an event determination unit that determines whether or not a preset event has occurred. When the event determination unit determines that the event has occurred while the determination unit has determined that the second calculation method is to be used, the determination unit determines that the first calculation method is to be used.

The fourth aspect of the present disclosure is the data prediction device according to any one of the first to third aspects, further including an error monitoring unit that monitors the number of occurrences of an error of a predetermined value or more, the error being the difference between the prediction result and the actually measured value. When the number of occurrences of the error monitored by the error monitoring unit exceeds a predetermined number, the determination unit instructs the first regression coefficient calculation unit to recalculate the regression coefficients by the first calculation method and to retransmit the selection information to the data collection unit. The first regression coefficient calculation unit recalculates the regression coefficients and retransmits the selection information in response to the instruction from the determination unit, and the data collection unit retransmits the selection data to the first regression coefficient calculation unit according to the selection information retransmitted in response to the instruction from the determination unit.

A fifth aspect of the present disclosure is a program that causes a computer to execute a data prediction process of: transmitting a plurality of collected data and selection data selected from the plurality of data based on received selection information; calculating regression coefficients based on the received plurality of data using a first calculation method for obtaining regression coefficients by L1 regularization; transmitting selection information for selecting the data corresponding to those of the calculated regression coefficients whose absolute value is equal to or greater than a threshold value; determining which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating the regression coefficients; calculating the regression coefficients using the determined calculation method; using the received plurality of data when the determined calculation method is the first calculation method, and using the received selection data when the determined calculation method is the second calculation method; and outputting a prediction result predicted based on the data used and the calculated regression coefficients.

A sixth aspect of the present disclosure is a data prediction method in which a computer: transmits a plurality of collected data and selection data selected from the plurality of data based on received selection information; calculates regression coefficients based on the received plurality of data using a first calculation method for obtaining regression coefficients by L1 regularization; transmits selection information for selecting the data corresponding to those of the calculated regression coefficients whose absolute value is equal to or greater than a threshold value; determines which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating the regression coefficients; calculates the regression coefficients using the determined calculation method; uses the received plurality of data when the determined calculation method is the first calculation method, and uses the received selection data when the determined calculation method is the second calculation method; and outputs a prediction result predicted based on the data used and the calculated regression coefficients.

In the present disclosure, it is possible to reduce the number of data, reduce the line capacity for collecting and transmitting data, and reduce the amount of calculation processing in the learning process.

FIG. 1 is a schematic diagram explaining the prediction of data by a linear regression equation.
FIG. 2 is a block diagram illustrating the data prediction device of the related technology.
FIG. 3 is a block diagram illustrating a basic example of the data prediction device of this embodiment.
FIG. 4 is a block diagram illustrating an extended example of the data prediction device of this embodiment.
FIG. 5 is a block diagram illustrating the hardware configuration of the extended example of the data prediction device of this embodiment.
FIG. 6 is a schematic diagram illustrating the process flow of the learning phase of the extended example of this embodiment.
FIG. 7 is a schematic diagram illustrating the process flow of the prediction phase of the extended example of this embodiment.
FIG. 8 is a schematic diagram illustrating the process flow of the recalculation and reselection phases of the extended example of this embodiment.
FIG. 9 is a block diagram illustrating the data prediction device of this embodiment.

The data prediction device 20 of this embodiment is illustrated in FIG. 3. The data collection unit 21 collects data from each of a large number of sensors S₁, S₂, S₃, ..., and transmits the data to the prediction unit 22 and the learning section selection unit 23. The data that the data collection unit 21 transmits to the prediction unit 22 and the learning section selection unit 23 are the plurality of data collected from the sensors S₁, S₂, S₃, ..., and selection data selected from the collected plurality of data based on the selection information received from the first regression coefficient calculation unit 25 described later.

The learning section selection unit 23 selects, based on rules, a data section in which the event to be predicted is expected to appear, and transmits the data to be learned to the first regression coefficient calculation unit 25 and the second regression coefficient calculation unit 24. That is, the learning section selection unit 23 removes unnecessary data and data with little change that lie in sections in which the event to be predicted does not appear.

The first regression coefficient calculation unit 25 calculates regression coefficients from the plurality of data transmitted from the data collection unit 21 and selected by the learning section selection unit 23, using the first calculation method, which obtains an approximate solution for the regression coefficients by L1 regularization. From the calculated regression coefficients it selects those whose absolute value is equal to or greater than a threshold value, and transmits information indicating the selected regression coefficients to each of the data collection unit 21 and the second regression coefficient calculation unit 24.
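The effect of L1 regularization on weakly contributing variables can be illustrated with a small solver. The sketch below uses cyclic coordinate descent on the penalized form of the lasso; both the solver and the penalized form are assumptions made for illustration, since the source does not specify how the first calculation method is solved. The intercept is omitted on the assumption that the data are centered, and all names and numbers are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(x, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - x @ beta||^2 + lam * sum(|beta_j|) by cyclic updates."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - x @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in x]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual, excluding its own contribution.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                beta[j] = new_beta
    return beta

# Toy data: y depends strongly on column 0, barely on column 1, not at all on column 2.
x = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [3.05, 2.95, -2.95, -3.05]
beta_l1 = lasso_coordinate_descent(x, y, lam=0.1)
print(beta_l1)
```

In this toy case the coefficients of the second and third columns come out as exactly zero, so a threshold on the absolute value would keep only the first column.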

Based on the information received from the first regression coefficient calculation unit 25, the data collection unit 21 selects, from the plurality of collected data, the data corresponding to the regression coefficients whose absolute value is equal to or greater than the threshold value, and transmits the selected data as selection data to each of the prediction unit 22, the learning section selection unit 23, and the error monitoring unit 26. Because the information transmitted from the first regression coefficient calculation unit 25 to the data collection unit 21 is used for selecting data in the data collection unit 21, it is referred to below as selection information.

In the data collection unit 21, the data corresponding to regression coefficients whose absolute value is equal to or greater than the threshold value are selected as selection data, and the data corresponding to regression coefficients whose absolute value is less than the threshold value are deleted. Therefore, the selection information transmitted to the data collection unit 21 may be information indicating the regression coefficients whose absolute value is less than the threshold value, instead of information indicating the regression coefficients whose absolute value is equal to or greater than the threshold value. In that case, the data collection unit 21 obtains the selection data by deleting the data corresponding to the regression coefficients whose absolute value is less than the threshold value.
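The two equivalent forms of selection information described above can be sketched as follows. The function names, the use of zero-based positions as the selection information, and the sample values are assumptions for illustration; the coefficient list is taken to exclude the intercept β_0.

```python
def make_selection_info(coefficients, threshold):
    """Positions of coefficients with |beta| >= threshold (data to keep)."""
    return [k for k, b in enumerate(coefficients) if abs(b) >= threshold]

def make_deletion_info(coefficients, threshold):
    """Complementary form: positions with |beta| < threshold (data to delete)."""
    return [k for k, b in enumerate(coefficients) if abs(b) < threshold]

def select_data(row, keep):
    """Keep only the data at the positions listed in keep."""
    return [row[k] for k in keep]

def delete_data(row, drop):
    """Delete the data at the positions listed in drop."""
    dropped = set(drop)
    return [v for k, v in enumerate(row) if k not in dropped]

coefficients = [0.8, 0.0, -0.3, 0.01]   # hypothetical L1-regularized coefficients
collected_row = [10.0, 20.0, 30.0, 40.0]

keep = make_selection_info(coefficients, threshold=0.1)
drop = make_deletion_info(coefficients, threshold=0.1)
selected_a = select_data(collected_row, keep)
selected_b = delete_data(collected_row, drop)
```

Both forms yield the same selection data, so whichever list is shorter could be transmitted.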

The second regression coefficient calculation unit 24 obtains the regression coefficients from the selection data transmitted from the data collection unit 21 via the learning section selection unit 23, using L2 regularization or multiple regression instead of L1 regularization.

The regression coefficients calculated by the second regression coefficient calculation unit 24 are transmitted to the prediction unit 22, which performs prediction by regression calculation using L2 regularization or multiple regression. The data collection unit 21 transmits the selection data to the prediction unit 22 and the learning section selection unit 23. That is, the data collection unit 21 collects data from all of the sensors S₁, S₂, S₃, ..., but transmits the data selectively.
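A minimal sketch of the second calculation on the selection data is given below. It uses cyclic coordinate descent for L2 regularization (ridge regression), which reduces to ordinary multiple regression when the penalty is zero. The solver, the scaling of the objective, and the toy data are assumptions; the source does not state how the second calculation method is implemented.

```python
def ridge_coordinate_descent(x, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - x @ beta||^2 + (lam/2) * sum(beta_j^2).
    With lam = 0 this is ordinary least squares (multiple regression)."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - x @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in x]
            z = sum(v * v for v in col) / n
            if z + lam == 0.0:
                continue  # all-zero column and no penalty: coefficient is undetermined
            # Correlation of column j with the residual, excluding its own contribution.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = rho / (z + lam)
            delta = new_beta - beta[j]
            residual = [r - v * delta for r, v in zip(residual, col)]
            beta[j] = new_beta
    return beta

# Hypothetical selection data: only the two columns kept after thresholding.
x_selected = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [4.0, 2.0, -2.0, -4.0]

beta_mr = ridge_coordinate_descent(x_selected, y, lam=0.0)  # multiple regression
beta_l2 = ridge_coordinate_descent(x_selected, y, lam=1.0)  # L2 regularization
```

Because the calculation runs only over the selected columns, its cost scales with the number of retained variables rather than with the number of sensors.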

The prediction unit 22 outputs the prediction result predicted by the linear regression equation using the selection data received from the data collection unit 21 and the regression coefficient received from the second regression coefficient calculation unit 24.

That is, the data corresponding to the regression coefficients whose absolute value is less than the threshold value are excluded, and the prediction result is calculated from the data corresponding to the regression coefficients whose absolute value is equal to or greater than the threshold value. The amount of data used in the learning process, and with it the amount of calculation, can therefore be reduced.

The error monitoring unit 26 monitors the error, that is, the difference between the prediction result of the prediction unit 22 and the actually measured value collected by the data collection unit 21. When an error of a predetermined value or more occurs a predetermined number of times (the allowable number of times) or more, it transmits a reselection instruction to the data collection unit 21 and the first regression coefficient calculation unit 25. In response to this reselection instruction, the data collection unit 21 transmits all the data, not the selection data, to the subsequent stages for a certain period until the reselection of the selection data is completed, and data are again selected by L1 regularization over all the data. At this time, the selection data are reselected only when the error of the prediction result by L2 regularization or multiple regression is larger than the error of the prediction result by L1 regularization. When it is less than or equal to the error of the prediction result by L1 regularization, reselection is not performed.

Alternatively, when the error monitoring unit 26, which monitors the difference between the prediction result of the prediction unit 22 and the measured value, finds that an error of a predetermined value or more has occurred a predetermined number of times (the allowable number of times) or more, the second regression coefficient calculation unit 24 may recalculate the regression coefficients. If, after the recalculation, an error of a predetermined value or more again occurs the predetermined number of times or more, a reselection instruction may be transmitted to the data collection unit 21 and the first regression coefficient calculation unit 25.
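The monitoring rule and the reselection condition can be sketched as below. The class and function names are hypothetical, and several details are assumptions: the count is cumulative, the trigger fires at "the allowable number of times or more", and a single scalar stands in for "the error of the prediction result" of each method.

```python
class ErrorMonitor:
    """Counts prediction errors at or above error_threshold."""

    def __init__(self, error_threshold, allowed_count):
        self.error_threshold = error_threshold
        self.allowed_count = allowed_count
        self.count = 0

    def observe(self, predicted, measured):
        """Record one prediction; return True once the allowable count is reached."""
        if abs(predicted - measured) >= self.error_threshold:
            self.count += 1
        return self.count >= self.allowed_count

    def reset(self):
        """Clear the count, for example after recalculation or reselection."""
        self.count = 0

def should_reselect(error_second_method, error_first_method):
    """Reselect only when L2 regularization or multiple regression on the selection
    data predicts worse than L1 regularization on all the data."""
    return error_second_method > error_first_method

monitor = ErrorMonitor(error_threshold=1.0, allowed_count=2)
pairs = [(10.0, 10.2), (10.0, 12.0), (10.0, 8.5)]  # (predicted, measured), made up
flags = [monitor.observe(p, m) for p, m in pairs]
```

Here the first pair is within tolerance, and the second and third pairs each count as an error, so the trigger fires on the third observation.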

FIG. 4 illustrates the data prediction device 60, which is an extended example of the data prediction device 20 of the present embodiment. Descriptions of configurations and operations similar to those of the data prediction device 20 are omitted as appropriate. The data prediction device 60 of FIG. 4 differs from the data prediction device 20 of FIG. 3 in that it has a determination unit 67 that selects the regression coefficient calculation method to be used according to, for example, a tendency of the data.

The data collection unit 61 collects data from a large number of sensors S₁, S₂, S₃, ..., and transmits the data to the prediction unit 62 and the learning section selection unit 63. The learning section selection unit 63 selects, based on rules, a data section in which an event to be predicted is likely to appear, and transmits the data to be learned to the second regression coefficient calculation unit 64.

In the data prediction device 20, the second regression coefficient calculation unit 24 uses, for example, L2 regularization or multiple regression to calculate the regression coefficients. In the data prediction device 60, by contrast, the regression coefficient calculation method used by the second regression coefficient calculation unit 64 is selected according to the determination result of the determination unit 67. The method is selected from, for example, the following two.
1) Multiple regression or L2 regularization using selected data (second calculation method)
2) L1 regularization using all data (first calculation method)

Until the error monitoring unit 66 detects that a prediction error of a predetermined value or more has occurred more than a predetermined number of times (the allowable number of times), the determination unit 67 determines that the regression coefficient calculation method of 1) is to be selected, and the second regression coefficient calculation unit 64 calculates the regression coefficients by the method of 1) using the selection data. When the error monitoring unit 66 detects that a prediction error of a predetermined value or more has occurred more than the predetermined number of times, the determination unit 67 determines to switch the regression coefficient calculation method from 1) to 2).

The second regression coefficient calculation unit 64 calculates the regression coefficient using all the data by the regression coefficient calculation method of 2) based on the determination of the determination unit 67. As a result, a regression coefficient calculation method with less error can be selected according to the data.

The learning section selection unit 63 may further include an event determination unit that determines, on a rule basis, whether or not a preset event (for example, rainfall) has occurred. When the event determination unit determines that the data to be learned have been updated, the determination unit 67 may likewise determine that the regression coefficient calculation method of 2) is to be selected. As a result, even when the data to be learned are updated, a regression coefficient calculation method with less error can be selected.
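The determination logic described above can be sketched as a small pure function. The enum, the function names, and the boolean inputs are hypothetical. The sketch covers only the switch from method 1) to method 2), triggered either by the error monitor or by the event determination.

```python
from enum import Enum

class Method(Enum):
    FIRST = "L1 regularization using all data"
    SECOND = "multiple regression or L2 regularization using selection data"

def determine_method(current, error_limit_reached, event_occurred=False):
    """Return the calculation method to use for the next regression calculation."""
    if current is Method.SECOND and (error_limit_reached or event_occurred):
        return Method.FIRST
    return current

def data_for(method, all_data, selection_data):
    """The first method uses all data; the second uses only the selection data."""
    return all_data if method is Method.FIRST else selection_data

method = Method.SECOND
method = determine_method(method, error_limit_reached=False)        # stays SECOND
after_error = determine_method(method, error_limit_reached=True)    # switches to FIRST
after_event = determine_method(method, False, event_occurred=True)  # switches to FIRST
inputs = data_for(after_error, all_data=[1, 2, 3, 4], selection_data=[1, 3])
```

Keeping the decision separate from the regression code lets the same second regression coefficient calculation step run either method on whichever data set the decision implies.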

FIG. 5 illustrates the hardware configuration of the data prediction device 60. As an example, the data prediction device 60 includes a CPU (Central Processing Unit) 51, a primary storage unit 52, a secondary storage unit 53, and an external interface 54, as shown in FIG. 5. The CPU 51 is an example of a hardware processor. The CPU 51, the primary storage unit 52, the secondary storage unit 53, and the external interface 54 are connected to each other via the bus 59.

The primary storage unit 52 is, for example, a volatile memory such as a RAM (Random Access Memory). The secondary storage unit 53 is, for example, a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The secondary storage unit 53 includes a program storage area 53A and a data storage area 53B. The program storage area 53A stores a program such as a data prediction program as an example. As an example, the data storage area 53B stores data from the sensor, intermediate data during data prediction processing, and the like.

The CPU 51 reads the data prediction program from the program storage area 53A and loads it into the primary storage unit 52. By executing the loaded data prediction program, the CPU 51 operates as the data collection unit 61, the prediction unit 62, the learning section selection unit 63, the second regression coefficient calculation unit 64, the first regression coefficient calculation unit 65, the error monitoring unit 66, and the determination unit 67 of FIG. 4.

A program such as the data prediction program may be stored in an external server and loaded into the primary storage unit 52 via a network. A program such as the data prediction program may also be stored in a non-transitory recording medium such as a DVD (Digital Versatile Disc) and loaded into the primary storage unit 52 via a recording medium reading device.

An external device is connected to the external interface 54, and the external interface 54 controls the transmission and reception of various information between the external device and the CPU 51. FIG. 5 shows an example in which the sensor 31A and the danger notification system 31B are connected to the external interface 54. The sensor 31A includes a large number of sensors.

The data predicted by the data prediction device 60 may be transmitted to, for example, the danger notification system 31B connected to the external interface 54 and used for danger notification processing in the danger notification system 31B. The data predicted by the data prediction device 60 may also be recorded in an external storage device connected to the external interface 54, or displayed as characters or images on the screen of a display connected to the external interface 54.

Further, the data prediction device 60 may be a dedicated device or a general-purpose device such as a workstation, a personal computer, or a tablet.

FIGS. 6 to 8 show examples of data prediction processing in the data prediction device 60. FIG. 6 illustrates the flow of the learning phase.

In step S201, the data collection unit 61 collects data from, for example, a sensor 31A including a large number of sensors. In step S202, the data collection unit 61 transmits the collected data to the learning section selection unit 63.

In step S203, the learning section selection unit 63 selects a data section that seems to be suitable for prediction and transmits it to the second regression coefficient calculation unit 64 and the first regression coefficient calculation unit 65. The first regression coefficient calculation unit 65 calculates the regression coefficient using L1 regularization, selects a regression coefficient whose absolute value is equal to or greater than the threshold value, and transmits it to the second regression coefficient calculation unit 64 and the data collection unit 61.

In step S205, the second regression coefficient calculation unit 64 obtains the regression coefficients by a normal regression calculation method, for example L2 regularization or multiple regression, using the data selected by the first regression coefficient calculation unit 65, and transmits them to the prediction unit 62.

FIG. 7 illustrates the flow of the prediction phase. In step S206, the data collection unit 61 transmits the data selected in step S204 to the prediction unit 62 and the error monitoring unit 66. In step S207, the prediction unit 62 calculates the prediction result based on the data and the regression coefficients obtained in step S205, transmits it to the error monitoring unit 66, and stores it in, for example, an external storage device.

FIG. 8 illustrates the flow of the error monitoring, recalculation, and reselection phases. In step S208, the error monitoring unit 66 calculates the prediction error between the prediction result and the data (measured value), and when a prediction error of a predetermined value or more occurs more than a predetermined number of times (the allowable number of times), it instructs the data collection unit 61 and the determination unit 67 to switch the regression coefficient calculation method.

When switching of the regression coefficient calculation method is instructed, the data collection unit 61 transmits all the data to the learning section selection unit 63. In step S209, the determination unit 67 determines that the second regression coefficient calculation unit 64 is to calculate the regression coefficients by L1 regularization using all the data, and transmits the determination result to the second regression coefficient calculation unit 64. Upon receiving the determination of the determination unit 67, the second regression coefficient calculation unit 64 switches the regression coefficient calculation method.

In step S210, the error monitoring unit 66 calculates the prediction error between the prediction result and the data (actually measured value), and when a prediction error of a predetermined value or more occurs more than the predetermined number of times (the allowable number of times), it instructs the data collection unit 61 and the determination unit 67 to reselect.

When reselection is instructed, the data collection unit 61 transmits all the data to the learning section selection unit 63. In step S211, the determination unit 67 determines that reselection is to be performed and transmits the determination result to the first regression coefficient calculation unit 65. Upon receiving the determination of the determination unit 67, the first regression coefficient calculation unit 65 reselects the regression coefficients.
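The two-stage escalation of steps S208 to S211 can be sketched as below: the first time the error limit is reached, the calculation method is switched; if the limit is reached again while all data are in use, reselection is instructed. The function name and action labels are hypothetical, and the return to selection data after a reselection is an assumption not stated in this passage.

```python
CONTINUE = "continue"
SWITCH = "switch to L1 regularization using all data"      # steps S208 to S209
RESELECT = "reselect data in the first calculation unit"   # steps S210 to S211

def monitoring_action(using_selection_data, error_limit_reached):
    """Decide what the error monitoring result triggers in the current state."""
    if not error_limit_reached:
        return CONTINUE
    return SWITCH if using_selection_data else RESELECT

using_selection_data = True
log = []
for limit_reached in [False, True, False, True]:  # made-up monitoring results
    action = monitoring_action(using_selection_data, limit_reached)
    log.append(action)
    if action == SWITCH:
        using_selection_data = False
    elif action == RESELECT:
        # Assumption: once new selection data exist, they are used again.
        using_selection_data = True
```

In this made-up sequence the device first continues, then switches method, continues again, and finally reselects.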

According to this embodiment, the prediction error tends to be larger than when only regression coefficients calculated by L1 regularization are used for prediction. However, by reselecting the data when the prediction error becomes large, or periodically, the prediction error can be kept within a certain range.

The above describes an example in which the regression coefficient calculation method is switched from L2 regularization or multiple regression using the data selected by the first regression coefficient calculation unit 65 to L1 regularization using all the data, but the present embodiment is not limited to this. The regression coefficient calculation method may be switched from L1 regularization using all the data to L2 regularization or multiple regression using the data selected by the first regression coefficient calculation unit 65.

Note that the data prediction device 60 may include a data collection device 32A, a learning device 32B, and a prediction device 32C, as illustrated in FIG. 9. The data collection device 32A includes the data collection unit 61; the learning device 32B includes the learning section selection unit 63, the second regression coefficient calculation unit 64, the first regression coefficient calculation unit 65, the error monitoring unit 66, and the determination unit 67; and the prediction device 32C includes the prediction unit 62.

A transmission line connects the sensor 31A and the data collection unit 61, and another connects the prediction unit 62 and the output destination of the prediction result, such as the danger notification system 31B. Further, in the example shown in FIG. 9, transmission lines also connect the data collection unit 61 with the learning section selection unit 63, and the second regression coefficient calculation unit 64 with the prediction unit 62.

The data prediction device of the present disclosure includes: a data collection unit that transmits a plurality of collected data and selection data selected from the plurality of data based on received selection information; a first regression coefficient calculation unit that calculates regression coefficients based on the plurality of data received from the data collection unit using a first calculation method for obtaining regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value; a determination unit that determines which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating regression coefficients; a second regression coefficient calculation unit that calculates regression coefficients using the calculation method determined by the determination unit; and a prediction unit that uses the plurality of data received from the data collection unit when the calculation method determined by the determination unit is the first calculation method, uses the selection data received from the data collection unit when the calculation method determined by the determination unit is the second calculation method, and outputs a prediction result predicted based on the data used and the regression coefficients calculated by the second regression coefficient calculation unit.
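The flow from L1 regularization to selection information to a second calculation on the selected data can be sketched with NumPy. Everything below is an illustrative assumption rather than the disclosed implementation: the solver is a plain proximal gradient (ISTA) routine, ordinary least squares stands in for the second calculation method, no intercept is fitted, the data are synthetic, and the values of `alpha` and `threshold` are arbitrary.

```python
import numpy as np


def lasso_ista(X, y, alpha, n_iter=2000):
    """L1-regularized least squares by proximal gradient descent (ISTA).

    Minimizes (1 / (2n)) * ||y - X w||^2 + alpha * ||w||_1.
    """
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the squared-error term.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y) / n)
        # Soft-thresholding drives small coefficients to exactly zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w


def select_inputs(coef, threshold):
    """Selection information: indices whose |coefficient| >= threshold."""
    return np.flatnonzero(np.abs(coef) >= threshold)


def multiple_regression(X, y):
    """Stand-in for the second calculation method (ordinary least squares)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic stand-in for sensor data: 12 inputs, only 3 of which matter.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 12))
true_coef = np.zeros(12)
true_coef[[0, 4, 9]] = [2.0, -1.5, 1.0]
y = X_all @ true_coef + 0.1 * rng.normal(size=200)

coef_l1 = lasso_ista(X_all, y, alpha=0.1)               # first method, all data
selected = select_inputs(coef_l1, threshold=0.5)        # selection information
coef_sel = multiple_regression(X_all[:, selected], y)   # second method, selected data
prediction = X_all[-1, selected] @ coef_sel             # uses selected inputs only
```

Once `selected` is known, only those columns need to be transmitted and fitted, which is the source of the reduction in data volume and computation that the disclosure describes.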

In the present disclosure, this makes it possible to reduce the amount of data, reduce the line capacity needed for collecting and transmitting data, and reduce the amount of calculation processing in the learning process. Instead of reducing the amount of data transmitted from the data collection unit 61 to the subsequent stage, the data collection unit 61 may reduce the number of sensors from which it receives data.

In the present disclosure, since an ordinary regression coefficient calculation method may be used for the calculation by the second regression coefficient calculation unit 64, the calculation cost can be reduced, and processing is possible even in an environment with few computational resources.
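As one example of an ordinary calculation method, L2-regularized regression has a closed-form solution, sketched below. The function name `ridge_fit`, the regularization strength `lam`, and the synthetic data are assumptions introduced for illustration. With p inputs and n samples, forming the normal equations costs on the order of n·p² operations and solving them on the order of p³, so fitting only the selected inputs reduces the work.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: solve (X^T X + lam I) w = X^T y.

    lam = 0 reduces to ordinary multiple regression when X^T X is invertible.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(1)
X_selected = rng.normal(size=(100, 3))   # hypothetical: 3 inputs kept after selection
y = X_selected @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=100)

w_ols = ridge_fit(X_selected, y, lam=0.0)     # multiple regression
w_ridge = ridge_fit(X_selected, y, lam=10.0)  # L2 regularization shrinks the coefficients
```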

61 Data collection unit
62 Prediction unit
66 Error monitoring unit
64 Second regression coefficient calculation unit
65 First regression coefficient calculation unit
31A Sensor
51 CPU
52 Primary storage
53 Secondary storage

Claims

A data prediction device comprising:
a data collection unit that transmits a plurality of collected data and selection data selected from the plurality of data based on received selection information;
a first regression coefficient calculation unit that calculates regression coefficients based on the plurality of data received from the data collection unit using a first calculation method for obtaining regression coefficients by L1 regularization, and transmits, to the data collection unit, selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value;
a determination unit that determines which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating regression coefficients;
a second regression coefficient calculation unit that calculates regression coefficients using the calculation method determined by the determination unit; and
a prediction unit that uses the plurality of data received from the data collection unit when the calculation method determined by the determination unit is the first calculation method, uses the selection data received from the data collection unit when the calculation method determined by the determination unit is the second calculation method, and outputs a prediction result predicted based on the data used and the regression coefficients calculated by the second regression coefficient calculation unit.
The data prediction device according to claim 1, further comprising an error monitoring unit that monitors the number of occurrences of an error, between the prediction result and a measured value, that is equal to or greater than a predetermined value,
wherein, when the determination unit has determined to use the second calculation method and the number of occurrences of the error monitored by the error monitoring unit exceeds a predetermined number, the determination unit determines to use the first calculation method.
The data prediction device according to claim 1, further comprising an event determination unit that determines whether or not a preset event has occurred,
wherein, when the event determination unit determines that the event has occurred while the determination unit has determined to use the second calculation method, the determination unit determines to use the first calculation method.
The data prediction device according to any one of claims 1 to 3, further comprising an error monitoring unit that monitors the number of occurrences of an error, between the prediction result and a measured value, that is equal to or greater than a predetermined value,
wherein, when the number of occurrences of the error monitored by the error monitoring unit exceeds a predetermined number, the determination unit instructs the first regression coefficient calculation unit to recalculate the regression coefficients by the first calculation method and to retransmit the selection information to the data collection unit,
the first regression coefficient calculation unit recalculates the regression coefficients and retransmits the selection information in response to the instruction from the determination unit, and
the data collection unit retransmits the selection data to the first regression coefficient calculation unit in accordance with the selection information retransmitted in response to the instruction from the determination unit.
A program that causes a computer to execute data prediction processing comprising:
transmitting a plurality of collected data and selection data selected from the plurality of data based on received selection information;
calculating regression coefficients based on the plurality of received data using a first calculation method for obtaining regression coefficients by L1 regularization, and transmitting selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value;
determining which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating regression coefficients;
calculating regression coefficients using the determined calculation method; and
using the plurality of received data when the determined calculation method is the first calculation method, using the received selection data when the determined calculation method is the second calculation method, and outputting a prediction result predicted based on the data used and the calculated regression coefficients.
A data prediction method in which a computer:
transmits a plurality of collected data and selection data selected from the plurality of data based on received selection information;
calculates regression coefficients based on the plurality of received data using a first calculation method for obtaining regression coefficients by L1 regularization, and transmits selection information for selecting the data corresponding to those calculated regression coefficients whose absolute value is equal to or greater than a threshold value;
determines which of the first calculation method and a second calculation method different from the first calculation method is to be used when calculating regression coefficients;
calculates regression coefficients using the determined calculation method; and
uses the plurality of received data when the determined calculation method is the first calculation method, uses the received selection data when the determined calculation method is the second calculation method, and outputs a prediction result predicted based on the data used and the calculated regression coefficients.