CN111667123A

CN111667123A - Method for supplementing missing value by applying multiple interpolation in power load prediction

Info

Publication number: CN111667123A
Application number: CN202010555226.3A
Authority: CN
Inventors: 周浩; 顾一峰; 胡炳谦; 韩俊
Original assignee: Shanghai Ieslab Energy Technology Co ltd
Current assignee: Shanghai Ieslab Energy Technology Co ltd
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2020-09-15

Abstract

Accurate and valid historical load data are essential to a power load prediction model and play an important role in both data analysis and model calculation. Abnormal values fed into the prediction model or into a mathematical analysis greatly reduce prediction accuracy and simulation quality, so they must be identified and eliminated by statistical methods. In addition, various uncontrollable events during power system operation can produce many missing values, leaving the historical load data set incomplete. Filling these missing values by a reasonable and effective method is an important part of guaranteeing accurate load prediction. The invention provides a method that applies Multiple Interpolation (MICE) to the missing values in historical power load data and fills them in, ensuring the integrity of the historical power load data.

Description

Method for supplementing missing value by applying multiple interpolation in power load prediction

Technical Field

The invention relates to the technical field of power load prediction, in particular to a method for supplementing missing data by applying a Multiple Interpolation (MICE) method to missing values in historical load data in power load prediction.

Background

Accurate prediction of the power load is an important basis for the safe and economic operation of a power system and for scientific management and scheduling of the power grid, and it is a core component of a power energy management system. Some load prediction algorithms assume that all values are numerical and complete. In practice, however, data go missing during grid operation for various reasons, such as the removal of abnormal values or gaps in the time series caused by operating incidents. One way to deal with missing data is to delete every record that contains a missing value, but this risks losing valuable information. A preferable strategy is to fill the missing values by interpolation, that is, to estimate each missing value from the observed data. This preserves as much of the data set as possible and gives the subsequent prediction model more accurate input, and therefore a more accurate predicted output. The invention discloses a method that applies Multiple Interpolation (MICE) to the missing values of historical load data in power load prediction, so that the historical load data set is complete and valid and the effectiveness and accuracy of the power load prediction model are ensured.

Disclosure of Invention

The invention provides a method for filling and restoring values in power load data that are missing or that were removed as abnormal. The method applies Multiple Interpolation (MICE) and comprises three functional modules: missing value identification, MICE interpolation filling, and verification of the filled values.

Multiple Interpolation (MICE) is a method of dealing with missing values based on repeated simulation. Faced with a complex missing value problem, it generates a set of complete data sets (typically 3 to 10) from one data set containing missing values. In each simulated data set the missing data are filled in by a Monte Carlo method. The implementation of multiple interpolation is shown in fig. 1; the function names follow the R package mice. The function mice() starts from a data frame containing missing data and returns an object containing several (5 by default) complete data sets. Each complete data set is generated by interpolating the missing data in the original data frame, and each is slightly different because the interpolation has a random component. The with() function then applies a statistical model (e.g., a linear model, lm(), or a generalized linear model, glm()) to each complete data set in turn. Finally, the pool() function combines these individual analysis results into a single set of results. The standard error and the p-value of the final model reflect the uncertainty due to both the missing values and the multiple interpolations.

In this method the with function generally comprises several regression models, one per interpolated data set, and a t-test is performed on each to determine whether the data set obtained by that linear model is qualified. The pool function summarizes the regression models and performs an F-test on the whole data set to determine whether the method as a whole is qualified. Qualified data can be output as the filled data set. The thresholds for the t-test and the F-test are determined by the data quality control requirements.
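The generation of several slightly different complete data sets can be illustrated with a much-simplified Python sketch. It is not the chained-equations algorithm of the R mice package: it assumes a single, fully observed predictor (here a hypothetical temperature series), fits one least-squares line to the observed loads, and fills each missing load with the fitted value plus Gaussian noise, repeating the draw m times. The names fit_line and impute_multiple and the sample data are illustrative assumptions, not part of the disclosure.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_multiple(xs, ys, m=5, seed=0):
    """Return m completed copies of ys; None entries get prediction + random noise."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    completed = []
    for _ in range(m):
        completed.append([
            y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(xs, ys)
        ])
    return completed


# Hypothetical data: temperature (x) and load in MW (y), two readings missing.
temps = [10, 12, 14, 16, 18, 20, 22, 24]
loads = [50.0, 54.5, 57.8, None, 66.1, 70.3, None, 78.2]
datasets = impute_multiple(temps, loads, m=5)
```

Observed values are copied unchanged into every completed data set; only the two filled positions vary between the five copies, and that variation is what later carries the uncertainty of the filling into the pooled result. A fuller implementation would also redraw the regression coefficients for each copy, which this sketch omits.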

The original data set before filling and the data set after filling are subjected to one-way analysis of variance (one-way ANOVA), and the significance of the difference between the two groups of data is calculated; there must be no significant difference between them. If verification shows a significant difference, the number of data sets used in the with function is adjusted, or the missing values are simply removed, so that the filled data do not differ significantly from the original data and the data set as a whole remains valid.
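A minimal Python sketch of this check, using only the standard library, computes the one-way ANOVA F statistic for the observed values before filling and the full series after filling. The function name one_way_anova_f, the sample values, and the 5% significance level are assumptions for illustration; the source leaves the threshold to the data quality control requirements.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F statistic, between-group df, within-group df) for one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(v for g in groups for v in g)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (v - statistics.fmean(g)) ** 2 for g in groups for v in g
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical loads in MW: observed values only, then the same series after filling.
before = [50.0, 54.5, 57.8, 66.1, 70.3, 78.2]
after = [50.0, 54.5, 57.8, 62.0, 66.1, 70.3, 74.5, 78.2]

f_stat, df1, df2 = one_way_anova_f(before, after)

# Assumed threshold: tabulated 5% critical value of the F distribution for (1, 12) df.
F_CRITICAL = 4.75
no_significant_difference = f_stat < F_CRITICAL
```

Comparing against a tabulated critical value avoids needing an F-distribution routine; with a statistics library available, the same decision would normally be made from the p-value instead.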

Processing the actually collected historical power load data with these modules completes the data set and improves the validity of the original data. Using the filled historical load data in the power load prediction model greatly improves the reliability and accuracy of the prediction.

Drawings

Fig. 1 is a schematic diagram of a multiple interpolation model according to an embodiment of the present invention.

Fig. 2 is a schematic processing flow diagram of a method for supplementing missing values of historical load data according to an embodiment of the present invention.

Detailed Description

In order to make the content, purpose, features and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the scope of protection of the present invention.

As shown in fig. 2, the method proposed by the present invention for filling missing power load values by applying multiple interpolation is divided into the following steps.

Step one, data preprocessing: the collected original historical power load data are arranged in time order, the start and stop times of the data set are determined, the time series is checked for missing data, each missing value is marked, and the start and stop times of each gap are recorded.
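The preprocessing step can be sketched in Python as follows. The sketch assumes readings arrive as (timestamp, load) pairs at a fixed, known sampling interval; the function name mark_missing, the hourly interval, and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta


def mark_missing(readings, step):
    """Sort readings by time and mark absent or empty timestamps with None.

    readings: list of (datetime, load) pairs; load may already be None
    step: expected sampling interval as a timedelta
    Returns (series, gaps): the regular series and a list of (gap_start, gap_end).
    """
    by_time = dict(sorted(readings, key=lambda r: r[0]))
    start, stop = min(by_time), max(by_time)
    series, gaps, gap_start = [], [], None
    t = start
    while t <= stop:
        value = by_time.get(t)          # None when the timestamp is absent
        series.append((t, value))
        if value is None and gap_start is None:
            gap_start = t               # a new gap begins here
        elif value is not None and gap_start is not None:
            gaps.append((gap_start, t - step))
            gap_start = None            # the gap ended at the previous step
        t += step
    if gap_start is not None:
        gaps.append((gap_start, stop))
    return series, gaps


# Hypothetical hourly readings, out of order, with 03:00 and 04:00 missing.
day = datetime(2020, 6, 1)
raw = [
    (day + timedelta(hours=5), 61.0),
    (day + timedelta(hours=0), 50.0),
    (day + timedelta(hours=2), 55.5),
    (day + timedelta(hours=1), 52.3),
]
series, gaps = mark_missing(raw, timedelta(hours=1))
```

The returned series has one entry per expected timestamp, so the marked positions can be passed directly to the filling step, and the gap list records where each run of missing values starts and stops.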

Step two, filling the data by the multiple interpolation algorithm: the multiple interpolation algorithm fills the marked data in stages, which generally comprise the following parts:

the mice function starts from a data set containing missing data and returns an object containing several (5 by default) complete data sets. Each complete data set is generated by interpolating the missing data in the original data frame. Each complete data set is slightly different because the interpolation has a random component;

the with function applies a statistical model (e.g., a linear model or a generalized linear model) to each complete data set in turn;

the pool function combines these individual analysis results into one set of results. The standard error and the p-value of the final model reflect the uncertainty due to both the missing values and the multiple interpolations.
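The combining stage can be illustrated in Python with Rubin's rules, the standard way of pooling estimates from multiply imputed data sets. As a simplifying assumption, the per-data-set analysis here is just the sample mean and its squared standard error rather than a linear or generalized linear model; the function names and sample data are hypothetical.

```python
import statistics


def mean_and_variance(values):
    """Stand-in analysis model: the sample mean and its squared standard error."""
    return statistics.fmean(values), statistics.variance(values) / len(values)


def pool_estimates(estimates, variances):
    """Combine per-data-set results with Rubin's rules.

    estimates: one point estimate per completed data set
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total = u_bar + (1 + 1 / m) * b          # total variance
    return q_bar, total ** 0.5


# Hypothetical completed data sets (load in MW); only the third value was filled.
completed = [
    [50.0, 54.0, 57.0, 62.0],
    [50.0, 54.0, 59.0, 62.0],
    [50.0, 54.0, 58.0, 62.0],
]
results = [mean_and_variance(d) for d in completed]
pooled_mean, pooled_se = pool_estimates(
    [r[0] for r in results], [r[1] for r in results]
)
```

The between-imputation term is what makes the pooled standard error larger than the average per-data-set standard error: disagreement among the filled values is counted as additional uncertainty.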

Step three, data validity verification: the original historical power load data set and the data set filled by the multiple interpolation algorithm are checked for statistical differences to ensure the validity of the data. The two sets of data are subjected to one-way analysis of variance (one-way ANOVA) and the significance of the difference between them is calculated; there must be no significant difference between the two sets. If verification shows a significant difference, the number of data sets used in the with function is adjusted, or the missing values are simply removed, so that the processed data do not differ significantly from the original data and their accuracy and validity are preserved.

The invention provides a method that uses a multiple interpolation algorithm model to fill values that are missing, for whatever reason, from the historical data used in power load prediction. It is characterized in that multiple interpolation is introduced into the processing of power load prediction data to fill the missing values, and in that the number of linear fitting (regression) models in the with function is adjusted according to a validity verification comparing the data sets before and after filling, so that the historical load data used for power load prediction are more complete and the prediction performance of the power load model is markedly improved.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for supplementing missing values by applying multiple interpolation in power load prediction, characterized in that the method for filling missing power load values by applying multiple interpolation specifically comprises the following steps:

step one, data preprocessing: arranging the collected original historical power load data in time order, determining the start and stop times of the data set, checking the time series for missing data, marking each missing value, and recording the start and stop times of each gap;

step two, filling the data by the multiple interpolation algorithm:

1) the mice function starts from a data set containing missing data and returns an object containing several (5 by default) complete data sets;

each complete data set is generated by interpolating missing data in the original data frame;

each complete data set is slightly different due to the random components interpolated;

2) the with function may apply a statistical model (e.g., a linear model or a generalized linear model) to each complete data set in turn;

3) the pool function integrates these individual analysis results into a set of results;

the standard error and the p value of the final model accurately reflect the uncertainty generated by the missing value and multiple interpolation;

step three, data validity verification: the original historical power load data set and the data set filled by the multiple interpolation algorithm are checked for statistical differences to ensure data validity; the two groups of data are subjected to one-way analysis of variance (one-way ANOVA), the significance of the difference between them is calculated, and there must be no significant difference between the two groups; if verification shows a significant difference, the number of data sets used in the with function is adjusted, or the missing values are removed, so that the processed data do not differ significantly from the original data and the accuracy and validity of the processed data are preserved.