CN111667123A - Method for supplementing missing value by applying multiple interpolation in power load prediction - Google Patents
- Publication number
- CN111667123A (application CN202010555226.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- power load
- missing
- data set
- historical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
Accurate and valid historical load data are essential to a power load prediction model and play an important role in the data analysis and model calculation behind power prediction. When abnormal values are fed into the prediction model or into a mathematical analysis, the accuracy and simulation quality of the load prediction can be greatly reduced, so abnormal values must be identified and eliminated by mathematical statistical methods. In addition, various uncontrollable causes during power system operation can produce many missing values, leaving the historical power load data set incomplete. Missing values can be supplemented by a reasonable and effective method, and the completed data set is an important safeguard for accurate prediction by the power load prediction model. The invention provides a method that applies Multiple Interpolation (MICE) to the missing values in historical power load data, supplementing the missing data to ensure the integrity of the historical power load data.
Description
Technical Field
The invention relates to the technical field of power load prediction, in particular to a method for supplementing missing data by applying a Multiple Interpolation (MICE) method to missing values in historical load data in power load prediction.
Background
Accurate prediction of the power load is an important basis for the safe and economic operation of a power system and for scientific management and scheduling of the power grid, and it is also a core component of a power energy management system. Some algorithms used in power load prediction assume that every value is numeric and present. In practice, however, data go missing during grid operation for various reasons, for example when abnormal values are removed or when operating accidents leave gaps in the time series. One way to handle missing data is to delete every record that contains a missing value, but this risks losing valuable information. A better strategy is to fill in the missing values by interpolation, that is, to estimate each missing value from the observed data. This preserves as much of the data set as possible and gives the downstream power prediction model more accurate input data and therefore a more accurate predicted output. The invention discloses a method that applies Multiple Interpolation (MICE) to the missing values of historical load data in power load prediction, so that the historical load data set is complete and valid, which in turn supports the validity and accuracy of the power load prediction model.
Disclosure of Invention
The invention provides a method for supplementing and restoring power load data that are missing or were eliminated as abnormal. The method applies Multiple Interpolation (MICE) and comprises three functional modules: missing value identification, MICE interpolation supplement, and missing value filling verification.
Multiple Interpolation (MICE, multivariate imputation by chained equations, more commonly called multiple imputation) is a method of dealing with missing values based on repeated simulation. When faced with a complex missing value problem, it generates a set of complete data sets (typically 3 to 10) from one data set containing missing values. In each simulated data set the missing data are filled in by a Monte Carlo method. The implementation of multiple interpolation is shown in fig. 1. The function mice() starts from a data frame containing missing data and returns an object containing multiple (by default 5) complete data sets. Each complete data set is generated by interpolating the missing data in the original data frame, and because the interpolation has a random component, each complete data set is slightly different. The with() function then applies a statistical model (e.g., a linear model such as lm() or a generalized linear model such as glm()) to each complete data set in turn. Finally, the pool() function integrates these individual analysis results into one set of results. Both the standard error and the p-value of the final model reflect the uncertainty due to the missing values and the multiple interpolations. The with function generally covers a plurality of regression models fitted to the interpolated data sets, and a t-test is performed on each to determine whether the data set obtained with that linear model is qualified. The pool function summarizes the regression models and performs an F-test on the whole to determine whether the method as a whole is qualified. The qualified data can be output as the filled data set. The thresholds for the t-test and the F-test need to be set according to the data quality control requirements.
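The pooling step can be made concrete with the combining rules usually attributed to Rubin. The notation below is an illustrative assumption and does not appear in the patent text: m is the number of complete data sets, \hat{Q}_i is the estimate (for example a regression coefficient) obtained from the i-th data set, and U_i is its squared standard error.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
```

Here \bar{Q} is the pooled estimate, \bar{U} the within-imputation variance, B the between-imputation variance, and \sqrt{T} the pooled standard error. The term B is what carries the extra uncertainty caused by the missing values into the final standard error and p-value.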
The original data set before filling and the data set after filling are compared by one-way analysis of variance (one-way ANOVA), and the significance of the difference between the two groups of data is calculated to confirm that no significant difference exists between them. If verification shows a significant difference, either the number of data sets passed to the with function is adjusted or the missing values are simply left removed, so that the filled data do not differ significantly from the original data and the data set as a whole remains valid.
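As a minimal sketch of this verification, the following Python code computes the one-way ANOVA F statistic for the observed values and the filled series. The function names, the sample numbers, and the use of a fixed critical value are illustrative assumptions; the patent does not prescribe an implementation.

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within <= 0 or ss_within == 0:
        raise ValueError("within-group variance is zero or undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def no_significant_difference(f_stat, f_critical):
    """True when the F statistic stays below the chosen critical value."""
    return f_stat < f_critical


# Hypothetical load values: observed readings only, and the series after filling.
observed = [400.0, 412.0, 436.0, 450.0, 471.0, 485.0]
filled = [400.0, 412.0, 424.3, 436.0, 450.0, 460.1, 471.0, 485.0]

f_stat, df_b, df_w = one_way_anova(observed, filled)
F_CRITICAL = 4.75  # F(1, 12) at alpha = 0.05, taken from a standard F table
accepted = no_significant_difference(f_stat, F_CRITICAL)
```

The critical value depends on the two degrees of freedom and on the significance level required by the data quality rules, so in practice it would be looked up in an F table or computed with a statistics library rather than hard-coded.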
Processing the actually collected historical power load data through these modules completes the data set and improves the validity of the original data. Using the filled historical power load data in a power load prediction model greatly improves the reliability and accuracy of power load prediction.
Drawings
Fig. 1 is a schematic diagram of a multiple interpolation model according to an embodiment of the present invention.
Fig. 2 is a schematic processing flow diagram of a method for supplementing missing values of historical load data according to an embodiment of the present invention.
Detailed Description
To make the content, purpose, features and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 2, the method proposed by the present invention for supplementing missing power load values by applying multiple interpolation is divided into the following steps.
Step one, data preprocessing: arrange the collected original historical power load data in time order, determine the start and stop times of the data set, check the time series for missing data, mark each missing value, and record the start and stop time of each gap.
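The preprocessing step can be sketched in Python as follows. The sketch assumes a fixed sampling interval and readings given as (timestamp, load) pairs; the function name find_gaps, the hourly interval, and the sample values are hypothetical and not taken from the patent.

```python
from datetime import datetime, timedelta


def find_gaps(readings, step):
    """Order readings in time, mark missing values, and record each gap.

    readings: iterable of (timestamp, load) pairs; load may be None.
    step: expected sampling interval as a timedelta (an assumption).
    Returns (series, gaps): series covers every expected timestamp, with
    None marking a missing value; gaps lists (gap_start, gap_stop) pairs.
    """
    observed = dict(readings)
    start, stop = min(observed), max(observed)

    # Build the full time grid between the start and stop of the data set.
    series = []
    t = start
    while t <= stop:
        series.append((t, observed.get(t)))
        t += step

    # Collect runs of consecutive missing values as gaps.
    gaps = []
    gap_start = None
    previous = None
    for ts, load in series:
        if load is None and gap_start is None:
            gap_start = ts
        elif load is not None and gap_start is not None:
            gaps.append((gap_start, previous))
            gap_start = None
        previous = ts
    if gap_start is not None:
        gaps.append((gap_start, previous))
    return series, gaps


# Hypothetical hourly readings, deliberately out of order and incomplete.
t0 = datetime(2020, 6, 1, 0, 0)
hour = timedelta(hours=1)
raw = [
    (t0 + 3 * hour, 410.0),
    (t0, 420.0),
    (t0 + hour, 415.0),
    (t0 + 6 * hour, 430.0),
    (t0 + 5 * hour, None),
]
series, gaps = find_gaps(raw, hour)
```

With these sample readings the grid has seven hourly slots and two gaps: a single missing hour at 02:00 and a two-hour gap from 04:00 to 05:00.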
Step two, interpolating supplementary data by the multiple interpolation algorithm: the multiple interpolation algorithm supplements data by operating on the marked data in stages, and generally includes the following parts:
the mice function starts from a data set containing missing data and returns an object containing multiple (by default 5) complete data sets. Each complete data set is generated by interpolating the missing data in the original data frame, and each is slightly different because the interpolation has a random component;
the with function applies a statistical model (e.g., a linear model or a generalized linear model) to each complete data set in turn;
the pool function integrates these individual analysis results into one set of results. Both the standard error and the p-value of the final model reflect the uncertainty due to the missing values and the multiple interpolations.
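The three parts above can be illustrated with a deliberately simplified Python sketch. It is not the R mice package: it handles one incomplete variable and one fully observed predictor, fills each missing load with a regression prediction plus random noise, and pools the mean load across the completed copies. The predictor temperature, the sample values, and all function names are hypothetical. A full chained-equations implementation would also draw the regression parameters at random and cycle through several incomplete variables.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_once(xs, ys, rng):
    """One completed copy of ys: each missing entry gets prediction + noise."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(xs, ys)]


def multiple_imputation(xs, ys, m=5, seed=0):
    """Analogue of the mice step: return m slightly different completed copies."""
    rng = random.Random(seed)
    return [impute_once(xs, ys, rng) for _ in range(m)]


def pool_means(completed):
    """Analogue of the with and pool steps for one statistic, the mean load.

    Each copy is analysed separately, then the results are combined:
    total variance = within variance + (1 + 1/m) * between variance.
    """
    m = len(completed)
    estimates = [statistics.fmean(d) for d in completed]
    within = statistics.fmean(statistics.variance(d) / len(d) for d in completed)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return statistics.fmean(estimates), total ** 0.5


# Hypothetical data: temperature is fully observed, load has two missing values.
temperature = [18, 20, 22, 24, 26, 28, 30, 32]
load = [400.0, 412.0, None, 436.0, 450.0, None, 471.0, 485.0]

completed = multiple_imputation(temperature, load, m=5, seed=42)
pooled_mean, pooled_se = pool_means(completed)
```

The observed values are identical in every completed copy, while the two filled positions vary from copy to copy. That variation is what the between-imputation term in the pooled standard error measures.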
Step three, data validity verification: the original historical power load data set and the data set supplemented by the multiple interpolation algorithm are checked for statistically significant differences to ensure the validity of the data. The two groups of data are subjected to one-way analysis of variance (one-way ANOVA), and the significance of the difference between them is calculated; it must be ensured that no significant difference exists between the two groups. If verification shows a significant difference, the number of data sets or the models applied in the with function need to be adjusted and the way the supplementary values are computed improved, so that the processed data do not differ significantly from the original data and the accuracy and validity of the processed data are maintained.
The invention provides a method that uses a multiple interpolation algorithm model to supplement values that are missing, for various reasons, from the historical data used in power load prediction. It is characterized in that a multiple interpolation algorithm is introduced into the processing of power load prediction data to supplement the missing values, and the number of linear fitting/linear programming models in the with function is adjusted according to the validity verification that compares the data sets before and after filling, so that the historical load data used for power load prediction are more complete and the prediction performance of the power load model is markedly improved.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
1. A method for supplementing missing values by applying multiple interpolation in power load prediction, characterized in that the method for supplementing the power load missing values by applying the KNN interpolation specifically comprises the following steps:
step one, data preprocessing: arranging the collected original historical power load data in time order, determining the start and stop times of the data set, checking the time series for missing data, marking each missing value, and recording the start and stop time of each gap;
step two, interpolating supplementary data by the multiple interpolation algorithm: the multiple interpolation algorithm supplements data by operating on the marked data in stages, and generally includes the following parts:
1) the mice function starts from a data set containing missing data and returns an object containing a plurality of (by default 5) complete data sets;
each complete data set is generated by interpolating the missing data in the original data frame;
each complete data set is slightly different because the interpolation has a random component;
2) the with function applies a statistical model (e.g., a linear model or a generalized linear model) to each complete data set in turn;
3) the pool function integrates these individual analysis results into one set of results;
the standard error and the p-value of the final model reflect the uncertainty generated by the missing values and the multiple interpolations;
step three, data validity verification: the original historical power load data set and the data set supplemented by the KNN algorithm are checked for statistically significant differences to ensure data validity; the two groups of data are subjected to one-way analysis of variance (one-way ANOVA) and the significance of the difference between them is calculated, and it must be ensured that no significant difference exists between the two groups; if a significant difference is found after verification, the value of k (the number of nearest neighbors) is adjusted or the distance measure is changed, the way the supplementary values are computed is improved, and the dimension of the filling processing is changed, so that the processed data do not differ significantly from the original data and the accuracy and validity of the processed data are maintained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010555226.3A CN111667123A (en) | 2020-06-17 | 2020-06-17 | Method for supplementing missing value by applying multiple interpolation in power load prediction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111667123A (en) | 2020-09-15 |
Family
ID=72388471
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010555226.3A (pending), CN111667123A (en) | 2020-06-17 | 2020-06-17 | Method for supplementing missing value by applying multiple interpolation in power load prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111667123A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117453696A (en) * | 2023-12-07 | 2024-01-26 | 深圳拓安信物联股份有限公司 | Method and device for supplementing missing data of water meter |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101964998A (en) * | 2009-07-24 | 2011-02-02 | 北京亿阳信通软件研究院有限公司 | Forecasting method and device of telephone traffic in ordinary holiday of telecommunication network |
CN108519989A (en) * | 2018-02-27 | 2018-09-11 | 国网冀北电力有限公司电力科学研究院 | The reduction retroactive method and device of a kind of day electricity missing data |
US20190180389A1 (en) * | 2016-08-01 | 2019-06-13 | Liverpool John Moores University | Analysing energy/utility usage |
CN110580542A (en) * | 2019-07-31 | 2019-12-17 | 中国电力科学研究院有限公司 | Power consumption prediction method and device |
-
2020
- 2020-06-17 CN CN202010555226.3A patent/CN111667123A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117453696A (en) * | 2023-12-07 | 2024-01-26 | 深圳拓安信物联股份有限公司 | Method and device for supplementing missing data of water meter |
CN117453696B (en) * | 2023-12-07 | 2024-04-12 | 深圳拓安信物联股份有限公司 | Method and device for supplementing missing data of water meter |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111860980A (en) | Method for interpolating and supplementing missing value by applying classification regression tree in power load prediction | |
CN113867295A (en) | Manufacturing workshop AGV dynamic scheduling method, system, equipment and storage medium based on digital twinning | |
CN111667123A (en) | Method for supplementing missing value by applying multiple interpolation in power load prediction | |
CN109389294B (en) | Usability evaluation method and device of nuclear security level DCS (distributed control System) | |
US8938484B2 (en) | Maintaining dependencies among supernodes during repeated matrix factorizations | |
CN109861293B (en) | Method for evaluating influence of photovoltaic uncertainty on small signal stability of power system | |
CN111258585A (en) | Attendance calculation method, system and equipment | |
CN109144806B (en) | Function verification method and device for register transmission stage circuit | |
CN111667117A (en) | Method for supplementing missing value by applying Bayesian estimation in power load prediction | |
Nijhawan et al. | On development of change point based generalized SRGM for software with multiple releases | |
CN113821419A (en) | Cloud server aging prediction method based on SVR and Gaussian function | |
CN111476408B (en) | Power communication equipment state prediction method and system | |
Mirnajafizadeh et al. | Robust simultaneous lot-sizing and scheduling with considering controllable processing time and fixed carbon emission in flow-shop environment | |
CN112861064A (en) | Social credit evaluation source data processing method, system, terminal and medium | |
CN111768045A (en) | Method for supplementing resident electricity consumption missing data by applying multiple interpolation in resident electricity consumption management | |
Bhatti et al. | Profit Analysis to an Industrial System Possessing Active Redundancy form Using Geometric Distribution | |
CN111291464A (en) | Dynamic equivalence method and device for power system | |
CN111222673A (en) | Section out-of-limit positioning method and system in electric quantity transaction plan | |
CN111966676A (en) | Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining | |
CN115601198B (en) | Power data simulation method, device, equipment and storage medium | |
CN112365070B (en) | Power load prediction method, device, equipment and readable storage medium | |
CN117033113B (en) | Control circuit and method for signal delay | |
CN117609270B (en) | Multi-dimensional data distributed parallel processing method | |
CN116257510A (en) | Ammeter data verification and repair method, device and storage medium | |
CN115730245A (en) | Fault diagnosis method, device, equipment and medium for oil-immersed power transformer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||