CN111667117A - Method for supplementing missing value by applying Bayesian estimation in power load prediction - Google Patents
- Publication number
- CN111667117A (application CN202010521260.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- power load
- theta
- data set
- bayesian estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Supply And Distribution Of Alternating Current (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
In power load prediction, the historical load data of a given unit are usually the essential basic data for prediction calculation and simulation. In practice, however, a power load data set often contains missing values for various reasons (for example, data loss caused by an emergency), and these missing values are usually left blank or marked with placeholders. When a power load prediction model is trained on a data set containing many missing values, those missing values can significantly degrade the performance of the machine learning model. The invention provides a method for supplementing missing values in the historical load data used for power load prediction by applying Bayesian estimation: a Bayesian estimation method is used to calculate the most likely value, which is then used to fill each missing value in the historical power load data set.
Description
Technical Field
The invention relates to the technical field of power load prediction, in particular to a method for supplementing missing data by applying Bayesian estimation to missing values in historical load data in power load prediction.
Background
Accurate prediction of the power load is an important basis for ensuring the safe and economic operation of a power system and for scientific management and scheduling of the power grid, and it is also a core component of a power energy management system. In power load prediction, the historical load data of a given unit are usually the essential basic data for prediction calculation and simulation. However, a power load data set may contain missing values for various reasons (for example, data loss caused by an emergency), and these missing values are usually left blank or marked with placeholders. When a power load prediction model is trained on a data set containing many missing values, those missing values can significantly degrade the performance of the machine learning model, and some algorithms used in power load prediction assume that every value is numeric and present. One way to deal with this problem is to delete the individual records that contain missing values, but this risks losing valuable information. A better strategy is to impute the missing values, that is, to infer each missing value from the observed data. The invention discloses a method that applies Bayesian estimation to the missing values of historical load data in power load prediction in order to supplement the missing data, so that the power load prediction model can run completely and effectively.
Disclosure of Invention
The invention provides a method for supplementing missing values in the historical load data used for power load prediction by applying Bayesian estimation: a Bayesian estimation method is used to calculate the most likely value, which is then used to fill each missing value in the historical power load data set.
Bayesian estimation is a statistical method for determining the parameters of a model. It treats each parameter of the data set as obeying some probability distribution, and regards the existing data as having been generated under that distribution. Intuitively, a parameter θ is assumed and then solved for from the data: the prior probability P(θ) of θ has to be specified by hand, and a specific θ is then obtained with the MAP (maximum a posteriori) method. When data are scarce or sparse, Bayesian estimation improves accuracy by taking the prior into account, so the estimated parameters better reflect the actual situation. In the invention, Bayesian estimation fits the missing power load data to the distribution of the whole data set, finds the most likely value, and fills the null values with it, which ensures the integrity of the data and, in turn, the operating effect of the power load prediction model. The original data set (before filling) and the filled data set are then subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between the two groups of data; there must be no significant difference between them. If the check does find a significant difference, either the specific restrictive parameters of the Bayesian estimation model are adjusted, or the missing values are simply eliminated instead, so that the filled data do not differ significantly from the original data and the data set as a whole remains valid. After Bayesian estimation, the actually collected historical power load data, or a data set that has undergone outlier removal or denoising, no longer contains missing values, and the validity of the whole data set is improved.
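As a worked illustration of how the prior enters the MAP estimate, consider the simplest Gaussian case. The model below is an assumption made for this sketch, not a formula stated in the source: the samples are taken as $x_i \sim \mathcal{N}(\theta, \sigma^2)$ with known variance $\sigma^2$, and the prior as $\theta \sim \mathcal{N}(\mu_0, \tau_0^2)$; the symbols $\sigma$, $\mu_0$ and $\tau_0$ are introduced here for illustration only.

```latex
\begin{aligned}
P(\theta \mid D) &\propto P(D \mid \theta)\,P(\theta)
  = \prod_{i=1}^{n} \mathcal{N}(x_i \mid \theta, \sigma^2)\;
    \mathcal{N}(\theta \mid \mu_0, \tau_0^2), \\
\hat{\theta}_{\mathrm{MAP}} &= \arg\max_{\theta} P(\theta \mid D)
  = \frac{\mu_0/\tau_0^2 + n\bar{x}/\sigma^2}{1/\tau_0^2 + n/\sigma^2},
  \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{aligned}
```

Under these assumptions the estimate is a precision-weighted average of the prior mean and the sample mean: with few samples it stays close to $\mu_0$, and as $n$ grows it approaches $\bar{x}$, which is the sense in which the prior helps when data are scarce.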
The filled data set is then used by the power load prediction model, which greatly improves the reliability and accuracy of power load prediction. The implementation flow of the method is shown in Fig. 1.
Drawings
Fig. 1 is a schematic diagram of the processing flow for supplementing missing values by Bayesian estimation according to an embodiment of the present invention.
Detailed Description
In order to make the content, purpose, features and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the scope of protection of the present invention.
Step one, data preprocessing: arrange the collected raw data in time order, determine the start and stop times of the data set, check the time series for missing data, mark each missing value, and record the start and stop times of each gap.
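The preprocessing step can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: the function name `find_gaps`, the use of `None` as the missing-value marker, and the assumption that readings fall on a regular sampling grid of width `step` are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta


def find_gaps(records, step):
    """Sort (timestamp, load) records and list the runs of missing readings.

    records: iterable of (datetime, float or None); None marks a blank reading.
    step: expected sampling interval (assumed regular in this sketch).
    Returns (series, gaps): series is the complete time-ordered list of
    (timestamp, value or None); gaps is a list of (gap_start, gap_end) pairs.
    """
    by_time = dict(sorted(records, key=lambda r: r[0]))
    start, stop = min(by_time), max(by_time)  # start and stop time of the data set
    series, gaps, gap_start = [], [], None
    t = start
    while t <= stop:
        value = by_time.get(t)  # timestamps with no record count as missing too
        series.append((t, value))
        if value is None:
            if gap_start is None:
                gap_start = t  # a new gap begins here
        elif gap_start is not None:
            gaps.append((gap_start, t - step))  # gap ended at the previous slot
            gap_start = None
        t += step
    if gap_start is not None:  # data set ends inside a gap
        gaps.append((gap_start, stop))
    return series, gaps


# Hypothetical hourly readings: one blank placeholder and one absent hour.
hour = timedelta(hours=1)
t0 = datetime(2020, 6, 10, 0)
raw = [(t0 + 3 * hour, 130.0), (t0, 120.0), (t0 + hour, None), (t0 + 4 * hour, 128.0)]
series, gaps = find_gaps(raw, hour)
```

Here `gaps` holds a single interval covering the second and third hours, which is the "start and stop time" record that the later filling step needs.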
Step two, supplementing missing values by Bayesian estimation: timestamp the historical power load data preprocessed in step one, then perform the Bayesian estimation operation to fill the time periods that have no corresponding data, restoring the continuity of the power load data along the time series. The calculation method specifically adopted is as follows:
1. Determine the prior distribution function $P(\theta)$ of the uncertain parameter $\theta$ from the distribution form of the data set;
2. From the whole data set $D = \{x_1, x_2, \ldots, x_n\}$, obtain the joint distribution function of the samples $P(D \mid \theta)$, which is a function of $\theta$;
3. Obtain the posterior distribution of $\theta$ using the Bayesian formula: $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$
4. Obtain the Bayesian estimate: $\hat{\theta} = \arg\max_{\theta} P(\theta \mid D)$
where $\hat{\theta}$, the most likely value, is the calculation target and is used to fill the missing values. In this calculation method, the prior distribution function $P(\theta)$ and the joint distribution function of the samples $P(D \mid \theta)$ are obtained by fitting a Gaussian distribution to the data set; the precondition is that the data set as a whole follows a normal (Gaussian) distribution.
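A minimal sketch of the four calculation steps for the Gaussian case might look like this. Several things are assumptions of the example rather than details given in the source: θ is taken to be the mean load, the noise variance is estimated from the observed samples, and the prior mean and variance are passed in as the hypothetical parameters `prior_mean` and `prior_var` (for instance from a longer historical record). With a Gaussian prior and Gaussian likelihood the posterior is also Gaussian, so its maximum coincides with its mean.

```python
from statistics import mean, variance


def gaussian_posterior(observed, prior_mean, prior_var):
    """Posterior mean and variance of theta for Gaussian data with a Gaussian prior.

    The noise variance is estimated from the observed samples (an assumption
    of this sketch), so at least two distinct observations are required.
    """
    n = len(observed)
    noise_var = variance(observed)  # sample variance of the observed loads
    if noise_var == 0:
        raise ValueError("observed values are constant; noise variance is zero")
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * mean(observed) / noise_var)
    return post_mean, post_var


def fill_missing(values, prior_mean, prior_var):
    """Replace None entries with the MAP estimate of theta."""
    observed = [v for v in values if v is not None]
    theta_hat, _ = gaussian_posterior(observed, prior_mean, prior_var)
    return [theta_hat if v is None else v for v in values], theta_hat


# Hypothetical load readings with two gaps, and a hypothetical prior N(120, 25).
loads = [120.0, None, 124.0, 122.0, None, 126.0]
filled, theta_hat = fill_missing(loads, prior_mean=120.0, prior_var=25.0)
```

In this example the estimate falls between the prior mean (120) and the sample mean of the observed values (123). Filling every gap with a single value ignores daily load patterns; one possible refinement, not described in the source, would be to apply the same estimate separately to each hour-of-day group.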
Step three, data validity verification: the original power load data set and the data set after filling must be tested for a statistical difference to ensure that the data remain valid. The two groups of data are subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between them; there must be no significant difference between the two groups. If verification does show a significant difference, the specific parameters selected in step two must be adjusted, the number of raw data points eliminated for having large deviations must be reduced, and the degree of denoising must be lowered, so that the processed data do not differ significantly from the original data and remain valid.
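The validity check can be illustrated with a small one-way ANOVA written against the standard library. This is a sketch under stated assumptions: the function name `one_way_anova_f` is hypothetical, the sample values are invented for the example, and the decision compares the F statistic with a critical value taken from standard F tables instead of computing a p-value. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups.

    Raises ZeroDivisionError if every group is constant (no within-group spread).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_between += len(g) * (m - grand_mean) ** 2  # spread of group means
        ss_within += sum((x - m) ** 2 for x in g)     # spread inside each group
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Observed values before filling, and the same series after filling the gaps.
original = [120.0, 124.0, 122.0, 126.0]
filled = [120.0, 122.8125, 124.0, 122.0, 122.8125, 126.0]
f_stat, df_b, df_w = one_way_anova_f(original, filled)

CRITICAL_F = 5.32  # F(1, 8) critical value at alpha = 0.05, from standard tables
no_significant_difference = f_stat < CRITICAL_F
```

When `no_significant_difference` is false, the text above calls for revisiting the parameters chosen in step two before the filled data set is used for training.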
The invention provides a method that applies Bayesian estimation to supplement values that are missing, for various reasons, from the historical data used in power load prediction. It is characterized in that Bayesian estimation is introduced into the data preprocessing for power load prediction and the fitted value with the maximum probability is selected to fill each missing value, so that the historical data used for power load prediction are more accurate and the prediction effect is improved. With this method the data set of the power load prediction model is more complete, so the prediction model trains better and prediction accuracy improves.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (1)
1. A method for supplementing missing values by applying Bayesian estimation in power load prediction, characterized by comprising the following steps: step one, data preprocessing: arranging the collected raw data in time order, determining the start and stop times of the data set, checking the time series for missing data, marking each missing value, and recording the start and stop times of each gap;
step two, supplementing missing values by Bayesian estimation: timestamping the historical power load data preprocessed in step one, then performing the Bayesian estimation operation to fill the time periods that have no corresponding data, restoring the continuity of the power load data along the time series;
the calculation method specifically adopted is as follows:
1) determining the prior distribution function $P(\theta)$ of the uncertain parameter $\theta$ from the distribution form of the data set;
2) from the whole data set $D = \{x_1, x_2, \ldots, x_n\}$, obtaining the joint distribution function of the samples $P(D \mid \theta)$, which is a function of $\theta$;
3) obtaining the posterior distribution of $\theta$ using the Bayesian formula: $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$
4) obtaining the Bayesian estimate: $\hat{\theta} = \arg\max_{\theta} P(\theta \mid D)$
the prior distribution function $P(\theta)$ and the joint distribution function of the samples $P(D \mid \theta)$ in the calculation method are obtained by fitting a Gaussian distribution to the data set, the precondition being that the data set as a whole follows a normal (Gaussian) distribution;
step three, data validity verification: the original power load data set and the data set after filling must be tested for a statistical difference to ensure that the data remain valid; the two groups of data are subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between them, and there must be no significant difference between the two groups;
if verification shows a significant difference between the two groups of data, the specific parameters selected in step two must be adjusted, the number of raw data points eliminated for having large deviations must be reduced, and the degree of denoising must be lowered, so that the processed data do not differ significantly from the original data and remain valid.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010521260.9A CN111667117A (en) | 2020-06-10 | 2020-06-10 | Method for supplementing missing value by applying Bayesian estimation in power load prediction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010521260.9A CN111667117A (en) | 2020-06-10 | 2020-06-10 | Method for supplementing missing value by applying Bayesian estimation in power load prediction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111667117A (en) | 2020-09-15 |
Family
ID=72386187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010521260.9A Pending CN111667117A (en) | 2020-06-10 | 2020-06-10 | Method for supplementing missing value by applying Bayesian estimation in power load prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111667117A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117785855A (en) * | 2023-12-25 | 2024-03-29 | 杭州字节方舟科技有限公司 | Block chain-based wind control early warning, device, equipment and storage medium |
CN117932474A (en) * | 2024-03-22 | 2024-04-26 | 山东核电有限公司 | Training method, device, equipment and storage medium of communication missing data determination model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101964998A (en) * | 2009-07-24 | 2011-02-02 | 北京亿阳信通软件研究院有限公司 | Forecasting method and device of telephone traffic in ordinary holiday of telecommunication network |
CN104008433A (en) * | 2014-06-03 | 2014-08-27 | 国家电网公司 | Method for predicting medium-and-long-term power loads on basis of Bayes dynamic model |
CN107577649A (en) * | 2017-09-26 | 2018-01-12 | 广州供电局有限公司 | The interpolation processing method and device of missing data |
CN108320063A (en) * | 2018-03-26 | 2018-07-24 | 上海积成能源科技有限公司 | To the method for rejecting abnormal data and denoising in a kind of load forecast |
US20200082283A1 (en) * | 2018-09-12 | 2020-03-12 | Samsung Sds Co., Ltd. | Method and apparatus for correcting missing value in data |
-
2020
- 2020-06-10 CN CN202010521260.9A patent/CN111667117A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113965359B (en) | Federal learning data poisoning attack-oriented defense method and device | |
CN110942154A (en) | Data processing method, device, equipment and storage medium based on federal learning | |
CN111667117A (en) | Method for supplementing missing value by applying Bayesian estimation in power load prediction | |
CN108062573A (en) | Model training method and device | |
CN111860980A (en) | Method for interpolating and supplementing missing value by applying classification regression tree in power load prediction | |
CN112558875B (en) | Data verification method and device, electronic equipment and storage medium | |
US20230155378A1 (en) | Reliability calculation method of power distribution system considering hierarchical decentralized control of demand-side resources | |
CN111784173B (en) | AB experiment data processing method, device, server and medium | |
CN114911788B (en) | Data interpolation method and device and storage medium | |
CN117032954B (en) | Memory optimization method, system, equipment and medium for terminal training model | |
CN111667123A (en) | Method for supplementing missing value by applying multiple interpolation in power load prediction | |
CN109784484A (en) | Neural network accelerated method, device, neural network accelerate chip and storage medium | |
CN107492303A (en) | Drawing method and system for equivalent salt deposit density distribution map of power transmission line in coastal region | |
CN106294115A (en) | The method of testing of a kind of application system animal migration and device | |
CN107945034A (en) | Financial analysis method, application server and computer-readable recording medium based on microblogging finance and economics event | |
CN112488843A (en) | Enterprise risk early warning method, device, equipment and medium based on social network | |
CN108804640B (en) | Data grouping method, device, storage medium and equipment based on maximized IV | |
CN111966676A (en) | Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining | |
CN115713395A (en) | Flink-based user wind control management method, device and equipment | |
CN111768045A (en) | Method for supplementing resident electricity consumption missing data by applying multiple interpolation in resident electricity consumption management | |
CN111428886A (en) | Fault diagnosis deep learning model self-adaptive updating method and device | |
CN114679466B (en) | Consensus processing method, device, computer equipment and medium for block chain network | |
CN103106103A (en) | Requesting information classification method and device | |
CN108805778A (en) | Electronic device, the method and storage medium for acquiring collage-credit data | |
CN111641704B (en) | Resource-related data transmission method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20200915 |