CN111966676A - Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining - Google Patents

Info

Publication number
CN111966676A
CN111966676A CN202010920130.2A CN202010920130A CN111966676A CN 111966676 A CN111966676 A CN 111966676A CN 202010920130 A CN202010920130 A CN 202010920130A CN 111966676 A CN111966676 A CN 111966676A
Authority
CN
China
Prior art keywords
data
theta
electricity consumption
bayesian estimation
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010920130.2A
Other languages
Chinese (zh)
Inventor
周浩
顾一峰
胡炳谦
韩俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ieslab Energy Technology Co ltd
Original Assignee
Shanghai Ieslab Energy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ieslab Energy Technology Co ltd
Priority to CN202010920130.2A
Publication of CN111966676A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The management and analysis of residential electricity consumption data place high demands on data integrity. Missing values in the collected raw data must be filled in by mathematical methods while the validity of the data is maintained. The invention discloses a Bayesian estimation method for supplementing missing values in residential electricity consumption data. Through a series of mathematical calculations, the method fills in the missing values effectively, thereby improving data quality and ensuring the integrity of the data.

Description

Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining
Technical Field
The invention relates to the technical field of power load prediction, and in particular to a method for supplementing missing values by applying Bayesian estimation in residential electricity consumption data mining.
Background
Residential electricity consumption is influenced by many factors. Understanding residents' electricity consumption habits, and the relationships between those habits and their main influencing factors, is important for power system dispatching, for promoting the marketization of electric power, and for smart city management. The first step in analyzing and mining residential electricity consumption data is to collect complete and valid data. However, a residential electricity consumption data set may contain missing values for various reasons (for example, data loss caused by an emergency), and these missing values are usually left blank or marked with placeholders. When a data mining model is trained on a data set containing many missing values, the missing values can greatly degrade the performance of the machine learning model. Some data mining algorithms assume that all values are numerical and meaningful; when missing values are fed into such a model, they have an uncontrollable influence on the analysis results and reduce their accuracy. In this case, a preferable approach is to impute the missing values, that is, to estimate them from the observed data, and Bayesian estimation is one such method. The invention discloses a method for supplementing missing values in residential electricity consumption data by applying Bayesian estimation, thereby ensuring the integrity of the data.
Disclosure of Invention
The invention provides a method for supplementing missing values in residential electricity consumption data by applying Bayesian estimation. The method is characterized in that Bayesian estimation is used to calculate the most probable value (called the maximum likelihood number in this document), which is then used to fill the missing values in the residential electricity consumption data set.
Bayesian estimation is a statistical method for determining the parameters of a model. It treats each parameter of the data set as obeying a probability distribution, and treats the existing data as having been generated under that distribution. Intuitively, a parameter θ is assumed and then solved for from the data: the prior probability P(θ) of θ must be specified in advance, and a specific θ is then obtained by maximizing its probability. When the data are few or sparse, taking the prior into account improves the accuracy of the estimate, so that the estimated parameters better reflect the actual situation. In the invention, Bayesian estimation fits the missing data in power load prediction to the distribution of the whole data set in order to find the maximum likelihood number and fill the vacant values. This ensures the integrity of the data and, in turn, the performance of the residential electricity consumption data mining model. The original data set before filling and the data set after filling are subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between the two groups of data; no significant difference may exist between them.
If the test shows a significant difference, the specific parameters chosen in the Bayesian estimation model need to be adjusted, or the missing values are removed instead, so that the filled data and the original data do not differ significantly and the data set as a whole remains valid. Supplementing missing values by Bayesian estimation, whether in the residential electricity data as actually collected or in a data set that has undergone outlier removal or denoising, can improve the validity of the whole data set. Using the filled data set for data mining greatly improves the reliability and accuracy of residential electricity management.
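To illustrate why the prior matters most when data are sparse, consider the simplest Gaussian case. The model below is an assumption chosen for illustration (Gaussian observations with known variance σ² and a Gaussian prior on the mean θ; the symbols μ_0, τ_0², μ_n, τ_n² and w are introduced here and do not appear in the text), since the text states only that Gaussian distributions are fitted to the data set.

```latex
\[
x_i \mid \theta \sim \mathcal{N}(\theta, \sigma^2), \quad i = 1, \dots, n,
\qquad
\theta \sim \mathcal{N}(\mu_0, \tau_0^2)
\]
\[
P(\theta \mid D) = \mathcal{N}(\mu_n, \tau_n^2),
\qquad
\tau_n^2 = \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1},
\qquad
\mu_n = w\,\bar{x} + (1 - w)\,\mu_0,
\quad
w = \frac{n\,\tau_0^2}{n\,\tau_0^2 + \sigma^2}
\]
```

With few observations (small n) the weight w is small and the estimate stays close to the prior mean μ_0; as n grows, w approaches 1 and the estimate approaches the sample mean.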
Drawings
FIG. 1 is a flow chart illustrating a process of using Bayesian estimation to supplement missing values in an embodiment of the present invention.
Detailed Description
To make the content, purpose, features and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive work fall within the scope of the present invention. The implementation steps of the present invention, shown in FIG. 1, are as follows.
Step one, data preprocessing: arrange the collected raw residential electricity consumption data in chronological order, determine the start and stop times of the data set, check the time series for missing data, mark the missing values, and record the start and stop times of each gap.
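The preprocessing step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used by the invention: readings are assumed to arrive as `(timestamp, value)` pairs on a regular sampling grid, a placeholder is represented by `None`, and the function name `find_gaps` and the parameter `step` are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(readings, step):
    """Sort readings by time and record the start and stop time of each gap.

    readings: non-empty iterable of (datetime, value) pairs; value is None
              for a placeholder (assumed representation).
    step:     expected sampling interval as a timedelta (assumption).
    Returns (series, gaps): series holds one (timestamp, value) pair per
    expected timestamp, with None where data are missing; gaps is a list of
    (gap_start, gap_stop) pairs, both inclusive.
    """
    by_time = dict(sorted(readings, key=lambda r: r[0]))
    start, stop = min(by_time), max(by_time)  # start and stop of the data set
    series, gaps, gap_start = [], [], None
    t = start
    while t <= stop:
        value = by_time.get(t)  # None if absent or marked as placeholder
        series.append((t, value))
        if value is None:
            if gap_start is None:
                gap_start = t  # a new gap begins here
        elif gap_start is not None:
            gaps.append((gap_start, t - step))  # gap ended one step earlier
            gap_start = None
        t += step
    if gap_start is not None:
        gaps.append((gap_start, stop))
    return series, gaps


hour = timedelta(hours=1)
t0 = datetime(2020, 9, 4, 0)
raw = [(t0 + 3 * hour, 1.4), (t0, 1.0), (t0 + hour, None), (t0 + 4 * hour, 1.2)]
series, gaps = find_gaps(raw, hour)
```

Readings that do not fall on the sampling grid are not matched by this sketch; real meter data would need resampling first.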
Step two, filling missing values by Bayesian estimation: timestamp the residential electricity consumption data preprocessed in step one, then perform the Bayesian estimation operation to fill the time periods that have no corresponding data, restoring the continuity of the power load data over the time series. The calculation method adopted is as follows:
1. Determine the prior distribution function P(θ) of the uncertain parameter θ from the distribution form of the data set;
2. From the whole data set D = {x_1, x_2, …, x_n}, solve for the joint distribution function P(D | θ) of the samples, which is a function of θ;
3. Solve for the posterior distribution of θ using the Bayesian formula:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$$
4. Solve for the Bayesian estimate:
$$\hat{\theta} = \int \theta\,P(\theta \mid D)\,d\theta$$
where $\hat{\theta}$, the maximum likelihood number that is the target of the calculation, is used to fill the missing value. The prior distribution function P(θ) and the joint distribution function P(D | θ) of the samples are obtained by fitting a Gaussian distribution to the data set; the precondition is that the data set as a whole follows a normal (Gaussian) distribution.
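The four calculation steps can be sketched in Python for the simplest Gaussian case. This is a minimal sketch under stated assumptions, not the patented implementation: θ is taken to be the mean of the consumption values, the observation variance is treated as known and estimated from the observed values, and the prior is Gaussian with hypothetical parameters `prior_mean` and `prior_var` supplied by the caller. Under these assumptions the posterior is again Gaussian, so its mean and its most probable value coincide. The function names are illustrative.

```python
from statistics import fmean, pvariance


def gaussian_posterior(observed, prior_mean, prior_var, noise_var=None):
    """Posterior mean and variance of theta for a Gaussian prior and likelihood.

    observed:   list of observed consumption values (the data set D).
    prior_mean, prior_var: parameters of the Gaussian prior P(theta)
                           (hypothetical inputs, e.g. fitted to history).
    noise_var:  observation variance; estimated from `observed` if omitted
                (assumption: treated as known once estimated).
    """
    n = len(observed)
    if noise_var is None:
        noise_var = pvariance(observed)
    if noise_var <= 0 or prior_var <= 0:
        raise ValueError("variances must be positive")
    sample_mean = fmean(observed)
    # Precisions (inverse variances) of prior and data add up.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var


def fill_missing(values, prior_mean, prior_var):
    """Replace each None in `values` with the posterior mean of theta."""
    observed = [v for v in values if v is not None]
    estimate, _ = gaussian_posterior(observed, prior_mean, prior_var)
    return [estimate if v is None else v for v in values]


hourly_kwh = [1.0, 1.2, None, 0.9, 1.4, None, 1.1]
filled = fill_missing(hourly_kwh, prior_mean=1.0, prior_var=0.25)
```

Filling every gap with a single value ignores daily and seasonal patterns; a sketch closer to practice might apply the same update separately per hour of day, which is a design choice the text does not specify.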
Step three, data validity verification: the original residential electricity data set and the data set after filling must be tested for a statistical difference in order to ensure the validity of the data. The two sets of data are subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between them; no significant difference may exist between the two sets. If the test shows a significant difference, the specific parameters selected in step two need to be adjusted, fewer strongly deviating values should be removed from the original data, and the degree of denoising should be reduced, so that the processed data do not differ significantly from the original data and remain valid.
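The validity check can be sketched with a hand-written one-way ANOVA F statistic. This is a minimal sketch, assuming the caller supplies the critical value `f_critical` from an F table for the chosen significance level and degrees of freedom; the function names are hypothetical, and a statistics library could be used instead to obtain a p-value directly.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def no_significant_difference(original, filled, f_critical):
    """True if the F statistic stays below the caller-supplied critical value."""
    f_stat, _, _ = one_way_anova_f(original, filled)
    return f_stat < f_critical


original = [1.0, 1.2, 0.9, 1.4, 1.1]        # observed values only
filled = [1.0, 1.2, 1.12, 0.9, 1.4, 1.1]    # same series with one gap filled
# 5.12 is the tabulated 5% critical value of F for (1, 9) degrees of freedom.
ok = no_significant_difference(original, filled, f_critical=5.12)
```

If the check fails, the text calls for adjusting the parameters of step two and repeating the fill before the data set is used for mining.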
The invention provides a method for supplementing missing values that arise for various reasons in residential electricity data by applying Bayesian estimation. The method is characterized in that Bayesian estimation is introduced into the processing of residential electricity data and the fitted value with the maximum probability is selected to fill each missing value, so that the residential electricity data become more complete and the data quality is markedly improved. With this method, analysis and mining based on residential electricity consumption data become more accurate and reliable.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. A method for supplementing missing values by applying Bayesian estimation in residential electricity consumption data mining, characterized by comprising the following steps:
step one, data preprocessing: arranging the collected raw residential electricity consumption data in chronological order, determining the start and stop times of the data set, checking the time series for missing data, marking the missing values, and recording the start and stop times of each gap;
step two, filling missing values by Bayesian estimation: timestamping the residential electricity consumption data preprocessed in step one, and then performing the Bayesian estimation operation to fill the time periods that have no corresponding data, restoring the continuity of the power load data over the time series;
the calculation method specifically adopted is as follows:
1) determining the prior distribution function P(θ) of the parameter θ that determines the missing value, according to the distribution form of the data set;
2) from the whole data set D = {x_1, x_2, …, x_n}, solving for the joint distribution function P(D | θ) of the samples, which is a function of θ;
3) solving for the posterior distribution of θ using the Bayesian formula:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$$
4) solving for the Bayesian estimate:
$$\hat{\theta} = \int \theta\,P(\theta \mid D)\,d\theta$$
where $\hat{\theta}$, the maximum likelihood number that is the target of the calculation, is used to fill the missing value; the prior distribution function P(θ) and the joint distribution function P(D | θ) of the samples are obtained by fitting a Gaussian distribution to the data set, the precondition being that the data set as a whole follows a normal (Gaussian) distribution;
step three, data validity verification: the original residential electricity data set and the data set after filling must be tested for a statistical difference in order to ensure the validity of the data, the two groups of data being subjected to one-way analysis of variance (one-way ANOVA) to test for a significant difference between them and to ensure that no significant difference exists between the two groups;
if the test shows a significant difference, the specific parameters selected in step two need to be adjusted, fewer strongly deviating values are removed from the original data, and the degree of denoising is reduced, so that the processed data do not differ significantly from the original data and remain valid.
2. A method for supplementing missing values that arise for various reasons in residential electricity data by applying Bayesian estimation, characterized in that Bayesian estimation is introduced into the processing of the residential electricity data and the fitted value with the maximum probability is selected to fill each missing value, so that the residential electricity data become more complete and the data quality is markedly improved.
CN202010920130.2A 2020-09-04 2020-09-04 Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining Pending CN111966676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010920130.2A CN111966676A (en) 2020-09-04 2020-09-04 Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining

Publications (1)

Publication Number Publication Date
CN111966676A (en) 2020-11-20

Family

ID=73392026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010920130.2A Pending CN111966676A (en) 2020-09-04 2020-09-04 Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining

Country Status (1)

Country Link
CN (1) CN111966676A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964998A (en) * 2009-07-24 2011-02-02 北京亿阳信通软件研究院有限公司 Forecasting method and device of telephone traffic in ordinary holiday of telecommunication network
CN107577649A (en) * 2017-09-26 2018-01-12 广州供电局有限公司 The interpolation processing method and device of missing data
CN109740826A (en) * 2019-01-30 2019-05-10 广东工业大学 A kind of cooling heating and power generation system load forecasting method based on Dynamic Data Mining
CN111506635A (en) * 2020-05-11 2020-08-07 上海积成能源科技有限公司 System and method for analyzing residential electricity consumption behavior based on autoregressive naive Bayes algorithm

Similar Documents

Publication Publication Date Title
CN105512799B (en) Power system transient stability evaluation method based on mass online historical data
CN112730938B (en) Electricity larceny user judging method based on electricity utilization acquisition big data
CN111860980A (en) Method for interpolating and supplementing missing value by applying classification regression tree in power load prediction
CN110633194B (en) Performance evaluation method of hardware resources in specific environment
CN106570790B (en) Wind power plant output data restoration method considering wind speed data segmentation characteristics
CN116881718A (en) Artificial intelligence training method and system based on big data cleaning
WO2019019429A1 (en) Anomaly detection method, device and apparatus for virtual machine, and storage medium
CN108683658B (en) Industrial control network flow abnormity identification method based on multi-RBM network construction reference model
CN112215398A (en) Power consumer load prediction model establishing method, device, equipment and storage medium
CN112419268A (en) Method, device, equipment and medium for detecting image defects of power transmission line
CN116523140A (en) Method and device for detecting electricity theft, electronic equipment and storage medium
CN114168583A (en) Electric quantity data cleaning method and system based on regular automatic encoder
CN111966676A (en) Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining
CN113516192A (en) Method, system, device and storage medium for identifying user electricity consumption transaction
CN111667117A (en) Method for supplementing missing value by applying Bayesian estimation in power load prediction
CN111026624B (en) Fault prediction method and device of power grid information system
CN111159251A (en) Method and device for determining abnormal data
CN111310121A (en) New energy output probability prediction method and system
CN111768045A (en) Method for supplementing resident electricity consumption missing data by applying multiple interpolation in resident electricity consumption management
CN113821419A (en) Cloud server aging prediction method based on SVR and Gaussian function
CN116955059A (en) Root cause positioning method, root cause positioning device, computing equipment and computer storage medium
CN111667123A (en) Method for supplementing missing value by applying multiple interpolation in power load prediction
CN112967154B (en) Assessment method and device for Well-rolling of power system
CN106485526A (en) A kind of diagnostic method of data mining model and device
CN114677052A (en) Natural gas load fluctuation asymmetry analysis method and system based on TARCH model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201120