CN111966676A - Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining - Google Patents
- Publication number
- CN111966676A
- Authority
- CN
- China
- Prior art keywords
- data
- theta
- electricity consumption
- bayesian estimation
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The management and analysis of residential electricity consumption data place high demands on data integrity. Missing values in the collected raw data must be filled in by mathematical methods while the validity of the data is preserved. The invention discloses a Bayesian estimation method for supplementing missing values in residential electricity consumption data. Through a series of mathematical calculations it fills in the missing data, thereby improving data quality and ensuring data integrity.
Description
Technical Field
The invention relates to the technical field of power load prediction, and in particular to a method that applies Bayesian estimation to supplement missing values in residential electricity consumption data mining.
Background
Residential electricity consumption is influenced by many factors. Understanding residents' consumption habits and how they relate to the main influencing factors is important for power system dispatching, for the promotion of electricity market reform, and for smart city management. The first step in analyzing and mining residential electricity consumption data is to collect complete and valid data. However, a residential electricity consumption data set may contain missing values for various reasons (for example, data loss caused by an emergency), and these missing values are usually left blank or marked with placeholders. When a data mining model is trained on a data set containing many missing values, those values can greatly degrade the performance of the machine learning model. Some data mining algorithms assume that all values are numerical and meaningful; when missing values are fed into such a model, they introduce uncontrolled effects and a loss of accuracy in the analysis results. In this case a better approach is to impute the missing values, that is, to estimate them from the observed data, and one such approach is Bayesian estimation. The invention discloses a method that applies Bayesian estimation to supplement missing values in residential electricity consumption data, thereby ensuring the integrity of the data.
Disclosure of Invention
The invention provides a method for supplementing missing data by applying Bayesian estimation to missing values in residential electricity consumption data. It is characterized in that Bayesian estimation is used to calculate the maximum-likelihood value, which is then used to fill in the missing values in a residential electricity consumption data set.
Bayesian estimation is a statistical method for determining the parameters of a model. It treats each parameter of a data set as obeying some probability distribution, and treats the existing data as generated under the distribution governed by that parameter. Intuitively, a parameter $\theta$ is assumed and then solved for from the data: the prior probability $p(\theta)$ of $\theta$ has to be specified by the analyst, and a specific $\theta$ is then obtained by maximizing its probability given the data. When the data are few or sparse, Bayesian estimation improves accuracy by taking the prior into account, so the estimated parameters better reflect the actual situation.

In the invention, Bayesian estimation is applied by fitting the missing data in power load prediction to the distribution of the whole data set, finding the maximum-likelihood value, and filling in the vacant values. This ensures the integrity of the data and, in turn, the performance of the residential electricity data mining model. The original data set before filling and the data set after filling are subjected to a one-way analysis of variance (one-way ANOVA) to test for a significant difference between the two groups of data; there must be no significant difference between them.

If verification shows a significant difference between the two groups, either the choice of the specific constraining parameters in the Bayesian estimation model must be adjusted or the missing values must instead be discarded, so that the filled data do not differ significantly from the original data and the overall data set retains its validity. Whether applied to the residential electricity data as actually collected or to a data set that has already undergone outlier removal or denoising, supplementing missing values by Bayesian estimation can improve the validity of the overall data set. Using the filled data set for data mining improves the reliability and accuracy of residential electricity management.
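As a worked illustration of the estimation described above, the Gaussian case can be written out in closed form. This is a sketch under stated assumptions, not the patent's own derivation: the data set is written $D = \{x_1, \dots, x_n\}$, the symbols $\mu_0$, $\sigma_0^2$, $\sigma^2$ and $\bar{x}$ are introduced here for illustration, the observation variance $\sigma^2$ is assumed known, and the posterior mean is taken as the Bayesian estimate (the usual choice under squared-error loss).

```latex
% Assumed model: x_i | \theta ~ N(\theta, \sigma^2) with \sigma^2 known,
% prior \theta ~ N(\mu_0, \sigma_0^2).
\begin{aligned}
P(\theta \mid D)
  &= \frac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}
   = \mathcal{N}\!\left(\theta \mid \mu_n, \sigma_n^2\right), \\
\mu_n &= \frac{\sigma^2 \mu_0 + n\,\sigma_0^2\,\bar{x}}{\sigma^2 + n\,\sigma_0^2},
\qquad
\sigma_n^2 = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n\,\sigma_0^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \\
\hat{\theta} &= \int \theta\,P(\theta \mid D)\,d\theta = \mu_n .
\end{aligned}
```

Because a Gaussian posterior is symmetric, its mean and its mode coincide, so under these assumptions the posterior mean is also the most probable value of $\theta$. With few observations the estimate stays close to the prior mean $\mu_0$; as $n$ grows it approaches the sample mean $\bar{x}$.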
Drawings
FIG. 1 is a flow chart illustrating a process of using Bayesian estimation to supplement missing values in an embodiment of the present invention.
Detailed Description
To make the content, purpose, features and advantages of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The embodiments described are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention. The implementation steps of the invention, shown in FIG. 1, are as follows.
Step one, data preprocessing: arrange the collected raw residential electricity consumption data in chronological order, determine the start and end times of the data set, check the time series for missing data, mark each missing value, and record the start and end time of each gap.
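The preprocessing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `find_gaps`, the `(timestamp, value)` input layout, the use of `None` as the blank marker, and the assumption that readings fall on a regular sampling grid are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta

def find_gaps(readings, interval):
    """Sort readings by time and list the gaps in the series.

    readings: iterable of (datetime, float or None); None marks a blank value.
    interval: expected spacing between consecutive readings (assumed regular).
    Returns (start, end, gaps), where gaps is a list of (gap_start, gap_end)
    pairs, both inclusive, covering every expected timestamp with no value.
    """
    readings = list(readings)
    if not readings:
        raise ValueError("no readings supplied")
    observed = {ts for ts, value in readings if value is not None}
    timestamps = sorted(ts for ts, _ in readings)
    start, end = timestamps[0], timestamps[-1]

    gaps = []
    gap_start = None
    ts = start
    while ts <= end:
        if ts not in observed:
            # Either the row is absent or its value is blank: open a gap.
            if gap_start is None:
                gap_start = ts
        elif gap_start is not None:
            # First observed value after a gap: close it at the previous slot.
            gaps.append((gap_start, ts - interval))
            gap_start = None
        ts += interval
    if gap_start is not None:
        gaps.append((gap_start, end))
    return start, end, gaps

# Example: hourly readings, given out of order, with two consecutive slots missing.
hour = timedelta(hours=1)
t0 = datetime(2020, 9, 4, 0)
sample = [(t0 + 3 * hour, 1.4), (t0, 1.2), (t0 + hour, None), (t0 + 4 * hour, 1.1)]
start, end, gaps = find_gaps(sample, hour)
```

In the example, the slot at `t0 + 1h` is present but blank and the slot at `t0 + 2h` is absent altogether; both are reported as one gap with its recorded start and end time.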
Step two, Bayesian estimation to fill in missing values: timestamp the residential electricity consumption data preprocessed in step one, then perform the Bayesian estimation operation to restore the continuity of the power load data over the time series for those periods that have no corresponding data. The calculation method adopted is as follows:
1. Determine the prior distribution function $P(\theta)$ of the undetermined parameter $\theta$ from the distribution form of the data set;
2. From the whole data set $D = \{x_1, x_2, \dots, x_n\}$, solve for the joint distribution function $P(D \mid \theta)$ of the samples, which is a function of $\theta$;
3. Solve for the posterior distribution of $\theta$ using Bayes' formula: $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$;
4. Solve for the Bayesian estimate $\hat{\theta}$,
where $\hat{\theta}$, the maximum-likelihood value that is the target of the calculation, is used to supplement the missing value. The prior distribution function $P(\theta)$ and the joint distribution function $P(D \mid \theta)$ of the samples are obtained by fitting a Gaussian distribution to the data set; the precondition is that the overall distribution of the data set is Gaussian (normal).
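The four calculation steps above can be sketched for the Gaussian case as follows. This is an illustrative sketch under assumptions the source does not state: the observation variance is treated as known (or replaced by a plug-in estimate), the posterior mean is used as the Bayesian estimate, and every gap is filled with the same value. The names `gaussian_bayes_estimate`, `fill_missing`, `prior_mean`, `prior_var` and `noise_var` are hypothetical.

```python
from statistics import fmean, pvariance

def gaussian_bayes_estimate(observed, prior_mean, prior_var, noise_var=None):
    """Posterior mean of theta for x_i ~ N(theta, noise_var), theta ~ N(prior_mean, prior_var).

    If noise_var is not given, the population variance of the observed values
    is plugged in (an assumption made for this sketch).
    """
    if not observed:
        raise ValueError("need at least one observed value")
    if prior_var <= 0:
        raise ValueError("prior_var must be positive")
    n = len(observed)
    xbar = fmean(observed)
    if noise_var is None:
        noise_var = pvariance(observed)
    # Conjugate Gaussian update: precision-weighted mix of prior mean and sample mean.
    return (noise_var * prior_mean + n * prior_var * xbar) / (noise_var + n * prior_var)

def fill_missing(series, prior_mean, prior_var, noise_var=None):
    """Replace every None in series with the Bayesian estimate from the observed values."""
    observed = [v for v in series if v is not None]
    theta_hat = gaussian_bayes_estimate(observed, prior_mean, prior_var, noise_var)
    return [theta_hat if v is None else v for v in series]

# Example: three observed readings, one gap, prior N(0, 1), noise variance 1.
filled = fill_missing([1.0, None, 2.0, 3.0], prior_mean=0.0, prior_var=1.0, noise_var=1.0)
```

In practice `prior_mean` and `prior_var` would come from the Gaussian fitted to the data set, as the text describes. A broad prior (large `prior_var`) lets the observed mean dominate, while a tight prior pulls the filled value toward `prior_mean`.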
Step three, data validity verification: the original residential electricity data set and the data set after missing-value supplementation must be tested for statistically significant differences to ensure the validity of the data. A one-way analysis of variance (one-way ANOVA) is performed on the two groups of data to test whether they differ significantly; there must be no significant difference between them. If verification shows a significant difference, the parameter choices in step two must be adjusted, fewer strongly deviating raw values should be eliminated, and the degree of denoising should be reduced, so that the processed data do not differ significantly from the original data and remain valid.
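The verification step can be sketched with a hand-computed one-way ANOVA F statistic. This is a minimal illustration rather than the patent's procedure: the function names are hypothetical, and the critical value `f_crit` is assumed to be supplied by the caller from an F table for the chosen significance level and degrees of freedom. A statistics library that returns a p-value directly (for example `scipy.stats.f_oneway`) could be used instead.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def no_significant_difference(original, filled, f_crit):
    """True if the F statistic stays below the caller-supplied critical value.

    original must contain observed values only (no blanks); filled is the
    data set after missing-value supplementation.
    """
    f_stat, _, _ = one_way_anova_f(original, filled)
    return f_stat < f_crit

# Example with two small groups.
f_stat, df_b, df_w = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

If the check fails, the text calls for adjusting the parameters of step two (in the earlier sketch, the prior) and repeating the fill and the test.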
The invention provides a method that applies Bayesian estimation to supplement missing values arising from various causes in residential electricity data. It is characterized in that Bayesian estimation is introduced into the processing of residential electricity data and the fitted value of maximum probability is selected to fill in the missing values, so that the residential electricity data become more complete and data quality is improved. With this method, analysis and mining based on residential electricity consumption data become more accurate and reliable.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (2)
1. A method for supplementing missing values by applying Bayesian estimation in residential electricity consumption data mining, characterized by comprising the following steps:
step one, data preprocessing: arranging the collected raw residential electricity consumption data in chronological order, determining the start and end times of the data set, checking the time series for missing data, marking each missing value, and recording the start and end time of each gap;
step two, Bayesian estimation to fill in missing values: timestamping the residential electricity consumption data preprocessed in step one, and then performing the Bayesian estimation operation to restore the continuity of the power load data over the time series for those periods that have no corresponding data;
the calculation method specifically adopted is as follows:
1) determining the prior distribution function $P(\theta)$ of the parameter $\theta$ that determines the missing value, according to the distribution form of the data set;
2) from the whole data set $D = \{x_1, x_2, \dots, x_n\}$, solving for the joint distribution function $P(D \mid \theta)$ of the samples, which is a function of $\theta$;
3) solving for the posterior distribution of $\theta$ using Bayes' formula: $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{\int P(D \mid \theta)\,P(\theta)\,d\theta}$;
4) solving for the Bayesian estimate $\hat{\theta}$,
wherein $\hat{\theta}$, the maximum-likelihood value that is the target of the calculation, is used to supplement the missing value; the prior distribution function $P(\theta)$ and the joint distribution function $P(D \mid \theta)$ of the samples are obtained by fitting a Gaussian distribution to the data set, the precondition being that the overall distribution of the data set is Gaussian (normal);
step three, data validity verification: the original residential electricity data set and the data set after missing-value supplementation are tested for statistically significant differences to ensure the validity of the data, a one-way analysis of variance (one-way ANOVA) being performed on the two groups of data to test whether they differ significantly and to ensure that there is no significant difference between them;
if verification shows a significant difference between the two groups of data, the parameter choices in step two are adjusted, fewer strongly deviating raw values are eliminated, and the degree of denoising is reduced, so that the processed data do not differ significantly from the original data and remain valid.
2. A method for supplementing, by Bayesian estimation, missing values arising from various causes in residential electricity data, characterized in that Bayesian estimation is introduced into the processing of residential electricity data and the fitted value of maximum probability is selected to fill in the missing values, so that the residential electricity data are more complete and data quality is improved.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010920130.2A CN111966676A (en) | 2020-09-04 | 2020-09-04 | Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010920130.2A CN111966676A (en) | 2020-09-04 | 2020-09-04 | Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111966676A true CN111966676A (en) | 2020-11-20 |
Family
ID=73392026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010920130.2A Pending CN111966676A (en) | 2020-09-04 | 2020-09-04 | Method for supplementing missing value by applying Bayesian estimation in residential electricity consumption data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111966676A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101964998A (en) * | 2009-07-24 | 2011-02-02 | 北京亿阳信通软件研究院有限公司 | Forecasting method and device of telephone traffic in ordinary holiday of telecommunication network |
CN107577649A (en) * | 2017-09-26 | 2018-01-12 | 广州供电局有限公司 | The interpolation processing method and device of missing data |
CN109740826A (en) * | 2019-01-30 | 2019-05-10 | 广东工业大学 | A kind of cooling heating and power generation system load forecasting method based on Dynamic Data Mining |
CN111506635A (en) * | 2020-05-11 | 2020-08-07 | 上海积成能源科技有限公司 | System and method for analyzing residential electricity consumption behavior based on autoregressive naive Bayes algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201120 |