CN114048200A - User electricity consumption behavior analysis method considering missing data completion - Google Patents

Publication number
CN114048200A
Authority
CN
China
Prior art keywords
data
user
users
load
missing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111324959.7A
Other languages
Chinese (zh)
Inventor
关艳
田浩杰
陈洪禹
孙殿家
张冶
吴彤
高曦莹
陆心怡
王一苗
王馨璐
郭丹
宋轩宇
王玥
于跃
杨佳璇
田贵阳
李哲
闫奕名
王铭玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
State Grid Corp of China SGCC
Original Assignee
Marketing Service Center Of State Grid Liaoning Electric Power Co ltd
State Grid Corp of China SGCC
Application filed by Marketing Service Center Of State Grid Liaoning Electric Power Co ltd and State Grid Corp of China SGCC
Priority to CN202111324959.7A
Publication of CN114048200A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for analyzing user electricity consumption behavior that takes missing-data completion into account. The method comprises: acquiring the user's archive data, meter electric quantity data, and ambient temperature and humidity data; preprocessing the acquired data (first data recovery), classifying the preprocessed data, extracting power utilization characteristics, and storing the data in segments by user type; clustering and reclassifying the preprocessed complete data to obtain the cluster center to which the user with missing data belongs, and taking the value of that cluster center at the missing-data moment as a second data recovery value; calculating the average of the two data recovery values as the final data recovery value and performing a secondary data recovery; and analyzing the user's power utilization behavior using the recovered complete data. Because the missing data are completed twice, excessively large errors can be avoided, and the metering data of the power system become more complete and accurate.

Description

User electricity consumption behavior analysis method considering missing data completion
Technical Field
The invention belongs to the technical field of user power utilization behavior analysis, and in particular provides a method for analyzing user power utilization behavior that takes the completion of missing data into account.
Background
As power systems become more intelligent, the volume of data collected daily by power marketing business systems keeps growing, a large amount of data has accumulated, and the demand for intelligent analysis and lean application of these data grows ever stronger. Analysis and mining of marketing data show that users' electricity consumption is increasingly complicated, so the traditional, simple classification of users by electricity price, industry and the like cannot fully mine and analyze the hidden regularities that are valuable for building the energy Internet.
Complete and accurate metering data are the basis on which a power supply company analyzes users' power utilization behavior and carries out demand-side response. In a real production environment, however, uncontrollable factors cause data loss; only by supplementing and recovering the lost data reasonably and effectively can the accuracy of the data analysis remain unaffected.
In the early stage, lost electricity consumption data were mostly handled by field investigation. To change this situation, many experts and scholars have in recent years studied recovering lost data by informatization means and have proposed corresponding methods for recovering missing values. For example, Yantao et al. proposed finding the nearest gene by Mahalanobis distance to restore missing data in gene expression; other authors recovered lost data in a study of melanism fading time series by a nearest-point median method and a linear interpolation method; and others recovered missing data in a power station database with an improved fuzzy clustering algorithm.
When data are missing, they should be properly supplemented on the basis of the acquired data. Once relatively complete and accurate data are available, data mining techniques can be used to explore the latent regularities in the data, to understand users' electricity consumption habits fully, and to analyze supply and demand changes in the power market accurately, which helps optimize operation control and scheduling plans. Cluster analysis, one such data mining technique, can be applied to classify electricity consumption data: features are extracted from the data and the sample objects are compared for similarity by a clustering method, so that samples within a class share similar characteristics and differ markedly from samples outside the class, which yields a refined classification of electricity consumption behavior. The K-Means clustering algorithm is widely used because it is simple in principle, easy to implement, and efficient. However, K-Means is not well suited to data sets with uneven density and widely spread data.
It is therefore an urgent problem to provide a method for analyzing user electricity consumption behavior that takes missing-data completion into account, so that user groups can be finely classified on the basis of accurate and complete data, users' electricity consumption characteristics and patterns can be identified and predicted, and power supply companies are given a basis for setting electricity prices and carrying out demand-side response.
Disclosure of Invention
In view of this, the present invention provides a method for analyzing user electricity consumption behavior that takes missing-data completion into account, so as to solve the problems of the existing methods.
The invention provides a user electricity consumption behavior analysis method considering missing data completion, which comprises the following steps:
acquiring archive data of a user, meter electric quantity data, and ambient temperature and humidity data;
preprocessing the acquired data to obtain a first data recovery value V1 of the missing-data user at the missing-data moment, and performing a first data recovery using the first data recovery value V1;
classifying the complete data obtained after the first data recovery, extracting power utilization characteristics, and storing the data in segments by user type;
clustering and reclassifying the users within the user type that contains the missing-data user, obtaining the cluster center to which the missing-data user belongs, and taking the value of that cluster center at the missing-data moment as a second data recovery value V2;
calculating the average of the first data recovery value V1 and the corresponding second data recovery value V2 as the final data recovery value V of the missing-data user at the missing-data moment, and performing a secondary data recovery using the final data recovery value V;
and analyzing the power utilization behavior of the user using the complete data obtained after the secondary data recovery.
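The final recovery step above reduces to a two-value average, V = (V1 + V2) / 2. The following is a minimal Python sketch of that step only; the function names and the dictionary-based bookkeeping of missing positions are hypothetical and are not taken from the patent.

```python
def final_recovery_value(v1: float, v2: float) -> float:
    """Average the two candidate recovery values: V = (V1 + V2) / 2."""
    return (v1 + v2) / 2.0


def apply_second_recovery(series, missing_idx, v1_by_idx, v2_by_idx):
    """Return a copy of `series` in which every missing position holds V.

    `v1_by_idx` and `v2_by_idx` (hypothetical helpers) map a missing
    position to its first and second recovery values.
    """
    restored = list(series)
    for i in missing_idx:
        restored[i] = final_recovery_value(v1_by_idx[i], v2_by_idx[i])
    return restored
```

For example, a position first recovered as 4.0 whose cluster center reads 2.0 at the same moment would be finally restored as 3.0.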
Preferably, the archive data comprise the user classification and the electricity utilization category; the meter electric quantity data comprise real-time voltage, current, and historical daily electric quantity.
Further preferably, preprocessing the acquired data to obtain the first data recovery value V1 of the missing-data user at the missing-data moment specifically comprises:
sorting several data before and after the missing datum and finding the median;
calculating the average of the median and the data taken forwards and backwards from the missing datum as the first data recovery value V1.
Further preferably, 5 to 10 data are taken forwards and backwards from the missing datum.
Further preferably, the preprocessing further includes a step of normalizing the load data.
Further preferably, the user types include industrial users, commercial users, residential customers, and other customers.
Further preferably, clustering and reclassifying the users within the user type that contains the missing-data user comprises the following steps:
randomly selecting K user data from the user data of the user type to which the missing-data user belongs as the initial cluster centers;
calculating the distance from each sample to each initial cluster center, expressed as an arithmetic square root
Figure BDA0003346715560000031
and assigning each sample to the closest cluster;
after all sample data have been assigned, recalculating the centers Zi of the K clusters;
calculating again the distance from each sample to each new cluster center and assigning each sample to the closest cluster, the distance being the weighted distance
Figure BDA0003346715560000041
where
Figure BDA0003346715560000042
is the weight and S denotes the weighted standard deviation within the class;
after all sample data have been assigned, recalculating the centers of the K clusters and reassigning the samples until the cluster centers no longer change.
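The assign-and-update loop described above can be sketched in Python as follows. This is a plain K-Means sketch under stated assumptions: the patent's distance formulas are images that the text does not reproduce, so ordinary Euclidean distance is used as the default, and the `distance` parameter is a hypothetical hook where the weighted distance could be substituted. The names `k_means` and `second_recovery_value` are illustrative only.

```python
import math
import random


def euclidean(x, z):
    """Square root of the summed squared differences between x and z."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def k_means(samples, k, distance=euclidean, seed=0, max_iter=100):
    """Cluster `samples` (lists of floats) into k groups."""
    rng = random.Random(seed)
    # Randomly pick k samples as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assign every sample to its closest center.
        labels = [min(range(k), key=lambda j: distance(x, centers[j]))
                  for x in samples]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # centers no longer change: stop
            break
        centers = new_centers
    return centers, labels


def second_recovery_value(centers, labels, user_idx, t):
    """Value of the user's cluster center at the missing moment t."""
    return centers[labels[user_idx]][t]
```

Here each sample would be one user's (normalized) daily load curve, so reading the cluster center at index `t` gives a candidate for the second recovery value V2.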
Further preferably, analyzing the power utilization behavior of the user comprises:
calculating the load characteristic indexes of each type of user respectively:
① average daily load rate:
Figure BDA0003346715560000043
where βki is the average daily load rate of the i-th user among the class-k users, Nk is the total number of class-k users, and k = 1, 2, 3, 4;
② average daily minimum load rate:
Figure BDA0003346715560000044
where γki is the average daily minimum load rate of the i-th user among the class-k users;
③ average daily peak-to-valley difference rate:
Figure BDA0003346715560000045
where θki is the average daily peak-to-valley difference rate of the i-th user among the class-k users.
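The three formulas above are images that the text does not reproduce. From the symbol definitions (a per-user value and the class size Nk), a plausible reading is that each class index is the arithmetic mean of the per-user values over the Nk users of the class. The sketch below illustrates that reading only; it is an assumption, and the dictionary keys `beta`, `gamma` and `theta` are hypothetical names for βki, γki and θki.

```python
from statistics import fmean


def class_average(per_user_values):
    """Mean of one per-user index over the N_k users of a class (assumed form)."""
    if not per_user_values:
        raise ValueError("a user class needs at least one user")
    return fmean(per_user_values)


def class_load_indices(users):
    """`users`: list of dicts holding the per-user indices of one class."""
    return {name: class_average([u[name] for u in users])
            for name in ("beta", "gamma", "theta")}
```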
Further preferably, analyzing the power utilization behavior of the user further comprises:
calculating the load curves of each type of user respectively:
① the annual maximum load curve is calculated as
Figure BDA0003346715560000051
where
Figure BDA0003346715560000052
is the load of the class-k users in month g1, Nk is the total number of class-k users, k = 1, 2, 3, 4, g1 = 1, 2, …, 12, and
Figure BDA0003346715560000053
is the load of the i-th user among the class-k users in month g1;
② the annual continuous load curve is calculated as
Figure BDA0003346715560000054
where
Figure BDA0003346715560000055
is the load of the class-k users at hour g2, g2 = 1, 2, …, 8760 or g2 = 1, 2, …, 8784, and
Figure BDA0003346715560000056
is the load of the i-th user among the class-k users at hour g2;
③ the daily load curve is calculated as
Figure BDA0003346715560000057
where
Figure BDA0003346715560000058
is the load of the class-k users at time g3, g3 = 1, 2, …, 24;
Figure BDA0003346715560000059
is the load of the i-th user among the class-k users at time g3; and Dki is the annual maximum load of the i-th user among the class-k users.
In the user electricity consumption behavior analysis method considering missing data completion provided by the invention, the missing data are completed twice, which avoids excessively large errors and completes the missing data accurately with low deviation. The metering data of the power system thereby become more complete and accurate, laying a solid foundation for the subsequent analysis of user electricity consumption behavior.
Detailed Description
The present invention will be further explained with reference to specific embodiments, but is not limited thereto.
The invention provides a user electricity consumption behavior analysis method considering missing data completion, which comprises the following steps:
acquiring archive data of a user, meter electric quantity data, and ambient temperature and humidity data;
preprocessing the acquired data to obtain a first data recovery value V1 of the missing-data user at the missing-data moment, and performing a first data recovery using the first data recovery value V1;
classifying the complete data obtained after the first data recovery, extracting power utilization characteristics, and storing the data in segments by user type;
clustering and reclassifying the users within the user type that contains the missing-data user, obtaining the cluster center to which the missing-data user belongs, and taking the value of that cluster center at the missing-data moment as a second data recovery value V2;
calculating the average of the first data recovery value V1 and the corresponding second data recovery value V2 as the final data recovery value V of the missing-data user at the missing-data moment, and performing a secondary data recovery using the final data recovery value V;
and analyzing the power utilization behavior of the user using the complete data obtained after the secondary data recovery.
In this method, the lost data are completed twice, which avoids excessively large errors and completes the lost data accurately with low deviation. The metering data of the power system thereby become more complete and accurate, laying a solid foundation for the subsequent analysis of user electricity consumption behavior.
The archive data comprise the user classification and the electricity utilization category; the meter electric quantity data comprise real-time voltage, current, and historical daily electric quantity.
As an improvement of the technical solution, preprocessing the acquired data to obtain the first data recovery value V1 of the missing-data user at the missing-data moment specifically comprises:
sorting several data before and after the missing datum and finding the median;
calculating the average of the median and the data taken forwards and backwards from the missing datum as the first data recovery value V1.
As an improvement of the technical solution, 5 to 10 data are taken forwards and backwards from the missing datum.
As an improvement of the technical solution, the preprocessing further includes a step of normalizing the load data.
As an improvement of the technical solution, the user types comprise industrial users, commercial users, residential customers, and other customers.
As an improvement of the technical solution, clustering and reclassifying the users within the user type that contains the missing-data user comprises the following steps:
randomly selecting K user data from the user data of the user type to which the missing-data user belongs as the initial cluster centers;
calculating the distance from each sample to each initial cluster center, expressed as an arithmetic square root
Figure BDA0003346715560000071
and assigning each sample to the closest cluster;
after all sample data have been assigned, recalculating the centers Zi of the K clusters;
calculating again the distance from each sample to each new cluster center and assigning each sample to the closest cluster, the distance being the weighted distance
Figure BDA0003346715560000072
where
Figure BDA0003346715560000073
is the weight and S denotes the weighted standard deviation within the class;
after all sample data have been assigned, recalculating the centers of the K clusters and reassigning the samples until the cluster centers no longer change.
With the improved K-Means algorithm, user power utilization behaviors are clustered more finely: the distance between classes is larger and each class is more compact. More latent power utilization behaviors can thus be uncovered, users' power utilization characteristics and patterns can be identified and predicted, and power supply companies can formulate targeted strategies and carry out demand-side response more effectively.
As an improvement of the technical solution, analyzing the power utilization behavior of the user comprises:
calculating the load characteristic indexes of each type of user respectively:
① average daily load rate:
Figure BDA0003346715560000081
where βki is the average daily load rate of the i-th user among the class-k users, Nk is the total number of class-k users, and k = 1, 2, 3, 4;
② average daily minimum load rate:
Figure BDA0003346715560000082
where γki is the average daily minimum load rate of the i-th user among the class-k users;
③ average daily peak-to-valley difference rate:
Figure BDA0003346715560000083
where θki is the average daily peak-to-valley difference rate of the i-th user among the class-k users.
As an improvement of the technical solution, analyzing the power utilization behavior of the user further comprises:
calculating the load curves of each type of user respectively:
① the annual maximum load curve is calculated as
Figure BDA0003346715560000084
where
Figure BDA0003346715560000085
is the load of the class-k users in month g1, Nk is the total number of class-k users, k = 1, 2, 3, 4, g1 = 1, 2, …, 12, and
Figure BDA0003346715560000086
is the load of the i-th user among the class-k users in month g1;
② the annual continuous load curve is calculated as
Figure BDA0003346715560000091
where
Figure BDA0003346715560000092
is the load of the class-k users at hour g2, g2 = 1, 2, …, 8760 or g2 = 1, 2, …, 8784, and
Figure BDA0003346715560000093
is the load of the i-th user among the class-k users at hour g2;
③ the daily load curve is calculated as
Figure BDA0003346715560000094
where
Figure BDA0003346715560000095
is the load of the class-k users at time g3, g3 = 1, 2, …, 24;
Figure BDA0003346715560000096
is the load of the i-th user among the class-k users at time g3; and Dki is the annual maximum load of the i-th user among the class-k users.
Examples
The method for analyzing user electricity consumption behavior considering missing data completion comprises the following steps:
Step one: data acquisition
The user's electricity consumption data fall into two categories: archive data (user classification, electricity utilization category, historical monthly electric quantity, and the like) and meter electric quantity data (real-time voltage, current, historical daily electric quantity, and the like). The archive data reside in the marketing business system and the meter electric quantity data in the electricity consumption information acquisition system; external factors that influence changes in the electricity consumption data, such as temperature and humidity, can be obtained through the social data management system.
Classifying the data sources allows data loss or malicious tampering to be traced promptly to a specific location. The user identity information M is extracted from the archive data, the remainder of the archive data is denoted A, and the metering data and the external-factor data (temperature and humidity) are denoted B and C respectively. With p sampled users, the data format is as follows:
A=(a1,a2,a3,…,ap) (1)
B=(b1,b2,b3,…,bp) (2)
C=(c1,c2,c3,…,cp) (3)
Data with the same index in A, B, and C are collected at the same time on the same day, which guarantees a one-to-one correspondence among the kinds of data. With q sampling moments, the sampling times T are recorded as:
T=(t1,t2,t3,…,tq) (4)
Thus, the data set of a single user is:
Fi=(M,T,A,B,C) (5)
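The per-user tuple Fi = (M, T, A, B, C) can be pictured as a small record type. The sketch below is one possible Python representation; the class name, field types, and the `is_aligned` check are hypothetical, and the check only encodes the stated requirement that entries of A, B, and C with the same index correspond one to one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    m: str          # M: user identity information
    t: List[str]    # T: sampling times
    a: List[str]    # A: remaining archive data
    b: List[float]  # B: metering data
    c: List[float]  # C: external-factor data (temperature, humidity)

    def is_aligned(self) -> bool:
        """True when A, B and C can be matched index by index."""
        return len(self.a) == len(self.b) == len(self.c)
```

A record whose lists differ in length signals that some source system dropped an entry, which is one way the traceability mentioned above could be checked.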
Step two: data preprocessing
(1) Data cleansing
The user data to be analyzed come from real production and daily life, where various internal or external influences cause missing or abnormal data. To improve the credibility and interpretability of the final result, such bad data are removed and repaired before the analysis.
Electricity consumption conditions vary: high-power operation on a supply circuit produces a load maximum, whereas a power failure or unplanned circuit maintenance drives the load to a minimum, so plain averaging can introduce a large error. The median, by contrast, changes little when a few data vary. The invention therefore combines the median with the mean to reduce the error. Specifically, several data before and after the lost datum are first sorted to find the median, and the median is then averaged with the data taken forwards and backwards to recover the lost datum.
Suppose Xi is the missing datum at the i-th data point of the load curve of a given day, and L is the median of the 2h+1 data before and after Xi (subject to the requirement that the number is less than h). The recovered datum Xi' is
Figure BDA0003346715560000111
where Xi-h and Xi+h denote the h data before and after Xi, respectively; h is generally 5 to 10.
The computer obtains the median L by the following flow:
Step 0: begin; obtain the lost datum Xi;
Step 1: input the datum Xi-h-1 preceding the lost datum;
Step 2: determine whether the input datum Xi-h-1 is less than or greater than h of the data in the range Xi-h-1 to Xi+h; if so, go to step 6; otherwise go to step 3;
Step 3: decrement h, i.e. h = h - 1;
Step 4: determine whether the decremented h is greater than 1; if so, go to step 1; otherwise go to step 5;
Step 5: input the datum Xi+h following the lost datum and go to step 2;
Step 6: output the median L;
Step 7: end.
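The median-plus-mean recovery can be sketched in Python as below. Because formula (6) is an image that the text does not reproduce, the exact averaging is an assumption: here L is taken as the median of the valid values among the h points on each side of the missing point, and the recovered value is the mean of L together with those same neighbours. The function name and the use of `None` to mark missing points are hypothetical, and the library median replaces the step-by-step flow.

```python
from statistics import fmean, median


def first_recovery_value(series, i, h=5):
    """Estimate the missing value at position i from up to h points per side."""
    lo = max(0, i - h)
    # Collect valid neighbours, skipping other missing points (None).
    before = [x for x in series[lo:i] if x is not None]
    after = [x for x in series[i + 1:i + 1 + h] if x is not None]
    neighbours = before + after
    if not neighbours:
        raise ValueError("no valid neighbours around the missing point")
    L = median(neighbours)          # robust to isolated extreme values
    return fmean([L] + neighbours)  # assumed form of the combined average
```

Including the median as an extra term pulls the estimate toward the typical level of the window, which is the stated motivation for combining the two statistics.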
(2) Data normalization
Because electricity consumption changes over time, load data at different times of the same day may differ greatly. To speed up the algorithm and so shorten the clustering process, and to show the dynamic change of users' electricity consumption behavior more clearly, the data values are confined to a fixed range to reduce the differences in magnitude within the data. This is normalization, after which the users' load characteristic data fall in the interval [0, 1]. The normalization formula is given in formula (7):
Figure BDA0003346715560000112
where Xi is the actual electrical load at the i-th sampling moment, XW is the electrical load of the i-th point after extreme-value normalization, and Xi max and Xi min are the maximum and minimum load values of the sample data sequence, respectively.
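Formula (7) is an image that the text does not reproduce, but the description (extreme-value normalization into [0, 1] using the sequence maximum and minimum) matches ordinary min-max scaling, (Xi - Xmin) / (Xmax - Xmin). The Python sketch below assumes that form; the function name and the handling of a flat sequence are hypothetical choices.

```python
def min_max_normalize(loads):
    """Scale a load sequence into [0, 1] by its minimum and maximum."""
    x_min, x_max = min(loads), max(loads)
    if x_max == x_min:
        # A flat curve has no spread; map it to zeros to avoid dividing by 0.
        return [0.0 for _ in loads]
    span = x_max - x_min
    return [(x - x_min) / span for x in loads]
```

For instance, the loads 2, 4, 6 map to 0, 0.5, 1.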
Step three: extracting power utilization characteristics
The preprocessed data are integrated and classified. Combining the collected information of the p users gives the data matrix F (taking moment t1 as an example):
Figure BDA0003346715560000121
Because the key indexes for distinguishing user categories are electricity consumption and electricity charge, and different types of users differ greatly in magnitude on these two indexes, processing all the electricity consumption data together would deviate from the actual situation. The users are therefore classified first: industrial users U1 (N1 users), commercial users U2 (N2 users), residential customers U3 (N3 users), and other customers U4 (N4 users). The data set of the P = N1 + N2 + N3 + N4 collected users can then be expressed as:
F=(U1,U2,U3,U4) (9)
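The split F = (U1, U2, U3, U4) amounts to partitioning the users by type before any further processing. A minimal Python sketch follows; the type labels, the `(user_id, user_type)` input shape, and the choice to route unrecognized types to the "other" group are all hypothetical.

```python
USER_TYPES = ("industrial", "commercial", "residential", "other")


def partition_by_type(users):
    """Group (user_id, user_type) pairs into U1..U4."""
    groups = {t: [] for t in USER_TYPES}
    for user_id, user_type in users:
        # Assumption: any unlisted type is treated as "other".
        key = user_type if user_type in groups else "other"
        groups[key].append(user_id)
    return groups
```

The group sizes correspond to N1 to N4, and their sum equals the total number of collected users P.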
Step four: power utilization behavior analysis
(1) Load characteristic indexes
The first step of load analysis is to obtain the load characteristic indexes (average daily load rate, average daily minimum load rate, average daily peak-to-valley difference rate, and so on), which reflect simply and quickly how the load changes over time. Because the electricity consumption data of different types of users differ greatly, the load characteristic indexes are calculated separately for each type of user.
① Average daily load rate:
Figure BDA0003346715560000131
where βki is the average daily load rate of the i-th user among the class-k users, Nk is the total number of class-k users, and k = 1, 2, 3, 4.
② Average daily minimum load rate:
Figure BDA0003346715560000132
where γki is the average daily minimum load rate of the i-th user among the class-k users.
③ Average daily peak-to-valley difference rate:
Figure BDA0003346715560000133
where θki is the average daily peak-to-valley difference rate of the i-th user among the class-k users.
(2) Load curve
The load characteristic index can obtain the electricity utilization characteristics of each large type of user, but the real-time state of the electricity utilization of the user and the annual electricity utilization trend cannot be analyzed, so that a load curve which can be expressed visually is needed, and the load curve comprises an annual maximum load curve (the load changes month by month all year), an annual continuous load curve (the annual load utilization hours) and a daily load curve. Since the individual fluctuations of the daily load have a great influence on the daily load curve, the integration is preferably performed by a weighted average method, and the weight is the root of the annual maximum load.
The load curve calculation formula is:
Figure BDA0003346715560000134
In the formula, $E_{kg1}$ is the load of the k-th user type in month g1; $N_k$ is the total number of users in the k-th user type, k = 1, 2, 3, 4; g1 = 1, 2, …, 12; and $E_{kig1}$ is the load of the i-th user of the k-th user type in month g1.
② The annual continuous load curve calculation formula is:
Figure BDA0003346715560000142
In the formula, $E_{kg2}$ is the load of the k-th user type at hour g2, where g2 = 1, 2, …, 8760 (common year) or g2 = 1, 2, …, 8784 (leap year); and $E_{kig2}$ is the load of the i-th user of the k-th user type at hour g2.
③ The daily load curve calculation formula is:
Figure BDA0003346715560000145
In the formula, $E_{kg3}$ is the load of the k-th user type at time g3, where g3 = 1, 2, …, 24; $E_{kig3}$ is the load of the i-th user of the k-th user type at time g3; and $D_{ki}$ is the annual maximum load of the i-th user of the k-th user type.
The annual load curves of all user types are fairly flat and differ mainly in load magnitude. The continuous load curves mostly decrease over time, but at different rates. The daily load curves follow several patterns: single-peak, double-peak, triple-peak, fairly flat throughout the day, and high at night but low during the day. Industrial users with a double-peak or triple-peak pattern mostly show the valley between peaks around noon.
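The weighted integration of daily load curves described above can be sketched as follows. The exact formula is a figure that is not reproduced in this text, so this is only an illustration under two assumptions: "root" is read as the square root of each user's annual maximum load $D_{ki}$, and the weighted sum is normalized by the sum of the weights. The function name `type_daily_curve` is hypothetical.

```python
import numpy as np

def type_daily_curve(user_curves, annual_max):
    """Weighted-average daily load curve for one user type.

    user_curves: array of shape (N_k, 24), one daily curve per user.
    annual_max:  array of shape (N_k,), annual maximum load D_ki per user.
    Weight = sqrt(D_ki), normalized by the sum of weights (both assumptions).
    """
    curves = np.asarray(user_curves, dtype=float)
    weights = np.sqrt(np.asarray(annual_max, dtype=float))
    return (weights[:, None] * curves).sum(axis=0) / weights.sum()
```

With two users whose flat curves are 1 and 3 and whose annual maxima are 1 and 9, the weights are 1 and 3, so every point of the combined curve is (1·1 + 3·3) / 4 = 2.5. Taking the root compresses the influence of very large users compared with weighting by the maximum load itself.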
(2) After the user's missing data have been filled in during the preprocessing stage, the users' daily load curves can be clustered within each rough user category, so that the missing data can be completed with lower deviation and the users' electricity consumption behavior can be analyzed more accurately.
The K-Means clustering algorithm, which is simple in principle and efficient, can be adopted. K-Means uses a clustering criterion function to find the center nearest to each sample and iterates so that the distance between different classes becomes larger and the distance within a class becomes smaller. It partitions the data into a preset number of classes K by minimizing an error function and uses distance as the measure of similarity: the closer two objects are, the more similar they are. This application extracts five characteristics of user electricity consumption data: daily load peak and valley times, daily load rate, daily minimum load rate and daily peak-valley difference rate.
(ii) Conventional K-Means algorithm
The clustering criterion function used during operation is:
Figure BDA0003346715560000151
In the formula, $n_i$ is the number of samples in the i-th class, $x_{ij}$ is the j-th sample in the i-th class, and $Z_i$ is the cluster center of the i-th class.
The centers $Z_i$ of the K clusters are calculated as:
Figure BDA0003346715560000152
The clustering criterion function of the conventional K-Means algorithm takes the minimum within-class sum of squared errors as the optimal result, so it is better suited to evenly distributed data. However, when users have only been roughly classified, most of the data are scattered, and the conventional algorithm greatly reduces the accuracy of the clustering result. The K-Means algorithm can therefore be improved: by changing the clustering criterion function, the distance calculation is adjusted so that samples deviate less from their cluster centers, the distance between clusters becomes larger and each cluster becomes more compact. This application takes the weighted within-class standard deviation as the clustering criterion, with the number of data objects in the class as the weight, to avoid the loss of accuracy.
Variance:
Figure BDA0003346715560000153
Standard deviation:
Figure BDA0003346715560000161
In the formula, $n_i$ is the number of samples in the i-th class, $x_{ij}$ is the j-th sample in the i-th class, and $Z_i$ is the cluster center of the i-th class.
Improved clustering criterion function (taking the industrial user type $U_1$ as an example):
Figure BDA0003346715560000162
In the formula, $N_1$ is the total number of samples in the industrial user data set, $n_i$ is the number of samples in the i-th class, and $S_i$ is the standard deviation of the i-th class.
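To make the difference between the two criteria concrete, the sketch below evaluates both for a given partition. The conventional criterion is the standard within-class sum of squared errors. The improved criterion is written here as the sum over classes of $(n_i / N_1) S_i$, with $S_i$ taken as the root of the mean squared distance to the class center; this form is an assumption based on the description, since the formula figures are not reproduced in this text. All function names are hypothetical, and every class is assumed to be non-empty.

```python
import numpy as np

def class_centers(X, labels, k):
    """Center Z_i of each class: the mean of its samples."""
    return np.array([X[labels == i].mean(axis=0) for i in range(k)])

def sse_criterion(X, labels, k):
    """Conventional criterion: within-class sum of squared errors."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    centers = class_centers(X, labels, k)
    return float(((X - centers[labels]) ** 2).sum())

def weighted_std_criterion(X, labels, k):
    """Assumed improved criterion: sum_i (n_i / N) * S_i."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    centers = class_centers(X, labels, k)
    total = 0.0
    for i in range(k):
        members = X[labels == i]
        # S_i: root of the mean squared distance to the class center (assumption)
        s_i = np.sqrt(((members - centers[i]) ** 2).sum(axis=1).mean())
        total += len(members) / len(X) * s_i
    return float(total)
```

For the four points (0,0), (2,0), (10,0), (10,4) split into two classes of two, the sum of squared errors is 2 + 8 = 10, while the class standard deviations are 1 and 2, giving a weighted value of 0.5·1 + 0.5·2 = 1.5.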
The computer implemented process of the improved K-Means method employed in the present application is as follows:
Step 0, randomly select K user data from the $U_1$ user data as the initial cluster centers;
Step 1, calculate the arithmetic square root of the distance from each sample to each initial cluster center
Figure BDA0003346715560000163
Step 2, assign each sample to the nearest cluster, thereby classifying the samples;
Step 3, after all samples have been assigned, recalculate the centers $Z_i$ of the K clusters;
Step 4, calculate again the distance from each sample to each new cluster center, where the distance is the weighted distance
Figure BDA0003346715560000164
(weight is
Figure BDA0003346715560000165
);
Step 5, assign each sample to the nearest cluster;
Step 6, recalculate the centers of the K clusters;
Step 7, judge whether the cluster centers have changed; if so, return to Step 4, otherwise go to Step 8;
Step 8, stop iterating and output the final classification result.
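The control flow of Steps 0 to 8 can be sketched in Python as below. The weight used in the weighted distance is given only as a figure that is not reproduced in this text, so it is left as a pluggable `weight_fn`; the default `unit_weights` returns 1 for every cluster, which reduces the loop to conventional K-Means. That default is a placeholder assumption, not the weight of this application, and all names are hypothetical.

```python
import numpy as np

def unit_weights(counts):
    """Placeholder weight of 1 per cluster (assumption; the real weight is not available here)."""
    return np.ones_like(counts, dtype=float)

def recompute_centers(X, labels, centers):
    """Mean of each cluster's samples; an empty cluster keeps its previous center."""
    new_centers = centers.copy()
    for i in range(len(centers)):
        members = X[labels == i]
        if len(members) > 0:
            new_centers[i] = members.mean(axis=0)
    return new_centers

def improved_kmeans(X, k, weight_fn=unit_weights, max_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 0: choose K samples at random as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    # Steps 1-2: Euclidean distance to each center, assign to the nearest.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    # Step 3: recompute the centers.
    centers = recompute_centers(X, labels, centers)
    for _ in range(max_iter):
        # Step 4: weighted distance, one weight per cluster.
        counts = np.bincount(labels, minlength=k).astype(float)
        weights = weight_fn(counts)
        dist = weights[None, :] * np.linalg.norm(
            X[:, None, :] - centers[None, :, :], axis=2)
        # Step 5: reassign each sample to the nearest cluster.
        labels = dist.argmin(axis=1)
        # Step 6: recompute the centers.
        new_centers = recompute_centers(X, labels, centers)
        # Step 7: repeat from Step 4 while the centers still change.
        changed = not np.allclose(new_centers, centers)
        centers = new_centers
        if not changed:
            break
    # Step 8: output the final classification.
    return labels, centers
```

A call such as `improved_kmeans(features, k=3, weight_fn=my_weight)` would plug in a different weight, where `my_weight` maps the array of cluster sizes to one weight per cluster.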
Based on the user clustering result, the power supply company can better understand the electricity load of each type of user and formulate a more reasonable power supply strategy that meets all users' demand while avoiding waste, so that supply matches demand at all times.
Calculating the missing data with the improved K-Means algorithm
The improved K-Means algorithm is used to classify users' electricity consumption behavior more finely, so as to obtain the more accurate data $V_2$ at the cluster center for the corresponding time $t_i$. The final value used to fill in and restore the missing data is the average of the two values $V_1$ and $V_2$. Because neither aspect is considered alone, the chance of an excessive error is reduced, which benefits the subsequent electricity consumption behavior analysis based on the data.
The data $V_2$ at the cluster center for the time $t_i$ is obtained as follows:
Step 0, mark the user $M_i$ whose data were completed in preprocessing and the time $t_i$ of the completed data;
Step 1, obtain the rough category to which user $M_i$ belongs (industrial, commercial, residential or other);
Step 2, calculate the K cluster centers within that rough category;
Step 3, obtain the cluster center $K_{mi}$ of the class to which user $M_i$ belongs;
Step 4, obtain the value of cluster center $K_{mi}$ at the corresponding time $t_i$, which is $V_2$.
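The lookup of $V_2$ and the final averaging can be sketched as follows. The sketch assumes that each cluster center is stored as a daily load curve indexed by time, so that the value at time $t_i$ can be read off directly, and that the cluster label of user $M_i$ is already known from the clustering step. The function names and the array layout are hypothetical.

```python
import numpy as np

def second_recovery_value(centers, user_label, t_index):
    """V2: value of the user's cluster center at the missing-data time.

    centers:    array of shape (K, T), one curve per cluster center (assumed layout).
    user_label: index of the cluster to which user M_i belongs.
    t_index:    position of time t_i on the curve.
    """
    return float(np.asarray(centers, dtype=float)[user_label, t_index])

def final_recovery_value(v1, v2):
    """Final value V: the average of the preprocessing value V1 and the cluster value V2."""
    return (v1 + v2) / 2.0
```

For instance, with centers `[[1, 2, 3], [4, 5, 6]]`, a user in cluster 1 and a gap at index 2, $V_2$ is 6; if preprocessing gave $V_1$ = 4, the final filled value is 5.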
Third, once the missing data values have been completed, further electricity consumption analysis can be carried out (taking residential users $U_3$ as an example).
The electricity consumption behavior of residential users is related to their working habits and often shows a certain periodicity, with distinct load characteristic curves for working days, holidays and different seasons. The power supply company formulates different strategies for different load curves so as to carry out demand-side response more effectively.
Working days usually fall into three types. The first type is a daily load curve with large fluctuations whose valley falls in the night-time rest period, which suggests that the residents probably work at home, so the maximum of the load curve mostly occurs in the daytime. These users consume the most electricity of the three types and form the basic customer group of the power supply company; for this type, the company's marketing strategy is to maintain their enthusiasm for using electricity. The second type is a double-peak daily load curve, with peaks at noon and in the evening and valleys at other times. These users most likely go out to work regularly, and because the peak and valley periods are clearly separated, the power supply company can offer them a more reasonable supply strategy and set more accurate electricity prices, thereby saving resources for other users whose supply falls short of demand. The third type is a daily load curve with small, steady fluctuations and no clear peak-valley boundary, although night-time consumption is lower than daytime consumption. Total consumption is the lowest of the three types, and these characteristics suggest that the users are probably retired elderly people, for whom the power supply company can offer electricity price promotions or electricity packages that encourage consumption.
On holidays there are two possibilities: a family gathering at home, or going out for leisure or meals. In the first case the daily load curve has clearly separated peak and valley periods, with the peaks at the noon and evening mealtimes. In the second case the daily load curve is essentially 0, because the user hardly uses any appliances at home. The power supply company can judge a user's preference from how often the user stays home or goes out and, together with the length of the holiday, formulate a more reasonable holiday marketing strategy for that user.
The relation between electricity load and season is analyzed on the basis of the climate characteristics of the region in each season (taking the northern region as an example). The traditional seasonal division (three months per season) is adopted: spring, summer and autumn are March to May, June to August and September to November of each year, and winter is December together with January and February of the following year. Regardless of the season, the load gradually decreases after 24:00, reaches the daily minimum at 4:00 in the early morning, rises at a gradually increasing rate from 6:00 and reaches a maximum at 8:00, falls to a minimum at 13:00, then begins to rise after 18:00 and reaches the maximum of the daily load curve at 22:00. The overall shape of the daily load curve is therefore the same across seasons. In spring and autumn most northern areas have cool weather and a comfortable climate, and cooling or heating appliances are mostly not used, so the load values in these two seasons are roughly equal and lower than in summer and winter. In summer the temperature is higher and peaks at noon, so the noon load rises between the two rises of the daily load curve. In winter the temperature is low and the days are short, residents leave work earlier, and so the second rise of the daily load curve occurs earlier than in spring and autumn. If the temperature changes abruptly (drops or rises suddenly) on a given day, the peak of the daily load curve also rises sharply.
However, users start using appliances at different times when faced with high summer temperatures or low winter temperatures, so the points at which the daily load curve moves from low to high and from high to low also differ. The power supply company can tailor a more detailed electricity price policy to each user's peak change points.
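The seasonal comparison above can be prepared with a small helper that groups daily load curves by the traditional three-month seasons and averages them. This is only a sketch: the input format (pairs of month number and daily curve) and the names `season_of` and `seasonal_mean_curves` are assumptions made for illustration.

```python
from collections import defaultdict

# Traditional division: Mar-May spring, Jun-Aug summer, Sep-Nov autumn, Dec-Feb winter.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def season_of(month):
    """Season name for a month number 1-12."""
    return SEASON_BY_MONTH[month]

def seasonal_mean_curves(daily_curves):
    """Average daily load curve per season.

    daily_curves: iterable of (month, curve) pairs, where curve is a list of
    load values of equal length (for example 24 hourly values).
    """
    sums = {}
    counts = defaultdict(int)
    for month, curve in daily_curves:
        season = season_of(month)
        if season not in sums:
            sums[season] = [0.0] * len(curve)
        sums[season] = [a + b for a, b in zip(sums[season], curve)]
        counts[season] += 1
    return {s: [v / counts[s] for v in total] for s, total in sums.items()}
```

Comparing the resulting curves per season makes it easier to see, for example, whether the evening rise starts earlier in winter than in spring and autumn.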

Claims (9)

1. A user electricity consumption behavior analysis method considering missing data completion, characterized by comprising the following steps:
acquiring archive data of a user, meter electric quantity data and environment temperature and humidity data;
preprocessing the acquired data to obtain a first data recovery value $V_1$ of a missing-data user at the moment of the missing data, and performing first data recovery using said first data recovery value $V_1$;
classifying and extracting power utilization characteristics of the complete data after the first data recovery is completed, and performing segmentation storage according to different user types;
performing aggregation and reclassification on the users within the user type containing the missing-data user, and acquiring the cluster center to which the missing-data user belongs and the value of that cluster center at the missing-data time as a second data recovery value $V_2$;
extracting the first data recovery value $V_1$ and the corresponding second data recovery value $V_2$, calculating their average as the final data recovery value V of the missing-data user at the moment of the missing data, and performing secondary data recovery using the final data recovery value V;
and analyzing the power utilization behavior of the user by using the complete data after the secondary data recovery.
2. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the archive data comprises user classification and electricity utilization classification; the electricity quantity data of the meter comprises real-time voltage, current and historical daily electricity quantity.
3. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: preprocessing the acquired data to obtain the first data recovery value $V_1$ of a missing-data user at the moment of the missing data specifically comprises the following steps:
sorting a plurality of data before and after the missing data and finding the median;
calculating the average of the median and the data taken forwards and backwards from the missing data as the first data recovery value $V_1$.
4. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 3, wherein: the number of data taken forwards and backwards from the missing data is 5 to 10.
5. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the preprocessing further comprises the step of normalizing the load data.
6. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the user types include industrial users, commercial users, residential customers, and other customers.
7. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the aggregated reclassification of users within a user type containing missing data users comprises the steps of:
randomly selecting K user data from user data in the user type of the user with the data missing as an initial clustering center;
calculating the arithmetic square root of the distance of a sample to the center of the initial cluster
Figure FDA0003346715550000021
Distributing the sample data to the nearest cluster, and classifying the samples;
after all sample data are distributed, recalculating the centers $Z_i$ of the K clusters;
calculating again the distance from each sample to each new cluster center, and classifying the samples into the nearest cluster, wherein the distance is the weighted distance
Figure FDA0003346715550000022
wherein
Figure FDA0003346715550000023
is the weight, and S represents the weighted standard deviation within the class;
after all the sample data are distributed, calculating the centers of the K clusters again and distributing the samples until the cluster centers are not changed any more.
8. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the analysis of the power utilization behavior of the user comprises the following steps:
respectively calculating the load characteristic indexes of various types of users:
① average daily load rate:
Figure FDA0003346715550000031
in the formula, $\beta_{ki}$ is the average daily load rate of the i-th user of the k-th user type, and $N_k$ is the total number of users of the k-th user type, k = 1, 2, 3, 4;
② average daily minimum load rate:
Figure FDA0003346715550000032
in the formula, $\gamma_{ki}$ is the average daily minimum load rate of the i-th user of the k-th user type;
③ average daily peak-valley difference rate:
Figure FDA0003346715550000033
in the formula, $\theta_{ki}$ is the average daily peak-valley difference rate of the i-th user of the k-th user type.
9. The method of analyzing user electricity usage behavior in consideration of missing data padding as claimed in claim 1, wherein: the analyzing of the user power consumption behavior further comprises:
respectively calculating the load curves of the various types of users:
① the annual maximum load curve calculation formula is:
Figure FDA0003346715550000034
in the formula, $E_{kg1}$ is the load of the k-th user type in month g1, $N_k$ is the total number of users of the k-th user type, k = 1, 2, 3, 4, g1 = 1, 2, …, 12, and $E_{kig1}$ is the load of the i-th user of the k-th user type in month g1;
② the annual continuous load curve calculation formula is:
Figure FDA0003346715550000041
in the formula, $E_{kg2}$ is the load of the k-th user type at hour g2, g2 = 1, 2, …, 8760 or g2 = 1, 2, …, 8784, and $E_{kig2}$ is the load of the i-th user of the k-th user type at hour g2;
③ the daily load curve calculation formula is:
Figure FDA0003346715550000042
in the formula, $E_{kg3}$ is the load of the k-th user type at time g3, g3 = 1, 2, …, 24; $E_{kig3}$ is the load of the i-th user of the k-th user type at time g3; and $D_{ki}$ is the annual maximum load of the i-th user of the k-th user type.
CN202111324959.7A 2021-11-10 2021-11-10 User electricity consumption behavior analysis method considering missing data completion Pending CN114048200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111324959.7A CN114048200A (en) 2021-11-10 2021-11-10 User electricity consumption behavior analysis method considering missing data completion

Publications (1)

Publication Number Publication Date
CN114048200A true CN114048200A (en) 2022-02-15

Family

ID=80207908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111324959.7A Pending CN114048200A (en) 2021-11-10 2021-11-10 User electricity consumption behavior analysis method considering missing data completion

Country Status (1)

Country Link
CN (1) CN114048200A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809237A (en) * 2023-02-07 2023-03-17 河北建投水务投资有限公司 Method and system for supplementing missing data of user water meter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination