CN115618253A - Incremental clustering method for power load data - Google Patents


Info

Publication number
CN115618253A
Authority
CN
China
Prior art keywords
clustering
model
dcs
data
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211265276.3A
Other languages
Chinese (zh)
Inventor
张勇
李欣玥
王莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Commerce
Original Assignee
Tianjin University of Commerce
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Commerce
Priority to CN202211265276.3A
Publication of CN115618253A

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to incremental clustering of power load data, and in particular to a power load incremental clustering method that combines a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM). The method performs time series clustering tailored to the high dimensionality, time variation and heteroscedasticity of power load data. It first preprocesses and analyzes the users' daily load data; then determines the number of clusters; determines the user assumption for the DCS model; establishes the DCS model and estimates its parameters; calculates the relevant parameters of the DCS model to obtain the final DCS model; calculates the autocorrelation values of the sequences based on the DCS model; and finally performs AI-FCM clustering based on the DCS model. The method can effectively address the low clustering efficiency, long model training time and inability to extract hidden information about user patterns that are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.

Description

Incremental clustering method for power load data
Technical Field
The invention relates to a power load data incremental clustering method, in particular to a power load incremental clustering method combining a DCS statistical model and an autocorrelation incremental fuzzy C-means clustering algorithm.
Background
Power load data are being updated ever faster, and both their volume and their dimensionality keep growing, while the consumption patterns of electricity users are becoming more diverse and complex; research on clustering user power load data therefore has important practical significance. Ferraro et al., in the paper "comprehensive and clustering analysis of the daily load of electricity in the world centers", extracted features from the daily electricity load of several countries and applied multiple clustering methods. CN111915116A discloses a method for classifying residential electricity users based on K-means clustering and analyzes the load characteristics of each class of residential user from the clustering results; its problems are that clustering electricity users with a traditional clustering algorithm has low accuracy and easily yields a locally optimal solution. Most power load data are time series that are also time-varying, high-dimensional and heteroscedastic, so traditional clustering methods are difficult to apply. For cluster analysis of power load data, a dimensionality reduction method such as principal component analysis or feature extraction can be combined with a clustering algorithm to handle the high-dimensional data. Hu et al., in the paper "Classification and characterization of intra-day load using interactive feature and feature-based clustering", extracted a series of global peak and peak-period features from the daily loads of photovoltaic and non-photovoltaic households and clustered on the basis of these features.
CN110795610B discloses a clustering-based power load analysis method comprising a data preprocessing module, a feature extraction module and a clustering module; its drawback is that the feature extraction is insufficient and reflects the electricity consumption behavior of only some users. CN112270338A discloses a power load curve clustering method that processes the loads with the t-SNE dimensionality reduction technique and clusters them by combining the GSA elbow criterion with a binary K-means algorithm; its problems are that the dimensionality reduction has high computational complexity and may yield a locally optimal solution, so it is difficult to extract data features accurately and cluster on them. The problems with these approaches are that power load time series are time-varying and heteroscedastic, that traditional dimensionality reduction methods such as feature extraction struggle to extract their characteristic information effectively for cluster analysis, that such methods are of limited use in time series clustering, and that the clustering accuracy needs to be improved.
Because traditional clustering methods mainly analyze static power load time series, whereas time series data change dynamically, statistical models and incremental clustering algorithms have been combined and applied to power load clustering research in order to cluster effectively. Otranto, in the paper "Clustering heteroskedastic time series by model-based procedures", clusters in sequence according to the unconditional volatility, the time-varying volatility and the corresponding parameters of the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model of the time series. CN112215490A discloses a power load cluster analysis method based on K-means improved with a correlation coefficient; it processes the power load data with wavelet transform and principal component analysis and clusters using the Pearson correlation coefficient, but it does not mine hidden information from data generated in real time. CN114611591A discloses a power load clustering method based on discrete wavelet transform; it extracts the users' electricity consumption trend features and daily load curve features, combines them into user consumption features, and uses these as the basis for K-means clustering of electricity users. Its problem is that the new implicit information produced by continuously growing data volumes is difficult to mine.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a power load incremental clustering method that combines a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm. According to the heteroscedastic characteristics of user power load time series data, an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution is established; an autocorrelation value data set is obtained from the conditional moment estimates computed with the statistical model parameters; and an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM) is constructed, taking into account the time series characteristics of different power load data streams, to carry out the cluster analysis. The method can effectively address the low clustering efficiency, long model training time and inability to extract hidden information about user patterns that are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.
The technical scheme adopted by the invention to solve this technical problem is a power load incremental clustering method that combines a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm: according to the heteroscedastic characteristics of user power load time series data, an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution is established, an autocorrelation value data set is obtained from the conditional moment estimates computed with the statistical model parameters, and the autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM) is constructed taking into account the time series characteristics of different power load data streams. The specific steps are as follows:
Step one, preprocessing and analyzing user daily load data:
Acquire the daily load data set of K users for the maximum-load day of the same year and month; find and fill missing values and detect and correct outliers in the data set; use the preprocessed data set as the data for building the power load time series model and for cluster analysis; and plot the users' daily load curves to analyze their fluctuation briefly and classify the curves preliminarily.
Step two, determining the number of clusters:
The range of the number of clusters is defined as
Figure BSA0000286639390000021
(rounded up); the clustering validity index V_xb corresponding to each number of clusters in this range is calculated according to formula (1), and the optimal number of clusters C for this data set is determined.
Figure BSA0000286639390000022
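As an illustration of step two, the following minimal sketch computes a clustering validity index for a given fuzzy partition. It assumes that V_xb denotes the Xie-Beni index (fuzzy compactness divided by the number of points times the minimum squared distance between cluster centers); formula (1) is not reproduced in this text, so this form, the function name xie_beni and its arguments are assumptions rather than the exact definition used by the invention.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index (assumed form of V_xb).

    points:      list of n feature vectors
    centers:     list of C cluster centers
    memberships: C x n matrix, memberships[i][k] = degree of point k in cluster i
    m:           fuzzy factor
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Fuzzy within-cluster compactness.
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(len(points))
    )
    # Smallest squared distance between two distinct cluster centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)
```

Under this assumption a smaller value indicates a more compact and better separated partition, so the index would be evaluated for each candidate number of clusters in the range and the candidate with the smallest value taken as C.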
A. Establishing and analyzing a DCS model of the power load time sequence data:
Step three, determining the user assumption of the DCS model:
Users are randomly selected from each category of the preliminary classification, and a histogram and a Q-Q plot (Quantile-Quantile plot) are drawn for each; the basic assumption for the users is determined from these plots. A Q-Q plot is essentially a scatter plot: if the points of the daily load time series lie near the red reference line, the observations approximately follow a Gaussian distribution, and the Gaussian distribution (2) is taken as the basic assumption for the user data y_t:
Figure BSA0000286639390000023
where μ_t is the time-varying mean and
Figure BSA0000286639390000024
is the time-varying variance;
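The Q-Q check in step three can be sketched as follows. This is a minimal illustration, not the procedure of the invention: the text judges normality visually from the plot, whereas the helper qq_correlation below (a hypothetical name) adds a numeric summary, the correlation between observed and theoretical quantiles, which is an assumption introduced here for convenience.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    reference = NormalDist(mean(sample), stdev(sample))
    observed = sorted(sample)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 mean the points
    lie close to a straight reference line."""
    theo, obs = qq_points(sample)
    mt, mo = mean(theo), mean(obs)
    cov = sum((t - mt) * (o - mo) for t, o in zip(theo, obs))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / (var_t * var_o) ** 0.5
```

Plotting the pairs returned by qq_points against the line y = x gives the Q-Q plot described above; the closer the points are to the line, the more reasonable the Gaussian assumption.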
Step four, establishing the DCS model and estimating its parameters:
A DCS(p, q) model is built with p = q = 1. With the user data
Figure BSA0000286639390000025
a DCS(1,1) model based on the Gaussian assumption is obtained, and the update equation of f_t at time t is shown in formula (3):
f_t = ω + A s_{t-1} + B f_{t-1}    (3)
wherein the time-varying parameter vector
Figure BSA0000286639390000026
Constant vector
Figure BSA0000286639390000027
Real matrix
Figure BSA0000286639390000028
where the scalar parameters include ω_μ,
Figure BSA0000286639390000029
a_μ, which are functions of the static parameter vector θ; ω, A and B are estimated by maximum likelihood;
Step five, calculating the relevant parameters of the DCS model to obtain the final DCS model:
In the DCS model, f_t is calculated from the driving vector s_{t-1} and the time-varying parameter vector f_{t-1} at time t-1, where the driving vector
Figure BSA0000286639390000031
Step 5.1, calculating the conditional score vector
Figure BSA0000286639390000032
The calculation is given by formulas (4) and (5):
Figure BSA0000286639390000033
Figure BSA0000286639390000034
Step 5.2, calculating the scaling matrix S_t; the calculation is given by formulas (6) and (7):
Figure BSA0000286639390000035
Figure BSA0000286639390000036
Step 5.3, obtaining the final update equations of the time-varying parameters, shown in formulas (8) and (9):
Figure BSA0000286639390000037
Figure BSA0000286639390000038
The parameter estimates and the corresponding parameters are substituted into the update equations of the time-varying parameters to obtain the conditional moment estimates of the time-varying mean and the time-varying variance, which serve as the input data set of the clustering algorithm.
This completes the DCS model of the power load data.
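Steps four and five can be illustrated with the following minimal sketch of a Gaussian DCS(1,1) filter built on update equation (3). Formulas (4) to (9) are not reproduced in this text, so the sketch makes two assumptions that may differ from the invention: the score is scaled by the inverse Fisher information, which for f_t = (μ_t, σ_t²) gives the driving vector s_t = (y_t - μ_t, (y_t - μ_t)² - σ_t²), and A and B are diagonal. The function names, the argument layout and the variance floor are hypothetical.

```python
import math


def dcs11_gaussian_filter(y, omega, a, b, f0, var_floor=1e-8):
    """Filter time-varying mean and variance with f_t = omega + A s_{t-1} + B f_{t-1}.

    omega, a, b: pairs (mean component, variance component); A and B are
                 assumed diagonal with entries a and b.
    f0:          initial (mean, variance).
    """
    mu, var = f0
    means, variances = [mu], [var]
    for t in range(1, len(y)):
        resid = y[t - 1] - mu
        # Score scaled by the inverse Fisher information (assumed scaling).
        s_mu, s_var = resid, resid ** 2 - var
        mu, var = (omega[0] + a[0] * s_mu + b[0] * mu,
                   omega[1] + a[1] * s_var + b[1] * var)
        var = max(var, var_floor)  # guard: keep the variance positive
        means.append(mu)
        variances.append(var)
    return means, variances


def gaussian_loglik(y, means, variances):
    """Gaussian log-likelihood of y given the filtered moments; maximizing it
    over (omega, a, b) is one way to carry out the maximum likelihood step."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(y, means, variances))
```

The two lists returned by the filter play the role of the conditional moment estimates of the time-varying mean and variance that are passed on to the clustering stage.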
B. The power load time sequence data is subjected to AI-FCM cluster analysis based on a DCS model:
Step six, calculating the autocorrelation values of the sequences based on the DCS model:
In the DCS model, R-level clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. For the DCS(1,1) model R = 2, so the conditional moment estimates of the time series for r = 1 and r = 2 are the estimates of the DCS(1,1) model parameters
Figure BSA0000286639390000039
and
Figure BSA00002866393900000310
respectively. The r-th conditional moment estimate is used to obtain the r-th estimated autocorrelation value
Figure BSA00002866393900000311
The conditional moment estimates of the time-varying mean
Figure BSA00002866393900000312
and the time-varying variance
Figure BSA00002866393900000313
are used as the data sets for calculating the autocorrelation values; that is, the estimated autocorrelation value of the r-th conditional moment of the time series y_t at lag l is calculated as shown in formula (10):
Figure BSA00002866393900000314
where
Figure BSA00002866393900000315
is the mean of the r-th conditional moment of the k-th time series from time t to time t-l. The autocorrelation-based distance between the r-th conditional moments of time series k and k' is:
Figure BSA00002866393900000316
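Step six can be sketched as follows. Formulas (10) and (11) are not reproduced in this text, so the sketch substitutes the standard sample autocorrelation (computed around the overall series mean) and the Euclidean distance between two autocorrelation vectors; both choices, and the function names, are assumptions and may differ from the definitions used by the invention.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelations of a conditional moment series at lags 1..max_lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    return [
        sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n)) / denom
        for lag in range(1, max_lag + 1)
    ]


def acf_distance(acf_k, acf_k2):
    """Euclidean distance between the autocorrelation vectors of two series
    (assumed form of the autocorrelation-based distance)."""
    return sum((p - q) ** 2 for p, q in zip(acf_k, acf_k2)) ** 0.5
```

Applied to the filtered mean series (r = 1) and variance series (r = 2) of each user, these functions would give one autocorrelation vector per user and moment, which can then be compared pairwise.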
Step seven, performing AI-FCM clustering based on the DCS model:
The sequence data sets formed by the r-th estimated autocorrelation values
Figure BSA00002866393900000317
and the similarity measure distances
Figure BSA00002866393900000318
are each used as input to the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results. The specific operations are as follows:
Step 7.1, the K data points are randomly divided into P data blocks (p = 1, 2, ..., P), each containing K/P (rounded up) data points; each data point is given an initial weight of 1; the fuzzy factor is set to m = 2 and the iteration stopping threshold to ε = 0.0001; and A-wFCM clustering is performed on the first data block p = 1, with the following specific operations:
Step 7.1.1, randomly initializing the membership matrix according to the calculated number of clusters C so that it satisfies the following constraint:
Figure BSA0000286639390000041
Step 7.1.2, calculating the cluster centers from the membership matrix obtained, using the following formula:
Figure BSA0000286639390000042
Step 7.1.3, calculating the value of the objective function J from the membership matrix and the cluster centers according to formula (14); if the difference between two successive values of J is less than the specified threshold ε, the iteration ends and the membership matrix and cluster centers are output; otherwise the next step is carried out;
Figure BSA0000286639390000043
Step 7.1.4, recalculating the membership matrix from the cluster centers and returning to step 7.1.2; the membership matrix is calculated as follows:
Figure BSA0000286639390000044
Step 7.2, assigning new weights to the cluster centers calculated for the data block according to formula (16), and adding these cluster centers and their weights to the next data block for A-wFCM clustering (steps 7.1.1 to 7.1.4):
Figure BSA0000286639390000045
where N is the number of data points in the current data block and j is the number of cluster centers in the previous data block.
Step 7.3, repeating step 7.2 until all data blocks have been clustered, and finally recalculating the memberships of all data points from the final cluster centers according to formula (15).
This completes the autocorrelation incremental fuzzy C-means clustering of the power load data based on the DCS model.
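The block-wise procedure of steps 7.1 to 7.3 can be sketched as follows. Formulas (12) to (16) are not reproduced in this text, so the sketch uses the standard weighted fuzzy C-means updates with squared Euclidean distance and assumes that the weight carried forward for a cluster center is the weighted sum of the memberships it received in the block; these forms, the function names and the max_iter safeguard are assumptions and not the exact definitions of the invention.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _memberships(points, centers, m):
    """Standard FCM membership update; rows are clusters, columns are points."""
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        d = [max(_sq_dist(x, v), 1e-12) for v in centers]
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d[i] / dj) ** (1.0 / (m - 1)) for dj in d)
    return u


def weighted_fcm(points, weights, c, m=2.0, eps=1e-4, max_iter=300, rng=None):
    if rng is None:
        rng = random.Random(0)
    n, dim = len(points), len(points[0])
    # Step 7.1.1: random memberships, each column normalized to sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    prev_j = None
    for _ in range(max_iter):
        # Step 7.1.2: weighted cluster centers.
        centers = []
        for i in range(c):
            g = [weights[k] * u[i][k] ** m for k in range(n)]
            total = sum(g)
            centers.append([sum(g[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Step 7.1.3: weighted objective J and stopping test.
        j = sum(weights[k] * u[i][k] ** m * _sq_dist(points[k], centers[i])
                for i in range(c) for k in range(n))
        if prev_j is not None and abs(prev_j - j) < eps:
            break
        prev_j = j
        # Step 7.1.4: recompute memberships from the centers.
        u = _memberships(points, centers, m)
    return centers, u


def incremental_fcm(points, n_blocks, c, m=2.0, eps=1e-4, seed=0):
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)                      # random division into blocks
    size = -(-len(points) // n_blocks)      # K / P rounded up
    carried, carried_w = [], []
    for start in range(0, len(order), size):
        block = [points[k] for k in order[start:start + size]]
        data = block + carried              # previous centers join the block
        w = [1.0] * len(block) + carried_w
        centers, u = weighted_fcm(data, w, c, m, eps, rng=rng)
        # Step 7.2 (assumed weight rule): weighted sum of memberships.
        carried = centers
        carried_w = [sum(u[i][k] * w[k] for k in range(len(data)))
                     for i in range(c)]
    # Step 7.3: final memberships of all points from the final centers.
    return carried, _memberships(points, carried, m)
```

Each block only needs the current data points plus C weighted centers from the previous block, which is what keeps the memory and computation per pass small when new load data arrive.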
The user data time series y in the third step t The value of the middle t depends on the resolution of the measured power load data, namely the number of the load measurement points in the maximum load day, and the time-varying mean value and the time-varying variance are determined according to each time point t and the original time sequence y t The probability distribution hypothesis of the DCS model of all users is determined by drawing a histogram and a QQ chart of a typical user in the preliminary classification, the Gaussian distribution is used as the probability distribution hypothesis of the subsequent calculation, random variables under various situations can be described, and the probability distribution hypothesis has high applicability and universality.
The above-mentioned DCS (p, q) model in the fourth step enables p = q =1 to already describe most time-series data in a situation where the complexity of the model is low, and determines a time-varying parameter vector f according to the assumed parameters of the user t And determining the vector expression of the subsequent parameter to be estimated.
Calculating the time series y in the sixth step t When the r-th conditional moment is used for estimating the autocorrelation value at the lag of l, the lag order l is obtained by analyzing the characteristics displayed by the time sequence diagram and the autocorrelation diagram of the original time sequence.
The invention has the beneficial effects that: compared with the prior art, the invention has the prominent substantive characteristics and remarkable progress as follows:
(1) Compared with CN111915116A, the method of the invention has the advantages of clustering the high-dimensional characteristics of the time sequence data and improving the clustering efficiency and accuracy of the time sequence.
(2) Compared with CN110795610B, the method of the present invention has the advantages of capturing the characteristic details of time sequence data, describing user data accurately and performing clustering analysis flexibly.
(3) Compared with CN112270338A, the method of the invention has the advantages that a plurality of characteristics of time series data can be considered for cluster analysis, and the cluster result can be flexibly explained.
(4) Compared with CN112215490A, the method of the invention has the advantage that the time-varying characteristics of the time sequence data can be fully extracted for the establishment and analysis of the statistical model.
(5) Compared with CN114611591A, the method of the invention has the advantage that newly added users can be added into the original data for cluster analysis aiming at the dynamically updated user data.
(6) According to the time-varying, high-dimensional and heteroscedastic characteristics of users' daily load time series data, the method first establishes an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution. Then, to describe the dependency structure of the time series data and reduce the computational cost of clustering high-dimensional time series data, it obtains an autocorrelation value data set from the conditional moment estimates computed with the statistical model parameters. Finally, it constructs an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM), taking into account the time series characteristics of different power load data streams, and uses the autocorrelation value data set as the input of the clustering algorithm to carry out the cluster analysis. The method is therefore a power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm, and it can effectively address the low clustering efficiency, long model training time and inability to extract hidden information about user patterns that are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.
(7) The invention constructs a class of observation-driven DCS models that update the parameters over time using the scaled score of the likelihood function. By exploiting the data characteristics of the time series, this provides a unified and consistent framework for introducing time-varying parameters into a wide class of nonlinear models. Taking the DCS model as the basis of the time series cluster analysis makes it possible to analyze the time series data and build their model effectively before clustering.
(8) Taking the time series characteristics of user power load data into account, the invention constructs an incremental clustering algorithm based on the DCS statistical model so that users' typical load patterns can be updated accurately, efficiently and in real time, improving the efficiency and accuracy of time series data clustering.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic block diagram of a flow of a power load increment clustering method combining a DCS statistical model and an autocorrelation increment fuzzy C-means clustering algorithm.
Fig. 2 is a graph of the maximum load daily load in an embodiment of the present invention.
Fig. 3 is a histogram and QQ plot for a typical user in an embodiment of the present invention.
FIG. 4 is a graph of the results of DCS-AI-FCM clustering of the user electrical load data sets under the same conditional mean in an embodiment of the present invention.
FIG. 5 is a graph of the results of DCS-AI-FCM clustering of user electrical load data sets under the same conditional variance in an embodiment of the present invention.
FIG. 6 is a comparison chart of the intra-cluster evaluation indexes of the user electrical load data set DCS-AI-FCM clustering and other clustering methods in the embodiment of the invention.
Detailed Description
The embodiment shown in fig. 1 shows that the flow of the power load increment clustering method combining the DCS statistical model and the autocorrelation increment fuzzy C-means clustering algorithm of the present invention is as follows:
1. Preprocessing and analyzing user daily load data → 2. Determining the number of clusters → 3. Determining the user assumption of the DCS model → 4. Establishing the DCS model and estimating its parameters → 5. Calculating the relevant parameters of the DCS model to obtain the final DCS model → 6. Calculating the autocorrelation values of the sequences based on the DCS model → 7. Performing AI-FCM clustering based on the DCS model.
Example 1
The method for clustering the increment of the power load by combining the DCS statistical model and the autocorrelation increment fuzzy C-means clustering algorithm comprises the following specific steps:
Step one, preprocessing and analyzing user daily load data:
Acquire the daily load data set of K = 11 electricity-consuming companies in China for the maximum-load day of March 2020; find and fill missing values and detect and correct outliers in the data set; use the preprocessed data set as the data for building the power load time series model and for cluster analysis; and plot the users' daily load curves to analyze their fluctuation briefly and classify the curves preliminarily.
Step two, determining the number of clusters:
The range of the number of clusters is defined as
Figure BSA0000286639390000061
(rounded up); the clustering validity index V_xb corresponding to each number of clusters in this range is calculated according to formula (1), and the optimal number of clusters C for this data set is determined.
Figure BSA0000286639390000062
C. Establishing and analyzing a DCS model of the power load time series data:
Step three, determining the user assumption of the DCS model:
Users are randomly selected from each category of the preliminary classification, and a histogram and a Q-Q plot (Quantile-Quantile plot) are drawn for each; the basic assumption for the users is determined from these plots. A Q-Q plot is essentially a scatter plot: if the points of the daily load time series lie near the red reference line, the observations approximately follow a Gaussian distribution, and the Gaussian distribution (2) is taken as the basic assumption for the user data y_t:
Figure BSA0000286639390000063
where μ_t is the time-varying mean and
Figure BSA0000286639390000064
is the time-varying variance;
Step four, establishing the DCS model and estimating its parameters:
A DCS(p, q) model is built with p = q = 1. With the user data
Figure BSA0000286639390000065
a DCS(1,1) model based on the Gaussian assumption is obtained, and the update equation of f_t at time t is shown in formula (3):
f_t = ω + A s_{t-1} + B f_{t-1}    (3)
wherein the time-varying parameter vector
Figure BSA0000286639390000066
Constant vector
Figure BSA0000286639390000067
Real matrix
Figure BSA0000286639390000068
where the scalar parameters include ω_μ,
Figure BSA00002866393900000610
a_μ, which are functions of the static parameter vector θ; ω, A and B are estimated by maximum likelihood;
Step five, calculating the relevant parameters of the DCS model to obtain the final DCS model:
In the DCS model, f_t is calculated from the driving vector s_{t-1} and the time-varying parameter vector f_{t-1} at time t-1, where the driving vector
Figure BSA0000286639390000069
Step 5.1, calculating a conditional score vector
Figure BSA0000286639390000071
The calculation formula is shown in formulas (4) and (5):
Figure BSA0000286639390000072
Figure BSA0000286639390000073
Step 5.2, calculating the scaling matrix S_t; the calculation is given by formulas (6) and (7):
Figure BSA0000286639390000074
Figure BSA0000286639390000075
Step 5.3, obtaining the final update equations of the time-varying parameters, shown in formulas (8) and (9):
Figure BSA0000286639390000076
Figure BSA0000286639390000077
The parameter estimates and the corresponding parameters are substituted into the update equations of the time-varying parameters to obtain the conditional moment estimates of the time-varying mean and the time-varying variance, which serve as the input data set of the clustering algorithm.
This completes the DCS model of the power load data.
D. AI-FCM cluster analysis of the power load time series data based on the DCS model:
Step six, calculating the autocorrelation values of the sequences based on the DCS model:
In the DCS model, R-level clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. For the DCS(1,1) model R = 2, so the conditional moment estimates of the time series for r = 1 and r = 2 are the estimates of the DCS(1,1) model parameters
Figure BSA0000286639390000078
and
Figure BSA0000286639390000079
respectively. The r-th conditional moment estimate is used to obtain the r-th estimated autocorrelation value
Figure BSA00002866393900000710
The conditional moment estimates of the time-varying mean
Figure BSA00002866393900000711
and the time-varying variance
Figure BSA00002866393900000712
are used as the data sets for calculating the autocorrelation values; that is, the estimated autocorrelation value of the r-th conditional moment of the time series y_t at lag l is calculated as shown in formula (10):
Figure BSA00002866393900000713
where
Figure BSA00002866393900000714
is the mean of the r-th conditional moment of the k-th time series from time t to time t-l. The autocorrelation-based distance between the r-th conditional moments of time series k and k' is:
Figure BSA00002866393900000715
Step seven, performing AI-FCM clustering based on the DCS model:
The sequence data sets formed by the r-th estimated autocorrelation values
Figure BSA00002866393900000716
and the similarity measure distances
Figure BSA00002866393900000717
are each used as input to the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results. The specific operations are as follows:
Step 7.1, the K data points are randomly divided into P data blocks (p = 1, 2, ..., P), each containing K/P (rounded up) data points; each data point is given an initial weight of 1; the fuzzy factor is set to m = 2 and the iteration stopping threshold to ε = 0.0001; and A-wFCM clustering is performed on the first data block p = 1, with the following specific operations:
Step 7.1.1, randomly initializing the membership matrix according to the calculated number of clusters C so that it satisfies the following constraint:
Figure BSA0000286639390000081
Step 7.1.2, calculating the cluster centers from the membership matrix obtained, using the following formula:
Figure BSA0000286639390000082
Step 7.1.3, calculating the value of the objective function J from the membership matrix and the cluster centers according to formula (14); if the difference between two successive values of J is less than the specified threshold ε, the iteration ends and the membership matrix and cluster centers are output; otherwise the next step is carried out;
Figure BSA0000286639390000083
Step 7.1.4, recalculating the membership matrix from the cluster centers and returning to step 7.1.2; the membership matrix is calculated as follows:
Figure BSA0000286639390000084
Step 7.2, assigning new weights to the cluster centers calculated for the data block according to formula (16), and adding these cluster centers and their weights to the next data block for A-wFCM clustering (steps 7.1.1 to 7.1.4):
Figure BSA0000286639390000085
where N is the number of data points in the current data block and j is the number of cluster centers in the previous data block.
Step 7.3, repeating step 7.2 until all data blocks have been clustered, and finally recalculating the memberships of all data points from the final cluster centers according to formula (15).
This completes the autocorrelation incremental fuzzy C-means clustering of the power load data based on the DCS model.
Fig. 2 shows a load curve chart of 11 maximum load days of the household electric power companies, wherein the companies are simply classified according to fluctuation conditions of the companies in the graph, the companies with obvious wave peaks are classified into one class, and the companies with relatively smooth fluctuation are classified into one class.
Fig. 3 shows histograms and Q-Q plots of the maximum-load-day data of 4 randomly selected companies. The scatter points of the maximum-load-day time series lie close to the red reference line, which indicates that the observed vectors approximately follow a Gaussian distribution; the Gaussian distribution is therefore taken as the user hypothesis.
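The normality check described for Fig. 3 can also be approximated numerically. The sketch below builds the points of a normal Q-Q plot and summarizes how closely they follow a straight line with a correlation coefficient. The plotting positions (i + 0.5)/n, the function names and the correlation summary are assumptions made for illustration; the source describes only a visual inspection.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Reference normal fitted to the sample; the sample must not be constant.
    ref = NormalDist(mean(xs), stdev(xs))
    # Assumed plotting positions: (i + 0.5) / n for i = 0 .. n-1.
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    pairs = qq_points(sample)
    mt = mean(p[0] for p in pairs)
    mo = mean(p[1] for p in pairs)
    cov = sum((t - mt) * (o - mo) for t, o in pairs)
    var_t = sum((t - mt) ** 2 for t, _ in pairs)
    var_o = sum((o - mo) ** 2 for _, o in pairs)
    return cov / (var_t * var_o) ** 0.5
```

A value close to 1 corresponds to scatter points lying along the reference line; a strongly skewed load series gives a visibly lower value.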
Fig. 4 and 5 show the clustering results obtained by applying the DCS-AI-FCM clustering method to the company data set. R-level clustering groups time series that share the same conditional distribution, so companies clustered into the same category according to both time-varying parameters have the same conditional distribution. Clustering by conditional variance assigns some companies with greater uncertainty: every company clustered by conditional mean belongs to its cluster with a probability above 95 percent, whereas for clustering by conditional variance the probability is above 70 percent.
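The R-level clustering behind these results compares companies through the autocorrelation structure of their conditional-moment series. A minimal sketch of that comparison follows. Formulas (10) and (11) are images in the source, so the code assumes the usual sample autocorrelation and a Euclidean distance between autocorrelation vectors over lags 1 to `max_lag`; both function names are hypothetical.

```python
def autocorr(series, max_lag):
    """Sample autocorrelation of a series at lags 1 .. max_lag."""
    n = len(series)
    mu = sum(series) / n
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n)) / denom
        for lag in range(1, max_lag + 1)
    ]

def acf_distance(series_a, series_b, max_lag):
    """Euclidean distance between the autocorrelation vectors of two series."""
    ra = autocorr(series_a, max_lag)
    rb = autocorr(series_b, max_lag)
    return sum((x - y) ** 2 for x, y in zip(ra, rb)) ** 0.5
```

Because the autocorrelation is invariant to shifting and rescaling a series, two companies with different load levels but the same temporal pattern are at distance zero under this measure.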
Fig. 6 compares the results of A-FCM, DCS-A-FCM and DCS-AI-FCM clustering on the maximum-load-day data of 11 electric power companies. Effectiveness is evaluated with the internal evaluation indexes SC, CHI and DBI, which are plotted as a histogram; the indexes show that adding the DCS statistical model and the incremental clustering approach effectively improves clustering accuracy.
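To make the internal evaluation concrete, the sketch below computes one such index, assuming DBI denotes the Davies-Bouldin index (lower values indicate more compact, better separated clusters). It takes hard labels, for example the highest-membership cluster of each company, and is an illustration only, not the evaluation code behind Fig. 6.

```python
from math import dist

def davies_bouldin(points, labels):
    """Davies-Bouldin index for hard cluster labels (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    # Centroid and mean distance to centroid (scatter) of every cluster.
    cents = {lab: [sum(col) / len(ps) for col in zip(*ps)]
             for lab, ps in clusters.items()}
    scatter = {lab: sum(dist(p, cents[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    labs = list(clusters)
    total = 0.0
    for i in labs:
        # Worst-case similarity of cluster i to any other cluster
        # (assumes distinct centroids, otherwise the ratio is undefined).
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in labs if j != i)
    return total / len(labs)
```

Comparing the value obtained for each clustering method on the same data set gives the kind of bar-by-bar comparison the figure describes.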

Claims (3)

  1. A power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm, characterized in that: according to the heteroscedastic characteristics of user electricity load time series data, the dependency structure of the time series data is described and the computational cost of clustering high-dimensional time series data is reduced; combining the time series characteristics of different electricity load data streams, the autocorrelation incremental fuzzy C-means clustering algorithm based on the DCS statistical model is constructed by the following specific steps:
    firstly, preprocessing and analyzing the daily load data of users:
    acquiring the maximum-load-day daily load data sets of K users for the same year and month; finding and filling missing values and detecting and correcting abnormal values in the data sets; using the preprocessed data sets as the data for power load time series modelling and cluster analysis; and plotting the users' daily load curves for a simple analysis of their fluctuation and a preliminary classification of the curves;
    secondly, determining the clustering number:
    the range of the number of clusters is defined as
    Figure FSA0000286639380000011
    (rounded up); the clustering validity index $V_{xb}$ is calculated according to formula (1) for each cluster number in this range to determine the optimal number of clusters C for the data set;
    Figure FSA0000286639380000012
    A. Establishing and analyzing the DCS model of the power load time series data:
    thirdly, determining the user hypothesis of the DCS model:
    randomly selecting users from each category of the preliminary classification, plotting their histograms and Q-Q plots (quantile-quantile plots), and determining the basic user assumption from them; a Q-Q plot is essentially a scatter diagram, and if the scatter points of the daily load time series lie near the red reference line, the observed vector approximately follows a Gaussian distribution, in which case the Gaussian distribution (2) is taken as the basic assumption for the user data $y_t$:
    Figure FSA0000286639380000013
    where $\mu_t$ is the time-varying mean parameter and
    Figure FSA0000286639380000014
    is the time-varying variance parameter;
    fourthly, establishing the DCS model and estimating its parameters:
    a DCS(p, q) model is established with p = q = 1, where the user data
    Figure FSA0000286639380000015
    a DCS(1, 1) model based on the Gaussian distribution assumption is thus obtained, in which the update equation of $f_t$ at time t is given by formula (3):
    $f_t = \omega + A s_{t-1} + B f_{t-1}$ (3)
    where the time-varying parameter vector is
    Figure FSA0000286639380000016
    the constant vector is
    Figure FSA0000286639380000017
    and the real matrices are
    Figure FSA0000286639380000018
    in which the scalar parameters, including $\omega_\mu$,
    Figure FSA0000286639380000019
    and $a_\mu$, are functions of the static parameter vector θ, and ω, A and B are estimated by the maximum likelihood method;
    fifthly, calculating the relevant parameters of the DCS model to obtain the final DCS model:
    in the DCS model, $f_t$ is calculated from the driving vector $s_{t-1}$ at time t-1 and the time-varying parameter vector $f_{t-1}$, where the driving vector is
    Figure FSA00002866393800000110
    Figure FSA00002866393800000111
    Step 5.1, calculating a conditional score vector
    Figure FSA00002866393800000112
    The calculation formula is shown in formulas (4) and (5):
    Figure FSA00002866393800000113
    Figure FSA0000286639380000021
    step 5.2, calculating a scaling matrix S t The calculation formula is shown in formulas (6) and (7):
    Figure FSA0000286639380000022
    Figure FSA0000286639380000023
    step 5.3, obtaining the final time-varying parameter update equations, as shown in formulas (8) and (9):
    Figure FSA0000286639380000024
    Figure FSA0000286639380000025
    substituting the parameter estimates and the corresponding parameters into the update equations of the time-varying parameters gives the conditional moment estimates of the time-varying mean and the time-varying variance, which serve as the input data set of the clustering algorithm;
    this completes the establishment of the DCS model of the power load data.
    B. AI-FCM cluster analysis of the power load time series data based on the DCS model:
    sixthly, calculating the autocorrelation values of the sequences based on the DCS model:
    in the DCS model, R-level clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. For the DCS(1, 1) model R = 2, so the time-series conditional moment estimates for r = 1 and r = 2 are the estimates of the DCS(1, 1) model parameters
    Figure FSA0000286639380000026
    and
    Figure FSA0000286639380000027
    respectively. The r-th conditional moment estimate is used to obtain the r-th estimated autocorrelation value
    Figure FSA0000286639380000028
    The conditional moment estimates of the time-varying mean
    Figure FSA0000286639380000029
    and the time-varying variance
    Figure FSA00002866393800000210
    are used as the data sets for calculating the autocorrelation values, i.e. the estimated autocorrelation value of the r-th conditional moment of the time series $y_t$ at lag l is calculated as shown in formula (10):
    Figure FSA00002866393800000211
    where
    Figure FSA00002866393800000212
    is the mean of the r-th order conditional moments of the k-th time series between time t and time t-l, and the autocorrelation-based distance between the r-th order conditional moments of time series k and k' is:
    Figure FSA00002866393800000213
    seventhly, performing AI-FCM clustering based on the DCS model:
    the r-th estimated autocorrelation values
    Figure FSA00002866393800000214
    and the similarity measure distances
    Figure FSA00002866393800000215
    form sequence data sets that are respectively used as input of the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results, specifically as follows:
    step 7.1, randomly dividing the K data points into P data blocks (p = 1, 2, ..., P), each containing K/P (rounded up) data points; giving each data point an initial weight of 1; setting the fuzzy factor m = 2 and the iteration termination threshold ε = 0.0001; and performing A-wFCM clustering on the first data block (p = 1), specifically as follows:
    step 7.1.1, randomly initializing the membership matrix for the calculated number of clusters C so that it satisfies the following constraint:
    Figure FSA00002866393800000216
    step 7.1.2, calculating the cluster centers from the obtained membership matrix, using the following formula:
    Figure FSA0000286639380000031
    step 7.1.3, calculating the value of the objective function J from the membership matrix and the cluster centers, as shown in formula (14); if the difference between the values of J in two successive iterations is less than the specified threshold ε, ending the iteration and outputting the membership matrix and the cluster centers; otherwise, proceeding to the next step;
    Figure FSA0000286639380000032
    step 7.1.4, recalculating the membership matrix from the cluster centers and returning to step 7.1.2, the membership matrix being calculated as follows:
    Figure FSA0000286639380000033
    step 7.2, re-assigning weights to the cluster centers obtained for the data block according to formula (16), then adding these cluster centers and their weights to the next data block and performing A-wFCM clustering on it (steps 7.1.1 to 7.1.4),
    Figure FSA0000286639380000034
    where N is the number of data points in the current data block and j is the number of cluster centers in the previous data block;
    step 7.3, repeating step 7.2 until all data blocks have been clustered, and finally recalculating the membership of all data points from the final cluster centers according to formula (15);
    this completes the autocorrelation incremental fuzzy C-means clustering of the power load data based on the DCS model.
  2. The power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm according to claim 1, characterized in that: in the third step, the range of t in the user data time series $y_t$ depends on the resolution of the measured power load data, i.e. the number of load measurement points on the maximum load day; according to the time-varying mean and time-varying variance at each time point t and the original time series $y_t$, the probability distribution hypothesis of the DCS model for all users is determined by drawing histograms and Q-Q plots of typical users in the preliminary classification; the Gaussian distribution is used as the probability distribution hypothesis for subsequent calculation because it can describe random variables in many situations and has high applicability and generality.
  3. The power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm according to claim 1, characterized in that: in the fourth step, setting p = q = 1 in the DCS(p, q) model allows most time series data to be described with low model complexity; the vector form of the time-varying parameter vector $f_t$ is determined according to the user's assumed parameters, which in turn determines the vector expressions of the subsequent parameters to be estimated.
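As an illustration of the DCS(1, 1) recursion of formula (3) under the Gaussian assumption, the sketch below filters a load series into time-varying mean and variance paths and evaluates the Gaussian log-likelihood that a maximum likelihood estimator would maximize. Formulas (4) to (9) are images in the source, so the driving terms are an assumption: the score scaled by the inverse Fisher information, which gives $y_t - \mu_t$ for the mean and $(y_t - \mu_t)^2 - \sigma_t^2$ for the variance. The function and parameter names, the fixed coefficients and the variance floor are likewise hypothetical.

```python
import math

def dcs11_gaussian_filter(y, omega_mu, a_mu, b_mu, omega_var, a_var, b_var, mu0, var0):
    """Run f_t = omega + A s_{t-1} + B f_{t-1} for f = (mean, variance).

    Returns the one-step-ahead mean and variance paths, one value per observation.
    """
    mu, var = mu0, var0
    mu_path, var_path = [], []
    for obs in y:
        mu_path.append(mu)
        var_path.append(var)
        # Assumed driving terms (inverse-information scaled Gaussian score).
        s_mu = obs - mu
        s_var = (obs - mu) ** 2 - var
        mu = omega_mu + a_mu * s_mu + b_mu * mu
        # Hypothetical floor that keeps the variance strictly positive.
        var = max(omega_var + a_var * s_var + b_var * var, 1e-8)
    return mu_path, var_path

def gaussian_loglik(y, mu_path, var_path):
    """Gaussian log-likelihood of y given the filtered mean and variance paths."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (obs - m) ** 2 / v)
        for obs, m, v in zip(y, mu_path, var_path)
    )
```

Maximizing `gaussian_loglik` over the six coefficients, for example with a general-purpose numerical optimizer, would give the parameter estimates; the resulting `mu_path` and `var_path` play the role of the conditional moment estimates passed on to the clustering stage.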
CN202211265276.3A 2022-10-17 2022-10-17 Incremental clustering method for power load data Pending CN115618253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211265276.3A CN115618253A (en) 2022-10-17 2022-10-17 Incremental clustering method for power load data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211265276.3A CN115618253A (en) 2022-10-17 2022-10-17 Incremental clustering method for power load data

Publications (1)

Publication Number Publication Date
CN115618253A true CN115618253A (en) 2023-01-17

Family

ID=84862327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211265276.3A Pending CN115618253A (en) 2022-10-17 2022-10-17 Incremental clustering method for power load data

Country Status (1)

Country Link
CN (1) CN115618253A (en)


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20230117