CN115618253A - Incremental clustering method for power load data

Info

Publication number: CN115618253A
Application number: CN202211265276.3A
Authority: CN
Inventors: 张勇; 李欣玥; 王莉
Original assignee: Tianjin University of Commerce
Current assignee: Tianjin University of Commerce
Priority date: 2022-10-17
Filing date: 2022-10-17
Publication date: 2023-01-17

Abstract

The invention relates to incremental clustering of power load data, and specifically to a power load incremental clustering method that combines a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM). The method performs time series clustering aimed at the high dimensionality, time variation and heteroscedasticity of power load data. It comprises: preprocessing and analyzing the daily load data of users; determining the number of clusters; determining the user assumption for the DCS model; establishing the DCS model and estimating its parameters; calculating the relevant quantities of the DCS model to obtain the final DCS model; calculating the autocorrelation values of the sequences based on the DCS model; and finally performing AI-FCM clustering based on the DCS model. The method can effectively solve the problems of low clustering efficiency, long model training time and inability to effectively extract hidden user-pattern information, which are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.

Description

Incremental clustering method for power load data

Technical Field

The invention relates to a power load data incremental clustering method, in particular to a power load incremental clustering method combining a DCS statistical model and an autocorrelation incremental fuzzy C-means clustering algorithm.

Background

Power load data are updated ever more quickly, their volume and dimensionality keep growing, and the consumption patterns of electricity users are becoming more diverse and complex, so research on clustering user power load data is of considerable practical significance. Ferraro et al., in the paper "comprehensive and clustering analysis of the daily load of electricity in the world centers", extracted features from the daily electricity load of multiple countries and implemented several clustering methods. CN111915116A discloses a method for classifying residential power users based on K-means clustering and analyzes the load characteristics of each class of residential users from the clustering results; its drawbacks are that clustering power users with a traditional clustering algorithm has low accuracy and easily converges to a local optimum. Most power load data are time series data that are also time-varying, high-dimensional and heteroscedastic, so traditional clustering methods are difficult to apply directly. For cluster analysis of such data, a dimensionality reduction method such as principal component analysis or feature extraction can be combined with a clustering algorithm to handle the high dimensionality. Hu et al., in the paper "Classification and characterization of intra-day load using interactive feature and feature-based clustering", extracted a series of global peak and peak-period features from the daily loads of photovoltaic and non-photovoltaic households and clustered on the basis of these features.
CN110795610B discloses a clustering-based power load analysis method comprising a data preprocessing module, a feature extraction module and a clustering module; its drawback is that feature extraction is insufficient and reflects the power consumption behavior of only some users. CN112270338A discloses a power load curve clustering method that processes the loads with t-SNE dimensionality reduction and performs cluster analysis by combining a GSA elbow criterion with a bisecting K-means algorithm; its drawbacks are that the dimensionality reduction has high computational complexity, that a local optimum may be obtained, and that it is difficult to extract data features accurately for clustering. The problems with these approaches are: power load time series data are time-varying and heteroscedastic, traditional dimensionality reduction methods such as feature extraction struggle to extract their characteristic information effectively for cluster analysis, these methods have clear limitations in time series cluster analysis, and clustering accuracy needs to be improved.

Traditional clustering methods mainly analyze static power load time series data, whereas such data change dynamically; combining a statistical model with an incremental clustering algorithm therefore allows effective clustering of power load data. Otranto, in the paper "Clustering heteroskedastic time series by model-based procedures", clusters sequentially according to the unconditional volatility, the time-varying volatility parameters of the time series, and the corresponding parameters of a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. CN112215490A discloses a power load cluster analysis method based on K-means improved with a correlation coefficient; it processes the power load data with wavelet transform and principal component analysis and clusters using the Pearson correlation coefficient, but it does not mine hidden information from data generated in real time. CN114611591A discloses a power load clustering method based on discrete wavelet transform; it extracts the users' power consumption trend features and daily load curve features to form user consumption features, and uses them as the basis for K-means clustering of the users, but it has difficulty uncovering the new implicit information produced by a continuously growing data volume.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a power load incremental clustering method that combines a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm. According to the heteroscedasticity of user power load time series data, an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution is established; the conditional moment estimates obtained from the statistical model parameters are used to calculate an autocorrelation value data set; and, taking into account the time series characteristics of different power load data streams, an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM) is constructed to perform the cluster analysis. The method can effectively solve the problems of low clustering efficiency, long model training time and inability to effectively extract hidden user-pattern information, which are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.

The technical solution adopted by the invention to solve this technical problem is a power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm. According to the heteroscedasticity of user power load time series data, an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution is established, an autocorrelation value data set is calculated from the conditional moment estimates of the statistical model parameters, and an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM) is constructed taking into account the time series characteristics of different power load data streams. The specific steps are as follows:

Step one, preprocessing and analyzing the daily load data of users:

A daily load data set is acquired for the maximum load day of K users in the same year and month. Missing values in the data set are located and filled, and abnormal values are detected and corrected. The preprocessed data set is used for establishing the power load time series model and for cluster analysis. Daily load curves of the users are drawn for a simple analysis of their fluctuation, and the curves are preliminarily classified.
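The source does not say how missing values are filled or how abnormal values are corrected. The sketch below is one possible reading, assuming linear interpolation for gaps and a median/MAD rule for outliers; the function names `fill_missing` and `correct_outliers` and the threshold `k` are hypothetical.

```python
from statistics import median


def fill_missing(series):
    """Linearly interpolate None entries; gaps at the edges copy the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out


def correct_outliers(series, k=3.0):
    """Replace points farther than k scaled MADs from the median with the median (assumed rule)."""
    med = median(series)
    mad = median(abs(v - med) for v in series)
    if mad == 0:
        return list(series)
    limit = k * 1.4826 * mad
    return [med if abs(v - med) > limit else v for v in series]


# Toy daily load readings with two gaps and one spike.
raw = [10.0, None, 12.0, 11.0, 95.0, 11.5, None]
clean = correct_outliers(fill_missing(raw))
```

Other choices (seasonal interpolation, clipping instead of median replacement) would fit the same step; the point is only that the output is a complete, spike-free series per user.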

Step two, determining the number of clusters:

The range of the number of clusters is defined (rounded up); for each number of clusters in this range, the clustering validity index V_xb is calculated according to formula (1), and the optimal number of clusters C for this data set is determined.
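As an illustration of a validity index of this kind, the sketch below assumes that V_xb is the Xie-Beni index in its usual form: fuzzy within-cluster compactness divided by the number of points times the smallest squared distance between cluster centers, with smaller values indicating a better partition. The function names are hypothetical and the exact form of formula (1) may differ.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index; u[i][c] is the membership of point i in cluster c."""
    compactness = sum(
        (u[i][c] ** m) * sq_dist(points[i], centers[c])
        for i in range(len(points))
        for c in range(len(centers))
    )
    separation = min(
        sq_dist(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Two tight, well-separated groups with crisp memberships.
points = [[0.0], [1.0], [10.0], [11.0]]
centers = [[0.5], [10.5]]
u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
score = xie_beni(points, centers, u)
```

In practice the index would be evaluated once per candidate cluster number after running the clustering, and the candidate with the smallest value kept as C.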

A. Establishing and analyzing a DCS model of the power load time sequence data:

Step three, determining the user assumption of the DCS model:

Users are randomly selected from each category of the preliminary classification, and a histogram and a Q-Q plot (quantile-quantile plot) are drawn for each of them; the basic assumption for the users is determined from these plots. A Q-Q plot is essentially a scatter plot: if the points of the daily load time series data lie near the red reference line, the observation vector approximately follows a Gaussian distribution, and the Gaussian distribution (2) is taken as the basic assumption for the user data y_t:

y_t ~ N(μ_t, σ_t²)  (2)

where μ_t is the time-varying mean and σ_t² is the time-varying variance;
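The visual Q-Q check can be mirrored numerically. The sketch below pairs the sorted sample with standard normal quantiles and reports the correlation of the resulting points; a value close to 1 means the points lie near a straight line. The plotting positions (i + 0.5)/n and the function names are assumptions, not part of the source.

```python
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical normal quantile, ordered observation) pairs."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points (assumes the sample is not constant)."""
    pts = qq_points(sample)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# A sample built from exact Gaussian quantiles lies on a straight line.
gaussian_like = [NormalDist(5.0, 2.0).inv_cdf((i + 0.5) / 50) for i in range(50)]
r = qq_correlation(gaussian_like)
```

This only supplements the histogram and Q-Q plot inspection described in the text; it does not replace a formal normality test.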

Step four, establishing the DCS model and estimating its parameters:

A DCS(p, q) model is built with p = q = 1. With the user data y_t following the Gaussian assumption, a DCS(1, 1) model based on the Gaussian distribution is obtained, in which the update equation for the t-th realization of f_t is given by equation (3):

f_t = ω + A s_{t-1} + B f_{t-1}  (3)

where f_t is the time-varying parameter vector (the time-varying mean and variance), ω is a constant vector, and A and B are real matrices. The scalar parameters, including ω_μ and a_μ, are functions of the static parameter vector θ; ω, A and B are estimated by the maximum likelihood method;
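For orientation, one common way to write the quantities in equation (3) for a Gaussian DCS(1, 1) model is sketched below. The diagonal form of A and B and the symbols ω_σ, a_σ, b_μ and b_σ are assumptions made for illustration; only ω_μ and a_μ are named explicitly in the text.

```latex
\begin{aligned}
f_t &= (\mu_t,\ \sigma_t^2)^{\top}, \qquad
\omega = (\omega_\mu,\ \omega_\sigma)^{\top},\\
A &= \operatorname{diag}(a_\mu,\ a_\sigma), \qquad
B = \operatorname{diag}(b_\mu,\ b_\sigma),\\
\log p(y_t \mid f_t) &= -\tfrac{1}{2}\log\!\left(2\pi\sigma_t^2\right)
  - \frac{(y_t-\mu_t)^2}{2\sigma_t^2},\\
\hat{\theta} &= \arg\max_{\theta}\ \sum_{t=1}^{T} \log p(y_t \mid f_t;\theta).
\end{aligned}
```

Under this reading, θ collects the six scalars in ω, A and B, and the maximum likelihood step maximizes the summed Gaussian log-density along the filtered path of f_t.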

Step five, calculating the relevant quantities of the DCS model to obtain the final DCS model:

In the DCS model, f_t is calculated from the driving vector s_{t-1} at time t-1 and the time-varying parameter vector f_{t-1}, where the driving vector is the conditional score vector scaled by the scaling matrix S_t.

Step 5.1, calculating the conditional score vector, as shown in formulas (4) and (5);

Step 5.2, calculating the scaling matrix S_t, as shown in formulas (6) and (7);

Step 5.3, obtaining the final update equations of the time-varying parameters, as shown in formulas (8) and (9).

The parameter estimates and the corresponding quantities are substituted into the update equations of the time-varying parameters to obtain the conditional moment estimates of the time-varying mean and the time-varying variance, which serve as the input data set of the clustering algorithm.

This completes the establishment of the DCS model of the power load data.
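A minimal sketch of the recursion in steps 5.1 to 5.3 follows, assuming the standard Gaussian score and inverse Fisher information scaling. Under that assumption the driving vector reduces to (y_t - μ_t, (y_t - μ_t)² - σ_t²). The diagonal parameterization, the function names and the small variance floor are illustrative choices, not taken from the source.

```python
import math


def dcs_gaussian_filter(y, omega, a, b, f0):
    """Run a Gaussian DCS(1,1) recursion.

    omega, a, b and f0 are (mean, variance) pairs; returns the filtered
    time-varying means and variances, one value per observation.
    """
    mu, s2 = f0
    mus, s2s = [], []
    for obs in y:
        mus.append(mu)
        s2s.append(s2)
        # Score of log N(obs | mu, s2) with respect to (mu, s2).
        grad_mu = (obs - mu) / s2
        grad_s2 = ((obs - mu) ** 2 - s2) / (2.0 * s2 ** 2)
        # Scaling by the inverse Fisher information diag(s2, 2 * s2^2) (assumed choice).
        s_mu = s2 * grad_mu
        s_s2 = 2.0 * s2 ** 2 * grad_s2
        # Update f_t = omega + A s_{t-1} + B f_{t-1} with diagonal A and B.
        new_mu = omega[0] + a[0] * s_mu + b[0] * mu
        new_s2 = omega[1] + a[1] * s_s2 + b[1] * s2
        mu, s2 = new_mu, max(new_s2, 1e-8)  # variance floor: practical safeguard only
    return mus, s2s


def neg_log_likelihood(y, mus, s2s):
    """Gaussian negative log-likelihood along the filtered path."""
    return sum(
        0.5 * (math.log(2.0 * math.pi * v) + (obs - m) ** 2 / v)
        for obs, m, v in zip(y, mus, s2s)
    )


y = [1.0, 2.0, 3.0]
mus, s2s = dcs_gaussian_filter(y, omega=(0.0, 0.0), a=(0.5, 0.0), b=(1.0, 1.0), f0=(2.0, 1.0))
nll = neg_log_likelihood(y, mus, s2s)
```

Maximum likelihood estimation would wrap `neg_log_likelihood` in a numerical optimizer over ω, A and B; the filtered `mus` and `s2s` at the optimum play the role of the conditional moment estimates passed on to clustering.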

B. The power load time sequence data is subjected to AI-FCM cluster analysis based on a DCS model:

Step six, calculating the autocorrelation values of the sequences based on the DCS model:

In the DCS model, R-level clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. The DCS(1, 1) model implies R = 2, so the conditional moment estimates of the time series for r = 1 and r = 2 are the estimates of the time-varying mean and the time-varying variance of the DCS(1, 1) model, respectively. The r-th conditional moment estimate is used to obtain the r-th estimated autocorrelation value. The conditional moment estimates of the time-varying mean and the time-varying variance form the data set from which the autocorrelation values are calculated; that is, the estimated autocorrelation value of the r-th conditional moment of the time series y_t at lag l is calculated as shown in formula (10), which uses the mean of the r-th conditional moment of the k-th time series from time t to time t-l. The autocorrelation-based distance between the r-th conditional moments of time series k and k' is then calculated from these autocorrelation values.
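The computation in this step can be sketched as follows, assuming the usual sample autocorrelation at lag l and a Euclidean distance between the autocorrelation vectors of two series. Both formulas are assumptions; formula (10) and the distance formula are not reproduced in this text, and the function names are hypothetical.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom


def acf_vector(x, max_lag):
    """Autocorrelations at lags 1..max_lag."""
    return [autocorr(x, lag) for lag in range(1, max_lag + 1)]


def acf_distance(x, z, max_lag):
    """Euclidean distance between the autocorrelation vectors of two series."""
    ax, az = acf_vector(x, max_lag), acf_vector(z, max_lag)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ax, az)))


# x stands for a conditional-mean path of one user, z for another user's path.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [3.0, 5.0, 7.0, 9.0, 11.0]  # shifted and rescaled copy of x
d = acf_distance(x, z, max_lag=2)
```

Because autocorrelation is invariant to shifting and rescaling, two users whose conditional moments follow the same dynamics at different load levels end up close under this distance.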

Step seven, performing AI-FCM clustering based on the DCS model:

The sequence data sets formed by the r-th estimated autocorrelation values and the similarity-measure distances obtained above are each used as input to the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results. The specific operations are as follows:

Step 7.1, the K data points are randomly divided into P data blocks (p = 1, 2, ..., P), each containing K/P (rounded up) data points. Each data point is given an initial weight of 1, the fuzzy factor is set to m = 2 and the iteration termination threshold to ε = 0.0001, and A-wFCM clustering is performed on the first data block p = 1 as follows:

Step 7.1.1, randomly initializing the membership matrix according to the calculated number of clusters C so that it satisfies the constraint conditions;

Step 7.1.2, calculating the cluster centers from the membership matrix;

Step 7.1.3, calculating the value of the objective function J from the membership matrix and the cluster centers, as shown in formula (14); if the difference between two successive values of J is less than the specified threshold ε, the iteration ends and the membership matrix and the cluster centers are output; otherwise the next step is carried out;

Step 7.1.4, recalculating the membership matrix from the cluster centers according to formula (15) and returning to step 7.1.2.

Step 7.2, re-assigning the weights of the cluster centers obtained for the data block according to formula (16), and adding these cluster centers together with their weights to the next data block for A-wFCM clustering (steps 7.1.1 to 7.1.4), where N is the number of data points in the current data block and j is the number of cluster centers in the previous data block.

Step 7.3, repeating step 7.2 until all data blocks have been clustered, and finally recalculating the memberships of all data points from the final cluster centers according to formula (15).

This completes the autocorrelation incremental fuzzy C-means clustering of the power load data based on the DCS model.
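The block-wise procedure of steps 7.1 to 7.3 can be sketched as a weighted fuzzy C-means that carries cluster centers forward as weighted points. The sketch assumes the standard weighted FCM center and membership updates and takes the carried weight of a center to be the sum of membership times point weight over the block; these stand in for formulas (12) to (16), which are not reproduced in this text. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0):
    """Standard FCM membership update from squared distances (assumed form of formula (15))."""
    power = 1.0 / (m - 1.0)
    u = []
    for p in points:
        inv = [(1.0 / max(sq_dist(p, v), 1e-12)) ** power for v in centers]
        total = sum(inv)
        u.append([val / total for val in inv])
    return u


def weighted_fcm(points, weights, c, m=2.0, eps=1e-4, max_iter=200, rng=None):
    """Weighted FCM on one block (steps 7.1.1 to 7.1.4)."""
    if rng is None:
        rng = random.Random(0)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # 7.1.1: random memberships, each row sums to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    prev_j = None
    centers = []
    for _ in range(max_iter):
        centers = []
        for k in range(c):  # 7.1.2: weighted cluster centers
            coef = [weights[i] * u[i][k] ** m for i in range(n)]
            denom = sum(coef)
            centers.append([sum(coef[i] * points[i][d] for i in range(n)) / denom
                            for d in range(dim)])
        j = sum(weights[i] * u[i][k] ** m * sq_dist(points[i], centers[k])
                for i in range(n) for k in range(c))  # 7.1.3: objective J
        if prev_j is not None and abs(prev_j - j) < eps:
            break
        prev_j = j
        u = memberships(points, centers, m)  # 7.1.4
    return centers, u


def incremental_fcm(points, c, n_blocks, m=2.0, eps=1e-4, seed=0):
    """Cluster block by block, carrying weighted centers forward (steps 7.1 to 7.3)."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    size = -(-len(points) // n_blocks)  # ceiling division
    carried_pts, carried_w, centers = [], [], []
    for start in range(0, len(order), size):
        block = [points[i] for i in order[start:start + size]]
        data = carried_pts + block
        w = carried_w + [1.0] * len(block)
        centers, u = weighted_fcm(data, w, c, m, eps, rng=rng)
        # Assumed form of formula (16): center weight = sum of membership * point weight.
        carried_w = [sum(u[i][k] * w[i] for i in range(len(data))) for k in range(c)]
        carried_pts = centers
    return centers, memberships(points, centers, m)


pts = [[0.0], [0.2], [0.1], [0.3], [10.0], [10.2], [10.1], [9.9]]
final_centers, final_u = incremental_fcm(pts, c=2, n_blocks=2, seed=1)
```

Carrying only C weighted centers between blocks keeps the working set small, which is what makes the scheme suitable for data that arrives incrementally.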

In step three, the range of t in the user data time series y_t depends on the resolution of the measured power load data, that is, on the number of load measurement points in the maximum load day. The time-varying mean and the time-varying variance are determined by each time point t and the original time series y_t. The probability distribution assumption of the DCS model for all users is determined by drawing histograms and Q-Q plots of typical users from the preliminary classification. The Gaussian distribution is adopted as the distribution assumption for the subsequent calculations because it can describe random variables in many situations and has high applicability and generality.

In step four, setting p = q = 1 in the DCS(p, q) model is already sufficient to describe most time series data while keeping the model complexity low. The time-varying parameter vector f_t is determined from the parameters of the user assumption, which in turn determines the vector expressions of the parameters to be estimated.

In step six, when the estimated autocorrelation value of the r-th conditional moment of the time series y_t at lag l is calculated, the lag order l is obtained by analyzing the features shown in the time series plot and the autocorrelation plot of the original time series.

The invention has the beneficial effects that: compared with the prior art, the invention has the prominent substantive characteristics and remarkable progress as follows:

(1) Compared with CN111915116A, the method of the invention has the advantages of clustering the high-dimensional characteristics of the time sequence data and improving the clustering efficiency and accuracy of the time sequence.

(2) Compared with CN110795610B, the method of the present invention has the advantages of capturing the characteristic details of time sequence data, describing user data accurately and performing clustering analysis flexibly.

(3) Compared with CN112270338A, the method of the invention has the advantages that a plurality of characteristics of time series data can be considered for cluster analysis, and the cluster result can be flexibly explained.

(4) Compared with CN112215490A, the method of the invention has the advantage that the time-varying characteristics of the time sequence data can be fully extracted for the establishment and analysis of the statistical model.

(5) Compared with CN114611591A, the method of the invention has the advantage that newly added users can be added into the original data for cluster analysis aiming at the dynamically updated user data.

(6) According to the time-varying, high-dimensional and heteroscedastic characteristics of users' daily load time series data, the method first establishes an observation-driven Dynamic Conditional Score (DCS) statistical model based on the Gaussian distribution. To describe the dependency structure of the time series data and to reduce the computational cost of clustering high-dimensional time series data, it then calculates an autocorrelation value data set from the conditional moment estimates of the statistical model parameters. Finally, taking into account the time series characteristics of different power load data streams, it constructs an autocorrelation incremental fuzzy C-means clustering algorithm (AI-FCM) and uses the autocorrelation value data set as the input of the clustering algorithm to perform the cluster analysis. The method is therefore a power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm, and it can effectively solve the problems of low clustering efficiency, long model training time and inability to effectively extract hidden user-pattern information, which are caused by the multimodal, high-dimensional and heteroscedastic characteristics of power load data.

(7) The invention constructs an observation-driven DCS model that updates its parameters over time using the scaled score of the likelihood function. Using the data characteristics of the time series, this provides a unified and consistent framework for introducing time-varying parameters into a wide class of nonlinear models. Taking the DCS model as the basis of time series cluster analysis allows the time series data to be analyzed and modeled effectively before clustering.

(8) Taking into account the time series characteristics of user power load data, the invention constructs an incremental clustering algorithm based on a DCS statistical model to address the accurate, efficient and real-time updating of users' typical load patterns, and improves the efficiency and accuracy of time series data clustering.

Drawings

The invention is further illustrated with reference to the following figures and examples.

FIG. 1 is a schematic block diagram of a flow of a power load increment clustering method combining a DCS statistical model and an autocorrelation increment fuzzy C-means clustering algorithm.

Fig. 2 is a graph of the maximum load daily load in an embodiment of the present invention.

Fig. 3 is a histogram and QQ plot for a typical user in an embodiment of the present invention.

FIG. 4 is a graph of the results of DCS-AI-FCM clustering of the user electrical load data sets under the same conditional mean in an embodiment of the present invention.

FIG. 5 is a graph of the results of DCS-AI-FCM clustering of user electrical load data sets under the same conditional variance in an embodiment of the present invention.

FIG. 6 is a comparison chart of the intra-cluster evaluation indexes of the user electrical load data set DCS-AI-FCM clustering and other clustering methods in the embodiment of the invention.

Detailed Description

The embodiment shown in fig. 1 shows that the flow of the power load increment clustering method combining the DCS statistical model and the autocorrelation increment fuzzy C-means clustering algorithm of the present invention is as follows:

1. preprocessing and analyzing daily load data of a user → 2, determining the number of clusters → 3, determining user hypothesis of a DCS model → 4, establishing the DCS model, performing parameter estimation of the model → 5, calculating to obtain relevant parameters of the DCS model, obtaining a final DCS model → 6, calculating an autocorrelation value of a sequence based on the DCS model → 7, and performing AI-FCM clustering based on the DCS model.

Example 1

The method for clustering the increment of the power load by combining the DCS statistical model and the autocorrelation increment fuzzy C-means clustering algorithm comprises the following specific steps:

Step one, preprocessing and analyzing the daily load data of users:

A daily load data set is acquired for the maximum load day in March 2020 of K = 11 electricity-using companies in China. Missing values in the data set are located and filled, and abnormal values are detected and corrected. The preprocessed data set is used for establishing the power load time series model and for cluster analysis. Daily load curves of the users are drawn for a simple analysis of their fluctuation, and the curves are preliminarily classified.

Step two, determining the number of clusters:

the range of the number of clusters is defined as

A. Establishing and analyzing a DCS model of the power load time series data:

Step three, determining the user assumption of the DCS model:

Users are randomly selected from each category of the preliminary classification, and a histogram and a Q-Q plot (quantile-quantile plot) are drawn for each of them; the basic assumption for the users is determined from these plots. A Q-Q plot is essentially a scatter plot: if the points of the daily load time series data lie near the red reference line, the observation vector approximately follows a Gaussian distribution, and the Gaussian distribution (2) is taken as the basic assumption for the user data y_t:

y_t ~ N(μ_t, σ_t²)  (2)

where μ_t is the time-varying mean and σ_t² is the time-varying variance;

A DCS(p, q) model is built with p = q = 1. With the user data y_t following the Gaussian assumption, a DCS(1, 1) model based on the Gaussian distribution is obtained, in which the update equation for the t-th realization of f_t is given by equation (3):

f_t = ω + A s_{t-1} + B f_{t-1}  (3)

where f_t is the time-varying parameter vector (the time-varying mean and variance), ω is a constant vector, and A and B are real matrices. The scalar parameters include ω_μ, among others.

In the DCS model, f_t is calculated from the driving vector s_{t-1} at time t-1 and the time-varying parameter vector f_{t-1}, where the driving vector is the scaled conditional score vector.

Step 5.1, calculating the conditional score vector, as shown in formulas (4) and (5).

This completes the establishment of the DCS model of the power load data.

B. AI-FCM cluster analysis of the power load time series data based on the DCS model:

In the DCS model, R-level clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. The DCS(1, 1) model implies R = 2, so the conditional moment estimates of the time series for r = 1 and r = 2 are the estimates of the time-varying mean and the time-varying variance of the DCS(1, 1) model, respectively. The conditional moment estimates of the time-varying mean and the time-varying variance form the data set from which the autocorrelation values are calculated; that is, the estimated autocorrelation value of the r-th conditional moment of the time series y_t at lag l is calculated as shown in formula (10), which uses the mean of the r-th conditional moment of the k-th time series from time t to time t-l. The autocorrelation-based distance between the r-th conditional moments of time series k and k' is then calculated from these autocorrelation values.

Step seven, performing AI-FCM clustering based on the DCS model:

The sequence data sets formed by the r-th estimated autocorrelation values and the similarity-measure distances obtained above are each used as input to the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results. The specific operations are as follows:

Step 7.1, the K data points are randomly divided into P data blocks (p = 1, 2, ..., P), each containing K/P (rounded up) data points. Each data point is given an initial weight of 1, the fuzzy factor is set to m = 2 and the iteration termination threshold to ε = 0.0001, and A-wFCM clustering is performed on the first data block p = 1.

Step 7.2, re-assigning the weights of the cluster centers obtained for the data block according to formula (16), and adding these cluster centers together with their weights to the next data block for A-wFCM clustering (steps 7.1.1 to 7.1.4).

Fig. 2 shows the maximum-load-day load curves of the 11 electricity-using companies. The companies are simply classified according to the fluctuation of their curves in the figure: companies with pronounced peaks are placed in one class, and companies with relatively smooth fluctuation in another.

Fig. 3 shows the histograms and Q-Q plots of the maximum-load-day data of 4 randomly selected companies. The plots show that the points of the maximum-load-day time series data lie near the red reference line, which indicates that the observation vector approximately follows a Gaussian distribution; the Gaussian distribution is therefore taken as the user assumption.

Figs. 4 and 5 show the clustering results obtained by applying the DCS-AI-FCM clustering method to the company data set. Under R-level clustering, companies placed in the same cluster according to both time-varying parameters share the same conditional distribution; that is, the time series are grouped by common conditional distribution. The cluster assignment of some companies is more uncertain when clustering by conditional variance: every company clustered by conditional mean belongs to its assigned cluster with a probability above 95 percent, whereas under clustering by conditional variance this probability is above 70 percent.

Fig. 6 compares the results of A-FCM clustering, DCS-A-FCM clustering and DCS-AI-FCM clustering on the maximum-load-day data of the 11 electricity-using companies. The clusterings are evaluated with the internal evaluation indexes SC, CHI and DBI, which are plotted as a histogram; the indexes in the histogram show that adding the DCS statistical model and the incremental clustering approach effectively improves clustering accuracy.
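As an illustration of an internal evaluation index, the sketch below assumes that DBI denotes the Davies-Bouldin index in its usual form: for each cluster, the worst ratio of summed within-cluster scatter to the distance between centroids, averaged over clusters, with lower values indicating better separation. The function names and the hard cluster assignment are assumptions; the source does not state how the indexes were computed.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of at least two non-empty clusters."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, cents[i]) for p in c) / len(c)
        for i, c in enumerate(clusters)
    ]
    worst = []
    for i in range(len(clusters)):
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i
        ))
    return sum(worst) / len(worst)


# Two compact groups far apart give a small index.
dbi = davies_bouldin([[[0.0], [2.0]], [[10.0], [12.0]]])
```

For fuzzy results such as those of AI-FCM, each data point would first be assigned to its highest-membership cluster before the index is evaluated.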

Claims

A power load incremental clustering method combining a DCS statistical model with an autocorrelation incremental fuzzy C-means clustering algorithm, characterized in that, according to the heteroscedasticity of user power load time series data, the dependency structure of the time series data is described and the computational cost of clustering high-dimensional time series data is reduced, and, taking into account the time series characteristics of different power load data streams, an autocorrelation incremental fuzzy C-means clustering algorithm based on the DCS statistical model is constructed with the following specific steps:

Step one, preprocessing and analyzing the daily load data of users:

A daily load data set is acquired for the maximum load day of K users in the same year and month; missing values in the data set are located and filled, and abnormal values are detected and corrected; the preprocessed data set is used for establishing the power load time series model and for cluster analysis; daily load curves of the users are drawn for a simple analysis of their fluctuation, and the curves are preliminarily classified;

Step two, determining the number of clusters:

The range of the number of clusters is defined (rounded up); for each number of clusters in this range, the clustering validity index V_xb is calculated according to formula (1), and the optimal number of clusters C for this data set is determined;

A. establishing and analyzing a DCS model of the power load time sequence data:

Step three, determining the user assumption of the DCS model:

Users are randomly selected from each category of the preliminary classification, and a histogram and a Q-Q plot (quantile-quantile plot) are drawn for each of them; the basic assumption for the users is determined from these plots. A Q-Q plot is essentially a scatter plot: if the points of the daily load time series data lie near the red reference line, the observation vector approximately follows a Gaussian distribution, and the Gaussian distribution (2) is taken as the basic assumption for the user data y_t:

y_t ~ N(μ_t, σ_t²)  (2)

where μ_t is the time-varying mean and σ_t² is the time-varying variance;

Step four, establishing the DCS model and estimating its parameters:

A DCS(p, q) model is built with p = q = 1. With the user data y_t following the Gaussian assumption, a DCS(1, 1) model based on the Gaussian distribution is obtained, in which the update equation for the t-th realization of f_t is given by equation (3):

f_t = ω + A s_{t-1} + B f_{t-1}  (3)

where f_t is the time-varying parameter vector (the time-varying mean and variance), ω is a constant vector, and A and B are real matrices. The scalar parameters, including ω_μ and a_μ, are functions of the static parameter vector θ; ω, A and B are estimated by the maximum likelihood method;

Step five, calculating the relevant quantities of the DCS model to obtain the final DCS model:

In the DCS model, f_t is calculated from the driving vector s_{t-1} at time t-1 and the time-varying parameter vector f_{t-1}, where the driving vector is the conditional score vector scaled by the scaling matrix S_t;

Step 5.1, calculating the conditional score vector, as shown in formulas (4) and (5);

Step 5.2, calculating the scaling matrix S_t, as shown in formulas (6) and (7);

Step 5.3, obtaining the final update equations of the time-varying parameters, as shown in formulas (8) and (9);

the parameter estimates and the corresponding quantities are substituted into the update equations of the time-varying parameters to obtain the conditional moment estimates of the time-varying mean and the time-varying variance, which serve as the input data set of the clustering algorithm;

this completes the establishment of the DCS model of the power load data.

B. AI-FCM cluster analysis of the power load time series data based on the DCS model:

sixthly, calculating the autocorrelation value of the sequence based on the DCS model:

in the DCS model, r-th order clustering is performed according to the r-th (r = 1, 2, ..., R) conditional moment estimate of each model. For the DCS(1, 1) model R = 2, so the time-series conditional moment estimates for r = 1 and r = 2 are, respectively, the estimates of the DCS(1, 1) parameters $\mu_t$ and $\sigma_t^2$. The r-th conditional moment estimate is used to obtain the r-th estimated autocorrelation value: the conditional moment estimates of the time-varying mean and the time-varying variance are taken as the data sets for calculating the autocorrelation values, that is, the estimated autocorrelation value of the r-th conditional moment of the time series $y_t$ at lag l is calculated as shown in formula (10):

wherein the mean used in formula (10) is the mean of the r-th order conditional moments of the k-th time series from time t to time t-l, and the autocorrelation-based distance between the r-th order conditional moments of time series k and k' is:
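
By way of non-limiting illustration, the autocorrelation values and the autocorrelation-based distance of the sixth step may be sketched in Python as follows. Formulas (10) and (11) are not reproduced in this text, so the sketch assumes the usual sample autocorrelation and a Euclidean distance between autocorrelation vectors; the function names and the three example series are hypothetical.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        # Constant series: autocorrelation is undefined, report zeros.
        return [0.0] * max_lag
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(1, max_lag + 1)
    ]


def acf_distance(series_k, series_k2, max_lag):
    """Euclidean distance between the autocorrelation vectors of two series."""
    rho_k = autocorrelations(series_k, max_lag)
    rho_k2 = autocorrelations(series_k2, max_lag)
    return sum((p - q) ** 2 for p, q in zip(rho_k, rho_k2)) ** 0.5


# Hypothetical conditional-mean paths of three users.
user_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
user_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]  # user_a scaled by 2
user_c = [1.0, 8.0, 1.0, 8.0, 1.0, 8.0, 1.0, 8.0]      # alternating pattern

d_ab = acf_distance(user_a, user_b, max_lag=3)
d_ac = acf_distance(user_a, user_c, max_lag=3)
```

Because autocorrelation is invariant to scale, `user_a` and `user_b` are at essentially zero distance even though their magnitudes differ, whereas the alternating `user_c` is far from both; this is the property that makes an autocorrelation-based distance suitable for grouping series by temporal pattern.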

seventhly, carrying out AI-FCM clustering based on the DCS model:

the sequence data sets formed by the r-th estimated autocorrelation values and by the similarity-measure distances are respectively used as input to the autocorrelation incremental fuzzy C-means clustering algorithm to obtain the clustering results, specifically comprising the following operations:

step 7.1, randomly dividing the K data points into P data blocks (p = 1, 2, ..., P), each data block having K/P (rounded up) data points, giving each data point an initial weight of 1, setting the fuzzy factor m = 2 and the iteration termination threshold ε = 0.0001, and performing A-wFCM clustering on the first data block (p = 1), which specifically comprises the following operations:

step 7.1.1, randomly initializing the membership matrix according to the previously determined number of clusters C so that it satisfies the following constraint conditions:

step 7.1.2, calculating the cluster centers from the obtained membership matrix, wherein the calculation formula is as follows:

step 7.1.3, calculating the value of the objective function J from the membership matrix and the cluster centers, as shown in formula (14); if the difference between the values of J in two successive iterations is less than the specified threshold ε, ending the iteration and outputting the membership matrix and the cluster centers; otherwise proceeding to the next step;

step 7.1.4, recalculating the membership matrix from the cluster centers and returning to step 7.1.2, wherein the calculation formula of the membership matrix is as follows:

step 7.2, reassigning weights to the calculated cluster centers of the data block according to formula (16), and adding these cluster centers together with their weights to the next data block for A-wFCM clustering (steps 7.1.1 to 7.1.4),

wherein, N is the number of data points in the current data block, and j is the number of clustering centers in the previous data block;

step 7.3, repeating step 7.2 until all data blocks have been clustered, and finally recalculating the memberships of all data points from the final cluster centers according to formula (15);

thereby completing the autocorrelation incremental fuzzy C-means clustering of the power load data based on the DCS model.
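
By way of non-limiting illustration, the block-wise weighted fuzzy C-means procedure of steps 7.1 to 7.3 may be sketched in Python as follows. Formulas (12) to (16) are not reproduced in this text, so the sketch assumes the standard weighted FCM center, objective and membership formulas, and assumes that each carried-over cluster center receives as its weight the sum of the weighted memberships assigned to it; all function names and the example data are hypothetical.

```python
import random


def sq_dist(x, v):
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def memberships(points, centers, m=2.0):
    """Standard FCM membership update; each returned row sums to 1."""
    U = []
    for x in points:
        d = [max(sq_dist(x, v), 1e-12) for v in centers]  # floored squared distances
        U.append([1.0 / sum((dc / dj) ** (1.0 / (m - 1.0)) for dj in d) for dc in d])
    return U


def weighted_fcm(points, weights, C, m=2.0, eps=1e-4, max_iter=200, rng=None):
    rng = rng or random.Random(0)
    dim = len(points[0])
    # Step 7.1.1: random memberships, each row normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(C)]
        total = sum(row)
        U.append([r / total for r in row])
    prev_J = None
    for _ in range(max_iter):
        # Step 7.1.2: weighted cluster centers.
        centers = []
        for c in range(C):
            coef = [w * U[k][c] ** m for k, w in enumerate(weights)]
            total = sum(coef)
            centers.append(tuple(
                sum(cf * x[i] for cf, x in zip(coef, points)) / total
                for i in range(dim)
            ))
        # Step 7.1.3: weighted objective and stopping rule.
        J = sum(w * U[k][c] ** m * sq_dist(points[k], centers[c])
                for k, w in enumerate(weights) for c in range(C))
        if prev_J is not None and abs(prev_J - J) < eps:
            break
        prev_J = J
        # Step 7.1.4: membership update.
        U = memberships(points, centers, m)
    return centers, U


def incremental_fcm(points, C, P, m=2.0, eps=1e-4, seed=0):
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    size = -(-len(shuffled) // P)  # K / P rounded up
    centers, center_w = [], []
    for p in range(P):
        block = shuffled[p * size:(p + 1) * size]
        if not block:
            continue
        # Step 7.2: previous centers join the block with their weights.
        data = centers + block
        w = center_w + [1.0] * len(block)
        centers, U = weighted_fcm(data, w, C, m, eps, rng=rng)
        # Assumed reweighting: sum of weighted memberships per center.
        center_w = [sum(U[k][c] * w[k] for k in range(len(data))) for c in range(C)]
    # Step 7.3: final memberships of all points.
    return centers, memberships(points, centers, m)


# Hypothetical two-group data set.
points = [(0.1 * i, 0.0) for i in range(-5, 5)] + [(10.0 + 0.1 * i, 1.0) for i in range(-5, 5)]
final_centers, final_U = incremental_fcm(points, C=2, P=2)
```

Carrying only the weighted centers from one block to the next means that each pass works on at most one block plus C extra points, which is what keeps memory use and run time bounded as new load data arrive.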
2. The power load incremental clustering method combining a DCS statistical model and an autocorrelation incremental fuzzy C-means clustering algorithm according to claim 1, characterized in that: in the third step, the range of values of t in the user data time series $y_t$ depends on the resolution of the measured power load data, namely the number of load measurement points on the maximum load day; based on the time-varying mean and time-varying variance at each time point t and the original time series $y_t$, the probability distribution assumption of the DCS model for all users is determined by drawing histograms and Q-Q plots of typical users in the preliminary classification; and the Gaussian distribution is used as the probability distribution assumption for subsequent calculation because it can describe random variables in a wide variety of situations and has high applicability and generality.
3. The power load incremental clustering method combining a DCS statistical model and an autocorrelation incremental fuzzy C-means clustering algorithm according to claim 1, characterized in that: in the fourth step, setting p = q = 1 in the DCS(p, q) model allows most time-series data to be described at low model complexity; and the elements of the time-varying parameter vector $f_t$ are determined according to the parameters of the user assumption, which in turn determines the vector expressions of the parameters to be estimated subsequently.