CN117407737A

CN117407737A - Cloud load library establishment method based on clustering

Info

Publication number: CN117407737A
Application number: CN202310599457.8A
Authority: CN
Inventors: 陆洋; 高久国; 徐峰; 钱卫杰; 刘承宗; 杨超; 赵健; 陈子靖; 徐斌; 鲍雨
Original assignee: State Grid Zhejiang Electric Power Co Ltd Anji County Power Supply Co; Shanghai University of Electric Power; Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd Anji County Power Supply Co; Shanghai University of Electric Power; Huzhou Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2023-05-25
Filing date: 2023-05-25
Publication date: 2024-01-16

Abstract

The invention discloses a clustering-based method for establishing a cloud load library, comprising the following steps: acquiring distribution transformer load data as clustering samples; preprocessing the distribution transformer load data; establishing, for each distribution transformer, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes; performing dimension reduction on the load data to obtain low-dimensional electricity-consumption feature vectors; clustering the distribution transformer feature vectors; and analyzing the composition of each cluster of distribution transformers and establishing a typical distribution transformer cloud load library. In this technical scheme, a large amount of distribution transformer load data containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users is selected as samples; clustering is performed with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set; finally, component analysis is performed on each cluster center by combining the clustering results with the distribution transformer information, and each cluster center is taken as a typical distribution transformer type to form the typical distribution transformer cloud load library.

Description

Cloud load library establishment method based on clustering

Technical Field

The invention relates to the technical field of power systems, in particular to a cloud load library establishment method based on clustering.

Background

In a power distribution network, a distribution transformer usually refers to a power transformer with a voltage class of 10-35 kV that supplies end users directly, so it plays an important role in power supply. Because the composition of power consumers, the load characteristics and the power demand differ from one distribution transformer to another, an accurate and efficient method for analyzing and managing them is lacking. The distribution transformer cloud load library is a database, extracted from massive distribution transformer load data, of load curves and their composition that represent typical electricity-consumption characteristics. Establishing it is the basis for load forecasting, demand-side management and other research on distribution transformers.

Data show that, with the spread of intelligent measurement equipment, the dimensionality of distribution transformer load data is steadily increasing. Traditional clustering methods perform poorly on high-dimensional data, which makes it difficult to establish an accurate distribution transformer cloud load library. A clustering-based cloud load library establishment method is therefore proposed.

Chinese patent document CN103533011A discloses a cloud-based intelligent terminal data configuration method and system. The system architecture mainly consists of AP equipment and a cloud radio-frequency optimization module. The AP equipment comprises a wireless scanning module, a data collection module, a data transmitting module, a data receiving module and a radio-frequency configuration activation module; the cloud radio-frequency optimization module comprises a cloud data receiving module, a cloud data caching module, a cloud data loading module, a cloud data calculating module, a cloud data issuing module, a cloud timing module and a log storage module. That technical scheme does not address the technical problems that distribution transformers have diverse load characteristics and that a distribution transformer load library is difficult to establish.

Disclosure of Invention

The invention mainly solves the technical problems that existing technical schemes cannot cope with the diverse load characteristics of distribution transformers and that a distribution transformer load library is difficult to establish. It provides a clustering-based cloud load library establishment method that selects massive distribution transformer load data containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users as samples, and establishes the distribution transformer load library with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set. After the raw data are preprocessed, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes are established for each distribution transformer, and a deep convolutional autoencoder reduces the dimension of the load data to obtain low-dimensional electricity-consumption feature vectors. Clustering is then performed with the dual-scale distance metric clustering algorithm based on the distribution transformer feature set. Finally, component analysis is performed on each cluster center by combining the clustering results with the distribution transformer information, and each cluster center is taken as a typical distribution transformer type to form the typical distribution transformer cloud load library.

The technical problems are mainly solved by the following technical scheme. The invention comprises the following steps:

S1, acquiring distribution transformer load data as clustering samples;

S2, preprocessing the distribution transformer load data;

S3, establishing, for each distribution transformer, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes;

S4, performing dimension reduction on the load data to obtain low-dimensional electricity-consumption feature vectors;

S5, clustering the distribution transformer feature vectors;

S6, analyzing the composition of each cluster of distribution transformers and establishing a typical distribution transformer cloud load library.

A large amount of distribution transformer load data containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users is selected as samples, and the distribution transformer load library is established with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set. After the raw data are preprocessed, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes are established for each distribution transformer, and a deep convolutional autoencoder reduces the dimension of the load data to obtain low-dimensional electricity-consumption feature vectors. Clustering is then performed with the dual-scale distance metric clustering algorithm. Finally, component analysis is performed on each cluster center by combining the clustering results with the distribution transformer information, and each cluster center is taken as a typical distribution transformer type to form the typical distribution transformer cloud load library.

Preferably, the distribution transformer load data comprise load data of distribution transformers containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users.

Preferably, the preprocessing in step S2 specifically includes cleaning the data, completing missing-value filling and abnormal-data checking and correction, and forming the distribution transformer load matrix P from the preprocessed data.

Preferably, step S3 specifically includes classifying the distribution transformers by type, formulating photovoltaic-storage-charging characteristic indexes for distribution transformers of the photovoltaic-storage-charging type, and formulating industrial, agricultural and commercial characteristic indexes for distribution transformers of the industrial, agricultural and commercial type.

Preferably, step S4 specifically includes establishing a convolutional autoencoder, performing data reconstruction with the distribution transformer load data as input to obtain, for each distribution transformer, the reduced-dimension vector h in the hidden layer as the extracted feature, and combining h with the distribution transformer characteristic indexes to obtain the distribution transformer feature set U.

Preferably, clustering the distribution transformer feature vectors in step S5 specifically comprises clustering the photovoltaic-storage-charging characteristic indexes, the industrial, agricultural and commercial characteristic indexes and the low-dimensional electricity-consumption feature vectors of the distribution transformers with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set, as follows:

S3.1, calculating the density ρ of each point in the distribution transformer feature set U, taking the point with the maximum density value as the first cluster center $C_1$, and removing the data near $C_1$;

S3.2, repeating step S3.1 on the remaining data of the distribution transformer feature set U to obtain further cluster centers until no data remain, and taking the resulting set of cluster centers C as the initial cluster centers of the K-means algorithm;

S3.3, introducing a dual-scale distance metric to calculate the distance between samples in the K-means algorithm;

S3.4, clustering the data set U based on the dual-scale distance metric to obtain the clustering result.
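Steps S3.2 to S3.4 amount to running K-means with externally supplied initial centers and a replaceable distance function. The following is a minimal sketch under stated assumptions, not the patented implementation: the function names `kmeans_custom` and `euclidean` are hypothetical, the centers are updated as arithmetic means of their members (the text does not say how centers are updated under the dual-scale metric), and plain Euclidean distance stands in for the dual-scale distance in the usage line.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance, used here only as a stand-in metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_custom(points, init_centers, dist, max_iter=100):
    """K-means with caller-supplied initial centers and distance function.

    Assumption: centers are updated as the arithmetic mean of their members.
    """
    centers = [list(c) for c in init_centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center under `dist`.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans_custom(points, [[0.0, 0.0], [10.0, 10.0]], euclidean)
```

Because `dist` is a parameter, any function of two feature vectors can be passed in, including a weighted combination of two metrics.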

Preferably, step S3.1 specifically includes: letting the number of samples in the distribution transformer feature set U be n and the i-th sample be $u_i$, where $u_i$ is an m-dimensional vector, i.e. $u_i = [u_{i,1}, \ldots, u_{i,m}]$, and calculating the density $\rho_i$ of each point in U:

where $d_{wd}(u_i, u_j)$ is the weighted Euclidean distance between sample points $u_i$ and $u_j$; $w_k$ is the weight of the k-th feature; and MeanDis(U) is the average weighted distance over all sample elements in the data set U, expressed as:

The point with the largest density value in U is taken as the first cluster center $C_1$, the set of cluster centers becomes $C = \{C_1\}$, and the points in U whose distance to $C_1$ is less than MeanDis(U) are removed.

Preferably, step S3.2 specifically includes calculating the density ρ(i) of the remaining data in the sample set U and selecting the sample point with the largest ρ(i) as the second cluster center $C_2$, so that the set of cluster centers becomes $C = \{C_1, C_2\}$; removing the points in U whose distance to $C_2$ is less than MeanDis(U); and repeating these steps until no data remain in the data set U, the resulting set of cluster centers C being taken as the initial cluster centers of the K-means algorithm.

Preferably, step S3.3 specifically includes introducing a dual-scale distance metric to calculate the distance between samples in the K-means algorithm, the distance based on the dual-scale distance metric being calculated as follows:

$d_{tsd}(u_i, u_j) = \alpha\, d_{wd}(u_i, u_j) + \beta\, d_{fd}(u_i, u_j)$

where $u_i$ and $u_j$ are the samples whose distance is to be calculated; $d_{wd}(u_i, u_j)$ is the weighted Euclidean distance between $u_i$ and $u_j$; α and β are the weight coefficients of the two distance measures; and $d_{fd}(u_i, u_j)$ is the Fréchet distance between $u_i$ and $u_j$, calculated as follows:

where γ and η are re-parameterization functions of the unit interval, over which the infimum in the Fréchet distance is taken; d(·) is a metric function; and k denotes the k-th feature of the sample.

Preferably, step S6 specifically includes performing component analysis on each cluster center by combining the clustering results with the acquired distribution transformer information, analyzing the distributed photovoltaic-storage-charging composition and the industrial, agricultural and commercial composition of each cluster center, and taking each cluster center as a typical distribution transformer type to form the typical distribution transformer cloud load library.

The beneficial effects of the invention are as follows. A large amount of distribution transformer load data containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users is selected as samples, and the distribution transformer load library is established with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set. After the raw data are preprocessed, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes are established for each distribution transformer, and a deep convolutional autoencoder reduces the dimension of the load data to obtain low-dimensional electricity-consumption feature vectors. Clustering is then performed with the dual-scale distance metric clustering algorithm. Finally, component analysis is performed on each cluster center by combining the clustering results with the distribution transformer information, and each cluster center is taken as a typical distribution transformer type to form the typical distribution transformer cloud load library.

Drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a diagram of the results of a simulation verification of the present invention.

Detailed Description

The technical scheme of the invention is further specifically described below through examples and with reference to the accompanying drawings.

Example: as shown in Fig. 1, the clustering-based cloud load library establishment method of this embodiment includes the following steps:

Step 1: acquire distribution transformer load data containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users as clustering samples, and clean the data to complete missing-value filling and abnormal-data checking and correction;

Step 2: establish photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes for each distribution transformer, and use a deep convolutional autoencoder to reduce the dimension of the load data, obtaining low-dimensional electricity-consumption feature vectors;

Step 3: cluster the photovoltaic-storage-charging characteristic indexes, the industrial, agricultural and commercial characteristic indexes and the low-dimensional electricity-consumption feature vectors of the distribution transformers with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set, analyze the composition of each cluster of distribution transformers, and thereby accurately establish a typical distribution transformer cloud load library.

Step 1 requires cleaning the acquired distribution transformer load data to complete missing-value filling and abnormal-data checking and correction. The specific steps are as follows:

(1.1) Acquire load data of distribution transformers containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users.

Based on intelligent measuring devices, 96-point daily load data of n distribution transformers over m days are acquired to form an l × 96 raw load data matrix P, where l = n × m. The set of distribution transformers is $N = \{n_1, n_2, n_3, \ldots, n_n\}$ and the dates are represented by the set $M = \{1, 2, 3, \ldots, m\}$. The load matrices $P_{n_i}$ of the individual distribution transformers are stacked into the total distribution transformer load matrix P, as shown below:

$P = [P_{n_1}, P_{n_2}, \ldots, P_{n_i}]^{T}_{l \times 96}$
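The stacking of per-transformer daily curves into the l × 96 matrix P can be sketched as follows. This is an illustrative sketch only: the function name `build_load_matrix`, the dictionary input layout and the transformer identifiers are assumptions and do not come from the source.

```python
POINTS_PER_DAY = 96  # 15-minute sampling, as stated in the text


def build_load_matrix(daily_curves):
    """Stack daily load curves into an l x 96 matrix (list of rows).

    daily_curves: hypothetical layout, a dict mapping a transformer id to
    a list of m daily curves, each a sequence of 96 load values.
    Returns (P, index), where index[r] = (transformer id, day number)
    identifies row r of P.
    """
    P, index = [], []
    for tid in sorted(daily_curves):
        for day, curve in enumerate(daily_curves[tid], start=1):
            if len(curve) != POINTS_PER_DAY:
                raise ValueError(f"{tid} day {day}: expected 96 points")
            P.append(list(curve))
            index.append((tid, day))
    return P, index


# Two transformers, three days each: l = n x m = 6 rows.
data = {"n1": [[1.0] * 96 for _ in range(3)],
        "n2": [[2.0] * 96 for _ in range(3)]}
P, index = build_load_matrix(data)
```

Keeping the row index alongside P makes it possible to trace every clustered curve back to its transformer and date during the later component analysis.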

(1.2) Data cleaning: complete missing-value filling and abnormal-data checking and correction.

First, load data with severe data loss are eliminated; load data with less severe data loss are filled by Lagrange interpolation, with the following filling formula:

where the result is the corrected value of the abnormal sample point $x_{k,t}$; $x_{k,t-a}$ and $x_{k,t+b}$ are the sample points taken forward and backward, respectively; and a and b are the numbers of sample points taken forward and backward, typically 4-6.
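Since the filling formula itself is not reproduced in this text, the following is only one plausible reading of it: a Lagrange polynomial fitted through up to a valid points before and b valid points after the missing position, then evaluated at that position. The function name `lagrange_fill`, the use of `None` to mark missing values, and the small default window of 2 are assumptions made for illustration (the text mentions 4-6 points, which gives a higher-degree polynomial that can oscillate on noisy data).

```python
def lagrange_fill(series, t, a=2, b=2):
    """Estimate the missing value series[t] by Lagrange interpolation.

    Uses up to `a` valid samples before t and `b` valid samples after t.
    Missing samples are assumed to be marked with None.
    """
    before = [i for i in range(t - 1, -1, -1) if series[i] is not None][:a]
    after = [i for i in range(t + 1, len(series)) if series[i] is not None][:b]
    nodes = sorted(before + after)
    if not nodes:
        raise ValueError("no valid neighbouring samples")
    value = 0.0
    for i in nodes:
        # Lagrange basis polynomial of node i, evaluated at position t.
        basis = 1.0
        for j in nodes:
            if j != i:
                basis *= (t - j) / (i - j)
        value += series[i] * basis
    return value


curve = [1.0, 2.0, None, 4.0, 5.0]
curve[2] = lagrange_fill(curve, 2)  # close to 3.0 for this linear example
```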

(1.3) Data standardization: eliminate the influence of load dimension.

After the missing and abnormal data have been processed, the data are standardized using Z-score standardization (StandardScaler) to eliminate the influence of load dimension on subsequent neural network training and deep clustering. The standardization formula is:

$x' = \dfrac{x - \mu}{\sigma}$

where x is the cleaned load data; x' is the standardized load data; and μ and σ are the mean and standard deviation of the sample data, respectively.
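A minimal sketch of Z-score standardization using only the standard library is given below. The function name `z_score` is hypothetical, and two choices are assumptions rather than statements of the source: the population standard deviation is used, and the statistics are computed over whatever sequence is passed in (the text does not say whether μ and σ are taken per curve, per time point or over the whole data set).

```python
import statistics


def z_score(values):
    """Return (x - mu) / sigma for each x, using the population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant sequence carries no shape information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


standardized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

After this step the sequence has zero mean and unit standard deviation, so curves from large and small transformers become comparable in shape.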

Step 2 requires establishing the photovoltaic-storage-charging characteristic indexes and the industrial, agricultural and commercial characteristic indexes of each distribution transformer and using a deep convolutional autoencoder to reduce the dimension of the load data, obtaining low-dimensional electricity-consumption feature vectors. The specific steps are as follows:

(2.1) Establish distribution transformer characteristic indexes according to the distribution transformer type.

First, the distribution transformers in a region are classified by type. For distribution transformers of the photovoltaic-storage-charging type, the photovoltaic-storage-charging power curves are extracted and the following characteristic indexes are formulated: maximum photovoltaic generation power $P_{pmax}$, photovoltaic generation peak time $T_{peak}$, number of energy-storage charge-discharge cycles $T_{sto}$, energy storage capacity $E_{cap}$ and number of charging piles $N_{cha}$. For distribution transformers of the industrial, agricultural and commercial type, the following characteristic indexes are formulated: industrial user proportion $L_{ind}$, agricultural user proportion $L_{agr}$, commercial user proportion $L_{bus}$, peak-period load rate $L_{peak}$, valley-period load rate $L_{val}$, time of maximum load $T_{max}$ and time of minimum load $T_{min}$. The peak-period load rate $L_{peak}$ and the valley-period load rate $L_{val}$ are calculated as follows:

where $P_{av}$, $P_{av,peak}$ and $P_{av,val}$ denote the daily average load power, the peak-period average load and the valley-period average load, respectively. The peak periods are 8:00-11:00 and 18:00-21:00, and the valley periods are 22:00-24:00 and 0:00-6:00.
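The load-rate indexes can be computed from a single 96-point daily curve roughly as follows. The formulas are not reproduced in this text, so the sketch assumes the natural ratio form, peak-period average load divided by daily average load and likewise for the valley period; it also assumes that point 0 of the curve covers 00:00-00:15. The names `period_mean` and `load_rate_indexes` are hypothetical, and the times of maximum and minimum load are returned as point indices.

```python
PEAK_PERIODS = [(8, 11), (18, 21)]    # hours, from the text
VALLEY_PERIODS = [(22, 24), (0, 6)]   # hours, from the text


def period_mean(curve, periods, points_per_hour=4):
    """Average load over the given hour ranges of a 96-point curve."""
    idx = [i for start, end in periods
           for i in range(start * points_per_hour, end * points_per_hour)]
    return sum(curve[i] for i in idx) / len(idx)


def load_rate_indexes(curve):
    """Assumed definitions: L_peak = P_av,peak / P_av, L_val = P_av,val / P_av."""
    if len(curve) != 96:
        raise ValueError("expected a 96-point daily curve")
    p_av = sum(curve) / len(curve)
    return {
        "L_peak": period_mean(curve, PEAK_PERIODS) / p_av,
        "L_val": period_mean(curve, VALLEY_PERIODS) / p_av,
        "T_max": curve.index(max(curve)),  # index of the first maximum
        "T_min": curve.index(min(curve)),  # index of the first minimum
    }


flat = [1.0] * 96
rates = load_rate_indexes(flat)  # a flat curve gives L_peak = L_val = 1.0
```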

(2.2) Establish the deep convolutional autoencoder.

Because the distribution transformer load data are high-dimensional and large in volume, a convolutional autoencoder is used for dimension reduction and feature learning. The convolutional autoencoder comprises an encoding process and a decoding process. Let $x = [x_1, x_2, \ldots, x_i]$ be the standardized distribution transformer load data; the encoding process is given by:

$h = \sigma(x * \omega + b)$

where $x \in R^{i \times j}$ denotes a time series, i is the length of the time series, * denotes the convolution operation, ω and b are the weights and biases of the convolutional autoencoder network in the encoding stage, h is the convolved data, and σ is the activation function, for which ReLU is used.

The decoding process is given by:

$x' = \sigma'(h * \omega' + b')$

where ω' and b' are the weights and biases of the convolutional autoencoder network in the decoding stage, x' is the reconstruction of the input x, and σ' is the activation function of the decoder, for which sigmoid is used.

The goal of autoencoder training is to minimize the reconstruction error. The mean square error is taken as the loss function $L_r$, defined as follows:

$L_r$ is minimized by gradient descent to obtain the optimal network parameters, which completes the construction of the deep convolutional autoencoder. The reduced-dimension vector h in the hidden layer is obtained for each distribution transformer as the extracted feature and combined with the distribution transformer characteristic indexes to obtain the distribution transformer feature set U, which serves as the input of the clustering algorithm in step 3.
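To make the encoding, decoding and loss steps concrete, the following sketch shows a single forward pass of a one-layer, one-channel convolutional autoencoder in plain Python. It is a simplified illustration under stated assumptions, not the network of the embodiment: the layer count, kernel size 4, stride 4, the use of a transposed convolution in the decoder and the fixed weights are all assumptions, and gradient-descent training is omitted. In practice a deep learning framework would normally be used.

```python
import math


def conv1d(x, w, b, stride):
    """Valid 1-D convolution (cross-correlation form) with a stride."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b
            for i in range(0, len(x) - k + 1, stride)]


def conv_transpose1d(h, w, b, stride):
    """Transposed 1-D convolution; output length is (len(h)-1)*stride + len(w)."""
    k = len(w)
    out = [b] * ((len(h) - 1) * stride + k)
    for i, hv in enumerate(h):
        for j in range(k):
            out[i * stride + j] += hv * w[j]
    return out


def encode(x, w, b, stride=4):
    """h = ReLU(x * w + b): the low-dimensional hidden vector."""
    return [max(0.0, v) for v in conv1d(x, w, b, stride)]


def decode(h, w2, b2, stride=4):
    """x' = sigmoid(h * w' + b'): the reconstruction of the input."""
    return [1.0 / (1.0 + math.exp(-v))
            for v in conv_transpose1d(h, w2, b2, stride)]


def mse(x, x_rec):
    """Mean square reconstruction error (assumed form of the loss L_r)."""
    return sum((a - c) ** 2 for a, c in zip(x, x_rec)) / len(x)


x = [(i % 8) / 8 for i in range(96)]       # toy 96-point curve in [0, 1)
h = encode(x, [0.25] * 4, 0.0)             # 96 points reduced to 24
x_rec = decode(h, [0.5] * 4, 0.0)          # back to 96 points
loss = mse(x, x_rec)
```

With kernel size and stride both equal to 4, each hidden value summarizes one hour of the 15-minute curve, which is why 96 inputs map to 24 hidden values here.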

Step 3 requires clustering the distribution transformer feature vectors with the dual-scale distance metric clustering algorithm based on the distribution transformer feature set, analyzing the composition of each cluster of distribution transformers, and thereby accurately establishing a typical distribution transformer load library. The specific steps are as follows:

(3.1) Calculate the density ρ of each point in the input data set.

Let the number of samples in the distribution transformer feature set U be n and the i-th sample be $u_i$, where $u_i$ is an m-dimensional vector, i.e. $u_i = [u_{i,1}, \ldots, u_{i,m}]$. The density $\rho_i$ of each point in U is calculated:

The point with the largest density value in U is taken as the first cluster center $C_1$, the set of cluster centers becomes $C = \{C_1\}$, and the points in U whose distance to $C_1$ is less than MeanDis(U) are removed.

(3.2) Update the cluster centers and repeat step (3.1).

The density ρ(i) of the remaining data in the sample set U is calculated, and the sample point with the largest ρ(i) is selected as the second cluster center $C_2$, so that the set of cluster centers becomes $C = \{C_1, C_2\}$; the points in U whose distance to $C_2$ is less than MeanDis(U) are removed. Step (3.1) is repeated until no data remain in the data set U, and the resulting set of cluster centers C is taken as the initial cluster centers of the K-means algorithm.
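Steps (3.1) and (3.2) can be sketched as below. Because the density and MeanDis formulas are not reproduced in this text, the sketch relies on assumed definitions that should be checked against the original formulas: the weighted Euclidean distance is taken as the square root of the weighted sum of squared differences, MeanDis(U) as the mean of all pairwise distances in the full set, and the density of a point as the number of remaining points closer to it than MeanDis(U). All function names are hypothetical.

```python
import math
from itertools import combinations


def weighted_euclidean(a, b, w):
    """Assumed form: sqrt(sum_k w_k * (a_k - b_k)^2)."""
    return math.sqrt(sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)))


def mean_dis(U, w):
    """Assumed form: average distance over all unordered pairs in U."""
    if len(U) < 2:
        raise ValueError("need at least two samples")
    pairs = list(combinations(U, 2))
    return sum(weighted_euclidean(a, b, w) for a, b in pairs) / len(pairs)


def density_initial_centers(U, w):
    """Pick the densest remaining point, remove its neighbourhood, repeat."""
    radius = mean_dis(U, w)
    remaining = list(range(len(U)))
    centers = []
    while remaining:
        def density(i):
            # Assumed density: count of remaining points within `radius`.
            return sum(1 for j in remaining
                       if j != i and weighted_euclidean(U[i], U[j], w) < radius)
        c = max(remaining, key=density)
        centers.append(U[c])
        # Remove the center itself and every point closer than `radius`.
        remaining = [j for j in remaining
                     if j != c and weighted_euclidean(U[c], U[j], w) >= radius]
    return centers


U = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers = density_initial_centers(U, [1.0, 1.0])
```

A side effect of this procedure is that the number of clusters is determined by the data rather than chosen in advance, since the loop stops only when every point has been absorbed by some center's neighbourhood.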

(3.3) Introduce the dual-scale distance metric.

A dual-scale distance metric is introduced to calculate the distance between samples in the K-means algorithm. The distance based on the dual-scale distance metric is calculated as follows:

$d_{tsd}(u_i, u_j) = \alpha\, d_{wd}(u_i, u_j) + \beta\, d_{fd}(u_i, u_j)$

where $d_{wd}$ is the weighted Euclidean distance and $d_{fd}$ is the Fréchet distance; γ and η are re-parameterization functions of the unit interval, over which the infimum in the Fréchet distance is taken; d(·) is a metric function; and k denotes the k-th feature of the sample.
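The weighted combination of the two distances can be sketched as follows. The continuous Fréchet distance is replaced here by the discrete Fréchet distance computed with the usual dynamic-programming recurrence, which is a swapped-in approximation and not necessarily what the embodiment uses. Further assumptions: each feature vector is treated as a sequence of scalar values ordered by feature index k, the point metric d is the absolute difference, and the values of α and β in the usage line are arbitrary. Function names are hypothetical.

```python
import math


def weighted_euclidean(a, b, w):
    """Assumed form: sqrt(sum_k w_k * (a_k - b_k)^2)."""
    return math.sqrt(sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)))


def discrete_frechet(p, q):
    """Discrete Frechet distance between two scalar sequences."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best of advancing on p, on q, or on both, then the
                # current pair's distance bounds the coupling from below.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1],
                                   ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def dual_scale_distance(u, v, w, alpha, beta):
    """d_tsd = alpha * d_wd + beta * d_fd."""
    return alpha * weighted_euclidean(u, v, w) + beta * discrete_frechet(u, v)


d = dual_scale_distance([0, 0, 0], [1, 1, 1], [1, 1, 1], alpha=0.5, beta=0.5)
```

The Euclidean term measures point-by-point magnitude differences, while the Fréchet term compares the overall shape of the two sequences, so the two weights trade off magnitude similarity against shape similarity.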

(3.4) Component analysis is performed on each cluster center by combining the clustering results with the acquired distribution transformer information; the distributed photovoltaic-storage-charging composition and the industrial, agricultural and commercial composition of each cluster center are analyzed, and each cluster center is taken as a typical distribution transformer type to form the typical distribution transformer cloud load library.
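One simple way to carry out such a component analysis is to average the user-composition indexes of the distribution transformers assigned to each cluster and report the dominant user type. The sketch below is an assumption about how this could be done, not a description of the embodiment; the record layout, the function names and the example values are hypothetical, while the field names reuse the index symbols $L_{ind}$, $L_{agr}$ and $L_{bus}$ from step (2.1).

```python
from collections import defaultdict

FIELDS = ("L_ind", "L_agr", "L_bus")  # industrial, agricultural, commercial


def cluster_composition(labels, info):
    """Average the composition indexes of the members of each cluster."""
    sums = defaultdict(lambda: dict.fromkeys(FIELDS, 0.0))
    counts = defaultdict(int)
    for lab, rec in zip(labels, info):
        counts[lab] += 1
        for f in FIELDS:
            sums[lab][f] += rec[f]
    return {lab: {f: sums[lab][f] / counts[lab] for f in FIELDS}
            for lab in counts}


def dominant_type(composition):
    """Name of the index with the largest average share in a cluster."""
    return max(composition, key=composition.get)


labels = [0, 0, 1]
info = [
    {"L_ind": 0.8, "L_agr": 0.1, "L_bus": 0.1},
    {"L_ind": 0.6, "L_agr": 0.2, "L_bus": 0.2},
    {"L_ind": 0.0, "L_agr": 0.9, "L_bus": 0.1},
]
library = {lab: (comp, dominant_type(comp))
           for lab, comp in cluster_composition(labels, info).items()}
```

The same averaging could be extended to the photovoltaic-storage-charging indexes by adding their field names to `FIELDS`.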

To verify that the method gives more accurate clustering results than traditional methods, 3500 typical daily load curves of distribution transformers were selected to test the technical effect of the method; the experimental results are shown in Fig. 2. In this embodiment, the method is compared experimentally with several other selected methods in order to verify its actual effect. The clustering indexes DBI (Davies-Bouldin index), CHI (Calinski-Harabasz index) and SC (silhouette coefficient) are selected for quantitative analysis.

The higher the SC value, the better the clustering result; the smaller the DBI value, the better the clustering effect; the larger the CHI value, the better the clustering effect. The method is compared with three conventional clustering methods: K-means, principal component analysis (PCA) + K-means, and improved deep embedded clustering with local structure preservation (IDEC). The comparative results are shown in the following table:

Table 1. Comparison of clustering results.
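For reference, the Davies-Bouldin index used in the comparison can be computed as in the following sketch, which follows the standard definition with Euclidean distances: the per-cluster scatter is the mean distance of members to their centroid, and the index is the average over clusters of the worst ratio of summed scatters to centroid separation. The function name is hypothetical, the toy data are not from the experiment, and clusters with coinciding centroids are not handled.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Centroid and mean member-to-centroid distance of each cluster.
    cents = {lab: [sum(col) / len(m) for col in zip(*m)]
             for lab, m in clusters.items()}
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in m) / len(m)
               for lab, m in clusters.items()}
    labs = list(clusters)
    total = 0.0
    for i in labs:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in labs if j != i)
    return total / len(labs)


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = davies_bouldin(points, [0, 0, 1, 1])   # compact, well separated
bad = davies_bouldin(points, [0, 1, 0, 1])    # stretched, overlapping
```

On this toy data the compact partition yields a much smaller index than the stretched one, which matches the rule stated above that a smaller DBI indicates better clustering.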

Claims

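1. A clustering-based cloud load library establishment method, characterized by comprising the following steps:

S1, acquiring distribution transformer load data as clustering samples;

S2, preprocessing the distribution transformer load data;

S3, establishing, for each distribution transformer, photovoltaic-storage-charging characteristic indexes and industrial, agricultural and commercial characteristic indexes;

S4, performing dimension reduction on the load data to obtain low-dimensional electricity-consumption feature vectors;

S5, clustering the distribution transformer feature vectors;

S6, analyzing the composition of each cluster of distribution transformers and establishing a typical distribution transformer cloud load library.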

2. The clustering-based cloud load library establishment method according to claim 1, wherein the distribution transformer load data in step S1 comprise load data of distribution transformers containing different proportions of distributed photovoltaic-storage-charging and of industrial, agricultural and commercial users.

3. The clustering-based cloud load library establishment method according to claim 1, wherein the preprocessing in step S2 specifically includes cleaning the data, completing missing-value filling and abnormal-data checking and correction, and forming the distribution transformer load matrix P from the preprocessed data.

4. The clustering-based cloud load library establishment method according to claim 1, wherein step S3 specifically includes classifying the distribution transformers by type, formulating photovoltaic-storage-charging characteristic indexes for distribution transformers of the photovoltaic-storage-charging type, and formulating industrial, agricultural and commercial characteristic indexes for distribution transformers of the industrial, agricultural and commercial type.

5. The clustering-based cloud load library establishment method according to claim 1, wherein step S4 specifically includes establishing a convolutional autoencoder, performing data reconstruction with the distribution transformer load data as input to obtain, for each distribution transformer, the reduced-dimension vector h in the hidden layer as the extracted feature, and combining the extracted feature with the distribution transformer characteristic indexes to obtain the distribution transformer feature set U.

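6. The clustering-based cloud load library establishment method according to claim 5, wherein clustering the distribution transformer feature vectors in step S5 specifically comprises clustering the photovoltaic-storage-charging characteristic indexes, the industrial, agricultural and commercial characteristic indexes and the low-dimensional electricity-consumption feature vectors of the distribution transformers with a dual-scale distance metric clustering algorithm based on the distribution transformer feature set, comprising:

S3.1, calculating the density ρ of each point in the distribution transformer feature set U, taking the point with the maximum density value as the first cluster center $C_1$, and removing the data near $C_1$;

S3.2, repeating step S3.1 on the remaining data of the distribution transformer feature set U to obtain further cluster centers until no data remain, and taking the resulting set of cluster centers C as the initial cluster centers of the K-means algorithm;

S3.3, introducing a dual-scale distance metric to calculate the distance between samples in the K-means algorithm;

S3.4, clustering the data set U based on the dual-scale distance metric to obtain the clustering result.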

7. The clustering-based cloud load library establishment method according to claim 6, wherein step S3.1 specifically includes letting the number of samples in the distribution transformer feature set U be n and the i-th sample be $u_i$, where $u_i$ is an m-dimensional vector, i.e. $u_i = [u_{i,1}, \ldots, u_{i,m}]$, and calculating the density $\rho_i$ of each point in U:

8. The clustering-based cloud load library establishment method according to claim 1, wherein step S3.2 specifically includes calculating the density ρ(i) of the remaining data in the sample set U and selecting the sample point with the largest ρ(i) as the second cluster center $C_2$, so that the set of cluster centers becomes $C = \{C_1, C_2\}$; removing the points in U whose distance to $C_2$ is less than MeanDis(U); and repeating these steps until no data remain in the data set U, the resulting set of cluster centers C being taken as the initial cluster centers of the K-means algorithm.

9. The clustering-based cloud load library establishment method according to claim 7, wherein step S3.3 specifically includes introducing a dual-scale distance metric to calculate the distance between samples in the K-means algorithm, the distance based on the dual-scale distance metric being calculated as follows:

$d_{tsd}(u_i, u_j) = \alpha\, d_{wd}(u_i, u_j) + \beta\, d_{fd}(u_i, u_j)$

10. The clustering-based cloud load library establishment method according to claim 1, wherein step S6 specifically includes performing component analysis on each cluster center by combining the clustering results with the acquired distribution transformer information, analyzing the distributed photovoltaic-storage-charging composition and the industrial, agricultural and commercial composition of each cluster center, and taking each cluster center as a typical distribution transformer type to form the typical distribution transformer cloud load library.