CN112069633B

CN112069633B - Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering

Info

Publication number: CN112069633B
Application number: CN202010798057.6A
Authority: CN
Inventors: 吴峥嵘; 石江华; 周蓝波; 宋祎波; 李俊颖; 忻葆宏; 张萌亮; 宗卫国; 顾珏; 曹轶毅
Original assignee: State Grid Shanghai Electric Power Co Ltd
Current assignee: State Grid Shanghai Electric Power Co Ltd
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2023-04-07
Anticipated expiration: 2040-08-10
Also published as: CN112069633A

Abstract

A power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, belonging to the field of power distribution network reliability prediction. For the normalized power distribution network data, a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part is adopted to obtain the optimal clustering number of the samples. After clustering analysis of the samples, a diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graphs; if the distance between a sample and its clustering center is greater than the diagnosis threshold, the sample is judged to be an outlier sample and is removed, which yields denoised sample data. The denoised sample data is then used to predict the underlying patterns of power distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm; it offers good noise removal, high sorting accuracy, and high effectiveness.

Description

Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering

Technical Field

The invention belongs to the field of reliability prediction of a power distribution network, and particularly relates to a power distribution network data preprocessing method based on a particle swarm principle and adopting big data clustering.

Background

In recent years, the national grid company has stepped up the automation upgrade of distribution networks and extended the rollout of distribution automation systems. These systems provide remote signaling, telemetry, and remote control of distribution network main lines and some branch line switches. Based on line fault signals and the protection signals of automatic switches, they automatically locate the faulted section and either send an alert to the dispatcher or complete fault isolation and power restoration automatically, which improves power supply reliability and transmission quality.

The electric power system is a unified whole comprising the production, transmission, distribution, and consumption of electric energy, and a fault anywhere in it can affect users. According to statistics, more than 80% of the power outages experienced by users are caused by power distribution network faults.

Reducing power distribution network faults and improving power distribution network reliability therefore play an important role in guaranteeing users' power quality and experience, as well as the healthy social and economic development of power companies.

When a power distribution network fault occurs, dispatchers cannot promptly detect and handle faults on non-automated lines, nor can they screen out erroneous remote signaling and telemetry information and handle it correctly.

However, with the massive access of distributed power generation and the continuous improvement of the requirement of users on the reliability of power supply, the control capability of the existing technical support means on the power distribution network, especially the branch line, is more and more difficult to meet the requirement of regulation and control operation.

Firstly, in the current power dispatching system, only a small amount of power operation data can be exported and processed manually, so data utilization efficiency is low. Big data and cloud computing technologies are urgently needed to collect, transmit, analyze, and process this data intensively, so that it can effectively serve the existing dispatching business and support distribution network dispatching decisions. Improving the dispatching automation system, expanding its application functions, and mining the useful information contained in the mass data in depth would enable fault monitoring and data accuracy checking for the power distribution network and would improve power supply quality and reliability at low cost and by simple means, which makes this an effective and feasible scheme for power supply enterprises.

Secondly, in the power industry, the application of big data analysis technology to fault monitoring is at an early stage, and no widely accepted technical model has formed yet. The mass data continuously accumulated by power distribution network information systems creates the conditions for researching more advanced, predictive technologies for improving power distribution network reliability. Meanwhile, as the relationship between power distribution network faults and their influencing factors is discovered, the earlier view that power distribution network faults cannot be predicted is changing.

However, big data commonly contains noise or outliers. Because the power distribution network database is large and most likely draws on multiple heterogeneous data sources, it is highly susceptible to noise, missing data, and inconsistent data, and such anomalous bad data leads to poor-quality mining results.

Data cleaning can remove noise from data and correct incomplete and inconsistent bad data. Detecting data anomalies, adjusting the data as early as possible, and reducing the data to be analyzed yield a high return in the decision-making process and avoid the adverse effect of outlier samples on the prediction model.

At present, determining the clustering number is an important and difficult problem in most clustering algorithms, and the particle swarm clustering algorithm is no exception. The prior-art practice of setting the clustering number from prior knowledge has a major defect: errors or omissions in the prior knowledge directly affect the validity of the chosen clustering number.

In view of this, establishing a particle-swarm-based power distribution network data preprocessing method that can classify accurately is a problem urgently to be solved in practical work.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering. To address the defect that the bending moment method and the concave-convex coefficient method can reach inconsistent conclusions when the optimal clustering number N is selected, the particle swarm algorithm is combined with the bending moment method and the concave-convex coefficient algorithm, and a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part is adopted; optimized power distribution network data with outliers removed is output at the same time. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm together with the faster convergence of the bending moment method and the concave-convex coefficient algorithm, and thereby overcomes the defects of the prior art with good denoising, high sorting accuracy, and high effectiveness.

The technical scheme of the invention is as follows: the method for preprocessing the power distribution network data based on the particle swarm principle by adopting big data clustering is characterized by comprising the following steps of:

1) Analyzing the scale and frequency of distribution network faults over past years, determining the type of data to be mined, and acquiring data from sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system;

2) Constructing features from the data sources, normalizing them, and combining the data from the multiple sources into one consistent database for storage;

3) Performing the clustering number analysis of the bending moment method and the concave-convex coefficient method on the current particle population obtained from step 2) to obtain the optimal clustering number of the samples;

4) Dividing the data samples into a plurality of categories according to the clustering number, calculating the distance between each data sample and its cluster center by the particle swarm algorithm, and optimizing the cluster centers;

5) After clustering analysis of the samples, defining a diagnosis threshold using the outlier identification standard of the upper and lower critical graphs; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judging the sample to be an outlier sample and removing it, which yields denoised sample data;

6) Predicting the underlying patterns of power distribution network faults using the denoised sample data.

According to the power distribution network data preprocessing method, the power distribution network data after normalization processing is subjected to bending moment method and concave-convex coefficient method clustering number analysis, and the optimal clustering number of the sample is obtained by adopting a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part.

Specifically, the bending moment method comprises the following steps:

letting the clustering number N take values from 1 up to a clustering upper limit suitable for the power distribution network;

clustering for each N value and recording the corresponding clustering error over all samples, which indicates the quality of the clustering;

then drawing a graph of the clustering error over all samples against N;

finally, selecting the N value at the bend (corner) of the curve as the optimal clustering number;

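the bending moment coefficient (BMC) algorithm is as follows:

$$\mathrm{BMC} = \sum_{i=1}^{N} \; \sum_{O_b \in \text{cluster } i} \lVert O_b - C_i \rVert^2$$

wherein $N$ is the number of clusters, $O_b$ is a sample object in the $i$-th cluster, and $C_i$ is the cluster center of the $i$-th cluster.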
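The bending moment coefficient described above is a sum of squared errors (see FIG. 1), and the procedure resembles what is commonly called the elbow method. The following is a minimal sketch of how the curve of the coefficient against N might be computed. It assumes NumPy, synthetic three-feature data, and a plain k-means loop (`simple_kmeans`, a hypothetical helper) standing in for the particle swarm clustering of the method; none of these names come from the patent.

```python
import numpy as np

def bending_moment_coefficient(samples, labels, centers):
    """Sum of squared distances from every sample to its own cluster center."""
    diffs = samples - centers[labels]
    return float(np.sum(diffs ** 2))

def simple_kmeans(samples, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd iterations; an assumed stand-in for particle swarm clustering."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies the chosen rows, so `samples` is never modified.
    centers = samples[rng.choice(len(samples), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for i in range(n_clusters):
            members = samples[labels == i]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[i] = members.mean(axis=0)
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Synthetic, already-normalized data: three groups in a 3-D feature space.
rng = np.random.default_rng(1)
samples = np.vstack([rng.normal(loc=c, scale=0.05, size=(40, 3))
                     for c in (0.2, 0.5, 0.8)])

# Coefficient for N = 1..6; the bend of this curve suggests the cluster number.
curve = {}
for n in range(1, 7):
    labels, centers = simple_kmeans(samples, n)
    curve[n] = bending_moment_coefficient(samples, labels, centers)

for n, value in curve.items():
    print(n, round(value, 4))
```

Plotting `curve` against N gives a graph of the kind shown in FIG. 1; the upper limit of 6 is chosen only to keep the example short.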

Specifically, the concave-convex coefficient method includes:

calculating the concave-convex coefficients of all samples from the distances between the samples in the clusters, and then averaging them to obtain the average concave-convex coefficient;

the value range of the average concave-convex coefficient is [-1,1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient is, and the N with the largest average concave-convex coefficient is taken as the optimal clustering number;

the concave-convex coefficient of a sample point $D_i$ is calculated as follows:

$$\mathrm{CC}(D_i) = \frac{l_i - k_i}{\max(k_i,\, l_i)}$$

wherein $k_i$ is the average distance from point $i$ to all other points in its own cluster, and $l_i$ is the smallest average distance from point $i$ to all points of any cluster that does not contain it, that is, its average distance to the nearest cluster;

the definition of the nearest cluster is:

$$R_j = \arg\min_{R_N} \frac{1}{n_N} \sum_{S \in R_N} \lVert S - D_i \rVert$$

wherein $S$ is a sample in cluster $R_N$ and $n_N$ is the number of samples in $R_N$; the average distance from $D_i$ to all samples of a cluster is used as the measure of the distance from the point to that cluster, and the cluster closest to $D_i$ is taken as the nearest cluster.
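The concave-convex coefficient as described has the same form as the quantity usually called the silhouette coefficient. The sketch below computes it directly from the definitions of k_i and l_i, assuming NumPy, Euclidean distance, at least two clusters, and the convention (an assumption, not stated in the text) that a sample alone in its cluster scores 0.

```python
import numpy as np

def concave_convex_coefficients(samples, labels):
    """Per-sample coefficient (l_i - k_i) / max(k_i, l_i); needs >= 2 clusters."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    clusters = np.unique(labels)
    coeffs = np.zeros(len(samples))
    for i in range(len(samples)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # assumed convention: a singleton cluster scores 0
        # k_i: average distance to the other points of the same cluster
        # (the point's zero distance to itself is excluded by n_own - 1).
        k_i = dist[i, own].sum() / (n_own - 1)
        # l_i: smallest average distance to the points of any other cluster.
        l_i = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        coeffs[i] = (l_i - k_i) / max(k_i, l_i)
    return coeffs

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = concave_convex_coefficients(points, [0, 0, 1, 1])  # natural grouping
bad = concave_convex_coefficients(points, [0, 1, 0, 1])   # mixed-up grouping
print(good.mean(), bad.mean())
```

The natural grouping gives an average close to 1, while the mixed-up grouping gives a negative average, which illustrates why the N with the largest average coefficient is preferred.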

Further, the optimal N value is first determined by the concave-convex coefficient method; if the conclusion of the bending moment coefficient (BMC) supports or does not contradict the concave-convex coefficient method, the optimal N value determined by the concave-convex coefficient method is adopted directly; if the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal N value.
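The selection rule above can be sketched in a few lines. The text does not say how "contradiction" is measured, so this example makes two loud assumptions: the bending moment conclusion is taken as the N with the largest second difference of the BMC curve, and the two methods are treated as not contradictory when their N values differ by at most `tolerance` (a hypothetical parameter). The input numbers are invented for illustration.

```python
def elbow_candidate(bmc_by_n):
    """N with the sharpest bend: largest second difference of the BMC curve."""
    ns = sorted(bmc_by_n)
    best_n, best_bend = ns[0], float("-inf")
    for prev_n, n, next_n in zip(ns, ns[1:], ns[2:]):
        bend = bmc_by_n[prev_n] - 2 * bmc_by_n[n] + bmc_by_n[next_n]
        if bend > best_bend:
            best_n, best_bend = n, bend
    return best_n

def select_cluster_number(bmc_by_n, avg_coeff_by_n, tolerance=1):
    """Bending moment method as main criterion, concave-convex as auxiliary."""
    n_cc = max(avg_coeff_by_n, key=avg_coeff_by_n.get)  # largest average coefficient
    n_bm = elbow_candidate(bmc_by_n)
    if abs(n_cc - n_bm) <= tolerance:  # assumed meaning of "not contradictory"
        return n_cc
    return n_bm  # on contradiction the BMC conclusion wins

# Invented curves for illustration only.
bmc = {1: 100.0, 2: 60.0, 3: 12.0, 4: 10.0, 5: 9.0}
agreeing = {2: 0.55, 3: 0.71, 4: 0.48, 5: 0.40}
conflicting = {2: 0.30, 3: 0.40, 4: 0.35, 5: 0.90}

print(select_cluster_number(bmc, agreeing))
print(select_cluster_number(bmc, conflicting))
```

Other definitions of the bend (for example, a visual reading of the graph) would fit the text equally well; the second-difference rule is used here only because it is easy to automate.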

Further, the data samples are divided into N categories according to the obtained clustering number, and the distance between each data sample and its cluster center is calculated; the individual (local) extremum and the global extremum are found from each particle's own position; and the particle positions are continuously updated toward the particle swarm optimal solution, thereby optimizing the cluster centers.
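A minimal sketch of particle swarm clustering in this spirit follows. It assumes NumPy, the textbook velocity update with inertia and two acceleration constants, a particle encoding in which each particle holds N candidate cluster centers, and the sum of squared distances to the nearest center as the fitness. The parameter values and function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def fitness(position, samples, n_clusters):
    """Sum of squared distances from each sample to its nearest center."""
    centers = position.reshape(n_clusters, -1)
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    return float(np.sum(dists.min(axis=1) ** 2))

def pso_cluster(samples, n_clusters, n_particles=20, n_iter=100,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = n_clusters * samples.shape[1]
    # Each particle starts from n_clusters randomly chosen samples.
    idx = np.array([rng.choice(len(samples), n_clusters, replace=False)
                    for _ in range(n_particles)])
    positions = samples[idx].reshape(n_particles, dim)
    velocities = np.zeros_like(positions)
    pbest = positions.copy()  # individual (local) extrema
    pbest_fit = np.array([fitness(p, samples, n_clusters) for p in positions])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]  # global extremum
    for _ in range(n_iter):
        r1 = rng.random(positions.shape)
        r2 = rng.random(positions.shape)
        velocities = (inertia * velocities
                      + c1 * r1 * (pbest - positions)
                      + c2 * r2 * (gbest - positions))
        positions = positions + velocities
        for j in range(n_particles):
            f = fitness(positions[j], samples, n_clusters)
            if f < pbest_fit[j]:
                pbest[j], pbest_fit[j] = positions[j].copy(), f
                if f < gbest_fit:
                    gbest, gbest_fit = positions[j].copy(), f
    centers = gbest.reshape(n_clusters, -1)
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1), dists.min(axis=1)

# Synthetic normalized data with three groups in three dimensions.
rng = np.random.default_rng(2)
samples = np.vstack([rng.normal(loc=c, scale=0.05, size=(40, 3))
                     for c in (0.2, 0.5, 0.8)])
centers, labels, distances = pso_cluster(samples, 3)
print(centers.shape, labels.shape, distances.shape)
```

The returned `distances` (each sample's distance to its assigned center) are the quantity that the later outlier diagnosis compares against the threshold.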

Further, after cluster analysis of the samples, the diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graphs, outliers generally being defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$;

wherein $D_L$ is called the lower quartile, indicating that one fourth of all sample values are smaller than it;

$D_H$ is called the upper quartile, indicating that one fourth of all sample values are greater than it;

GAP is called the interquartile range, which is the difference between the upper quartile $D_H$ and the lower quartile $D_L$ and contains half of all observed values.

And if the distance between the sample and the clustering center is greater than the diagnosis threshold value, diagnosing the sample as an outlier sample and removing the outlier sample.
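The quartile rule above matches the usual box-plot fences. Since distances to a cluster center are non-negative and only samples that are too far away are rejected, the sketch below uses the upper fence as the diagnosis threshold; that choice, the use of NumPy's default linear-interpolation percentiles, and the toy distances are assumptions for illustration.

```python
import numpy as np

def diagnosis_threshold(distances):
    """Upper fence D_H + 1.5 * GAP computed from sample-to-center distances."""
    d_l, d_h = np.percentile(distances, [25, 75])  # lower and upper quartiles
    gap = d_h - d_l                                # interquartile range
    return d_h + 1.5 * gap

def remove_outliers(samples, distances):
    """Split samples into kept and rejected sets using the diagnosis threshold."""
    samples = np.asarray(samples)
    distances = np.asarray(distances, dtype=float)
    threshold = diagnosis_threshold(distances)
    keep = distances <= threshold
    return samples[keep], samples[~keep], threshold

# Toy example: nine samples identified by index, one far from its center.
toy_distances = [1, 2, 3, 4, 5, 6, 7, 8, 50]
kept, rejected, threshold = remove_outliers(np.arange(9), toy_distances)
print(threshold, rejected)
```

In this toy case the quartiles are 3 and 7, the interquartile range is 4, and the threshold is 13, so only the sample at distance 50 is rejected.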

The power distribution network data preprocessing method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm.

Compared with the prior art, the invention has the advantages that:

1. by adopting the power distribution network data preprocessing method based on the particle swarm principle for big data clustering, the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema is overcome, the global search capability of the particle swarm algorithm is retained, and the faster convergence of the bending moment method and the concave-convex coefficient algorithm is kept;

2. a clustering number selection mechanism taking a bending moment method as a main part and a concave-convex coefficient method as an auxiliary part is adopted, optimized power distribution network data with outliers removed is output simultaneously, and the defects that the current bending moment method and the concave-convex coefficient method are inconsistent in conclusion when the optimal N value of clustering is selected can be avoided;

3. the technical scheme of the invention has the advantages of good noise removing effect, high sorting accuracy and high effectiveness.

Drawings

FIG. 1 is a schematic diagram of the sum of squared errors versus the number of clusters N;

FIG. 2 is a schematic diagram showing the relationship between the concave-convex coefficient and the number of clusters N;

FIG. 3 is a three-dimensional distribution diagram of distribution transformer rated capacity, monthly average load and monthly maximum load samples;

FIG. 4 is a box-plot graph identifying outlier samples in the power distribution network data;

FIG. 5 is a flow chart of the power distribution network data preprocessing method according to the present invention.

Detailed Description

The invention is further illustrated with reference to the following figures and examples.

As shown in FIG. 5, the technical solution of the present invention provides a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, which includes the following steps:

3) Performing bending moment method and concave-convex coefficient method clustering number analysis on the current particle population after the step 2) to obtain the optimal clustering number of the sample;

4) Dividing the data samples into a plurality of categories according to the step 3), calculating the distance between the data samples and a clustering center by adopting a particle swarm algorithm, and optimizing the clustering center;

5) After clustering analysis is carried out on a sample, a diagnosis threshold value is defined by adopting an abnormal value identification standard of an upper critical graph and a lower critical graph, if the distance between the sample and a clustering center is greater than the diagnosis threshold value, the sample is judged to be an outlier sample, and the outlier sample is removed; therefore, sample data of 'noise removal' can be obtained, and prediction of potential rules of the power distribution network fault is facilitated;

According to the technical scheme, the normalized power distribution network data is subjected to bending moment method and concave-convex coefficient method clustering number analysis, and a clustering number selection mechanism with the bending moment method as the main method and the concave-convex coefficient method as the auxiliary method is adopted to obtain the optimal clustering number of the sample.

The bending moment method comprises the following specific steps:

letting the clustering number N take values from 1 up to a clustering upper limit suitable for the power distribution network, such as 12; clustering for each N value and recording the corresponding clustering error over all samples, which indicates the quality of the clustering; drawing a graph of the clustering error over all samples against N; and finally selecting the N value at the bend (corner) of the curve as the optimal clustering number.

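The bending moment coefficient (BMC) algorithm is as follows:

$$\mathrm{BMC} = \sum_{i=1}^{N} \; \sum_{O_b \in \text{cluster } i} \lVert O_b - C_i \rVert^2$$

wherein $N$ is the number of clusters, $O_b$ is a sample object in the $i$-th cluster, and $C_i$ is the cluster center of the $i$-th cluster.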

The concave-convex coefficient method comprises the following specific steps:

The concave-convex coefficients of all samples are calculated from the distances between the samples in the clusters and then averaged to obtain the average concave-convex coefficient. The value range of the average concave-convex coefficient is [-1,1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient is, and the N with the largest average concave-convex coefficient is taken as the optimal clustering number. The concave-convex coefficient of a sample point $D_i$ is

$$\mathrm{CC}(D_i) = \frac{l_i - k_i}{\max(k_i,\, l_i)}$$

wherein $k_i$ is the average distance from point $i$ to all other points in its own cluster, and $l_i$ is the smallest average distance from point $i$ to all points of any cluster that does not contain it.

The definition of the nearest cluster is

$$R_j = \arg\min_{R_N} \frac{1}{n_N} \sum_{S \in R_N} \lVert S - D_i \rVert$$

wherein $S$ is a sample in cluster $R_N$ and $n_N$ is the number of samples in $R_N$; the average distance from $D_i$ to all samples of a cluster is used as the measure of the distance from the point to that cluster, and the cluster closest to $D_i$ is taken as the nearest cluster.

The optimal N value is first determined by the concave-convex coefficient method; if the conclusion of the bending moment coefficient (BMC) supports or does not contradict the concave-convex coefficient method, the optimal N value determined by the concave-convex coefficient method is adopted directly. If the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal N value.

The data samples are divided into N categories according to the optimal clustering number N obtained by the bending moment method and the concave-convex coefficient method, and the distance between each data sample and its cluster center is calculated; the individual (local) extremum and the global extremum are found from each particle's own position; and the particle positions are continuously updated toward the particle swarm optimal solution, thereby optimizing the cluster centers.

After clustering the samples, the diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graphs; outliers are generally defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$.

Wherein $D_L$ is called the lower quartile, indicating that one fourth of all sample values are smaller than it; $D_H$ is called the upper quartile, indicating that one fourth of all sample values are greater than it; GAP is called the interquartile range, which is the difference between the upper quartile $D_H$ and the lower quartile $D_L$ and contains half of all observed values. If the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier sample and is rejected.

Example:

According to the method, data are randomly collected from sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system, giving 962 samples in total. The raw data must first be processed so that all features are on the same order of magnitude and lie in the same interval [0,1], that is, normalized, and the data from the multiple sources are merged into one consistent database for storage.

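Firstly, the mean value of the $k$-th dimension over the $n$ samples is calculated,

$$\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik}$$

together with the standard deviation $C_k$ of that dimension;

this gives the normalized value $x_{ik}'$ of the raw data:

$$x_{ik}' = \frac{x_{ik} - \bar{x}_k}{C_k}$$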

The curve calculated by the bending moment method, programmed in Matlab, is shown in Fig. 1.

The curve calculated by the concave-convex coefficient method, programmed in Matlab, is shown in Fig. 2.

The concave-convex coefficient reaches its maximum at N = 3, which suggests that the optimal clustering number is 3. The relationship between N and the bending moment coefficient shows that, although the coefficient is still large at N = 3, the curve has already bent noticeably there. The bending moment method therefore does not contradict the concave-convex coefficient method, and 3 is a reasonable clustering number.

Three features are extracted from each distribution network data sample, namely distribution transformer rated capacity, monthly average load, and monthly maximum load, which correspond to a three-dimensional space in the algorithm. The 950 sample values and the parameters of the three classes of distribution network data obtained through clustering are shown in Table 1 below.

TABLE 1

Fig. 3 is a three-dimensional distribution diagram of the mixed distribution network data. The clusters containing the three classes of data can be clearly distinguished, and a small number of samples deviate from the cluster centers.

Using the outlier identification standard of the upper and lower critical graphs, the outlier rejection threshold of the sample data is calculated to be 3.6288. The distance between each sample and its cluster center, obtained by the particle swarm clustering algorithm, serves as the basis for outlier rejection: a sample whose distance is greater than the rejection threshold is diagnosed as an outlier sample, as shown in Fig. 4.

As Fig. 4 shows, 12 outlier samples can be removed, and the remaining samples constitute the denoised sample data. With sorting accuracy = (number of accurately sorted samples / total number of samples) × 100%, the overall sorting accuracy is 98.75%.

Determining the clustering number is an important and difficult problem in most clustering algorithms, and the particle swarm clustering algorithm is no exception. The existing practice of determining the clustering number from prior knowledge has a major defect. If the optimal clustering number is not obtained through the clustering number analysis of the bending moment method and the concave-convex coefficient method, and the clustering number is instead set to 2 or 4 from prior knowledge, the numbers of removed outlier samples are 34 and 47 respectively, and the sorting accuracies are 96.47% and 95.11%, both clearly lower than the sorting accuracy of the invention. Compared with the prior art, the method therefore offers good denoising, high accuracy, and high effectiveness.
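The reported accuracies can be reproduced from the sample counts. The short check below assumes that the accurately sorted samples are the 962 collected samples minus the removed outliers; this reading is an inference that happens to match all three reported figures, not a definition given in the text.

```python
TOTAL_SAMPLES = 962  # samples collected in the example

def sorting_accuracy(removed_outliers, total=TOTAL_SAMPLES):
    """Accuracy in percent, assuming every non-removed sample is sorted correctly."""
    return (total - removed_outliers) / total * 100

# (clustering number, removed outlier samples) as reported in the example
for n_clusters, removed in [(3, 12), (2, 34), (4, 47)]:
    print(n_clusters, removed, round(sorting_accuracy(removed), 2))
```

Under this assumption the clustering number of 3 gives 98.75%, while 2 and 4 give 96.47% and 95.11%, matching the figures quoted above.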

According to the technical scheme, the normalized power distribution network data is subjected to the clustering number analysis of the bending moment method and the concave-convex coefficient method, forming a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part. This yields the optimal clustering number of the samples and the denoised sample data, which facilitates prediction of the underlying patterns of power distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm; it offers good denoising, high accuracy, and high effectiveness.

The method can be widely applied to the field of reliability prediction of the power distribution network.

Claims

1. A power distribution network data preprocessing method based on a particle swarm principle and adopting big data clustering is characterized by comprising the following steps of:

5) After clustering analysis is carried out on the samples, a diagnosis threshold value is defined by adopting abnormal value identification standards of an upper critical graph and a lower critical graph, if the distance between the samples and a clustering center is greater than the diagnosis threshold value, the samples are judged to be outlier samples, and the outlier samples are removed; so as to obtain the sample data of 'noise removal',

2. The method for preprocessing power distribution network data based on the particle swarm principle by using big data clustering as claimed in claim 1, characterized in that the normalized power distribution network data is subjected to the clustering number analysis of the bending moment method and the concave-convex coefficient method, and a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part is adopted to obtain the optimal clustering number of the samples.

3. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the bending moment method comprises the following steps:

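the bending moment coefficient (BMC) algorithm is as follows:

$$\mathrm{BMC} = \sum_{i=1}^{N} \; \sum_{O_b \in \text{cluster } i} \lVert O_b - C_i \rVert^2$$

wherein $N$ is the number of clusters, $O_b$ is a sample object in the $i$-th cluster, and $C_i$ is the cluster center of the $i$-th cluster.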

4. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering according to claim 1, wherein the concave-convex coefficient method comprises the following steps:

the definition of the nearest cluster is:

5. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the optimal N value is first determined by the concave-convex coefficient method, and if the conclusion of the bending moment coefficient (BMC) supports or does not contradict the concave-convex coefficient method, the optimal N value determined by the concave-convex coefficient method is adopted directly;

if the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal N value.

6. The power distribution network data preprocessing method based on the particle swarm principle by adopting big data clustering as claimed in claim 1, characterized in that the data samples are divided into N categories according to the obtained clustering number, and the distance between each data sample and its cluster center is calculated; the individual (local) extremum and the global extremum are found from each particle's own position; and the particle positions are continuously updated toward the particle swarm optimal solution, thereby optimizing the cluster centers.

7. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, characterized in that, after the clustering analysis is performed on the samples, the diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graphs, outliers generally being defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$;

$D_H$ is called the upper quartile, indicating that one fourth of all sample values are greater than it;

8. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting the big data clustering as claimed in claim 7, wherein if the distance between the sample and the clustering center is greater than a diagnosis threshold value, the sample is diagnosed as an outlier sample and is removed.

9. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting the big data clustering as claimed in claim 1, which is characterized in that the method for preprocessing the data of the power distribution network overcomes the defect that a bending moment method and a concave-convex coefficient algorithm are easy to fall into local extreme values, maintains the global optimization of the particle swarm algorithm, and has higher convergence speed of the bending moment method and the concave-convex coefficient algorithm.