CN112069633B - Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering - Google Patents

Publication number
CN112069633B
Authority
CN
China
Prior art keywords
clustering
data
distribution network
power distribution
concave
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010798057.6A
Other languages
Chinese (zh)
Other versions
CN112069633A (en)
Inventor
吴峥嵘
石江华
周蓝波
宋祎波
李俊颖
忻葆宏
张萌亮
宗卫国
顾珏
曹轶毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shanghai Electric Power Co Ltd
Original Assignee
State Grid Shanghai Electric Power Co Ltd
Application filed by State Grid Shanghai Electric Power Co Ltd
Priority to CN202010798057.6A
Publication of CN112069633A
Application granted
Publication of CN112069633B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/10 Geometric CAD
    • G06F30/18 Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/25 Design optimisation, verification or simulation using particle-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/04 Power grid distribution networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

A power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, belonging to the field of power distribution network reliability prediction. For the normalized power distribution network data, a cluster-number selection mechanism that uses the bending moment method as the primary criterion and the concave-convex coefficient method as the auxiliary criterion is adopted to obtain the optimal number of clusters for the samples. After cluster analysis of the samples, a diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graph; if the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is judged to be an outlier sample and removed, yielding denoised sample data. The denoised sample data is then used to predict the underlying patterns of power distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and offers the faster convergence of the bending moment method and the concave-convex coefficient algorithm; it provides good denoising, high sorting accuracy, and high effectiveness.

Description

Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering
Technical Field
The invention belongs to the field of reliability prediction of a power distribution network, and particularly relates to a power distribution network data preprocessing method based on a particle swarm principle and adopting big data clustering.
Background
In recent years, State Grid companies have intensified the automation retrofit of distribution networks and broadened the deployment of distribution automation systems. These systems provide remote signaling, telemetry, and remote control for the main-line switches and some branch-line switches of the distribution network. Based on line fault signals and the protection signals of automated switches, they automatically locate the faulted section and either send an alert to the dispatcher or complete fault isolation and power restoration automatically, improving power supply reliability and transmission quality.
The electric power system is a unified whole covering the production, transmission, distribution, and consumption of electric energy, and a fault in any part of it can affect users. According to statistics, more than 80% of user power outages are caused by power distribution network faults.
Reducing power distribution network faults and improving distribution network reliability therefore play an important role in guaranteeing users' power quality and experience and in supporting the healthy development of society, the economy, and power companies.
When a power distribution network fault occurs, dispatchers cannot promptly detect and handle faults on non-automated lines, nor can they screen out erroneous remote-signaling and telemetry information and handle it correctly.
Moreover, with the large-scale connection of distributed generation and users' steadily rising requirements for power supply reliability, the ability of existing technical support tools to control the distribution network, especially its branch lines, is increasingly unable to meet the needs of dispatch and control operations.
Firstly, in the current power dispatching system, only a small amount of power operation data can be exported and processed manually, so data utilization efficiency is low. Big data and cloud computing technologies are urgently needed to collect, transmit, analyze, and process the data intensively, so that it effectively serves distribution network dispatch and control decisions and the existing dispatch and control business. If the dispatch automation system can be improved, its application functions expanded, and the useful information contained in the mass data mined in depth to realize fault monitoring and data accuracy checking for the distribution network, power supply quality and reliability can be improved at low cost and by simple means, which is an effective and feasible approach for power supply enterprises.
Secondly, in the power industry, the application of big data analysis to fault monitoring is still at an early stage, and no widely adopted general technical model has formed. The mass data continuously accumulated by distribution network information systems creates the conditions for researching more advanced, predictive techniques for improving distribution network reliability. Meanwhile, the discovery of relationships between distribution network faults and their influencing factors has changed the earlier view that such faults cannot be predicted.
However, big data commonly contains noise or outliers. Because the database is large and most likely drawn from multiple heterogeneous data sources, the power distribution network database is highly susceptible to noise, missing data, and inconsistent data, and such anomalous bad data leads to poor-quality mining results.
Data cleaning can remove noise from the data and correct incomplete and inconsistent bad data. Detecting data anomalies, adjusting the data as early as possible, and reducing the data to be analyzed yield a high return in the decision-making process and avoid the adverse effect of outlier samples on the prediction model.
At present, determining the number of clusters is an important and difficult problem in most clustering algorithms, and the particle swarm clustering algorithm is no exception. The prior-art practice of setting the number of clusters from prior knowledge has a major drawback: errors or gaps in the prior knowledge directly affect the validity of the chosen number of clusters.
In view of this, establishing a power distribution network data preprocessing method that classifies accurately and is based on the particle swarm principle is a problem that urgently needs to be solved in practice.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering. To address the problem that the bending moment method and the concave-convex coefficient method can reach inconsistent conclusions when selecting the optimal number of clusters N, the particle swarm algorithm is combined with the bending moment method and the concave-convex coefficient algorithm, a cluster-number selection mechanism with the bending moment method as the primary criterion and the concave-convex coefficient method as the auxiliary criterion is adopted, and optimized power distribution network data with outliers removed is output. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and offers the faster convergence of the bending moment method and the concave-convex coefficient algorithm. It thereby overcomes the shortcomings of the prior art and provides good denoising, high sorting accuracy, and high effectiveness.
The technical scheme of the invention is as follows: the method for preprocessing the power distribution network data based on the particle swarm principle by adopting big data clustering is characterized by comprising the following steps of:
1) Analyzing the size and frequency of faults of the distribution network in the past year, finding out the type of data mining, and acquiring data according to data sources of a distribution line online monitoring system, an intelligent public distribution transformer monitoring system and the like;
2) Adopting a characteristic structure for the data source, carrying out normalization processing, and combining the data from a plurality of sources into a consistent database for storage;
3) Performing the clustering number analysis of a bending moment method and a concave-convex coefficient method on the current particle population after the previous step to obtain the optimal clustering number of the sample;
4) Dividing the data samples into a plurality of categories according to the number of clusters, calculating the distance between the data samples and a cluster center by adopting a particle swarm algorithm, and optimizing the cluster center;
5) After cluster analysis of the samples, defining a diagnosis threshold using the outlier identification standard of the upper and lower critical graph; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judging the sample to be an outlier sample and removing it, thereby obtaining denoised sample data;
6) Predicting the underlying patterns of power distribution network faults using the denoised sample data.
According to the power distribution network data preprocessing method, the power distribution network data after normalization processing is subjected to bending moment method and concave-convex coefficient method clustering number analysis, and the optimal clustering number of the sample is obtained by adopting a clustering number selection mechanism with the bending moment method as the main part and the concave-convex coefficient method as the auxiliary part.
Specifically, the bending moment method (commonly known as the elbow method) comprises the following steps:
letting the number of clusters N take values from 1 up to a clustering upper limit suitable for the power distribution network;
clustering for each value of N and recording the corresponding clustering error over all samples, which measures the quality of the clustering;
then plotting the clustering error over all samples against N;
finally, selecting the value of N at the bend of the curve as the optimal number of clusters;
the bending moment coefficient (BMC) is calculated as:
$$\mathrm{BMC} = \sum_{i=1}^{N} \sum_{O_b \in R_i} \lVert O_b - C_i \rVert^2$$
wherein N is the number of clusters, $O_b$ is a sample object in cluster i (written $R_i$), and $C_i$ is the center of cluster i.
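The bending moment coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, assuming squared Euclidean distance and a cluster assignment that is already known; the function name `bending_moment_coefficient` and the toy data are hypothetical and do not come from the patent.

```python
def bending_moment_coefficient(samples, labels, centers):
    """Sum of squared distances from each sample to its assigned cluster center."""
    total = 0.0
    for point, label in zip(samples, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Toy data (made up for illustration): two tight pairs of points.
samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# N = 2: each pair has its own center, so every sample is 1 unit from its center.
print(bending_moment_coefficient(samples, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)]))  # 4.0

# N = 1: a single center in the middle gives a much larger error.
print(bending_moment_coefficient(samples, [0, 0, 0, 0], [(5.0, 1.0)]))  # 104.0
```

The sharp drop from 104.0 to 4.0 when N goes from 1 to 2 is the kind of bend the method looks for in the error curve.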
Specifically, the concave-convex coefficient method (commonly known as the silhouette coefficient method) comprises:
calculating the concave-convex coefficient of every sample from the distances between samples, and then averaging to obtain the average concave-convex coefficient;
the average concave-convex coefficient takes values in [-1, 1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient, and the N with the largest average concave-convex coefficient is taken as the optimal number of clusters;
the concave-convex coefficient of a sample point $D_i$ is calculated as:
$$S(D_i) = \frac{l_i - k_i}{\max(k_i, l_i)}$$
wherein $k_i$ is the average distance from $D_i$ to all other points in its own cluster, and $l_i$ is the smallest average distance from $D_i$ to all points of any cluster that does not contain it, that is, the average distance to its nearest cluster;
the nearest cluster is defined as:
$$R(D_i) = \arg\min_{R_N} \frac{1}{n} \sum_{S \in R_N} \lVert S - D_i \rVert$$
wherein S is a sample in cluster $R_N$ and n is the number of samples in $R_N$; the average distance from $D_i$ to all samples of a cluster measures the distance from the point to that cluster, and the cluster closest to $D_i$, excluding its own, is taken as the nearest cluster.
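The average concave-convex coefficient can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, a score of 0 for samples that are alone in their cluster, and hypothetical names (`concave_convex_coefficient`, the toy samples) that are not taken from the patent.

```python
import math


def concave_convex_coefficient(samples, labels):
    """Average of (l_i - k_i) / max(k_i, l_i) over all samples."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # k_i: average distance to the other members of the sample's own cluster
        k_i = sum(math.dist(samples[i], samples[j]) for j in own) / len(own)
        # l_i: smallest average distance to the members of any other cluster
        l_i = min(
            sum(math.dist(samples[i], samples[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(k_i, l_i)
        scores.append(0.0 if denom == 0 else (l_i - k_i) / denom)
    return sum(scores) / len(scores)


# Toy data (made up): two well-separated pairs.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = concave_convex_coefficient(samples, [0, 0, 1, 1])  # natural grouping
bad = concave_convex_coefficient(samples, [0, 1, 0, 1])   # deliberately mixed grouping
print(good > bad)  # True
```

A grouping that matches the natural structure scores close to 1, while a grouping that mixes distant points scores below 0, which is why the N with the largest average is preferred.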
Further, starting from the optimal N determined by the concave-convex coefficient method: if the conclusion of the bending moment coefficient (BMC) supports or does not contradict the concave-convex coefficient method, the optimal N given by the concave-convex coefficient method is adopted directly; if the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal N.
Further, the data samples are divided into N categories according to the obtained number of clusters, and the distance between each data sample and the cluster centers is calculated; the local extremum and the global extremum are found from the position of each particle; and the particle positions are updated continuously until the particle swarm reaches an optimal solution, thereby optimizing the cluster centers.
Further, after cluster analysis of the samples, the diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graph (the box plot), in which outliers are generally defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$;
wherein $D_L$ is the lower quartile, meaning that one quarter of all sample values are smaller than it;
$D_H$ is the upper quartile, meaning that one quarter of all sample values are greater than it;
GAP is the interquartile range, the difference between the upper quartile $D_H$ and the lower quartile $D_L$, which spans half of all observed values.
If the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier sample and removed.
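The threshold rule can be sketched with the Python standard library. This is a minimal illustration, assuming the diagnosis threshold is the upper fence D_H + 1.5 GAP applied to sample-to-center distances and that quartiles are computed with the "inclusive" method; the patent does not state which quartile convention it uses, and the function names and data here are hypothetical.

```python
import statistics


def outlier_threshold(distances):
    """Upper box-plot fence D_H + 1.5 * GAP for sample-to-center distances."""
    d_l, _, d_h = statistics.quantiles(distances, n=4, method="inclusive")
    gap = d_h - d_l  # interquartile range
    return d_h + 1.5 * gap


def remove_outliers(samples, distances):
    """Keep only the samples whose distance does not exceed the threshold."""
    threshold = outlier_threshold(distances)
    kept = [s for s, d in zip(samples, distances) if d <= threshold]
    return kept, threshold


# Made-up distances: eight ordinary samples and one far from its cluster center.
distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
samples = list(range(len(distances)))  # stand-in sample identifiers
kept, threshold = remove_outliers(samples, distances)
print(threshold)  # 13.0
print(kept)       # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because distances to a cluster center are non-negative, only the upper fence matters in practice; the lower fence is rarely reached.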
The power distribution network data preprocessing method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and offers the faster convergence of the bending moment method and the concave-convex coefficient algorithm.
Compared with the prior art, the invention has the advantages that:
1. by adopting the power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema is overcome, the global search capability of the particle swarm algorithm is retained, and the method also offers the faster convergence of the bending moment method and the concave-convex coefficient algorithm;
2. a cluster-number selection mechanism with the bending moment method as the primary criterion and the concave-convex coefficient method as the auxiliary criterion is adopted, and optimized power distribution network data with outliers removed is output, which avoids the problem that the bending moment method and the concave-convex coefficient method can reach inconsistent conclusions when selecting the optimal number of clusters N;
3. the technical scheme of the invention provides good denoising, high sorting accuracy, and high effectiveness.
Drawings
FIG. 1 is a schematic diagram of the sum of squared errors versus the number of clusters N;
FIG. 2 is a schematic diagram showing the relationship between the concave-convex coefficient and the number of clusters N;
FIG. 3 is a three-dimensional distribution diagram of distribution transformer rated capacity, monthly average load and monthly maximum load samples;
FIG. 4 is a box plot identifying outlier samples in the power distribution network data;
FIG. 5 is a flow chart of the power distribution network data preprocessing method according to the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIG. 5, the technical solution of the invention provides a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, which comprises the following steps:
1) Analyzing the size and frequency of faults of the distribution network in the past year, finding out the type of data mining, and acquiring data according to data sources of a distribution line online monitoring system, an intelligent public distribution transformer monitoring system and the like;
2) Adopting a characteristic structure for the data source, carrying out normalization processing, and combining the data from a plurality of sources into a consistent database for storage;
3) Performing bending moment method and concave-convex coefficient method clustering number analysis on the current particle population after the step 2) to obtain the optimal clustering number of the sample;
4) Dividing the data samples into a plurality of categories according to the step 3), calculating the distance between the data samples and a clustering center by adopting a particle swarm algorithm, and optimizing the clustering center;
5) After cluster analysis of the samples, defining a diagnosis threshold using the outlier identification standard of the upper and lower critical graph; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judging the sample to be an outlier sample and removing it; denoised sample data is thereby obtained, which facilitates prediction of the underlying patterns of power distribution network faults;
6) Predicting the underlying patterns of power distribution network faults using the denoised sample data.
According to the technical scheme, the normalized power distribution network data is analyzed by both the bending moment method and the concave-convex coefficient method, and a cluster-number selection mechanism with the bending moment method as the primary criterion and the concave-convex coefficient method as the auxiliary criterion is adopted to obtain the optimal number of clusters for the samples.
The bending moment method comprises the following specific steps:
The number of clusters N takes values from 1 up to a clustering upper limit suitable for the power distribution network, for example 12. Clustering is performed for each value of N, and the corresponding clustering error over all samples, which measures the quality of the clustering, is recorded. The clustering error is then plotted against N, and the value of N at the bend of the curve is selected as the optimal number of clusters.
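The patent reads the bend off a plot. One way to locate it programmatically, offered here only as an assumed heuristic and not as the patent's procedure, is to pick the N with the largest second difference of the error curve. The function name `find_bend` and the error values below are made up for illustration.

```python
def find_bend(errors_by_n):
    """Return the N whose clustering error shows the sharpest bend.

    errors_by_n maps each candidate N to its clustering error. The bend is
    measured by the second difference e(N-1) - 2*e(N) + e(N+1); at least
    three consecutive values of N are needed, otherwise None is returned.
    """
    ns = sorted(errors_by_n)
    best_n, best_bend = None, float("-inf")
    for prev_n, n, next_n in zip(ns, ns[1:], ns[2:]):
        bend = errors_by_n[prev_n] - 2 * errors_by_n[n] + errors_by_n[next_n]
        if bend > best_bend:
            best_n, best_bend = n, bend
    return best_n


# Made-up error curve: steep decline up to N = 3, nearly flat afterwards.
errors = {1: 1000.0, 2: 700.0, 3: 150.0, 4: 120.0, 5: 100.0, 6: 90.0}
print(find_bend(errors))  # 3
```

A second-difference rule is sensitive to noise in the error curve, so a visual check of the plot remains a sensible complement.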
The bending moment coefficient (BMC) is calculated as:
$$\mathrm{BMC} = \sum_{i=1}^{N} \sum_{O_b \in R_i} \lVert O_b - C_i \rVert^2$$
wherein N is the number of clusters, $O_b$ is a sample object in cluster i (written $R_i$), and $C_i$ is the center of cluster i.
The concave-convex coefficient method comprises the following specific steps:
The concave-convex coefficient of every sample is calculated from the distances between samples, and the values are then averaged to obtain the average concave-convex coefficient. The average concave-convex coefficient takes values in [-1, 1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient, and the N with the largest average concave-convex coefficient is taken as the optimal number of clusters.
The concave-convex coefficient of a sample point $D_i$ is calculated as:
$$S(D_i) = \frac{l_i - k_i}{\max(k_i, l_i)}$$
wherein $k_i$ is the average distance from $D_i$ to all other points in its own cluster, and $l_i$ is the smallest average distance from $D_i$ to all points of any cluster that does not contain it, that is, the average distance to its nearest cluster.
The nearest cluster is defined as:
$$R(D_i) = \arg\min_{R_N} \frac{1}{n} \sum_{S \in R_N} \lVert S - D_i \rVert$$
wherein S is a sample in cluster $R_N$ and n is the number of samples in $R_N$; the average distance from $D_i$ to all samples of a cluster measures the distance from the point to that cluster, and the cluster closest to $D_i$, excluding its own, is taken as the nearest cluster.
Starting from the optimal N determined by the concave-convex coefficient method: if the conclusion of the bending moment coefficient (BMC) supports or does not contradict the concave-convex coefficient method, the optimal N given by the concave-convex coefficient method is adopted directly. If the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal N.
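The selection rule can be written as a small function. The patent does not define formally when the BMC "contradicts" the concave-convex result, so this sketch assumes the caller supplies the set of N values that the BMC curve is judged to support; the function and parameter names are hypothetical.

```python
def select_cluster_number(n_concave_convex, n_bending_moment, supported_by_bmc):
    """Apply the primary/auxiliary rule for choosing the number of clusters.

    n_concave_convex: N with the largest average concave-convex coefficient
    n_bending_moment: N at the bend of the BMC curve
    supported_by_bmc: N values the BMC curve does not contradict (assumed input)
    """
    if n_concave_convex in supported_by_bmc:
        return n_concave_convex  # no contradiction: keep the concave-convex result
    return n_bending_moment      # contradiction: the BMC conclusion prevails


print(select_cluster_number(3, 3, {3, 4}))  # 3
print(select_cluster_number(2, 4, {4, 5}))  # 4
```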
The data samples are divided into N categories according to the optimal N obtained from the bending moment method and the concave-convex coefficient method, and the distance between each data sample and the cluster centers is calculated. The local extremum and the global extremum are found from the position of each particle, and the particle positions are updated continuously until the particle swarm reaches an optimal solution, thereby optimizing the cluster centers.
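A particle swarm clustering step of this kind might look like the sketch below. It is an illustration under several assumptions that the patent does not specify: each particle encodes one full set of N cluster centers, fitness is the total squared distance of samples to their nearest center, and the inertia and acceleration values (`inertia`, `c1`, `c2`), swarm size, and iteration count are conventional placeholders. The "local extremum" is represented by each particle's best position so far and the "global extremum" by the best position found by the whole swarm.

```python
import random


def assign(samples, centers):
    """Index of the nearest center for every sample (squared Euclidean distance)."""
    return [min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centers[c])))
            for s in samples]


def fitness(samples, centers):
    """Total squared distance of samples to their nearest center (lower is better)."""
    labels = assign(samples, centers)
    return sum(sum((a - b) ** 2 for a, b in zip(s, centers[l]))
               for s, l in zip(samples, labels))


def pso_cluster(samples, n_clusters, n_particles=20, iterations=100,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    # Each particle is a candidate set of centers, seeded from random samples.
    positions = [[list(rng.choice(samples)) for _ in range(n_clusters)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(n_clusters)]
                  for _ in range(n_particles)]
    best_pos = [[c[:] for c in p] for p in positions]  # per-particle best
    best_fit = [fitness(samples, p) for p in positions]
    g = min(range(n_particles), key=best_fit.__getitem__)
    global_pos, global_fit = [c[:] for c in best_pos[g]], best_fit[g]

    for _ in range(iterations):
        for p in range(n_particles):
            for c in range(n_clusters):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    velocities[p][c][d] = (
                        inertia * velocities[p][c][d]
                        + c1 * r1 * (best_pos[p][c][d] - positions[p][c][d])
                        + c2 * r2 * (global_pos[c][d] - positions[p][c][d]))
                    positions[p][c][d] += velocities[p][c][d]
            fit = fitness(samples, positions[p])
            if fit < best_fit[p]:  # update the particle's own best
                best_fit[p] = fit
                best_pos[p] = [c[:] for c in positions[p]]
                if fit < global_fit:  # update the swarm-wide best
                    global_fit = fit
                    global_pos = [c[:] for c in positions[p]]
    return global_pos, assign(samples, global_pos), global_fit


# Made-up data: two small groups of points far apart.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, error = pso_cluster(samples, 2)
print(labels)
```

The distances from each sample to its optimized center, which the outlier step needs, follow directly from `centers` and `labels`.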
After clustering the samples, the diagnosis threshold is defined using the outlier identification standard of the upper and lower critical graph, in which outliers are generally defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$.
Here $D_L$ is the lower quartile, meaning that one quarter of all sample values are smaller than it; $D_H$ is the upper quartile, meaning that one quarter of all sample values are greater than it; GAP is the interquartile range, the difference between the upper quartile $D_H$ and the lower quartile $D_L$, which spans half of all observed values. If the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier sample and rejected.
Example:
In this example, data is randomly sampled from data sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system, giving 962 samples in total. The raw data must first be processed so that it is distributed in the same interval [0,1] and lies on the same order of magnitude, that is, it is normalized, and the data from the multiple sources is merged into one consistent database for storage.
First, the mean of the k-th dimension over the n samples is calculated:
Figure BDA0002626384840000071
together with the standard deviation C:
Figure BDA0002626384840000072
which gives the normalized value $x_{ik}'$ of the raw data:
Figure BDA0002626384840000073
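The formula images are not reproduced in this text, so the exact normalization cannot be confirmed. The sketch below assumes the common reading of "mean, standard deviation, normalized value", namely a column-wise z-score using the sample standard deviation (divisor n - 1). Both the divisor and the function name `standardize` are assumptions, and the feature values are made up. Note that a z-score alone does not confine values to [0,1]; any further rescaling to that interval is not shown in the source.

```python
import math


def standardize(rows):
    """Column-wise z-score: (x_ik - mean_k) / std_k, with sample standard deviation."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(dims)]
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / (n - 1))
            for k in range(dims)]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(r[k] - means[k]) / stds[k] if stds[k] else 0.0 for k in range(dims)]
            for r in rows]


# Made-up rows: two features on very different scales.
rows = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
print(standardize(rows))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After standardization both columns carry equal weight in distance calculations, which is the stated purpose of bringing the features to the same order of magnitude.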
The curve calculated by the bending moment method, programmed in Matlab, is shown in FIG. 1.
The curve calculated by the concave-convex coefficient method, programmed in Matlab, is shown in FIG. 2.
It can be seen that the concave-convex coefficient reaches its maximum at N = 3, which indicates an optimal number of clusters of 3. It is worth noting that, in the relationship between N and the bending moment coefficient, the coefficient is still large at N = 3 but the curve has already bent markedly there; the bending moment method therefore does not contradict the concave-convex coefficient method, and 3 is a reasonable number of clusters.
Three features are extracted from each distribution network record, namely distribution transformer rated capacity, monthly average load, and monthly maximum load, corresponding to a three-dimensional space in the algorithm. The 950 sample values and the parameters of the three classes of distribution network data obtained by clustering are shown in Table 1 below.
TABLE 1
Figure BDA0002626384840000074
FIG. 3 is a three-dimensional distribution diagram of the combined distribution network data. The clusters containing the three classes of data can be clearly distinguished, while a small number of samples lie far from their cluster centers.
Using the outlier identification standard of the upper and lower critical graph, the outlier rejection threshold of the sample data is calculated to be 3.6288. The distance between each sample and its cluster center, obtained by the particle swarm clustering algorithm, is the basis for outlier rejection: a sample whose distance exceeds the rejection threshold is diagnosed as an outlier sample, as shown in FIG. 4.
As FIG. 4 shows, 12 outlier samples can be removed, and the remaining samples are the denoised sample data. With sorting accuracy = (number of accurately sorted samples / total number of samples) × 100%, the overall sorting accuracy is 98.75%.
Determining the number of clusters is an important and difficult problem in most clustering algorithms, and the particle swarm clustering algorithm is no exception. The existing practice of setting the number of clusters from prior knowledge has a major drawback. If the optimal number of clusters is not obtained through the bending moment and concave-convex coefficient analysis, and the number of clusters is instead set to 2 or 4 from prior knowledge, the numbers of removed outlier samples are 34 and 47 respectively, and the sorting accuracy is 96.47% and 95.11%, clearly lower than that of the invention. Compared with the prior art, the method therefore offers good denoising, high accuracy, and high effectiveness.
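The reported accuracies can be checked with a short calculation. It assumes that the "accurately sorted" samples are those retained after outlier removal; the patent does not state this explicitly, but the reading reproduces all three reported figures from the 962 samples. The function name `sorting_accuracy` is hypothetical.

```python
TOTAL_SAMPLES = 962  # total number of samples stated in the example


def sorting_accuracy(removed, total=TOTAL_SAMPLES):
    """Percentage of samples retained after removing the outliers."""
    return (total - removed) / total * 100


# (number of clusters, outliers removed) as reported in the example
for n_clusters, removed in [(3, 12), (2, 34), (4, 47)]:
    print(n_clusters, round(sorting_accuracy(removed), 2))
# 3 98.75
# 2 96.47
# 4 95.11
```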
According to the technical scheme, the normalized power distribution network data is subjected to the clustering number analysis of the bending moment method and the concave-convex coefficient method, forming a clustering number selection mechanism in which the bending moment method is primary and the concave-convex coefficient method is auxiliary. This yields the optimal clustering number of the samples and the denoised sample data, which facilitates the prediction of the potential rules of power distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extreme values, retains the global optimization capability of the particle swarm algorithm, and at the same time benefits from the faster convergence of the bending moment method and the concave-convex coefficient algorithm. It therefore has the advantages of good denoising effect, high accuracy and high effectiveness.
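The selection mechanism can be sketched in Python using the decision rule of claim 5: the concave-convex coefficient proposes a clustering number, which is kept unless the bending moment method contradicts it, in which case the bending moment result is used. The source does not define how the bend is located or what counts as a contradiction, so two assumptions are made here: the bend is the N with the largest drop in slope of the error curve, and a contradiction means the bend exists and differs from the coefficient's proposal. All names and numbers are hypothetical.

```python
def bend_candidate(errors, min_bend=0.0):
    """Return the N at the sharpest bend of the error curve, or None.

    errors maps consecutive cluster numbers N to clustering errors.
    The bend strength at N is the decrease before N minus the decrease after N.
    """
    ns = sorted(errors)
    best_n, best_bend = None, min_bend
    for prev, cur, nxt in zip(ns, ns[1:], ns[2:]):
        bend = (errors[prev] - errors[cur]) - (errors[cur] - errors[nxt])
        if bend > best_bend:
            best_n, best_bend = cur, bend
    return best_n

def select_cluster_number(errors, coefficients):
    """Combine both criteria; the bending moment result wins on conflict."""
    by_coefficient = max(coefficients, key=coefficients.get)
    by_bend = bend_candidate(errors)
    if by_bend is None or by_bend == by_coefficient:
        return by_coefficient  # supported or not contradicted
    return by_bend             # contradicted: bending moment method decides

# Hypothetical analysis results for N = 1..5.
errors = {1: 900.0, 2: 400.0, 3: 60.0, 4: 50.0, 5: 45.0}
agreeing = {2: 0.55, 3: 0.81, 4: 0.62, 5: 0.40}
conflicting = {2: 0.83, 3: 0.81, 4: 0.62, 5: 0.40}
print(select_cluster_number(errors, agreeing))
print(select_cluster_number(errors, conflicting))
```

In both hypothetical cases the result is 3: in the first the two criteria agree, and in the second the bend at N = 3 overrides the coefficient's proposal of N = 2.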
The method can be widely applied to the field of reliability prediction of the power distribution network.

Claims (9)

1. A power distribution network data preprocessing method based on a particle swarm principle and adopting big data clustering is characterized by comprising the following steps of:
1) Analyzing the size and frequency of faults of the distribution network in the past year, finding out the type of data mining, and acquiring data according to data sources of a distribution line online monitoring system, an intelligent public distribution transformer monitoring system and the like;
2) Constructing features from the data sources, carrying out normalization processing, and combining the data from the multiple sources into a consistent database for storage;
3) Performing the clustering number analysis of a bending moment method and a concave-convex coefficient method on the current particle population after the previous step to obtain the optimal clustering number of the sample;
4) Dividing the data samples into a plurality of categories according to the number of clusters, calculating the distance between the data samples and a cluster center by adopting a particle swarm algorithm, and optimizing the cluster center;
5) After clustering analysis is carried out on the samples, a diagnosis threshold value is defined by adopting the abnormal value identification standard of the upper and lower critical graphs; if the distance between a sample and its clustering center is greater than the diagnosis threshold value, the sample is judged to be an outlier sample and is removed, so as to obtain the denoised sample data;
6) And predicting the potential rule of the power distribution network fault by adopting the sample data subjected to the denoising processing.
2. The method for preprocessing power distribution network data based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the normalized power distribution network data is subjected to the clustering number analysis of the bending moment method and the concave-convex coefficient method, and a clustering number selection mechanism in which the bending moment method is primary and the concave-convex coefficient method is auxiliary is adopted to obtain the optimal clustering number of the samples.
3. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the bending moment method comprises the following steps:
letting the clustering number N take values from 1 up to the upper limit of clusters suitable for the power distribution network;
clustering for each value of N and recording the clustering error of all corresponding samples, which reflects the quality of the clustering result;
then drawing a graph of the clustering error of all samples against N;
finally, selecting the value of N at the bend (corner) of the curve as the optimal clustering number;
the bending moment coefficient (BMC) is calculated as follows:
Figure FDA0002626384830000011
wherein N is the clustering number, O_b is a sample object in the i-th cluster, and C_i is the cluster center of the i-th cluster.
4. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering according to claim 1, wherein the concave-convex coefficient method comprises the following steps:
calculating the concave-convex coefficient of every sample according to the distances between samples within and between clusters, and then averaging to obtain the average concave-convex coefficient;
the value range of the average concave-convex coefficient is [-1, 1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient is, and the clustering number N with the largest average concave-convex coefficient is taken as the optimal clustering number;
the concave-convex coefficient of a sample point D_i is calculated as follows:
Figure FDA0002626384830000021
wherein k_i is the average distance from point i to all other points in its own cluster, and l_i is the minimum distance from point i to any cluster that does not contain it;
the nearest cluster is defined as:
Figure FDA0002626384830000022
wherein S is a sample in cluster R_N; the average distance from D_i to all samples of a cluster is used as the measure of the distance from the point to that cluster, and the cluster closest to D_i is selected as the nearest cluster.
5. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein a candidate optimal value of N is first determined according to the concave-convex coefficient method; if the conclusion of the BMC supports, or does not contradict, the concave-convex coefficient method, the optimal value of N is taken directly from the concave-convex coefficient method;
if the conclusion of the BMC contradicts the concave-convex coefficient method, the conclusion of the BMC is taken as the optimal value of N.
6. The power distribution network data preprocessing method based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the data samples are divided into N categories according to the obtained clustering number, and the distance between the data samples and the clustering centers is calculated; a local extreme value and a global extreme value are found according to the position of each particle; and the positions of the particles are continuously updated toward the particle swarm optimization solution, thereby optimizing the clustering centers.
7. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein, after the clustering analysis is performed on the samples, the diagnosis threshold is defined using the abnormal value identification standard of the upper and lower critical graphs, an abnormal value generally being defined as a value smaller than D_L - 1.5 GAP or larger than D_H + 1.5 GAP;
wherein D_L is called the lower quartile, and one quarter of all sample values are smaller than it;
D_H is called the upper quartile, and one quarter of all sample values are larger than it;
GAP is called the interquartile range, i.e. the difference between the upper quartile D_H and the lower quartile D_L, an interval that contains half of all observed values.
8. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting the big data clustering as claimed in claim 7, wherein if the distance between the sample and the clustering center is greater than a diagnosis threshold value, the sample is diagnosed as an outlier sample and is removed.
9. The method for preprocessing the data of the power distribution network based on the particle swarm principle by adopting big data clustering as claimed in claim 1, wherein the method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extreme values, retains the global optimization capability of the particle swarm algorithm, and benefits from the faster convergence of the bending moment method and the concave-convex coefficient algorithm.
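The cluster-center optimization of claim 6 can be illustrated with a minimal particle swarm sketch in Python. The claims do not give the update equations or parameters, so this uses the textbook velocity and position update with assumed values (inertia w = 0.7, acceleration factors c1 = c2 = 1.5), and an assumed fitness equal to the sum of squared distances from each sample to its nearest center. Each particle encodes one candidate set of N centers. All names and data are hypothetical.

```python
import random

def fitness(centers, points):
    """Assumed fitness: squared distance of each sample to its nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)

def pso_cluster_centers(points, n_clusters, n_particles=20, iterations=60,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of centers, started on random samples.
    positions = [[list(rng.choice(points)) for _ in range(n_clusters)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(n_clusters)]
                  for _ in range(n_particles)]
    personal_best = [[c[:] for c in pos] for pos in positions]
    personal_score = [fitness(pos, points) for pos in positions]
    best = min(range(n_particles), key=personal_score.__getitem__)
    global_best = [c[:] for c in personal_best[best]]
    global_score = personal_score[best]

    for _ in range(iterations):
        for i in range(n_particles):
            for k in range(n_clusters):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    x = positions[i][k][d]
                    velocities[i][k][d] = (
                        w * velocities[i][k][d]
                        + c1 * r1 * (personal_best[i][k][d] - x)  # local extreme
                        + c2 * r2 * (global_best[k][d] - x))      # global extreme
                    positions[i][k][d] = x + velocities[i][k][d]
            score = fitness(positions[i], points)
            if score < personal_score[i]:
                personal_score[i] = score
                personal_best[i] = [c[:] for c in positions[i]]
                if score < global_score:
                    global_score = score
                    global_best = [c[:] for c in positions[i]]
    return global_best, global_score

# Hypothetical data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centers, score = pso_cluster_centers(points, n_clusters=3)
print(centers, score)
```

The global best can only improve from one iteration to the next, so the returned score is never worse than that of the best initial particle; how close it comes to the optimum depends on the assumed parameters and the random seed.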
CN202010798057.6A 2020-08-10 2020-08-10 Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering Active CN112069633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010798057.6A CN112069633B (en) 2020-08-10 2020-08-10 Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010798057.6A CN112069633B (en) 2020-08-10 2020-08-10 Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering

Publications (2)

Publication Number Publication Date
CN112069633A CN112069633A (en) 2020-12-11
CN112069633B true CN112069633B (en) 2023-04-07

Family

ID=73661048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010798057.6A Active CN112069633B (en) 2020-08-10 2020-08-10 Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering

Country Status (1)

Country Link
CN (1) CN112069633B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592005B (en) * 2021-08-04 2024-03-08 中冶赛迪信息技术(重庆)有限公司 Converter tapping parameter recommendation method, system, medium and terminal

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104299035A (en) * 2014-09-29 2015-01-21 国家电网公司 Method for diagnosing fault of transformer on basis of clustering algorithm and neural network
CN109902953A (en) * 2019-02-27 2019-06-18 华北电力大学 A kind of classification of power customers method based on adaptive population cluster
CN110750524A (en) * 2019-09-12 2020-02-04 中国电力科学研究院有限公司 Method and system for determining fault characteristics of active power distribution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
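Particle swarm optimization weighted fuzzy clustering method for steam turbine fault diagnosis; 陈平 et al.; 《振动.测试与诊断》; 2011-10-15 (No. 05); full text *
Particle swarm optimization fuzzy clustering algorithm for power systems and its application; 何晓峰 et al.; 《继电器》; 2007-11-16 (No. 22); full text *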

Also Published As

Publication number Publication date
CN112069633A (en) 2020-12-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant