CN112069633B - Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering - Google Patents
- Publication number
- CN112069633B (application CN202010798057.6A)
- Authority
- CN
- China
- Prior art keywords
- clustering
- data
- distribution network
- power distribution
- concave
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/18—Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/25—Design optimisation, verification or simulation using particle-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/04—Power grid distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/02—Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
A power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, belonging to the field of power distribution network reliability prediction. For the normalized power distribution network data, a cluster number selection mechanism, with the bending moment method as the primary criterion and the concave-convex coefficient method as the auxiliary criterion, is used to obtain the optimal number of clusters for the samples. After cluster analysis of the samples, a diagnosis threshold is defined using the outlier identification criterion of the upper and lower critical values of a box plot; if the distance between a sample and its cluster center exceeds the diagnosis threshold, the sample is judged to be an outlier and removed, yielding "denoised" sample data. The denoised sample data are then used to predict the underlying patterns of power distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm; it offers good denoising, high sorting accuracy and high effectiveness.
Description
Technical Field
The invention belongs to the field of reliability prediction of a power distribution network, and particularly relates to a power distribution network data preprocessing method based on a particle swarm principle and adopting big data clustering.
Background
In recent years, power grid companies have stepped up the automation upgrading of distribution networks and deepened the deployment of distribution automation systems. These systems provide remote signaling, remote measurement and remote control of the switches on distribution main lines and some branch lines. Based on line fault signals and the protection signals of automatic switches, they automatically identify the faulted section and either send an alert to dispatchers or complete fault isolation and power restoration automatically, thereby improving power supply reliability and quality.
The power system is a unified whole for the production, transmission, distribution and consumption of electric energy, so any fault in it can affect users. Statistics show that more than 80% of the outages experienced by users are caused by faults in the distribution network.
Reducing distribution network faults and improving distribution network reliability therefore play an important role in ensuring users' power quality and experience and in supporting the healthy social and economic development of power companies.
When distribution network faults occur, dispatchers cannot promptly detect and handle fault conditions on non-automated lines, nor can they screen out erroneous remote signaling and remote measurement information and handle it correctly.
However, with the large-scale integration of distributed generation and users' ever-rising requirements for supply reliability, the existing technical support means find it increasingly difficult to provide the control over the distribution network, especially over branch lines, that dispatching operations require.
First, in the current power dispatching system only a small amount of operating data can be exported and processed manually, so data utilization is low. Big data and cloud computing technologies are urgently needed to collect, transmit, analyze and process the data intensively so that they can effectively support distribution network dispatching decisions. Improving the automatic dispatching system and extending its application functions to mine the useful information contained in massive data, so as to monitor distribution network faults and check data accuracy, would improve supply quality and reliability at low cost and by simple means; this is an effective and feasible approach for power supply enterprises.
Second, in the power industry the application of big data analysis to fault monitoring is still at an early stage, and no broadly applicable technical pattern has yet formed. The massive data continuously accumulated by distribution network information systems create the conditions for researching more advanced, predictive techniques for improving distribution network reliability. At the same time, as the relationships between distribution network faults and their influencing factors are uncovered, the earlier belief that distribution network faults cannot be predicted is changing.
However, big data commonly contain noise or outliers. Because the database is large and most likely drawn from multiple heterogeneous data sources, the distribution network database is highly susceptible to noise, missing data and inconsistent data, and such anomalous bad data lead to poor-quality mining results.
Data cleaning can remove noise from the data and correct incomplete and inconsistent bad data. Detecting data anomalies, correcting the data as early as possible and reducing the amount of data to be analyzed yield a high return in decision making and prevent outlier samples from adversely affecting the prediction model.
Determining the number of clusters is an important and difficult problem in most clustering algorithms, and particle swarm clustering is no exception. Prior-art approaches that set the cluster number from prior knowledge have serious shortcomings: errors or gaps in that prior knowledge directly undermine the validity of the chosen cluster number.
In view of this, establishing a power distribution network data preprocessing method, based on the particle swarm principle, that classifies data accurately is a problem urgently to be solved in practice.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering. To address the fact that the bending moment method and the concave-convex coefficient method can reach inconsistent conclusions when selecting the optimal cluster number N, the particle swarm algorithm is combined with the bending moment method and the concave-convex coefficient algorithm, a cluster number selection mechanism with the bending moment method as primary and the concave-convex coefficient method as auxiliary is adopted, and optimized power distribution network data with outliers removed are output. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema and retains the global optimization of the particle swarm algorithm while keeping the fast convergence of the bending moment method and the concave-convex coefficient algorithm. It thus overcomes the shortcomings of the prior art and offers good denoising, high sorting accuracy and high effectiveness.
The technical scheme of the invention is as follows: the power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering is characterized by comprising the following steps:
1) Analyze the magnitude and frequency of distribution network faults over the past year, determine the type of data to be mined, and acquire data from data sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system;
2) Apply feature construction to the data sources, normalize the data, and merge the data from the multiple sources into a consistent database for storage;
3) Perform cluster number analysis with the bending moment method and the concave-convex coefficient method on the current particle population obtained from the previous step, to obtain the optimal cluster number of the samples;
4) Divide the data samples into categories according to the cluster number, calculate the distances between the data samples and the cluster centers using the particle swarm algorithm, and optimize the cluster centers;
5) After cluster analysis of the samples, define a diagnosis threshold using the outlier identification criterion of the upper and lower critical values of a box plot; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judge the sample to be an outlier and remove it, thereby obtaining "denoised" sample data;
6) Use the denoised sample data to predict the underlying patterns of power distribution network faults.
In this preprocessing method, the normalized power distribution network data are subjected to cluster number analysis by the bending moment method and the concave-convex coefficient method, and the optimal cluster number of the samples is obtained with a cluster number selection mechanism in which the bending moment method is primary and the concave-convex coefficient method is auxiliary.
Specifically, the bending moment method comprises the following steps:
let the cluster number N take values from 1 up to the upper limit of clusters suitable for the distribution network;
cluster the samples for each value of N and record the clustering error over all samples, which measures the quality of the clustering;
then plot the clustering error of all samples against N;
finally, select the value of N at the bend (elbow) of the curve as the optimal cluster number;
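the bending moment coefficient is calculated as follows:
$$\mathrm{BMC} = \sum_{i=1}^{N} \sum_{O_b \in R_i} \lVert O_b - C_i \rVert^{2}$$
wherein $N$ is the number of clusters, $R_i$ is the $i$-th cluster, $O_b$ is a sample object in cluster $R_i$, and $C_i$ is the center of cluster $R_i$.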
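The bending moment method as described corresponds to what is commonly called the elbow method. The Python sketch below, assuming Euclidean distance and that BMC is the sum of squared errors plotted in FIG. 1, computes BMC for one clustering and picks the bend of a BMC-versus-N curve. The helper names `bmc` and `elbow_n` are hypothetical, and locating the bend as the largest second difference is an assumption; the text only says to take the N at the bend of the curve.

```python
from __future__ import annotations

from typing import Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmc(samples: list[Point], labels: list[int], centers: list[Point]) -> float:
    """Bending moment coefficient taken as the sum of squared errors:
    each sample's squared distance to the center of the cluster it is assigned to."""
    return sum(sq_dist(x, centers[c]) for x, c in zip(samples, labels))


def elbow_n(curve: dict[int, float]) -> int:
    """Hypothetical elbow rule: return the N with the largest second difference
    BMC(N-1) - 2*BMC(N) + BMC(N+1). Assumes the keys are consecutive integers."""
    ns = sorted(curve)
    interior = ns[1:-1]  # the first and last N lack a neighbour on one side
    return max(interior, key=lambda n: curve[n - 1] - 2 * curve[n] + curve[n + 1])


# Usage with made-up numbers (not data from the patent):
curve = {1: 100.0, 2: 70.0, 3: 20.0, 4: 17.0, 5: 15.0}
print(elbow_n(curve))  # 3
```

In practice `curve[N]` would be filled by clustering the samples once per N (1 up to an upper limit such as 12) and calling `bmc` on each result.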
Specifically, the concave-convex coefficient method comprises:
calculating the concave-convex coefficient of every sample from the distances between samples, then averaging them to obtain the average concave-convex coefficient;
the average concave-convex coefficient lies in [-1, 1]; the farther apart the clusters are, the larger it is, and the N with the largest average concave-convex coefficient is taken as the optimal cluster number;
for a sample point $D_i$, the concave-convex coefficient is calculated as:
$$s(D_i) = \frac{l_i - k_i}{\max\{k_i,\ l_i\}}$$
wherein $k_i$ is the average distance from point $D_i$ to all other points in its own cluster, and $l_i$ is the average distance from $D_i$ to all points of its nearest cluster, i.e. the closest cluster to which $D_i$ does not belong;
the nearest cluster is defined by:
$$l_i = \min_{R_N:\ D_i \notin R_N} \frac{1}{\lvert R_N \rvert} \sum_{S \in R_N} \lVert D_i - S \rVert$$
wherein $S$ is a sample of cluster $R_N$; the average distance from $D_i$ to all samples of a cluster is used as the distance from the point to that cluster, and the cluster closest to $D_i$ is taken as the nearest cluster.
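The concave-convex coefficient described here has the same form as the standard silhouette coefficient. The sketch below is a minimal illustration, assuming Euclidean distance and integer cluster labels with at least two clusters. `cc_coefficient` is a hypothetical name, and scoring a sample in a singleton cluster as 0 is a common convention rather than something the text specifies.

```python
import math
from collections import defaultdict


def cc_coefficient(samples, labels):
    """Average concave-convex (silhouette-style) coefficient over all samples.

    k_i: mean distance from sample i to the other members of its own cluster.
    l_i: mean distance from sample i to its nearest other cluster, where the
         distance to a cluster is the mean distance to all of its members.
    """
    clusters = defaultdict(list)
    for x, c in zip(samples, labels):
        clusters[c].append(x)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for i, (x, c) in enumerate(zip(samples, labels)):
        own = [q for j, (q, cj) in enumerate(zip(samples, labels)) if cj == c and j != i]
        if not own:  # singleton cluster: scored 0 by convention (assumption)
            scores.append(0.0)
            continue
        k_i = sum(math.dist(x, q) for q in own) / len(own)
        l_i = min(
            sum(math.dist(x, q) for q in pts) / len(pts)
            for other, pts in clusters.items()
            if other != c
        )
        denom = max(k_i, l_i)
        scores.append((l_i - k_i) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated groups give a value close to 1;
# a labelling that mixes the groups gives a negative value.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(cc_coefficient(pts, [0, 0, 1, 1]), 3))  # 0.9
print(cc_coefficient(pts, [0, 1, 0, 1]) < 0)        # True
```

Evaluating `cc_coefficient` for each candidate N (from 2 upward, since the coefficient needs two clusters) and keeping the N with the largest value gives the concave-convex estimate of the optimal cluster number.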
Further, the optimal N value is first determined by the concave-convex coefficient method. If the conclusion of the bending moment coefficient (BMC) supports or does not contradict it, the optimal N value is taken directly from the concave-convex coefficient method; if the BMC conclusion contradicts the concave-convex coefficient method, the BMC conclusion is taken as the optimal N value.
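The selection rule can be sketched as follows. The text does not say how to decide whether the bending moment curve "supports or does not contradict" the concave-convex result, so the hypothetical `bends_at` test below, which treats the curve as bent at N when the drop into N is at least `ratio` times the drop out of N, is an assumption, as is the default `ratio=2.0`. The fallback uses a second-difference elbow rule, which is also an assumption.

```python
def elbow_n(curve):
    """Hypothetical elbow rule: N with the largest second difference of the BMC curve."""
    ns = sorted(curve)
    return max(ns[1:-1], key=lambda n: curve[n - 1] - 2 * curve[n] + curve[n + 1])


def bends_at(curve, n, ratio=2.0):
    """Hypothetical test: is the BMC curve visibly bent at n?"""
    if n - 1 not in curve or n + 1 not in curve:
        return False  # a bend cannot be confirmed at the ends of the curve
    drop_in = curve[n - 1] - curve[n]
    drop_out = curve[n] - curve[n + 1]
    return drop_in >= ratio * drop_out


def select_n(cc_scores, bmc_curve, ratio=2.0):
    """Concave-convex coefficient proposes N; the bending moment curve can veto it."""
    n_cc = max(cc_scores, key=cc_scores.get)
    if bends_at(bmc_curve, n_cc, ratio):
        return n_cc  # BMC supports or does not contradict the proposal
    return elbow_n(bmc_curve)  # BMC contradicts it: use the BMC answer instead


bmc_curve = {1: 100.0, 2: 70.0, 3: 20.0, 4: 17.0, 5: 15.0}
print(select_n({2: 0.5, 3: 0.7, 4: 0.6}, bmc_curve))  # 3 (both methods agree)
print(select_n({2: 0.5, 3: 0.4, 4: 0.8}, bmc_curve))  # 3 (BMC overrides N = 4)
```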
Further, the data samples are divided into N categories according to the obtained cluster number, and the distances between the data samples and the cluster centers are calculated; each particle's individual (local) best and the swarm's global best are found from the particles' positions; and the particle positions are updated continuously toward the particle swarm optimum, thereby optimizing the cluster centers.
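The center optimization can be illustrated with a basic global-best particle swarm optimizer in which each particle encodes a full set of N cluster centers and its fitness is the sum of squared distances from the samples to their nearest center. This is a generic PSO clustering sketch assuming NumPy, not the patent's own code; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size and the iteration count are illustrative assumptions, since the text does not specify them.

```python
import numpy as np


def sse(X, centers):
    """Sum of squared distances from each sample to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.min(axis=1).sum()


def pso_cluster(X, n_clusters, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Optimise cluster centers with a global-best PSO; return (centers, labels)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each particle is a (n_clusters, dim) array of centers, initialised at random samples.
    pos = np.stack([X[rng.choice(n, n_clusters, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)

    pbest = pos.copy()                               # each particle's best position
    pbest_f = np.array([sse(X, p) for p in pos])     # and its fitness
    g = int(pbest_f.argmin())
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]     # the swarm's global best

    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([sse(X, p) for p in pos])
        better = f < pbest_f
        pbest[better] = pos[better]
        pbest_f[better] = f[better]
        g = int(pbest_f.argmin())
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    # Assign each sample to its nearest optimised center.
    labels = ((X[:, None, :] - gbest[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return gbest, labels


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = pso_cluster(X, n_clusters=2)
```

Because the global best is only replaced when fitness improves, the returned centers are never worse than the best random initialization; the distances from each sample to its assigned center are what the outlier step below operates on.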
Further, after cluster analysis of the samples, the diagnosis threshold is defined using the outlier identification criterion of the upper and lower critical values of a box plot; outliers are generally defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$;
wherein $D_L$ is called the lower quartile, meaning that one quarter of all sample values are smaller than the lower quartile;
$D_H$ is called the upper quartile, meaning that one quarter of all sample values are greater than the upper quartile;
GAP is called the interquartile range, the difference $D_H - D_L$ between the upper and lower quartiles, an interval that contains the middle half of all observations.
If the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier and removed.
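The diagnosis threshold can be sketched with the standard box-plot rule. Because sample-to-center distances are non-negative and only unusually large distances mark outliers, this sketch uses only the upper critical value D_H + 1.5 GAP as the threshold. It assumes quartiles computed with Python's `statistics.quantiles(..., method="inclusive")`, which is one of several common quartile conventions, so results may differ slightly from other tools; `diagnosis_threshold` and `remove_outliers` are hypothetical names.

```python
import statistics


def diagnosis_threshold(distances):
    """Upper box-plot critical value D_H + 1.5 * GAP of the distances."""
    d_l, _, d_h = statistics.quantiles(distances, n=4, method="inclusive")
    gap = d_h - d_l  # interquartile range
    return d_h + 1.5 * gap


def remove_outliers(samples, distances):
    """Split samples into kept and removed lists using the diagnosis threshold."""
    t = diagnosis_threshold(distances)
    kept = [s for s, d in zip(samples, distances) if d <= t]
    removed = [s for s, d in zip(samples, distances) if d > t]
    return kept, removed, t


dists = [1, 2, 3, 4, 5, 6, 7, 8, 100]
kept, removed, t = remove_outliers(list("abcdefghi"), dists)
print(t, removed)  # 13.0 ['i']
```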
The power distribution network data preprocessing method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm.
Compared with the prior art, the invention has the advantages that:
1. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global search capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm;
2. A cluster number selection mechanism with the bending moment method as primary and the concave-convex coefficient method as auxiliary is adopted and optimized distribution network data with outliers removed are output, which avoids the problem that the bending moment method and the concave-convex coefficient method can reach inconsistent conclusions when selecting the optimal cluster number N;
3. The technical scheme of the invention offers good denoising, high sorting accuracy and high effectiveness.
Drawings
FIG. 1 is a schematic diagram of the sum of squared errors versus the number of clusters N;
FIG. 2 is a schematic diagram showing the relationship between the concave-convex coefficient and the number of clusters N;
FIG. 3 is a three-dimensional distribution diagram of distribution transformer rated capacity, monthly average load and monthly maximum load samples;
FIG. 4 is a box plot for identifying outlier samples in the power distribution network data;
FIG. 5 is a flow chart of the power distribution network data preprocessing method according to the invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIG. 5, the technical solution of the invention provides a power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, which comprises the following steps:
1) Analyze the magnitude and frequency of distribution network faults over the past year, determine the type of data to be mined, and acquire data from data sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system;
2) Apply feature construction to the data sources, normalize the data, and merge the data from the multiple sources into a consistent database for storage;
3) Perform cluster number analysis with the bending moment method and the concave-convex coefficient method on the current particle population obtained after step 2), to obtain the optimal cluster number of the samples;
4) Divide the data samples into categories according to the result of step 3), calculate the distances between the data samples and the cluster centers using the particle swarm algorithm, and optimize the cluster centers;
5) After cluster analysis of the samples, define a diagnosis threshold using the outlier identification criterion of the upper and lower critical values of a box plot; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judge the sample to be an outlier and remove it. This yields "denoised" sample data, which facilitates predicting the underlying patterns of distribution network faults;
6) Use the denoised sample data to predict the underlying patterns of power distribution network faults.
According to the technical scheme, the normalized power distribution network data are subjected to cluster number analysis by the bending moment method and the concave-convex coefficient method, and a cluster number selection mechanism with the bending moment method as primary and the concave-convex coefficient method as auxiliary is adopted to obtain the optimal cluster number of the samples.
The bending moment method comprises the following specific steps:
let the cluster number N take values from 1 up to the upper limit of clusters suitable for the distribution network, such as 12; cluster the samples for each value of N and record the clustering error over all samples, which measures the quality of the clustering; plot the clustering error of all samples against N; and finally select the value of N at the bend (elbow) of the curve as the optimal cluster number.
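The bending moment coefficient is calculated as follows:
$$\mathrm{BMC} = \sum_{i=1}^{N} \sum_{O_b \in R_i} \lVert O_b - C_i \rVert^{2}$$
wherein $N$ is the number of clusters, $R_i$ is the $i$-th cluster, $O_b$ is a sample object in cluster $R_i$, and $C_i$ is the center of cluster $R_i$.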
The concave-convex coefficient method comprises the following specific steps:
The concave-convex coefficients of all samples are calculated from the distances between samples and then averaged to obtain the average concave-convex coefficient. The average concave-convex coefficient lies in [-1, 1]; the farther apart the clusters are, the larger it is, and the N with the largest average concave-convex coefficient is taken as the optimal cluster number.
For a sample point $D_i$, the concave-convex coefficient is calculated as:
$$s(D_i) = \frac{l_i - k_i}{\max\{k_i,\ l_i\}}$$
wherein $k_i$ is the average distance from point $D_i$ to all other points in its own cluster, and $l_i$ is the average distance from $D_i$ to all points of its nearest cluster, i.e. the closest cluster to which $D_i$ does not belong.
The nearest cluster is defined by:
$$l_i = \min_{R_N:\ D_i \notin R_N} \frac{1}{\lvert R_N \rvert} \sum_{S \in R_N} \lVert D_i - S \rVert$$
wherein $S$ is a sample of cluster $R_N$; the average distance from $D_i$ to all samples of a cluster is used as the distance from the point to that cluster, and the cluster closest to $D_i$ is taken as the nearest cluster.
The optimal N value is first determined by the concave-convex coefficient method; if the conclusion of the bending moment coefficient (BMC) supports or does not contradict it, the optimal N value is taken directly from the concave-convex coefficient method. If the BMC conclusion contradicts the concave-convex coefficient method, the BMC conclusion is taken as the optimal N value.
The data samples are divided into N categories according to the optimal N value determined by the bending moment method and the concave-convex coefficient method, and the distances between the data samples and the cluster centers are calculated; each particle's individual (local) best and the swarm's global best are found from the particles' positions; and the particle positions are updated continuously toward the particle swarm optimum, thereby optimizing the cluster centers.
After clustering the samples, the diagnosis threshold is defined using the outlier identification criterion of the upper and lower critical values of a box plot; outliers are generally defined as values less than $D_L - 1.5\,\mathrm{GAP}$ or greater than $D_H + 1.5\,\mathrm{GAP}$.
Here $D_L$ is called the lower quartile, meaning that one quarter of all sample values are smaller than the lower quartile; $D_H$ is called the upper quartile, meaning that one quarter of all sample values are greater than the upper quartile; GAP is called the interquartile range, the difference $D_H - D_L$ between the upper and lower quartiles, an interval that contains the middle half of all observations. If the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier and removed.
Example:
In this method, data are randomly sampled from data sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system, giving 962 samples in total. The raw data must be preprocessed so that they lie in the same interval [0, 1] and are of the same order of magnitude, i.e. normalized, and the data from the multiple sources are merged into a consistent database for storage.
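First, the mean $\bar{x}_k$ and the standard deviation $C_k$ of the $k$-th dimension over the $n$ samples are calculated, with
$$\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_{ik},$$
which gives the normalized value $x'_{ik}$ of the raw data:
$$x'_{ik} = \frac{x_{ik} - \bar{x}_k}{C_k}$$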
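The per-dimension standardization with the mean and the standard deviation C can be sketched as follows. The sketch assumes the population standard deviation (`statistics.pstdev`); the text does not say whether the population or sample form is meant. This form centers each feature at 0 rather than mapping it into [0, 1]. `standardize` is a hypothetical name, and mapping constant columns to 0 to avoid division by zero is an assumption.

```python
import statistics


def standardize(rows):
    """Return x'_ik = (x_ik - mean_k) / C_k for every sample i and dimension k."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # population std dev (assumption)
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


print(standardize([[1, 10], [3, 30]]))  # [[-1.0, -1.0], [1.0, 1.0]]
```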
The bending moment curve calculated with a Matlab program is shown in FIG. 1.
The concave-convex coefficient curve calculated with a Matlab program is shown in FIG. 2.
It can be seen that the concave-convex coefficient reaches its maximum at N = 3, indicating that the optimal cluster number is 3. Notably, the relationship between N and the bending moment coefficient shows that at N = 3, although the coefficient is still large, the curve has already bent significantly; the bending moment method therefore does not contradict the concave-convex coefficient method, and 3 is a reasonable cluster number.
Three features are extracted from each distribution network record, namely distribution transformer rated capacity, monthly average load and monthly maximum load. The 950 sample values of the three types of distribution network data separated by clustering in the corresponding three-dimensional space, together with their parameters, are shown in Table 1 below.
TABLE 1
FIG. 3 is a three-dimensional distribution diagram of the mixed distribution network data; the clusters of the three types of data can be clearly distinguished, with a small number of samples deviating from their cluster centers.
Using the outlier identification criterion of the upper and lower critical values of a box plot, the outlier rejection threshold for the sample data is calculated to be 3.6288. The distance between each sample and its cluster center, obtained by the particle swarm clustering algorithm, is used as the basis for rejecting outliers: if the distance is greater than the rejection threshold, the sample is diagnosed as an outlier, as shown in FIG. 4.
As FIG. 4 shows, 12 outlier samples are removed, and the remaining samples form the denoised sample data. With sorting accuracy = (number of correctly sorted samples / total number of samples) × 100%, the overall sorting accuracy is 98.75%.
Determining the number of clusters is an important and difficult problem in most clustering algorithms, and particle swarm clustering is no exception. The existing approach of setting the cluster number from prior knowledge has serious shortcomings: if the optimal cluster number is not obtained through the bending moment and concave-convex coefficient analysis but is instead set to 2 or 4 from prior knowledge, 34 and 47 outlier samples are removed respectively, with sorting accuracies of 96.47% and 95.11%, both clearly lower than the sorting accuracy of the invention. Compared with the prior art, the method therefore offers good denoising, high accuracy and high effectiveness.
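The reported accuracies are consistent with counting every retained sample as correctly sorted, i.e. accuracy = (962 - removed) / 962 × 100%: 12 removed samples give 98.75%, and 34 and 47 removed samples give 96.47% and 95.11%. The small check below reproduces those figures; reading "correctly sorted" as "retained" is an inference from the numbers, not a definition stated in the text, and `sorting_accuracy` is a hypothetical name.

```python
def sorting_accuracy(total, removed):
    """Percentage of samples kept after outlier removal, rounded to two decimals."""
    return round((total - removed) / total * 100, 2)


for n_clusters, removed in [(3, 12), (2, 34), (4, 47)]:
    print(n_clusters, sorting_accuracy(962, removed))
# 3 98.75
# 2 96.47
# 4 95.11
```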
According to the technical scheme, cluster number analysis by the bending moment method and the concave-convex coefficient method is applied to the normalized distribution network data, forming a cluster number selection mechanism with the bending moment method as primary and the concave-convex coefficient method as auxiliary. This yields the optimal cluster number of the samples and the denoised sample data, which facilitates predicting the underlying patterns of distribution network faults. The method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm; it offers good denoising, high accuracy and high effectiveness.
The method can be widely applied to the field of reliability prediction of the power distribution network.
Claims (9)
1. A power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering, characterized by comprising the following steps:
1) Analyze the magnitude and frequency of distribution network faults over the past year, identify the type of data mining required, and acquire data from sources such as the distribution line online monitoring system and the intelligent public distribution transformer monitoring system;
2) Construct features from the source data, normalize them, and merge the data from multiple sources into a consistent database for storage;
3) Perform cluster number analysis on the current particle population produced by the previous step using the bending moment method and the concave-convex coefficient method, to obtain the optimal number of clusters for the samples;
4) Divide the data samples into categories according to the number of clusters, compute the distance between each data sample and its cluster center, and optimize the cluster centers with a particle swarm algorithm;
5) After cluster analysis of the samples, define a diagnosis threshold using the outlier identification criteria of the upper and lower critical graphs; if the distance between a sample and its cluster center is greater than the diagnosis threshold, judge the sample to be an outlier and remove it, yielding "denoised" sample data;
6) Use the denoised sample data to predict the latent patterns of power distribution network faults.
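Step 2 does not specify the normalization scheme. A minimal sketch, assuming per-feature min-max scaling to [0, 1] and records from the two monitoring sources that already share the same feature columns; the column names and values are hypothetical:

```python
from typing import List

def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature column to [0, 1] (min-max scaling, an assumed scheme)."""
    if not rows:
        return []
    n_features = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]
    return [
        [
            # A constant column carries no information; map it to 0.0.
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        for r in rows
    ]

# Hypothetical records, columns [load_kw, voltage_kv], from two sources
line_monitoring = [[120.0, 10.2], [80.0, 10.5]]
transformer_monitoring = [[100.0, 10.0]]
merged = line_monitoring + transformer_monitoring  # one consistent table
print(min_max_normalize(merged))
```

Scaling every feature to the same range keeps large-magnitude quantities such as load from dominating the distance calculations used in the later clustering steps.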
2. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, characterized in that cluster number analysis is performed on the normalized power distribution network data with the bending moment method and the concave-convex coefficient method, and a cluster number selection mechanism with the bending moment method as primary and the concave-convex coefficient method as auxiliary is adopted to obtain the optimal number of clusters for the samples.
3. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, wherein the bending moment method comprises the following steps:
let the number of clusters N take values from 1 up to the upper cluster limit appropriate for the power distribution network;
cluster the samples for each value of N and record the clustering error over all samples, which reflects the quality of the clustering;
then plot the clustering error of all samples against N;
finally, select the value of N at the bend (edge angle) of the curve as the optimal number of clusters;
the bending moment coefficient is calculated as follows:
where N is the number of clusters, O_b is a sample object in cluster i, and C_i is the cluster center of the current cluster i.
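The formula image for the bending moment coefficient is not reproduced in this text. Given the definitions of N, O_b and C_i, it is presumably the within-cluster sum of squared errors used by the elbow method, $\mathrm{BMC}(N)=\sum_{i=1}^{N}\sum_{O_b\in R_i}\lVert O_b-C_i\rVert^2$, where R_i is the set of samples in cluster i. Treat this form as an assumption. The sketch below computes that error for one clustering and then picks the bend from a set of recorded errors. The claim does not say how the bend is located, so the chord-distance rule in `bend_point` (a hypothetical helper) is only one common heuristic:

```python
from typing import Dict, List, Sequence

def clustering_error(samples: List[Sequence[float]], labels: List[int],
                     centers: List[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances from each sample O_b to its center C_i."""
    total = 0.0
    for x, i in zip(samples, labels):
        total += sum((a - c) ** 2 for a, c in zip(x, centers[i]))
    return total

def bend_point(errors: Dict[int, float]) -> int:
    """Return the N at the bend of a decreasing error-versus-N curve.

    Hypothetical heuristic: rescale N and the error to [0, 1] using the first
    and last points, then pick the point lying farthest below the straight
    chord that joins them.
    """
    ns = sorted(errors)
    if len(ns) < 3:
        raise ValueError("need at least three values of N")
    n_lo, n_hi = ns[0], ns[-1]
    e_hi, e_lo = errors[n_lo], errors[n_hi]
    if e_hi == e_lo:
        raise ValueError("error curve is flat; no bend to detect")

    def depth_below_chord(n: int) -> float:
        x = (n - n_lo) / (n_hi - n_lo)           # 0 at the first N, 1 at the last N
        y = (errors[n] - e_lo) / (e_hi - e_lo)   # 1 at the first N, 0 at the last N
        return 1.0 - x - y                       # proportional to distance below the chord

    return max(ns, key=depth_below_chord)

# Hypothetical clustering errors recorded for N = 1..6
errors = {1: 100.0, 2: 40.0, 3: 15.0, 4: 12.0, 5: 10.0, 6: 9.0}
print(bend_point(errors))  # → 3
```

Rescaling both axes first matters because N and the error have unrelated units; without it, the chord distance would be dominated by whichever axis has the larger numeric range.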
4. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, wherein the concave-convex coefficient method comprises the following steps:
calculate the concave-convex coefficient of every sample from the distances between samples in the clusters, and then average these values to obtain the average concave-convex coefficient;
the average concave-convex coefficient takes values in [-1, 1]; the farther apart the samples of different clusters are, the larger the average concave-convex coefficient, and the number of clusters with the largest average concave-convex coefficient is taken as the optimal number of clusters;
the concave-convex coefficient of a sample point D_i is calculated as follows:
where k_i is the average distance from point i to all other points in its own cluster, and l_i is the minimum distance from point i to any cluster that does not contain it;
the nearest cluster is defined as follows:
where S is a sample in cluster R_N; the average distance from D_i to all samples of a cluster is used as the distance from the point to that cluster, and the cluster closest to D_i is taken as the nearest cluster.
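The concave-convex coefficient formula is likewise missing from this text. The definitions of k_i and l_i and the stated range [-1, 1] match the standard silhouette coefficient, s_i = (l_i - k_i) / max(k_i, l_i), and the sketch below assumes that form. The function names, the value 0 for samples in singleton clusters, and the example data are assumptions:

```python
import math
from typing import Dict, List, Sequence

def _avg_dist(i: int, samples: List[Sequence[float]], labels: List[int], cluster: int) -> float:
    """Average distance from sample i to the other members of `cluster`."""
    ds = [math.dist(samples[i], samples[j])
          for j in range(len(samples)) if labels[j] == cluster and j != i]
    return sum(ds) / len(ds) if ds else 0.0

def concave_convex(i: int, samples: List[Sequence[float]], labels: List[int]) -> float:
    """Coefficient of sample D_i, assuming the silhouette form (l_i - k_i) / max(k_i, l_i)."""
    own = labels[i]
    if labels.count(own) == 1:
        return 0.0  # singleton cluster: k_i is undefined, use 0 by convention (assumption)
    k_i = _avg_dist(i, samples, labels, own)
    # Nearest cluster: smallest average distance among clusters not containing D_i.
    l_i = min(_avg_dist(i, samples, labels, c) for c in set(labels) if c != own)
    denom = max(k_i, l_i)
    return (l_i - k_i) / denom if denom > 0 else 0.0

def mean_concave_convex(samples: List[Sequence[float]], labels: List[int]) -> float:
    """Average coefficient over all samples; always lies in [-1, 1]."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters")
    return sum(concave_convex(i, samples, labels) for i in range(len(samples))) / len(samples)

def best_n(labelings: Dict[int, List[int]], samples: List[Sequence[float]]) -> int:
    """Pick the N whose clustering has the largest average coefficient."""
    return max(labelings, key=lambda n: mean_concave_convex(samples, labelings[n]))

# Hypothetical data: two well-separated groups, candidate labelings for N = 2 and N = 3
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(best_n({2: [0, 0, 1, 1], 3: [0, 1, 2, 2]}, samples))  # → 2
```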
5. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, wherein the optimal value of N is determined according to the concave-convex coefficient method: if the BMC conclusion supports, or does not contradict, that of the concave-convex coefficient method, the optimal N is taken directly from the concave-convex coefficient method;
if the BMC conclusion contradicts the concave-convex coefficient method, the BMC conclusion is taken as the optimal value of N.
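Claim 5 does not say what counts as BMC "supporting" or "contradicting" the concave-convex result. A minimal sketch, assuming a contradiction means the two recommended values of N differ by more than a hypothetical `tolerance`:

```python
def select_cluster_number(n_bmc: int, n_cc: int, tolerance: int = 0) -> int:
    """Combine the bending moment (BMC) and concave-convex (CC) recommendations.

    'Contradiction' is modelled here, as an assumption, as the two
    recommended N values differing by more than `tolerance`.
    """
    if abs(n_bmc - n_cc) <= tolerance:
        return n_cc   # BMC supports or does not contradict: use the CC result
    return n_bmc      # contradiction: the BMC result takes priority

print(select_cluster_number(3, 3), select_cluster_number(3, 4), select_cluster_number(3, 4, tolerance=1))
```

Letting BMC win on conflict keeps the bending moment method as the primary criterion, matching the "primary/auxiliary" mechanism of claim 2.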
6. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, characterized in that the data samples are divided into N categories according to the obtained number of clusters, and the distance between each data sample and its cluster center is calculated; each particle finds its local (individual) extremum and the global extremum from its own position; and the particle positions are continuously updated toward the particle swarm optimal solution, thereby optimizing the cluster centers.
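Claim 6 does not fix the particle encoding, fitness function, or update parameters. The sketch below assumes common choices for PSO clustering: each particle holds one candidate set of N cluster centers, fitness is the total squared distance from samples to their nearest center, and velocities follow the standard inertia-weight update v ← w·v + c1·r1·(pbest - x) + c2·r2·(gbest - x). The values of w, c1, c2, the swarm size, and the iteration count are illustrative assumptions:

```python
import random
from typing import List, Sequence

Point = List[float]

def assign(samples: Sequence[Sequence[float]], centers: Sequence[Point]) -> List[int]:
    """Label each sample with the index of its nearest center (squared Euclidean)."""
    return [min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in samples]

def fitness(samples: Sequence[Sequence[float]], centers: Sequence[Point]) -> float:
    """Total squared distance from samples to their nearest center (lower is better)."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
               for x in samples)

def pso_cluster(samples, n_clusters, n_particles=20, iterations=100,
                w=0.7, c1=1.5, c2=1.5, seed=0) -> List[Point]:
    rng = random.Random(seed)
    dim = len(samples[0])
    # Each particle is one candidate set of N centers, initialised on random samples.
    pos = [[list(rng.choice(samples)) for _ in range(n_clusters)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(n_clusters)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]          # individual (local) extrema
    pbest_fit = [fitness(samples, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]  # global extremum
    for _ in range(iterations):
        for i in range(n_particles):
            for k in range(n_clusters):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][k][d] = (w * vel[i][k][d]
                                    + c1 * r1 * (pbest[i][k][d] - pos[i][k][d])
                                    + c2 * r2 * (gbest[k][d] - pos[i][k][d]))
                    pos[i][k][d] += vel[i][k][d]
            f = fitness(samples, pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], f
    return gbest

# Hypothetical normalized samples forming two groups
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers = pso_cluster(data, n_clusters=2)
print(assign(data, centers))
```

Because each particle's best and the swarm's best are only replaced by strictly better solutions, the returned centers are never worse than the best initial guess, while the random velocity terms keep the search global rather than purely local.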
7. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, characterized in that, after cluster analysis of the samples, the diagnosis threshold is defined using the outlier identification criteria of the upper and lower critical graphs, an outlier generally being defined as a value less than D_L - 1.5 × GAP or greater than D_H + 1.5 × GAP;
where D_L is the lower quartile: one quarter of all sample values are smaller than the lower quartile;
D_H is the upper quartile: one quarter of all sample values are larger than the upper quartile;
GAP is the interquartile range, i.e. the difference between the upper quartile D_H and the lower quartile D_L, which spans half of all observed values.
8. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 7, wherein if the distance between a sample and its cluster center is greater than the diagnosis threshold, the sample is diagnosed as an outlier and removed.
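Claims 7 and 8 together amount to Tukey's box-plot fences applied to sample-to-center distances. Because only distances greater than the threshold are removed, the sketch uses the upper fence D_H + 1.5 × GAP as the diagnosis threshold. Python's `statistics.quantiles` with `method="inclusive"` is one of several quartile conventions, and that choice, like the example data, is an assumption:

```python
import statistics
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def diagnosis_threshold(distances: Sequence[float]) -> float:
    """Upper fence D_H + 1.5 * GAP of the sample-to-center distances."""
    d_l, _median, d_h = statistics.quantiles(distances, n=4, method="inclusive")
    gap = d_h - d_l  # interquartile range GAP
    return d_h + 1.5 * gap

def remove_outliers(samples: Sequence[T],
                    distances: Sequence[float]) -> Tuple[List[T], List[T], float]:
    """Split samples into kept ("denoised") and removed (outlier) lists."""
    threshold = diagnosis_threshold(distances)
    kept = [s for s, d in zip(samples, distances) if d <= threshold]
    removed = [s for s, d in zip(samples, distances) if d > threshold]
    return kept, removed, threshold

# Hypothetical distances from ten samples to their cluster centers
ids = list("abcdefghij")
dists = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
kept, removed, threshold = remove_outliers(ids, dists)
print(threshold, removed)  # → 14.5 ['j']
```

Here D_L = 3.25 and D_H = 7.75, so GAP = 4.5 and the threshold is 7.75 + 1.5 × 4.5 = 14.5; only the sample at distance 100 is removed.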
9. The power distribution network data preprocessing method based on the particle swarm principle and adopting big data clustering according to claim 1, characterized in that the method overcomes the tendency of the bending moment method and the concave-convex coefficient algorithm to fall into local extrema, retains the global optimization capability of the particle swarm algorithm, and keeps the faster convergence of the bending moment method and the concave-convex coefficient algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010798057.6A (CN112069633B) | 2020-08-10 | 2020-08-10 | Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010798057.6A (CN112069633B) | 2020-08-10 | 2020-08-10 | Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112069633A | 2020-12-11 |
CN112069633B | 2023-04-07 |
Family ID: 73661048
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010798057.6A (CN112069633B) | Active | 2020-08-10 | 2020-08-10 | Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112069633B |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113592005B * | 2021-08-04 | 2024-03-08 | CISDI Information Technology (Chongqing) Co., Ltd. (中冶赛迪信息技术(重庆)有限公司) | Converter tapping parameter recommendation method, system, medium and terminal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104299035A * | 2014-09-29 | 2015-01-21 | State Grid Corporation of China (国家电网公司) | Method for diagnosing fault of transformer on basis of clustering algorithm and neural network |
CN109902953A * | 2019-02-27 | 2019-06-18 | North China Electric Power University (华北电力大学) | Power customer classification method based on adaptive particle swarm clustering |
CN110750524A * | 2019-09-12 | 2020-02-04 | China Electric Power Research Institute Co., Ltd. (中国电力科学研究院有限公司) | Method and system for determining fault characteristics of active power distribution network |
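Non-Patent Citations (2)
Title |
---|
Particle swarm optimization weighted fuzzy clustering method for steam turbine fault diagnosis; Chen Ping et al.; Journal of Vibration, Measurement & Diagnosis (《振动.测试与诊断》); 2011-10-15 (No. 05); full text * |
Particle swarm optimization fuzzy clustering algorithm for power systems and its application; He Xiaofeng et al.; Relay (《继电器》); 2007-11-16 (No. 22); full text * |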
Also Published As
Publication number | Publication date |
---|---|
CN112069633A | 2020-12-11 |
Similar Documents
Publication | Title |
---|---|
CN110223196B | Anti-electricity-stealing analysis method based on typical industry feature library and anti-electricity-stealing sample library |
CN105512799B | Power system transient stability evaluation method based on mass online historical data |
CN111160791A | Abnormal user identification method based on GBDT algorithm and factor fusion |
CN108304567B | Method and system for identifying working condition mode and classifying data of high-voltage transformer |
CN112434962B | Enterprise user state evaluation method and system based on power load data |
CN114389359A | Intelligent operation and maintenance method of centralized control type relay protection equipment based on cloud edge cooperation |
CN113189418B | Topological relation identification method based on voltage data |
CN111709554A | Method and system for joint prediction of net loads of power distribution network |
CN109446243B | Method for detecting power generation abnormity of photovoltaic power station based on big data analysis |
CN112668612A | Partial discharge signal clustering analysis method based on grids |
CN110930057A | Quantitative evaluation method for reliability of distribution transformer test result based on LOF algorithm |
CN112069633B | Power distribution network data preprocessing method based on particle swarm principle and adopting big data clustering |
CN114118588A | Peak-facing summer power failure prediction method based on game feature extraction under clustering undersampling |
CN113627735A | Early warning method and system for safety risk of engineering construction project |
CN112417763A | Defect diagnosis method, device and equipment for power transmission line and storage medium |
CN115526258A | Power system transient stability evaluation method based on Spearman correlation coefficient feature extraction |
CN107862459B | Metering equipment state evaluation method and system based on big data |
CN114417971A | Electric power data abnormal value detection algorithm based on K nearest neighbor density peak clustering |
CN113689079A | Transformer area line loss prediction method and system based on multivariate linear regression and cluster analysis |
CN113298148B | Ecological environment evaluation-oriented unbalanced data resampling method |
CN115940134A | Distribution network data analysis processing method based on fault-tolerant data |
CN114429240A | Method and device for monitoring running state of highway equipment |
Liangzhi et al. | Research on fault prediction and diagnosis of power equipment based on big data |
CN113158552B | Bioreactor operation condition grading prediction method and system based on time sequence |
CN114168662A | Power distribution network problem combing and analyzing method and system based on multiple data sources |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |