CN114253953A

CN114253953A - Power distribution network multidimensional data processing method and system based on cluster analysis

Info

Publication number: CN114253953A
Application number: CN202111369118.8A
Authority: CN
Inventors: 孙常浩; 蔡雷鸣; 季玮; 施广德; 金舒
Original assignee: Guodian Nanjing Automation Co Ltd
Current assignee: Guodian Nanjing Automation Co Ltd
Priority date: 2021-11-18
Filing date: 2021-11-18
Publication date: 2022-03-29

Abstract

The invention discloses a power distribution network multidimensional data processing method based on cluster analysis, comprising the following steps: performing standard access and storage of data from multiple data sources; performing anomaly detection on the stored data; determining the strongly correlated attributes of the abnormal data according to the multidimensional characteristics of the data after anomaly detection; and correcting the abnormal data by cluster analysis according to those strongly correlated attributes. The method can effectively improve the data quality of the power distribution network and offers higher data storage and query efficiency.

Description

Power distribution network multidimensional data processing method and system based on cluster analysis

Technical Field

The invention belongs to the technical field of power grid data processing, and particularly relates to a power distribution network multidimensional data processing method based on cluster analysis.

Background

With the continuing development of informatization in the intelligent power distribution network, the types and quantity of data collected by distribution network terminals keep increasing. The power distribution and utilization information management of the intelligent power distribution network comprises more than ten systems, such as the power distribution automation system, the load control and management system, the marketing service management system and the electricity consumption information acquisition system, and the acquired data show pronounced multi-source, heterogeneous big data characteristics. On the one hand, integrating massive multi-source data provides a data basis for big data applications such as perception of the distribution network operating state, but the multi-source and heterogeneous nature of the data also poses challenges for data fusion and storage. On the other hand, because intelligent acquisition devices are widely distributed and numerous, and some terminals operate in harsh environments and poor working conditions, data are often lost or abnormal during acquisition and transmission. Statistical analysis based on abnormal data often deviates far from the true values, which affects prediction accuracy and control decisions. Many methods for improving data quality exist, such as interpolation and neural network methods, but they all correct the data from the characteristics of a single dimension only.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a power distribution network multidimensional data processing method and system based on cluster analysis, which can correct abnormal data.

The invention solves this technical problem through the following technical solution:

in a first aspect, a power distribution network multidimensional data processing method based on cluster analysis is provided, which includes:

performing standard access and storage on data of multiple data sources;

carrying out anomaly detection on the stored data;

determining a strong correlation attribute of the abnormal data according to the multi-dimensional characteristics of the data after the abnormality detection;

and correcting the abnormal data by adopting cluster analysis according to the strong association attribute of the abnormal data.

With reference to the first aspect, further, performing standard access and storage of the data from multiple data sources specifically includes: accessing the multi-source data through a unified standard interface based on the State Grid common data model (SG-CIM); and storing the multi-source data in separate databases according to their data types and characteristics.

With reference to the first aspect, further, performing anomaly detection on the stored data specifically includes: detecting abnormal data using DBSCAN, a density-based clustering method.

With reference to the first aspect, further, the determining the strong association attribute of the abnormal data specifically includes:

determining the relevance of anomalous data using the following equation

where X is the abnormal data set, Y is the historical data set of a strongly correlated attribute of the abnormal data, σ_x and σ_y are the standard deviations of X and Y, respectively, and cov(X, Y) is the covariance between X and Y;

n is the number of data points in the abnormal data set, x_i is the i-th element of the set X, and y_i is the i-th element of the set Y.

With reference to the first aspect, further, correcting the abnormal data by cluster analysis according to the strongly correlated attributes of the abnormal data specifically includes:

establishing a set C of strong correlation attributes;

determining the weight w_j of each attribute in the set C using the entropy weight method;

for abnormal data x_i, selecting the j-th strongly correlated attribute from the set C and clustering the historical data set y_j of that attribute over the same time period;

within the cluster of y_j, recording the point with the minimum distance to the strongly correlated attribute data;

Obtaining corrected data according to equation (3)
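The weighting step can be illustrated with a minimal Python sketch of the commonly used entropy weight method. The source does not reproduce its own formulas here, so the normalization below (column proportions, entropy scaled by ln m) is the textbook form and is an assumption; the function name `entropy_weights` and the sample matrix are hypothetical. The sketch covers only the weights w_j and does not attempt equation (3).

```python
import math


def entropy_weights(matrix):
    """Return one weight per attribute (column) of `matrix`.

    matrix[i][j] is the non-negative value of attribute j in sample i.
    Assumption: values are already non-negative (e.g. min-max scaled).
    """
    m = len(matrix)
    if m < 2:
        raise ValueError("at least two samples are required")
    n = len(matrix[0])

    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each attribute needs a positive column sum")
        # Entropy of the column proportions; 0 * ln(0) is treated as 0.
        entropy = -sum(
            (v / total) * math.log(v / total) for v in column if v > 0
        ) / math.log(m)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread == 0:
        # Every attribute is uniform: fall back to equal weights.
        return [1.0 / n] * n
    return [d / spread for d in divergences]


# Hypothetical history of two strongly correlated attributes (3 samples).
history = [[1.0, 1.0], [1.0, 2.0], [1.0, 9.0]]
weights = entropy_weights(history)
```

In this illustration the first attribute is constant across samples and therefore receives a weight near zero, while the second attribute, which varies, receives almost all of the weight.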

In a second aspect, a power distribution network multidimensional data processing system based on cluster analysis is provided, which includes:

the preprocessing module is used for performing standard access and storage on the data of multiple data sources;

the abnormal data detection module is used for carrying out abnormal detection on the stored data;

the data correction module is used for determining the strong correlation attribute of the abnormal data according to the multi-dimensional characteristics of the data after the abnormality detection;

The beneficial effects of the invention are as follows: the invention provides a data service center for multi-source data fusion and data quality improvement. To address the wide range of data sources and the multi-source heterogeneity of power distribution network data, the data service center fuses the data, adopts standardized access, stores the data in separate databases according to their characteristics, and manages data from different sources in an integrated way. To address the large volume and low quality of the data collected in the power distribution network, it uses cluster analysis to extract historical data and detect anomalies, and corrects the abnormal data by combining the multidimensional characteristics of the data, thereby improving data quality. Finally, the corrected data are provided to big data analysis applications through a data publishing module. Practical application shows that the data service center can effectively improve the data quality of the power distribution network and offers higher data storage and query efficiency.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a system architecture diagram of the present invention;

FIG. 3 is a schematic diagram of the A-phase current of a circuit breaker before and after data correction according to the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For better understanding of the present invention, the related art in the technical solution of the present invention is explained below.

As shown in fig. 1 to 3, a method for processing multidimensional data of a power distribution network based on cluster analysis includes the following steps:

Step one, standardizing data access and integrated storage for the different data sources in the distribution network system to ensure data uniformity, specifically:

the real-time operating data, historical data and model data of the power distribution network are accessed in a standard manner, stored in separate databases, and unified by primary keys, specifically:

Data synchronization and integration are based on the State Grid common data model (SG-CIM), which serves as the standard for interfacing with multi-source data from the marketing system, the electricity consumption information acquisition system, the production management system, the dispatching automation system and others; data from the different systems are accessed through a uniform standard interface format. The data are stored in separate databases according to their types and characteristics to improve storage efficiency, and a global primary key generated by the snowflake algorithm ensures data uniformity.
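The global primary key step can be sketched as follows. The source names the snowflake algorithm but gives no bit layout, so the layout below (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) follows the widely used original scheme and is an assumption, as are the class name `SnowflakeGenerator` and the custom epoch.

```python
import threading
import time


class SnowflakeGenerator:
    """Sketch of a snowflake-style 64-bit ID generator (assumed layout)."""

    EPOCH_MS = 1609459200000  # hypothetical custom epoch: 2021-01-01 UTC
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence counter.
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self._seq
            )


# Each source system (or ingest node) would get its own worker id.
generator = SnowflakeGenerator(worker_id=3)
record_key = generator.next_id()
```

Because the timestamp occupies the high bits, keys from one generator are unique and increase over time, which is what allows records stored in different databases to share one key space.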

(1) Real-time operating data storage

The real-time operating data come from the production control systems, including the dispatching automation system, the electricity metering system, the GIS system, and others. The data are mainly measurement data. They are transmitted through the real-time data interface of the data center, which receives them and transfers them through a stream processing module into a real-time database deployed on distributed Redis.

(2) Historical section data storage

The historical section data come from the electricity fee data and user file data of the marketing system, the historical measurement values of the production control system, and the like. They are stored in an InfluxDB time-series database through batch processing to support offline analysis.

(3) Power distribution network model storage

The power distribution network model comprises the Common Information Model (CIM) generated by the power distribution automation system, network topology information, equipment ledger information, and the like. These data are extracted to the data center for analysis through the ETL process of a data warehouse.

Step two, carrying out abnormity detection on the stored data

1. Description of DBSCAN concepts

The algorithm partitions the data into clusters by analyzing how closely the data points in the sample data set are packed. If the original data set is X = {x₁, x₂, …, x_n}, the following definitions can be given:

(1) ε-neighborhood: the set of all points in the data set X whose distance from the sample point x_j is not greater than ε.

(2) Core point: if the ε-neighborhood of a sample point x_j in the data set X contains at least MinPts data points, x_j is called a core point.

(3) Boundary point: if the ε-neighborhood of a sample point x_j contains fewer than MinPts data points, but x_j lies within the ε-neighborhood of a core point, x_j is called a boundary point.

(4) Noise point: a point in the data set X that is neither a boundary point nor a core point is called a noise point.

(5) Directly density-reachable: if the sample point x_j lies in the ε-neighborhood of the point x_i, and x_i is a core point, x_j is said to be directly density-reachable from x_i.

(6) Density-reachable: for sample points x_i and x_j, if there is a sample sequence p₁, p₂, …, p_m such that x_i = p₁, x_j = p_m, and each p_(k+1) in the sequence is directly density-reachable from p_k, then x_j is said to be density-reachable from x_i.

(7) Density-connected: for sample points x_i and x_j, if there is a sample x_k such that both x_i and x_j are density-reachable from x_k, then x_i and x_j are said to be density-connected.

2. Implementation steps:

(1) Input the original data set, the neighborhood radius ε, and the neighborhood point-count threshold MinPts. Randomly select a point p as the initial object;

(2) Count the points in the ε-neighborhood of the point p and check whether the count is greater than or equal to MinPts, that is, whether p is a core point. If p is a core point, find all data points that are density-reachable from p to form a new cluster, and mark these points as processed;

(3) If the number of points in the ε-neighborhood of the point p is less than MinPts, mark p as a noise point and select another unprocessed data point;

(4) Repeat (2) and (3) until all points are processed.

The clustering calculation screens out the abnormal data, namely the noise points, in the data set. The data can then be corrected and filled in using the multi-dimensional correlation of the data.
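The steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the invention: points are visited in index order instead of randomly, neighborhoods are found by brute force, and the sample readings, `eps` and `min_pts` values are hypothetical. A point first marked as noise is relabelled as a boundary point if it later turns out to lie in the ε-neighborhood of a core point.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster index (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled as a boundary point later
            continue
        # points[i] is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # boundary point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


# Hypothetical one-dimensional current readings; 20.0 is an outlier.
readings = [(1.0,), (1.1,), (0.9,), (1.05,), (5.0,), (5.1,), (4.9,), (5.05,), (20.0,)]
labels = dbscan(readings, eps=0.3, min_pts=3)
abnormal = [r for r, label in zip(readings, labels) if label == NOISE]
```

With these sample values the two groups of readings form two clusters and only the reading 20.0 is left as a noise point, which is the data that would be passed on for correction.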

Step three, determining the strong correlation attribute of the abnormal data according to the multi-dimensional characteristics of the data after the abnormal detection, and specifically comprising the following steps:

determining the relevance of anomalous data using the following equation

where X is the abnormal historical data set, Y is a normal historical data set strongly related to the abnormal data, σ_x and σ_y are the standard deviations of X and Y, respectively, and cov(X, Y) is the covariance between X and Y;
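The equation itself is not reproduced in this text. The symbols defined above (covariance and the two standard deviations) are consistent with the Pearson correlation coefficient, cov(X, Y) / (σ_x σ_y), so the sketch below assumes that form. The function names, the candidate attribute names and the 0.8 threshold for "strong" correlation are hypothetical and do not come from the source.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sigma_x * sigma_y)


def strongly_correlated(target, candidates, threshold=0.8):
    """Return {attribute name: coefficient} for |coefficient| >= threshold.

    The threshold value is an assumption made for this sketch.
    """
    selected = {}
    for name, series in candidates.items():
        r = pearson(target, series)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical series: A-phase current against two candidate attributes.
a_phase = [1.0, 2.0, 3.0, 4.0]
candidates = {"b_phase": [2.0, 4.0, 6.0, 8.0], "temperature": [5.0, 1.0, 4.0, 2.0]}
strong = strongly_correlated(a_phase, candidates)
```

In this illustration only the hypothetical `b_phase` series passes the threshold, so it alone would enter the set of strongly correlated attributes used in step four.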

Step four, correcting the abnormal data by adopting cluster analysis according to the strong association attribute of the abnormal data, and specifically comprises the following steps:

establishing a set C of strong correlation attributes;

Obtaining corrected data according to equation (3)

The invention also provides a power distribution network multidimensional data processing system based on cluster analysis, which comprises:

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims

1. A power distribution network multidimensional data processing method based on cluster analysis is characterized by comprising the following steps:

performing standard access and storage on data of multiple data sources;

carrying out anomaly detection on the stored data;

2. The method for processing the multidimensional data of the power distribution network based on the cluster analysis as claimed in claim 1, wherein performing standard access and storage of the data from multiple data sources specifically comprises: accessing the multi-source data through a unified standard interface based on the State Grid common data model (SG-CIM); and storing the multi-source data in separate databases according to their data types and characteristics.

3. The method for processing the multidimensional data of the power distribution network based on the cluster analysis according to claim 1, wherein performing anomaly detection on the stored data specifically comprises: detecting abnormal data using DBSCAN, a density-based clustering method.

4. The method for processing the multidimensional data of the power distribution network based on the cluster analysis, characterized in that determining the strongly correlated attributes of the abnormal data specifically comprises:

determining the relevance of anomalous data using the following equation

where X is the abnormal historical data set, Y is a normal historical data set strongly related to the abnormal data, σ_x and σ_y are the standard deviations of X and Y, respectively, and cov(X, Y) is the covariance between X and Y;

5. The method for processing the multidimensional data of the power distribution network based on the cluster analysis as claimed in claim 4, wherein the correcting the abnormal data by the cluster analysis according to the strong association attribute of the abnormal data specifically comprises:

establishing a set C of strong correlation attributes;

Obtaining corrected data according to equation (3)

6. A power distribution network multidimensional data processing system based on cluster analysis is characterized by comprising: