Disclosure of Invention
In view of the above problems, the present invention aims to provide a method and a system for dividing terminal users based on multi-dimensional big data of power supply reliability, so that the division is comprehensive and scientifically grounded.
In order to achieve the purpose, the invention adopts the following technical scheme: a terminal user dividing method based on power supply reliability multi-dimensional big data comprises the following steps:
1) acquiring historical data of each characteristic in a multi-dimensional power supply reliability analysis model which is pre-constructed by each terminal user to be classified;
2) performing power supply reliability factor correlation analysis according to preset reference characteristics and acquired historical data by adopting a grey correlation method, and extracting characteristics influencing the power supply reliability of the terminal user to be classified in a multi-dimensional power supply reliability analysis model;
3) acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user to be classified;
4) adopting a DBSCAN algorithm to perform preliminary clustering on the acquired data of each terminal user to be classified respectively to obtain the cluster number and the cluster center of each terminal user to be classified;
5) and determining the classification of each terminal user to be classified by using the K-means algorithm and taking the cluster number and the cluster center of each terminal user to be classified as the cluster number and the mass center respectively.
Further, the specific process of step 2) is as follows:
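2.1) establishing an original matrix $X$ consisting of $m$ objects and $n$ characteristics according to the pre-constructed multi-dimensional power supply reliability analysis model:
$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_m \end{bmatrix} = \begin{bmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & X_{ij} & \vdots \\ X_{m1} & \cdots & X_{mn} \end{bmatrix}$$
in the formula, $X_i$ is the feature vector of the $i$th object; $X_{ij}$ is the $j$th feature of the $i$th object;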
2.2) carrying out normalization processing on the characteristic data in the multidimensional power supply reliability analysis model to obtain the normalized characteristic $Y_{ij}$:
$$Y_{ij} = \frac{X_{ij} - X_{\min j}}{X_{\max j} - X_{\min j}}$$
In the formula, $Y_{ij}$ is the $j$th characteristic of the $i$th object after normalization; $X_{\min j}$ is the minimum value of the $j$th column of the matrix $X$; $X_{\max j}$ is the maximum value of the $j$th column of the matrix $X$. The normalized matrix $Y$ is:
$$Y = [Y_1, \ldots, Y_j, \ldots, Y_n]$$
in the formula, $Y_j$ is the vector formed by the normalized $j$th characteristic of the different objects;
2.3) selecting, in turn, one column of each type of characteristic in the normalized matrix $Y$ as the reference sequence $Y_0$:
$$Y_0 = (Y_{10}, \ldots, Y_{i0}, \ldots, Y_{m0})^T$$
In the formula, $Y_{10}, \ldots, Y_{m0}$ are the values, for the different objects, of a certain factor affecting power supply reliability;
2.4) taking the difference between each column vector in the normalized matrix $Y$ and the reference sequence $Y_0$, and taking the absolute value, to obtain the absolute difference matrix $[\Delta_k]$:
$$[\Delta_k] = |Y_k - Y_0|, \quad k = 1, 2, \ldots, n$$
In the formula, $Y_k$ is the $k$th column of $Y$, i.e. the $k$th characteristic of the different objects;
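2.5) calculating the correlation coefficient $\xi_j(i)$ of each characteristic from the absolute difference matrix $[\Delta_k]$:
$$\xi_j(i) = \frac{\min_j \min_i \Delta_{ij} + \rho \max_j \max_i \Delta_{ij}}{\Delta_{ij} + \rho \max_j \max_i \Delta_{ij}}$$
In the formula, $\xi_j(i)$ is the correlation coefficient of the $j$th characteristic of the $i$th object; $\Delta_{ij}$ is the element in row $i$, column $j$ of the absolute difference matrix $[\Delta_k]$; $\rho$ is the resolution coefficient;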
2.6) calculating the grey correlation value $r_j$ of each characteristic from its correlation coefficients $\xi_j(i)$:
$$r_j = \frac{1}{m} \sum_{i=1}^{m} \xi_j(i)$$
In the formula, $r_j$ is the grey correlation value between the $j$th characteristic and the reference characteristic;
2.7) sorting the characteristics in the multi-dimensional power supply reliability analysis model by their grey correlation values $r_j$, and extracting the characteristics whose grey correlation value $r_j$ exceeds a preset threshold as the characteristics influencing the power supply reliability of the terminal users.
Further, the specific process of the step 4) is as follows:
4.1) denoting the acquired sample data set of each terminal user to be classified as $D = \{x_1, x_2, \ldots, x_w\}$, which is to be divided into $\eta$ clusters;
4.2) determining the ε-neighborhood for each sample data set $D$ and the minimum number MinPts of data points that the ε-neighborhood of a core object must contain;
4.3) determining the clustering center of the sample data set $D$ of each terminal user to be classified according to the ε-neighborhood of each data point in $D$ and the minimum number MinPts.
Further, the specific process of the step 4.3) is as follows:
4.3.1) if the ε-neighborhood of a data point $x_p$ contains $c$ ($c \ge$ MinPts) data points, creating a new cluster with $x_p$ as its core object;
4.3.2) finding all core objects, i.e. the data $x_d \in D$ within the ε-neighborhood of $x_p$, and clustering them to obtain the clustering center $N_\varepsilon(x_p)$ of the sample data set $D$ of each terminal user to be classified:
$$N_\varepsilon(x_p) = \{x_d \in D \mid \mathrm{dist}(x_d, x_p) \le \varepsilon\}$$
Wherein $N_\varepsilon(x_p)$ is the set of preliminarily determined clustering centers of the sample data set $D$; dist is the Euclidean distance.
Further, the specific process of step 5) is as follows:
5.1) taking the obtained cluster number $\eta$ and clustering center $N_\varepsilon(x_p)$ of the sample data set $D$ of each terminal user to be classified as the cluster number and the initial centroid vectors $\{\mu_1, \mu_2, \ldots, \mu_\Omega\}$ of the K-means algorithm, and setting the number of iterations $N$, wherein each $\mu_v$ ($v = 1, 2, \ldots, \Omega$) is a single centroid vector; $\Omega$ is the number of centroid vectors in the centroid set $N_\varepsilon(x_p)$, and $\eta = \Omega$;
5.2) initializing the clusters $C_t = \varnothing$, $t = 1, 2, \ldots, \eta$, wherein $C_t$ is the $t$th cluster, i.e. the $t$th set containing data, and $\varnothing$ is the empty set;
5.3) calculating the distance $d_{sv}$ between each data point $x_s$ ($s = 1, 2, \ldots, w$) and each centroid vector $\mu_v$ ($v = 1, 2, \ldots, \Omega$):
$$d_{sv} = \|x_s - \mu_v\|^2$$
5.4) assigning the data point $x_s$ to the cluster $C_{t=v}$ of the centroid vector $\mu_v$ for which $d_{sv}$ is smallest, and updating $C_{t=v} = C_{t=v} \cup \{x_s\}$;
5.5) recalculating the centroid vector $\mu_t$ of each cluster $C_t$:
$$\mu_t = \frac{1}{|C_t|} \sum_{x \in C_t} x$$
5.6) if the centroid vectors $\mu_t$ of all clusters $C_t$ no longer change, proceeding to step 5.7); otherwise, returning to step 5.2), until all $N$ iterations have been completed;
5.7) outputting the final cluster partition $C = \{C_1, C_2, \ldots, C_\eta\}$, which completes the classification of the terminal users.
Further, the multidimensional power supply reliability analysis model comprises grid structure characteristics, technical equipment level characteristics, equipment quality characteristics, fault cause characteristics and operation and maintenance level characteristics. The grid structure characteristics comprise the inter-station contact rate, the load transfer rate, the average number of line segments, the ring network rate, the average line length and the network wiring standardization rate; the technical equipment level characteristics comprise the average line load rate, the overhead line insulation rate and the cabling rate; the equipment quality characteristics comprise the medium-voltage fault rate of bare conductors, the medium-voltage cable fault rate and the medium-voltage fault rate of insulated lines; the fault cause characteristics comprise the number of faults caused by natural factors and the number of faults caused by external force; and the operation and maintenance level characteristics comprise the live working rate (number of households in power failure), the average arrival time for fault emergency repair, the average outage duration of medium-voltage faults and the average fault location duration.
An end user partitioning system based on power supply reliability multi-dimensional big data comprises:
the historical data acquisition module is used for acquiring historical data of each characteristic in a multi-dimensional power supply reliability analysis model which is pre-constructed by each terminal user to be classified;
the characteristic extraction module is used for performing power supply reliability factor correlation analysis according to preset reference characteristics and acquired historical data by adopting a grey correlation degree method and extracting characteristics influencing the power supply reliability of the terminal user to be classified in the multi-dimensional power supply reliability analysis model;
the actual data acquisition module is used for acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user;
the primary clustering module is used for performing primary clustering on the acquired data of each terminal user to be classified by adopting a DBSCAN algorithm to obtain the clustering cluster number and the clustering center of each terminal user to be classified;
and the classification module is used for determining the classification of each terminal user to be classified by respectively taking the cluster number and the cluster center of each terminal user to be classified as the cluster number and the centroid by adopting a K-means algorithm.
A processor comprises computer program instructions, wherein the computer program instructions are used for realizing the steps corresponding to the end user dividing method based on the power supply reliability multi-dimensional big data when being executed by the processor.
A computer readable storage medium, which stores computer program instructions, wherein the computer program instructions, when executed by a processor, are configured to implement the steps corresponding to the above end user partitioning method based on multidimensional big data of power supply reliability.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. according to the invention, the K-means algorithm and the DBSCAN algorithm are adopted simultaneously, so that the clustering error of the DBSCAN algorithm caused by uneven data density and the clustering error of the K-means algorithm caused by poor initial clustering center and clustering number setting can be made up, and further the grade division of the terminal user can be realized scientifically.
2. The invention adopts the GRA method to perform relevance analysis on the clustered terminal user index data with different power supply reliability grades, can improve the sequencing of the power supply reliability influence factors and the accuracy of dimensionality reduction, and can be widely applied to the field of large data processing of the power distribution network.
Detailed Description
The present invention is described in detail below with reference to the attached drawings. It is to be understood, however, that the drawings are provided solely for the purposes of promoting an understanding of the invention and that they are not to be construed as limiting the invention. In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Example 1
As shown in fig. 1, the present embodiment provides an end user partitioning method based on multidimensional big data of power supply reliability, including the following steps:
1) Because different terminal users in different power supply areas have different requirements on power supply reliability, and in order to analyze the power supply reliability of the different areas comprehensively, a multidimensional power supply reliability analysis model is constructed from five dimensions: grid structure, technical equipment level, equipment quality, fault cause, and operation and maintenance level. The grid structure characteristics comprise the inter-station contact rate, the load transfer rate, the average number of line segments, the ring network rate, the average line length and the network wiring standardization rate. The technical equipment level characteristics comprise the average line load rate, the overhead line insulation rate and the cabling rate. The equipment quality characteristics comprise the medium-voltage fault rate of bare conductors, the medium-voltage cable fault rate and the medium-voltage fault rate of insulated lines. The fault cause characteristics comprise the number of faults caused by natural factors and the number of faults caused by external force. The operation and maintenance level characteristics comprise the live working rate (number of households in power failure), the average arrival time for fault emergency repair, the average outage duration of medium-voltage faults and the average fault location duration.
2) Historical data of each feature in the multidimensional power supply reliability analysis model are acquired for each terminal user to be classified. Using the grey correlation degree method (GRA), a correlation analysis of the power supply reliability factors is performed on the preset reference feature and the acquired historical data, and the features in the model that influence the power supply reliability of the terminal users to be classified are extracted. The grey correlation degree method measures how similar the change curves of two features are: after dimensionless processing, the trends of the two curves are compared, which allows the correlation between each feature and the power supply reliability to be compared and, finally, the features influencing the power supply reliability to be ranked. Specifically:
2.1) According to the constructed multidimensional power supply reliability analysis model, an original matrix $X$ is established, consisting of $m$ objects (the terminal users to be classified) and $n$ characteristics:
$$X = \begin{bmatrix} X_1 \\ \vdots \\ X_i \\ \vdots \\ X_m \end{bmatrix} = \begin{bmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & X_{ij} & \vdots \\ X_{m1} & \cdots & X_{mn} \end{bmatrix} \tag{1}$$
In the formula, $X_i$ is the feature vector of the $i$th object; $X_{ij}$ is the $j$th feature of the $i$th object.
2.2) The characteristic data in the multidimensional power supply reliability analysis model are normalized to obtain the normalized characteristic $Y_{ij}$:
$$Y_{ij} = \frac{X_{ij} - X_{\min j}}{X_{\max j} - X_{\min j}} \tag{2}$$
In the formula, $Y_{ij}$ is the $j$th characteristic of the $i$th object after normalization; $X_{\min j}$ is the minimum value of the $j$th column of the matrix $X$; $X_{\max j}$ is the maximum value of the $j$th column of the matrix $X$. The normalized matrix $Y$ is:
$$Y = [Y_1, \ldots, Y_j, \ldots, Y_n] \tag{3}$$
In the formula, $Y_j$ is the vector formed by the normalized $j$th characteristic of the different objects.
2.3) One column of each type of characteristic in the normalized matrix $Y$ is selected in turn as the reference sequence $Y_0$:
$$Y_0 = (Y_{10}, \ldots, Y_{i0}, \ldots, Y_{m0})^T \tag{4}$$
In the formula, $Y_{10}, \ldots, Y_{m0}$ are the values, for the different objects, of a certain factor affecting power supply reliability.
2.4) The difference between each column vector in the normalized matrix $Y$ and the reference sequence $Y_0$ is taken, followed by the absolute value, to obtain the absolute difference matrix $[\Delta_k]$:
$$[\Delta_k] = |Y_k - Y_0|, \quad k = 1, 2, \ldots, n \tag{5}$$
In the formula, $Y_k$ is the $k$th column of $Y$, i.e. the $k$th characteristic of the different objects.
2.5) The correlation coefficient $\xi_j(i)$ of each characteristic is calculated from the absolute difference matrix $[\Delta_k]$:
$$\xi_j(i) = \frac{\min_j \min_i \Delta_{ij} + \rho \max_j \max_i \Delta_{ij}}{\Delta_{ij} + \rho \max_j \max_i \Delta_{ij}} \tag{6}$$
In the formula, $\xi_j(i)$ is the correlation coefficient of the $j$th characteristic of the $i$th object; $\Delta_{ij}$ is the element in row $i$, column $j$ of the absolute difference matrix $[\Delta_k]$; $\rho$ is the resolution coefficient, taken as 0.5.
2.6) The grey correlation value $r_j$ of each characteristic is calculated from its correlation coefficients $\xi_j(i)$:
$$r_j = \frac{1}{m} \sum_{i=1}^{m} \xi_j(i) \tag{7}$$
In the formula, $r_j$ is the grey correlation value between the $j$th characteristic and the reference characteristic.
2.7) The characteristics in the multi-dimensional power supply reliability analysis model are sorted by their grey correlation values $r_j$, and the characteristics whose grey correlation value $r_j$ exceeds a preset threshold are extracted as the characteristics influencing the power supply reliability of the terminal users.
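Steps 2.2) to 2.7) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`min_max_normalize`, `grey_relational_grades`, `select_features`) are hypothetical, the coefficient follows the usual grey relational form with ρ = 0.5, and two choices are assumptions not stated in the text, namely that the reference column is excluded from the global minimum and maximum, and that a constant column normalizes to 0.

```python
from typing import Dict, List, Sequence


def min_max_normalize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 2.2: column-wise min-max normalization (constant columns map to 0.0)."""
    m, n = len(X), len(X[0])
    Y = [[0.0] * n for _ in range(m)]
    for j, col in enumerate(zip(*X)):
        lo, hi = min(col), max(col)
        span = hi - lo
        for i in range(m):
            Y[i][j] = (X[i][j] - lo) / span if span else 0.0
    return Y


def grey_relational_grades(
    X: Sequence[Sequence[float]], ref_col: int, rho: float = 0.5
) -> Dict[int, float]:
    """Steps 2.3 to 2.6: grey correlation value r_j of each column vs. the reference."""
    Y = min_max_normalize(X)
    m, n = len(Y), len(Y[0])
    # Step 2.4: absolute difference between each column and the reference column.
    delta = [[abs(Y[i][j] - Y[i][ref_col]) for j in range(n)] for i in range(m)]
    others = [j for j in range(n) if j != ref_col]
    d_min = min(delta[i][j] for i in range(m) for j in others)
    d_max = max(delta[i][j] for i in range(m) for j in others)
    grades: Dict[int, float] = {}
    for j in others:
        if d_max == 0:  # every column coincides with the reference
            grades[j] = 1.0
            continue
        # Step 2.5: correlation coefficient for each object i.
        xi = [(d_min + rho * d_max) / (delta[i][j] + rho * d_max) for i in range(m)]
        # Step 2.6: the grey correlation value is the mean coefficient.
        grades[j] = sum(xi) / m
    return grades


def select_features(grades: Dict[int, float], threshold: float) -> List[int]:
    """Step 2.7: rank columns by r_j and keep those above the threshold."""
    ranked = sorted(grades.items(), key=lambda kv: kv[1], reverse=True)
    return [j for j, r in ranked if r > threshold]
```

For example, with column 0 as the reference, `grey_relational_grades([[1, 10, 3], [2, 20, 2], [3, 30, 1]], ref_col=0)` rates column 1, which moves with the reference, higher than column 2, which moves against it.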
3) And acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user.
4) Considering that power supply reliability data grow exponentially in the time and space dimensions, the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) is adopted to perform preliminary clustering on the acquired data of each terminal user to be classified, yielding the cluster number and the clustering center of each terminal user to be classified. Specifically:
4.1) The acquired sample data set of each terminal user to be classified is denoted $D = \{x_1, x_2, \ldots, x_w\}$ and is to be divided into $\eta$ clusters. Because the acquired data carry no label information, a criterion must be chosen for dividing them into $\eta$ clusters; the squared distance $\|\cdot\|^2$ is therefore selected as the criterion.
4.2) The ε-neighborhood is determined for each sample data set $D$, together with the minimum number MinPts of data points that the ε-neighborhood of a core object must contain, where ε is the radius of the clustering space.
4.3) The clustering center of the sample data set $D$ of each terminal user to be classified is determined according to the ε-neighborhood of each data point in $D$ and the minimum number MinPts:
4.3.1) If the ε-neighborhood of a data point $x_p$ contains $c$ ($c \ge$ MinPts) data points, a new cluster is created with $x_p$ as its core object.
4.3.2) All core objects are found, i.e. the data $x_d \in D$ within the ε-neighborhood of $x_p$, and clustering is performed according to the following formula (8) to obtain the clustering center $N_\varepsilon(x_p)$ of the sample data set $D$ of each terminal user to be classified:
$$N_\varepsilon(x_p) = \{x_d \in D \mid \mathrm{dist}(x_d, x_p) \le \varepsilon\} \tag{8}$$
Wherein $N_\varepsilon(x_p)$ is the set of preliminarily determined clustering centers of the sample data set $D$; dist is the Euclidean distance.
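The preliminary clustering of step 4) can be sketched in plain Python as follows. This is an illustrative textbook-style DBSCAN, not the patented implementation; the function names are hypothetical. One assumption goes beyond the text: the text defines $N_\varepsilon(x_p)$ as a neighborhood set and does not say how a single center vector is derived from it, so the sketch takes the mean of each cluster's members as that cluster's center.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def region_query(D: Sequence[Point], p: int, eps: float) -> List[int]:
    """Indices of the eps-neighborhood of D[p] per formula (8), including p itself."""
    return [d for d in range(len(D)) if math.dist(D[d], D[p]) <= eps]


def dbscan(D: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one label per point: a cluster index >= 0, or -1 for noise."""
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(D)
    cluster = -1
    for p in range(len(D)):
        if labels[p] != UNVISITED:
            continue
        neighbors = region_query(D, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # step 4.3.1: p is a core object, so start a new cluster
        labels[p] = cluster
        queue = list(neighbors)
        while queue:  # step 4.3.2: grow the cluster through reachable core objects
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point adopted by this cluster
            if labels[q] != UNVISITED:
                continue
            labels[q] = cluster
            q_neighbors = region_query(D, q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)
    return labels


def cluster_count_and_centers(
    D: Sequence[Point], labels: Sequence[int]
) -> Tuple[int, List[List[float]]]:
    """Cluster number eta and one center per cluster (assumed: mean of members)."""
    k = max(labels) + 1 if labels else 0
    centers = []
    for c in range(k):
        members = [D[i] for i, lab in enumerate(labels) if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return k, centers
```

The pair returned by `cluster_count_and_centers` corresponds to the cluster number η and the initial centroids that step 5) passes to the K-means algorithm; points labeled -1 are treated as noise and do not contribute to any center.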
5) The K-means algorithm (K-means clustering algorithm) is adopted, with the cluster number $\eta$ and the clustering center $N_\varepsilon(x_p)$ of each terminal user to be classified used as its cluster number and centroids, to determine the classification of each terminal user to be classified. Specifically:
5.1) The obtained cluster number $\eta$ and clustering center $N_\varepsilon(x_p)$ of the sample data set $D$ of each terminal user to be classified are taken as the cluster number and the initial centroid vectors $\{\mu_1, \mu_2, \ldots, \mu_\Omega\}$ of the K-means algorithm, and the number of iterations $N$ is set, wherein each $\mu_v$ ($v = 1, 2, \ldots, \Omega$) is a single centroid vector; $\Omega$ is the number of centroid vectors in the centroid set $N_\varepsilon(x_p)$, and $\eta = \Omega$.
5.2) The clusters are initialized as $C_t = \varnothing$, $t = 1, 2, \ldots, \eta$, wherein $C_t$ is the $t$th cluster, i.e. the $t$th set containing data, and $\varnothing$ is the empty set.
5.3) The distance $d_{sv}$ between each data point $x_s$ ($s = 1, 2, \ldots, w$) and each centroid vector $\mu_v$ ($v = 1, 2, \ldots, \Omega$) is calculated:
$$d_{sv} = \|x_s - \mu_v\|^2 \tag{9}$$
5.4) The data point $x_s$ is assigned to the cluster $C_{t=v}$ of the centroid vector $\mu_v$ for which $d_{sv}$ is smallest, and the cluster is updated as $C_{t=v} = C_{t=v} \cup \{x_s\}$.
5.5) The centroid vector $\mu_t$ of each cluster $C_t$ is recalculated:
$$\mu_t = \frac{1}{|C_t|} \sum_{x \in C_t} x \tag{10}$$
5.6) If the centroid vectors $\mu_t$ of all clusters $C_t$ no longer change, the method proceeds to step 5.7); otherwise, it returns to step 5.2), until all $N$ iterations have been completed.
5.7) The final cluster partition $C = \{C_1, C_2, \ldots, C_\eta\}$ is output, which completes the classification of the terminal users.
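Steps 5.1) to 5.7) amount to a K-means loop whose cluster number and starting centroids are supplied from outside rather than chosen at random. A minimal sketch, assuming hypothetical names (`sq_dist`, `seeded_kmeans`) and one behavior the text does not specify, namely that an empty cluster keeps its previous centroid:

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance, as in formula (9)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    D: Sequence[Point], init_centroids: Sequence[Point], max_iter: int = 100
) -> Tuple[List[List[Point]], List[List[float]]]:
    """K-means started from given centroids; returns (clusters, centroids)."""
    mu = [list(c) for c in init_centroids]  # step 5.1: seeds from the pre-clustering
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in mu]  # step 5.2: reset every cluster to the empty set
        for x in D:  # steps 5.3 and 5.4: assign x to its nearest centroid
            v = min(range(len(mu)), key=lambda k: sq_dist(x, mu[k]))
            clusters[v].append(x)
        # Step 5.5: recompute centroids (assumption: empty clusters keep the old one).
        new_mu = [
            [sum(col) / len(c) for col in zip(*c)] if c else mu[t]
            for t, c in enumerate(clusters)
        ]
        if new_mu == mu:  # step 5.6: centroids unchanged, so stop early
            break
        mu = new_mu
    return clusters, mu  # step 5.7: final partition and centroids
```

Seeding with centers that already lie near dense regions usually means only a few iterations are needed; in this sketch the loop stops as soon as one pass leaves every centroid unchanged.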
The following describes in detail the method for dividing end users based on multidimensional big data of power supply reliability by specific embodiments:
in the embodiment, different power supply terminal users of a power distribution network in a certain city are specifically divided into commercial users, large hospitals, residential users, light industrial users, suburban users, steel mills and petrochemical plants, and based on different requirements of three-level loads on power supply reliability, power supply reliability sample data of different terminal users in the last decade of the certain city are taken as an example, and annual data includes data of 4000 terminal users.
The GRA (grey correlation degree) method is used to compare the power supply reliability data of the city over 10 years, and the characteristics influencing the power supply reliability are finally ranked. To simplify the description of the factor indexes, $z_j \in Z$, $Z = \{z_0, z_1, z_2, \ldots, z_m\}$ with $m = 18$ is adopted, where $z_0$ to $z_{18}$ respectively represent the power supply reliability, the ring network rate, the average line length, the network wiring standardization rate, the inter-station contact rate, the load transfer rate, the average number of line segments, the average line load rate, the overhead line insulation rate, the cabling rate, the medium-voltage fault rate of bare conductors, the medium-voltage cable fault rate, the medium-voltage fault rate of insulated lines, the number of faults caused by natural factors, the number of faults caused by external force, the live working rate (number of households in power failure), the average outage duration of medium-voltage faults, the average arrival time for fault emergency repair and the average fault location duration. Part of the characteristics influencing the power supply reliability of the terminal users are shown in Table 1 below:
table 1: features affecting end-user reliability of power supply
As shown in fig. 2, the correlation between the features is displayed visually: the darker the color, the larger the correlation value between two features and the stronger the correlation. As can be seen from fig. 2, the correlation value between the average arrival time for fault emergency repair ($z_{17}$) and the medium-voltage fault rate of insulated lines ($z_{12}$) reaches 0.88, which is a strong correlation. Thus, these two features can be screened when selecting the features that ultimately affect power supply reliability. Fig. 3 shows the correlation between each feature and the standard, i.e. the power supply reliability rate; all features can be sorted by grey correlation value to select the features most strongly correlated with power supply reliability in the city. Figs. 2 and 3 together are used to finalize the features that mainly affect the power supply reliability in this area, giving the ranking: live working rate ($z_{15}$) > average arrival time for fault emergency repair ($z_{17}$) > cabling rate ($z_9$) > load transfer rate ($z_5$) > inter-station contact rate ($z_4$) > number of faults caused by external force ($z_{14}$).
The invention clusters the annual power supply reliability characteristics of each terminal user to be classified using the DBSCAN algorithm and the K-means algorithm, and repeats the clustering for verification. Because different terminal users have different requirements on power supply reliability, terminal users with the same requirement are clustered together according to the reliability requirements of the three load levels. Fig. 4 shows the clustering result for one year: cluster one contains the primary loads, such as hospitals, major institutions, steel mills and petrochemical plants; cluster two contains the secondary loads, such as commercial, residential and light industrial users; cluster three contains the tertiary loads, i.e. suburban users. The clustering result in fig. 4 can be used to classify the reliability levels of different terminal users, reflects the importance of each terminal user, and provides data support for the dispatching system.
Example 2
The embodiment provides a system for dividing end users based on multidimensional big data of power supply reliability, which comprises:
the characteristic extraction module is used for carrying out power supply reliability factor correlation analysis on each characteristic in a pre-constructed multi-dimensional power supply reliability analysis model according to a preset reference characteristic by adopting a grey correlation method, and extracting the characteristic influencing the power supply reliability of a terminal user;
the data acquisition module is used for acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user;
the primary clustering module is used for performing primary clustering on the acquired data of each terminal user to be classified by adopting a DBSCAN algorithm to obtain the clustering cluster number and the clustering center of each terminal user to be classified;
and the classification module is used for determining the classification of each terminal user to be classified by respectively taking the cluster number and the cluster center of each terminal user to be classified as the cluster number and the centroid by adopting a K-means algorithm.
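The division into modules can be pictured as a simple pipeline in which each module is a replaceable callable. The sketch below is only one possible arrangement; the class `PartitionSystem`, its field names and the signatures of the callables are all hypothetical and are not taken from this embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


@dataclass
class PartitionSystem:
    # Feature extraction module: history -> indices of the influential features.
    extract_features: Callable[[List[Row]], List[int]]
    # Data acquisition module: selected feature indices -> per-user data rows.
    acquire_data: Callable[[List[int]], List[Row]]
    # Preliminary clustering module: data -> (cluster number, cluster centers).
    precluster: Callable[[List[Row]], Tuple[int, List[Row]]]
    # Classification module: data, cluster number, centers -> class label per row.
    classify: Callable[[List[Row], int, List[Row]], List[int]]

    def run(self, history: List[Row]) -> List[int]:
        """Chain the four modules in the order given in this embodiment."""
        selected = self.extract_features(history)
        data = self.acquire_data(selected)
        k, centers = self.precluster(data)
        return self.classify(data, k, centers)
```

Keeping each module behind a plain callable makes it possible to test the wiring with stub functions before a grey-correlation, DBSCAN or K-means routine is plugged in.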
Example 3
The present embodiment provides a processing device corresponding to the method for dividing an end user based on multidimensional big data with power supply reliability provided in embodiment 1, where the processing device may be a processing device for a client, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, and the like, to execute the method of embodiment 1.
The processing equipment comprises a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface are connected through the bus so as to complete mutual communication. The memory stores a computer program capable of running on the processor, and the processor executes the method for dividing the end user based on the multidimensional big data of power supply reliability provided by the embodiment 1 when running the computer program.
In some implementations, the Memory may be a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory, such as at least one disk Memory.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example 4
The end-user partitioning method based on power supply reliability multidimensional big data of embodiment 1 can be embodied as a computer program product, and the computer program product can include a computer readable storage medium on which computer readable program instructions for executing the end-user partitioning method described in embodiment 1 are loaded.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
The above embodiments are only used for illustrating the present invention, and the structure, connection mode, manufacturing process, etc. of the components may be changed, and all equivalent changes and modifications performed on the basis of the technical solution of the present invention should not be excluded from the protection scope of the present invention.