CN112528113A - Terminal user dividing method and system based on power supply reliability multi-dimensional big data - Google Patents


Info

Publication number
CN112528113A
Authority
CN
China
Prior art keywords
power supply
supply reliability
terminal user
classified
data
Prior art date
Legal status
Pending
Application number
CN202011498719.4A
Other languages
Chinese (zh)
Inventor
姜世公
杨卫红
范须露
杨赫
贾利虎
张重阳
张思聪
柳伟
Current Assignee
State Grid Tianjin Electric Power Co Ltd
State Grid Economic and Technological Research Institute
Original Assignee
Nanjing University of Science and Technology
State Grid Tianjin Electric Power Co Ltd
State Grid Economic and Technological Research Institute
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology, State Grid Tianjin Electric Power Co Ltd, and State Grid Economic and Technological Research Institute
Priority to CN202011498719.4A
Publication of CN112528113A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q10/06395 Quality analysis or management
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a system for dividing terminal users based on multi-dimensional big data of power supply reliability. The method comprises: 1) for each terminal user to be classified, acquiring historical data for every feature of a pre-constructed multi-dimensional power supply reliability analysis model; 2) using the grey correlation degree method to extract the features of the model that affect the power supply reliability of the terminal users to be classified; 3) acquiring the data of each terminal user to be classified for the extracted features; 4) preliminarily clustering the acquired data with the DBSCAN algorithm to obtain the number of clusters and the cluster centers; 5) running the K-means algorithm with that number of clusters and those cluster centers as its number of clusters and initial centroids, thereby determining the class of each terminal user to be classified.

Description

Terminal user dividing method and system based on power supply reliability multi-dimensional big data
Technical Field
The invention relates to a method and a system for dividing terminal users based on multi-dimensional big data of power supply reliability, and belongs to the field of big-data processing for power distribution networks.
Background
Driven by the rapid development of distributed energy, energy storage and diversified loads, distribution networks are being built and operated in increasingly flexible and diverse ways, and their ability to meet the differentiated power supply reliability requirements of different regions and different types of terminal users is becoming increasingly important. At the same time, different terminal users have different reliability requirements, and some have particularly high ones, so the terminal users to be classified need to be divided according to their power supply reliability requirements.
A high-reliability distribution network can meet the increasingly diverse electricity needs of terminal users and provides strong support for economic transformation, upgrading and high-quality development. However, distribution network structures differ greatly between areas, many indices affect power supply reliability, these indices carry much redundant information, and index calculation and modeling are complex. A correlation analysis method can be applied to the clustered index data to analyze the power supply reliability indices comprehensively, identify the main factors affecting different terminals, and thereby reduce dimensionality. Moreover, because the volume of power supply reliability factor data grows exponentially in the time and space dimensions and the factor indices are markedly regional, big-data clustering can mine the latent correlations among the indices, group index data with common characteristics, and divide power supply users according to their different reliability requirements. Research on big-data mining of the power supply reliability of distribution network terminal users, which provides a basis for dividing those users by reliability requirement, is therefore of great practical significance.
However, no existing research has been found that classifies distribution network terminal users according to power supply reliability requirements. To achieve a comprehensive and scientific classification of terminal users based on multi-dimensional power supply reliability big data at this early stage, two problems must be solved: 1) how to divide distribution network terminal users according to power supply reliability data; 2) how to process power supply reliability data with big-data mining methods.
Disclosure of Invention
In view of the above problems, the object of the present invention is to provide a method and a system for dividing terminal users based on multi-dimensional big data of power supply reliability, so that the division is comprehensive and scientific.
To achieve this object, the invention adopts the following technical solution: a terminal user dividing method based on multi-dimensional big data of power supply reliability, comprising the following steps:
1) for each terminal user to be classified, acquiring historical data for every feature of a pre-constructed multi-dimensional power supply reliability analysis model;
2) using the grey correlation degree method, performing a correlation analysis of the power supply reliability factors based on a preset reference feature and the acquired historical data, and extracting the features of the model that affect the power supply reliability of the terminal users to be classified;
3) acquiring the data of each terminal user to be classified for the extracted features;
4) preliminarily clustering the acquired data of the terminal users to be classified with the DBSCAN algorithm to obtain the number of clusters and the cluster centers;
5) determining the class of each terminal user to be classified with the K-means algorithm, using the number of clusters and the cluster centers from step 4) as the number of clusters and the initial centroids, respectively.
Further, the specific process of step 2) is as follows:
2.1) According to the pre-constructed multi-dimensional power supply reliability analysis model, an original matrix X consisting of m objects and n features is established:
$$X=\begin{bmatrix}X_1\\ X_2\\ \vdots\\ X_m\end{bmatrix}=\begin{bmatrix}X_{11}&X_{12}&\cdots&X_{1n}\\ X_{21}&X_{22}&\cdots&X_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ X_{m1}&X_{m2}&\cdots&X_{mn}\end{bmatrix}$$
where $X_i$ is the feature vector of the i-th object and $X_{ij}$ is the j-th feature of the i-th object;
2.2) The feature data of the model are normalized to obtain the normalized features $Y_{ij}$:
$$Y_{ij}=\frac{X_{ij}-X_{\min j}}{X_{\max j}-X_{\min j}}$$
where $Y_{ij}$ is the normalized j-th feature of the i-th object, and $X_{\min j}$ and $X_{\max j}$ are the minimum and maximum of column j of X. The normalized matrix Y is
$$Y=\begin{bmatrix}Y_1&Y_2&\cdots&Y_n\end{bmatrix}=\begin{bmatrix}Y_{11}&Y_{12}&\cdots&Y_{1n}\\ \vdots&\vdots&\ddots&\vdots\\ Y_{m1}&Y_{m2}&\cdots&Y_{mn}\end{bmatrix}$$
where $Y_j$ is the vector formed by the normalized j-th feature of the different objects;
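Step 2.2 is ordinary column-wise min-max scaling, $Y_{ij}=(X_{ij}-X_{\min j})/(X_{\max j}-X_{\min j})$. The sketch below is pure Python; the function name `min_max_normalize` is hypothetical, and because the patent does not say how a constant column is handled, such a column is assumed here to map to 0.

```python
from typing import List


def min_max_normalize(X: List[List[float]]) -> List[List[float]]:
    """Column-wise min-max normalization of an m x n matrix (list of rows).

    Row i is an object (terminal user), column j a feature.
    Assumption: a constant column (max == min) is mapped to 0.0.
    """
    m, n = len(X), len(X[0])
    Y = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i in range(m):
            Y[i][j] = (X[i][j] - lo) / span if span > 0 else 0.0
    return Y


# Example: 3 terminal users, 2 features on very different scales
X = [[10.0, 0.90], [20.0, 0.95], [30.0, 0.99]]
print(min_max_normalize(X))
```

After scaling, every feature lies in [0, 1], so the absolute differences used later in the grey correlation analysis are comparable across features.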
2.3) Each feature column of the normalized matrix Y is taken in turn as the reference sequence $Y_0$:
$$Y_0=(Y_{10},\dots,Y_{i0},\dots,Y_{m0})^{\mathrm T}$$
where $Y_{i0}$ is the value, for object i, of the factor affecting power supply reliability that serves as the reference;
2.4) The absolute difference between each column vector of Y and the reference sequence $Y_0$ gives the absolute difference matrix $[\Delta_k]$:
$$[\Delta_k]=|Y_k-Y_0|,\quad k=1,2,\dots,n$$
where $Y_k$ is the k-th feature column over the different objects;
2.5) The correlation coefficient $\xi_j(i)$ of each feature is calculated from the absolute difference matrix $[\Delta_k]$:
$$\xi_j(i)=\frac{\min\limits_i\min\limits_j\Delta_{ij}+\rho\max\limits_i\max\limits_j\Delta_{ij}}{\Delta_{ij}+\rho\max\limits_i\max\limits_j\Delta_{ij}}$$
where $\xi_j(i)$ is the correlation coefficient of the j-th feature of the i-th object, $\Delta_{ij}$ is the entry of $[\Delta_k]$ in row i and column j, and ρ is the resolution coefficient;
2.6) The grey correlation value $r_j$ of each feature is calculated from its coefficients $\xi_j(i)$:
$$r_j=\frac{1}{m}\sum_{i=1}^{m}\xi_j(i)$$
where $r_j$ is the grey correlation value between the j-th feature and the reference feature;
2.7) The features of the model are ranked by their grey correlation values $r_j$, and those whose $r_j$ exceeds a preset threshold are extracted as the features affecting the power supply reliability of the terminal users.
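Steps 2.3 to 2.7 can be sketched as follows, assuming the common grey relational coefficient $\xi_j(i)=(\Delta_{\min}+\rho\Delta_{\max})/(\Delta_{ij}+\rho\Delta_{\max})$ with the global minimum and maximum of the absolute differences, and $r_j$ as the mean of $\xi_j(i)$ over the m objects. The reference column is passed in separately (for example, the normalized power supply reliability column). The names `grey_relational_grades` and `select_features`, and the threshold value, are hypothetical and for illustration only.

```python
from typing import List, Sequence, Tuple


def grey_relational_grades(Y: List[List[float]], ref: Sequence[float],
                           rho: float = 0.5) -> List[float]:
    """Grey relational grade r_j of each column of Y against the reference column."""
    m, n = len(Y), len(Y[0])
    # Step 2.4: absolute differences |Y_ij - Y_i0|
    delta = [[abs(Y[i][j] - ref[i]) for j in range(n)] for i in range(m)]
    d_min = min(min(row) for row in delta)  # two-level minimum
    d_max = max(max(row) for row in delta)  # two-level maximum
    if d_max == 0:
        # every column coincides with the reference (assumption: fully related)
        return [1.0] * n
    grades = []
    for j in range(n):
        # Step 2.5: coefficient of column j for every object i
        xi = [(d_min + rho * d_max) / (delta[i][j] + rho * d_max) for i in range(m)]
        # Step 2.6: grade = mean coefficient over the m objects
        grades.append(sum(xi) / m)
    return grades


def select_features(grades: Sequence[float], names: Sequence[str],
                    threshold: float) -> List[Tuple[str, float]]:
    """Step 2.7: rank features by grade (descending), keep those above the threshold."""
    ranked = sorted(zip(names, grades), key=lambda pair: pair[1], reverse=True)
    return [(name, r) for name, r in ranked if r > threshold]


# Example: reference column plus two normalized feature columns (3 objects)
ref = [0.0, 0.5, 1.0]
Y = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
grades = grey_relational_grades(Y, ref)
print(select_features(grades, ["f1", "f2"], threshold=0.6))  # [('f1', 1.0)]
```

Here `f1` tracks the reference exactly and gets grade 1.0, while `f2` runs opposite to it and gets 5/9, so only `f1` passes the illustrative threshold.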
Further, the specific process of step 4) is as follows:
4.1) Let $D=\{x_1,x_2,\dots,x_w\}$ be the sample data set of the acquired data of the terminal users to be classified; D is to be divided into η clusters;
4.2) For each sample data set D, the ε-neighborhood radius ε and MinPts, the minimum number of data points that the ε-neighborhood of a core object must contain, are determined;
4.3) The cluster centers of the sample data set D of each terminal user to be classified are determined from the ε-neighborhood of each data point in D and the threshold MinPts.
Further, the specific process of step 4.3) is as follows:
4.3.1) If the ε-neighborhood of a data point $x_p$ contains c data points with c ≥ MinPts, a new cluster is created with $x_p$ as its core object;
4.3.2) For every core object $x_p$, all data $x_d\in D$ within its ε-neighborhood are found, and clustering yields the cluster centers $N_\varepsilon(x_p)$ of the sample data set D of each terminal user to be classified:
$$N_\varepsilon(x_p)=\{x_d\in D\mid\operatorname{dist}(x_d,x_p)\le\varepsilon\}$$
where $N_\varepsilon(x_p)$ is the set of preliminarily determined cluster centers of D and dist is the Euclidean distance.
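A minimal sketch of the ε-neighborhood $N_\varepsilon(x_p)=\{x_d\in D\mid\operatorname{dist}(x_d,x_p)\le\varepsilon\}$ and the core-object test of step 4.3.1). Following the usual DBSCAN convention (an assumption here), a point counts as a member of its own neighborhood. `math.dist` (Python 3.8+) gives the Euclidean distance; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def eps_neighborhood(D: List[Sequence[float]], p: int, eps: float) -> List[int]:
    """Indices d with dist(x_d, x_p) <= eps, i.e. N_eps(x_p); p itself is included."""
    return [d for d in range(len(D)) if math.dist(D[d], D[p]) <= eps]


def core_objects(D: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Indices p whose eps-neighborhood holds at least MinPts points (step 4.3.1)."""
    return [p for p in range(len(D)) if len(eps_neighborhood(D, p, eps)) >= min_pts]


# Example: three nearby users and one isolated user, two features each
D = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(eps_neighborhood(D, 0, eps=1.0))      # [0, 1, 2]
print(core_objects(D, eps=1.0, min_pts=3))  # [0]
```

Only the first point is a core object: points 1 and 2 are each within ε of point 0 but about 1.41 apart from each other, so their own neighborhoods hold just two points.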
Further, the specific process of step 5) is as follows:
5.1) The number of clusters η and the cluster centers $N_\varepsilon(x_p)$ obtained for the sample data set D of each terminal user to be classified are used as the number of clusters and the initial centroid vectors $\{\mu_1,\mu_2,\dots,\mu_\Omega\}$ of the K-means algorithm, and the maximum number of iterations N is set, where $\mu_\Omega$ is a single centroid vector, Ω is the number of centroid vectors in the set $N_\varepsilon(x_p)$, and η = Ω;
5.2) The clusters are initialized:
$$C_t=\varnothing,\quad t=1,2,\dots,\eta$$
where $C_t$ is the t-th set of data and ∅ is the empty set;
5.3) The distance $d_{sv}$ between each data point $x_s$ (s = 1, 2, …, w) and each centroid vector $\mu_v$ (v = 1, 2, …, Ω) is calculated:
$$d_{sv}=\lVert x_s-\mu_v\rVert_2^2$$
5.4) Each $x_s$ is assigned to the cluster $C_{t=v}$ of the centroid $\mu_v$ with the smallest $d_{sv}$, and the cluster is updated as $C_{t=v}=C_{t=v}\cup\{x_s\}$;
5.5) The centroid vector $\mu_t$ of each cluster $C_t$ is recalculated:
$$\mu_t=\frac{1}{|C_t|}\sum_{x\in C_t}x$$
5.6) If no centroid vector $\mu_t$ has changed, go to step 5.7); otherwise return to step 5.2), until all N iterations are completed;
5.7) The final cluster partition $C=\{C_1,C_2,\dots,C_\eta\}$ is output, completing the classification of the terminal users.
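Steps 5.1 to 5.7 are the standard K-means (Lloyd) iteration, except that the number of clusters and the initial centroids come from the DBSCAN stage rather than from a random choice. The pure-Python sketch below assumes that an empty cluster keeps its previous centroid, which the patent does not specify; `kmeans_seeded` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def kmeans_seeded(data: List[Sequence[float]], centroids: List[Sequence[float]],
                  max_iter: int = 100) -> Tuple[List[List[int]], List[List[float]]]:
    """K-means started from given centroids; returns (clusters, centroids)."""
    mu = [list(c) for c in centroids]   # 5.1) eta = Omega = len(mu)
    clusters: List[List[int]] = []
    for _ in range(max_iter):           # at most N iterations
        clusters = [[] for _ in mu]     # 5.2) C_t = empty set
        for s, x in enumerate(data):
            # 5.3) squared Euclidean distance d_sv to every centroid
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in mu]
            # 5.4) nearest centroid wins (ties go to the lower index)
            clusters[d.index(min(d))].append(s)
        new_mu = []
        for t, members in enumerate(clusters):
            if members:                 # 5.5) centroid = mean of the cluster
                new_mu.append([sum(data[s][k] for s in members) / len(members)
                               for k in range(len(mu[t]))])
            else:                       # assumption: empty cluster keeps its centroid
                new_mu.append(mu[t])
        if new_mu == mu:                # 5.6) no centroid changed
            break
        mu = new_mu
    return clusters, mu                 # 5.7) final partition and centroids


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
clusters, mu = kmeans_seeded(data, [(0, 0), (10, 10)])
print(clusters)  # [[0, 1], [2, 3]]
```

Each entry `clusters[t]` lists the indices of the data in $C_t$; because the seeds are supplied, the result is deterministic for given inputs.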
Further, the multi-dimensional power supply reliability analysis model comprises grid structure features, technical equipment level features, equipment quality features, fault cause features and operation and maintenance level features. The grid structure features comprise the inter-station connection rate, transferable power, average number of line segments, ring network rate, average line length and network wiring standardization rate; the technical equipment level features comprise the average line load rate, overhead line insulation rate and cabling rate; the equipment quality features comprise the medium-voltage fault rate of bare conductor lines, the medium-voltage cable fault rate and the medium-voltage fault rate of insulated lines; the fault cause features comprise the number of faults caused by natural factors and the number of faults caused by external force; and the operation and maintenance level features comprise the live working rate (number of customers affected by outages), the average time for emergency repair to reach fault sites, the average outage duration of medium-voltage faults and the average fault location time.
A terminal user dividing system based on multi-dimensional big data of power supply reliability, comprising:
the historical data acquisition module is used for acquiring historical data of each characteristic in a multi-dimensional power supply reliability analysis model which is pre-constructed by each terminal user to be classified;
the characteristic extraction module is used for performing power supply reliability factor correlation analysis according to preset reference characteristics and acquired historical data by adopting a grey correlation degree method and extracting characteristics influencing the power supply reliability of the terminal user to be classified in the multi-dimensional power supply reliability analysis model;
the actual data acquisition module is used for acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user;
the primary clustering module is used for performing primary clustering on the acquired data of each terminal user to be classified by adopting a DBSCAN algorithm to obtain the clustering cluster number and the clustering center of each terminal user to be classified;
and the classification module is used for determining the classification of each terminal user to be classified by respectively taking the cluster number and the cluster center of each terminal user to be classified as the cluster number and the centroid by adopting a K-means algorithm.
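The five modules form a linear pipeline in which each module consumes the previous module's output. One hypothetical way to wire them is to inject every module as a plain callable. The class and field names below are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Rows = List[Sequence[float]]


@dataclass
class TerminalUserDividingSystem:
    """Hypothetical wiring of the five modules; each field is a callable you supply."""
    acquire_history: Callable[[], Rows]                    # historical data acquisition module
    extract_features: Callable[[Rows], List[int]]          # feature extraction module (grey correlation)
    acquire_actual: Callable[[List[int]], Rows]            # actual data acquisition module
    preliminary_cluster: Callable[[Rows], Rows]            # DBSCAN stage -> cluster centers
    classify: Callable[[Rows, Rows], List[int]]            # K-means stage -> class per user

    def run(self) -> List[int]:
        history = self.acquire_history()
        selected = self.extract_features(history)          # indices of retained features
        data = self.acquire_actual(selected)
        centers = self.preliminary_cluster(data)           # eta = len(centers)
        return self.classify(data, centers)


# Usage with stub modules standing in for real implementations
system = TerminalUserDividingSystem(
    acquire_history=lambda: [[1.0, 2.0], [3.0, 4.0]],
    extract_features=lambda history: [0],
    acquire_actual=lambda cols: [[1.0], [3.0]],
    preliminary_cluster=lambda data: [[1.0], [3.0]],
    classify=lambda data, centers: [0, 1],
)
print(system.run())  # [0, 1]
```

Keeping the modules as injected callables lets each stage be replaced or tested in isolation, which mirrors the module split in the claims.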
A processor comprising computer program instructions which, when executed by the processor, implement the steps of the above terminal user dividing method based on multi-dimensional big data of power supply reliability.
A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the steps of the above terminal user dividing method based on multi-dimensional big data of power supply reliability.
Owing to the above technical solution, the invention has the following advantages:
1. By combining the K-means and DBSCAN algorithms, the invention compensates both for the clustering errors of DBSCAN caused by uneven data density and for the clustering errors of K-means caused by poorly chosen initial cluster centers and cluster numbers, so that terminal users can be graded scientifically.
2. The invention applies the GRA method to correlation analysis of the clustered terminal user index data of different power supply reliability grades, which improves the accuracy of ranking the factors that affect power supply reliability and of dimensionality reduction; it can be widely applied to big-data processing in power distribution networks.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the correlation between power supply reliability features in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the correlation between the power supply reliability features and the power supply reliability rate in an embodiment of the present invention;
FIG. 4 is a schematic diagram of terminal user clustering in an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the attached drawings. It is to be understood, however, that the drawings are provided solely for the purposes of promoting an understanding of the invention and that they are not to be construed as limiting the invention. In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Example 1
As shown in fig. 1, the present embodiment provides an end user partitioning method based on multidimensional big data of power supply reliability, including the following steps:
1) Because terminal users in different power supply areas have different requirements for power supply reliability, a multi-dimensional power supply reliability analysis model is constructed from five dimensions, namely grid structure, technical equipment level, equipment quality, fault cause, and operation and maintenance level, so that the power supply reliability of different areas can be analyzed comprehensively. The grid structure features comprise the inter-station connection rate, transferable power, average number of line segments, ring network rate, average line length and network wiring standardization rate; the technical equipment level features comprise the average line load rate, overhead line insulation rate and cabling rate; the equipment quality features comprise the medium-voltage fault rate of bare conductor lines, the medium-voltage cable fault rate and the medium-voltage fault rate of insulated lines; the fault cause features comprise the number of faults caused by natural factors and the number of faults caused by external force; and the operation and maintenance level features comprise the live working rate (number of customers affected by outages), the average time for emergency repair to reach fault sites, the average outage duration of medium-voltage faults and the average fault location time.
2) Historical data of every feature in the multi-dimensional power supply reliability analysis model are acquired for each terminal user to be classified. Using the grey correlation degree method (grey relational analysis, GRA), a correlation analysis of the power supply reliability factors is performed based on a preset reference feature and the acquired historical data, and the features of the model that affect the power supply reliability of the terminal users to be classified are extracted. GRA measures how similar the change curves of two features are: after dimensionless processing, it compares how the two feature curves vary, thereby comparing how strongly each feature is correlated with power supply reliability, and finally ranks the features that affect power supply reliability. The specific steps are as follows:
2.1) According to the constructed multi-dimensional power supply reliability analysis model, an original matrix X consisting of m objects (the terminal users to be classified) and n features is established:
$$X=\begin{bmatrix}X_1\\ X_2\\ \vdots\\ X_m\end{bmatrix}=\begin{bmatrix}X_{11}&X_{12}&\cdots&X_{1n}\\ X_{21}&X_{22}&\cdots&X_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ X_{m1}&X_{m2}&\cdots&X_{mn}\end{bmatrix}\tag{1}$$
where $X_i$ is the feature vector of the i-th object and $X_{ij}$ is the j-th feature of the i-th object.
2.2) The feature data of the model are normalized to obtain the normalized features $Y_{ij}$:
$$Y_{ij}=\frac{X_{ij}-X_{\min j}}{X_{\max j}-X_{\min j}}\tag{2}$$
where $Y_{ij}$ is the normalized j-th feature of the i-th object, and $X_{\min j}$ and $X_{\max j}$ are the minimum and maximum of column j of X. The normalized matrix Y is
$$Y=\begin{bmatrix}Y_1&Y_2&\cdots&Y_n\end{bmatrix}=\begin{bmatrix}Y_{11}&Y_{12}&\cdots&Y_{1n}\\ \vdots&\vdots&\ddots&\vdots\\ Y_{m1}&Y_{m2}&\cdots&Y_{mn}\end{bmatrix}\tag{3}$$
where $Y_j$ is the vector formed by the normalized j-th feature of the different objects.
2.3) Each feature column of the normalized matrix Y is taken in turn as the reference sequence $Y_0$:
$$Y_0=(Y_{10},\dots,Y_{i0},\dots,Y_{m0})^{\mathrm T}\tag{4}$$
where $Y_{i0}$ is the value, for object i, of the factor affecting power supply reliability that serves as the reference.
2.4) The absolute difference between each column vector of Y and the reference sequence $Y_0$ gives the absolute difference matrix $[\Delta_k]$:
$$[\Delta_k]=|Y_k-Y_0|,\quad k=1,2,\dots,n\tag{5}$$
where $Y_k$ is the k-th feature column over the different objects.
2.5) The correlation coefficient $\xi_j(i)$ of each feature is calculated from the absolute difference matrix $[\Delta_k]$:
$$\xi_j(i)=\frac{\min\limits_i\min\limits_j\Delta_{ij}+\rho\max\limits_i\max\limits_j\Delta_{ij}}{\Delta_{ij}+\rho\max\limits_i\max\limits_j\Delta_{ij}}\tag{6}$$
where $\xi_j(i)$ is the correlation coefficient of the j-th feature of the i-th object, $\Delta_{ij}$ is the entry of $[\Delta_k]$ in row i and column j, and ρ is the resolution coefficient, taken as 0.5.
2.6) The grey correlation value $r_j$ of each feature is calculated from its coefficients $\xi_j(i)$:
$$r_j=\frac{1}{m}\sum_{i=1}^{m}\xi_j(i)\tag{7}$$
where $r_j$ is the grey correlation value between the j-th feature and the reference feature.
2.7) The features of the model are ranked by their grey correlation values $r_j$, and those whose $r_j$ exceeds a preset threshold are extracted as the features affecting the power supply reliability of the terminal users.
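To make the role of ρ = 0.5 concrete, here is a small worked instance, assuming the common coefficient form $\xi_j(i)=(\Delta_{\min}+\rho\Delta_{\max})/(\Delta_{ij}+\rho\Delta_{\max})$ and $r_j$ as the mean over m objects. The Δ values are illustrative, not data from the patent.

```latex
\begin{aligned}
&\Delta_{\min}=0,\quad \Delta_{\max}=1,\quad \rho=0.5:\\
&\Delta_{ij}=0:\quad \xi_j(i)=\frac{0+0.5}{0+0.5}=1\\
&\Delta_{ij}=0.5:\quad \xi_j(i)=\frac{0+0.5}{0.5+0.5}=0.5\\
&\Delta_{ij}=1:\quad \xi_j(i)=\frac{0+0.5}{1+0.5}\approx 0.333\\
&m=3:\quad r_j=\tfrac{1}{3}\,(1+0.5+0.333)\approx 0.611
\end{aligned}
```

With $\Delta_{\min}=0$ the coefficient ranges over $[\rho/(1+\rho),\,1]$, so a smaller ρ widens the spread between well-matched and poorly-matched objects.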
3) The data of each terminal user to be classified are acquired for the extracted features that affect power supply reliability.
4) Considering that power supply reliability data grow exponentially in the time and space dimensions, the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise) is used to preliminarily cluster the acquired data of the terminal users to be classified, yielding the number of clusters and the cluster centers. The specific steps are as follows:
4.1) Let $D=\{x_1,x_2,\dots,x_w\}$ be the sample data set of the acquired data of the terminal users to be classified, to be divided into η clusters. Because the acquired data carry no label information, dividing them into η clusters requires a criterion; the squared distance $\lVert\cdot\rVert^2$ is chosen as that criterion.
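Choosing the squared distance $\lVert\cdot\rVert^2$ as the criterion corresponds, on the K-means side, to the within-cluster sum of squared errors $E=\sum_t\sum_{x\in C_t}\lVert x-\mu_t\rVert^2$, which the K-means iterations aim to reduce. A small helper (the name `sse` is hypothetical) that scores a given partition this way:

```python
from typing import List, Sequence


def sse(data: List[Sequence[float]], clusters: List[List[int]],
        centroids: List[Sequence[float]]) -> float:
    """Sum over clusters of squared Euclidean distances from each point to its centroid.

    clusters[t] holds indices into data; centroids[t] is the centroid of cluster t.
    """
    total = 0.0
    for members, mu in zip(clusters, centroids):
        for s in members:
            total += sum((a - b) ** 2 for a, b in zip(data[s], mu))
    return total


data = [(0, 0), (0, 2), (10, 10)]
print(sse(data, [[0, 1], [2]], [(0, 1), (10, 10)]))  # 2.0
```

Comparing this score across candidate partitions (for example, with and without DBSCAN seeding) is one simple way to judge which partition fits the data more tightly.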
4.2) For each sample data set D, the ε-neighborhood radius ε (the radius of the clustering space) and MinPts, the minimum number of data points that the ε-neighborhood of a core object must contain, are determined.
4.3) The cluster centers of the sample data set D of each terminal user to be classified are determined from the ε-neighborhood of each data point in D and the threshold MinPts:
4.3.1) If the ε-neighborhood of a data point $x_p$ contains c data points with c ≥ MinPts, a new cluster is created with $x_p$ as its core object.
4.3.2) For every core object $x_p$, all data $x_d\in D$ within its ε-neighborhood are found, and clustering according to formula (8) yields the cluster centers $N_\varepsilon(x_p)$ of the sample data set D of each terminal user to be classified:
$$N_\varepsilon(x_p)=\{x_d\in D\mid\operatorname{dist}(x_d,x_p)\le\varepsilon\}\tag{8}$$
where $N_\varepsilon(x_p)$ is the set of preliminarily determined cluster centers of D and dist is the Euclidean distance.
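The patent does not spell out how a cluster center is obtained from the ε-neighborhoods. The sketch below assumes a textbook DBSCAN run, takes η as the number of clusters found, and takes each center as the mean of its cluster's points. Noise points (label -1) contribute to no center. Function names and parameter values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

NOISE = -1


def dbscan(D: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN; returns one cluster label per point (NOISE = -1)."""
    n = len(D)
    neighbors = [[d for d in range(n) if math.dist(D[d], D[p]) <= eps] for p in range(n)]
    labels = [None] * n
    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        if len(neighbors[p]) < min_pts:
            labels[p] = NOISE            # may later become a border point
            continue
        labels[p] = cluster              # p is a core object: start a new cluster
        queue = list(neighbors[p])
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster      # border point: joins, but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:
                queue.extend(neighbors[q])  # q is also a core object: keep expanding
        cluster += 1
    return labels


def cluster_centers(D: List[Sequence[float]], labels: List[int]) -> Tuple[int, List[List[float]]]:
    """eta = number of clusters; center = mean of each cluster's points (assumption)."""
    ids = sorted({label for label in labels if label != NOISE})
    centers = []
    for c in ids:
        pts = [D[i] for i, label in enumerate(labels) if label == c]
        centers.append([sum(col) / len(pts) for col in zip(*pts)])
    return len(ids), centers


D = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(D, eps=1.5, min_pts=3)
eta, centers = cluster_centers(D, labels)
print(labels, eta)  # [0, 0, 0, 1, 1, 1, -1] 2
```

The pair `(eta, centers)` is what the K-means stage needs as its number of clusters and initial centroids.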
5) Adopting a K-means algorithm to cluster the clustering number eta and the clustering center N of each terminal user to be classified(xp) The method is used for determining the classification of each terminal user to be classified as the cluster number and the mass center of a K-means algorithm (K-means clustering algorithm), and specifically comprises the following steps:
5.1) obtaining the clustering cluster number eta and the clustering center N of the sample data set D of each terminal user to be classified(xp) Cluster number and initial centroid vector [ mu ] as a K-means algorithm12,…,μΩAnd setting the iteration times N, wherein muΩAs a single centroid vector; omega is a centroid set N(xp) The number of the medium centroid vectors, η ═ Ω.
5.2) initializing clusters
Figure BDA0002838735220000081
Wherein, CtIs the t-th collection containing data;
Figure BDA0002838735220000082
is an empty set.
5.3) calculating the distance d_sv between each data point x_s (s = 1, 2, …, w) and each centroid vector μ_v (v = 1, 2, …, Ω):

$$d_{sv} = \lVert x_s - \mu_v \rVert_2 \qquad (9)$$
5.4) assigning x_s to the cluster C_v of the centroid vector μ_v with the smallest distance d_sv, and updating C_v = C_v ∪ {x_s}.
5.5) recalculating the centroid vector μ_t of each cluster C_t:

$$\mu_t = \frac{1}{\lvert C_t \rvert} \sum_{x \in C_t} x$$
5.6) if none of the centroid vectors μ_t of the clusters C_t has changed, proceeding to step 5.7); otherwise returning to step 5.2), until all N iterations have been completed.
5.7) outputting the final cluster partition C = {C_1, C_2, …, C_η}, which completes the classification of the terminal users.
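Steps 5.1) to 5.7) describe K-means started from externally supplied centroids. A minimal pure-Python sketch follows, assuming the η initial centroid vectors are already available from the DBSCAN stage; the function name, the representation of each cluster as a list of data indices, and keeping the old centroid when a cluster becomes empty are assumptions not stated in the patent.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(
    data: Sequence[Sequence[float]],
    init_centroids: Sequence[Sequence[float]],
    max_iter: int = 100,
) -> Tuple[List[List[int]], List[Vector]]:
    """K-means from given centroids; returns (clusters as index lists, centroids)."""
    # Step 5.1: eta = Omega initial centroid vectors, at most N = max_iter iterations.
    centroids: List[Vector] = [tuple(float(v) for v in c) for c in init_centroids]
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Step 5.2: every cluster C_t starts as the empty set.
        clusters = [[] for _ in centroids]
        for s, x_s in enumerate(data):
            # Step 5.3: d_sv = ||x_s - mu_v||_2 for every centroid mu_v.
            d = [math.dist(x_s, mu_v) for mu_v in centroids]
            # Step 5.4: put x_s into the cluster of the nearest centroid.
            clusters[d.index(min(d))].append(s)
        # Step 5.5: recompute each centroid as the mean of its members.
        new_centroids: List[Vector] = []
        for t, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[t])  # empty cluster keeps its centroid (assumption)
                continue
            dim = len(data[members[0]])
            new_centroids.append(
                tuple(sum(data[s][k] for s in members) / len(members) for k in range(dim))
            )
        # Step 5.6: stop as soon as no centroid moves.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Step 5.7: the final partition C = {C_1, ..., C_eta}.
    return clusters, centroids
```

Starting from [(0, 0), (0, 1)] on the six points [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], the loop still converges to the partition [[0, 1, 2], [3, 4, 5]] within three iterations. Taking the initial centroids as an input, rather than drawing them at random as plain K-means does, is what lets the preliminary DBSCAN result fix both η and the starting point of the iteration.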
The method for dividing terminal users based on multi-dimensional power supply reliability big data is described in detail below through a specific embodiment:
In this embodiment, the power supply terminal users of the distribution network of a certain city are divided into commercial users, large hospitals, residential users, light industrial users, suburban users, steel mills and petrochemical plants. Based on the different power supply reliability requirements of the three load levels, power supply reliability sample data of these terminal users in the city over the past ten years are taken as an example; each year's data covers 4000 terminal users.
The ten years of power supply reliability data of the city are compared using the grey relational analysis (GRA) method, which ultimately ranks the features influencing power supply reliability. To simplify the description of the factor indices, let z_j ∈ Z, Z = {z_0, z_1, z_2, …, z_m} with m = 18, where z_0 to z_18 denote, respectively: power supply reliability, ring network rate, average line length, network connection standardization rate, inter-station connection rate, transfer rate, average number of line segments, average line load rate, overhead line insulation rate, cabling rate, bare-conductor medium-voltage line fault rate, medium-voltage cable fault rate, insulated-wire medium-voltage fault rate, number of faults caused by natural factors, number of faults caused by external force, live-line working rate (number of outages), average outage duration of medium-voltage faults, average arrival time for emergency fault repair, and average fault location duration. Some of the end-user features that affect power supply reliability are listed in Table 1.
Table 1: Features affecting end-user power supply reliability
As shown in Fig. 2, the correlations between features are visualized: the darker the color, the larger the correlation value between two features and the stronger their correlation. Fig. 2 shows that the correlation between the average arrival time for emergency fault repair (z17) and the insulated-wire medium-voltage fault rate (z12) reaches 0.88, a strong correlation, so this pair can be screened for redundancy when selecting the features that ultimately affect power supply reliability. Fig. 3 shows the grey correlation between each feature and the reference feature, i.e., the power supply reliability rate; ranking all features by grey correlation value selects those most closely related to power supply reliability in the city. Together, Figs. 2 and 3 determine the features that mainly affect power supply reliability in this area, ranked as follows: live-line working rate (z15) > average arrival time for emergency fault repair (z17) > cabling rate (z9) > transfer rate (z5) > inter-station connection rate (z4) > number of faults caused by external force (z14).
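The grey relational analysis behind this ranking (steps 2.2) to 2.7) of claim 2: normalization, absolute differences, correlation coefficients, relational grades and thresholding) can be sketched in pure Python as below. This is an illustration under stated assumptions, not the authors' code: the function names, the default resolution coefficient ρ = 0.5 (a common choice; the patent does not give its value), mapping a constant column to 0 during normalization, and using column ref_col (for example z0, power supply reliability) as the reference sequence Y0 are all assumptions.

```python
from typing import Dict, List, Sequence


def min_max_normalize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 2.2: Y_ij = (X_ij - X_min_j) / (X_max_j - X_min_j), column by column."""
    m, n = len(X), len(X[0])
    Y = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        lo, hi = min(col), max(col)
        for i in range(m):
            # A constant column carries no information; map it to 0 (assumption).
            Y[i][j] = (X[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    return Y


def grey_relational_grades(
    X: Sequence[Sequence[float]], ref_col: int = 0, rho: float = 0.5
) -> Dict[int, float]:
    """Return the grey relational grade r_j of every column j != ref_col (steps 2.3 to 2.6)."""
    Y = min_max_normalize(X)
    m, n = len(Y), len(Y[0])
    y0 = [Y[i][ref_col] for i in range(m)]  # step 2.3: reference sequence Y_0
    cols = [j for j in range(n) if j != ref_col]
    # Step 2.4: absolute differences Delta_ij = |Y_ij - Y_i0|.
    delta = {j: [abs(Y[i][j] - y0[i]) for i in range(m)] for j in cols}
    d_min = min(min(v) for v in delta.values())  # minimum over i and j
    d_max = max(max(v) for v in delta.values())  # maximum over i and j
    grades: Dict[int, float] = {}
    for j in cols:
        if d_max == 0:
            grades[j] = 1.0  # every column coincides with the reference
            continue
        # Step 2.5: xi_j(i) = (d_min + rho*d_max) / (Delta_ij + rho*d_max).
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in delta[j]]
        # Step 2.6: r_j is the mean of xi_j(i) over the m objects.
        grades[j] = sum(xi) / m
    return grades


def select_features(grades: Dict[int, float], threshold: float) -> List[int]:
    """Step 2.7: rank columns by r_j and keep those above the preset threshold."""
    ranked = sorted(grades, key=grades.get, reverse=True)
    return [j for j in ranked if grades[j] > threshold]
```

With X = [[1, 10, 4], [2, 20, 3], [3, 30, 2], [4, 40, 1]], column 1 rises exactly with the reference column 0 and gets r = 1.0, while column 2 falls against it and gets r = 7/15 ≈ 0.467, so a threshold of 0.5 keeps only column 1. Calling the function once per column as ref_col, as step 2.3) describes, yields a feature-by-feature table comparable to the one visualized in Fig. 2.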
The invention clusters the annual power supply reliability features of each terminal user to be classified with the DBSCAN and K-means algorithms and repeats the clustering for verification. Because different end users have different power supply reliability requirements, end users with the same requirements are grouped together according to the reliability requirements of the three load levels. Fig. 4 shows the clustering result for one year: cluster center one contains first-level loads such as hospitals, major institutions, steel plants and petrochemical plants; cluster center two contains second-level loads such as commercial, residential and light industrial areas; and cluster center three contains third-level loads, namely suburban users. The clustering result in Fig. 4 can be used to grade the reliability levels of different end users, reflects the importance of each end user, and provides data support for the dispatching system.
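The DBSCAN stage of steps 4.2) and 4.3), that is, the ε-neighborhood of formula (8) plus the core-object rule, can be sketched from scratch in Python as below. This is a minimal illustration, not the implementation used in the patent: the function names, the list-of-tuples data layout, the label -1 for noise and the convention that x_p counts toward its own ε-neighborhood are assumptions. The returned cluster count plays the role of η.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """dist(x_d, x_p) in formula (8): the Euclidean distance."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def eps_neighborhood(D: Sequence[Point], p: int, eps: float) -> List[int]:
    """Indices of N_eps(x_p) = {x_d in D | dist(x_d, x_p) <= eps}; x_p itself is included."""
    return [d for d, x_d in enumerate(D) if euclidean(x_d, D[p]) <= eps]


def core_objects(D: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Step 4.3.1: x_p is a core object if its eps-neighborhood holds at least MinPts points."""
    return [p for p in range(len(D)) if len(eps_neighborhood(D, p, eps)) >= min_pts]


def dbscan(D: Sequence[Point], eps: float, min_pts: int) -> Tuple[List[Optional[int]], int]:
    """Label every point with a cluster id (0, 1, ...) or -1 for noise; also return eta."""
    labels: List[Optional[int]] = [None] * len(D)  # None = not yet visited
    cluster = -1
    for p in range(len(D)):
        if labels[p] is not None:
            continue
        neighbors = eps_neighborhood(D, p, eps)
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # step 4.3.1: new cluster with x_p as its core object
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        while seeds:  # expand through density-reachable points
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins the cluster but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = eps_neighborhood(D, q, eps)
            if len(q_neighbors) >= min_pts:  # q is itself a core object
                seeds.extend(q_neighbors)
    return labels, cluster + 1
```

For instance, dbscan([(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)], eps=1.0, min_pts=3) labels the two tight triples as clusters 0 and 1 and the isolated point as noise, giving η = 2. The patent does not state the ε and MinPts values used in the embodiment.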
Example 2
This embodiment provides a system for dividing end users based on multi-dimensional power supply reliability big data, comprising:
a feature extraction module, configured to use the grey relational method to perform power supply reliability factor correlation analysis on each feature in a pre-constructed multi-dimensional power supply reliability analysis model against a preset reference feature, and to extract the features that affect end-user power supply reliability;
a data acquisition module, configured to acquire the corresponding data of each terminal user to be classified according to the extracted features that affect end-user power supply reliability;
a preliminary clustering module, configured to perform preliminary clustering on the acquired data of each terminal user to be classified using the DBSCAN algorithm, obtaining the number of clusters and the cluster centers of each terminal user to be classified;
and a classification module, configured to determine the classification of each terminal user to be classified using the K-means algorithm, taking the number of clusters and the cluster centers of each terminal user to be classified as the number of clusters and the centroids, respectively.
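Between the preliminary clustering module and the classification module, the DBSCAN result has to be turned into η and a set of initial centroids. The patent does not say exactly how a cluster center is computed; one plausible reading, assumed here, takes the mean of the points DBSCAN assigned to each cluster and ignores noise. The helper name dbscan_centers and the label conventions (cluster ids 0, 1, ..., with -1 for noise) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dbscan_centers(
    data: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[int, List[Tuple[float, ...]]]:
    """Turn DBSCAN labels into (eta, initial centroids) for the K-means stage.

    Assumption: a cluster's center is the mean of its members; noise (label -1) is ignored.
    """
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, lab in zip(data, labels):
        if lab == -1:
            continue  # noise points do not define a center
        groups.setdefault(lab, []).append(x)
    centers: List[Tuple[float, ...]] = []
    for lab in sorted(groups):  # keep cluster ids in ascending order
        members = groups[lab]
        dim = len(members[0])
        centers.append(tuple(sum(m[k] for m in members) / len(members) for k in range(dim)))
    return len(centers), centers
```

The returned pair can be passed to any K-means routine that accepts explicit initial centroids, so the classification module inherits both the cluster count and the starting positions from the preliminary clustering.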
Example 3
This embodiment provides a processing device corresponding to the method for dividing end users based on multi-dimensional power supply reliability big data provided in embodiment 1. The processing device may be a client device, such as a mobile phone, notebook computer, tablet computer or desktop computer, that executes the method of embodiment 1.
The processing device comprises a processor, a memory, a communication interface and a bus; the processor, the memory and the communication interface are connected through the bus and communicate with one another. The memory stores a computer program executable on the processor, and when running the computer program the processor executes the method for dividing end users based on multi-dimensional power supply reliability big data provided in embodiment 1.
In some implementations, the memory may be a high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory.
In other implementations, the processor may be various general-purpose processors such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), and the like, and is not limited herein.
Example 4
The method for dividing end users based on multi-dimensional power supply reliability big data of embodiment 1 can be embodied as a computer program product, which may include a computer-readable storage medium carrying computer-readable program instructions for executing the end-user dividing method described in embodiment 1.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any combination of the foregoing.
The above embodiments are only used for illustrating the present invention, and the structure, connection mode, manufacturing process, etc. of the components may be changed, and all equivalent changes and modifications performed on the basis of the technical solution of the present invention should not be excluded from the protection scope of the present invention.

Claims (9)

1. A terminal user dividing method based on power supply reliability multi-dimensional big data is characterized by comprising the following steps:
1) acquiring, for each terminal user to be classified, historical data of each feature in a pre-constructed multi-dimensional power supply reliability analysis model;
2) performing power supply reliability factor correlation analysis according to preset reference characteristics and acquired historical data by adopting a grey correlation method, and extracting characteristics influencing the power supply reliability of the terminal user to be classified in a multi-dimensional power supply reliability analysis model;
3) acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user to be classified;
4) adopting a DBSCAN algorithm to perform preliminary clustering on the acquired data of each terminal user to be classified respectively to obtain the cluster number and the cluster center of each terminal user to be classified;
5) determining the classification of each terminal user to be classified using a K-means algorithm, taking the cluster number and the cluster centers of each terminal user to be classified as the number of clusters and the centroids, respectively.
2. The method for dividing the end user based on the multidimensional big data of the power supply reliability as claimed in claim 1, wherein the specific process of the step 2) is as follows:
2.1) establishing an original matrix X consisting of m objects and n characteristics according to a pre-constructed multi-dimensional power supply reliability analysis model:
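$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix} = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \cdots & X_{mn} \end{bmatrix}$$
where X_i is the feature vector of the i-th object and X_ij is the j-th feature of the i-th object;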
2.2) normalizing the feature data in the multi-dimensional power supply reliability analysis model to obtain the normalized features Y_ij:
$$Y_{ij} = \frac{X_{ij} - X_{\min j}}{X_{\max j} - X_{\min j}}$$
where Y_ij is the j-th feature of the i-th object after normalization, and X_min j and X_max j are the minimum and maximum of the j-th column of the matrix X; the normalized matrix Y is then:
$$Y = \begin{bmatrix} Y_{11} & Y_{12} & \cdots & Y_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{m1} & Y_{m2} & \cdots & Y_{mn} \end{bmatrix} = (Y_1, Y_2, \ldots, Y_n)$$
where Y_j is the vector formed by the normalized j-th feature over the different objects;
2.3) selecting each feature column of the normalized matrix Y in turn as the reference sequence Y_0:
$$Y_0 = (Y_{10}, \ldots, Y_{i0}, \ldots, Y_{m0})^{\mathsf T}$$
where Y_i0 is the value, for the i-th object, of the factor affecting power supply reliability that serves as the reference;
2.4) subtracting the reference sequence Y_0 from each column vector of the normalized matrix Y and taking absolute values to obtain the absolute difference matrix [Δ_k]:
$$[\Delta_k] = \lvert Y_k - Y_0 \rvert, \qquad k = 1, 2, \ldots, n$$
where Y_k is the k-th feature column over the different objects;
2.5) calculating the correlation coefficient ξ_j(i) of each feature from the absolute difference matrix [Δ_k]:
$$\xi_j(i) = \frac{\min_i \min_j \Delta_{ij} + \rho \max_i \max_j \Delta_{ij}}{\Delta_{ij} + \rho \max_i \max_j \Delta_{ij}}$$
where ξ_j(i) is the correlation coefficient of the j-th feature of the i-th object, Δ_ij is the entry in row i and column j of [Δ_k], and ρ is the resolution (distinguishing) coefficient;
2.6) calculating the grey relational grade r_j of each feature from its correlation coefficients ξ_j(i):
$$r_j = \frac{1}{m} \sum_{i=1}^{m} \xi_j(i)$$
where r_j is the grey relational grade between the j-th feature and the reference feature;
2.7) ranking the features in the multi-dimensional power supply reliability analysis model by their grey relational grades r_j, and extracting the features whose r_j exceeds a preset threshold as the features affecting end-user power supply reliability.
3. The method for dividing the end user based on the multidimensional big data of the power supply reliability as claimed in claim 1, wherein the specific process of the step 4) is as follows:
4.1) denoting the acquired data of each terminal user to be classified as a sample data set D = {x_1, x_2, …, x_w}, and dividing the sample data set D into η clusters;
4.2) determining the ε-neighborhood of each sample data set D and MinPts, the minimum number of data points in the ε-neighborhood of a core object;
4.3) determining the clustering center of the sample data set D of each terminal user to be classified according to the ε-neighborhood of each data point in each sample data set D and MinPts.
4. The method for dividing the end user based on the multidimensional big data of the power supply reliability as claimed in claim 3, wherein the specific process of the step 4.3) is as follows:
4.3.1) if the ε-neighborhood of a data point x_p contains c data points (c ≥ MinPts), creating a new cluster with x_p as its core object;
4.3.2) finding all core objects, i.e., the data points x_d ∈ D within the ε-neighborhood of x_p, and obtaining by clustering the clustering center N_ε(x_p) of the sample data set D of each terminal user to be classified:
$$N_\varepsilon(x_p) = \{\, x_d \in D \mid \operatorname{dist}(x_d, x_p) \le \varepsilon \,\}$$
where N_ε(x_p) is the set of clustering centers of the preliminarily determined sample data set D, and dist is the Euclidean distance.
5. The method for dividing the end user based on the multidimensional big data of the power supply reliability as claimed in claim 1, wherein the specific process of the step 5) is as follows:
5.1) taking the cluster number η and the clustering centers N_ε(x_p) of the sample data set D of each terminal user to be classified as the number of clusters and the initial centroid vectors {μ_1, μ_2, …, μ_Ω} of the K-means algorithm, and setting the number of iterations N, where μ_Ω is a single centroid vector, Ω is the number of centroid vectors in the centroid set N_ε(x_p), and η = Ω;
5.2) initializing the clusters
$$C_t = \varnothing, \qquad t = 1, 2, \ldots, \eta$$
where C_t is the t-th set of data and ∅ denotes the empty set;
5.3) calculating the distance d_sv between each data point x_s (s = 1, 2, …, w) and each centroid vector μ_v (v = 1, 2, …, Ω):
$$d_{sv} = \lVert x_s - \mu_v \rVert_2$$
5.4) assigning x_s to the cluster C_v of the centroid vector μ_v with the smallest distance d_sv, and updating C_v = C_v ∪ {x_s};
5.5) recalculating the centroid vector μ_t of each cluster C_t:
$$\mu_t = \frac{1}{\lvert C_t \rvert} \sum_{x \in C_t} x$$
5.6) if none of the centroid vectors μ_t of the clusters C_t has changed, proceeding to step 5.7); otherwise returning to step 5.2), until all iterations have been completed;
5.7) outputting the final cluster partition C = {C_1, C_2, …, C_η}, which completes the classification of the terminal users.
6. The method as claimed in claim 1, wherein the multi-dimensional power supply reliability analysis model includes grid structure features, technical equipment level features, equipment quality features, fault cause features and operation and maintenance level features; the grid structure features include the inter-station connection rate, the transfer rate, the average number of line segments, the ring network rate, the average line length and the network connection standardization rate; the technical equipment level features include the average line load rate, the overhead line insulation rate and the cabling rate; the equipment quality features include the bare-conductor medium-voltage line fault rate, the medium-voltage cable fault rate and the insulated-wire medium-voltage fault rate; the fault cause features include the number of faults caused by natural factors and the number of faults caused by external factors; and the operation and maintenance level features include the live-line working rate (number of users affected by outages), the average arrival time for emergency repair, the average outage duration of medium-voltage faults and the average fault location duration.
7. An end user partitioning system based on multi-dimensional big data of power supply reliability is characterized by comprising:
the historical data acquisition module, configured to acquire, for each terminal user to be classified, historical data of each feature in a pre-constructed multi-dimensional power supply reliability analysis model;
the characteristic extraction module is used for performing power supply reliability factor correlation analysis according to preset reference characteristics and acquired historical data by adopting a grey correlation degree method and extracting characteristics influencing the power supply reliability of the terminal user to be classified in the multi-dimensional power supply reliability analysis model;
the actual data acquisition module is used for acquiring corresponding data of each terminal user to be classified according to the extracted characteristics influencing the power supply reliability of the terminal user;
the primary clustering module is used for performing primary clustering on the acquired data of each terminal user to be classified by adopting a DBSCAN algorithm to obtain the clustering cluster number and the clustering center of each terminal user to be classified;
and the classification module, configured to determine the classification of each terminal user to be classified using a K-means algorithm, taking the cluster number and the cluster centers of each terminal user to be classified as the number of clusters and the centroids, respectively.
8. A processor, characterized by comprising computer program instructions, wherein the computer program instructions, when executed by the processor, are configured to implement the steps corresponding to the method for partitioning an end user based on power supply reliability multidimensional big data according to any one of claims 1 to 6.
9. A computer readable storage medium, wherein the computer readable storage medium stores thereon computer program instructions, and when executed by a processor, the computer program instructions are configured to implement the steps corresponding to the method for dividing an end user based on multidimensional big data of power supply reliability as recited in any one of claims 1 to 6.
CN202011498719.4A 2020-12-16 2020-12-16 Terminal user dividing method and system based on power supply reliability multi-dimensional big data Pending CN112528113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011498719.4A CN112528113A (en) 2020-12-16 2020-12-16 Terminal user dividing method and system based on power supply reliability multi-dimensional big data

Publications (1)

Publication Number Publication Date
CN112528113A true CN112528113A (en) 2021-03-19

Family

ID=75001222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011498719.4A Pending CN112528113A (en) 2020-12-16 2020-12-16 Terminal user dividing method and system based on power supply reliability multi-dimensional big data

Country Status (1)

Country Link
CN (1) CN112528113A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190121836A1 (en) * 2017-10-23 2019-04-25 State Grid Zhejiang Electric Power Company Limited Support tensor machine based neutral point grounding mode decision method and system
CN110210740A (en) * 2019-05-22 2019-09-06 广西电网有限责任公司电力科学研究院 A kind of distribution network reliability evaluation method considering power supply quality
CN111724278A (en) * 2020-06-11 2020-09-29 国网吉林省电力有限公司 Fine classification method and system for power multi-load users
CN111950620A (en) * 2020-08-07 2020-11-17 国网能源研究院有限公司 User screening method based on DBSCAN and K-means algorithm

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948524A (en) * 2021-04-21 2021-06-11 广东电网有限责任公司计量中心 Intelligent electric meter operation area grouping method and system based on environment and geographic characteristics
CN112948524B (en) * 2021-04-21 2024-04-26 广东电网有限责任公司计量中心 Intelligent ammeter operation area grouping method and system based on environment and geographic features
CN116883059A (en) * 2023-09-06 2023-10-13 山东德源电力科技股份有限公司 Distribution terminal management method and system
CN116883059B (en) * 2023-09-06 2023-11-28 山东德源电力科技股份有限公司 Distribution terminal management method and system
CN117543791A (en) * 2023-11-08 2024-02-09 深圳市瀚海星光科技有限公司 Power supply detection method, device, equipment and storage medium for power supply
CN117543791B (en) * 2023-11-08 2024-07-12 深圳市瀚海星光科技有限公司 Power supply detection method, device, equipment and storage medium for power supply

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211216

Address after: 102209 Beijing Changping District Beiqijia future science and Technology North District Power Grid Corp office area

Applicant after: STATE GRID ECONOMIC AND TECHNOLOGICAL RESEARCH INSTITUTE Co.,Ltd.

Applicant after: STATE GRID TIANJIN ELECTRIC POWER Co.

Address before: 102209 Beijing Changping District Beiqijia future science and Technology North District Power Grid Corp office area

Applicant before: STATE GRID ECONOMIC AND TECHNOLOGICAL RESEARCH INSTITUTE Co.,Ltd.

Applicant before: STATE GRID TIANJIN ELECTRIC POWER Co.

Applicant before: NANJING UNIVERSITY OF SCIENCE AND TECHNOLOGY

RJ01 Rejection of invention patent application after publication

Application publication date: 20210319
