CN116894744A - Power grid user data analysis method based on improved k-means clustering algorithm - Google Patents

Publication number
CN116894744A
Authority
CN
China
Legal status
Pending
Application number
CN202310898484.5A
Other languages
Chinese (zh)
Inventor
金家桢
刘天慈
成诚
凌在汛
马小强
彭舜尧
孔令威
谌思桐
冷小聪
刘思杰
洪度
邓海伟
何顺帆
吴笑民
邓桂平
Current Assignee
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
Suizhou Power Supply Co of State Grid Hubei Electric Power Co Ltd
South Central Minzu University
Original Assignee
Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd
South Central University for Nationalities
Suizhou Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority date
2023-07-21
Filing date
2023-07-21
Publication date
2023-10-17
Application filed by Electric Power Research Institute of State Grid Hubei Electric Power Co Ltd, South Central University for Nationalities, and Suizhou Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority to CN202310898484.5A
Publication of CN116894744A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Public Health (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention relates to the field of power customer service data analysis and discloses a power grid user data analysis method based on an improved k-means clustering algorithm, comprising the following steps: preprocessing customer power data; constructing a k-means clustering algorithm model; improving and tuning the model, namely using the k-means++ method to place the initial cluster centers as far apart as possible, visualizing the performance of the clustering model for different values of k with an elbow graph to determine the optimal range for the number of clusters, and calculating a score to evaluate the clustering quality and select the number of clusters; and performing cluster analysis on the data set with the optimized clustering model, the sample corresponding to each cluster center serving as the representative of its class of users. By classifying customer electricity consumption data with the improved k-means clustering algorithm, the invention enables intelligent analysis of user behavior and the classification and prediction of the electricity consumption behavior of power grid users, improving the efficiency of power customer service and providing data support for handling power problems.

Description

Power grid user data analysis method based on improved k-means clustering algorithm
Technical Field
The invention relates to the field of power customer service data analysis, and in particular to a power grid user data analysis method based on an improved k-means clustering algorithm.
Background
At present, power customer service relies mainly on data analysts manually compiling and summarizing statistics to classify work orders. The classification result is strongly influenced by the subjective judgment of staff, and key electricity consumption problems are difficult to discover in large volumes of data. To improve the service capacity for power customers, a more efficient data processing scheme is needed.
Cluster analysis is a common data analysis method in which a clustering algorithm divides the samples of a data set into clusters, each of which may correspond to a class of samples sharing the same characteristics. The k-means clustering algorithm is an iterative clustering algorithm that is widely used for its simplicity and efficiency, but it has two shortcomings: the choice of initial cluster centers strongly affects the clustering result, and the value of k, i.e. the number of clusters, is difficult to determine in advance.
CN114971411A discloses a data-driven method for analyzing the reactive power consumption behavior of users of a power distribution network, comprising: step S1: acquiring historical electricity consumption data of different users and calculating the corresponding power factor data; step S2: preprocessing the power factor data to obtain daily power factor curves and daily power factor sequence data; step S3: performing dimensionality reduction and clustering on the daily power factor sequence data to obtain a classification of the daily power factor curves; and step S4: analyzing reactive power consumption behavior to obtain the reactive power consumption patterns and characteristics of different users. The method analyzes the reactive power consumption behavior of typical users, which benefits electricity management and the economic and safe operation of the distribution network; its multi-layer convolutional autoencoder extracts deep features effectively and with high accuracy, and performing cluster analysis on these low-dimensional deep features reduces clustering time and improves clustering efficiency.
CN113886669A discloses an adaptive clustering method for electric power user portraits, which uses the autoencoder principle for feature extraction, applying a suitable squared loss function to reduce the dimensionality of high-dimensional data and obtain low-dimensional information of higher information density; performs cluster analysis with the K-means algorithm on this low-dimensional information to obtain initial cluster categories; and uses a unimodality statistical test as the basic algorithm for category fusion. Feature extraction, cluster analysis and category fusion optimization are integrated into a clustering model: after the initial clusters are obtained, the unimodality test statistic between classes is calculated and classes are fused according to its value. The method obtains a suitable number of clusters without knowing the number of classes in advance and effectively improves clustering performance. It addresses problems in the prior art where one clustering parameter is merely replaced by other parameters, clustering models built from large-scale high-dimensional data perform poorly, and clustering performance is unsatisfactory.
However, the above prior art cannot classify and predict the electricity consumption behavior of power grid users in a way that improves the efficiency of power customer service and provides data support for handling power problems.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a power grid user data analysis method based on an improved k-means clustering algorithm, which converts customer power consumption data into a standard format and classifies the data with the improved k-means clustering algorithm, thereby enabling intelligent analysis of user behavior.
A power grid user data analysis method based on an improved k-means clustering algorithm comprises the following steps:
S1, preprocessing customer power data;
S2, constructing a k-means clustering algorithm model;
S3, model improvement and tuning: to select reasonable initial cluster centers, the k-means++ method is used to place the initial cluster centers as far apart as possible; the performance of the clustering model under different values of k is visualized with an elbow graph to determine the optimal range for the number of clusters; and the Calinski-Harabasz score is calculated to evaluate the clustering quality and select the number of clusters;
and S4, performing cluster analysis on the data set with the optimized clustering model, the sample corresponding to each cluster center serving as the representative of its class of users.
Further, in the above power grid user data analysis method based on the improved k-means clustering algorithm, step S1 specifically includes: extracting key information from the customer power data and organizing it into an array in which each element is a two-tuple; normalizing and standardizing the data; and detecting and removing outliers to reduce their influence on the cluster means.
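A minimal sketch of step S1 follows. It assumes the two-tuples are hypothetical [month, consumption] pairs, that "standardizing" means per-column z-scores, and that outliers are removed with Tukey's fences (1.5 * IQR) on the consumption column. The document names neither the outlier rule nor the scaling formula, so these choices and the function names `iqr_filter` and `standardize` are illustrative only.

```python
import statistics


def iqr_filter(points, col=1, factor=1.5):
    """Drop points whose value in column `col` lies outside Tukey's fences
    [Q1 - factor*IQR, Q3 + factor*IQR] (an assumed outlier rule)."""
    values = [p[col] for p in points]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = factor * (q3 - q1)
    return [p for p in points if q1 - spread <= p[col] <= q3 + spread]


def standardize(points):
    """Z-score every column. Returns the scaled points plus the per-column
    means and population standard deviations needed to undo the scaling."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    # Guard against a constant column (standard deviation 0).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    scaled = [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]
    return scaled, means, stds


# Hypothetical [month, consumption] two-tuples; the last consumption is an outlier.
records = [(1, 10.0), (2, 11.0), (3, 9.5), (4, 10.5), (5, 200.0)]
clean = iqr_filter(records)
scaled, means, stds = standardize(clean)
print(len(clean))  # 4
print(means)       # [2.5, 10.25]
```

Keeping `means` and `stds` makes it possible to map cluster centers back to the original units after clustering.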
Further, in the above power grid user data analysis method based on the improved k-means clustering algorithm, step S2 establishes a k-means clustering algorithm model, whose algorithm is as follows:
set the number of clusters k;
select k samples as the initial cluster centers;
for each sample in the data set, calculate its Euclidean distance to the k cluster centers and assign it to the class of the nearest cluster center;
for each class, recalculate its cluster center;
repeat the previous two steps until a predetermined number of iterations is reached.
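The step S2 loop can be sketched in plain Python as below. This is a hypothetical illustration (the function name `kmeans` and the `seed` parameter are not from the document): it draws k random samples as initial centers and then runs a fixed number of assignment/update rounds, defaulting to the 20 iterations used in the embodiment.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means as in step S2: k random samples as initial centers,
    then a fixed number of assignment/update rounds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k samples drawn without replacement
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment: each sample joins the class of its nearest center (Euclidean).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update: each center moves to the mean of the samples assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if the class is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2)
print(labels)
```

Keeping a center unchanged when its class empties is one common convention; the document does not say how empty clusters are handled.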
Further, in the above power grid user data analysis method based on the improved k-means clustering algorithm, the specific process of step S3 is as follows:
first, the initial cluster centers of the data set to be analyzed are selected as follows: randomly select a first center point m_1; for each remaining data point x_i, calculate the distance D_i from x_i to the nearest of the n centers selected so far, and select a new center point with probability $P(x_i) = D_i^2 / \sum_j D_j^2$; repeat until all initial cluster centers have been selected;
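The seeding rule of step S3 is the standard k-means++ procedure. A sketch, assuming D(x) is the distance to the nearest already-selected center and that sampling is proportional to D(x)^2; `kmeans_pp_init` is a hypothetical name:

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: first center uniformly at random, then each new
    center drawn with probability D(x)^2 / sum(D^2), where D(x) is the
    distance from x to its nearest already-selected center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with a center; stop early
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


print(sorted(kmeans_pp_init([(0, 0), (5, 5), (9, 0)], 3)))  # [(0, 0), (5, 5), (9, 0)]
```

Points already selected have zero weight, so they are never drawn twice; if every point coincides with a selected center the loop stops early instead of passing all-zero weights to `random.choices`, which recent Python versions reject with a `ValueError`.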
the optimal number of clusters is selected as follows: for each number of clusters k within a reasonable range, calculate the total within-cluster sum of squared errors (WCSS) of the clustering result:
$$\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{p \in c_i} \lVert p - m_i \rVert^2$$
where c_i is the set of data points of the i-th cluster, m_i is the cluster center of the i-th cluster, p is a data point in c_i, and WCSS is the sum, over all clusters, of the squared distances from each p to its m_i.
A line graph of the number of clusters k against WCSS, i.e. the elbow graph, is drawn; the point where the rate of decline of the line suddenly flattens is the elbow point of the elbow graph. The optimal number of clusters is taken from the range around this point, the Calinski-Harabasz score is then calculated to evaluate the clustering quality, and the number of clusters is selected.
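The k-selection procedure of step S3 can be sketched end to end: cluster for each k in a range, record WCSS for the elbow graph, and compute the Calinski-Harabasz score, keeping the k with the highest score. This sketch assumes the standard Calinski-Harabasz definition, adds restarts (`n_init`, not from the document) to reduce sensitivity to unlucky seeds, and uses synthetic data with three well-separated groups; none of these specifics come from the document, and all function names are hypothetical.

```python
import math
import random


def wcss(points, centers, labels):
    """Total within-cluster sum of squared Euclidean distances."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def kmeans(points, k, iterations=20, n_init=10, seed=0):
    """k-means with k-means++ seeding; returns the restart with the lowest WCSS.
    Assumes the data contain at least k distinct points."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = [rng.choice(points)]
        while len(centers) < k:  # k-means++: sample proportionally to D(x)^2
            d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2, k=1)[0])
        for _ in range(iterations):  # Lloyd iterations: assign, then update
            labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        score = wcss(points, centers, labels)
        if best is None or score < best[0]:
            best = (score, centers, labels)
    return best[1], best[2]


def calinski_harabasz(points, centers, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]: between- vs within-cluster dispersion."""
    n, k = len(points), len(centers)
    overall = tuple(sum(c) / n for c in zip(*points))
    between = sum(labels.count(j) * math.dist(centers[j], overall) ** 2 for j in range(k))
    within = wcss(points, centers, labels)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


# Synthetic data: three tight groups of four points each (illustrative only).
blobs = [(0, 0), (10, 0), (0, 10)]
data = [(x + dx, y + dy) for x, y in blobs for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]

wcss_by_k, ch_by_k = {}, {}
for k in range(2, 6):  # the "reasonable range" of k is an assumption here
    centers, labels = kmeans(data, k)
    wcss_by_k[k] = wcss(data, centers, labels)
    ch_by_k[k] = calinski_harabasz(data, centers, labels)

best_k = max(ch_by_k, key=ch_by_k.get)
print({k: round(v, 2) for k, v in wcss_by_k.items()})
print("best k by Calinski-Harabasz:", best_k)
```

On this data the WCSS curve should drop sharply from k = 2 to k = 3 and then flatten, which is the elbow described above; the Calinski-Harabasz score then picks k = 3 within that range.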
Further, after cluster analysis of the data set, the coordinate values of each cluster center are compared against the data preprocessing rules to obtain the specific electricity consumption characteristics of the sample corresponding to that center, and the samples in each cluster are regarded as sharing the characteristics of their cluster center.
By analyzing power grid user data with the improved k-means clustering algorithm, the invention enables classification and prediction of the electricity consumption behavior of power grid users, improves the efficiency of power customer service, and provides data support for handling power problems.
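One way to read "comparing the cluster-center coordinates against the data preprocessing rules" is to undo the scaling applied in step S1, so that each center is expressed in the original units. A sketch, assuming z-score standardization with stored per-column means and standard deviations; `destandardize` and the numbers in the example are hypothetical.

```python
def destandardize(center, means, stds):
    """Map a center from z-score space back to original units: x = z * std + mean."""
    return tuple(z * s + m for z, s, m in zip(center, stds, means))


# Hypothetical per-column statistics saved during preprocessing.
means, stds = [6.5, 120.0], [3.45, 40.0]
print(destandardize((0.0, 1.0), means, stds))  # (6.5, 160.0)
```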
Drawings
FIG. 1 is a flowchart of the k-means clustering algorithm.
FIG. 2 is an elbow graph, plotting the total within-cluster sum of squared errors of the clustering result against the number of clusters.
FIG. 3 is a graph of Calinski-Harabasz scores for different numbers of clusters.
FIG. 4 is a graph of the results of cluster analysis of power grid user data using the improved k-means clustering algorithm.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1 to 4, an embodiment of the present invention provides a power grid user data analysis method based on an improved k-means clustering algorithm, including the following steps:
S1, preprocessing the customer power data. Specifically, key information is extracted from the power data of rural power grid customers in a certain area and organized into an array in which each element is a two-tuple, in this example [date, power consumption]. The data are then normalized and standardized, and outliers are detected and removed to reduce their influence on the cluster means.
S2, establishing a k-means clustering algorithm model, whose algorithm is shown in FIG. 1:
set the number of clusters k;
select k samples as the initial cluster centers;
for each sample in the data set, calculate its Euclidean distance to the k cluster centers and assign it to the class of the nearest cluster center;
for each class, recalculate its cluster center;
repeat the previous two steps until 20 iterations have been performed.
S3, improving and tuning the model; the specific process is as follows:
first, the initial cluster centers of the data set to be analyzed are selected as follows: randomly select a first center point m_1; for each remaining data point x_i, calculate the distance D_i from x_i to the nearest of the n centers selected so far, and select a new center point with probability $P(x_i) = D_i^2 / \sum_j D_j^2$; repeat until all initial cluster centers have been selected;
the optimal number of clusters is selected as follows: for each number of clusters k within a reasonable range, calculate the total within-cluster sum of squared errors (WCSS) of the clustering result:
$$\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{p \in c_i} \lVert p - m_i \rVert^2$$
where c_i is the set of data points of the i-th cluster, m_i is the cluster center of the i-th cluster, p is a data point in c_i, and WCSS is the sum, over all clusters, of the squared distances from each p to its m_i.
A graph of the number of clusters k against WCSS, i.e. the elbow graph, is drawn as shown in FIG. 2. The point where the rate of decline of the line suddenly flattens, i.e. the "elbow" of the elbow graph, is taken as the vicinity of the optimal number of clusters. The axis label "Number of Clusters" in FIG. 2 denotes the number of clusters k.
Calinski-Harabasz scores are then calculated for clustering with each number of clusters in this range to evaluate the clustering quality; as shown in FIG. 3, the clustering quality is best when the number of clusters is 6, so 6 is selected as the optimal number of clusters.
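For reference, the Calinski-Harabasz score is conventionally defined as below (the standard definition from the clustering literature; the document does not restate it). With n samples, k clusters, clusters c_i with centers m_i as in the WCSS formula, and \bar{p} the mean of all samples:

$$
\mathrm{CH}(k) = \frac{B_k / (k-1)}{W_k / (n-k)}, \qquad
B_k = \sum_{i=1}^{k} |c_i| \, \lVert m_i - \bar{p} \rVert^2, \qquad
W_k = \sum_{i=1}^{k} \sum_{p \in c_i} \lVert p - m_i \rVert^2 = \mathrm{WCSS}
$$

A higher score indicates tighter, better-separated clusters, so within the range suggested by the elbow graph the k with the highest score is preferred.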
S4, performing cluster analysis on the data set with the optimized clustering model, the sample corresponding to each cluster center serving as the representative of its class of users. The specific process is as follows: the number of clusters is set to 6, the initial cluster centers are selected, and cluster analysis is performed on the data set; the result is shown in FIG. 4. The coordinate values of each cluster center are compared against the data preprocessing rules to obtain the specific electricity consumption characteristics of the sample corresponding to that center, and the samples in each cluster are regarded as sharing the characteristics of their cluster center.
In this example, 6 cluster centers are obtained:
[5.25, 3.70], [7.69, 2.00], [4.70, 0.54], [9.25, 0.34], [6.27, 0.63], [7.73, 0.20].
Comparison with the raw data shows the following. Cluster center 1 corresponds to irrigation and drainage electricity use by agricultural irrigation and drainage users; every sample in this cluster is agricultural irrigation and drainage use, characterized by high consumption because pumping stations must operate at high intensity for agricultural production in May and June. Cluster center 2 corresponds to the summer residential use of certain users; the consumption pattern of the samples in this cluster is close to that of urban residents, with high consumption by high-power appliances such as air conditioners. All other cluster centers correspond to the everyday residential use of ordinary rural residents.
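To illustrate the "classification and prediction" use of this result, a new sample can be assigned to the nearest of the six reported cluster centers. This sketch assumes the new sample has already been preprocessed exactly like the training data (the centers are in that preprocessed coordinate space); the nearest-center rule for new users, the helper `describe`, and the label strings are illustrative and not specified by the document. Indices are 0-based, so cluster center 1 is index 0.

```python
import math

# The six cluster centers reported in this example (preprocessed coordinates).
CENTERS = [(5.25, 3.70), (7.69, 2.00), (4.70, 0.54), (9.25, 0.34), (6.27, 0.63), (7.73, 0.20)]

# Hypothetical labels paraphrasing the interpretation above (0-based indices).
LABELS = {0: "agricultural irrigation and drainage", 1: "summer, urban-like residential use"}


def nearest_center(sample, centers=CENTERS):
    """Index of the center closest to `sample` by Euclidean distance."""
    return min(range(len(centers)), key=lambda j: math.dist(sample, centers[j]))


def describe(sample):
    """Hypothetical helper: map a preprocessed sample to a user-type label."""
    return LABELS.get(nearest_center(sample), "ordinary rural residential use")


print(describe((5.0, 3.5)))  # agricultural irrigation and drainage
```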
By analyzing power grid user data with the improved k-means clustering algorithm, the invention achieves classification and prediction of the electricity consumption behavior of power grid users, improves the efficiency of power customer service, and provides data support for handling power problems.
The foregoing describes merely illustrative embodiments of the present invention, and the present invention is not limited thereto; any changes or substitutions readily conceivable by those skilled in the art within the scope of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A power grid user data analysis method based on an improved k-means clustering algorithm, characterized by comprising the following steps:
S1, preprocessing customer power data;
S2, constructing a k-means clustering algorithm model;
S3, model improvement and tuning: to select reasonable initial cluster centers, the k-means++ method is used to place the initial cluster centers as far apart as possible; the performance of the clustering model under different values of k is visualized with an elbow graph to determine the optimal range for the number of clusters; and the Calinski-Harabasz score is calculated to evaluate the clustering quality and select the number of clusters;
and S4, performing cluster analysis on the data set with the optimized clustering model, the sample corresponding to each cluster center serving as the representative of its class of users.
2. The power grid user data analysis method based on an improved k-means clustering algorithm according to claim 1, characterized in that step S1 specifically includes: extracting key information from the customer power data and organizing it into an array in which each element is a two-tuple; normalizing and standardizing the data; and detecting and removing outliers to reduce their influence on the cluster means.
3. The power grid user data analysis method based on an improved k-means clustering algorithm according to claim 1, characterized in that the model algorithm of step S2 is as follows:
set the number of clusters k;
select k samples as the initial cluster centers;
for each sample in the data set, calculate its Euclidean distance to the k cluster centers and assign it to the class of the nearest cluster center;
for each class, recalculate its cluster center;
repeat the previous two steps until a predetermined number of iterations is reached.
4. The power grid user data analysis method based on an improved k-means clustering algorithm according to claim 1, characterized in that the specific process of step S3 is as follows:
first, the initial cluster centers of the data set to be analyzed are selected as follows: randomly select a first center point m_1; for each remaining data point x_i, calculate the distance D_i from x_i to the nearest of the n centers selected so far, and select a new center point with probability $P(x_i) = D_i^2 / \sum_j D_j^2$; repeat until all initial cluster centers have been selected;
the optimal number of clusters is selected as follows: for each number of clusters k within a reasonable range, calculate the total within-cluster sum of squared errors of the clustering result:
$$\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{p \in c_i} \lVert p - m_i \rVert^2$$
where c_i is the set of data points of the i-th cluster and m_i is its cluster center;
draw a line graph of the number of clusters k against WCSS, i.e. an elbow graph, in which the point where the rate of decline of the line suddenly flattens is the elbow point; take the optimal number of clusters from the range around this point, then calculate the Calinski-Harabasz score to evaluate the clustering quality and select the number of clusters.
5. The power grid user data analysis method based on an improved k-means clustering algorithm according to claim 1, characterized in that the specific process of step S4 is as follows: after cluster analysis of the data set, the coordinate values of each cluster center are compared against the data preprocessing rules to obtain the specific electricity consumption characteristics of the sample corresponding to that center, and the samples in each cluster are regarded as sharing the characteristics of their cluster center.
CN202310898484.5A 2023-07-21 2023-07-21 Power grid user data analysis method based on improved k-means clustering algorithm Pending CN116894744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310898484.5A CN116894744A (en) 2023-07-21 2023-07-21 Power grid user data analysis method based on improved k-means clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310898484.5A CN116894744A (en) 2023-07-21 2023-07-21 Power grid user data analysis method based on improved k-means clustering algorithm

Publications (1)

Publication Number Publication Date
CN116894744A true CN116894744A (en) 2023-10-17

Family

ID=88313353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310898484.5A Pending CN116894744A (en) 2023-07-21 2023-07-21 Power grid user data analysis method based on improved k-means clustering algorithm

Country Status (1)

Country Link
CN (1) CN116894744A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709662A (en) * 2016-12-30 2017-05-24 山东鲁能软件技术有限公司 Electrical equipment operation condition classification method
CN109840550A (en) * 2019-01-14 2019-06-04 华南理工大学 A kind of mobile subscriber's application preferences recognition methods based on deep neural network
CN115804575A (en) * 2021-09-13 2023-03-17 中国矿业大学(北京) Individualized physiological state clustering discrimination system based on pulse signals
CN114897097A (en) * 2022-06-06 2022-08-12 国网北京市电力公司 Power consumer portrait method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN101516099B (en) Test method for sensor network anomaly
CN111724278A (en) Fine classification method and system for power multi-load users
CN111680764B (en) Industry reworking and production-resuming degree monitoring method
CN111784093B (en) Enterprise reworking auxiliary judging method based on power big data analysis
CN110795690A (en) Wind power plant operation abnormal data detection method
CN116780781B (en) Power management method for smart grid access
CN112819299A (en) Differential K-means load clustering method based on center optimization
CN112686491A (en) Enterprise power data analysis method based on power consumption behavior
CN113125903A (en) Line loss anomaly detection method, device, equipment and computer-readable storage medium
CN112905671A (en) Time series exception handling method and device, electronic equipment and storage medium
CN117559443A (en) Ordered power utilization control method for large industrial user cluster under peak load
CN111126449A (en) Battery fault classification diagnosis method based on cluster analysis
CN111612054B (en) User electricity stealing behavior identification method based on nonnegative matrix factorization and density clustering
CN111797899B (en) Low-voltage transformer area kmeans clustering method and system
CN117113114A (en) ACO-FCM and feature selection non-invasive load monitoring method based on information entropy
CN117493922A (en) Power distribution network household transformer relation identification method based on data driving
CN116894744A (en) Power grid user data analysis method based on improved k-means clustering algorithm
CN113837096B (en) Rolling bearing fault diagnosis method based on GA random forest
CN116522111A (en) Automatic diagnosis method for remote power failure
CN113298148B (en) Ecological environment evaluation-oriented unbalanced data resampling method
Shen et al. A Novel AI-based Method for EV Charging Load Profile Clustering
Wang et al. Analysis of user’s power consumption behavior based on k-means
CN112668446A (en) Flower pollination algorithm-based method for monitoring wear state of micro milling cutter by optimizing SVM (support vector machine)
CN117216599B (en) Questionnaire data analysis method and system
CN117216520A (en) Method and system for extracting abnormal characteristics of solar line loss rate curve of power distribution station

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination