CN113010504A - Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm

Info

Publication number: CN113010504A
Application number: CN202110239950.XA
Authority: CN
Inventors: 王子涵; 仲春林; 刘述波; 王国际; 方超; 郑安宁; 张凡; 姚鹏; 姜宇轩
Original assignee: Jiangsu Fangtian Power Technology Co Ltd
Current assignee: Jiangsu Fangtian Power Technology Co Ltd
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2021-06-22
Anticipated expiration: 2041-03-04
Also published as: CN113010504B

Abstract

The invention discloses a method and a system for detecting anomalies in electric power data based on LSTM and an improved K-means algorithm, belonging to the technical field of electric power data analysis. The method comprises: inputting the collected power sequence data of a user into a trained LSTM model, extracting the time-series features of the power sequence data, and constructing a sample data set; and, based on the improved K-means algorithm, detecting abnormal data in the power sequence data with the constructed sample data set as input. The invention extracts the time-series features of the power data and, at the same time, efficiently identifies abnormal power data when the volume of power data is large.

Description

Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm

Technical Field

The invention belongs to the technical field of electric power data analysis, and particularly relates to an electric power data anomaly detection method and system based on LSTM and an improved K-means algorithm.

Background

In existing production management systems, the number of collection terminals is huge, the volume of power data to be collected is large, the collection frequency is high, and the transmission modes are varied, so the quality of the collected power data is uneven. Moreover, information such as equipment replacement and transmission conditions cannot be obtained in real time, so it cannot be determined whether the quality of the collected power data meets the standard, and the production power consumption therefore cannot be assessed accurately. The quality of the power data is thus the basis of power consumption analysis: for power data collected by the system, the quality must be checked first, power data of unqualified quality must be identified, and the data must be collected again in time. Because checking transmission problems, collection terminals and the like by hand consumes a large amount of manpower and material resources, checking the quality of power data with machine learning methods is gradually becoming mainstream. Machine-learning-based detection of data quality anomalies mostly uses clustering-based outlier detection algorithms. However, this approach has two problems: 1) the volume of power data is large, and general clustering methods converge slowly; 2) power data is time-series data, and existing methods cannot effectively extract its time-series features. In addition, existing outlier detection methods suffer from low clustering efficiency and slow convergence caused by improper selection of the initial cluster centers.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides an electric power data anomaly detection method and system based on LSTM and an improved K-means algorithm, which extract the time-series features of the power data and efficiently identify abnormal power data when the volume of power data is large.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a power data anomaly detection method includes: inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data, and constructing a sample data set; and detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on an improved K-means algorithm.

Further, the inputting the collected power sequence data of the user into the trained LSTM model, extracting the time-sequence characteristics of the power sequence data, and constructing a sample data set, includes: inputting the collected power sequence data of the user into a trained LSTM model to obtain a predicted value of the power sequence data; and comparing the predicted value and the true value of the power sequence data to obtain a difference value, wherein the difference value is used as a time sequence characteristic of the power sequence data to describe the power data and construct a sample data set.

Further, in the trained LSTM model, the hidden layer at each time $t$ receives not only $x_t$, the power sequence data at time $t$, but also $C_{t-1}$, the memory state of the hidden layer at time $t-1$. By processing these inputs, it outputs $h_t$, the output value of the hidden layer at time $t$, and passes $C_t$, the memory state of the hidden layer at time $t$, to the hidden layer at the next time step.

Further, in the trained LSTM model, the memory unit uses a forget gate to examine $h_{t-1}$, the output value of the hidden layer at time $t-1$, and $x_t$, and outputs a number between 0 and 1 for each element of $C_{t-1}$, the memory state of the hidden layer at time $t-1$, where 1 means complete retention and 0 means complete deletion. Specifically:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{2}$$

wherein $f_t$ is the value of the forget gate at time $t$, $\sigma$ is the sigmoid function, $W_f$ is the weight of the forget gate, $b_f$ is the offset of the forget gate, and $h_{t-1}$ is the output value of the hidden layer at time $t-1$;

a "memory gate" $i_t$ is used to control the influence of the current data input on the state value of the memory unit, where $i_t$ is the state of the memory gate at time $t$; a tanh function is used to create $\tilde{C}_t$, the candidate value vector at time $t$, which is added to the state of the memory unit. The specific calculation steps are as follows:

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{3}$$

$$\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{4}$$

wherein $W_i$ is the weight of the memory gate, $b_i$ is the offset of the memory gate, $W_c$ is the weight for updating the candidate value, $b_c$ is the offset for updating the candidate value, and $\tilde{C}_t$ is the candidate value vector at time $t$;

the candidate value vector $\tilde{C}_t$ is combined with the state $C_{t-1}$ of the memory unit at the previous time to update the state of the memory unit at the current time:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

the output of each memory unit is controlled by an output gate $o_t$, calculated as follows:

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{6}$$

$$h_t = o_t \odot \tanh(C_t) \tag{7}$$

wherein $o_t$ is the value of the output gate at time $t$, $W_o$ is the weight of the output gate, $b_o$ is the offset of the output gate, and $h_t$ is the output of the hidden layer at time $t$.

Further, detecting abnormal data in the power sequence data with the constructed sample data set as input, based on the improved K-means algorithm, comprises the following steps: calculating the compactness of all data points in the sample data set, obtaining the data-dense region, and thereby determining the initial cluster centers; and calculating the Euclidean distances between all data points in the sample data set and each initial cluster center and dividing the data points into K clusters, wherein if the distance between a data point and the center of the cluster to which it belongs is greater than the average distance, the iteration continues, and when the number of iterations is greater than or equal to a set value, the data point is judged to be an abnormal data point, thereby detecting abnormal data in the power sequence data.

Further, determining the initial cluster centers includes: within the data-dense region, selecting the data point with the highest compactness as the first initial cluster center, and then selecting the data point in the region farthest from the first initial cluster center as the second initial cluster center; each subsequent initial cluster center is the data point whose shortest distance to the already selected initial cluster centers is the largest.

Further, the compactness of the data points is obtained by:

wherein $x_i$ denotes the i-th data point in the sample set, $x_j$ denotes the j-th data point, $D(x_i, x_j)$ denotes the distance between $x_i$ and $x_j$, and $G_t(x_i)$ is the set of the $t$ nearest neighbor data points of $x_i$.

An electrical data anomaly detection system comprising: the first module is used for inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data and constructing a sample data set; and the second module is used for detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on the improved K-means algorithm.

Compared with the prior art, the invention has the following beneficial effects. The LSTM (long short-term memory neural network) model effectively extracts the more valuable time-series features of the power data and yields predicted values of the power sequence data; the absolute value of the difference between the predicted values and the real power data is used as the time-series feature of the power data. Combined with these time-series features, an improved K-means algorithm suited to big data finds the outliers when the volume of power data is large. A data anomaly detection method that fuses the LSTM with the improved K-means can therefore effectively improve the efficiency and accuracy of data anomaly detection: while extracting the time-series features of the power data, it efficiently identifies outliers of unqualified quality even when the volume of power data is large.

Drawings

FIG. 1 is a flow chart of abnormal power usage data detection in an embodiment of the present invention;

FIG. 2 is a diagram of the LSTM model architecture in an embodiment of the present invention;

FIG. 3 is a diagram of the neuron structure of the LSTM in the embodiment of the present invention;

FIG. 4 is a flowchart of the K-means clustering algorithm in the embodiment of the present invention;

FIG. 5 is a flow chart of the data anomaly detection algorithm based on the improved K-means in the embodiment of the invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

The first embodiment is as follows:

As shown in figs. 1 to 5, a power data anomaly detection method includes: inputting the collected power sequence data of a user into a trained LSTM model, extracting the time-series features of the power sequence data, and constructing a sample data set, which comprises inputting the collected power sequence data of the user into the trained LSTM model to obtain predicted values of the power sequence data, comparing the predicted values with the true values of the power sequence data to obtain difference values that serve as the time-series features of the power sequence data, and constructing the sample data set from these time-series features; and, based on an improved K-means algorithm, detecting abnormal data in the power sequence data with the constructed sample data set as input.

First, an LSTM model is trained on the user's electricity consumption data to learn the time-series features of that data. Next, the trained LSTM model is used to predict the electricity consumption data, and the difference between the predicted value and the actual value is taken as the feature value of the user's electricity consumption. Finally, abnormal data in the user's electricity consumption data is detected with a data outlier detection algorithm based on the improved K-means algorithm. In this way, abnormal electricity consumption data is detected using the time-series features of the user's electricity consumption data even when the volume of that data is large.

The LSTM (long short-term memory neural network) is a recurrent neural network for processing time-series data; its structure is shown in fig. 2. A recurrent neural network can be understood as a cyclic stacking of multiple feed-forward neural networks with the same structure and parameters, the number of cycles being equal to the length of the input sequence. In the recurrent neural network, the input is $\{x_0, x_1, \ldots, x_n\}$, the output is $\{h_0, h_1, \ldots, h_n\}$, and the output of the hidden layer is denoted $\{C_0, C_1, \ldots, C_n\}$. The hidden layer A, i.e. a neuron node in the LSTM network, at each time $t$ receives not only $x_t$, the power sequence data at time $t$, but also $C_{t-1}$, the memory state of the hidden layer at time $t-1$. By processing these inputs, it outputs $h_t$, the output value of the hidden layer at time $t$, and passes $C_t$, the memory state of the hidden layer at time $t$, to the hidden layer at the next time step, thereby influencing the processing of the information at the next time step,

wherein $U$ represents the weight of the input layer, $W$ represents the weight of the hidden layer, $V$ represents the weight of the output layer, $\sigma$ represents the sigmoid function, $b_i$ represents the offset of $C_t$, and $b_o$ represents the offset of $h_t$.

In the recurrent neural network formed in this way, the hidden-layer-to-hidden-layer weight W is the "memory controller" of the whole network. The weights connecting the hidden layers represent the influence of past information on the information at the current time, so that historical memory information is scheduled and the time-series information of the input sequence is "memorized" and "understood".

However, the user electricity consumption data to be processed may contain a large amount of history information, that is, the input sequence of user electricity consumption data may be long. To cope with the loss of important information when the amount of history information is large and the input time series is long, the LSTM uses a "gate" structure to decide which information to delete and which to memorize; the memory unit of the LSTM is shown in fig. 3. The first step in hidden layer A is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the "forget gate", which looks at $h_{t-1}$ (the previous output) and $x_t$ (the current input) and outputs a number between 0 and 1 for each element of the cell state $C_{t-1}$ (the previous state): 1 means complete retention and 0 means complete deletion. This selective memorizing and forgetting lets the LSTM avoid the information explosion problem and understand the information better. The forget gate processes the data as follows:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{2}$$

where $\sigma$ denotes the sigmoid function, $W_f$ denotes the forget gate weight, $b_f$ denotes the forget gate offset, $x_t$ denotes the data input at time $t$, and $h_{t-1}$ denotes the output value of hidden layer A at time $t-1$.

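After determining the information to forget, the LSTM needs to decide what information to store. This part has two steps: first, a "memory gate" $i_t$ controls the influence of the current data input on the state value of the memory unit; second, a tanh layer creates a new candidate value vector $\tilde{C}_t$, which is added to the state of the memory unit. The specific calculation steps are as follows:

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{3}$$

$$\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{4}$$

where $\tilde{C}_t$ is the candidate value vector at time $t$.

After determining the information to forget and the information to remember, the candidate value vector $\tilde{C}_t$ is combined with the state $C_{t-1}$ of the memory unit at the previous time to update the state of the memory unit at the current time:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

The output of each memory unit is controlled by an output gate $o_t$:

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{6}$$

$$h_t = o_t \odot \tanh(C_t) \tag{7}$$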

The LSTM updates the weights of all layers by gradient descent so as to minimize the value of the cost function.
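The gate computations described above can be sketched as a single forward step in plain Python. This is only an illustrative sketch, not the patented implementation: the candidate-state and state-update lines use the standard LSTM form, the function names `lstm_step` and `init_params` are hypothetical, and the weights are small random values rather than trained ones.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Compute weights @ vec + bias for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns (h_t, C_t)."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]      # memory gate
    c_hat = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate values
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Random (untrained) parameters, for illustration only."""
    rng = random.Random(seed)
    p = {}
    for gate in ("f", "i", "c", "o"):
        p["W_" + gate] = [[rng.gauss(0.0, 0.1)
                           for _ in range(hidden_size + input_size)]
                          for _ in range(hidden_size)]
        p["b_" + gate] = [0.0] * hidden_size
    return p
```

To process a sequence, one would start from zero vectors for `h_prev` and `c_prev` and call `lstm_step` once per reading, feeding each returned pair into the next call. Training by gradient descent is not shown.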

After the user's electricity consumption sequence data $\{x_0, x_1, \ldots, x_n\}$ is input into the LSTM, the predicted output $\{h_0, h_1, \ldots, h_n\}$ of the user's electricity consumption data is obtained. The difference between this output and the real electricity consumption data is computed and used as the feature vector of the outlier detection algorithm to construct the sample data set. Abnormal electricity consumption data of the user is then detected from the obtained feature vectors with the outlier detection algorithm based on the improved K-means algorithm.
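As a minimal sketch of this feature construction, the snippet below turns a series of actual readings and a series of predictions into one-dimensional sample points, using the absolute difference mentioned in the beneficial effects. The function name `build_sample_set` is hypothetical, and the predictions are assumed to come from an already trained model.

```python
def build_sample_set(actual, predicted):
    """Return one sample point per time step: [abs(actual - predicted)]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each sample is a one-element vector so that it can be fed to a
    # clustering routine that expects points of arbitrary dimension.
    return [[abs(a - p)] for a, p in zip(actual, predicted)]
```

A reading that the model predicts well yields a feature close to zero, while a reading that departs from the learned pattern yields a large feature, which is what the clustering stage then separates.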

The purpose of the K-means clustering algorithm is to divide an unlabeled data set into K classes. The steps are shown in fig. 4:

1. Randomly select K sample points $\mu_i$ from the samples to serve as the center points of the clusters;

2. Calculate the distance between every sample point and the center of each cluster, and assign each sample point to the nearest cluster; the distance is calculated as follows:

$$D = \lVert x - \mu_i \rVert^2 \tag{8}$$

wherein $x$ is a sample point and $\mu_i$ is the center point of cluster $C_i$;

3. Recalculate each cluster center from the sample points currently in the cluster:

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x \tag{9}$$

wherein $\mu_i$ is the center point of cluster $C_i$;

4. Repeat steps 2 and 3.
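The four steps above can be sketched in plain Python as follows. This is a textbook K-means sketch rather than code from the patent; the stopping rule (stop when the centers no longer change, or after `max_iter` rounds) and the handling of empty clusters are assumptions added to make the loop terminate.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance, as in formula (8)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K sample points at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[j])
        # Step 4: repeat until the centers stop moving.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Because step 1 is random, different seeds can start the search from different points, which is the sensitivity the following paragraphs discuss.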

As an unsupervised partitioning clustering algorithm, K-means is widely used in the field of anomaly detection because it is efficient and simple. However, because the initial cluster centers are selected at random, the clustering result is subject to uncertainty. When the algorithm starts iterating, the K initial cluster centers are chosen at random, without any fixed rule, and different starting points lead to different search paths.

Therefore, the clustering result depends heavily on the initial cluster centers, and the final clustering easily falls into a local optimum rather than the global optimum. As shown in fig. 1, if the selected initial cluster centers are close to the real cluster centers, the clustering result is objective and true; as shown in fig. 2, if the randomly selected initial cluster centers contain outliers, the final clustering result will have a large error.

Outliers also have a significant impact on the clustering result. In each iteration, the algorithm determines the cluster centers from the feature attributes of all data points, so the presence of outliers inevitably disturbs the cluster centers and affects the clustering result.

Therefore, in this embodiment, detecting abnormal data in the power sequence data with the constructed sample data set as input, based on the improved K-means algorithm, includes: calculating the compactness of all data points in the sample data set, obtaining the data-dense region, and thereby determining the initial cluster centers; and calculating the Euclidean distances between all data points in the sample data set and each initial cluster center and dividing the data points into K clusters, wherein if the distance between a data point and the center of the cluster to which it belongs is greater than the average distance, the iteration continues, and when the number of iterations is greater than or equal to a set value, the data point is judged to be an abnormal data point, thereby detecting abnormal data in the power sequence data. This embodiment changes the way the initial cluster centers are selected. Starting from the properties of the optimal cluster centers, it removes the outlier regions and selects the initial cluster centers by the farthest-distance rule within the data-dense region, which optimizes the initialization process so that the algorithm obtains more reasonable initial cluster centers before iteration begins. A corresponding anomaly detection algorithm is then built on this basis.

The principles for selecting the initial points in the improved K-means algorithm are as follows:

(1) Avoid selecting outliers. Satisfying this principle prevents the algorithm from going wrong at the very start and makes its results more accurate;

(2) Select initial cluster centers that are evenly distributed over the high-density regions. The true cluster centers should lie where the data is densest and at some distance from each other. Therefore, the closer the selected initial cluster centers are to the real cluster centers, the fewer iterations are needed, the faster the convergence, and the higher the accuracy of the clustering algorithm.

The K-means algorithm is improved according to these two principles. First, the compactness of all data points in the data set is calculated and the sparse data regions are removed to obtain a set of data points with high compactness, because the sparse regions are not only far from the optimal cluster centers but also contain the outliers. Within the data-dense region, the data point with the highest compactness is selected as the first initial cluster center; the data point in the region farthest from the first initial cluster center is then selected as the second initial cluster center; each subsequent initial cluster center is the data point whose shortest distance to the already selected initial cluster centers is the largest, which ensures that the initial cluster centers are evenly distributed. The improved algorithm for initial cluster center selection is described in detail below.

The initialization process optimization algorithm comprises the following steps:

1. For each data point $x_i$ in the spatial data set X, calculate its compactness,

wherein $x_i$ denotes the i-th data point in the sample set, $x_j$ denotes the j-th data point, $D(x_i, x_j)$ denotes the distance between $x_i$ and $x_j$, and $G_t(x_i)$ is the set of the $t$ nearest neighbor data points of $x_i$;

2. Delete from X all data points of low compactness, obtaining the dense data point set X';

3. In X', take the data point $x$ with the highest compactness, i.e. $Tigh_{max}(x)$, as the first initial cluster center $c_1$; take the data point farthest from $c_1$ as the second initial cluster center $c_2$; the m-th ($3 \le m \le K$) initial cluster center $c_m$ is the data point $x_i \in X'$ satisfying $\max(D_{min}(x_i, c_1), D_{min}(x_i, c_2), \ldots, D_{min}(x_i, c_{m-1}))$, $i = 1, 2, \ldots, n$, that is, the point whose distance to its nearest already selected center is the largest. This continues until the K initial cluster centers are obtained.
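The initialization steps above can be sketched as follows. Two things in this sketch are assumptions, because the corresponding formulas are not reproduced in the text: compactness is taken to be the reciprocal of the mean distance to the `t` nearest neighbors, and the cut-off for "low compactness" is taken to be the median compactness. The function names are hypothetical.

```python
import math
import statistics


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def compactness(points, t):
    """ASSUMED definition: 1 / (mean distance to the t nearest neighbors)."""
    scores = []
    for i, p in enumerate(points):
        nearest = sorted(dist(p, q) for j, q in enumerate(points) if j != i)[:t]
        mean_d = sum(nearest) / len(nearest)
        scores.append(1.0 / (mean_d + 1e-12))  # small term avoids division by zero
    return scores


def initial_centers(points, k, t):
    scores = compactness(points, t)
    # ASSUMED cut-off: keep the points at least as compact as the median.
    threshold = statistics.median(scores)
    dense = [(s, p) for s, p in zip(scores, points) if s >= threshold]
    if k > len(dense):
        raise ValueError("not enough dense points for k centers")
    candidates = [p for _, p in dense]
    # First center: the most compact point of the dense set.
    centers = [max(dense, key=lambda sp: sp[0])[1]]
    while len(centers) < k:
        # Next center: the point whose distance to its nearest chosen
        # center is largest. With one center chosen this is simply the
        # point farthest from it, which matches the rule for c_2.
        nxt = max(candidates, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return centers
```

With this rule, an isolated point is dropped before center selection, so it cannot become an initial center even though it is far from everything else.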

In this embodiment, first, outliers are excluded from being initial centers, which ensures that the starting point of the iteration does not deviate far from the real cluster centers; second, the compactness of the data points is the main basis for selecting the initial centers, which matches the characteristics of the optimal cluster centers; finally, the principle of maximizing the nearest distance ensures that the initial cluster centers are evenly distributed.

Owing to the characteristics of the K-means algorithm, if outliers take part in the computation of the cluster centers in any iteration, they bias the clustering result. Therefore, the sensitivity of K-means to outliers can be exploited to construct an abnormal point detection algorithm, in which abnormal points are detected and eliminated during the iterations of the algorithm.

The algorithm is as follows:

Input: the d-dimensional data set X, the number of clusters K, the convergence precision $\varepsilon$ of the clustering criterion function, and the number of nearest neighbors $t$.

Output: the K cluster centers $C = \{c_1, c_2, \ldots, c_K\}$, the cluster label L to which each data point $x_i$ belongs, and the abnormal point set U.

The method comprises the following steps:

1. Set the initial clustering criterion function value $J_0 = 0$ and the initial degree of abnormality $Abn_x = 0$ for each data point $x$ in the data set;

2. For each data point $x_i$ in the data set X in the space $R^d$, calculate its compactness;

3. Delete from X all data points of low compactness, obtaining the dense data point set X';

4. In X', take the data point $x$ with the highest compactness, i.e. $Tigh_{max}(x)$, as the first initial cluster center $c_1$; take the data point farthest from $c_1$ as the second initial cluster center $c_2$; the m-th ($3 \le m \le K$) initial cluster center $c_m$ is the data point $x_i \in X'$ satisfying $\max(D_{min}(x_i, c_1), D_{min}(x_i, c_2), \ldots, D_{min}(x_i, c_{m-1}))$, $i = 1, 2, \ldots, n$. This continues until the K initial cluster centers are obtained, which respectively represent the K clusters $w_j$, $j = 1, 2, \ldots, K$;

5. Calculate the Euclidean distance between every data point in X and each cluster center:

$$D(x_i, c_j) = \sqrt{\sum_{l=1}^{d} (x_{il} - c_{jl})^2}$$

where $i = 1, 2, 3, \ldots, m$ and $j = 1, 2, \ldots, K$. For a data point $x$, if $c_j$ is such that $D(x, c_j) = \min D(x, c_j)$, $j = 1, 2, \ldots, K$, then $x$ is assigned to the cluster represented by $c_j$, i.e. $L_x = w_j$;

6. In the K clusters formed, if the distance between a data point $x$ and the center of the cluster to which it belongs is greater than the average distance, namely

$$D(x, c_j) > \frac{1}{m_j} \sum_{x_i \in w_j} D(x_i, c_j)$$

wherein $m_j$ is the total number of data points in the cluster represented by $c_j$, then $Abn_x$ is incremented by 1;

7. If $Abn_x \ge 3$, $x$ is judged to be an abnormal point, removed from the data set X, and merged into U;

8. Evaluate the clustering criterion function and judge whether the convergence condition $|J' - J| \le \varepsilon$ is satisfied (J is the clustering criterion function value of the previous iteration, and J' is that of the current iteration); if not, continue with step 9; if so, the algorithm ends and outputs C, L and U;

9. Recalculate the center of each cluster:

$$c_j = \frac{1}{m_j} \sum_{x_i \in w_j} x_i$$

wherein $m_j$ is the total number of data points in the cluster represented by $c_j$; then go to step 5.
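Steps 5 to 9 of this algorithm can be sketched in plain Python as below. The sketch takes the initial centers as an argument (for example, the result of the density-based selection in steps 2 to 4). The clustering criterion J is assumed to be the sum of squared distances to the assigned centers, since its formula is not reproduced in the text, and the names `detect_anomalies`, `abn_limit` and `max_iter` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def detect_anomalies(points, centers, eps=1e-6, abn_limit=3, max_iter=100):
    """Return (centers, labels, anomalies); anomalies holds point indices."""
    centers = [list(c) for c in centers]
    k = len(centers)
    active = list(range(len(points)))   # indices still in the data set X
    abn = [0] * len(points)             # abnormality counter per point
    anomalies = []                      # the abnormal point set U
    labels = {}
    j_prev = 0.0                        # initial criterion value J_0 = 0
    for _ in range(max_iter):
        # Step 5: assign each remaining point to its nearest center.
        labels = {i: min(range(k), key=lambda j: dist(points[i], centers[j]))
                  for i in active}
        # Step 6: count points lying farther than their cluster's mean distance.
        for j in range(k):
            members = [i for i in active if labels[i] == j]
            if not members:
                continue
            d = {i: dist(points[i], centers[j]) for i in members}
            avg = sum(d.values()) / len(members)
            for i in members:
                if d[i] > avg:
                    abn[i] += 1
        # Step 7: move points that reached the limit into the anomaly set.
        anomalies.extend(i for i in active if abn[i] >= abn_limit)
        active = [i for i in active if abn[i] < abn_limit]
        # Step 8: convergence test on the ASSUMED criterion (sum of squares).
        j_cur = sum(dist(points[i], centers[labels[i]]) ** 2 for i in active)
        if abs(j_cur - j_prev) <= eps:
            break
        j_prev = j_cur
        # Step 9: recompute each center from its remaining members.
        for j in range(k):
            members = [points[i] for i in active if labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, anomalies
```

Because each point is compared with the mean distance of its own cluster, points on the edge of a tight cluster can also accumulate counts when the loop runs for several rounds, so in practice the counter limit and the tolerance `eps` would need tuning for the data at hand.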

The difference values are analyzed by the above algorithm to obtain the abnormal points; combining the time-series algorithm with the clustering algorithm enables efficient monitoring of abnormal points.

In this method, the LSTM (long short-term memory neural network) model effectively extracts the more valuable time-series features of the power data and yields predicted values of the power sequence data; the absolute value of the difference between the predicted values and the real power data is used as the time-series feature of the power data. Combined with these time-series features, an improved K-means algorithm suited to big data finds the outliers when the volume of power data is large. A data anomaly detection method that fuses the LSTM with the improved K-means can therefore effectively improve the efficiency and accuracy of data anomaly detection: while extracting the time-series features of the power data, it efficiently identifies outliers of unqualified quality even when the volume of power data is large.

The second embodiment is as follows:

Based on the electric power data anomaly detection method of the first embodiment, this embodiment provides an electric power data anomaly detection system, including:

the first module is used for inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data and constructing a sample data set;

and the second module is used for detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on the improved K-means algorithm.

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. A method for detecting power data abnormality is characterized by comprising the following steps:

inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data, and constructing a sample data set;

and detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on an improved K-means algorithm.

2. The method for detecting the power data abnormality according to claim 1, wherein the step of inputting the collected power sequence data of the user into a trained LSTM model, extracting a time-series characteristic of the power sequence data, and constructing a sample data set includes:

inputting the collected power sequence data of the user into a trained LSTM model to obtain a predicted value of the power sequence data;

and comparing the predicted value and the true value of the power sequence data to obtain a difference value, wherein the difference value is used as a time sequence characteristic of the power sequence data to describe the power data and construct a sample data set.

3. The method as claimed in claim 1, wherein in the trained LSTM model, the hidden layer at each time $t$ receives not only $x_t$, the power sequence data at time $t$, but also $C_{t-1}$, the memory state of the hidden layer at time $t-1$; by processing these inputs, it outputs $h_t$, the output value of the hidden layer at time $t$, and passes $C_t$, the memory state of the hidden layer at time $t$, to the hidden layer at the next time step.

4. The method for detecting the abnormality of the electric power data as claimed in claim 1, wherein in the trained LSTM model, the memory unit uses a forget gate to examine $h_{t-1}$, the output value of the hidden layer at time $t-1$, and $x_t$, and outputs a number between 0 and 1 for each element of $C_{t-1}$, the memory state of the hidden layer at time $t-1$, where 1 means complete retention and 0 means complete deletion; specifically:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \tag{2}$$

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \tag{3}$$

$$\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \tag{4}$$

wherein $\tilde{C}_t$ is the candidate value vector at time $t$;

the candidate value vector $\tilde{C}_t$ is combined with the state $C_{t-1}$ of the memory unit at the previous time to update the state of the memory unit at the current time:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \tag{6}$$

$$h_t = o_t \odot \tanh(C_t) \tag{7}$$

5. The method for detecting the abnormal data of the electric power data according to claim 1, wherein the detecting the abnormal data in the electric power sequence data by taking the constructed sample data set as an input based on the improved K-means algorithm comprises:

calculating the compactness of all data points in the sample data set, acquiring a data dense area, and further determining an initial clustering center;

and calculating the Euclidean distances between all data points in the sample data set and each initial cluster center and dividing the data points into K clusters, wherein if the distance between a data point and the center of the cluster to which it belongs is greater than the average distance, the iteration continues, and when the number of iterations is greater than or equal to a set value, the data point is judged to be an abnormal data point, thereby detecting abnormal data in the power sequence data.

6. The method according to claim 5, wherein determining the initial cluster centers includes: within the data-dense region, selecting the data point with the highest compactness as the first initial cluster center, and then selecting the data point in the region farthest from the first initial cluster center as the second initial cluster center; each subsequent initial cluster center is the data point whose shortest distance to the already selected initial cluster centers is the largest.

7. The method for detecting an abnormality in electric power data according to claim 5, wherein the compactness of the data points is obtained by:

wherein $x_i$ denotes the i-th data point in the sample set, $x_j$ denotes the j-th data point, $D(x_i, x_j)$ denotes the distance between $x_i$ and $x_j$, and $G_t(x_i)$ is the set of the $t$ nearest neighbor data points of $x_i$.

8. An electric power data abnormality detection system characterized by comprising: