CN113010504A - Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm - Google Patents
- Publication number
- CN113010504A (application CN202110239950.XA)
- Authority
- CN
- China
- Prior art keywords: data, time, power, value, power sequence
- Prior art date
- Legal status: Granted
Classifications
- G06F16/215 — Improving data quality; data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/2474 — Sequence data queries, e.g. querying versioned data
- G06F18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
- G06F18/2433 — Single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/048 — Activation functions
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and a system for detecting power-data anomalies based on an LSTM and an improved K-means algorithm, in the technical field of power-data analysis, comprising the following steps: inputting the collected power-sequence data of a user into a trained LSTM model, extracting the time-series characteristics of the power-sequence data, and constructing a sample data set; and, based on an improved K-means algorithm, detecting abnormal data in the power-sequence data with the constructed sample data set as input. The method and the system realise time-series feature extraction for power data while efficiently identifying abnormal power data even when the volume of power data is large.
Description
Technical Field
The invention belongs to the technical field of electric power data analysis, and particularly relates to an electric power data anomaly detection method and system based on LSTM and an improved K-means algorithm.
Background
The power data collected by current production management systems is uneven in quality: the number of acquisition terminals is huge, the volume of data to be collected is large, acquisition is frequent, and the transmission modes are diverse. Moreover, information such as equipment replacement and transmission conditions cannot be obtained in real time, so it cannot be judged whether the quality of the collected power data meets the standard, and the production electricity-consumption situation therefore cannot be assessed accurately. Power-data quality is thus the basis of electricity-consumption analysis: for the data collected by the system, quality must first be checked, unqualified data identified, and the data re-acquired in time. Because manually checking transmission problems, acquisition terminals and the like consumes a large amount of manpower and material resources, machine-learning methods for checking power-data quality are gradually becoming mainstream, and most of them use clustering-based outlier detection. This approach has two problems: 1) the volume of power data is large and general clustering methods converge slowly; 2) power data is time-sequential, and existing methods cannot effectively extract its time-series characteristics. In addition, existing outlier-detection methods suffer from low clustering efficiency and slow convergence caused by improper selection of the initial cluster centers.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a power-data anomaly detection method and system based on an LSTM and an improved K-means algorithm, which realise the extraction of the time-series characteristics of power data and can efficiently identify abnormal power data when the volume of power data is large.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a power data anomaly detection method includes: inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data, and constructing a sample data set; and detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on an improved K-means algorithm.
Further, the inputting the collected power sequence data of the user into the trained LSTM model, extracting the time-sequence characteristics of the power sequence data, and constructing a sample data set, includes: inputting the collected power sequence data of the user into a trained LSTM model to obtain a predicted value of the power sequence data; and comparing the predicted value and the true value of the power sequence data to obtain a difference value, wherein the difference value is used as a time sequence characteristic of the power sequence data to describe the power data and construct a sample data set.
Further, in the trained LSTM model, the hidden layer corresponding to each time t receives, besides x_t (the power-sequence data at time t), also C_{t-1} (the memory state of the hidden layer at time t-1); by processing these inputs it outputs h_t (the output value of the hidden layer at time t) and passes C_t (the memory state of the hidden layer at time t) on to the hidden layer at the next instant.
Further, in the trained LSTM model, a memory unit looks, through a "forgetting gate", at h_{t-1} (the output value of the hidden layer at time t-1) and x_t, and outputs a number between 0 and 1 for every entry of C_{t-1} (the memory state of the hidden layer at time t-1), where 1 means complete retention and 0 means complete deletion; specifically:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (2)
where f_t is the value of the forgetting gate at time t, σ is the sigmoid function, W_f is the weight of the forgetting gate f, b_f is the offset of the forgetting gate f, and h_{t-1} is the output value of the hidden layer at time t-1;
using a "memory gate" itControlling the influence of the current data input on the state value of the memory cell itShowing the state of the memory gate i at the time t; creation using tanh functionRepresenting a candidate value vector at the time t, and adding the vector into the state of the memory unit; the specific calculation steps are as follows:
it=σ(Wi[ht-1,xt]+bi) (3)
wherein, WiRepresenting the update weight of the memory gate i, biIs the offset of the memory gate i, WcIs a candidate for the memory gate i, bcIt is the update of the candidate value offset,is the candidate vector at time t;
The candidate-value vector C̃_t is combined with the state C_{t-1} of the memory cell at the previous instant to update the state of the memory cell at the current time:
C_t = f_t * C_{t-1} + i_t * C̃_t    (5)
the output of each memory cell is provided by an output gate otAnd controlling, wherein the calculation formula is as follows:
ot=σ(Wo[ht-1,xt]+bo) (6)
ht=ot tanhCt (7)
wherein o istIs the value of the output gate at time t, WoIs the weight of the updated output value, boIs to update the output value offset, htIs the output of the hidden layer at time t.
Further, detecting abnormal data in the power-sequence data with the constructed sample data set as input, based on the improved K-means algorithm, comprises: computing the tightness of all data points in the sample data set, locating the dense data regions, and thereby determining the initial cluster centers; computing the Euclidean distance between every data point in the sample data set and each initial cluster center and dividing the data points into K clusters; if the distance from a data point to its cluster center exceeds the average in-cluster distance, incrementing its abnormality count and continuing to iterate; and when the count reaches a set value, judging the data point to be abnormal, thereby detecting the abnormal data in the power-sequence data.
Further, determining the initial cluster centers comprises: in a dense data region, selecting the data point with the highest tightness as the first initial cluster center, then selecting the data point in that region farthest from the first initial cluster center as the second initial cluster center; thereafter, each subsequent initial cluster center is the data point whose distance to the nearest already-selected initial cluster center is largest.
Further, the tightness of a data point is obtained as the reciprocal of the mean distance to its t nearest neighbours:
Tigh(x_i) = t / Σ_{x_j ∈ G_t(x_i)} D(x_i, x_j)
where x_i denotes the i-th data point in the sample set, x_j the j-th data point, D(x_i, x_j) the distance between x_i and x_j, and G_t(x_i) the set of the t nearest-neighbour data points of x_i.
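The tightness formula above is lost to an image in the source; assuming the natural reading consistent with the defined symbols — the reciprocal of the mean Euclidean distance to the t nearest neighbours, so that denser points score higher — a minimal sketch might be:

```python
import numpy as np

def tightness(X, t):
    """Tightness of each data point: the reciprocal of the mean Euclidean
    distance D(x_i, x_j) to its t nearest neighbours G_t(x_i).
    Higher values mark denser regions; outliers score low."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # exclude the point itself
    nearest = np.sort(D, axis=1)[:, :t]    # distances to t nearest neighbours
    return t / nearest.sum(axis=1)
```

With this reading, the point of lowest tightness in a data set is the most isolated one, which is exactly the property the initial-center selection relies on.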
An electrical data anomaly detection system comprising: the first module is used for inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data and constructing a sample data set; and the second module is used for detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on the improved K-means algorithm.
Compared with the prior art, the invention has the following beneficial effects: the LSTM (long short-term memory) model effectively extracts the more valuable time-series characteristics of the power data, yielding a predicted value of the power-sequence data; the absolute value of the difference between this predicted value and the real power data serves as the time-series feature of the power data; combining these features, an improved K-means algorithm suited to big data finds the outliers even when the volume of power data is large. Fusing the LSTM with the improved K-means anomaly-detection method effectively improves the efficiency and accuracy of data anomaly detection: while the time-series features of the power data are extracted, outliers of unqualified quality can be efficiently identified under a large volume of power data.
Drawings
FIG. 1 is a flow chart of abnormal power usage data detection in an embodiment of the present invention;
FIG. 2 is a diagram of the LSTM model architecture in an embodiment of the present invention;
FIG. 3 is a diagram of the neuron structure of the LSTM in the embodiment of the present invention;
FIG. 4 is a flowchart of the K-means clustering algorithm in the embodiment of the present invention;
FIG. 5 is a flow chart of the data anomaly detection algorithm based on the improved K-means in the embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The first embodiment is as follows:
as shown in fig. 1 to 5, a power data abnormality detection method includes: inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data, and constructing a sample data set, wherein the method comprises the following steps: inputting the collected power sequence data of the user into a trained LSTM model to obtain a predicted value of the power sequence data; comparing the predicted value and the true value of the power sequence data to obtain a difference value serving as a time sequence characteristic of the power sequence data, and constructing a sample data set according to the time sequence characteristic; and detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on an improved K-means algorithm.
Firstly, training an LSTM model by using power consumption data of a user, and acquiring time sequence characteristics of the power consumption data; next, predicting power consumption data by using an LSTM model obtained by training, and taking a difference value between a predicted value and an actual value as a characteristic value of the power consumption of the user; finally, abnormal data detection is carried out on the power consumption data of the user by utilizing a data outlier detection algorithm based on an improved K-means algorithm; therefore, the aim of detecting abnormal electricity consumption data by combining the time sequence characteristics of the electricity consumption data of the user under the condition that the electricity consumption data of the user is large is achieved.
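A minimal sketch of the residual-feature step of this pipeline (the LSTM training itself is out of scope here; `y_pred` stands in for the trained model's one-step predictions, and the function name is illustrative, not the patent's):

```python
import numpy as np

def residual_features(y_true, y_pred):
    """Per-step |prediction - actual| of user power consumption: the
    time-series feature fed to the improved K-means outlier detector."""
    r = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))
    return r.reshape(-1, 1)   # one 1-D feature sample per time step
```

Points whose residual is large are exactly those the LSTM could not explain from history, so they form natural outliers in the clustered feature space.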
The LSTM (long short-term memory network) is a recurrent neural network for processing time-series data; its structure is shown in fig. 2. A recurrent neural network can be understood as a cyclic stack of several forward networks with identical structure and parameters, the number of cycles matching the length of the input sequence. In a recurrent neural network the input is {x_0, x_1, ..., x_n}, the output is {h_0, h_1, ..., h_n}, and the memory states of the hidden layer are denoted {C_0, C_1, ..., C_n}. The hidden layer A, i.e. a neuron node of the LSTM network, at each time t receives, besides x_t (the power-sequence data at time t), also C_{t-1} (the memory state of the hidden layer at time t-1); by processing these inputs it outputs h_t (the output value of the hidden layer at time t) and passes C_t (the memory state of the hidden layer at time t) to the hidden layer at the next instant, thereby intervening in the processing of the information at the next time,
where U denotes the weight of the input layer, W the weight of the hidden layer, V the weight of the output layer, σ the sigmoid function, b_i the offset for C_t, and b_o the offset for h_t.
In the recurrent network formed in this way, the hidden-to-hidden weight W is the "memory controller" of the whole network: the weights connecting the hidden layers represent the influence of past information on the information at the current time, scheduling the historical memory so that the time-sequence information of the input sequence is "memorised" and "understood".
However, the user electricity-consumption data to be processed may contain a large amount of historical information, i.e. the input data sequence may be long. To cope with long input sequences without losing important information, the LSTM uses "gate" structures to decide what to delete or memorise; its memory unit is shown in fig. 3. The first step of hidden layer A is to decide what information to discard from the cell state. This decision is implemented by a sigmoid layer called the "forgetting gate", which looks at h_{t-1} (the previous output) and x_t (the current input) and outputs a number between 0 and 1 for every entry of the cell state C_{t-1} (the previous state): 1 means complete retention, 0 means complete deletion. This selective memory and forgetting lets the LSTM avoid the problem of information explosion and understand the information better. The forgetting gate processes the data as follows:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (2)
where σ denotes the sigmoid function, W_f the forgetting-gate weight, b_f the forgetting-gate offset, x_t the data input at time t, and h_{t-1} the output value of hidden layer A at time t-1.
After deciding what to forget, the LSTM decides what information to store. This is done in two steps: first, a "memory gate" i_t controls the influence of the current data input on the state value of the memory unit; second, a tanh layer creates a new candidate-value vector C̃_t, which is added to the state of the memory cell. The specific calculation is:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (3)
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)    (4)
where W_i is the update weight of the memory gate i, b_i is the offset of the memory gate i, W_c is the candidate-value weight, b_c is the candidate-value offset, and C̃_t is the candidate vector at time t;
after determining the information needing to be forgotten and the remembered information, using the candidate value vectorCombining the state C of the last moment of the memory cellt-1The state of the memory cell at the current time is updated,
the output of each memory cell is provided by an output gate otAnd controlling, wherein the calculation formula is as follows:
ot=σ(Wo[ht-1,xt]+bo) (6)
ht=ot tanhCt (7)
wherein o istIs the value of the output gate at time t, WoIs the weight of the updated output value, boIs to update the output value offset, htIs the output of the hidden layer at time t.
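The gate equations can be checked with a minimal NumPy sketch of one memory-cell step (taking the candidate-value and state-update formulas, whose printed forms are lost here, in their standard LSTM form; all weight shapes and names are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM step: forget gate (2), memory gate (3), candidate (4),
    cell-state update (5), output gate (6) and hidden output (7)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)             # (2) forget gate
    i_t = sigmoid(W_i @ z + b_i)             # (3) memory gate
    C_tilde = np.tanh(W_c @ z + b_c)         # (4) candidate-value vector
    C_t = f_t * C_prev + i_t * C_tilde       # (5) cell-state update
    o_t = sigmoid(W_o @ z + b_o)             # (6) output gate
    h_t = o_t * np.tanh(C_t)                 # (7) hidden output
    return h_t, C_t
```

Iterating this step over {x_0, ..., x_n} while carrying (h_t, C_t) forward reproduces the recurrent structure of fig. 2.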
The LSTM updates the weights of all layers by gradient descent so as to minimise the value of the cost function.
After the user's electricity-consumption sequence {x_0, x_1, ..., x_n} is input into the LSTM, the predicted outputs {h_0, h_1, ..., h_n} of the consumption data are obtained; the differences between these outputs and the real consumption data form the feature vectors for outlier detection and constitute the sample data set. Abnormal consumption data is then detected from these feature vectors by the outlier-detection algorithm based on the improved K-means.
The purpose of the K-means clustering algorithm is to partition an unlabelled data set X = {x_1, x_2, ..., x_n} into K classes; the steps are shown in fig. 4:
1. Randomly select K sample points μ_i as the center of each cluster;
2. Compute the distance between every sample point and each cluster center, then assign each point to the nearest cluster, with the distance computed as
D = ||x − μ_i||_2    (8)
where μ_i is the center point of cluster C_i;
3. Recompute each cluster center from the sample points currently in the cluster:
μ_i = (1/|C_i|) Σ_{x∈C_i} x
where μ_i is the center point of cluster C_i;
4. Repeat steps 2 and 3 until the cluster centers no longer change.
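The four steps above can be sketched in NumPy as follows (random initialisation as in step 1; the explicit convergence check is an implementation choice, since step 4 does not fix a stopping rule):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Plain K-means: random initial centers, assign each point to its
    nearest center per Eq. (8), recompute centers, repeat to convergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]   # step 1
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=-1)
        labels = d.argmin(axis=1)                       # step 2
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])  # step 3
        if np.allclose(new, centers):                   # step 4: converged
            break
        centers = new
    return centers, labels
```

This baseline makes the weakness discussed below concrete: the `rng.choice` line is where an unlucky draw (e.g. an outlier) skews the whole result.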
As an unsupervised partitional clustering algorithm, K-means is widely used in the field of anomaly detection for its efficiency and simplicity. But because the initial cluster centers are selected at random, the clustering result is easily subject to uncertainty: when the algorithm starts iterating, the K initial centers are chosen randomly with no fixed rule, and different starting points give different search paths.
The clustering result therefore depends heavily on the initial cluster centers, and the final result easily falls into a local rather than the global optimum. As shown in fig. 1, if the selected initial cluster centers are close to the real ones, the clustering result is objective and real; as shown in fig. 2, if the randomly selected initial centers contain outliers, the final clustering result has a large error.
Meanwhile, outliers have a significant impact on the clustering result: every iteration recomputes the cluster centers from the feature attributes of all data points, so the presence of outliers inevitably disturbs the centers and distorts the result.
Therefore, this embodiment detects abnormal data in the power-sequence data, based on an improved K-means algorithm, with the constructed sample data set as input, comprising: computing the tightness of all data points in the sample data set, locating the dense data regions, and thereby determining the initial cluster centers; computing the Euclidean distance between every data point and each initial cluster center and dividing the points into K clusters; incrementing the abnormality count of a data point whenever its distance to its cluster center exceeds the average in-cluster distance, and judging it abnormal once the count reaches a set value, thereby detecting the abnormal data in the power-sequence data. The embodiment changes the way the initial cluster centers are selected: starting from the properties of the optimal cluster centers, it removes the outlier regions and selects the initial centers by the farthest-distance rule within the dense data region, optimising the initialisation so that the algorithm obtains more reasonable initial cluster centers before iterating; on this basis a corresponding anomaly-detection algorithm is adopted.
The specific improved K-means algorithm initial point selection principle is as follows:
(1) Avoid selecting outliers. Observing this principle keeps the algorithm from going wrong at the very start and makes its result more accurate;
(2) Select initial cluster centers that are uniformly distributed over the high-density regions. Clearly, the real cluster centers lie where the data is densest and at some distance from one another; choosing initial centers close to the real ones reduces the number of iterations, speeds up convergence, and improves the accuracy of the clustering algorithm.
Following these two principles, the K-means algorithm is improved as follows. First, the tightness of every data point in the data set is computed and the sparse regions are removed, yielding a set of points of high tightness; the sparse regions are not only far from the optimal cluster centers but also contain the outliers. In the dense region, the point with the highest tightness is selected as the first initial cluster center, and the point in the region farthest from it as the second; thereafter, each initial cluster center is the data point whose distance to the nearest already-selected center is largest, which fully ensures that the initial centers are spread out. The improved initial-center selection algorithm is described in detail below.
The initialisation optimisation algorithm comprises the following steps:
1. For every data point in the sample set, compute its tightness Tigh(x_i), where x_i denotes the i-th data point, x_j the j-th data point, D(x_i, x_j) the distance between x_i and x_j, and G_t(x_i) the set of the t nearest-neighbour data points of x_i;
2. Remove the sparse regions, i.e. the points of low tightness, to obtain the dense point set X';
3. In X', take the point with the highest tightness, i.e. max Tigh(x), as the first initial cluster center c_1; take the data point farthest from c_1 as the second initial cluster center c_2; the m-th (3 ≤ m ≤ K) initial cluster center c_m is the data point x_i ∈ X' that maximises min(D(x_i, c_1), D(x_i, c_2), ..., D(x_i, c_{m-1})), i = 1, 2, ..., n, until the final K initial cluster centers are obtained.
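These steps can be sketched as follows. Two assumptions are hedged in the comments: the tightness is read as the reciprocal of the summed nearest-neighbour distances, and the sparse half is pruned at the median tightness (the patent does not fix the pruning threshold):

```python
import numpy as np

def improved_init_centers(X, K, t=3):
    """Initial-center selection: compute tightness, drop the sparse
    region (median threshold is an assumption), take the tightest point
    as c_1, then repeatedly add the point of X' whose distance to its
    nearest already-chosen center is largest."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    tig = t / np.sort(D, axis=1)[:, :t].sum(axis=1)     # step 1: tightness
    dense = np.where(tig >= np.median(tig))[0]          # step 2: dense set X'
    chosen = [dense[np.argmax(tig[dense])]]             # c_1: tightest point
    while len(chosen) < K:                              # step 3: farthest rule
        d_min = np.linalg.norm(X[dense, None] - X[chosen][None, :],
                               axis=-1).min(axis=1)
        chosen.append(dense[np.argmax(d_min)])          # c_2 falls out as the
    return X[np.array(chosen)]                          # point farthest from c_1
```

Because the outlier region never enters `dense`, no outlier can become an initial center, and the max-min rule spreads the K centers across the dense regions.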
In this embodiment, outliers are first excluded as initial centers, ensuring that the iteration starting point does not deviate widely from the real cluster centers; second, the tightness of the data points is the main basis for selecting an initial center, matching the characteristics of the optimal cluster center; finally, the farthest-nearest-distance rule ensures the uniform spread of the initial cluster centers.
Owing to the nature of the K-means algorithm, if outliers take part in computing a cluster center during an iteration, they bias the clustering result. The sensitivity of K-means to outliers can therefore be exploited to give an abnormal-point detection algorithm that detects and removes abnormal points during the iterations.
The algorithm is as follows:
Input: a d-dimensional data set X = {x_1, x_2, ..., x_n}, the cluster number K, the convergence precision ε of the clustering criterion function, and the nearest-neighbour number t.
Output: the K cluster centers C = {c_1, c_2, ..., c_K}, the cluster label L_{x_i} of each data point x_i, and the set U of abnormal points.
The steps are as follows:
1. Set the initial clustering-criterion function value J_0 = 0 and the initial abnormality degree Abn_x = 0 for every data point x in the data set;
2. Compute the tightness Tigh(x) of every data point in X;
3. Remove the sparse regions to obtain the dense point set X';
4. In X', take the point with the highest tightness, i.e. max Tigh(x), as the first initial cluster center c_1; take the data point farthest from c_1 as the second initial cluster center c_2; the m-th (3 ≤ m ≤ K) initial cluster center c_m is the data point x_i ∈ X' that maximises min(D(x_i, c_1), D(x_i, c_2), ..., D(x_i, c_{m-1})), i = 1, 2, ..., n, until the K initial cluster centers are obtained, representing the K clusters w_j, j = 1, 2, ..., K;
5. Compute the Euclidean distance between every data point in X and each cluster center:
D(x_i, c_j) = ||x_i − c_j||, i = 1, 2, ..., n, j = 1, 2, ..., K.
For a data point x, if c_j satisfies D(x, c_j) = min_j D(x, c_j), j = 1, 2, ..., K, then x is assigned to the cluster represented by c_j, i.e. L_x = w_j;
6. In the K clusters thus formed, if the distance from a data point x belonging to a cluster to its cluster center exceeds the average in-cluster distance, i.e. D(x, c_j) > (1/m_j) Σ_{x'∈w_j} D(x', c_j), where m_j is the total number of data points in the cluster represented by c_j, then Abn_x++;
7. If Abn_x ≥ 3, judge x to be an abnormal point, remove it from the data set X, and merge it into U;
8. Evaluate the clustering criterion function J' = Σ_{j=1}^{K} Σ_{x∈w_j} ||x − c_j||² and check the convergence condition |J' − J| ≤ ε (J is the criterion value of the previous iteration, J' that of the current one). If it is not satisfied, continue iterating at step 9; if it is satisfied, finish the algorithm and output C, L and U;
9. Recompute the center of each cluster: c_j = (1/m_j) Σ_{x∈w_j} x, where m_j is the total number of data points in the cluster represented by c_j; then go to step 5.
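Steps 5-9 can be sketched as follows; the Abn threshold of 3 follows step 7, while taking J as the within-cluster sum of squared distances is an assumption where the printed criterion function is not reproduced:

```python
import numpy as np

def detect_anomalies(X, centers, eps=1e-4, max_iter=100):
    """Assign points to their nearest center (step 5), count how often a
    point lies beyond its cluster's mean distance (step 6), expel it after
    3 counts (step 7), and stop when J stabilises (steps 8-9)."""
    X = np.asarray(X, float)
    active = np.ones(len(X), bool)     # points still in the data set
    abn = np.zeros(len(X), int)        # abnormality counters Abn_x
    J_prev = 0.0
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=-1)
        labels = d.argmin(axis=1)                        # step 5
        for j in range(len(centers)):
            mask = active & (labels == j)
            if not mask.any():
                continue
            dj = d[mask, j]
            abn[np.flatnonzero(mask)[dj > dj.mean()]] += 1   # step 6
        active &= abn < 3                                # step 7: expel
        J = sum((d[active & (labels == j), j] ** 2).sum()
                for j in range(len(centers)))            # step 8: criterion
        if abs(J - J_prev) <= eps:
            break
        J_prev = J
        centers = np.array([X[active & (labels == j)].mean(axis=0)
                            if (active & (labels == j)).any() else centers[j]
                            for j in range(len(centers))])   # step 9
    return labels, centers, np.flatnonzero(~active)
```

Starting from the improved initial centers, a far-out point exceeds its cluster's mean distance in successive iterations, its counter reaches 3, and it is removed before it can keep dragging the center.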
Analysing the difference values with this algorithm yields the abnormal points; combining the time-series algorithm with the clustering algorithm realises efficient monitoring of abnormal points.
In this method, the LSTM (long short-term memory) model effectively extracts the more valuable time-series characteristics of the power data, yielding a predicted value of the power-sequence data; the absolute value of the difference between this predicted value and the real power data serves as the time-series feature of the power data; combining these features, the improved K-means algorithm suited to big data finds the outliers even when the volume of power data is large. Fusing the LSTM with the improved K-means anomaly-detection method effectively improves the efficiency and accuracy of data anomaly detection: while the time-series features of the power data are extracted, outliers of unqualified quality can be efficiently identified under a large volume of power data.
Example two:
based on the method for detecting the abnormality of the electric power data according to the first embodiment, the present embodiment provides an electric power data abnormality detection system, including:
the first module is used for inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data and constructing a sample data set;
and the second module is used for detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on the improved K-means algorithm.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A method for detecting power data abnormality is characterized by comprising the following steps:
inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data, and constructing a sample data set;
and detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on an improved K-means algorithm.
2. The method for detecting the power data abnormality according to claim 1, wherein the step of inputting the collected power sequence data of the user into a trained LSTM model, extracting a time-series characteristic of the power sequence data, and constructing a sample data set includes:
inputting the collected power sequence data of the user into a trained LSTM model to obtain a predicted value of the power sequence data;
and comparing the predicted value and the true value of the power sequence data to obtain a difference value, wherein the difference value is used as a time sequence characteristic of the power sequence data to describe the power data and construct a sample data set.
3. The method as claimed in claim 1, wherein in the trained LSTM model, the hidden layer at each time t receives, in addition to x_t (the power sequence data at time t), the state C_{t-1} (the memory state of the hidden layer at time t-1); by processing these inputs it outputs h_t (the output value of the hidden layer at time t) and passes C_t (the memory state of the hidden layer at time t) to the hidden layer at the next time step.
4. The method for detecting the abnormality of the electric power data as claimed in claim 1, wherein in the trained LSTM model, the memory unit examines h_{t-1} (the output value of the hidden layer at time t-1) and x_t through a forget gate, and outputs a number between 0 and 1 for each number in C_{t-1} (the memory state of the hidden layer at time t-1), where 1 means complete retention and 0 means complete deletion; specifically:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (2)
where f_t is the value of the forget gate at time t, σ is the sigmoid function, W_f is the weight of the forget gate f, b_f is the bias of the forget gate f, and h_{t-1} is the output value of the hidden layer at time t-1;
A "memory gate" i_t is used to control the influence of the current data input on the memory-cell state, where i_t denotes the state of the memory gate i at time t; a candidate value vector C̃_t at time t is created using the tanh function and added into the memory-cell state. The calculation steps are:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (3)
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)    (4)
where W_i is the update weight of the memory gate i, b_i is the bias of the memory gate i, W_c is the candidate-value weight, b_c is the candidate-value bias, and C̃_t is the candidate vector at time t;
The candidate vector C̃_t is combined with the memory-cell state C_{t-1} at the previous time to update the memory-cell state at the current time:
C_t = f_t · C_{t-1} + i_t · C̃_t    (5)
the output of each memory cell is provided by an output gate otAnd controlling, wherein the calculation formula is as follows:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (6)
h_t = o_t · tanh(C_t)    (7)
where o_t is the value of the output gate at time t, W_o is the weight used to update the output value, b_o is the output-value bias, and h_t is the output of the hidden layer at time t.
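The gate equations (2)–(7) of claims 3–4 can be collected into one illustrative hidden-layer step. This is a sketch only; the dictionary-based weight/bias naming (`'f'`, `'i'`, `'c'`, `'o'`) is an assumption for readability, not taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One hidden-layer step implementing equations (2)-(7): forget gate f_t,
    memory gate i_t, candidate vector C~_t, new state C_t, output h_t.
    W and b hold the four weight matrices / bias vectors keyed 'f','i','c','o'."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])       # (2) forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])       # (3) memory gate
    C_tilde = np.tanh(W['c'] @ z + b['c'])   # (4) candidate vector
    C_t = f_t * C_prev + i_t * C_tilde       # (5) state update
    o_t = sigmoid(W['o'] @ z + b['o'])       # (6) output gate
    h_t = o_t * np.tanh(C_t)                 # (7) hidden output
    return h_t, C_t
```

Stepping this function over a power sequence, with trained weights, yields the predicted values whose residuals feed the clustering stage.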
5. The method for detecting the power data abnormality according to claim 1, wherein detecting abnormal data in the power sequence data by taking the constructed sample data set as input, based on the improved K-means algorithm, comprises:
calculating the compactness of all data points in the sample data set, acquiring a data dense area, and further determining an initial clustering center;
and calculating the Euclidean distance between every data point in the sample data set and each initial cluster center and dividing the data points into K clusters; if the distance between a data point and the center of its cluster is greater than the average distance, continuing the iteration; and when the number of iterations is greater than or equal to a set value, judging the data point to be anomalous, thereby detecting the abnormal data in the power sequence data.
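The flagging rule in claim 5 (distance to the cluster center greater than the cluster's average distance) can be sketched in isolation; in the patent, a point is only confirmed anomalous after it stays flagged for a set number of iterations:

```python
import numpy as np

def flag_far_points(points, centers, labels):
    """Within each cluster, mark points whose Euclidean distance to the
    cluster center exceeds that cluster's average distance (anomaly
    candidates, per the rule in claim 5)."""
    d = np.linalg.norm(points - centers[labels], axis=1)
    flags = np.zeros(len(points), dtype=bool)
    for j in np.unique(labels):
        m = labels == j
        flags[m] = d[m] > d[m].mean()
    return flags
```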
6. The method according to claim 5, wherein the determining an initial clustering center includes: selecting the data point with the highest compactness in the data-dense area as the first initial cluster center, then selecting the data point farthest from the first initial cluster center in that area as the second initial cluster center; each subsequent initial cluster center is the data point whose minimum distance to the already-selected initial cluster centers is the largest.
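The claim-6 selection can be sketched as a farthest-point loop; the `density` input is assumed to come from the claim-7 compactness measure, and the names are illustrative:

```python
import numpy as np

def initial_centers(points, density, k):
    """Claim-6 sketch: the densest point is the first center; each further
    center (including the second) is the point whose minimum distance to the
    already-chosen centers is largest."""
    centers = [int(np.argmax(density))]
    while len(centers) < k:
        # min distance from every point to the chosen centers, pick the farthest
        d = np.min(np.linalg.norm(points[:, None] - points[centers][None, :], axis=2), axis=1)
        centers.append(int(np.argmax(d)))
    return centers
```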
7. The method for detecting the power data abnormality according to claim 5, wherein the compactness of the data points is obtained by:
where x_i denotes the i-th data point in the sample set, x_j denotes the j-th data point, D(x_i, x_j) denotes the distance between x_i and x_j, and G_t(x_i) is the set of the t nearest-neighbor data points of x_i.
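Claim 7's compactness formula is not reproduced in the text (it appears only as an image in the source), so the sketch below assumes one common choice consistent with the stated symbols: the inverse of the mean Euclidean distance D to the t nearest neighbors G_t(x_i):

```python
import numpy as np

def compactness(points, t=2):
    """Assumed compactness: 1 / mean distance to the t nearest neighbors.
    G_t(x_i) is the t-nearest-neighbor set; D is Euclidean distance.
    The patent's exact formula may differ."""
    D = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    np.fill_diagonal(D, np.inf)            # exclude each point itself
    nearest = np.sort(D, axis=1)[:, :t]    # distances to the t nearest neighbors
    return 1.0 / nearest.mean(axis=1)
```

Under this assumption, densely packed points score high and isolated points score low, which is what the initial-center selection in claim 6 relies on.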
8. An electric power data abnormality detection system characterized by comprising:
the first module is used for inputting the collected power sequence data of the user into a trained LSTM model, extracting the time sequence characteristics of the power sequence data and constructing a sample data set;
and the second module is used for detecting abnormal data in the power sequence data by taking the constructed sample data set as input based on the improved K-means algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110239950.XA CN113010504B (en) | 2021-03-04 | 2021-03-04 | Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113010504A true CN113010504A (en) | 2021-06-22 |
CN113010504B CN113010504B (en) | 2022-06-10 |
Family
ID=76405160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110239950.XA Active CN113010504B (en) | 2021-03-04 | 2021-03-04 | Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113010504B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109302410A (en) * | 2018-11-01 | 2019-02-01 | 桂林电子科技大学 | A kind of internal user anomaly detection method, system and computer storage medium |
CN110334726A (en) * | 2019-04-24 | 2019-10-15 | 华北电力大学 | A kind of identification of the electric load abnormal data based on Density Clustering and LSTM and restorative procedure |
CN110569925A (en) * | 2019-09-18 | 2019-12-13 | 南京领智数据科技有限公司 | LSTM-based time sequence abnormity detection method applied to electric power equipment operation detection |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780440A (en) * | 2021-09-15 | 2021-12-10 | 江苏方天电力技术有限公司 | Low-voltage station area phase identification method for improving data disturbance resistance |
CN115834424A (en) * | 2022-10-09 | 2023-03-21 | 国网甘肃省电力公司临夏供电公司 | Method for identifying and correcting abnormal data of line loss of power distribution network |
CN115834424B (en) * | 2022-10-09 | 2023-11-21 | 国网甘肃省电力公司临夏供电公司 | Identification and correction method for abnormal data of power distribution network line loss |
CN117371996A (en) * | 2023-12-06 | 2024-01-09 | 北京中能亿信软件有限公司 | Electric power communication analysis method based on big data |
CN117371996B (en) * | 2023-12-06 | 2024-03-19 | 北京中能亿信软件有限公司 | Electric power communication analysis method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN113010504B (en) | 2022-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113010504B (en) | Electric power data anomaly detection method and system based on LSTM and improved K-means algorithm | |
CN109993270B (en) | Lithium ion battery residual life prediction method based on gray wolf group optimization LSTM network | |
CN109991542B (en) | Lithium ion battery residual life prediction method based on WDE optimization LSTM network | |
CN107992976B (en) | Hot topic early development trend prediction system and prediction method | |
CN108733976B (en) | Key protein identification method based on fusion biology and topological characteristics | |
CN112800231B (en) | Power data verification method and device, computer equipment and storage medium | |
CN112329350A (en) | Airplane lead-acid storage battery abnormity detection semi-supervision method based on isolation | |
CN112287980B (en) | Power battery screening method based on typical feature vector | |
CN112305441B (en) | Power battery health state assessment method under integrated clustering | |
CN112926635A (en) | Target clustering method based on iterative adaptive neighbor propagation algorithm | |
Savargaonkar et al. | A cycle-based recurrent neural network for state-of-charge estimation of li-ion battery cells | |
CN113534938B (en) | Method for estimating residual electric quantity of notebook computer based on improved Elman neural network | |
CN117117859B (en) | Photovoltaic power generation power prediction method and system based on neural network | |
CN116842459B (en) | Electric energy metering fault diagnosis method and diagnosis terminal based on small sample learning | |
CN113657678A (en) | Power grid power data prediction method based on information freshness | |
CN116774086B (en) | Lithium battery health state estimation method based on multi-sensor data fusion | |
CN113376541A (en) | Lithium ion battery health state prediction method based on CRJ network | |
CN113033898A (en) | Electrical load prediction method and system based on K-means clustering and BI-LSTM neural network | |
CN117478390A (en) | Network intrusion detection method based on improved density peak clustering algorithm | |
CN116449218B (en) | Lithium battery health state estimation method | |
CN115794805A (en) | Medium-low voltage distribution network measurement data supplementing method | |
CN115982608A (en) | Line loss abnormity judgment method based on line loss dynamic analysis | |
CN115799580A (en) | OS-ELM fuel cell fault diagnosis method based on optimized FCM training | |
CN113884807B (en) | Power distribution network fault prediction method based on random forest and multi-layer architecture clustering | |
CN114896865A (en) | Digital twin-oriented self-adaptive evolutionary neural network health state online prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||