CN113139600A

CN113139600A - Smart grid device anomaly detection method and system based on federated learning

Info

Publication number: CN113139600A
Application number: CN202110444328.2A
Authority: CN
Inventors: 林培斌; 戚远航; 刘毅
Original assignee: Guangdong Anheng Power Technology Co ltd
Current assignee: Guangdong Anheng Power Technology Co ltd
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2021-07-20

Abstract

The invention provides a method and system for detecting anomalies of smart grid devices based on federated learning. The method comprises the following steps: each smart grid device establishes its own local data set; each smart grid device performs local model training on its local data set; after local model training, each smart grid device calculates its model update and uploads it to a server; the server aggregates the received model updates into a new global model and sends the new global model to each smart grid device; and these steps are repeated until the global model converges to its optimum, whereupon it is taken as the optimal global model and each smart grid device performs the anomaly detection task with it. The method according to the embodiments of the invention can identify the anomalous devices among a plurality of smart grid devices.

Description

Smart grid device anomaly detection method and system based on federated learning

Technical Field

The invention belongs to the technical field of smart grid devices, and in particular relates to a method and system for detecting anomalies of smart grid devices based on federated learning.

Background

Smart grid devices are increasingly deployed in daily life. However, these devices are vulnerable to attack because of insecure design, implementation and configuration. As a result, many smart grid devices are subject to attacks that cause them to malfunction or stop working altogether, and a new class of malware has emerged that specifically targets smart grid devices. Given the large number of device types and manufacturers involved, existing intrusion detection techniques cannot effectively detect compromised smart grid devices.

Disclosure of Invention

An object of the present application is to provide a new technical solution, namely a method and system for detecting anomalies of smart grid devices based on federated learning, which can effectively detect compromised smart grid devices.

The invention provides a method for detecting anomalies of smart grid devices based on federated learning, comprising the following steps: each smart grid device establishes its own local data set; each smart grid device performs local model training on its local data set; after local model training, each smart grid device calculates its model update and uploads it to a server; the server aggregates the received model updates into a new global model and sends the new global model to each smart grid device; and these steps are repeated until the global model converges to its optimum, whereupon it is taken as the optimal global model and each smart grid device performs the anomaly detection task with it.

With the federated-learning-based smart grid device anomaly detection method according to the embodiments of the invention, compromised smart grid devices can be detected effectively through the cooperation of these steps.

Optionally, each of the smart grid devices collects its own sensing time series data as the local data set.

Optionally, an aggregation operation is performed on the received model update by the cloud aggregator of the server, so as to obtain a new global model.

Optionally, the server divides the smart grid devices into a plurality of clusters according to a preset clustering method based on each device's model update, and the cloud aggregator obtains the new global model by aggregating the model updates uploaded by the smart grid devices in each cluster.

Optionally, the preset clustering method includes: using a preset algorithm to find the largest number of devices whose similarity to a given device i exceeds a threshold α, and grouping these devices into one cluster.

Optionally, the preset algorithm is a greedy algorithm.

Optionally, the preset clustering method further includes: computing the cosine similarity between device i and each other device, and grouping the devices whose similarity exceeds the threshold into one cluster.

Optionally, the local model is a deep anomaly detection model.

Optionally, the deep anomaly detection model includes: an input layer for inputting data; an attention-based convolutional neural network unit that captures fine-grained features of the data from the input layer; a long short-term memory network unit that takes the output of the attention-based convolutional neural network unit as its input, predicts future time-series data, and detects anomalies; and an output layer connected to the long short-term memory network unit to output the anomaly detection result.

In a second aspect of the present invention, a system for detecting anomalies of smart grid devices based on federated learning is provided, comprising: a local data set establishing module, which establishes a local data set for each smart grid device; a local model training module, which performs local model training on the local data set; an updating module, which, after each smart grid device completes local model training, calculates that device's model update and uploads it to a server; and an aggregation module, which aggregates the received model updates into a new global model and sends the new global model to each smart grid device.

Further features of the present application and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which is to be read in connection with the accompanying drawings.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a smart grid device anomaly detection method based on federated learning according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a federated learning framework in accordance with an embodiment of the present invention;

FIG. 3 is a diagram of an anomaly detection framework for smart grid devices based on federated learning, according to an embodiment of the present invention;

FIG. 4 is an overview of the AMCNN-LSTM model according to an embodiment of the invention;

FIG. 5 is a graph comparing the anomaly detection accuracy of the model with that of the baselines according to an embodiment of the present invention;

FIG. 6 is a graph comparing the prediction errors of the model and the competing methods according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The following describes a smart grid device anomaly detection method based on federated learning according to an embodiment of the invention with reference to the accompanying drawings.

The invention provides a method for detecting anomalies of smart grid devices based on federated learning, which is described in detail below.

Federated learning is a distributed machine learning paradigm that provides a degree of privacy protection for the nodes participating in distributed learning.

Specifically, as shown in FIG. 2, federated learning (FL) is a distributed collaborative learning paradigm that allows edge nodes (e.g., drones, sensors, vehicles) to keep their data local while collaboratively training a global deep learning model, achieving both model learning and privacy protection. The framework iteratively trains the global model with a distributed stochastic gradient descent algorithm; in deep learning, the gradient is the first derivative of the loss with respect to the model weights. In each round t (t ∈ {1, 2, …, T}), the federated learning process can be described as follows:

Step 1: Initialization

All nodes participating in the current training round send information to the cloud server to register for federated learning, and the cloud server excludes nodes with network failures or poor network connections. The cloud server then randomly selects a subset of the registered nodes to participate in the current round and sends the pre-trained (or initialized) global model w_t to the selected nodes.

Step 2: Local training

Each node k receives the global model w_t and uses it to initialize its own local model w_t^k, where k is the node index:

$$w_t^k \leftarrow w_t$$

The node then trains on its own local data set D_k, which contains |D_k| training samples, i.e., input-output pairs (x_i, y_i). The loss function optimized in local training is defined as:

$$F_k(\omega) = \frac{1}{|D_k|} \sum_{i \in D_k} f_i(\omega),$$

where ω denotes the model parameters and f_i(ω) is the loss on sample (x_i, y_i). The local model is updated by gradient descent until F_k(ω) converges:

$$w_{t+1}^k = w_t^k - \eta \nabla F_k(w_t^k),$$

where η is the learning rate of the model and ∇F_k(w_t^k) is the derivative of the loss with respect to the weights, i.e., the gradient.
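Step 2 can be illustrated with a minimal plain-Python sketch. It assumes a linear model and the squared loss f_i(ω) = ½(ω·x_i − y_i)² as the per-sample loss; the patent's own example loss is not recoverable, so this choice and the helper names `local_loss`, `local_gradient` and `local_train` are hypothetical. The node copies the global model into its local model and then applies ω ← ω − η∇F_k(ω) until the decrease in F_k falls below a tolerance.

```python
from typing import List, Sequence, Tuple

# One training sample (x_i, y_i): an input vector and a scalar target.
Sample = Tuple[Sequence[float], float]


def local_loss(w: Sequence[float], data: Sequence[Sample]) -> float:
    """F_k(w) = (1/|D_k|) * sum_i f_i(w), with the assumed f_i(w) = 0.5 * (w . x_i - y_i)^2."""
    total = 0.0
    for x, y in data:
        pred = sum(wj * xj for wj, xj in zip(w, x))
        total += 0.5 * (pred - y) ** 2
    return total / len(data)


def local_gradient(w: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Gradient of F_k with respect to w, averaged over the |D_k| samples."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [g / len(data) for g in grad]


def local_train(w_global: Sequence[float], data: Sequence[Sample],
                eta: float = 0.1, max_steps: int = 1000, tol: float = 1e-10) -> List[float]:
    """Initialize the local model from w_t, then run gradient descent until F_k stops improving."""
    w = list(w_global)  # w_t^k <- w_t
    prev = local_loss(w, data)
    for _ in range(max_steps):
        g = local_gradient(w, data)
        w = [wj - eta * gj for wj, gj in zip(w, g)]  # w <- w - eta * grad F_k(w)
        cur = local_loss(w, data)
        if prev - cur < tol:  # simple convergence test (an assumption of this sketch)
            break
        prev = cur
    return w


# Usage: samples from y = 1 + 2x, with a constant 1.0 feature acting as the bias term.
data = [([1.0, float(t)], 1.0 + 2.0 * t) for t in range(5)]
w_k = local_train([0.0, 0.0], data)
print([round(v, 3) for v in w_k])
```

In a real deployment the model would be the deep anomaly detection network and the gradient would come from automatic differentiation; the linear model only keeps the update rule visible.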

Step 3: Model update aggregation

Model aggregation: in federated learning, model aggregation is the operation in which the cloud server computes a weighted average of the model updates uploaded by the nodes.

After local training, each node uploads its local model update to the cloud server, which aggregates the received updates into a new global model w_{t+1}. In the standard FedAvg form, this is the average of the local models weighted by local data-set size:

$$w_{t+1} = \sum_{k=1}^{K} \frac{|D_k|}{\sum_{j=1}^{K} |D_j|} \, w_{t+1}^k$$

The three steps are repeated until the global model converges. Notably, the local data sets of the nodes remain local throughout the process and are never shared with or revealed to the cloud server.
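The server-side aggregation of Step 3 can be sketched as follows, assuming the standard FedAvg rule in which each local model is weighted by its share of the total training data. The source says only that the server averages the uploaded updates, so the weighting is an assumption, and `fedavg` is a hypothetical helper operating on flattened parameter vectors.

```python
from typing import List, Sequence


def fedavg(local_models: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[float]:
    """w_{t+1} = sum_k (|D_k| / sum_j |D_j|) * w^k, over flattened parameter vectors."""
    if not local_models or len(local_models) != len(sizes):
        raise ValueError("need exactly one data-set size per local model")
    total = float(sum(sizes))
    w_new = [0.0] * len(local_models[0])
    for w_k, n_k in zip(local_models, sizes):
        share = n_k / total  # node k's fraction of all training samples
        for j, value in enumerate(w_k):
            w_new[j] += share * value
    return w_new


# Usage: two nodes; the second holds three times as much data, so it dominates the average.
print(fedavg([[1.0, 0.0], [3.0, 4.0]], sizes=[100, 300]))  # -> [2.5, 3.0]
```

Only parameter vectors and sample counts reach the server here, which mirrors the point above that the local data sets themselves never leave the nodes.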

In federated learning, a model update is the parameter update that a node generates by training its local model on its own local data.

In smart grid device anomaly detection, assume there are K devices to be monitored and an honest server S. Each device C_k generates a model update Δw^k, and different devices may generate different model updates. Similar data distributions yield similar models. Carried over to the distributed setting, this means that devices with similar data distributions produce similar model updates, and a model aggregated from such devices is of high quality because their updates do not suffer from data heterogeneity.

From the above, the data heterogeneity problem can be converted into a problem of model-update homogeneity, which is defined as follows:

Definition 1 (Homogeneity of model updates): Suppose there are K devices and one server S, and let Δw^k denote the model update generated by the k-th device. If the model updates of any two devices i and j satisfy

$$A(\Delta w^i, \Delta w^j) \ge \alpha,$$

the model updates of these two devices are called homogeneous model updates, where A(·,·) is a homogeneity judgment function and α is a hyperparameter, namely the threshold for the homogeneity judgment.

This definition requires a homogeneity judgment function. Building on the cosine similarity of spatial vectors, the invention proposes a cosine-similarity-based homogeneity judgment for model updates, formally defined as follows:

Definition 2 (Cosine similarity): Given two attribute vectors A and B, their cosine similarity cos θ is computed from their dot product and vector lengths:

$$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$$

The cosine-similarity-based homogeneity judgment for model updates is then defined as:

$$\cos(\Delta w^i, \Delta w^j) \ge \alpha,$$

where cos(·,·) is the cosine similarity function. Cosine similarity reflects how similar the model updates of two devices are, and therefore indicates whether the data distributions of the two devices are similar.
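The cosine-based homogeneity test of Definitions 1 and 2 can be sketched in a few lines of plain Python. The function names are hypothetical, model updates are assumed to be flattened into plain vectors, and treating an all-zero update as dissimilar is a convention chosen for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||) for two flattened model updates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: an all-zero update is similar to nothing
    return dot / (norm_a * norm_b)


def is_homogeneous(dw_i: Sequence[float], dw_j: Sequence[float], alpha: float) -> bool:
    """Definition 1 with A(., .) = cosine similarity: homogeneous iff cos(dw_i, dw_j) >= alpha."""
    return cosine_similarity(dw_i, dw_j) >= alpha


# Usage: parallel updates are homogeneous; orthogonal ones are not.
print(is_homogeneous([0.2, -0.1, 0.4], [0.4, -0.2, 0.8], alpha=0.9))  # -> True
print(is_homogeneous([1.0, 0.0], [0.0, 1.0], alpha=0.9))  # -> False
```

Cosine similarity ignores the magnitude of the updates and compares only their direction, so two devices that push the model the same way are grouped even if one trained on more data.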

Definitions 1 and 2 make it easy to find any pair of devices with similar data distributions, but finding the largest set of such devices remains difficult. To solve this, the invention designs a clustered federated learning framework based on model-update homogeneity. Specifically, a greedy algorithm finds the largest number of devices whose similarity exceeds the threshold α, which turns clustering into a similarity-maximization problem. These devices are then grouped into one cluster, within which an optimal global model is aggregated. Formally:

$$\max_{S_i} \sum_{j \in S_i} \cos(\Delta w^i, \Delta w^j) \quad \text{s.t.} \quad \cos(\Delta w^i, \Delta w^j) \ge \alpha \;\; \forall j \in S_i,$$

where S_i denotes the set of devices clustered with device i.

As the above equation shows, with device i fixed, the cosine similarity between device i and every other device is computed, the devices whose similarity exceeds the threshold are placed in one cluster, and the optimization ends when the sum of the cosine similarities is maximized. The overall model update process can then be expressed as follows:

$$w_{t+1} = w_t + \sum_{c=1}^{n} q_c \, \Delta w_c,$$

where w_t is the global model of the previous round, Δw_c is the aggregated model update of cluster c among the n clusters, and q_c is the weight of cluster c; together they form the new global model.
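The clustering and the cluster-weighted update can be sketched as below. This is one possible greedy procedure under stated assumptions, not the patent's exact algorithm: at each step it picks the unassigned device i whose set of unassigned neighbours with cosine similarity ≥ α is largest (ties broken by the larger similarity sum), forms a cluster from i and those neighbours, and repeats. For the update it assumes the additive form w_{t+1} = w_t + Σ_c q_c Δw_c, with Δw_c taken as the element-wise mean of the member updates and q_c defaulting to 1/n. The helper names and both weighting choices are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def greedy_clusters(updates: Sequence[Sequence[float]], alpha: float) -> List[List[int]]:
    """Greedily form clusters of device indices whose updates are similar to a seed device i."""
    remaining = set(range(len(updates)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        best_score = -math.inf
        for i in sorted(remaining):
            sims = {j: cos_sim(updates[i], updates[j]) for j in remaining if j != i}
            members = [i] + [j for j, s in sims.items() if s >= alpha]
            score = sum(sims[j] for j in members[1:])
            # The largest cluster wins; the similarity sum breaks ties.
            if (len(members), score) > (len(best), best_score):
                best, best_score = members, score
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


def cluster_update(w_t: Sequence[float], updates: Sequence[Sequence[float]],
                   clusters: Sequence[Sequence[int]],
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    """w_{t+1} = w_t + sum_c q_c * dw_c; q_c defaults to 1/n for n clusters (assumption)."""
    if weights is None:
        weights = [1.0 / len(clusters)] * len(clusters)
    w_next = list(w_t)
    for members, q_c in zip(clusters, weights):
        for j in range(len(w_t)):
            dw_c_j = sum(updates[k][j] for k in members) / len(members)  # mean update of cluster c
            w_next[j] += q_c * dw_c_j
    return w_next


# Usage: devices 0, 1, 4 push in one direction and devices 2, 3 in another.
updates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
groups = greedy_clusters(updates, alpha=0.9)
print(groups)  # -> [[0, 1, 4], [2, 3]]
print(cluster_update([0.0, 0.0], updates, groups))
```

Equal cluster weights are used by default because size-proportional weights combined with per-cluster means would collapse the update back to a plain average over all devices, hiding the effect of clustering.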

Based on the above description of federated learning, and as shown in FIG. 1 and FIG. 3, a method according to an embodiment of the present invention includes the following steps:

S1: Each smart grid device establishes its own local data set.

According to one embodiment of the invention, each smart grid device collects its own sensing time series data as a local data set.

S2: Each smart grid device performs local model training on its local data set.

S3: After local model training, each smart grid device calculates its model update and uploads it to a server.

S4: The server aggregates the received model updates into a new global model and sends the new global model to each smart grid device.

Optionally, the server includes a cloud aggregator, and the cloud aggregator performs an aggregation operation on the received model updates to obtain a new global model.

In some embodiments of the present invention, the server divides the smart grid devices into a plurality of clusters according to a preset clustering method based on each device's model update, and the cloud aggregator obtains a new global model by aggregating the model updates uploaded by the smart grid devices in each cluster.

According to one embodiment of the invention, the preset clustering method comprises: using a preset algorithm to find the largest number of devices whose similarity to a given device i exceeds a threshold, and grouping these devices into one cluster.

Optionally, the preset algorithm is a greedy algorithm.

Optionally, the preset clustering method further includes: computing the cosine similarity between device i and each other device, and grouping the devices whose similarity exceeds the threshold into one cluster.

S5: The above steps are repeated until the global model converges to its optimum; this global model is taken as the optimal global model, and each smart grid device performs the anomaly detection task with it. Anomaly detection is defined as follows: in data mining, anomaly detection identifies items, events, or observations that do not conform to an expected pattern or to the other items in a data set.

In some embodiments of the invention, the local model is a deep anomaly detection model.

The process according to the invention is described below with reference to specific examples.

As shown in fig. 3, the method according to the invention comprises five steps, as follows:

S1: Each smart grid device collects its own sensed time-series data as its local data set.

S2: Each device trains its local model, i.e., the deep anomaly detection model (AMCNN-LSTM), on its local data set.

S3: Each device calculates its model update and uploads it to the server, which divides the devices' model updates into a plurality of clusters according to the clustering scheme.

S4: The cloud aggregator obtains a new global model by aggregating the model updates uploaded by the devices in each cluster, and sends the new global model to each device.

S5: The above steps are repeated until the global model converges to its optimum. Each smart grid device can then perform the anomaly detection task with the optimal global model.

To address the problem described in the Background, the invention introduces a federated learning (FL) framework, a distributed machine learning system that can be used to detect compromised smart grid devices. In particular, the FL framework can be built efficiently on device-type-specific communication profiles without manual intervention or labeled data, and through these profiles it can detect anomalous deviations in device communication behavior (which may be caused by malicious attackers). However, the anomaly detection accuracy of current mainstream FL frameworks is degraded by data heterogeneity arising from differences among the smart grid devices of each region (such as device types and operating modes). The invention therefore designs a clustered federated learning framework based on model-update homogeneity to address this key problem, and designs a deep neural network model to detect device anomalies.

The deep anomaly detection model (AMCNN-LSTM) according to an embodiment of the present invention is described in detail below.

The deep anomaly detection model according to the embodiment of the invention comprises an input layer, an attention-based convolutional neural network unit, a long short-term memory network unit, and an output layer.

Specifically, the input layer receives the input data; the attention-based convolutional neural network (CNN) unit captures fine-grained features of the data from the input layer; the output of the attention-based CNN unit is fed to the long short-term memory (LSTM) network unit, which predicts future time-series data and detects anomalies; and the output layer, connected to the LSTM unit, outputs the anomaly detection result.

The deep anomaly detection model (AMCNN-LSTM) is built mainly on an attention-based CNN-LSTM model, which uses a CNN to capture fine-grained features of the sensed time-series data and an LSTM module to detect anomalies accurately and in a timely manner.

On this basis, the invention designs the unsupervised AMCNN-LSTM model, which comprises an input layer, an attention-based CNN unit, an LSTM unit, and an output layer, as shown in FIG. 4. First, the preprocessed data are fed to the input layer. Second, the CNN captures fine-grained features of the input, and the attention mechanism focuses on the most important of these features. Third, the output of the attention-based CNN unit is fed to the LSTM unit, which predicts future time-series data. Finally, anomalies are detected by computing an anomaly detection score.

Wherein, before inputting data to the input layer, the data can be preprocessed, for example, sensing time series data collected by the device is normalized to [0,1] to accelerate model convergence.
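The normalization step can be sketched as min-max scaling. The source states only that the data are normalized to [0,1]; fitting the bounds on training data, reusing them for test data, and clipping out-of-range test values are assumptions of this sketch, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Record the minimum and maximum of the training series."""
    return min(train), max(train)


def min_max_scale(series: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map x to (x - lo) / (hi - lo); values outside the training range are clipped to [0, 1]."""
    if hi == lo:
        return [0.0 for _ in series]  # constant training series (assumed convention)
    return [min(1.0, max(0.0, (x - lo) / (hi - lo))) for x in series]


# Usage: fit on the training part only, then reuse the same bounds for later data.
lo, hi = fit_min_max([2.0, 4.0, 6.0])
print(min_max_scale([2.0, 4.0, 6.0, 7.0], lo, hi))  # -> [0.0, 0.5, 1.0, 1.0]
```

Reusing the training bounds keeps test data on the same scale the model was trained on and avoids leaking test statistics into preprocessing.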

For the attention-based CNN unit: first, an attention mechanism is introduced into the CNN unit to increase the focus on important features. In cognitive science, because of bottlenecks in information processing, humans selectively focus on important parts of information and ignore other visible information. Attention mechanisms have accordingly been proposed for various tasks, such as computer vision and natural language processing, and they can improve model performance by focusing on important features. The attention mechanism is formally defined as follows:

$$e_i = a(u, v_i),$$

$$\beta_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)},$$

$$c = \sum_i \beta_i v_i,$$

where u is a task-specific matching feature vector used to interact with the context, v_i is the feature vector at timestamp i of the time series, e_i is the unnormalized attention score, β_i is the normalized attention score, and c is the context feature of the current timestamp, computed from the attention scores and the feature sequence v. In most cases, e_i = u^T W v_i, where W is a weight matrix.
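The attention computation can be sketched in plain Python, assuming the multiplicative score e_i = u^T W v_i and softmax normalization of the scores into β_i. The function name and the example vectors are hypothetical; W is stored as a list of rows with one row per component of u.

```python
import math
from typing import List, Sequence, Tuple


def attention(u: Sequence[float], W: Sequence[Sequence[float]],
              vs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (beta, c) for scores e_i = u^T W v_i, softmax weights beta_i, and c = sum_i beta_i v_i."""
    scores = []
    for v in vs:
        Wv = [sum(w_rc * v_c for w_rc, v_c in zip(row, v)) for row in W]  # W v_i
        scores.append(sum(u_r * x for u_r, x in zip(u, Wv)))  # e_i = u^T (W v_i)
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(e - m) for e in scores]
    z = sum(exps)
    beta = [x / z for x in exps]  # normalized attention scores
    c = [sum(b * v[d] for b, v in zip(beta, vs)) for d in range(len(vs[0]))]  # context vector
    return beta, c


# Usage: with W = I, the timestamp whose feature vector aligns with u gets the larger weight.
beta, c = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print([round(b, 4) for b in beta])  # -> [0.7311, 0.2689]
```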

Second, the CNN unit extracts fine-grained features of the time-series data. The CNN module is a stack of one-dimensional (1-D) CNN layers, each consisting of a convolution layer, a batch normalization layer, and a nonlinear layer. The module downsamples with pooling layers and, by stacking convolution layers, builds a hierarchy that progressively extracts more abstract features. It outputs m feature sequences of length n, i.e., a feature map of size n × m. To further extract important time-series features, the application adds a parallel feature extraction branch that combines the attention mechanism with the CNN. The attention module consists of feature aggregation and scale recovery.
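One layer of the 1-D CNN branch described above can be sketched as convolution, nonlinearity, then pooling. This is a plain-Python illustration under stated assumptions: ReLU is assumed as the nonlinearity, pooling is non-overlapping max pooling of width 2, batch normalization is omitted, and all helper names are hypothetical. Feature maps are indexed [channel][time step], i.e., the transpose of the n × m layout above.

```python
from typing import List, Sequence

Signal = List[List[float]]  # [channels][time steps]


def conv1d(x: Sequence[Sequence[float]], kernels: Sequence[Sequence[Sequence[float]]],
           bias: Sequence[float]) -> Signal:
    """'Valid' 1-D convolution with stride 1 (cross-correlation, as deep-learning libraries compute it).
    x: [C_in][L]; kernels: [C_out][C_in][K]; returns [C_out][L - K + 1]."""
    length, k = len(x[0]), len(kernels[0][0])
    out: Signal = []
    for kern, b in zip(kernels, bias):
        row = []
        for t in range(length - k + 1):
            s = b
            for ch, taps in enumerate(kern):
                s += sum(w * x[ch][t + j] for j, w in enumerate(taps))
            row.append(s)
        out.append(row)
    return out


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool1d(x: Signal, size: int = 2) -> Signal:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(row[t:t + size]) for t in range(0, len(row) - size + 1, size)] for row in x]


def conv_block(x, kernels, bias) -> Signal:
    """Convolution -> nonlinearity -> pooling (batch normalization omitted in this sketch)."""
    return max_pool1d(relu(conv1d(x, kernels, bias)))


# Usage: one input channel, two filters (a moving sum and a difference filter).
x = [[1.0, 2.0, 3.0, 4.0, 5.0]]
print(conv_block(x, kernels=[[[1.0, 1.0]], [[1.0, -1.0]]], bias=[0.0, 0.0]))  # -> [[5.0, 9.0], [0.0, 0.0]]
```

Stacking several such blocks halves the time resolution at each pooling step, which is how the hierarchy of progressively more abstract features arises.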

The feature aggregation part extracts key features from the sequence using a stack of convolution and pooling layers, mines linear relationships with 1×1 convolution kernels, and then constrains the values to [0,1] with a Sigmoid function.
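The last two operations of feature aggregation, a 1×1 convolution followed by a Sigmoid, can be sketched as below. The preceding convolution-and-pooling stack and the scale-recovery step are not shown, and the helper names are hypothetical. A 1×1 kernel mixes the input channels linearly at each time step without looking at neighbouring steps.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, constraining values to [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv1x1(x: Sequence[Sequence[float]], weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[List[float]]:
    """1x1 convolution: each output channel is a linear mix of input channels at the same time step.
    x: [C_in][L]; weights: [C_out][C_in]; returns [C_out][L]."""
    length = len(x[0])
    return [[b + sum(w * x[ch][t] for ch, w in enumerate(row)) for t in range(length)]
            for row, b in zip(weights, bias)]


def attention_mask(features, weights, bias) -> List[List[float]]:
    """Aggregate channels with a 1x1 convolution, then squash the result to [0, 1]."""
    return [[sigmoid(v) for v in row] for row in conv1x1(features, weights, bias)]


# Usage: two input channels over three time steps, one output mask channel.
features = [[0.0, 2.0, -2.0], [1.0, 1.0, 1.0]]
mask = attention_mask(features, weights=[[1.0, 0.0]], bias=[0.0])
print([round(m, 3) for m in mask[0]])  # -> [0.5, 0.881, 0.119]
```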

Third, the output features of the CNN module are multiplied element-wise by the output of the corresponding attention module. For an input sequence X_i, the output of the CNN module is denoted W_CNN and the output of the corresponding attention module is denoted W_attention. The two outputs are multiplied element by element:

$$W(i, c) = W_{\mathrm{CNN}}(i, c) \odot W_{\mathrm{attention}}(i, c),$$

where ⊙ denotes element-wise multiplication, i is the position in the time series within the feature layer, and c is the channel. The resulting feature layer W(i, c) is used as the input to the LSTM block.
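The element-wise gating W(i, c) = W_CNN(i, c) ⊙ W_attention(i, c) reduces to a shape check and a product. The sketch below is a hypothetical plain-Python helper that indexes both maps as [channel c][position i].

```python
from typing import List, Sequence


def gate_features(w_cnn: Sequence[Sequence[float]],
                  w_attention: Sequence[Sequence[float]]) -> List[List[float]]:
    """W(i, c) = W_CNN(i, c) * W_attention(i, c), element by element.
    Both inputs are indexed as [channel c][position i] and must have the same shape."""
    if len(w_cnn) != len(w_attention) or any(
            len(a) != len(b) for a, b in zip(w_cnn, w_attention)):
        raise ValueError("W_CNN and W_attention must have identical shapes")
    return [[a * b for a, b in zip(row_cnn, row_att)]
            for row_cnn, row_att in zip(w_cnn, w_attention)]


# Usage: mask values near 0 suppress a feature; values near 1 keep it.
print(gate_features([[2.0, 4.0], [1.0, 3.0]], [[0.5, 0.0], [1.0, 0.25]]))  # -> [[1.0, 0.0], [1.0, 0.75]]
```

Because the attention mask is bounded to [0, 1] by the Sigmoid, the product can only keep or attenuate a CNN feature, never amplify it.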

By introducing an attention mechanism to enlarge the receptive field of the input, the model obtains more comprehensive context information and can learn the important features of the current local sequence. In addition, the attention module suppresses the interference of unimportant features, which addresses the problem that the model cannot otherwise distinguish the importance of different time-series features.

For the LSTM architecture, the application uses LSTM, a variant of the recurrent neural network, to predict sensed time-series data accurately and thereby detect anomalies. LSTM uses carefully designed "gate" structures to remove information from or add information to the cell state; a gate is a mechanism for selectively letting information through.

$$f_t = \sigma_l(W_f \cdot [h_{t-1}, x_t] + b_f),$$

$$i_t = \sigma_l(W_i \cdot [h_{t-1}, x_t] + b_i),$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C),$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t,$$

$$o_t = \sigma_l(W_o \cdot [h_{t-1}, x_t] + b_o),$$

$$h_t = o_t * \tanh(C_t),$$

where W_f, W_i, W_C, W_o and b_f, b_i, b_C, b_o are the weight matrices and bias vectors applied to the input x_t at time step t; σ_l is the gate activation function (typically the logistic sigmoid); * denotes element-wise multiplication; C_t is the cell state; h_{t-1} is the hidden-layer state at time step t−1; and h_t is the hidden-layer state at time step t.
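A single LSTM time step following the gate equations can be sketched in plain Python. It assumes the logistic sigmoid for σ_l and stores each weight matrix as H rows of length H + D acting on the concatenation [h_{t-1}, x_t]; the function names and the toy weights in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _affine(W: Sequence[Sequence[float]], b: Sequence[float], v: Sequence[float]) -> List[float]:
    """W . v + b for a weight matrix stored as a list of rows."""
    return [bi + sum(w * x for w, x in zip(row, v)) for row, bi in zip(W, b)]


def lstm_step(x_t: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              p: Dict[str, Sequence]) -> Tuple[List[float], List[float]]:
    """One LSTM step. p holds W_f, W_i, W_C, W_o (each H x (H + D)) and b_f, b_i, b_C, b_o (length H)."""
    z = list(h_prev) + list(x_t)  # [h_{t-1}, x_t]
    f = [_sigmoid(v) for v in _affine(p["W_f"], p["b_f"], z)]  # forget gate f_t
    i = [_sigmoid(v) for v in _affine(p["W_i"], p["b_i"], z)]  # input gate i_t
    c_tilde = [math.tanh(v) for v in _affine(p["W_C"], p["b_C"], z)]  # candidate state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]  # C_t
    o = [_sigmoid(v) for v in _affine(p["W_o"], p["b_o"], z)]  # output gate o_t
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]  # h_t = o_t * tanh(C_t)
    return h, c


# Usage: hidden size H = 1, input size D = 1, illustrative weights of 0.5 and zero biases.
H, D = 1, 1
params = {name: [[0.5] * (H + D) for _ in range(H)] for name in ("W_f", "W_i", "W_C", "W_o")}
params.update({name: [0.0] * H for name in ("b_f", "b_i", "b_C", "b_o")})
h, c = [0.0] * H, [0.0] * H
for x in [0.1, 0.4, 0.9]:
    h, c = lstm_step([x], h, c, params)
print(h, c)
```

In the AMCNN-LSTM model the input x_t at each step would be a column of the gated feature map W(i, c), and the final hidden state would feed the prediction of the next values.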

The system according to an embodiment of the invention was evaluated experimentally as follows.

The system proposed by the embodiment of the invention was applied to four real-world data sets for performance verification: power demand, space shuttle, electrocardiogram (ECG), and engine. These are time-series data sets collected by different types of sensor devices in different domains; for example, the power demand data set consists of power consumption data recorded by a smart grid meter. Each data set contains normal and abnormal subsequences. In Table 1, X, Xn and Xa denote the original, normal and abnormal subsequences, respectively. For the power demand data set, an abnormal subsequence indicates that the meter has failed or stopped working.

These data sets are used to train an FL model that detects anomalies. Each data set is split into a training set and a test set at a ratio of 7:3. The proposed framework is implemented with PyTorch and PySyft. The experiments were run on a virtual workstation with the Ubuntu 18.04 operating system, an Intel(R) Core(TM) i5-4210M CPU, 16 GB RAM, and a 512 GB SSD.

TABLE 1 data set

In this experiment, the number of smart grid devices is N = 10, the learning rate is 0.001, the number of training epochs is E = 1000, and the mini-batch size is B = 128.

The performance of the AMCNN-LSTM model is measured by the root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - y_i^{p}\right)^2},$$

where y_i is the true value, y_i^p is the predicted value, and n is the number of samples.
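The RMSE metric can be computed directly from its definition; `rmse` is a hypothetical helper name.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true values y_i and predictions y_i^p."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Usage: a single large miss dominates the score because errors are squared.
print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0]))  # -> 2.0
```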

The experimental results of the above experiments are explained below.

The proposed model is compared with CNN-LSTM, LSTM, gated recurrent unit (GRU), stacked autoencoder (SAE), and support vector machine (SVM) models. Among these, AMCNN-LSTM is an FL-based model, while the others are centralized machine learning methods; all are popular models suitable for conventional anomaly detection applications. The models are evaluated on four real-world data sets: power demand, space shuttle, ECG, and engine.

First, the anomaly detection accuracy of the proposed model is compared with that of the baselines. As FIG. 5 shows, the proposed model achieves the highest accuracy on all four data sets. For example, on the power demand data set the AMCNN-LSTM model reaches an accuracy of 96.85%, which is 7.87% higher than that of the SVM model. These results indicate that AMCNN-LSTM is robust across different data sets.

Second, the prediction errors of the proposed model and the competing methods are compared. As FIG. 6 shows, the proposed model achieves the best performance on all four real-world data sets; for the ECG data set, its RMSE is 63.9% lower than that of the SVM model.

The invention also provides a federated-learning-based system for detecting anomalies of smart grid devices, comprising a local data set establishing module, a local model training module, an updating module, and an aggregation module. The local data set establishing module establishes a local data set for each smart grid device; the local model training module performs local model training on the local data set; the updating module, after each smart grid device completes local model training, calculates that device's model update and uploads it to a server; and the aggregation module aggregates the received model updates into a new global model and sends the new global model to each smart grid device.

In some embodiments of the invention, the number of smart grid devices is two or more.

The federated-learning-based smart grid device anomaly detection method and system of the invention have the following main advantages:

(1) Privacy-preserving anomaly detection for smart grid devices across regions is achieved by introducing a federated learning framework. First, the FL framework keeps each smart grid device's data local and performs device anomaly detection using only communication profiles, which protects the privacy of the electricity data collected by the devices. Second, because the FL framework is essentially a distributed system, joint device anomaly detection across regions becomes possible.

(2) To address the data heterogeneity caused by differences among devices, the invention designs a clustered federated learning framework based on model-update homogeneity. It focuses on the model updates generated by each device, and uses the cosine similarity between model updates together with a greedy algorithm to find the largest group of devices with the most similar updates and form them into a cluster.

(3) To improve the accuracy of device anomaly detection, the invention designs a deep learning model dedicated to identifying anomalies in time-series data. The model uses a one-dimensional (1D) convolutional neural network (CNN) to capture spatial features and a long short-term memory (LSTM) network to capture temporal features, so that device anomalies can be detected accurately.
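The data flow of such a model (claim 9 additionally places an attention mechanism on the CNN unit) can be illustrated with an untrained, pure-Python forward pass. The sketch assumes a univariate sensor series, a softmax channel-attention weighting, a single-layer LSTM whose last hidden state feeds a linear head that predicts the next value, and an anomaly rule that flags a point when the prediction error exceeds a threshold. The weights are random placeholders, training is omitted, and all names and sizes are hypothetical; a practical implementation would more likely use a deep learning framework.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1d_relu(series: Vector, kernels: Matrix) -> Matrix:
    """'Valid' 1D convolution of a univariate series with each kernel, then ReLU.
    Returns one feature vector (one value per kernel) for each time position."""
    k = len(kernels[0])
    return [[max(0.0, sum(w[j] * series[t + j] for j in range(k))) for w in kernels]
            for t in range(len(series) - k + 1)]


def channel_attention(features: Matrix, scores: Vector) -> Matrix:
    """Hypothetical channel attention: softmax over per-channel scores, then
    rescale every feature vector by the resulting weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [[w * f for w, f in zip(weights, vec)] for vec in features]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def lstm_last_hidden(seq: Matrix, W: Matrix, U: Matrix, b: Vector, hidden: int) -> Vector:
    """Standard LSTM over seq; gate rows are ordered input, forget, cell, output.
    W is (4*hidden x input_dim), U is (4*hidden x hidden), b has 4*hidden entries."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = [sum(wr * xi for wr, xi in zip(W[r], x))
             + sum(ur * hi for ur, hi in zip(U[r], h)) + b[r] for r in range(4 * hidden)]
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o = [sigmoid(v) for v in z[3 * hidden:]]
        c = [f[k] * c[k] + i[k] * g[k] for k in range(hidden)]
        h = [o[k] * math.tanh(c[k]) for k in range(hidden)]
    return h


class CnnLstmDetector:
    """Untrained forward pass: conv -> attention -> LSTM -> linear next-value head."""

    def __init__(self, channels: int = 4, kernel: int = 3, hidden: int = 8, seed: int = 0):
        rnd = random.Random(seed)

        def r() -> float:
            return rnd.uniform(-0.5, 0.5)  # random placeholder weight

        self.hidden = hidden
        self.kernels = [[r() for _ in range(kernel)] for _ in range(channels)]
        self.scores = [r() for _ in range(channels)]
        self.W = [[r() for _ in range(channels)] for _ in range(4 * hidden)]
        self.U = [[r() for _ in range(hidden)] for _ in range(4 * hidden)]
        self.b = [0.0] * (4 * hidden)
        self.v = [r() for _ in range(hidden)]

    def predict_next(self, window: Vector) -> float:
        """Predict the value that follows the input window."""
        feats = channel_attention(conv1d_relu(window, self.kernels), self.scores)
        h = lstm_last_hidden(feats, self.W, self.U, self.b, self.hidden)
        return sum(vk * hk for vk, hk in zip(self.v, h))

    def is_anomaly(self, window: Vector, actual: float, threshold: float) -> bool:
        """Flag an anomaly when the prediction error exceeds the threshold."""
        return abs(self.predict_next(window) - actual) > threshold
```

In a federated setting, the parameters held by such a model (kernels, attention scores, LSTM and head weights) would be what each device trains locally and whose updates it uploads.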

In the description herein, references to the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A federated-learning-based anomaly detection method for smart grid devices, characterized by comprising the following steps:

each smart grid device establishes its own local data set;

each smart grid device performs local model training on the local data set;

after each smart grid device performs the local model training, the smart grid device calculates its model update and uploads the model update to a server;

the server performs aggregation operation on the received model updates to obtain a new global model, and sends the new global model to each smart grid device;

and cyclically executing the above steps until the global model converges, taking the converged global model as the optimal global model, and each smart grid device executing an anomaly detection task using the optimal global model.

2. The method according to claim 1, wherein each smart grid device collects its own sensed time-series data as the local data set.

3. The method according to claim 1, wherein a cloud aggregator of the server performs the aggregation operation on the received model updates to obtain the new global model.

4. The method according to claim 3, wherein the server divides the smart grid devices into a plurality of clusters according to their model updates using a preset clustering method, and the cloud aggregator obtains the new global model by aggregating, within each cluster, the model updates uploaded by the smart grid devices of that cluster.

5. The method according to claim 4, wherein the preset clustering method comprises:

finding, by a preset algorithm, the number of devices whose similarity to a given device i exceeds a threshold α;

and grouping these devices into one cluster.

6. The method according to claim 5, wherein the preset algorithm is a greedy algorithm.

7. The method according to claim 6, wherein the preset clustering method further comprises: computing the cosine similarity between device i and each other device, and grouping the devices whose similarity is greater than the threshold into one cluster.

8. The method according to claim 2, wherein the local model is a deep anomaly detection model.

9. The method according to claim 8, wherein the deep anomaly detection model comprises:

an input layer for inputting data;

an attention-based convolutional neural network unit capable of capturing fine-grained features of the data input through the input layer;

a long short-term memory network unit that takes the output of the attention-based convolutional neural network unit as its input and is capable of predicting future time-series data and detecting anomalies;

and an output layer connected to the long short-term memory network unit to output an anomaly detection result.

10. A federated-learning-based anomaly detection system for smart grid devices, characterized by comprising:

a local data set establishment module, which establishes a local data set for each smart grid device;

a local model training module capable of performing local model training on the local data set;

an update module, which, after each smart grid device performs the local model training, calculates that device's model update and uploads it to a server;

and an aggregation module, which performs an aggregation operation on the received model updates to obtain a new global model and sends the new global model to each smart grid device.