CN116881746B - Identification method and identification device for abnormal data in electric power system

Info

Publication number: CN116881746B
Application number: CN202311152729.6A
Authority: CN
Inventors: 杨晓林; 邵康; 袁琪; 金高铭; 韦绍毅; 承昊新
Original assignee: Changzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Changzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2023-09-08
Filing date: 2023-09-08
Publication date: 2023-11-14
Anticipated expiration: 2043-09-08
Also published as: CN116881746A

Abstract

The invention relates to the technical field of power systems and provides a method and a device for identifying abnormal data in a power system. The method comprises the following steps: acquiring measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain; calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result; calculating, based on the clustered data clusters and the correlation matrix of the data clusters, the phenotype correlation of the cluster tree formed by the data clusters; and identifying whether the measurement data is abnormal data according to the phenotype correlation. By exploiting the physical characteristics of the power system and the correlation among nodes to identify abnormal data, the method and the device can effectively locate nodes that an attacker may have compromised in order to inject attack vectors, improving the safety and reliability of the power system.

Description

Identification method and identification device for abnormal data in electric power system

Technical Field

The invention relates to the technical field of power systems, in particular to a method for identifying abnormal data in a power system and a device for identifying the abnormal data in the power system.

Background

With the continued development of new-type power systems, grid-connected generation with a high penetration of new energy introduces high uncertainty, large-scale access of power electronic equipment introduces strong nonlinearity, and the massive number of measurement and control nodes produces a high-dimensional state/action space. Together, these sharply increase the difficulty of analyzing the power system and the complexity of its control strategies.

In the related art, various data-driven solutions have been proposed; such methods can greatly reduce the complexity of system analysis and control and improve the execution efficiency of control strategies. However, data-driven methods rely on a large number of intelligent devices and Internet of Things terminal devices, which provide potential attack-vector injection points for network attacks targeting data-driven algorithms. By exploiting the information-physical coupling of the power grid, an attacker can inject attack vectors from either the information side or the physical side of the system to attack a data-driven algorithm, for the purpose of profiting or of damaging the stable operation of the system.

In this context, the attack and defense of data-driven algorithms is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In order to solve the above technical problems, an embodiment of a first aspect of the present invention provides a method for identifying abnormal data in an electric power system.

An embodiment of the second aspect of the present invention provides an apparatus for identifying abnormal data in a power system.

The technical scheme adopted by the invention is as follows:

An embodiment of the first aspect of the present invention provides a method for identifying abnormal data in an electric power system, including the following steps: obtaining measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain; calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result; calculating, based on the clustered data clusters and the correlation matrix of the data clusters, the phenotype correlation of the cluster tree formed by the data clusters; and identifying whether the measurement data is abnormal data according to the phenotype correlation.

The identification method of the abnormal data in the power system also has the following additional technical characteristics:

According to one embodiment of the invention, the amplitude of the measurement data in the frequency domain is calculated according to the following formula:

$$A_{k,m} = \left| \sum_{i=1}^{W} X_{i,m}\, e^{-\mathrm{j}\, 2\pi k i / W} \right|$$

where $A_{k,m}$ is the amplitude of the measurement data in the frequency domain, $W$ is the number of data points within the sampling time window, $X$ is the measurement data matrix, $i$ is the index of a data point within the sampling time window, $m$ is the node number, $X_{i,m}$ is the value of the $i$-th data point of node $m$ in the sampling time window, $k$ is the coordinate in the frequency domain, $e$ is the base of the natural logarithm, and $\mathrm{j}$ is the imaginary unit.

According to one embodiment of the present invention, the correlation matrix between the measurement data of the nodes is calculated according to the following formulas:

$$M_{m,n} = \frac{(A_m - \bar{A}_m)^{T} (A_n - \bar{A}_n)}{\sqrt{(A_m - \bar{A}_m)^{T} (A_m - \bar{A}_m)}\; \sqrt{(A_n - \bar{A}_n)^{T} (A_n - \bar{A}_n)}}$$

$$\bar{A}_m = \frac{1}{W} \sum_{i=1}^{W} A_{i,m}, \qquad \bar{A}_n = \frac{1}{W} \sum_{i=1}^{W} A_{i,n}$$

where $m$ and $n$ are node numbers, $M$ is the correlation matrix whose element $M_{m,n}$ is the correlation between node $m$ and node $n$, $W$ is the number of data points in the sampling time window, $i$ is the index of a data point in the sampling time window, $A_m$ and $A_n$ are the amplitudes of the measurement data of node $m$ and node $n$ in the frequency domain, $\bar{A}_m$ and $\bar{A}_n$ are the average values of $A_m$ and $A_n$ (subtracted element-wise), $A_{i,m}$ is the amplitude of the $i$-th data point of node $m$ in the sampling time window, $A_{i,n}$ is the amplitude of the $i$-th data point of node $n$ in the sampling time window, and $T$ denotes the transpose.

According to one embodiment of the invention, iteratively aggregating the data clusters within a preset distance to generate a clustering result, using the node data clustering method based on the correlation matrix, specifically comprises the following steps: initially, each node is placed in its own data cluster, labeled $c_1$ to $c_m$; in step $I$, the two data clusters $p$ and $q$ with the highest correlation are selected and aggregated into a new data cluster, labeled $c_{m+1}$, and the set of data clusters at the current step $I$ is labeled $c_I$; row $m+1$ and column $m+1$ are added to the correlation matrix, and the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$ is calculated; after $m-1$ iterations, all measurement data is divided into 2 classes, the class with fewer members is marked as containing an attack vector and the class with more members as not containing an attack vector, and the label $\rho$ is set to 1 or 0, respectively.

According to one embodiment of the invention, the phenotype correlation of the cluster tree formed by the data clusters is calculated according to the following formula:

[Formula not preserved in the source text.]

where $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, and $\alpha$ and $\beta$ are a first parameter and a second parameter, respectively. The formula also uses the correlation of data clusters $p$ and $q$ at iteration step $i-1$, the correlation of data clusters $p$ and $q$ at iteration step $i$, and the correlation matrix entry for summation indices $i$ and $j$.

According to one embodiment of the present invention, identifying whether the measurement data is abnormal data according to the phenotype correlation includes: if the phenotype correlation is greater than a set value $L_{min}$, judging the measurement data to be abnormal data.

According to one embodiment of the invention, the set value $L_{min}$ is obtained by solving the following optimization problem:

[Formula not preserved in the source text.]

The problem is expressed in terms of the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples; s.t. denotes the constraint conditions.

An embodiment of the second aspect of the present invention provides an apparatus for identifying abnormal data in an electric power system, including: an acquisition module, used for acquiring measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain; a clustering module, used for calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result; a calculation module, used for calculating the phenotype correlation of the cluster tree formed by the data clusters based on the clustered data clusters and the correlation matrix of the data clusters; and an identification module, used for identifying whether the measurement data is abnormal data according to the phenotype correlation.

The identification device for the abnormal data in the power system provided by the invention further has the following additional technical characteristics:

According to one embodiment of the present invention, the clustering module is specifically configured to: initially, place each node in its own data cluster, labeled $c_1$ to $c_m$; in step $I$, select the two data clusters $p$ and $q$ with the highest correlation and aggregate them into a new data cluster, labeled $c_{m+1}$, and label the set of data clusters at the current step $I$ as $c_I$; add row $m+1$ and column $m+1$ to the correlation matrix and calculate the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$; after $m-1$ iterations, divide all measurement data into 2 classes, mark the class with fewer members as containing an attack vector and the class with more members as not containing an attack vector, and set the label $\rho$ to 1 or 0, respectively.

According to one embodiment of the invention, the calculation module calculates the phenotype correlation of the cluster tree formed by the data clusters according to the following formula:

[Formula not preserved in the source text.]

where $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, and $\alpha$ and $\beta$ are a first parameter and a second parameter, respectively. The formula also uses the correlation of data clusters $p$ and $q$ at iteration step $i-1$, the correlation of data clusters $p$ and $q$ at iteration step $i$, and the correlation matrix entry for summation indices $i$ and $j$.

The invention has the beneficial effects that:

According to the invention, abnormal data is identified by exploiting the physical characteristics of the power system and the correlation among nodes, so that nodes an attacker may have compromised in order to inject attack vectors can be located effectively, improving the safety and reliability of the power system.

Drawings

FIG. 1 is a flow chart of a method of identifying anomaly data in a power system according to one embodiment of the present invention;

FIG. 2 is a flowchart of a method for identifying abnormal data in a power system according to another embodiment of the present invention;

FIG. 3 is a schematic node diagram of an electrical power system according to one specific example of the invention;

FIG. 4 is a sample test results schematic of a power system according to one specific example of the invention;

fig. 5 is a block diagram of an apparatus for identifying abnormal data in a power system according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a flowchart of a method of identifying abnormal data in a power system according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:

S1, measurement data of each node in the power system is obtained, the measurement data is transformed into the frequency domain, and the amplitude of the measurement data in the frequency domain is calculated.

Specifically, the measurement data may be data such as voltage, power angle, active power, reactive power, etc. of each generator node in the power system.

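The measurement data may be Fourier transformed to calculate its amplitude in the frequency domain, which provides the input data for the subsequent correlation matrix. Specifically, the amplitude of the measurement data in the frequency domain can be calculated according to the following formula:

$$A_{k,m} = \left| \sum_{i=1}^{W} X_{i,m}\, e^{-\mathrm{j}\, 2\pi k i / W} \right|$$

where $A_{k,m}$ is the amplitude of node $m$ at frequency-domain coordinate $k$, $W$ is the number of data points within the sampling time window, $X_{i,m}$ is the value of the $i$-th data point of node $m$ in the sampling time window, and $\mathrm{j}$ is the imaginary unit.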
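As an illustration of step S1, the following is a minimal sketch of the amplitude computation, assuming a plain (unnormalized) discrete Fourier transform over one sampling window. The function name `dft_amplitude` is hypothetical and is not taken from the patent; a production system would more likely use an FFT routine.

```python
import cmath


def dft_amplitude(samples):
    """Return the DFT amplitude |sum_i x_i * exp(-j*2*pi*k*i/W)| for k = 0..W-1.

    `samples` holds the W measurement values of one node in the sampling
    window. Indexing starts at 0 here; starting at 1 only changes the phase,
    not the amplitude.
    """
    W = len(samples)
    amplitudes = []
    for k in range(W):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / W)
                  for i, x in enumerate(samples))
        amplitudes.append(abs(acc))
    return amplitudes


# Usage: one amplitude vector per node (made-up voltage samples).
voltage_window = [1.0, 0.0, -1.0, 0.0]
print(dft_amplitude(voltage_window))
```

The direct sum costs O(W²) per node, which is acceptable for short windows and keeps the correspondence with the formula easy to see.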

S2, a correlation matrix among the measurement data of the nodes is calculated according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, data clusters within a preset distance are iteratively aggregated to generate a clustering result.

$$M_{m,n} = \frac{(A_m - \bar{A}_m)^{T} (A_n - \bar{A}_n)}{\sqrt{(A_m - \bar{A}_m)^{T} (A_m - \bar{A}_m)}\; \sqrt{(A_n - \bar{A}_n)^{T} (A_n - \bar{A}_n)}}$$

where $m$ and $n$ are node numbers, $M$ is the correlation matrix, which is a symmetric matrix whose element $M_{m,n}$ is the correlation between node $m$ and node $n$, $W$ is the number of data points in the sampling time window, $i$ is the index of a data point in the sampling time window, $A_m$ and $A_n$ are the amplitudes of the measurement data of node $m$ and node $n$ in the frequency domain, $\bar{A}_m$ and $\bar{A}_n$ are the average values of $A_m$ and $A_n$ over the $W$ points (subtracted element-wise), $A_{i,m}$ is the amplitude of the $i$-th data point of node $m$ in the sampling time window, $A_{i,n}$ is the amplitude of the $i$-th data point of node $n$ in the sampling time window, and $T$ denotes the transpose.
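The correlation matrix described above can be sketched as follows, assuming the correlation between two nodes is the Pearson correlation of their amplitude vectors (which is what the listed quantities, mean values and a transpose, suggest). The function name `correlation_matrix` and the handling of constant vectors are assumptions made for illustration.

```python
import math


def correlation_matrix(amplitudes):
    """Return M where M[m][n] is the Pearson correlation between the
    amplitude vectors of node m and node n.

    `amplitudes[m]` is the amplitude vector of node m; all vectors are
    assumed to have the same length W.
    """
    centered = []
    for vec in amplitudes:
        mean = sum(vec) / len(vec)
        centered.append([v - mean for v in vec])
    norms = [math.sqrt(sum(c * c for c in vec)) for vec in centered]

    size = len(amplitudes)
    M = [[0.0] * size for _ in range(size)]
    for m in range(size):
        for n in range(size):
            if norms[m] == 0.0 or norms[n] == 0.0:
                # Assumption: a constant vector has no defined correlation;
                # use 1 on the diagonal and 0 elsewhere.
                M[m][n] = 1.0 if m == n else 0.0
            else:
                dot = sum(a * b for a, b in zip(centered[m], centered[n]))
                M[m][n] = dot / (norms[m] * norms[n])
    return M


# Usage with three made-up amplitude vectors.
M = correlation_matrix([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
print(M[0][1], M[0][2])
```

Because the Pearson correlation is symmetric in its two arguments, the resulting matrix is symmetric, matching the description of M.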

According to one embodiment of the present invention, as shown in fig. 2, a method for clustering node data based on a correlation matrix gradually aggregates data clusters within a preset distance in an iterative manner to generate a clustering result, which specifically includes:

S21, initially, each node is placed in its own data cluster, labeled $c_1$ to $c_m$. In step $I$, the two data clusters $p$ and $q$ with the highest correlation are selected and aggregated into a new data cluster, labeled $c_{m+1}$, and the set of data clusters at the current step $I$ is labeled $c_I$.

[Formulas not preserved in the source text.]

S22, row $m+1$ and column $m+1$ are added to the correlation matrix, and the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$ is calculated.

[Formula not preserved in the source text.]

The formula uses the total number of sub-data clusters contained in a data cluster.

S23, after $m-1$ iterations, all measurement data is divided into 2 classes; the class with fewer members is marked as containing an attack vector, the class with more members is marked as not containing an attack vector, and the label $\rho$ is set to 1 or 0, respectively.

It can be understood that an attacker's attack resources are limited, so the probability of tampering with the measurement data of more than 50% of the nodes in the system is extremely low. For this reason, after clustering, the class with fewer members is marked as containing attack vectors and the class with more members as not containing attack vectors, with the label $\rho$ set to 1 or 0, respectively.
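Steps S21 to S23 can be sketched as the agglomerative procedure below. The update rule for the correlation between a merged cluster and the remaining clusters is not recoverable from this text, so the sketch assumes a size-weighted average of the two merged clusters' correlations; the names `cluster_nodes` and `label_nodes` are hypothetical. The sketch stops when two clusters remain, which yields the two classes that S23 labels.

```python
def cluster_nodes(M):
    """Merge the two most correlated clusters repeatedly until two remain.

    M is the node correlation matrix. Returns the member lists of the two
    final clusters.
    """
    clusters = {i: [i] for i in range(len(M))}   # cluster id -> member nodes
    corr = {(a, b): M[a][b]
            for a in range(len(M)) for b in range(a + 1, len(M))}
    next_id = len(M)                             # id given to the next merged cluster
    while len(clusters) > 2:
        (p, q), _ = max(corr.items(), key=lambda kv: kv[1])
        merged = clusters[p] + clusters[q]
        new_corr = {}
        for s in clusters:
            if s in (p, q):
                continue
            c_ps = corr[(min(p, s), max(p, s))]
            c_qs = corr[(min(q, s), max(q, s))]
            # Assumed rule: average weighted by the sizes of p and q.
            new_corr[(s, next_id)] = (
                len(clusters[p]) * c_ps + len(clusters[q]) * c_qs
            ) / len(merged)
        corr = {k: v for k, v in corr.items() if p not in k and q not in k}
        corr.update(new_corr)
        del clusters[p], clusters[q]
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())


def label_nodes(clusters, n_nodes):
    """Set rho = 1 for nodes in the smaller cluster and rho = 0 otherwise.

    On a tie in size, the first cluster in the list is treated as the smaller.
    """
    suspicious = min(clusters, key=len)
    labels = [0] * n_nodes
    for node in suspicious:
        labels[node] = 1
    return labels


# Usage: nodes 0-2 move together, node 3 does not (made-up correlations).
M = [[1.00, 0.90, 0.80, 0.10],
     [0.90, 1.00, 0.85, 0.20],
     [0.80, 0.85, 1.00, 0.15],
     [0.10, 0.20, 0.15, 1.00]]
print(label_nodes(cluster_nodes(M), len(M)))
```

Labeling the smaller class as suspicious encodes the assumption stated in the text that an attacker is unlikely to tamper with more than half of the nodes.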

S3, calculating the phenotype correlation L of a clustering tree formed by the data clusters based on the clustered data clusters and the correlation matrix of the data clusters.

Specifically, the phenotype correlation of the cluster tree formed by the data clusters (the cophenetic correlation in hierarchical clustering terminology) characterizes whether the clustering result reflects the differences between the data of the nodes. It is calculated according to the following formula:

[Formula not preserved in the source text.]
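Because the patent's own formula is not recoverable from this text, the sketch below uses the standard cophenetic correlation coefficient as a stand-in: the Pearson correlation between each node pair's original correlation and the correlation level at which the pair is first merged in the cluster tree. This is a swapped-in textbook definition, not necessarily the patent's formula with parameters α and β; the function name and the size-weighted merge rule are assumptions.

```python
import math


def cophenetic_correlation(M):
    """Standard cophenetic correlation of the cluster tree built from M.

    M is the node correlation matrix. Clusters are merged by highest
    correlation; the correlation of a merged cluster to the others is an
    assumed size-weighted average.
    """
    n = len(M)
    if n < 3:
        raise ValueError("need at least 3 nodes")
    clusters = {i: [i] for i in range(n)}
    corr = {(a, b): M[a][b] for a in range(n) for b in range(a + 1, n)}
    merge_level = {}          # node pair -> correlation level at first merge
    next_id = n
    while len(clusters) > 1:
        (p, q), level = max(corr.items(), key=lambda kv: kv[1])
        for a in clusters[p]:
            for b in clusters[q]:
                merge_level[(min(a, b), max(a, b))] = level
        merged = clusters[p] + clusters[q]
        new_corr = {}
        for s in clusters:
            if s in (p, q):
                continue
            c_ps = corr[(min(p, s), max(p, s))]
            c_qs = corr[(min(q, s), max(q, s))]
            new_corr[(s, next_id)] = (
                len(clusters[p]) * c_ps + len(clusters[q]) * c_qs
            ) / len(merged)
        corr = {k: v for k, v in corr.items() if p not in k and q not in k}
        corr.update(new_corr)
        del clusters[p], clusters[q]
        clusters[next_id] = merged
        next_id += 1

    pairs = sorted(merge_level)
    x = [M[a][b] for a, b in pairs]           # original pairwise correlations
    t = [merge_level[pair] for pair in pairs]  # tree-implied correlations
    mean_x, mean_t = sum(x) / len(x), sum(t) / len(t)
    num = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, t))
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x)
                    * sum((ti - mean_t) ** 2 for ti in t))
    return num / den if den else 0.0


# Usage with made-up correlations: three similar nodes and one outlier.
M = [[1.00, 0.90, 0.80, 0.10],
     [0.90, 1.00, 0.85, 0.20],
     [0.80, 0.85, 1.00, 0.15],
     [0.10, 0.20, 0.15, 1.00]]
print(cophenetic_correlation(M))
```

A value close to 1 means the tree preserves the pairwise structure of the data well, which in this example happens because one node is clearly separated from the rest.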

S4, identifying whether the measurement data is abnormal data according to the phenotype correlation L.

The closer $L$ is to 1, the larger the difference between the data, that is, the more likely it is that an attack vector is present in the data. There is therefore a set value $L_{min}$ such that, when the phenotype correlation is greater than $L_{min}$, the measurement data is taken to contain an attack vector and is judged to be abnormal data.

$L_{min}$ needs to be determined from a large amount of historical and simulation data. The optimal $L_{min}$ can be obtained by solving the following optimization problem, thereby maximizing the discrimination accuracy for abnormal data.

According to one embodiment of the present invention, the set value $L_{min}$ is obtained by solving the following optimization problem:

[Formula not preserved in the source text.]

The problem is expressed in terms of the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples; s.t. denotes the constraint conditions.

Once $L_{min}$ is determined, whether the measurement data contains an attack vector can be judged in practical application from the magnitude relation between $L$ and $L_{min}$.
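The threshold selection and the decision rule can be sketched as a simple sweep over candidate thresholds. The exact objective and constraints of the optimization problem are not recoverable from this text, so this sketch assumes the objective is the overall discrimination error rate and models the constraint as an optional cap on the error rate for abnormal samples; `choose_threshold`, `is_abnormal` and `max_abnormal_error` are hypothetical names.

```python
def is_abnormal(L, L_min):
    """Decision rule from the text: abnormal if L is greater than L_min."""
    return L > L_min


def choose_threshold(normal_L, abnormal_L, max_abnormal_error=1.0):
    """Pick L_min from the observed L values by minimizing the overall error.

    normal_L / abnormal_L are phenotype correlations of labeled historical
    or simulated samples. Returns (L_min, overall_error_rate), or None if no
    candidate satisfies the assumed cap on the abnormal-sample error rate.
    """
    best = None
    for candidate in sorted(set(normal_L) | set(abnormal_L)):
        # Normal samples wrongly flagged as abnormal.
        err_normal = sum(is_abnormal(L, candidate) for L in normal_L)
        # Abnormal samples that were missed.
        err_abnormal = sum(not is_abnormal(L, candidate) for L in abnormal_L)
        if err_abnormal / len(abnormal_L) > max_abnormal_error:
            continue
        overall = (err_normal + err_abnormal) / (len(normal_L) + len(abnormal_L))
        if best is None or overall < best[1]:
            best = (candidate, overall)
    return best


# Usage with made-up values of L.
normal = [0.80, 0.85, 0.90]
abnormal = [0.95, 0.97, 0.99]
print(choose_threshold(normal, abnormal))
```

Lowering `max_abnormal_error` makes the rule more aggressive about catching abnormal samples, at the cost of flagging more normal samples.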

In order to describe the technical effects of the identification method more intuitively, the method is applied to the 39-node power system shown in FIG. 3. One measuring unit is deployed on each of the 10 generators in the system, numbered M1 to M10, to measure data such as the voltage, power angle, active power and reactive power of each generator node. The attack target is a post-fault transient stability prediction algorithm for the system, whose input is the voltage measurement data of each node within a certain time window after the fault and whose output is the predicted stability of the system. The attack is carried out by inducing distributed power sources on the user side to inject attack vectors into the power system.

600 groups of normal samples are randomly selected from the data set constructed for the attack scenario. For these samples, the disturbance injected by the distributed power sources is calculated, giving the attack vector injected into the power system by the distributed power sources and the final attack effect. 120 groups of samples in which the attack succeeded are selected, and the measurement data in them is replaced by the attack vectors to obtain abnormal samples; the other samples are left unchanged. The phenotype correlation of the normal sample data and of the abnormal sample data containing attack vectors is shown in (a) of FIG. 4, and the number of nodes identified as hosting possibly invaded distributed power sources is shown in (b) of FIG. 4. The phenotype correlation of the abnormal sample data is mainly distributed in the interval (0.94, 1). To avoid missing abnormal samples as far as possible, a relatively aggressive discrimination mode is adopted, and the set value $L_{min}$ is obtained by optimization. In most abnormal samples the number of nodes judged to be invaded is 3, which matches the attack scheme selected by the attacker in the attack scenario.

Therefore, with the abnormal data identification method of the invention, the detection rate for abnormal samples is 92.5%, the nodes where the invaded distributed power sources are located can be effectively identified, and the invaded nodes are identified incorrectly in only a small fraction of the abnormal samples.

In summary, according to the method for identifying abnormal data in a power system of the embodiment of the invention, measurement data of each node in the power system is obtained and transformed into the frequency domain, and the amplitude of the measurement data in the frequency domain is calculated. A correlation matrix among the measurement data of the nodes is then calculated from those amplitudes and, using a node data clustering method based on the correlation matrix, data clusters within a preset distance are iteratively aggregated to generate a clustering result. The phenotype correlation of the cluster tree formed by the data clusters is calculated based on the clustered data clusters and the correlation matrix of the data clusters, and finally whether the measurement data is abnormal data is identified according to the phenotype correlation. Abnormal data is thus identified by exploiting the physical characteristics of the power system and the correlation among nodes, so that nodes an attacker may have compromised in order to inject attack vectors can be located effectively, improving the safety and reliability of the power system.

Corresponding to the identification method of the abnormal data in the electric power system, the invention also provides an identification device of the abnormal data in the electric power system. Since the device embodiment of the present invention corresponds to the above-described method embodiment, for details not disclosed in the device embodiment, reference may be made to the above-described method embodiment, and details are not repeated in the present invention.

Fig. 5 is a block diagram of an apparatus for identifying abnormal data in a power system according to an embodiment of the present invention, as shown in fig. 5, the apparatus includes: the system comprises an acquisition module 100, a clustering module 200, a calculation module 300 and an identification module 400.

The acquisition module 100 is configured to acquire measurement data of each node in the power system, transform the measurement data into a frequency domain, and calculate an amplitude of the measurement data in the frequency domain; the clustering module 200 is configured to calculate a correlation matrix between measurement data of each node according to an amplitude value of the measurement data in a frequency domain, and gradually aggregate data clusters within a preset distance in an iterative manner to generate a clustering result; the calculating module 300 is used for calculating the phenotype correlation of the clustering tree formed by the data clusters based on the clustered data clusters and the correlation matrix of the data clusters; the identification module 400 is configured to identify whether the measurement data is abnormal data according to the phenotype association.

According to one embodiment of the invention, the clustering module 200 is specifically configured to: initially, place each node in its own data cluster, labeled $c_1$ to $c_m$; in step $I$, select the two data clusters $p$ and $q$ with the highest correlation and aggregate them into a new data cluster, labeled $c_{m+1}$, and label the set of data clusters at the current step $I$ as $c_I$; add row $m+1$ and column $m+1$ to the correlation matrix and calculate the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$; after $m-1$ iterations, divide all measurement data into 2 classes, mark the class with fewer members as containing an attack vector and the class with more members as not containing an attack vector, and set the label $\rho$ to 1 or 0, respectively.

According to one embodiment of the invention, the calculation module 300 calculates the phenotype correlation of the cluster tree formed by the data clusters according to the following formula:

[Formula not preserved in the source text.]

According to one embodiment of the present invention, the identification module 400 is specifically configured to: if the phenotype correlation is greater than the set value $L_{min}$, judge the measurement data to be abnormal data.

The identification module 400 obtains the set value $L_{min}$ by solving the following optimization problem:

[Formula not preserved in the source text.]

The problem is expressed in terms of the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples.

In summary, according to the identification device for the abnormal data in the electric power system provided by the embodiment of the invention, the abnormal data is identified by utilizing the correlation between the physical characteristics and the nodes in the electric power system, so that the nodes possibly invaded by an attacker for injecting the attack vector can be effectively searched, and the safety and the reliability of the electric power system are improved.

In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The meaning of "a plurality of" is two or more, unless specifically defined otherwise.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined by those skilled in the art without contradiction.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods in the above-described embodiments may be carried out by a program that instructs the relevant hardware, where the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.

The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for identifying abnormal data in an electric power system, characterized by comprising the following steps:

obtaining measurement data of each node in the power system, transforming the measurement data into a frequency domain, and calculating the amplitude of the measurement data in the frequency domain;

calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain, and, by a node data clustering method based on the correlation matrix, iteratively aggregating the data clusters that lie within a preset distance of one another to generate a clustering result;

based on the clustered data clusters and the correlation matrix of the data clusters, calculating the phenotype correlation of a clustering tree formed by the data clusters;

identifying whether the measured data is abnormal data according to the phenotype correlation;

specifically, a correlation matrix between measurement data of each node is calculated according to the following formula:

；

wherein m and n are node indices; M is the correlation matrix, whose element in row m and column n is the correlation between node m and node n; W is the number of data points in the sampling time window; i is the index of a data point within the sampling time window; the remaining symbols of the formula denote, in order, the magnitudes of the measurement data of node m and of node n in the frequency domain, the average values of those two magnitudes, the amplitude of the i-th data point of node m in the sampling time window, and the amplitude of the i-th data point of node n in the sampling time window; and T denotes the matrix transpose;

wherein iteratively aggregating the data clusters within a preset distance to generate a clustering result by the node data clustering method based on the correlation matrix specifically comprises the following steps:

initially, placing each node in its own data cluster, the clusters being labeled c_1 to c_m; in each step, selecting the two data clusters p and q with the highest correlation and aggregating them to generate a new data cluster, labeled c_{m+1}, and labeling the set of data clusters at the current step I as c_I;

adding an (m+1)-th row and an (m+1)-th column to the correlation matrix to hold the calculated correlation between the data cluster c_{m+1} and each other data cluster c_s;

after m-1 iteration steps, dividing all measurement data into 2 classes, marking the class with fewer members as containing attack vectors and the class with more members as not containing attack vectors, and setting the label ρ to 1 or 0, respectively;

；
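By way of non-limiting illustration only, the correlation and aggregation steps of claim 1 might be sketched as follows. The sketch assumes that the correlation between two nodes takes the Pearson form over the W amplitudes in the window, and that the correlation between two data clusters is the mean pairwise correlation of their members (average linkage); both are assumptions, since the corresponding formulas are not reproduced in this text. The function names and the sample amplitudes are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant amplitude sequences."""
    w = len(x)
    mx, my = sum(x) / w, sum(y) / w
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(amplitudes):
    """amplitudes[k] holds the W frequency-domain amplitudes of node k."""
    n = len(amplitudes)
    return [[pearson(amplitudes[a], amplitudes[b]) for b in range(n)]
            for a in range(n)]

def cluster_into_two(corr):
    """Merge the most correlated pair of clusters until two clusters remain.

    Assumption: cluster-to-cluster correlation is the mean pairwise
    correlation of the member nodes (average linkage).
    """
    clusters = [[k] for k in range(len(corr))]

    def link(p, q):
        return sum(corr[a][b] for a in p for b in q) / (len(p) * len(q))

    while len(clusters) > 2:
        pairs = [(link(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = max(pairs)  # pair of clusters with the highest correlation
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters

def label_nodes(clusters, n_nodes):
    """rho = 1 for nodes in the smaller cluster, rho = 0 for the larger one."""
    small = min(clusters, key=len)
    return [1 if k in small else 0 for k in range(n_nodes)]

# Hypothetical amplitudes for four nodes; node 3 behaves unlike the others.
amps = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [1.1, 2.1, 2.9, 4.0],
    [4.0, 1.0, 3.5, 0.5],
]
corr = correlation_matrix(amps)
rho = label_nodes(cluster_into_two(corr), len(amps))
print(rho)
```

In this sketch the merging stops once two clusters remain, which corresponds to the top-level split of the clustering tree; a different linkage rule would change the merge order but not the overall structure of the loop.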

2. The method for identifying abnormal data in a power system according to claim 1, wherein the magnitude of the measured data in the frequency domain is calculated according to the following formula:

；
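By way of non-limiting illustration only, the amplitude computation of claim 2 might be sketched as follows, assuming that the transformation into the frequency domain is a discrete Fourier transform over the sampling time window and that the amplitude is the modulus of each complex coefficient. The formula of claim 2 is not reproduced in this text, so this form, the function name and the sample window are assumptions.

```python
import cmath
from math import pi

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..W-1, where X is the DFT of the W real samples."""
    w = len(samples)
    mags = []
    for k in range(w):
        # X[k] = sum_i x[i] * exp(-2*pi*j*k*i / W)
        xk = sum(s * cmath.exp(-2j * pi * k * i / w)
                 for i, s in enumerate(samples))
        mags.append(abs(xk))  # sqrt(Re^2 + Im^2)
    return mags

# Hypothetical window: one period of a sampled sine wave.
window = [0.0, 1.0, 0.0, -1.0]
mags = dft_magnitudes(window)
print([round(v, 6) for v in mags])
```

The direct summation costs O(W²) operations and is used here only for clarity; a fast Fourier transform would normally be preferred for long windows.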

3. The method for identifying abnormal data in a power system according to claim 1, wherein identifying whether the measurement data is abnormal data according to the phenotype correlation comprises:

if the phenotype correlation is greater than a set value L_min, determining that the measurement data is abnormal data.
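By way of non-limiting illustration only, the comparison of claim 3 might be sketched as follows. The sketch reads the "phenotype correlation" of the clustering tree as the cophenetic correlation coefficient of a dendrogram, that is, the Pearson correlation between the original pairwise distances and the heights at which each pair of nodes is first merged; this reading is an assumption, as are the function names, the example matrices and the value chosen for L_min.

```python
from math import sqrt

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[a][b] for a in range(n) for b in range(a + 1, n)]

def cophenetic_correlation(dist, coph):
    """Pearson correlation between pairwise distances and merge heights."""
    d, t = upper_triangle(dist), upper_triangle(coph)
    md, mt = sum(d) / len(d), sum(t) / len(t)
    cov = sum((x - md) * (y - mt) for x, y in zip(d, t))
    var_d = sum((x - md) ** 2 for x in d)
    var_t = sum((y - mt) ** 2 for y in t)
    return cov / sqrt(var_d * var_t)

def is_abnormal(phenotype_correlation, l_min):
    """Claim 3: abnormal when the correlation exceeds the set value L_min."""
    return phenotype_correlation > l_min

# Hypothetical three-node example. Nodes 0 and 1 merge at height 1.0, and
# node 2 joins them at height (4.0 + 5.0) / 2 = 4.5 under average linkage.
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 5.0],
        [4.0, 5.0, 0.0]]
coph = [[0.0, 1.0, 4.5],
        [1.0, 0.0, 4.5],
        [4.5, 4.5, 0.0]]

L_MIN = 0.9  # hypothetical set value
c = cophenetic_correlation(dist, coph)
flag = is_abnormal(c, L_MIN)
print(round(c, 4), flag)
```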

4. The method for identifying abnormal data in a power system according to claim 3, characterized in that the set value L_min is obtained according to the following formula:

；

wherein the symbols of the formula denote, in order: the discrimination error rate for normal samples, the total number of normal samples, the number of discrimination errors among the normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of discrimination errors among the abnormal samples, and the discrimination error rate of the samples; and s.t. denotes the constraint conditions.
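By way of non-limiting illustration only, one way of obtaining a set value L_min from labeled samples is sketched below. It assumes that the objective is to minimize the overall discrimination error rate, with the per-class error rates defined as the number of errors divided by the number of samples in that class. The formula of claim 4 is not reproduced in this text, so the objective, the grid of candidate values, the sample scores and all names are assumptions.

```python
def error_rates(l_min, normal_scores, abnormal_scores):
    """Return (normal error rate, abnormal error rate, overall error rate)."""
    # A normal sample is misjudged when its score exceeds l_min;
    # an abnormal sample is misjudged when its score does not.
    err_n = sum(1 for s in normal_scores if s > l_min)
    err_a = sum(1 for s in abnormal_scores if s <= l_min)
    rate_n = err_n / len(normal_scores)
    rate_a = err_a / len(abnormal_scores)
    total = (err_n + err_a) / (len(normal_scores) + len(abnormal_scores))
    return rate_n, rate_a, total

def choose_l_min(candidates, normal_scores, abnormal_scores):
    """Pick the candidate with the lowest overall error rate."""
    return min(candidates,
               key=lambda c: error_rates(c, normal_scores, abnormal_scores)[2])

# Hypothetical phenotype-correlation scores of labeled samples.
normal = [0.52, 0.60, 0.66, 0.71]
abnormal = [0.69, 0.83, 0.88, 0.93]
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]

l_min = choose_l_min(candidates, normal, abnormal)
print(l_min, error_rates(l_min, normal, abnormal))
```

A grid search is used here only because it is easy to check by hand; any constraint on the individual error rates could be added by filtering the candidates before taking the minimum.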

5. An apparatus for identifying abnormal data in an electric power system, comprising:

the acquisition module is used for acquiring measurement data of each node in the power system, converting the measurement data into a frequency domain and calculating the amplitude of the measurement data in the frequency domain;

the clustering module is used for calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain, and, by a node data clustering method based on the correlation matrix, iteratively aggregating the data clusters that lie within a preset distance of one another to generate a clustering result;

the calculation module is used for calculating phenotype correlation of a clustering tree formed by the data clusters based on the clustered data clusters and a correlation matrix of the data clusters;

the identification module is used for identifying whether the measurement data are abnormal data according to the phenotype correlation;

the clustering module specifically calculates a correlation matrix between measurement data of each node according to the following formula:

；

wherein m and n are node indices; M is the correlation matrix, whose element in row m and column n is the correlation between node m and node n; W is the number of data points in the sampling time window; i is the index of a data point within the sampling time window; the remaining symbols of the formula denote, in order, the average values of the magnitudes of the measurement data of node m and of node n in the frequency domain, the amplitude of the i-th data point of node m in the sampling time window, and the amplitude of the i-th data point of node n in the sampling time window; and T denotes the matrix transpose;

the clustering module is specifically configured to:

the calculation module calculates the phenotype correlation of a clustering tree formed by the data clusters according to the following formula:

；