CN116881746B - Identification method and identification device for abnormal data in electric power system


Info

Publication number
CN116881746B
Authority
CN
China
Prior art keywords
data
node
correlation
correlation matrix
clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311152729.6A
Other languages
Chinese (zh)
Other versions
CN116881746A (en
Inventor
杨晓林
邵康
袁琪
金高铭
韦绍毅
承昊新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Changzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202311152729.6A
Publication of CN116881746A
Application granted
Publication of CN116881746B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention relates to the technical field of power systems and provides a method and a device for identifying abnormal data in a power system. The method comprises the following steps: acquiring measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain; calculating a correlation matrix among the measurement data of the nodes from those amplitudes and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result; calculating the phenotype correlation of the clustering tree formed by the data clusters, based on the clustered data clusters and their correlation matrix; and identifying whether the measurement data are abnormal data according to the phenotype correlation. By exploiting the physical characteristics of the power system and the correlation among nodes to identify abnormal data, the method and the device can effectively locate nodes that an attacker may have compromised in order to inject attack vectors, thereby improving the safety and reliability of the power system.

Description

Identification method and identification device for abnormal data in electric power system
Technical Field
The invention relates to the technical field of power systems, in particular to a method for identifying abnormal data in a power system and a device for identifying the abnormal data in the power system.
Background
With the continued development of the new-type power system, problems have emerged such as high uncertainty caused by the grid connection of high-penetration new energy generation, strong nonlinearity caused by the large-scale access of power electronic equipment, and a high-dimensional state/action space caused by the massive number of measurement/control nodes, so that the difficulty of analyzing the power system and the complexity of its control strategies have increased sharply.
In the related art, various data-driven solutions have been proposed; data-driven methods can greatly reduce the analysis and control complexity of the system and improve the execution efficiency of control strategies. However, the development of data-driven methods is accompanied by the deployment of large numbers of intelligent devices and Internet of Things terminal devices. These devices provide potential attack-vector injection points for network attacks aimed at data-driven algorithms: by exploiting the cyber-physical coupling of the power grid, an attacker can inject attack vectors from the information side or the physical side to attack the data-driven algorithm, for the purpose of profiting or of disrupting the stable operation of the system.
In this context, the attack and defense problem for data-driven algorithms is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to solve the above technical problems, an embodiment of a first aspect of the present invention provides a method for identifying abnormal data in an electric power system.
An embodiment of the second aspect of the present invention provides an apparatus for identifying abnormal data in a power system.
The technical scheme adopted by the invention is as follows:
An embodiment of a first aspect of the present invention provides a method for identifying abnormal data in an electric power system, including the following steps: obtaining measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain; calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result; calculating the phenotype correlation of the clustering tree formed by the data clusters, based on the clustered data clusters and the correlation matrix of the data clusters; and identifying whether the measurement data are abnormal data according to the phenotype correlation.
The identification method of the abnormal data in the power system also has the following additional technical characteristics:
According to one embodiment of the invention, the amplitude of the measurement data in the frequency domain is calculated as the modulus of the discrete Fourier transform of each node's samples, which, consistent with the definitions below, may be written as:
$$A_{k,m} = \left| \sum_{i=1}^{W} x_{i,m}\, e^{-\mathrm{j}\,2\pi k i / W} \right|$$
wherein $A_{k,m}$ is the amplitude of the measurement data in the frequency domain, $W$ is the number of data points within the sampling time window, $X$ is the measurement data matrix, $i$ is the index of a data point within the sampling time window, $m$ is the node number, $x_{i,m}$ is the value of the $i$-th data point of node $m$ in the sampling time window (an element of $X$), $k$ is the coordinate in the frequency domain, $\mathrm{j}$ is the imaginary unit, and $e$ is the base of the natural logarithm.
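According to one embodiment of the present invention, the correlation matrix between the measurement data of the nodes is calculated as the correlation coefficient of the frequency-domain amplitudes, which, consistent with the definitions below, may be written as:
$$M_{mn} = \frac{(A_m - \bar{A}_m)^{T}(A_n - \bar{A}_n)}{\sqrt{(A_m - \bar{A}_m)^{T}(A_m - \bar{A}_m)}\,\sqrt{(A_n - \bar{A}_n)^{T}(A_n - \bar{A}_n)}}, \qquad \bar{A}_m = \frac{1}{W}\sum_{i=1}^{W} A_{i,m}$$
wherein $m$ and $n$ are node numbers, $M$ is the correlation matrix, whose element $M_{mn}$ is the correlation between node $m$ and node $n$, $W$ is the number of data points in the sampling time window, $i$ is the index of a data point in the sampling time window, $A_m$ and $A_n$ are the vectors of frequency-domain amplitudes of the measurement data of node $m$ and node $n$ respectively, $\bar{A}_m$ and $\bar{A}_n$ are the average values of $A_m$ and $A_n$ respectively (subtracted from every component), $A_{i,m}$ is the amplitude of the $i$-th data point of node $m$ in the sampling time window, $A_{i,n}$ is the amplitude of the $i$-th data point of node $n$ in the sampling time window, and $T$ denotes the transpose.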
According to one embodiment of the invention, iteratively aggregating the data clusters within a preset distance by the node data clustering method based on the correlation matrix to generate a clustering result specifically comprises the following steps: initially, each node is individually assigned to its own data cluster, labeled $c_1$ to $c_m$; in step $I$, the data clusters $p$ and $q$ with the highest correlation are selected and aggregated to generate a new data cluster, labeled $c_{m+1}$, and the set of data clusters at the current step $I$ is labeled $c_I$; row $m+1$ and column $m+1$ are added to the correlation matrix, and the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$ is calculated; after $m-1$ iteration steps, all measurement data are divided into 2 classes, the class with fewer members is marked as containing the attack vector, the class with more members is marked as not containing the attack vector, and the label $\rho$ is set to 1 and 0 respectively.
According to one embodiment of the invention, the phenotype correlation of the cluster tree formed by the data clusters is calculated according to the following formula:
[formula not reproduced in this text]
wherein $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, $\alpha$ and $\beta$ are a first parameter and a second parameter respectively, and the remaining terms are the correlation between data clusters $p$ and $q$ in iteration step $i-1$, the correlation between data clusters $p$ and $q$ in iteration step $i$, and the correlation-matrix element for summation indices $i$ and $j$.
According to one embodiment of the present invention, identifying whether the measurement data are abnormal data according to the phenotype correlation includes: if the phenotype correlation is greater than a set value $L_{min}$, judging the measurement data to be abnormal data.
According to one embodiment of the invention, the set value $L_{min}$ is obtained according to the following formula:
[formula not reproduced in this text]
wherein the quantities involved are the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples; s.t. denotes the constraint condition.
An embodiment of a second aspect of the present invention provides an apparatus for identifying abnormal data in an electric power system, including: an acquisition module, configured to acquire measurement data of each node in the power system, transform the measurement data into the frequency domain, and calculate the amplitude of the measurement data in the frequency domain; a clustering module, configured to calculate a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, to iteratively aggregate data clusters within a preset distance to generate a clustering result; a calculation module, configured to calculate the phenotype correlation of the clustering tree formed by the data clusters, based on the clustered data clusters and the correlation matrix of the data clusters; and an identification module, configured to identify whether the measurement data are abnormal data according to the phenotype correlation.
The identification device for the abnormal data in the power system provided by the invention further has the following additional technical characteristics:
According to one embodiment of the present invention, the clustering module is specifically configured to: initially assign each node individually to its own data cluster, labeled $c_1$ to $c_m$; in step $I$, select the data clusters $p$ and $q$ with the highest correlation and aggregate them to generate a new data cluster, labeled $c_{m+1}$, and label the set of data clusters at the current step $I$ as $c_I$; add row $m+1$ and column $m+1$ to the correlation matrix and calculate the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$; and, after $m-1$ iteration steps, divide all measurement data into 2 classes, mark the class with fewer members as containing the attack vector and the class with more members as not containing the attack vector, and set the label $\rho$ to 1 and 0 respectively.
According to one embodiment of the invention, the calculation module calculates the phenotype correlation of the cluster tree formed by the data clusters according to the following formula:
[formula not reproduced in this text]
wherein $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, $\alpha$ and $\beta$ are a first parameter and a second parameter respectively, and the remaining terms are the correlation between data clusters $p$ and $q$ in iteration step $i-1$, the correlation between data clusters $p$ and $q$ in iteration step $i$, and the correlation-matrix element for summation indices $i$ and $j$.
The invention has the beneficial effects that:
By exploiting the physical characteristics of the power system and the correlation among nodes to identify abnormal data, the method and the device can effectively locate nodes that an attacker may have compromised in order to inject attack vectors, thereby improving the safety and reliability of the power system.
Drawings
FIG. 1 is a flow chart of a method of identifying anomaly data in a power system according to one embodiment of the present invention;
FIG. 2 is a flowchart of a method for identifying abnormal data in a power system according to another embodiment of the present invention;
FIG. 3 is a schematic node diagram of an electrical power system according to one specific example of the invention;
FIG. 4 is a sample test results schematic of a power system according to one specific example of the invention;
FIG. 5 is a block diagram of an apparatus for identifying abnormal data in a power system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a method of identifying abnormal data in a power system according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
S1, obtaining measurement data of each node in the power system, transforming the measurement data into the frequency domain, and calculating the amplitude of the measurement data in the frequency domain.
Specifically, the measurement data may be data such as voltage, power angle, active power, reactive power, etc. of each generator node in the power system.
The measurement data may be Fourier transformed to calculate the amplitudes in the frequency domain, which provide the input data for the subsequent correlation matrix. Specifically, the amplitude is the modulus of the discrete Fourier transform of each node's samples, which, consistent with the definitions below, may be written as:
$$A_{k,m} = \left| \sum_{i=1}^{W} x_{i,m}\, e^{-\mathrm{j}\,2\pi k i / W} \right|$$
wherein $A_{k,m}$ is the amplitude of the measurement data in the frequency domain, $W$ is the number of data points within the sampling time window, $X$ is the measurement data matrix, $i$ is the index of a data point within the sampling time window, $m$ is the node number, $x_{i,m}$ is the value of the $i$-th data point of node $m$ in the sampling time window (an element of $X$), $k$ is the coordinate in the frequency domain, $\mathrm{j}$ is the imaginary unit, and $e$ is the base of the natural logarithm.
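As an illustration of step S1, the following minimal Python sketch computes the modulus of the discrete Fourier transform for each node's sampling window. The function names `dft_amplitude` and `amplitude_matrix` are hypothetical, no normalization is applied, and the indexing convention is an assumption; the source does not specify these details.

```python
import cmath


def dft_amplitude(samples):
    """Return |DFT| of one node's samples for k = 0..W-1 (no normalization)."""
    W = len(samples)
    amplitudes = []
    for k in range(W):
        # Starting the sample index at 0 instead of 1 only changes the phase,
        # not the modulus that is returned here.
        total = sum(x * cmath.exp(-2j * cmath.pi * k * i / W)
                    for i, x in enumerate(samples))
        amplitudes.append(abs(total))
    return amplitudes


def amplitude_matrix(X):
    """X[m] is the sample list of node m; returns one amplitude list per node."""
    return [dft_amplitude(node_samples) for node_samples in X]
```

For example, `amplitude_matrix([[1, 1, 1, 1], [1, 0, -1, 0]])` yields one amplitude vector per node, each with as many entries as the window has samples.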
S2, calculating a correlation matrix among the measurement data of the nodes according to the amplitude of the measurement data in the frequency domain and, using a node data clustering method based on the correlation matrix, iteratively aggregating data clusters within a preset distance to generate a clustering result.
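According to one embodiment of the present invention, the correlation matrix between the measurement data of the nodes is calculated as the correlation coefficient of the frequency-domain amplitudes, which, consistent with the definitions below, may be written as:
$$M_{mn} = \frac{(A_m - \bar{A}_m)^{T}(A_n - \bar{A}_n)}{\sqrt{(A_m - \bar{A}_m)^{T}(A_m - \bar{A}_m)}\,\sqrt{(A_n - \bar{A}_n)^{T}(A_n - \bar{A}_n)}}, \qquad \bar{A}_m = \frac{1}{W}\sum_{i=1}^{W} A_{i,m}$$
wherein $m$ and $n$ are node numbers, $M$ is the correlation matrix, which is a symmetric matrix whose element $M_{mn}$ is the correlation between node $m$ and node $n$, $W$ is the number of data points in the sampling time window, $i$ is the index of a data point in the sampling time window, $A_m$ and $A_n$ are the vectors of frequency-domain amplitudes of the measurement data of node $m$ and node $n$ respectively, $\bar{A}_m$ and $\bar{A}_n$ are the average values of $A_m$ and $A_n$ respectively (subtracted from every component), $A_{i,m}$ is the amplitude of the $i$-th data point of node $m$ in the sampling time window, $A_{i,n}$ is the amplitude of the $i$-th data point of node $n$ in the sampling time window, and $T$ denotes the transpose.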
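A correlation matrix of this kind can be sketched in a few lines of Python. The sketch assumes the correlation is the ordinary Pearson coefficient between two nodes' amplitude vectors, which matches the mean-centering and transpose products named in the definitions; the names `pearson` and `correlation_matrix`, and the choice to return 0 for a constant vector, are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length amplitude vectors."""
    W = len(a)
    mean_a = sum(a) / W
    mean_b = sum(b) / W
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da)) * math.sqrt(sum(q * q for q in db))
    # Assumption: a constant vector (zero variance) is treated as uncorrelated.
    return num / den if den > 0 else 0.0


def correlation_matrix(A):
    """A[m] is the amplitude vector of node m; returns the symmetric matrix M."""
    n = len(A)
    return [[pearson(A[m], A[k]) for k in range(n)] for m in range(n)]
```

Because `pearson(a, b)` equals `pearson(b, a)`, the resulting matrix is symmetric, as the text states for M.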
According to one embodiment of the present invention, as shown in fig. 2, iteratively aggregating the data clusters within a preset distance by the node data clustering method based on the correlation matrix to generate a clustering result specifically includes:
S21, initially, each node is individually assigned to its own data cluster, labeled $c_1$ to $c_m$; in step $I$, the data clusters $p$ and $q$ with the highest correlation are selected and aggregated to generate a new data cluster, labeled $c_{m+1}$, and the set of data clusters at the current step $I$ is labeled $c_I$ (the formulas defining the selected pair and the cluster set are not reproduced in this text).
S22, row $m+1$ and column $m+1$ are added to the correlation matrix, and the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$ is calculated (formula not reproduced in this text), wherein the formula uses, for each data cluster involved, the total number of sub-data clusters it contains.
S23, after $m-1$ iteration steps, all measurement data are divided into 2 classes; the class with fewer members is marked as containing attack vectors, the class with more members is marked as not containing attack vectors, and the label $\rho$ is set to 1 and 0 respectively.
It can be understood that the attack resources of an attacker are limited, and the probability that more than 50% of the node measurement data in the system are tampered with is extremely low. Therefore, after clustering, the class with fewer members is marked as containing attack vectors, the class with more members is marked as not containing attack vectors, and the label $\rho$ is set to 1 and 0 respectively.
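Steps S21 to S23 can be sketched as a small agglomerative procedure. The version below is an illustration under stated assumptions, not the patented update rule: the correlation between a merged cluster and any other cluster is taken as the size-weighted average of the two merged clusters' correlations (suggested by the reference to the number of sub-data clusters, but not confirmed by the source), merging stops once two clusters remain, and the name `cluster_nodes` is hypothetical.

```python
def cluster_nodes(M):
    """Merge the most correlated clusters until two remain.

    Returns (merges, rho): merges lists (p, q, correlation, new_id) per step,
    rho[i] is 1 if node i falls in the smaller final class, else 0.
    """
    n = len(M)
    members = {i: [i] for i in range(n)}  # cluster id -> node indices
    corr = {(i, j): M[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(members) > 2:
        # Pick the pair of current clusters with the highest correlation.
        (p, q), c = max(corr.items(), key=lambda kv: kv[1])
        n_p, n_q = len(members[p]), len(members[q])
        for s in members:
            if s in (p, q):
                continue
            c_ps = corr.pop((min(p, s), max(p, s)))
            c_qs = corr.pop((min(q, s), max(q, s)))
            # Assumed update: size-weighted average of the merged clusters.
            corr[(s, next_id)] = (n_p * c_ps + n_q * c_qs) / (n_p + n_q)
        del corr[(p, q)]
        members[next_id] = members.pop(p) + members.pop(q)
        merges.append((p, q, c, next_id))
        next_id += 1
    nodes_a, nodes_b = members.values()
    # On a size tie the second cluster is arbitrarily treated as the minority.
    minority = nodes_a if len(nodes_a) < len(nodes_b) else nodes_b
    rho = [1 if i in minority else 0 for i in range(n)]
    return merges, rho
```

With four nodes where three are strongly correlated and one is not, the sketch isolates the odd node as the minority class and labels it 1.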
S3, calculating the phenotype correlation L of a clustering tree formed by the data clusters based on the clustered data clusters and the correlation matrix of the data clusters.
Specifically, the phenotype correlation of the cluster tree formed by the data clusters is used to characterize whether the clustering result reflects the differences between the data of the nodes, and it is calculated according to the following formula:
[formula not reproduced in this text]
wherein $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, $\alpha$ and $\beta$ are a first parameter and a second parameter respectively, and the remaining terms are the correlation between data clusters $p$ and $q$ in iteration step $i-1$, the correlation between data clusters $p$ and $q$ in iteration step $i$, and the correlation-matrix element for summation indices $i$ and $j$.
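For orientation only: in the hierarchical-clustering literature, how faithfully a cluster tree preserves the pairwise measure it was built from is commonly scored with the cophenetic correlation coefficient, and "phenotype correlation" here may be a translation of that term. Whether the formula of the invention coincides with the standard definition below is an assumption; the symbols $d$, $t$ and $c$ are illustrative and are not the $\alpha$, $\beta$ and $L$ of the text.

```latex
c = \frac{\sum_{a<b} (d_{ab} - \bar{d})\,(t_{ab} - \bar{t})}
         {\sqrt{\sum_{a<b} (d_{ab} - \bar{d})^{2}\;\sum_{a<b} (t_{ab} - \bar{t})^{2}}}
```

Here $d_{ab}$ is the original pairwise measure between nodes $a$ and $b$, $t_{ab}$ is the level of the tree at which $a$ and $b$ are first joined into one cluster, and $\bar{d}$ and $\bar{t}$ are their averages over all node pairs.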
S4, identifying whether the measurement data are abnormal data according to the phenotype correlation L.
The closer L is to 1, the larger the difference between the data, that is, the more likely it is that an attack vector is present in the data. There is therefore a certain set value $L_{min}$: when the phenotype correlation is greater than $L_{min}$, the measurement data are taken to contain an attack vector and are judged to be abnormal data.
$L_{min}$ needs to be determined from a large amount of historical and simulation data; the optimal $L_{min}$ can be obtained by solving the following optimization problem, thereby maximizing the discrimination accuracy for abnormal data.
According to one embodiment of the present invention, the set value $L_{min}$ is obtained according to the following formula:
[formula not reproduced in this text]
wherein the quantities involved are the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples; s.t. denotes the constraint condition.
Once $L_{min}$ has been determined, whether the measurement data contain an attack vector can be judged in practical applications by comparing L with $L_{min}$.
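One way to picture the threshold selection is a simple search over labeled historical samples. The sketch below is an assumption-laden illustration: each error rate is taken as the misjudged count divided by the class total, the objective is the overall error rate, and the optional cap `max_normal_error` stands in for the constraint, whose actual form is not reproduced in the source. The names `error_rates` and `choose_threshold` are hypothetical.

```python
def error_rates(L_normal, L_abnormal, L_min):
    """Return (e_n, e_a, e) for a candidate threshold L_min."""
    # A normal sample is misjudged when flagged (L > L_min);
    # an abnormal sample is misjudged when missed (L <= L_min).
    n_err = sum(1 for L in L_normal if L > L_min)
    a_err = sum(1 for L in L_abnormal if L <= L_min)
    e_n = n_err / len(L_normal)
    e_a = a_err / len(L_abnormal)
    e = (n_err + a_err) / (len(L_normal) + len(L_abnormal))
    return e_n, e_a, e


def choose_threshold(L_normal, L_abnormal, max_normal_error=1.0):
    """Pick the candidate threshold with the lowest overall error rate.

    max_normal_error is a hypothetical constraint on the normal-sample
    error rate; the default of 1.0 leaves the search unconstrained.
    """
    candidates = sorted(set(L_normal) | set(L_abnormal))
    best = None
    for L_min in candidates:
        e_n, e_a, e = error_rates(L_normal, L_abnormal, L_min)
        if e_n > max_normal_error:
            continue
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or e < best[1]:
            best = (L_min, e)
    return best
```

Keeping the lowest threshold among equally good candidates flags more samples as abnormal, which is one possible reading of a more aggressive discrimination mode.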
To illustrate the technical effects of the identification method of the invention more intuitively, the method is applied to the 39-node system shown in fig. 3. One measuring unit is deployed on each of the 10 generators in the system, numbered M1 to M10, to measure data such as the voltage, power angle, active power and reactive power of each generator node. The attack target is a post-fault transient stability prediction algorithm for the system, whose input is the voltage measurement data of each node within a certain time window after the fault and whose output is the system stability prediction result; the attack mode is to induce distributed power supplies on the user side to inject attack vectors into the power system.
600 groups of normal samples are randomly selected from a data set constructed for the attack scenario; for these samples, the disturbance injected into the distributed power supply is calculated, yielding the attack vector injected into the power system by the distributed power supply and the final attack effect. 120 groups of samples in which the attack succeeded are selected, and the measurement data in these samples are replaced by the attack vectors to obtain abnormal samples; the other samples remain unchanged. The phenotype correlation of the normal sample data and of the abnormal sample data containing attack vectors is shown in (a) of fig. 4, and the number of nodes identified as hosting possibly intruded distributed power supplies is shown in (b) of fig. 4. The phenotype correlation of the abnormal sample data is mainly distributed in the interval (0.94, 1). To avoid missed detection of abnormal samples as far as possible, a relatively aggressive discrimination mode is adopted, and the set value $L_{min}$ is obtained by optimization accordingly. In most abnormal samples the number of nodes judged to be intruded is 3, which equals the number in the attack scheme selected by the attacker in the attack scenario.
Therefore, with the abnormal data identification method of the invention, the detection rate of abnormal samples is 92.5%, the nodes where the intruded distributed power supplies are located can be effectively identified in the abnormal samples, and the intruded nodes are misidentified in only a small fraction of the abnormal samples.
In summary, the method for identifying abnormal data in a power system according to the embodiment of the invention obtains measurement data of each node in the power system, transforms the measurement data into the frequency domain, and calculates the amplitude of the measurement data in the frequency domain. It then calculates a correlation matrix among the measurement data of the nodes from those amplitudes and, using a node data clustering method based on the correlation matrix, iteratively aggregates data clusters within a preset distance to generate a clustering result. Based on the clustered data clusters and their correlation matrix, it calculates the phenotype correlation of the clustering tree formed by the data clusters, and finally identifies whether the measurement data are abnormal data according to the phenotype correlation. Abnormal data are thus identified by exploiting the physical characteristics of the power system and the correlation among nodes, so that nodes an attacker may have compromised in order to inject attack vectors can be located effectively, improving the safety and reliability of the power system.
Corresponding to the identification method of the abnormal data in the electric power system, the invention also provides an identification device of the abnormal data in the electric power system. Since the device embodiment of the present invention corresponds to the above-described method embodiment, for details not disclosed in the device embodiment, reference may be made to the above-described method embodiment, and details are not repeated in the present invention.
Fig. 5 is a block diagram of an apparatus for identifying abnormal data in a power system according to an embodiment of the present invention, as shown in fig. 5, the apparatus includes: the system comprises an acquisition module 100, a clustering module 200, a calculation module 300 and an identification module 400.
The acquisition module 100 is configured to acquire measurement data of each node in the power system, transform the measurement data into a frequency domain, and calculate an amplitude of the measurement data in the frequency domain; the clustering module 200 is configured to calculate a correlation matrix between measurement data of each node according to an amplitude value of the measurement data in a frequency domain, and gradually aggregate data clusters within a preset distance in an iterative manner to generate a clustering result; the calculating module 300 is used for calculating the phenotype correlation of the clustering tree formed by the data clusters based on the clustered data clusters and the correlation matrix of the data clusters; the identification module 400 is configured to identify whether the measurement data is abnormal data according to the phenotype association.
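The division into four modules can be pictured as a small pipeline object. The sketch below only wires the stages together; the class name `AnomalyIdentifier`, its field names, and the convention that the clustering stage returns the correlation matrix together with the cluster tree are all assumptions made for illustration, and the stage functions themselves are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class AnomalyIdentifier:
    acquire: Callable[[Any], Any]              # role of acquisition module 100
    cluster: Callable[[Any], Tuple[Any, Any]]  # role of clustering module 200
    score: Callable[[Any, Any], float]         # role of calculation module 300
    L_min: float                               # threshold used by module 400

    def identify(self, X) -> bool:
        """Return True when the measurement data X are judged abnormal."""
        A = self.acquire(X)        # frequency-domain amplitudes
        M, tree = self.cluster(A)  # correlation matrix and cluster tree
        L = self.score(tree, M)    # phenotype correlation of the tree
        return L > self.L_min      # decision of the identification module
```

Passing the stages in as callables keeps each module replaceable, which mirrors the way the text describes the modules separately.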
According to one embodiment of the invention, the clustering module 200 is specifically configured to: initially assign each node individually to its own data cluster, labeled $c_1$ to $c_m$; in step $I$, select the data clusters $p$ and $q$ with the highest correlation and aggregate them to generate a new data cluster, labeled $c_{m+1}$, and label the set of data clusters at the current step $I$ as $c_I$; add row $m+1$ and column $m+1$ to the correlation matrix and calculate the correlation between data cluster $c_{m+1}$ and each other data cluster $c_s$; and, after $m-1$ iteration steps, divide all measurement data into 2 classes, mark the class with fewer members as containing the attack vector and the class with more members as not containing the attack vector, and set the label $\rho$ to 1 and 0 respectively.
According to one embodiment of the invention, the calculation module 300 calculates the phenotype correlation of the cluster tree formed by the data clusters according to the following formula:
[formula not reproduced in this text]
wherein $L$ is the phenotype correlation, $M$ is the correlation matrix, $I$ is the iteration step number, $i$ and $j$ are summation indices, $p$ and $q$ are data cluster numbers, $\alpha$ and $\beta$ are a first parameter and a second parameter respectively, and the remaining terms are the correlation between data clusters $p$ and $q$ in iteration step $i-1$, the correlation between data clusters $p$ and $q$ in iteration step $i$, and the correlation-matrix element for summation indices $i$ and $j$.
According to one embodiment of the present invention, the identification module 400 is specifically configured to: judge the measurement data to be abnormal data if the phenotype correlation is greater than a set value $L_{min}$.
The identification module 400 obtains the set value $L_{min}$ according to the following formula:
[formula not reproduced in this text]
wherein the quantities involved are the discrimination error rate for normal samples, the total number of normal samples, the number of misjudged normal samples, the discrimination error rate for abnormal samples, the total number of abnormal samples, the number of misjudged abnormal samples, and the overall discrimination error rate of the samples.
In summary, according to the identification device for the abnormal data in the electric power system provided by the embodiment of the invention, the abnormal data is identified by utilizing the correlation between the physical characteristics and the nodes in the electric power system, so that the nodes possibly invaded by an attacker for injecting the attack vector can be effectively searched, and the safety and the reliability of the electric power system are improved.
In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The meaning of "a plurality of" is two or more, unless specifically defined otherwise.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. Alternatively, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A method for identifying abnormal data in an electric power system, characterized by comprising the following steps:
obtaining measurement data of each node in the power system, transforming the measurement data into a frequency domain, and calculating the amplitude of the measurement data in the frequency domain;
calculating a correlation matrix among the measured data of each node according to the amplitude of the measured data in the frequency domain, and gradually aggregating data clusters within a preset distance in an iterative mode to generate a clustering result by a node data clustering method based on the correlation matrix;
based on the clustered data clusters and the correlation matrix of the data clusters, calculating the phenotype correlation of a clustering tree formed by the data clusters;
identifying whether the measured data is abnormal data according to the phenotype correlation;
specifically, the correlation matrix between the measurement data of the nodes is calculated according to the following formula:
wherein m and n are node numbers; M is the correlation matrix, whose elements are the correlations between node m and node n; W is the number of data points in the sampling time window; i is the index of a data point within the sampling time window; T denotes the matrix transpose; and the remaining symbols denote the amplitudes of the measurement data of node m and node n in the frequency domain, the mean values of those amplitudes, the amplitude of the i-th data point of node m in the sampling time window, and the amplitude of the i-th data point of node n in the sampling time window;
the node data clustering method based on the correlation matrix gradually aggregates data clusters within a preset distance in an iterative mode to generate a clustering result, and specifically comprises the following steps:
initially, each node is individually assigned to one data cluster, and the data clusters are labeled c_1 to c_m; in the I-th step, the data clusters p and q with the highest correlation are selected and aggregated to generate a new data cluster, denoted c_(m+1), and the set of data clusters at the current step I is denoted c_I;
an (m+1)-th row and an (m+1)-th column are added to the correlation matrix, and the correlation between the data cluster c_(m+1) and each other data cluster c_s is calculated;
after m-1 iteration steps, all measurement data are divided into 2 classes; the class with fewer members is marked as containing attack vectors and the class with more members is marked as not containing attack vectors, with the label rho set to 1 and 0, respectively;
the phenotype correlation of the cluster tree formed by the data clusters is calculated according to the following formula:
wherein L is the phenotype correlation; M is the correlation matrix; I is the iteration step number; i and j are summation indices; p and q are data cluster numbers; alpha and beta are a first parameter and a second parameter, respectively; and the remaining symbols denote, in order, the correlation matrix of the data clusters p and q at iteration step i-1, the correlation matrix of the data clusters p and q at the i-th iteration step, and the correlation matrix for the summation indices i and j.
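The clustering steps of claim 1 can be illustrated with a minimal sketch. The formula images are not reproduced in this text, so the sketch rests on two assumptions that are not taken from the patent: the correlation between two nodes is computed as the Pearson correlation of their frequency-domain amplitudes, and the correlation between a merged cluster and another cluster is the average over their member node pairs. All function names are hypothetical, and for brevity the loop stops once two clusters remain rather than extending the matrix with an (m+1)-th row and column.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length amplitude sequences (assumed form)."""
    w = len(a)
    mean_a, mean_b = sum(a) / w, sum(b) / w
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0

def correlation_matrix(amps):
    """Build the node-by-node correlation matrix M from per-node amplitudes."""
    m = len(amps)
    return [[pearson(amps[r], amps[c]) for c in range(m)] for r in range(m)]

def cluster_into_two(M):
    """Repeatedly merge the most correlated pair of clusters until two remain."""
    clusters = [[node] for node in range(len(M))]

    def linkage(p, q):
        # Assumption: average correlation over all member node pairs.
        return sum(M[a][b] for a in p for b in q) / (len(p) * len(q))

    while len(clusters) > 2:
        _, p, q = max(
            ((linkage(clusters[p], clusters[q]), p, q)
             for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda t: t[0],
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters

def label_nodes(clusters, m):
    """Label rho: 1 for nodes in the smaller class, 0 for the larger class."""
    minority = min(clusters, key=len)
    return [1 if node in minority else 0 for node in range(m)]

# Made-up amplitudes: nodes 0-2 follow a similar profile, node 3 does not.
amps = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
    [4.0, 1.0, 3.5, 0.5],
]
M = correlation_matrix(amps)
clusters = cluster_into_two(M)
labels = label_nodes(clusters, len(amps))
print(labels)
```

With this illustrative input, node 3 ends up alone in the smaller class and receives the label 1, while the other three nodes receive 0.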
2. The method for identifying abnormal data in a power system according to claim 1, wherein the magnitude of the measured data in the frequency domain is calculated according to the following formula:
wherein W is the number of data points within the sampling time window; X is the measurement data matrix; i is the index of a data point within the sampling time window; m is the node number; k is the coordinate in the frequency domain; e is the base of the natural logarithm; and the remaining symbols denote the amplitude of the measurement data in the frequency domain and the value of the i-th data point of node m in the sampling time window.
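The symbols listed in claim 2 (a window of W data points, a frequency coordinate k, and the base e of the natural logarithm) are consistent with a discrete Fourier transform. The following is a minimal sketch under that assumption; the formula itself is not reproduced in this text, so the unnormalized DFT used here, the function name `dft_amplitude`, and the sample values are illustrative assumptions rather than the claimed formula.

```python
import cmath
import math

def dft_amplitude(samples):
    """Return the amplitude |X(k)| for k = 0..W-1 of one node's sampling window.

    Assumes the standard unnormalized DFT: X(k) = sum_i x_i * e^(-j*2*pi*k*i/W).
    """
    w = len(samples)
    amplitudes = []
    for k in range(w):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / w)
            for i, x in enumerate(samples)
        )
        amplitudes.append(abs(acc))
    return amplitudes

# Illustrative window of W = 4 measurements for a single node.
window = [1.0, 0.0, -1.0, 0.0]
amps = dft_amplitude(window)
print([round(a, 6) for a in amps])
```

Any scaling factor such as 1/W would change the amplitudes only by a constant, which leaves a Pearson-style correlation between nodes unchanged.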
3. The method according to claim 1, wherein identifying whether the measured data is abnormal data according to the phenotype correlation comprises:
if the phenotype correlation is greater than a set value L_min, judging the measured data to be abnormal data.
4. The method for identifying abnormal data in the electric power system according to claim 3, characterized in that the set value L_min is obtained according to the following formula:
wherein the symbols denote, in order: the discrimination error rate for normal samples; the total number of normal samples; the number of misjudged normal samples; the discrimination error rate for abnormal samples; the total number of abnormal samples; the number of misjudged abnormal samples; and the overall discrimination error rate of the samples; and s.t. denotes the constraint.
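The quantities defined in claim 4 can be illustrated with a small sketch. The objective and the constraint of the formula are not reproduced in this text, so the sketch assumes, purely for illustration, that L_min is picked from candidate values to minimize the overall discrimination error rate and it ignores the constraint. The function names and the sample scores are hypothetical.

```python
def error_rates(scores_normal, scores_abnormal, threshold):
    """Return (normal, abnormal, overall) error rates for one candidate threshold.

    A sample is judged abnormal when its score is greater than the threshold,
    matching the decision rule of claim 3.
    """
    wrong_normal = sum(1 for s in scores_normal if s > threshold)       # normal judged abnormal
    wrong_abnormal = sum(1 for s in scores_abnormal if s <= threshold)  # abnormal judged normal
    n_normal, n_abnormal = len(scores_normal), len(scores_abnormal)
    overall = (wrong_normal + wrong_abnormal) / (n_normal + n_abnormal)
    return wrong_normal / n_normal, wrong_abnormal / n_abnormal, overall

def pick_threshold(scores_normal, scores_abnormal):
    """Assumed selection rule: the candidate with the lowest overall error rate."""
    candidates = sorted(set(scores_normal) | set(scores_abnormal))
    return min(
        candidates,
        key=lambda t: error_rates(scores_normal, scores_abnormal, t)[2],
    )

# Made-up phenotype-correlation scores for labeled samples.
normal_scores = [0.10, 0.20, 0.25, 0.30]
abnormal_scores = [0.60, 0.70, 0.28]
l_min = pick_threshold(normal_scores, abnormal_scores)
rates = error_rates(normal_scores, abnormal_scores, l_min)
print(l_min, rates)
```

When several candidates tie on the overall error rate, this sketch keeps the smallest one; a real constraint, such as a cap on one of the two per-class error rates, would narrow that choice.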
5. An apparatus for identifying abnormal data in an electric power system, comprising:
the acquisition module is used for acquiring measurement data of each node in the power system, converting the measurement data into a frequency domain and calculating the amplitude of the measurement data in the frequency domain;
the clustering module is used for calculating a correlation matrix among the measured data of each node according to the amplitude of the measured data in the frequency domain, and for gradually aggregating data clusters within a preset distance in an iterative mode, by a node data clustering method based on the correlation matrix, to generate a clustering result;
the calculation module is used for calculating phenotype correlation of a clustering tree formed by the data clusters based on the clustered data clusters and a correlation matrix of the data clusters;
the identification module is used for identifying whether the measurement data are abnormal data according to the phenotype correlation;
the clustering module specifically calculates a correlation matrix between measurement data of each node according to the following formula:
wherein m and n are node numbers; M is the correlation matrix, whose elements are the correlations between node m and node n; W is the number of data points in the sampling time window; i is the index of a data point within the sampling time window; T denotes the matrix transpose; and the remaining symbols denote the mean values of the frequency-domain amplitudes of node m and node n, the amplitude of the i-th data point of node m in the sampling time window, and the amplitude of the i-th data point of node n in the sampling time window;
the clustering module is specifically configured to:
initially, assign each node individually to one data cluster, the data clusters being labeled c_1 to c_m; in the I-th step, select the data clusters p and q with the highest correlation and aggregate them to generate a new data cluster, denoted c_(m+1), the set of data clusters at the current step I being denoted c_I;
add an (m+1)-th row and an (m+1)-th column to the correlation matrix, and calculate the correlation between the data cluster c_(m+1) and each other data cluster c_s;
after m-1 iteration steps, divide all measurement data into 2 classes, mark the class with fewer members as containing attack vectors and the class with more members as not containing attack vectors, and set the label rho to 1 and 0, respectively;
the calculation module calculates the phenotype correlation of a clustering tree formed by the data clusters according to the following formula:
wherein L is the phenotype correlation; M is the correlation matrix; I is the iteration step number; i and j are summation indices; p and q are data cluster numbers; alpha and beta are a first parameter and a second parameter, respectively; and the remaining symbols denote, in order, the correlation matrix of the data clusters p and q at iteration step i-1, the correlation matrix of the data clusters p and q at the i-th iteration step, and the correlation matrix for the summation indices i and j.
CN202311152729.6A 2023-09-08 2023-09-08 Identification method and identification device for abnormal data in electric power system Active CN116881746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311152729.6A CN116881746B (en) 2023-09-08 2023-09-08 Identification method and identification device for abnormal data in electric power system

Publications (2)

Publication Number Publication Date
CN116881746A CN116881746A (en) 2023-10-13
CN116881746B true CN116881746B (en) 2023-11-14

Family

ID=88260938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311152729.6A Active CN116881746B (en) 2023-09-08 2023-09-08 Identification method and identification device for abnormal data in electric power system

Country Status (1)

Country Link
CN (1) CN116881746B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010095314A1 (en) * 2009-02-17 2010-08-26 株式会社日立製作所 Abnormality detecting method and abnormality detecting system
CN110674864A (en) * 2019-09-20 2020-01-10 国网上海市电力公司 Wind power abnormal data identification method with synchronous phasor measurement device
CN111382862A (en) * 2018-12-27 2020-07-07 国网辽宁省电力有限公司信息通信分公司 Method for identifying abnormal data of power system
CN114418006A (en) * 2022-01-21 2022-04-29 广东电网有限责任公司 Abnormal data detection method and device
CN114827211A (en) * 2022-05-13 2022-07-29 浙江启扬智能科技有限公司 Abnormal monitoring area detection method driven by node data of Internet of things
JP7240691B1 (en) * 2021-09-08 2023-03-16 山東大学 Data drive active power distribution network abnormal state detection method and system

Also Published As

Publication number Publication date
CN116881746A (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant