CN116304776B - Power grid data value anomaly detection method and system based on k-Means algorithm - Google Patents

Info

Publication number
CN116304776B
Authority
CN
China
Prior art keywords
power grid
data
value
hardware
cluster center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310278784.3A
Other languages
Chinese (zh)
Other versions
CN116304776A (en)
Inventor
翁东雷
赵铁林
王露民
莫建国
卢俊
林维修
邱云
唐金祥
李开文
邬霄雷
沈一鹏
张贵中
方凯伦
周行
周冬升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Power Transmission And Transformation Construction Co ltd Operation And Maintenance Branch
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Ningbo Power Transmission And Transformation Construction Co ltd Operation And Maintenance Branch
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Power Transmission And Transformation Construction Co ltd Operation And Maintenance Branch, Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202310278784.3A
Publication of CN116304776A
Application granted
Publication of CN116304776B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The disclosure relates to the field of smart grids, and in particular to a method and system for detecting abnormal power grid data values based on the k-Means algorithm. When cluster centers are marked, the method determines whether the hardware monitoring is faulty or merely exhibits a data offset. Cluster centers caused by hardware faults are removed during classification, while data that comes from problematic hardware but can be used normally after correction is offset-corrected and then used to form the cluster centers. After a new cluster center is formed, the method determines whether sample points deviating from it fall within a confidence threshold obtained through probability statistical analysis. Combined with the hardware offset correction and the use of expected values as initial centers, this yields more accurate anomaly detection than the existing k-Means algorithm.

Description

Power grid data value anomaly detection method and system based on k-Means algorithm
Technical Field
The disclosure relates to the field of smart grids, and in particular to a method and system for detecting abnormal power grid data values based on the k-Means algorithm.
Background
The k-Means clustering algorithm detects abnormal values as follows. First, the distance between each sample and each center point is calculated; each sample point is assigned to the cluster whose center is closest, and the cluster centers are then updated. These steps are repeated until the centers converge stably. Second, the distance from each point in a cluster to the center of that cluster is calculated and compared with a threshold: if the distance exceeds the threshold, the point is considered abnormal; otherwise it is considered normal.
For identifying abnormal data and managing data with the K-means algorithm, the State Grid Corporation proposed, in 2015, a cluster-based method for identifying and classifying power grid operation monitoring information. The method introduces cluster analysis into power grid monitoring: historically stored power grid monitoring alarm signals are preprocessed and converted into a document set of effective alarm signals, a cluster analysis method is used to build the corresponding spatial feature vectors, and the K-means algorithm is used to compute the spatial feature vectors of typical alarm signals. When a new alarm signal appears, it can be classified by computing the similarity between the new alarm information and the spatial feature vectors of the typical alarm signals. This automates the classification of power grid monitoring information, improves the efficiency of identifying equipment alarm signals, prevents signals from being missed or misidentified, and helps ensure the safe and stable operation of the power grid. In addition, North China Electric Power University carried out research on distribution network region classification based on sparse denoising autoencoders and clustering algorithms (reported in the journal Electric Power Information and Communication Technology), in which the K-means algorithm is used to cluster feature sequences and obtain the region types.
However, grid data can deviate for reasons other than genuine anomalies. The hardware may be at fault (for example, sensor drift, where the readings can be used normally after correction and should not actually be treated as a separate cluster), or normal data may deviate during part of a period, for example because of an emergency shutdown or a regional voltage overload, so that normal load data goes beyond its usual range of variation. Because the conventional K-means algorithm performs cluster analysis only on the data itself, such cases can lead to inaccurate clustering, errors in the computed cluster centers, and misidentification of abnormal information.
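As a point of reference, the conventional procedure just described (Lloyd iteration, then a distance-to-center threshold test) can be sketched in Python as follows. This is a minimal sketch for scalar readings only; the function names, the random choice of initial centers, and the fixed `threshold` value are illustrative assumptions, not part of the disclosure.

```python
import random


def kmeans_1d(values, k, max_iter=100, tol=1e-9, seed=0):
    """Plain Lloyd iteration on scalar data; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # baseline behaviour: random initial centers
    for _ in range(max_iter):
        # Assign every sample to the cluster whose center is closest.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:  # centers have stopped moving: converged
            break
    return centers, labels


def flag_outliers(values, centers, labels, threshold):
    """A sample is abnormal when its distance to its own cluster center exceeds threshold."""
    return [abs(v - centers[lab]) > threshold for v, lab in zip(values, labels)]
```

Because the initial centers are drawn from the samples themselves, a problematic reading can become an initial center; this is the weakness the disclosure addresses by starting from historical expected values instead.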
Disclosure of Invention
The disclosure provides a method and system for detecting abnormal power grid data values based on the K-Means algorithm, which address the problems that arise when the conventional K-Means algorithm clusters only the data itself: inaccurate clustering, errors in the computed cluster centers, and misidentification of abnormal information.
As an aspect of the embodiments of the present disclosure, there is provided a method for detecting an anomaly of a power grid data value based on a k-Means algorithm, including:
S10, acquiring at least one type of power grid data value;
S20, calculating the expected value under the statistical law of historical power grid data values, and taking the expected value as an initial cluster center;
S30, acquiring the operating condition of the power grid equipment, and determining from the operating condition whether the equipment has a hardware fault or a monitoring data offset; if there is a hardware fault, directly identifying the power grid data value as abnormal data; if there is a monitoring data offset, applying offset correction to the cluster center of the power grid data value to obtain a corrected initial cluster center;
S40, marking the sample points in the power grid data values as data of the new cluster centers according to their distances from the cluster centers;
S50, determining new cluster centers from the marked data, and repeating steps S20 to S50 until the change in the cluster centers is stable, to determine the final cluster centers;
S60, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal.
Preferably, the operating condition of the power grid equipment includes hardware monitoring verification data obtained after hardware testing. The hardware monitoring verification data is the stability over time of the hardware's actual monitoring results relative to its standard output: if the stability is high, a monitoring data offset is determined; if the stability is low, a hardware fault is determined.
Preferably, determining the stability over time of the hardware's actual monitoring results relative to its standard output includes:
calculating the difference between the actual monitoring result of the given hardware and the hardware's standard output;
determining whether the rate of change of the difference over time is monotonic;
if it is monotonic, determining whether the ratio of the difference to the average of the differences within a preset time is close to 1; if so, the stability is determined to be high, and if the ratio is far from 1, the stability is determined to be low.
Preferably, after the stability is determined to be high, the method further includes:
correcting the cluster center using the average of the differences as the correction amount, to form a corrected initial cluster center.
Preferably, calculating the expected value under the statistical law of historical power grid data values includes:
performing probability statistical analysis with at least one type of power grid data value from the same time period on a plurality of historical days as the sample, and calculating the expected value of the normal distribution model for that period.
Preferably, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal includes obtaining the confidence threshold: the variance estimate of the normal distribution model for the period is calculated, and a confidence level is then set to obtain the confidence threshold for the load level of that period;
and determining whether the distance between the power grid data value and the final cluster center is within the confidence threshold; if the distance exceeds the confidence threshold, the data is determined to be abnormal.
Preferably, the method further includes: if the distance between the power grid data value and the final cluster center does not exceed the confidence threshold, determining that the data is normal.
As another aspect of the embodiments of the present disclosure, there is provided a system for detecting abnormal power grid data values based on the k-Means algorithm, the system including:
a power grid data value acquisition unit, which acquires at least one type of power grid data value;
an expected value determination unit, which calculates the expected value under the statistical law of historical power grid data values and takes the expected value as an initial cluster center;
an operating condition determination unit, which acquires the operating condition of the power grid equipment and determines from it whether the equipment has a hardware fault or a monitoring data offset; if there is a hardware fault, the power grid data value is directly identified as abnormal data, and if there is a monitoring data offset, offset correction is applied to the cluster center of the power grid data value to obtain a corrected initial cluster center;
a distance marking unit, which marks the sample points in the power grid data values as data of the new cluster centers according to their distances from the cluster centers;
a new cluster center stability determination unit, which determines new cluster centers from the marked data until the change in the cluster centers is stable, to determine the final cluster centers;
and a data abnormality determination unit, which compares the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal.
As another aspect of the embodiments of the present disclosure, there is also provided an electronic device including:
a processor; and a memory for storing computer-executable instructions, the processor being in communication with the memory;
wherein the processor is configured to invoke the computer-executable instructions stored in the memory to perform the method described above.
As another aspect of the embodiments of the present disclosure, there is also provided a computer-readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-mentioned method.
Compared with applying the traditional K-means algorithm to anomaly detection of power grid data values, the embodiments of the disclosure have the following advantages: 1. The expected value of the power grid data values, obtained through probability statistical analysis, is used as the initial cluster center, which reduces the likelihood that the initial center is determined from a problematic sample value. 2. When cluster centers are marked, it is determined whether the hardware monitoring is faulty or merely offset, so that cluster centers caused by hardware faults can be removed during classification, while data from problematic hardware that can be used normally after correction is corrected before the cluster centers are formed. The resulting new cluster centers therefore reflect the real power grid data values, and genuine abnormal values arising during grid operation (as opposed to correctable hardware offsets, such as sensor data distortion caused by sensor drift) can be identified and removed. 3. After the new cluster center is formed, it is determined whether sample points deviating from it fall within a confidence threshold obtained through probability statistical analysis; combined with the hardware offset correction and the use of expected values, this yields more accurate anomaly detection than the existing K-means algorithm.
Drawings
FIG. 1 is a flowchart of a method for detecting abnormal power grid data values based on a k-Means algorithm;
FIG. 2 is a block diagram of a system for detecting anomalies in power grid data values based on the k-Means algorithm.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
It will be appreciated that the method embodiments mentioned above may be combined with one another to form combined embodiments without departing from their principles and logic; owing to space limitations, these are not described again in the present disclosure.
In addition, the disclosure further provides a system, an electronic device, a computer-readable storage medium, and a program for detecting abnormal power grid data values based on the k-Means algorithm, each of which can implement any of the methods for detecting abnormal power grid data values based on the k-Means algorithm provided by the disclosure; for the corresponding technical solutions and descriptions, refer to the method section, which is not repeated here.
The method for detecting abnormal power grid data values based on the k-Means algorithm may be executed by a computer or any other device capable of performing such detection, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory.
As an aspect of the embodiments of the present disclosure, a method for detecting an anomaly of a power grid data value based on a k-Means algorithm is provided, as shown in fig. 1, including the steps of:
S10, acquiring at least one type of power grid data value;
S20, calculating the expected value under the statistical law of historical power grid data values, and taking the expected value as an initial cluster center;
S30, acquiring the operating condition of the power grid equipment, and determining from the operating condition whether the equipment has a hardware fault or a monitoring data offset; if there is a hardware fault, directly identifying the power grid data value as abnormal data; if there is a monitoring data offset, applying offset correction to the cluster center of the power grid data value to obtain a corrected initial cluster center;
S40, marking the sample points in the power grid data values as data of the new cluster centers according to their distances from the cluster centers;
S50, determining new cluster centers from the marked data, and repeating steps S20 to S50 until the change in the cluster centers is stable, to determine the final cluster centers;
S60, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal.
With this configuration, the embodiments of the disclosure take the expected value of the power grid data values as the initial cluster center, which reduces the likelihood that the initial center is determined from a problematic sample value. When cluster centers are marked, it is determined whether the hardware monitoring is faulty or merely offset, so that cluster centers caused by hardware faults can be removed during classification, while data from problematic hardware that can be used normally after correction is corrected before the cluster centers are formed. The resulting new cluster centers therefore reflect the real power grid data values, and genuine abnormal values arising during grid operation (as opposed to correctable hardware offsets, such as sensor data distortion caused by sensor drift) can be identified and removed. After the new cluster center is formed, it is determined whether sample points deviating from it fall within a confidence threshold obtained through probability statistical analysis; combined with the hardware offset correction and the use of expected values, this yields more accurate anomaly detection than the existing K-means algorithm.
The steps of the embodiments of the present disclosure are described in detail below, respectively.
S10, acquiring at least one type of power grid data value;
The types of power grid data values include remote signaling alarm information, equipment outage maintenance information, operation ticket information, control duty log information, equipment defect information, and other information to be confirmed; at least one type, including equipment defect information or remote signaling alarm information, may be acquired.
S20, calculating the expected value under the statistical law of historical power grid data values, and taking the expected value as an initial cluster center;
Statistically, data such as power grid output, load, and tie-line power flow in the same time period across multiple days are approximately normally distributed, and so is the rate of change of the data over the same continuous time period across multiple days. Probability statistical analysis is therefore performed with a given type of data from the same time period on multiple historical days as the sample, and the expected value of the normal distribution model for that period is calculated.
S30, acquiring the operating condition of the power grid equipment, and determining from the operating condition whether the equipment has a hardware fault or a monitoring data offset; if there is a hardware fault, directly identifying the power grid data value as abnormal data; if there is a monitoring data offset, applying offset correction to the cluster center of the power grid data value to obtain a corrected initial cluster center;
In this embodiment, the operating condition of the power grid equipment includes hardware monitoring verification data obtained after hardware testing, namely the stability over time of the hardware's actual monitoring results relative to its standard output. If the stability is high, a monitoring data offset is determined; if the stability is low, a hardware fault is determined.
Determining this stability includes the following steps:
calculating the difference D between the actual monitoring result X of the given hardware and the hardware standard output A, where D = X - A;
determining whether the rate of change of the difference over time, d(X - A)/dt, is monotonic; monotonicity shows that the actual monitoring result changes in a single direction relative to the hardware standard output rather than jumping up and down.
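A minimal sketch of this step, assuming the history is stored as a list of days, each a list of readings indexed by time period (a hypothetical layout). Under the normal-distribution assumption the sample mean is the usual estimate of the expected value; `period_expectation` and `rate_expectation` are illustrative names, not from the disclosure.

```python
from statistics import fmean


def period_expectation(history, period):
    """Expected value of one time period, estimated from the same period on many days."""
    # history: list of days; each day is a list of readings indexed by period (assumed layout)
    samples = [day[period] for day in history]
    return fmean(samples)  # sample mean estimates the normal-distribution expectation


def rate_expectation(history, period):
    """Expected change rate from `period` to `period + 1` across the historical days."""
    rates = [day[period + 1] - day[period] for day in history]
    return fmean(rates)
```

Each period's expected value would then serve as that period's initial cluster center in step S20.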
If the difference is monotonic, it is determined whether the ratio of the difference to the average of the differences within a preset time is close to 1; if so, the stability is determined to be high, and if the ratio is far from 1, the stability is determined to be low. Namely:

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad \frac{D_n}{\bar{D}} \to 1$$

where $\bar{D}$ is the average of the differences within the preset time, $n$ is the number of times the difference is sampled and calculated within the preset time, and $D_n$ is the difference obtained at the $n$-th calculation. When $D_n/\bar{D}$ approaches 1, the difference stays close to $\bar{D}$ throughout the preset time, i.e. it changes little; when the ratio of the difference at every sampling point to the average difference approaches 1, the stability of the change is high.
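The stability test can be sketched as follows, assuming aligned sequences of actual monitoring results X_i and standard outputs A_i sampled within the preset time, with the discrete step D_{i+1} - D_i standing in for d(X - A)/dt. The tolerance `ratio_tol` that decides what counts as "close to 1" is not specified in the disclosure and is a hypothetical parameter, as is the `"none"` result returned when the average difference is zero.

```python
from statistics import fmean


def classify_hardware(measured, standard, ratio_tol=0.1):
    """Return ("offset" | "fault" | "none", D_bar) for one piece of hardware.

    measured: actual monitoring results X_i; standard: standard outputs A_i.
    ratio_tol is an assumed tolerance for "D_i / D_bar is close to 1".
    """
    d = [x - a for x, a in zip(measured, standard)]          # D_i = X_i - A_i
    steps = [b - a for a, b in zip(d, d[1:])]                 # discrete dD/dt
    monotonic = all(s >= 0 for s in steps) or all(s <= 0 for s in steps)
    d_bar = fmean(d)                                          # average difference D_bar
    if abs(d_bar) < 1e-12:
        return "none", 0.0  # assumption: no measurable offset, ratio undefined
    # High stability: monotonic and every D_i / D_bar close to 1 -> correctable offset.
    stable = monotonic and all(abs(di / d_bar - 1.0) <= ratio_tol for di in d)
    return ("offset" if stable else "fault"), d_bar
```

A slowly drifting sensor (for example, differences 0.50, 0.52, 0.54) is classified as an offset, while readings that jump above and below the standard output are classified as a hardware fault.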
In some embodiments, after the stability is determined to be high, the method further includes: correcting the cluster center using the average of the differences as the correction amount, to form a corrected initial cluster center. In this way, data with a stable offset can be corrected into accurate data, further ensuring the consistency of the power grid data as a whole.
In some embodiments, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal includes obtaining the confidence threshold: the variance estimate of the normal distribution model for the period is calculated, and a confidence level is then set to obtain the confidence threshold for the load level of that period. For example, a given type of data from the same period on multiple historical days is taken as the sample for probability statistical analysis, the expected value and variance estimate of the normal distribution model for the period are calculated, a confidence level is set, and the confidence interval of the load level for that period is estimated.
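The correction itself is a shift by the average difference. The disclosure does not state the sign convention, so the sketch below assumes that the initial center is moved by +D_bar so that it lines up with readings that carry the offset; removing D_bar from the readings instead is the equivalent view and gives the same distances. Both function names are hypothetical.

```python
def correct_center(center, d_bar):
    """Shift an initial cluster center by the mean offset D_bar (assumed sign convention)."""
    return center + d_bar


def correct_readings(readings, d_bar):
    """Equivalent view: remove the stable offset D_bar from the readings instead."""
    return [x - d_bar for x in readings]
```

Either form leaves each reading's distance to the center unchanged, which is what the later threshold comparison depends on.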
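One way to turn the variance estimate and a chosen confidence level into a distance threshold is sketched below, together with the S60 comparison. It assumes the threshold is the two-sided normal critical value times the sample standard deviation of individual readings (not the narrower confidence interval of the mean, which would divide by the square root of the sample size); the function names and the 0.95 default are illustrative assumptions.

```python
from statistics import NormalDist, stdev


def confidence_threshold(samples, confidence=0.95):
    """Threshold = z_{(1+c)/2} * s, from historical samples of one period (assumed form)."""
    s = stdev(samples)                                   # square root of the sample variance
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)   # two-sided critical value, ~1.96 at 95 %
    return z * s


def is_abnormal(distance, threshold):
    """S60: abnormal if the distance to the final center exceeds the threshold, else normal."""
    return distance > threshold
```

For historical samples 98, 100, and 102 the standard deviation is 2, so at 95 % confidence the threshold is about 3.92: a reading 5 units from its final center would be flagged, one 1 unit away would not.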
S40, marking the sample points in the power grid data values as data of the new cluster centers according to their distances from the cluster centers;
S50, determining new cluster centers from the marked data, and repeating steps S20 to S50 until the change in the cluster centers is stable, to determine the final cluster centers. Determining whether the change in the cluster centers is stable amounts to determining whether the cluster centers have converged, which is judged from how the Euclidean distance between the new cluster center and the other cluster centers changes.
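Steps S40 and S50 can be sketched as an assignment/update loop that starts from the supplied (expected-value, possibly offset-corrected) centers rather than random ones. As an assumption, convergence is judged in the standard way: by the Euclidean distance each center moves between iterations, compared with a hypothetical tolerance `tol`. `refine_centers` is an illustrative name.

```python
import math


def refine_centers(points, centers, tol=1e-6, max_iter=100):
    """Iterate assignment (S40) and update (S50) from given initial centers.

    points, centers: lists of equal-length tuples. Stops when no center moves
    farther than tol (Euclidean distance) between two iterations.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # S40: mark each sample with the index of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            groups[j].append(p)
        # S50: new center = mean of the marked samples; an empty cluster keeps its center.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
               for c, g in zip(centers, groups)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    return centers
```

Starting from centers at (0, 0) and (10, 10), two square groups of points around those locations settle on their means after the second pass.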
S60, comparing the distance between the power grid data value and the center of the final cluster with a confidence threshold value to determine whether the data is abnormal.
It is determined whether the distance between the power grid data value and the final cluster center is within the confidence threshold; if the distance exceeds the confidence threshold, the data is determined to be abnormal.
Embodiments of the present disclosure have the following advantages: 1. The expected value of the power grid data values, obtained through probability statistical analysis, is used as the initial cluster center, which reduces the likelihood that the initial center is determined from a problematic sample value. 2. When cluster centers are marked, it is determined whether the hardware monitoring is faulty or merely offset, so that cluster centers caused by hardware faults can be removed during classification, while data from problematic hardware that can be used normally after correction is corrected before the cluster centers are formed. The resulting new cluster centers therefore reflect the real power grid data values, and genuine abnormal values arising during grid operation (as opposed to correctable hardware offsets, such as sensor data distortion caused by sensor drift) can be identified and removed. 3. After the new cluster center is formed, it is determined whether sample points deviating from it fall within a confidence threshold obtained through probability statistical analysis; combined with the hardware offset correction and the use of expected values, this yields more accurate anomaly detection than the existing K-means algorithm.
As another aspect of the embodiments of the present disclosure, there is provided a system 100 for detecting abnormal power grid data values based on the k-Means algorithm, as shown in FIG. 2, including:
a power grid data value acquisition unit 1, which acquires at least one type of power grid data value;
an expected value determination unit 2, which calculates the expected value under the statistical law of historical power grid data values and takes the expected value as an initial cluster center;
an operating condition determination unit 3, which acquires the operating condition of the power grid equipment and determines from it whether the equipment has a hardware fault or a monitoring data offset; if there is a hardware fault, the power grid data value is directly identified as abnormal data, and if there is a monitoring data offset, offset correction is applied to the cluster center of the power grid data value to obtain a corrected initial cluster center;
a distance marking unit 4, which marks the sample points in the power grid data values as data of the new cluster centers according to their distances from the cluster centers;
a new cluster center stability determination unit 5, which determines new cluster centers from the marked data until the change in the cluster centers is stable, to determine the final cluster centers;
and a data abnormality determination unit 6, which compares the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal.
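The data flow between units 1 to 6 can be sketched as a single Python class with one method per unit. This is a hypothetical, simplified sketch: the class and method names, the one-dimensional readings, the pairing of each initial center with one historical sample set by index, the +D_bar sign convention for offset correction, the z-times-standard-deviation threshold, and all parameter defaults are assumptions, not the disclosure's implementation.

```python
from statistics import NormalDist, fmean, stdev


class GridAnomalySystem:
    """Hypothetical sketch of system 100 (1-D readings, one method per unit)."""

    def __init__(self, confidence=0.95, ratio_tol=0.1, tol=1e-6, max_iter=100):
        self.confidence = confidence   # assumed confidence level
        self.ratio_tol = ratio_tol     # assumed tolerance for "ratio close to 1"
        self.tol = tol                 # assumed convergence tolerance
        self.max_iter = max_iter

    def acquire(self, readings):                       # unit 1
        return [float(v) for v in readings]

    def initial_centers(self, history_sets):           # unit 2: expected values
        return [fmean(s) for s in history_sets]

    def judge_hardware(self, measured, standard):      # unit 3: fault vs. offset
        d = [x - a for x, a in zip(measured, standard)]
        steps = [b - a for a, b in zip(d, d[1:])]
        monotonic = all(s >= 0 for s in steps) or all(s <= 0 for s in steps)
        d_bar = fmean(d)
        if abs(d_bar) < 1e-12:
            return "none", 0.0
        stable = monotonic and all(abs(di / d_bar - 1) <= self.ratio_tol for di in d)
        return ("offset" if stable else "fault"), d_bar

    def refine(self, values, centers):                 # units 4 and 5
        for _ in range(self.max_iter):
            groups = [[] for _ in centers]
            for v in values:
                nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
                groups[nearest].append(v)
            new = [fmean(g) if g else c for g, c in zip(groups, centers)]
            shift = max(abs(a - b) for a, b in zip(new, centers))
            centers = new
            if shift < self.tol:
                break
        return centers

    def judge(self, values, centers, thresholds):      # unit 6
        flags = []
        for v in values:
            j = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            flags.append(abs(v - centers[j]) > thresholds[j])
        return flags

    def run(self, readings, history_sets, measured, standard):
        values = self.acquire(readings)
        centers = self.initial_centers(history_sets)
        status, d_bar = self.judge_hardware(measured, standard)
        if status == "fault":
            return [True] * len(values)  # hardware fault: identified directly as abnormal
        if status == "offset":
            centers = [c + d_bar for c in centers]  # assumed sign convention
        z = NormalDist().inv_cdf((1 + self.confidence) / 2)
        thresholds = [z * stdev(s) for s in history_sets]
        centers = self.refine(values, centers)
        return self.judge(values, centers, thresholds)
```

In this sketch a hardware fault short-circuits the clustering entirely, mirroring the rule that such data is identified as abnormal directly, whereas a stable offset only moves the starting centers.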
In some embodiments, in the grid data value acquisition unit, the type of the grid data value includes the following types: the remote signaling alarm information, the equipment power failure overhaul information, the operation ticket information, the control duty log information, the equipment defect information and the like need to be confirmed, and at least one type including the equipment defect information or the remote signaling alarm information can be obtained.
In some embodiments, the expected value determining unit 2 calculates expected values under the statistical law of historical power grid data values and takes them as the initial cluster centers.
Statistically, data such as grid generation output, load and tie-line power flow in the same time period over multiple days approximately follow a normal distribution, and so does the rate of change of the data over the same continuous time period across multiple days. A given type of data from the same period on multiple historical days is therefore taken as a sample for probabilistic statistical analysis, and the expected value of the normal distribution model for that period is calculated.
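The expected-value step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the history is stored as one list of readings per day with one entry per time slot, and the names `expected_value` and `initial_centers` as well as the sample numbers are hypothetical. Under the normal-distribution model, the sample mean is the standard estimate of the expectation.

```python
from statistics import fmean
from typing import List, Sequence


def expected_value(history: Sequence[Sequence[float]], slot: int) -> float:
    """Expected value of one time slot across several historical days.

    history[d][slot] is the reading (e.g. load) recorded on day d in that slot.
    Under the normal-distribution model the sample mean estimates the expectation.
    """
    samples = [day[slot] for day in history]
    return fmean(samples)


def initial_centers(history: Sequence[Sequence[float]],
                    slots: Sequence[int]) -> List[float]:
    """Hypothetical helper: one initial cluster center per time slot."""
    return [expected_value(history, s) for s in slots]


# Three historical days, two time slots each (illustrative numbers only).
history = [[100.0, 200.0], [102.0, 198.0], [98.0, 202.0]]
print(initial_centers(history, [0, 1]))  # [100.0, 200.0]
```

Using the expectation rather than a randomly chosen sample as the starting center is what keeps an outlying sample from seeding a cluster.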
In this embodiment, the operation condition judging unit 3 acquires the operation condition of the power grid equipment and judges from it whether the equipment has a hardware fault or a monitoring data offset. In the case of a hardware fault, the power grid data value is directly identified as abnormal data; in the case of a monitoring data offset, offset correction is applied to the cluster center of the power grid data value to obtain a corrected initial cluster center.
In this embodiment, the operation condition of the power grid equipment includes hardware monitoring verification data obtained after hardware testing. This data describes how stable the hardware's actual monitoring result is, over time, relative to the hardware standard output: if the stability is high, the deviation is judged to be an offset; if the stability is low, it is judged to be a hardware fault.
The stability over time of the hardware's actual monitoring result relative to the hardware standard output is determined by the following steps:
calculate the difference D between the actual monitoring result X of the given hardware and the hardware standard output A, i.e. D = X - A;
determine whether the rate of change of the difference over time, d(X - A)/dt, is monotonic. Monotonicity shows that the actual monitoring result changes in a single direction relative to the hardware standard output rather than jumping up and down.
If it is monotonic, determine whether the ratio of each difference to the mean of the differences over a preset time window is close to 1: if so, the stability is judged to be high; if the ratio is far from 1, the stability is judged to be low. That is:

$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad \frac{D_i}{\bar{D}} \approx 1 \quad (i = 1, \dots, n)$$

where $\bar{D}$ is the mean of the differences over the preset time window, $n$ is the number of times the difference is sampled in that window, and $D_i$ is the difference obtained in the $i$-th calculation. When $D_i/\bar{D} \to 1$, every difference in the window stays close to $\bar{D}$, i.e. the difference changes little; because the ratio of the difference at any point to the mean difference approaches 1, the stability is high.
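The stability test above can be sketched as follows, assuming the measured and standard outputs are sampled at the same instants over the preset window. The function name `classify_deviation`, the tolerance `tol` used to decide "close to 1", and the choice to treat a non-monotonic difference as low stability (hardware fault) are assumptions for illustration; the source does not fix these values.

```python
from statistics import fmean
from typing import Sequence


def classify_deviation(measured: Sequence[float], standard: Sequence[float],
                       tol: float = 0.1) -> str:
    """Classify a hardware deviation as "offset" (high stability) or "fault".

    measured: actual monitoring results X_i; standard: standard outputs A_i,
    sampled at the same instants over the preset time window.
    tol: hypothetical tolerance for "the ratio D_i / D_bar is close to 1".
    """
    if len(measured) != len(standard) or len(measured) < 2:
        raise ValueError("need two equally long series with at least 2 samples")
    d = [x - a for x, a in zip(measured, standard)]   # D_i = X_i - A_i
    steps = [b - a for a, b in zip(d, d[1:])]          # discrete d(X - A)/dt
    monotone = all(s >= 0 for s in steps) or all(s <= 0 for s in steps)
    if not monotone:
        # Assumption: up-and-down jumps are treated as low stability (fault).
        return "fault"
    d_bar = fmean(d)                                   # mean difference over the window
    if d_bar == 0:
        raise ValueError("mean difference is zero; the ratio test is undefined")
    stable = all(abs(di / d_bar - 1) <= tol for di in d)
    return "offset" if stable else "fault"


print(classify_deviation([10.50, 10.52, 10.54, 10.56], [10.0] * 4))  # offset
print(classify_deviation([10.5, 9.0, 11.5, 10.0], [10.0] * 4))       # fault
```

The first series drifts slowly in one direction and stays near its mean offset, so it is classified as a correctable offset; the second jumps up and down and is classified as a fault.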
In some embodiments, after the stability is determined to be high, the method further comprises correcting the cluster center using the mean difference as the deviation correction amount, forming a corrected initial cluster center. In this way, data with a stable offset can be rectified into correct data, which further ensures the consistency of the power grid data as a whole.
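A minimal sketch of the correction, assuming offset readings satisfy X = A + D̄, so that shifting the expected-value center by +D̄ aligns it with the offset data. The source does not state the sign convention; that choice and the helper names `mean_difference`, `corrected_center` and `rectified_readings` are hypothetical. The last helper shows the equivalent alternative of removing the offset from the readings instead.

```python
from statistics import fmean
from typing import List, Sequence


def mean_difference(measured: Sequence[float], standard: Sequence[float]) -> float:
    """D_bar: mean of D_i = X_i - A_i over the preset window."""
    return fmean([x - a for x, a in zip(measured, standard)])


def corrected_center(center: float, d_bar: float) -> float:
    """Shift an expected-value center by the stable offset.

    Assumption: offset readings satisfy X = A + D_bar, so moving the center
    by +D_bar aligns it with the offset data.
    """
    return center + d_bar


def rectified_readings(readings: Sequence[float], d_bar: float) -> List[float]:
    """Equivalent alternative: remove the offset from the readings instead."""
    return [r - d_bar for r in readings]


d_bar = mean_difference([10.5, 10.5], [10.0, 10.0])   # 0.5
print(corrected_center(100.0, d_bar))                 # 100.5
print(rectified_readings([100.5, 101.5], d_bar))      # [100.0, 101.0]
```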
In some embodiments, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal comprises obtaining the confidence threshold: the variance of the normal distribution model for the period is estimated, and a confidence level is then set to determine the confidence threshold for the load level in that period. For example, a given type of data from the same period on multiple historical days is taken as a sample for probabilistic statistical analysis, the expected value and variance of the period's normal distribution model are estimated, a confidence level is set, and the confidence interval for the load level in that period is estimated.
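The threshold computation can be sketched with Python's standard `statistics` module. This assumes a two-sided interval mu ± z·sigma, with sigma the sample standard deviation and z the standard-normal quantile for the chosen confidence level; the function name `confidence_threshold` and the default level of 0.95 are illustrative choices, not values given in the source.

```python
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def confidence_threshold(samples: Sequence[float],
                         confidence: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Return (threshold, (lower, upper)) for one time period.

    samples: the same quantity (e.g. load) from the same period on several
    historical days. The threshold is z * sigma, where sigma is the sample
    standard deviation and z the two-sided standard-normal quantile.
    """
    mu = fmean(samples)
    sigma = stdev(samples)                            # sqrt of the unbiased variance estimate
    z = NormalDist().inv_cdf((1 + confidence) / 2)    # about 1.96 for 95%
    threshold = z * sigma
    return threshold, (mu - threshold, mu + threshold)


thr, (lo, hi) = confidence_threshold([98.0, 100.0, 102.0])
print(round(thr, 2), round(lo, 2), round(hi, 2))  # 3.92 96.08 103.92
```

A higher confidence level widens the interval and so flags fewer points as abnormal.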
The distance marking unit 4 marks each sample point in the power grid data values by its distance from the cluster centers, producing new cluster center data.
The new cluster center stability determination unit 5 determines new cluster centers from the marked cluster center data and repeats the above operations until the change in the cluster centers is stable, thereby determining the final cluster centers. Judging whether the change in the cluster centers is stable amounts to judging whether the cluster centers have converged, which is determined from how the Euclidean distances between the new cluster centers and the other cluster centers change.
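The marking and update loop can be sketched as standard k-means refinement starting from the given initial centers. Reading the convergence test as "the largest Euclidean shift of any center between two iterations falls below a tolerance" is an interpretation of the text above; the function name `kmeans` and the parameters `tol` and `max_iter` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           tol: float = 1e-6, max_iter: int = 100) -> List[Point]:
    """Standard k-means refinement starting from given initial centers.

    Stops when the largest Euclidean shift of any center between two
    iterations is <= tol (assumed convergence criterion), or after max_iter.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Marking: label each point with the index of its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update: each center moves to the mean of its members.
        new_centers: List[Point] = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)   # keep an empty cluster's center in place
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


pts = [(1.0,), (1.2,), (0.8,), (10.0,), (10.2,), (9.8,)]
print([round(c[0], 6) for c in kmeans(pts, [(0.0,), (12.0,)])])  # [1.0, 10.0]
```

In the described method the initial centers would be the (possibly offset-corrected) expected values rather than arbitrary starting points.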
In the data anomaly determination unit 6, the distance between the power grid data value and the final cluster center is compared with the confidence threshold to determine whether the data is abnormal.
Specifically, it is judged whether the distance between the power grid data value and the final cluster center is within the confidence threshold; if the distance exceeds the confidence threshold, the data is judged to be abnormal.
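The final decision can be sketched as follows, assuming each value is compared against its nearest final cluster center and that values from hardware already judged faulty have been flagged earlier. The function name `flag_anomalies` and the sample numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def flag_anomalies(points: Sequence[Point], centers: Sequence[Point],
                   threshold: float) -> List[bool]:
    """True where a point's distance to its nearest final center exceeds threshold."""
    flags = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        flags.append(nearest > threshold)   # beyond the confidence threshold -> abnormal
    return flags


print(flag_anomalies([(1.0,), (5.0,), (10.5,)], [(1.0,), (10.0,)], threshold=2.0))
# [False, True, False]
```

A distance exactly equal to the threshold is treated as within it, matching "exceeds the confidence threshold" as the abnormality condition.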
Embodiments of the present disclosure have the following advantages. 1. The expected value of the power grid data values, obtained through probabilistic statistical analysis, is used as the initial cluster center, which reduces the probability that a problematic sample value is chosen as the initial value. 2. When the cluster centers are marked, the method judges whether the hardware monitoring has a fault or a data offset. Cluster centers distorted by faulty hardware can thus be excluded during classification, while data that remain usable after correction are corrected and retained, so the resulting new cluster centers reflect the real power grid data values, and genuine abnormal values arising in grid operation (as opposed to correctable hardware offsets, such as sensor data distortion caused by sensor drift) can be eliminated. 3. After the new cluster centers are formed, the method judges whether sample points deviating from them fall within the confidence threshold, which is itself obtained through probabilistic statistical analysis. Combined with hardware offset correction and expected-value initialization, this yields more accurate anomaly detection than the existing K-means algorithm.
The embodiments of the present disclosure also include an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executed, the program implements the method of Embodiment 1.
The electronic device of Embodiment 3 of the present disclosure is merely an example and should not be construed as limiting the functions or scope of use of the embodiments of the present disclosure.
The electronic device may take the form of a general-purpose computing device, for example a server device. Its components may include, but are not limited to: at least one processor, at least one memory, and a bus connecting the different system components, including the memory and the processor.
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The memory may also include a program/utility having a set of (at least one) program modules, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
The processor executes various functional applications and data processing by running computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., a keyboard or pointing device) through an input/output (I/O) interface. The electronic device may further communicate, through a network adapter, with one or more networks such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet. The network adapter communicates with the other modules of the electronic device via the bus. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above may be further divided among multiple units/modules.
The present disclosure also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method in embodiment 1.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of the method as described in embodiment 1, when said program product is run on the terminal device.
The program code for carrying out the present disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.

Claims (6)

1. A power grid data value anomaly detection method based on the k-Means algorithm, characterized by comprising the following steps:
S10, acquiring at least one type of power grid data value;
S20, calculating expected values under the statistical law of historical power grid data values, and taking the expected values as initial cluster centers;
S30, acquiring the operation condition of the power grid equipment, and judging from it whether the equipment has a hardware fault or a monitoring data offset; in the case of a hardware fault, directly identifying the power grid data value as abnormal data, and in the case of a monitoring data offset, applying offset correction to the cluster center of the power grid data value to obtain a corrected initial cluster center;
S40, marking each sample point in the power grid data values by its distance from the cluster centers as new cluster center data;
S50, determining new cluster centers from the marked cluster center data, and repeating steps S20 to S50 until the change in the cluster centers is stable, thereby determining the final cluster centers;
S60, comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal; wherein the operation condition of the power grid equipment comprises hardware monitoring verification data obtained after hardware testing, the hardware monitoring verification data being the stability over time of the hardware's actual monitoring result relative to the hardware standard output; if the stability is high, the deviation is judged to be an offset, and if the stability is low, it is judged to be a hardware fault; determining this stability comprises the following steps:
calculating the difference between the actual monitoring result of the given hardware and the hardware standard output;
judging whether the rate of change of the difference over time is monotonic;
if it is monotonic, judging whether the ratio of the difference to the mean of the differences over a preset time is close to 1; if so, judging that the stability is high, and if the ratio is far from 1, judging that the stability is low;
calculating the expected values under the statistical law of historical power grid data values comprises:
taking at least one type of power grid data value from the same period on multiple days as a sample for probabilistic statistical analysis, and calculating the expected value of the normal distribution model for that period;
comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal comprises: obtaining the confidence threshold by estimating the variance of the normal distribution model for the period and then setting a confidence level to determine the confidence threshold for the load level in that period;
and judging whether the distance between the power grid data value and the final cluster center is within the confidence threshold, and if the distance exceeds the confidence threshold, judging that the data is abnormal.
2. The power grid data value anomaly detection method based on the k-Means algorithm according to claim 1, further comprising, after determining that the stability is high:
correcting the cluster center using the mean of the differences as the deviation correction amount to form a corrected initial cluster center.
3. The power grid data value anomaly detection method based on the k-Means algorithm according to claim 1, further comprising: judging whether the distance between the power grid data value and the final cluster center is within the confidence threshold, and if the distance does not exceed the confidence threshold, judging that the data is normal.
4. A power grid data value anomaly detection system based on the k-Means algorithm, characterized in that the system comprises:
a power grid data value acquisition unit, which acquires at least one type of power grid data value;
an expected value determining unit, which calculates expected values under the statistical law of historical power grid data values and takes the expected values as initial cluster centers;
an operation condition judging unit, which acquires the operation condition of the power grid equipment and judges from it whether the equipment has a hardware fault or a monitoring data offset; in the case of a hardware fault, the power grid data value is directly identified as abnormal data, and in the case of a monitoring data offset, offset correction is applied to the cluster center of the power grid data value to obtain a corrected initial cluster center;
a distance marking unit, which marks each sample point in the power grid data values by its distance from the cluster centers as new cluster center data;
a new cluster center stability determination unit, which determines new cluster centers from the marked cluster center data until the change in the cluster centers is stable, thereby determining the final cluster centers;
a data anomaly determination unit, which compares the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal;
wherein the operation condition of the power grid equipment comprises hardware monitoring verification data obtained after hardware testing, the hardware monitoring verification data being the stability over time of the hardware's actual monitoring result relative to the hardware standard output; if the stability is high, the deviation is judged to be an offset, and if the stability is low, it is judged to be a hardware fault; determining this stability comprises the following steps:
calculating the difference between the actual monitoring result of the given hardware and the hardware standard output;
judging whether the rate of change of the difference over time is monotonic;
if it is monotonic, judging whether the ratio of the difference to the mean of the differences over a preset time is close to 1; if so, judging that the stability is high, and if the ratio is far from 1, judging that the stability is low;
calculating the expected values under the statistical law of historical power grid data values comprises:
taking at least one type of power grid data value from the same period on multiple days as a sample for probabilistic statistical analysis, and calculating the expected value of the normal distribution model for that period;
comparing the distance between the power grid data value and the final cluster center with a confidence threshold to determine whether the data is abnormal comprises: obtaining the confidence threshold by estimating the variance of the normal distribution model for the period and then setting a confidence level to determine the confidence threshold for the load level in that period;
and judging whether the distance between the power grid data value and the final cluster center is within the confidence threshold, and if the distance exceeds the confidence threshold, judging that the data is abnormal.
5. An electronic device, comprising:
a processor; a memory for storing computer-executable instructions; the processor is in communication with the memory;
wherein the processor is configured to invoke computer-executable instructions stored in the memory to perform the method of any of claims 1 to 3.
6. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 3.
CN202310278784.3A 2023-03-21 2023-03-21 Power grid data value anomaly detection method and system based on k-Means algorithm Active CN116304776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310278784.3A CN116304776B (en) 2023-03-21 2023-03-21 Power grid data value anomaly detection method and system based on k-Means algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310278784.3A CN116304776B (en) 2023-03-21 2023-03-21 Power grid data value anomaly detection method and system based on k-Means algorithm

Publications (2)

Publication Number Publication Date
CN116304776A CN116304776A (en) 2023-06-23
CN116304776B true CN116304776B (en) 2023-11-21

Family

ID=86812908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310278784.3A Active CN116304776B (en) 2023-03-21 2023-03-21 Power grid data value anomaly detection method and system based on k-Means algorithm

Country Status (1)

Country Link
CN (1) CN116304776B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991436A (en) * 2017-03-09 2017-07-28 Neusoft Corporation Noise point detection method and device
CN107528823A (en) * 2017-07-03 2017-12-29 Sun Yat-sen University Network anomaly detection method based on an improved K-Means clustering algorithm
CN111397728A (en) * 2020-04-08 2020-07-10 Hohai University High-voltage shunt reactor iron core and winding loosening state monitoring method based on chaos theory and GOA-Kmeans
CN112070109A (en) * 2020-07-21 2020-12-11 Guangdong University of Technology Calla kiln energy consumption anomaly detection method based on improved density peak clustering
WO2022166380A1 (en) * 2021-02-05 2022-08-11 Tianyi Digital Life Technology Co., Ltd. Data processing method and apparatus based on meanshift optimization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
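Anomaly Detection of Argo Data using Variational Autoencoder and K-means Clustering; Yongguo Jiang, et al.; 2022 IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC); full text *
Research on excavator fault diagnosis based on multi-objective optimization clustering; Zhang Huan, Wang Jinkun, Chen Cheng; Construction Machinery Technology & Management (No. 06); full text *
Anomaly detection method for power information systems based on improved k-means; Huang Lin, Chang Jian, Yang Fan, Li Yi, Niu Xinzheng; Journal of Shenzhen University (Science and Engineering) (No. 02); full text *
Optimization method for k-means initial cluster centers minimizing the sum of squared errors; Zhou Benjin, Tao Yizheng, Ji Bin, Xie Yonghui; Computer Engineering and Applications (No. 15); full text *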

Also Published As

Publication number Publication date
CN116304776A (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant