CN113112188B - Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration - Google Patents
- Publication number
- CN113112188B (application CN202110529491.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The embodiment of the invention provides a power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration. The method comprises the following steps:

- Train a number of base detectors on historical power dispatching monitoring data.
- Pre-screen all base detectors with an isolation forest, and remove those with poor performance.
- Use an ensemble KNN algorithm to select, from the historical data, the records with a small Euclidean distance to the data to be detected. These records form a verification subset.
- Generate a pseudo ground truth for the verification subset from the outputs of the remaining base detectors by taking their maximum. Then compute the Pearson correlation coefficient between each base detector's output on the verification subset and this pseudo ground truth.
- Select base detectors by their Pearson correlation coefficients with a histogram-based selection method, and average the outputs of the selected base detectors as the detection result for the data to be detected.

The technical solution provided by the embodiment improves the accuracy of anomaly detection on power dispatching monitoring data.
Description
[Technical field]
The invention relates to anomaly detection methods for power dispatching monitoring data. In particular, it relates to a power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration.
[Background of the invention]
The unified strong smart grid is a new type of power grid. It integrates modern sensing and measurement, communication, information, computer and control technologies closely with the physical grid. It covers generation, transmission, transformation, distribution, consumption and dispatching.

In practical power system operation, dispatching commands, supervises and manages power production and operation. It is therefore an important safeguard for the secure operation of the power system. As grids grow in scale, the requirements on secure and stable operation keep rising, and anomaly detection on grid dispatching monitoring data becomes increasingly important.

While the grid runs, the monitoring system produces a large volume of data in a short time. It is therefore almost impossible to label records as normal or abnormal manually, for example by consulting experts. As a result, stored historical dispatching monitoring data usually lack accurate labels. Unsupervised anomaly detection methods do not use label information from the training data, so they cope better with this situation.

Existing unsupervised anomaly detection methods based on dynamic integration have a weakness. They generate the pseudo ground truth by combining all initially trained base detectors, so poorly performing detectors bias it. The base detector scores computed against this biased pseudo ground truth are then inaccurate, which degrades the overall performance of the dynamic integration method.

This motivates a dynamic integration anomaly detection method that removes some poorly performing base detectors in advance, so that it generates a more accurate pseudo ground truth. Such a method can improve the accuracy of dynamic-integration-based anomaly detection on power dispatching monitoring data. This is significant for strengthening grid state monitoring and safeguarding grid security.
[Summary of the invention]
In view of this, the invention provides a power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration. Its purpose is to improve the accuracy of anomaly detection on power dispatching monitoring data.
The method comprises the following steps:
(1) Train a number of base detectors on historical power dispatching monitoring data. Specifically:

All power monitoring historical data are used as the training set $S_{\mathrm{train}}$. Different unsupervised anomaly detection algorithms are used to train m base detectors on it, generally with m ≥ 50. The pool formed by all base detectors is denoted $\mathrm{Detector}_{\mathrm{all}}$. Each base detector outputs an anomaly score: the larger the score, the more anomalous the input data. The anomaly score output by each base detector in $\mathrm{Detector}_{\mathrm{all}}$ is converted into a Z score by Z-score normalization.

The power dispatching monitoring system collects real-time process resource usage data related to power dispatching system services. This data is the input to each base detector. It includes:

- process CPU usage
- memory usage
- disk IO
- network IO
- thread count
- network connection count

The Z score output by the i-th base detector lies in $[\min_i, \max_i]$. The bounds $\min_i$ and $\max_i$ are not fixed; they depend on the base detector itself. Let $\mathrm{thr}_i$ be the classification threshold of the i-th base detector:

- Input data are normal if the Z score lies in $[\min_i, \mathrm{thr}_i)$.
- Input data are abnormal if the Z score lies in $[\mathrm{thr}_i, \max_i]$.

To obtain $\mathrm{thr}_i$, sort the Z scores that the i-th base detector outputs on all training data $S_{\mathrm{train}}$ in descending order. $\mathrm{thr}_i$ is the minimum of the top con% of the sorted Z scores. Here con% is the preset anomaly rate, generally 10%.
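The sketch below shows the Z-score conversion and the classification threshold in Python with NumPy. It rests on two assumptions. First, the patent does not name the base detector algorithms, so the sketch takes the raw scores as given. Second, the patent does not say which statistics the Z-score normalization uses, so the sketch standardizes each detector with its own mean and standard deviation on $S_{\mathrm{train}}$. The function names are hypothetical.

```python
import numpy as np


def zscore_normalize(raw_scores):
    """Standardize each base detector's raw anomaly scores.

    raw_scores: shape (m, n), one row per base detector, one column per
    training record. Assumption: each row is standardized with its own
    mean and standard deviation on S_train.
    """
    raw = np.asarray(raw_scores, dtype=float)
    mu = raw.mean(axis=1, keepdims=True)
    sigma = raw.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # a constant detector would otherwise divide by zero
    return (raw - mu) / sigma, mu, sigma


def classification_thresholds(z_train, con=0.10):
    """thr_i = minimum of the top con% Z scores of detector i on S_train."""
    n = z_train.shape[1]
    # number of scores in the top con% (at least one); round() absorbs float noise
    top = max(1, int(np.ceil(round(con * n, 9))))
    z_desc = -np.sort(-z_train, axis=1)  # each row sorted from large to small
    return z_desc[:, top - 1]


# Usage: two hypothetical detectors scoring ten training records.
raw = np.vstack([np.arange(10.0), 2.0 * np.arange(10.0)])
z, mu, sigma = zscore_normalize(raw)
thr = classification_thresholds(z, con=0.2)
```

Returning `mu` and `sigma` lets test-time raw scores be mapped onto the same scale as `(raw_test - mu) / sigma`. Reusing the training statistics this way is also an assumption.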
(2) Pre-screen all base detectors with an isolation forest, and remove the base detectors with poor performance. Specifically:

Each of the m base detectors in $\mathrm{Detector}_{\mathrm{all}}$ outputs Z scores on all n historical records of the training set $S_{\mathrm{train}}$. These Z scores form the matrix $\mathrm{Output}_{m\times n}$. This matrix is used to train an isolation forest of n_estimators isolation trees, typically with n_estimators = 100.

To build each isolation tree, ψ rows are sampled uniformly without replacement from $\mathrm{Output}_{m\times n}$ (typically ψ = …). The resulting $\psi\times n$ matrix $\mathrm{Output}_{\psi\times n}$ is the training sample of that tree.

At each node, a dimension is chosen at random. A split value is then drawn at random between the minimum and maximum of the node's samples in that dimension. Samples whose value in that dimension is smaller than the split value go to the left child. Samples whose value is greater than or equal to it go to the right child. This yields the split condition and the left and right data sets. The procedure is repeated recursively on the left and right data sets until one of two termination conditions is met:
1) the data set contains only one sample, or all its samples are identical;
2) the height of the tree reaches $\log_2(\psi)$.

All trained isolation trees form the isolation forest IForest. Its output is a continuous value: the smaller the output, the more anomalous the input data.

Each row $\mathrm{Output}_r$ of $\mathrm{Output}_{m\times n}$, r = 1, 2, ..., m, is fed into IForest, giving m outputs. These outputs are sorted in ascending order. The base detectors whose rows receive the smallest drop_rate% of the outputs are marked as abnormal base detectors; drop_rate% is generally 10%. The base detectors marked as abnormal are removed from $\mathrm{Detector}_{\mathrm{all}}$. The pool of the m' remaining base detectors is denoted $\mathrm{Detector}_{\mathrm{filter}}$.
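Below is a self-contained NumPy sketch of the pre-screening step. It hand-rolls an isolation forest that follows the splitting and termination rules above. The forest output for each row is its mean path length, so smaller means easier to isolate and therefore more anomalous.

The sketch makes several assumptions that the source does not state:

- ψ defaults to min(256, m), a common isolation-forest default, because the patent's typical value is not given here.
- The height limit is ⌈log₂ψ⌉.
- Leaves add the usual c(n) path-length correction.
- Split dimensions are drawn only from columns that are not constant.

All function names are hypothetical.

```python
import numpy as np


def _c(size):
    """Expected path length of an unsuccessful BST search over `size` points
    (standard isolation-forest correction for leaves holding >1 sample)."""
    if size <= 1:
        return 0.0
    if size == 2:
        return 1.0
    return 2.0 * (np.log(size - 1) + np.euler_gamma) - 2.0 * (size - 1) / size


def _build_tree(X, rng, depth, max_depth):
    """Split recursively until one sample is left, all samples are identical,
    or the height limit is reached."""
    if X.shape[0] <= 1 or depth >= max_depth:
        return ("leaf", X.shape[0])
    varying = np.flatnonzero(X.max(axis=0) > X.min(axis=0))
    if varying.size == 0:  # all samples identical
        return ("leaf", X.shape[0])
    dim = rng.choice(varying)  # assumption: only non-constant dimensions
    split = rng.uniform(X[:, dim].min(), X[:, dim].max())
    left = X[:, dim] < split  # smaller values go left, >= goes right
    return ("node", dim, split,
            _build_tree(X[left], rng, depth + 1, max_depth),
            _build_tree(X[~left], rng, depth + 1, max_depth))


def _path_length(node, x):
    """Depth of the leaf that x falls into, plus the c(leaf size) correction."""
    depth = 0
    while node[0] == "node":
        _, dim, split, left, right = node
        node = left if x[dim] < split else right
        depth += 1
    return depth + _c(node[1])


def iforest_outputs(X, n_estimators=100, psi=None, seed=0):
    """Mean path length of each row of X; smaller = more anomalous."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    psi = min(256, m) if psi is None else min(psi, m)  # assumed default
    max_depth = max(1, int(np.ceil(np.log2(psi))))
    trees = [_build_tree(X[rng.choice(m, size=psi, replace=False)], rng, 0, max_depth)
             for _ in range(n_estimators)]
    return np.array([np.mean([_path_length(t, x) for t in trees]) for x in X])


def prescreen(output_mn, drop_rate=0.10, **kwargs):
    """Rows of output_mn are base detectors. Drop the drop_rate share with the
    smallest forest output; return (kept, dropped) row indices."""
    scores = iforest_outputs(output_mn, **kwargs)
    n_drop = int(round(drop_rate * len(scores)))
    order = np.argsort(scores, kind="stable")  # ascending: most anomalous first
    return np.sort(order[n_drop:]), np.sort(order[:n_drop])


# Usage: 18 hypothetical detectors that agree closely, plus 2 that disagree strongly.
rng = np.random.default_rng(1)
s = rng.normal(size=50)
output_mn = np.vstack([s + 0.05 * rng.normal(size=(18, 50)), s + 10.0, s - 10.0])
kept, dropped = prescreen(output_mn, drop_rate=0.10)
```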
(3) Use an ensemble KNN algorithm to select, as the verification subset, the historical data with a small Euclidean distance to the data to be detected. Specifically:

The historical data in the training set $S_{\mathrm{train}}$ have d dimensions. The selection runs for t iterations, indexed j = 1, 2, ..., t. In the j-th iteration:

- $d_j$ dimensions are selected at random to form the training-set subset $S_j$, where $d_j$ is a random number in the range ….
- The values of the data to be detected $x_{\mathrm{test}}$ on these $d_j$ dimensions are denoted $x_j$.
- Compute the Euclidean distance $\mathrm{dist}_{j,q}$ from $x_j$ to the q-th record $s_{j,q}$ of $S_j$, for q = 1, 2, ..., $Q_j$. Here $Q_j$ is the number of records in $S_j$.
- Sort all records of $S_j$ by $\mathrm{dist}_{j,q}$ in ascending order. The original historical records corresponding to the first K sorted records form the verification data set $V_j$ of this iteration, generally with 10 ≤ K ≤ 30.

Historical records that appear more than … times across all t verification data sets $V_1, \ldots, V_t$ form the verification subset $S_{x_{\mathrm{test}}}$ of $x_{\mathrm{test}}$. The number of iterations t is generally 10 to 30.
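The ensemble KNN selection can be sketched as follows. The source leaves out two values: the range of $d_j$ and the number of appearances a record needs. This sketch assumes $d_j$ is drawn uniformly from ⌈d/2⌉ to d, and that a record must appear in more than t/2 of the t verification sets. Both choices and the function name are hypothetical.

```python
import numpy as np


def ensemble_knn_subset(train, x_test, t=20, K=10, min_count=None, seed=0):
    """Return indices of S_train records forming the verification subset of x_test.

    Hypothetical choices (not from the source): d_j is drawn uniformly from
    ceil(d/2)..d, and a record is kept if it appears in more than min_count
    of the t verification sets (default t/2).
    """
    rng = np.random.default_rng(seed)
    n, d = train.shape
    counts = np.zeros(n, dtype=int)
    for _ in range(t):
        d_j = rng.integers(int(np.ceil(d / 2)), d + 1)  # random subspace size
        dims = rng.choice(d, size=d_j, replace=False)  # features of subset S_j
        dist = np.linalg.norm(train[:, dims] - x_test[dims], axis=1)
        nearest = np.argsort(dist, kind="stable")[:K]  # verification set V_j
        counts[nearest] += 1  # tally appearances across iterations
    limit = t / 2 if min_count is None else min_count
    return np.flatnonzero(counts > limit)


# Usage: 10 hypothetical records near the test point, 90 far from it.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(scale=0.1, size=(10, 4)),
                   5.0 + rng.normal(size=(90, 4))])
subset = ensemble_knn_subset(train, np.zeros(4), t=20, K=10)
```

Drawing a fresh feature subset in each iteration is what makes this an ensemble. A record only survives if it stays close to $x_{\mathrm{test}}$ across many random views of the features.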
(4) Generate the pseudo ground truth of the verification subset from the outputs of the remaining base detectors on it, by taking their maximum. Then compute the Pearson correlation coefficient between each base detector's output on the verification subset and the pseudo ground truth. Specifically:

Consider the p-th historical record $x_p$ in the verification subset $S_{x_{\mathrm{test}}}$, for p = 1, 2, ..., T, where T is the number of historical records in $S_{x_{\mathrm{test}}}$. The Z scores output on $x_p$ by the m' base detectors of $\mathrm{Detector}_{\mathrm{filter}}$ are $z_{1,p}, z_{2,p}, \ldots, z_{m',p}$. Their maximum $g_p = \max_{1\le i\le m'} z_{i,p}$ is the pseudo ground truth of $x_p$. The pseudo ground truth of all historical records in the verification subset is $G = (g_1, g_2, \ldots, g_T)$.

The Z scores output by the i-th base detector of $\mathrm{Detector}_{\mathrm{filter}}$ on all historical records of the verification subset are $Z_i = (z_{i,1}, z_{i,2}, \ldots, z_{i,T})$. The Pearson correlation coefficient $P_i$ between $Z_i$ and $G$ is the performance score of the i-th base detector. The higher $P_i$, the better the i-th base detector performs.
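Step (4) reduces to a column-wise maximum and a row-wise Pearson correlation. The NumPy sketch below assumes the Z scores of the m' filtered detectors on the T verification records are already arranged as an m'×T array. The Pearson coefficient is undefined for a constant detector; giving it a score of 0 is an assumption, as is the function name.

```python
import numpy as np


def pseudo_truth_and_scores(z_val):
    """z_val: (m', T) array; row i = Z scores of detector i on the T records."""
    z_val = np.asarray(z_val, dtype=float)
    g = z_val.max(axis=0)  # pseudo ground truth g_p = max_i z_{i,p}
    zc = z_val - z_val.mean(axis=1, keepdims=True)  # centre each detector's scores
    gc = g - g.mean()
    denom = np.linalg.norm(zc, axis=1) * np.linalg.norm(gc)
    with np.errstate(divide="ignore", invalid="ignore"):
        p = (zc @ gc) / denom  # Pearson correlation P_i for each detector
    return g, np.nan_to_num(p)  # constant rows give 0/0 = nan, mapped to 0


# Usage with three hypothetical detectors on three verification records.
z_val = np.array([[1.0, 2.0, 5.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
g, P = pseudo_truth_and_scores(z_val)
```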
(5) Select base detectors according to the performance scores from step (4). Average the outputs of the selected base detectors to obtain the detection result for the data to be detected, which completes the anomaly detection of power dispatching monitoring data. Specifically:

All base detectors in $\mathrm{Detector}_{\mathrm{filter}}$ are divided by performance score into b equal-width bins. The number of bins b is preset and is generally 10. The base detectors in the bin containing the most base detectors form the base detector pool $\mathrm{Detector}_{\mathrm{select}}$.

The detection result for $x_{\mathrm{test}}$ is the mean of the Z scores output on $x_{\mathrm{test}}$ by all base detectors in $\mathrm{Detector}_{\mathrm{select}}$. The detection threshold for the current detection is the mean of the classification thresholds of all base detectors in $\mathrm{Detector}_{\mathrm{select}}$. If the detection result is greater than or equal to this threshold, $x_{\mathrm{test}}$ is judged abnormal.
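The histogram-based selection and the final decision can be sketched as below. The sketch assumes b equal-width bins spanning the observed range of performance scores. When two bins are equally populated, it picks the bin with the higher scores. The patent specifies neither detail, so both are assumptions, as is the function name.

```python
import numpy as np


def histogram_select(perf, z_test, thr, b=10):
    """perf: Pearson scores P_i; z_test: Z scores on x_test; thr: thresholds thr_i."""
    perf = np.asarray(perf, dtype=float)
    edges = np.linspace(perf.min(), perf.max(), b + 1)  # b equal-width bins
    # bin index of each detector; the top edge is folded into the last bin
    idx = np.clip(np.searchsorted(edges, perf, side="right") - 1, 0, b - 1)
    counts = np.bincount(idx, minlength=b)
    best = b - 1 - int(np.argmax(counts[::-1]))  # most populated bin; ties -> higher scores
    sel = np.flatnonzero(idx == best)  # Detector_select
    score = float(np.mean(z_test[sel]))  # detection result for x_test
    threshold = float(np.mean(thr[sel]))  # detection threshold
    return sel, score, threshold, score >= threshold


# Usage with five hypothetical filtered detectors.
perf = np.array([0.90, 0.91, 0.92, 0.10, 0.50])
z_test = np.array([2.0, 1.0, 3.0, -5.0, 0.0])
thr = np.array([1.5, 1.5, 1.5, 0.0, 0.0])
sel, score, threshold, is_abnormal = histogram_select(perf, z_test, thr, b=10)
```

Selecting the most populated bin favours the largest group of detectors that agree with the pseudo ground truth to a similar degree. This avoids betting on a single top-scoring detector.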
The above method thus improves the accuracy of anomaly detection on power dispatching monitoring data.
The technical solution above gives the invention the following beneficial effect:
Before dynamic integration, the invention uses an isolation forest to remove some of the base detectors that perform poorly on the full training data. This reduces the bias of the generated pseudo ground truth and allows base detector performance to be evaluated more accurately. It therefore improves the accuracy of dynamic-integration-based anomaly detection on power dispatching monitoring data.
[Description of the drawings]
The drawings used in the invention are briefly described below to illustrate its technical solution more clearly. These drawings show only some embodiments of the invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic framework flow chart of the power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration according to the invention;
FIG. 2 is a schematic flow chart of the base detector pre-screening method;
FIG. 3 is a schematic flow chart of the ensemble KNN algorithm;
FIG. 4 is a schematic diagram of the power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration according to the invention;
FIG. 5 is a schematic diagram of the input data and output results of the base detectors used in the invention.
[Detailed description of embodiments]
For a better understanding of the technical solution of the invention, the invention is described in detail below with reference to the drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art obtains from the embodiments given herein without creative effort fall within the protection scope of the invention.
The invention provides a power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration. To detect anomalies in power dispatching monitoring data, the method first pre-screens the base detectors with an isolation forest. It then evaluates the base detectors on historical data close to the data to be detected, and selects the better-performing ones to detect that data.
Fig. 1 is a schematic framework flow chart of the power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration. The method comprises the following steps:

Step 101: train a number of base detectors on historical power dispatching monitoring data.

Specifically, all power monitoring historical data are used as the training set $S_{\mathrm{train}}$. Different unsupervised anomaly detection algorithms are used to train m base detectors on it, generally with m ≥ 50. The pool formed by all base detectors is denoted $\mathrm{Detector}_{\mathrm{all}}$. Each base detector outputs an anomaly score: the larger the score, the more anomalous the input data. The anomaly score output by each base detector in $\mathrm{Detector}_{\mathrm{all}}$ is converted into a Z score by Z-score normalization.

The power dispatching monitoring system collects real-time process resource usage data related to power dispatching system services. This data is the input to each base detector. It includes process CPU usage, memory usage, disk IO, network IO, thread count and network connection count.

The Z score output by the i-th base detector lies in $[\min_i, \max_i]$. The bounds $\min_i$ and $\max_i$ are not fixed; they depend on the base detector itself. Input data are normal if the Z score lies in $[\min_i, \mathrm{thr}_i)$ and abnormal if it lies in $[\mathrm{thr}_i, \max_i]$, where $\mathrm{thr}_i$ is the classification threshold of the i-th base detector.

To obtain $\mathrm{thr}_i$, sort the Z scores that the i-th base detector outputs on all training data $S_{\mathrm{train}}$ in descending order. $\mathrm{thr}_i$ is the minimum of the top con% of the sorted Z scores. Here con% is the preset anomaly rate, generally 10%.
Step 102: pre-screen all base detectors with an isolation forest, and remove the base detectors with poor performance.

Each of the m base detectors in $\mathrm{Detector}_{\mathrm{all}}$ outputs Z scores on all n historical records of the training set $S_{\mathrm{train}}$. These Z scores form the matrix $\mathrm{Output}_{m\times n}$. This matrix is used to train an isolation forest of n_estimators isolation trees, typically with n_estimators = 100.

To build each isolation tree, ψ rows are sampled uniformly without replacement from $\mathrm{Output}_{m\times n}$ (typically ψ = …). The resulting $\psi\times n$ matrix $\mathrm{Output}_{\psi\times n}$ is the training sample of that tree.

At each node, a dimension is chosen at random. A split value is then drawn at random between the minimum and maximum of the node's samples in that dimension. Samples whose value in that dimension is smaller than the split value go to the left child. Samples whose value is greater than or equal to it go to the right child. This yields the split condition and the left and right data sets. The procedure is repeated recursively on the left and right data sets until one of two termination conditions is met:
1) the data set contains only one sample, or all its samples are identical;
2) the height of the tree reaches $\log_2(\psi)$.

All trained isolation trees form the isolation forest IForest. Its output is a continuous value: the smaller the output, the more anomalous the input data.

Each row $\mathrm{Output}_r$ of $\mathrm{Output}_{m\times n}$, r = 1, 2, ..., m, is fed into IForest, giving m outputs. These outputs are sorted in ascending order. The base detectors whose rows receive the smallest drop_rate% of the outputs are marked as abnormal base detectors; drop_rate% is generally 10%. The base detectors marked as abnormal are removed from $\mathrm{Detector}_{\mathrm{all}}$. The pool of the m' remaining base detectors is denoted $\mathrm{Detector}_{\mathrm{filter}}$.
Step 103: use an ensemble KNN algorithm to select, as the verification subset, the historical data with a small Euclidean distance to the data to be detected.

Specifically, the historical data in the training set $S_{\mathrm{train}}$ have d dimensions. The selection runs for t iterations, indexed j = 1, 2, ..., t. In the j-th iteration:

- $d_j$ dimensions are selected at random to form the training-set subset $S_j$, where $d_j$ is a random number in the range ….
- The values of the data to be detected $x_{\mathrm{test}}$ on these $d_j$ dimensions are denoted $x_j$.
- Compute the Euclidean distance $\mathrm{dist}_{j,q}$ from $x_j$ to the q-th record $s_{j,q}$ of $S_j$, for q = 1, 2, ..., $Q_j$. Here $Q_j$ is the number of records in $S_j$.
- Sort all records of $S_j$ by $\mathrm{dist}_{j,q}$ in ascending order. The original historical records corresponding to the first K sorted records form the verification data set $V_j$ of this iteration, generally with 10 ≤ K ≤ 30.

Historical records that appear more than … times across all t verification data sets form the verification subset $S_{x_{\mathrm{test}}}$ of $x_{\mathrm{test}}$. The number of iterations t is generally 10 to 30.
Algorithm 2 gives the pseudocode of the ensemble KNN algorithm:
Pseudocode 3-6: ensemble KNN algorithm
Step 104: generate the pseudo ground truth of the verification subset from the outputs of the remaining base detectors on it, by taking their maximum. Then compute the Pearson correlation coefficient between each base detector's output on the verification subset and the pseudo ground truth.

Consider the p-th historical record $x_p$ in the verification subset $S_{x_{\mathrm{test}}}$, for p = 1, 2, ..., T, where T is the number of historical records in $S_{x_{\mathrm{test}}}$. The Z scores output on $x_p$ by the m' base detectors of $\mathrm{Detector}_{\mathrm{filter}}$ are $z_{1,p}, z_{2,p}, \ldots, z_{m',p}$. Their maximum $g_p = \max_{1\le i\le m'} z_{i,p}$ is the pseudo ground truth of $x_p$. The pseudo ground truth of all historical records in the verification subset is $G = (g_1, g_2, \ldots, g_T)$.

The Z scores output by the i-th base detector of $\mathrm{Detector}_{\mathrm{filter}}$ on all historical records of the verification subset are $Z_i = (z_{i,1}, z_{i,2}, \ldots, z_{i,T})$. The Pearson correlation coefficient $P_i$ between $Z_i$ and $G$ is the performance score of the i-th base detector. The higher $P_i$, the better the i-th base detector performs.
Step 105: select base detectors by their Pearson correlation coefficients with the histogram-based base detector selection method. Average the outputs of the selected base detectors to obtain the detection result for the data to be detected.

Specifically, all base detectors in $\mathrm{Detector}_{\mathrm{filter}}$ are divided by performance score into b equal-width bins. The number of bins b is preset and is generally 10. The base detectors in the bin containing the most base detectors form the base detector pool $\mathrm{Detector}_{\mathrm{select}}$.

The detection result for $x_{\mathrm{test}}$ is the mean of the Z scores output on $x_{\mathrm{test}}$ by all base detectors in $\mathrm{Detector}_{\mathrm{select}}$. The detection threshold for the current detection is the mean of the classification thresholds of all base detectors in $\mathrm{Detector}_{\mathrm{select}}$. If the detection result is greater than or equal to this threshold, $x_{\mathrm{test}}$ is judged abnormal.
Fig. 2 is a schematic flow chart of the base detector pre-screening method. It proceeds as follows:

- The Z scores output by all base detectors on all historical data are used to train an isolation forest.
- The isolation forest's outputs on all Z-score rows are sorted in ascending order.
- The base detectors whose rows receive the smallest drop_rate% of outputs are marked as abnormal base detectors.
- These marked detectors are removed from the set of all base detectors.
FIG. 3 is a schematic flow chart of the ensemble KNN algorithm. Each iteration runs as follows:

- $d_j$ dimensions are selected at random to form a training-set subset.
- The values of the data to be detected $x_{\mathrm{test}}$ on these $d_j$ dimensions are denoted $x_j$.
- The Euclidean distances from $x_j$ to the historical records in the training-set subset are computed and sorted in ascending order.
- The original historical records corresponding to the first K sorted records form the verification data set of this iteration.

The iteration is performed t times in total. Historical records that appear more than … times across the t verification data sets form the verification subset of $x_{\mathrm{test}}$.
Fig. 4 is a schematic diagram of the power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration provided by the invention. The method consists of five stages:

1. Training base detectors. A number of base detectors are trained on historical power dispatching monitoring data.
2. Pre-screening. The Z scores output by all base detectors on all historical data are used to train an isolation forest. The base detectors whose Z-score rows receive the smaller isolation forest outputs are removed.
3. Selecting the verification subset. On training-set subsets with randomly chosen features, the ensemble KNN algorithm repeatedly selects the original historical records whose Euclidean distance to the data to be detected is small. The records selected repeatedly form the verification subset of the data to be detected.
4. Generating the pseudo ground truth and computing Pearson correlation coefficients. For each record in the verification subset, the maximum of the Z scores output by the remaining base detectors is that record's pseudo ground truth. Each base detector's score is the Pearson correlation coefficient between its Z scores on the verification subset and the pseudo ground truth.
5. Selecting base detectors and obtaining the detection result. Base detectors are selected by their Pearson correlation coefficients using the histogram-based selection method. The detection result is the mean of the Z scores that the selected base detectors output on the data to be detected. The current detection threshold is the mean of their classification thresholds. Data whose detection result is greater than or equal to this threshold are judged abnormal.
Fig. 5 is a schematic diagram of the input data and output results of the base detectors used in the present invention. The input of each base detector is the real-time resource occupation data of processes related to power dispatching system services, collected by the power dispatching monitoring system: process CPU occupancy, memory occupancy, disk IO, network IO, number of threads, and number of network connections. The Z score output by the i-th base detector is a value in the range $[\min_i, \max_i]$, where $\min_i$ and $\max_i$ depend on the base detector itself and are not fixed. Input data is classified as normal when its Z score is below the classification threshold of the i-th base detector, and as abnormal when its Z score is greater than or equal to that threshold. To obtain the threshold, the Z scores output by the i-th base detector on all training data $S_{train}$ are sorted in descending order, and the classification threshold of the i-th base detector is the minimum of the top con% of the sorted Z scores. Here con% is the preset anomaly rate, typically 10%.
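As an illustration of the classification threshold described above, the following Python sketch assumes the Z scores of one base detector on $S_{train}$ are already available as a list. The names `classification_threshold` and `is_abnormal` are hypothetical, and rounding the top con% count up to at least one score is an assumption the text does not specify.

```python
import math


def classification_threshold(train_z, con=0.10):
    """Minimum of the top `con` share of one base detector's Z scores on S_train.

    Assumption: the number of top scores is rounded up and is at least 1.
    """
    ranked = sorted(train_z, reverse=True)      # largest Z scores first
    k = max(1, math.ceil(con * len(ranked)))
    return ranked[k - 1]                        # smallest value among the top-k scores


def is_abnormal(z, threshold):
    # Below the threshold: normal; at or above the threshold: abnormal.
    return z >= threshold
```

With con% = 10% and only ten training scores, the threshold is simply the largest Z score; on realistic training sets the top 10% contains many scores and the threshold sits at their lower edge.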
In a specific embodiment, three anomaly types of the smart grid dispatching control system (the D5000 system for short) are applied to the system monitoring data: data jump, application network disconnection, and telemetry table not refreshing. A data jump anomaly concerns a telemetry point whose process data in the D5000 system is collected periodically: if the difference between the values of adjacent sampling points exceeds a manually set threshold, a data jump anomaly is considered to have occurred. When a data jump occurs, the power dispatching department allocates power generation to subordinate grid companies with a deviation, which affects the grid dispatching plan; the electricity reports also deviate, which affects electricity billing. An application network disconnection anomaly means that the network connection of a server running a D5000 application is interrupted or its network card fails, so that key D5000 processes slow down or even stop and the services under the application cannot execute their tasks normally, which affects grid dispatching. A telemetry-table-not-refreshing anomaly means that the grid automation system fails to update telemetry data in time. Only with real-time, accurate telemetry data can the dispatcher adjust the grid's operating conditions promptly and accurately: when the grid state changes, the corresponding telemetry data should be reflected to the dispatching center immediately, and if the telemetry table is not updated for a long time, the dispatcher's overall control of the grid operating state is impaired.
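The data jump rule above compares adjacent samples of one telemetry point against a manually set threshold. A minimal sketch, assuming the samples of one point arrive as a Python list; `detect_jumps` and `max_step` are hypothetical names, and the threshold value is application specific.

```python
def detect_jumps(samples, max_step):
    """Indices i where sample i differs from sample i-1 by more than max_step.

    max_step stands in for the manually set threshold of the data jump rule.
    """
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > max_step]
```

For example, `detect_jumps([1.0, 1.2, 5.0, 5.1, 1.0], 2.0)` flags the rise at index 2 and the fall at index 4.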
Details of the system monitoring data for the three anomaly types are given in Table 1:
Table 1 Details of the system monitoring data for the three anomaly types
Table 2 lists the base detector algorithms and parameters used in the embodiment of the invention:
Table 2 Base detector algorithms and parameters used in the embodiment
To verify the effectiveness of the algorithm, the embodiment of the invention compares the dynamic integration method without pre-screening (Algorithm 1) with the dynamic integration method with pre-screening (Algorithm 2).
The embodiment is evaluated with the AUC. The area under the ROC curve (AUC) is commonly used to evaluate anomaly detection algorithms: the closer the AUC is to 1, the better the algorithm performs.
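AUC can be computed without plotting the ROC curve: it equals the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The sketch below, with the hypothetical function `auc`, uses this pairwise definition; it is quadratic in the number of samples and only meant to make the metric concrete (scikit-learn's `roc_auc_score` computes the same quantity efficiently).

```python
def auc(labels, scores):
    """Pairwise AUC: labels are 1 for abnormal and 0 for normal; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four abnormal/normal pairs are ranked correctly, giving an AUC of 0.75.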
In the embodiment of the invention, the parameters are set to t = 20, K = 30, n_estimators = 100, drop_rate% = 10%, b = 10, and con% = 10%.
Table 3 lists the AUC results of the two algorithms on the D5000 monitoring data set. The power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration (Algorithm 2) obtains a higher AUC on all three anomaly types, with the largest gain on data jump (from 0.9305 to 0.9595) and small gains on the other two. This shows that the proposed pre-screening method effectively improves the accuracy of the dynamic integration method for power dispatching monitoring data anomaly detection.
Table 3 AUC results on the three anomaly types
Anomaly type | Algorithm 1 | Algorithm 2 |
---|---|---|
Data jump | 0.9305 | 0.9595 |
Application network disconnection | 0.9857 | 0.9870 |
Telemetry table not refreshing | 0.9986 | 0.9987 |
In summary, the embodiments of the present invention have the following beneficial effects:
In the technical solution of the invention, a number of base detectors are trained on the original power dispatching monitoring historical data with different unsupervised anomaly detection algorithms; the poorly performing base detectors are removed by the isolation-forest-based pre-screening method; in the detection stage, a verification subset is selected from all historical data for the data to be detected with an ensemble KNN algorithm; the remaining base detectors generate the pseudo ground truth of the verification subset with the maximum method, and the Pearson correlation coefficient between each base detector's Z scores and the pseudo ground truth is taken as its performance score; base detectors are then selected with the histogram-based selection method, the mean of the Z scores output by the selected base detectors on the data to be detected is the detection result, the mean of their classification thresholds is the detection threshold for the current detection, and data whose detection result is greater than or equal to the detection threshold is judged abnormal, thereby realizing anomaly detection of power dispatching monitoring data. For power dispatching monitoring data anomaly detection, the technical solution of the embodiment achieves higher accuracy than the dynamic integration method without pre-screening.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (2)
1. A power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration, characterized by comprising the following steps:
(1) training a number of base detectors on power dispatching monitoring historical data, specifically:
all power dispatching monitoring historical data are used as the training set $S_{train}$; m base detectors are trained on the training set with different unsupervised anomaly detection algorithms, with $m \ge 50$, and the base detector pool formed by all base detectors is denoted $Detector_{all}$; the output of each base detector is an anomaly score, and the larger the anomaly score, the more anomalous the input data; the anomaly score output by each base detector in $Detector_{all}$ is converted into a Z score by Z-score normalization;
the input of each base detector is the real-time resource occupation data of processes related to power dispatching system services, collected by the power dispatching monitoring system, comprising process CPU occupancy, memory occupancy, disk IO, network IO, number of threads, and number of network connections; the Z score output by the i-th base detector is a value in the range $[\min_i, \max_i]$, where $\min_i$ and $\max_i$ depend on the base detector itself and are not fixed; input data is classified as normal when its Z score is below the classification threshold of the i-th base detector and as abnormal when its Z score is greater than or equal to that threshold; the Z scores output by the i-th base detector on all training data $S_{train}$ are sorted in descending order, and the classification threshold of the i-th base detector is the minimum of the top con% of the sorted Z scores; con% is the preset anomaly data proportion, taken as 10%;
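Step (1) converts each base detector's raw anomaly scores into Z scores so that detectors with different score scales can be averaged and compared. A minimal sketch, assuming the raw scores of the m detectors on the n training records are given as a list of rows; `z_normalize` and `build_output_matrix` are hypothetical names, and the use of the population standard deviation and the handling of constant scores are assumptions.

```python
import statistics


def z_normalize(raw_scores):
    """Convert one base detector's anomaly scores to Z scores (mean 0, std 1).

    Assumption: population standard deviation; constant scores map to zeros.
    """
    mu = statistics.mean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    if sigma == 0:
        return [0.0] * len(raw_scores)
    return [(s - mu) / sigma for s in raw_scores]


def build_output_matrix(raw_matrix):
    """Row i holds the Z scores of base detector i on the n training records."""
    return [z_normalize(row) for row in raw_matrix]
```

Because the transform is monotone increasing, a larger Z score still means a more anomalous input, as the claim requires.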
(2) pre-screening all base detectors with the isolation forest method to remove the poorly performing base detectors, specifically:
the Z scores output by all m base detectors in $Detector_{all}$ on all n historical records of the training set $S_{train}$ form the matrix $Output_{m \times n}$, which is used to train an isolation forest of n_estimators isolation trees, with n_estimators = 100; to build one isolation tree, ψ rows are sampled from $Output_{m \times n}$ without replacement, and these ψ n-dimensional rows $Output_{\psi \times n}$ form the training sample of that tree; a dimension of the tree's sample is selected at random, a value is selected at random between the minimum and maximum of the sample in that dimension, and the sample is split in two: records smaller than the value in that dimension go to the left of the node and records greater than or equal to the value go to the right, yielding a splitting condition and a left and a right data set; the process is repeated on the left and right data sets until a termination condition is reached; there are two termination conditions:
1) the data set contains only one sample, or all its samples are identical;
2) the height of the tree reaches $\log_2(\psi)$;
all trained isolation trees form the isolation forest IForest, whose output is a continuous value; the smaller the output, the more anomalous the input data;
the r-th row $Output_r$ of $Output_{m \times n}$, r = 1, 2, ..., m, is used as the input of IForest; the m outputs of IForest on $Output_{m \times n}$ are sorted in ascending order, and the base detectors corresponding to the input rows of the first drop_rate% of the sorted outputs are marked as abnormal base detectors, with drop_rate% = 10%; the base detectors marked as abnormal are removed from $Detector_{all}$, and the pool formed by the remaining m' base detectors is denoted $Detector_{filter}$;
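Step (2) treats each base detector's row of $Output_{m \times n}$ as one n-dimensional sample and isolates the unusual rows. The following from-scratch Python sketch follows the claim's tree-building and termination rules. Assumptions not stated in the text: ψ = min(256, m) (256 is the sub-sample size commonly used for isolation forests), the height limit is rounded up to an integer, the IForest output is taken as the mean path length (so a smaller output means a more anomalous row, matching the claim's convention), and the number of dropped detectors is floor(drop_rate × m). All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, height, max_height, rng):
    # Stop on one sample, identical samples, or when the height limit is reached.
    if len(rows) <= 1 or all(r == rows[0] for r in rows) or height >= max_height:
        return ("leaf", len(rows))
    dim = rng.randrange(len(rows[0]))             # random dimension
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    value = rng.uniform(lo, hi)                   # random split value in [min, max]
    left = [r for r in rows if r[dim] < value]    # smaller than value: left
    right = [r for r in rows if r[dim] >= value]  # greater or equal: right
    return ("node", dim, value,
            build_tree(left, height + 1, max_height, rng),
            build_tree(right, height + 1, max_height, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + c(node[1])  # unresolved leaf: add expected remaining depth
    _, dim, value, left, right = node
    return path_length(x, left if x[dim] < value else right, depth + 1)


def fit_iforest(rows, n_estimators=100, psi=256, seed=0):
    rng = random.Random(seed)
    psi = min(psi, len(rows))
    max_height = max(1, math.ceil(math.log2(psi)))
    # Each tree is grown on psi rows sampled without replacement.
    return [build_tree(rng.sample(rows, psi), 0, max_height, rng)
            for _ in range(n_estimators)]


def iforest_output(x, trees):
    # Mean path length: smaller means easier to isolate, i.e. more anomalous.
    return sum(path_length(x, t) for t in trees) / len(trees)


def pre_screen(output, drop_rate=0.10, n_estimators=100, seed=0):
    """Return the indices of base detectors kept in Detector_filter."""
    trees = fit_iforest(output, n_estimators=n_estimators, seed=seed)
    scores = [iforest_output(row, trees) for row in output]
    order = sorted(range(len(output)), key=lambda r: scores[r])  # ascending
    dropped = set(order[:int(drop_rate * len(output))])
    return [r for r in range(len(output)) if r not in dropped]
```

In a toy run with 18 detectors whose Z-score rows cluster together and 2 detectors with extreme rows, `pre_screen(output, drop_rate=0.10)` keeps the 18 clustered detectors.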
(3) selecting, with an ensemble KNN algorithm, historical data with a small Euclidean distance to the data to be detected as the verification subset, specifically:
let d be the dimension of the historical data in the training set $S_{train}$; in the j-th round, j = 1, 2, ..., t, where t is the total number of rounds, $d_j$ dimensions are selected at random to generate the training-set subset $S_j$, with $d_j$ a random number within a set range; the values of the data to be detected $x_{test}$ in these $d_j$ dimensions are denoted $x_j$; the Euclidean distance from $x_j$ to each record in $S_j$ is calculated, there being $Q_j$ records in $S_j$;
all records in $S_j$ are sorted by their Euclidean distance to $x_j$ in ascending order, and the K original historical records corresponding to the first K records are selected as the verification data set generated in this round, with $10 \le K \le 30$;
the historical records that appear more than a set number of times across all t verification data sets are taken as the verification subset T of the data to be detected $x_{test}$, with $10 \le t \le 30$;
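The ensemble KNN selection of step (3) can be sketched as below. The bounds of $d_j$ and the occurrence count a record must exceed did not survive in the claim text, so `d_min`, `d_max` and `min_count` are hypothetical parameters the caller must choose; `verification_subset` and `euclidean` are likewise illustrative names. The function returns indices into $S_{train}$, that is, the original historical records.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def verification_subset(train, x_test, t, k, d_min, d_max, min_count, seed=0):
    """Indices of S_train records appearing in more than min_count of the t KNN sets.

    d_min, d_max (bounds for d_j) and min_count are hypothetical parameters.
    """
    rng = random.Random(seed)
    d = len(x_test)
    counts = Counter()
    for _ in range(t):
        d_j = rng.randint(d_min, d_max)       # number of features this round
        dims = rng.sample(range(d), d_j)      # random feature subset defines S_j
        x_j = [x_test[i] for i in dims]
        dist = sorted((euclidean([row[i] for i in dims], x_j), q)
                      for q, row in enumerate(train))
        counts.update(q for _, q in dist[:k])  # K nearest original records
    return sorted(q for q, n in counts.items() if n > min_count)
```

In a toy training set where five records sit next to $x_{test}$ in every dimension and the rest are far away, those five indices are chosen in every round and form the verification subset.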
(4) generating the pseudo ground truth of the verification subset from the outputs of the remaining base detectors on it with the maximum method, and calculating the Pearson correlation coefficient between each base detector's output on the verification subset and the pseudo ground truth, specifically:
for the p-th historical record $x_p$ in the verification subset T, p = 1, 2, ..., |T|, where |T| is the number of historical records in T, the Z scores output on $x_p$ by all base detectors in $Detector_{filter}$ are collected, and their maximum is taken as the pseudo ground truth of $x_p$; the pseudo ground truths of all historical records in T form the pseudo ground truth vector of the verification subset;
the Z scores output by the i-th base detector of $Detector_{filter}$ on all historical records in T form its output vector; the Pearson correlation coefficient $P_i$ between this output vector and the pseudo ground truth vector is taken as the performance score of the i-th base detector, and the higher $P_i$, the better the i-th base detector performs;
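Step (4) in code, as a minimal sketch: `z_on_subset[i][p]` is assumed to hold the Z score of the i-th detector in $Detector_{filter}$ on the p-th record of T. The pseudo ground truth is the column-wise maximum, and each detector's score is its Pearson correlation with that vector. Returning 0.0 for a detector with constant output (where the coefficient is undefined) is an assumption, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # assumption: constant output gets a neutral score
    return cov / (sa * sb)


def performance_scores(z_on_subset):
    """Return (pseudo ground truth, Pearson score of each base detector)."""
    # Pseudo ground truth of record p: maximum Z score over all detectors.
    pseudo_truth = [max(column) for column in zip(*z_on_subset)]
    scores = [pearson(row, pseudo_truth) for row in z_on_subset]
    return pseudo_truth, scores
```

Taking the maximum makes the pseudo ground truth follow whichever detector reacts most strongly to each record, so detectors whose outputs rise and fall with that envelope score highest.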
(5) selecting base detectors according to the performance scores from step (4), and averaging the outputs of the selected base detectors as the detection result of the data to be detected, thereby realizing anomaly detection of power dispatching monitoring data.
2. The method according to claim 1, wherein step (5), selecting base detectors according to the performance scores from step (4) and averaging the outputs of the selected base detectors as the detection result of the data to be detected so as to realize anomaly detection of power dispatching monitoring data, specifically comprises:
the base detectors in $Detector_{filter}$ are divided by performance score into b equal-width groups, with b = 10, and all base detectors in the group containing the most base detectors form the base detector pool $Detector_{select}$; the mean of the Z scores output by all base detectors in $Detector_{select}$ on the data to be detected $x_{test}$ is taken as the detection result of $x_{test}$; the mean of the classification thresholds of all base detectors in $Detector_{select}$ is taken as the detection threshold of the current detection; data to be detected $x_{test}$ whose detection result is greater than or equal to the detection threshold is judged to be abnormal, thereby realizing anomaly detection of power dispatching monitoring data.
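Claim 2's histogram-based selection and final decision can be sketched as follows. The sketch assumes the b groups are equal-width bins spanning the minimum to maximum performance score and that a tie between equally full bins goes to the lowest-score bin; neither detail is fixed by the text. `histogram_select` and `detect` are hypothetical names, and `z_test` and `thresholds` are assumed to be indexed like `scores`.

```python
def histogram_select(scores, b=10):
    """Indices of detectors in the most populated of b equal-width score bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / b
    bins = [0 if width == 0 else min(int((s - lo) / width), b - 1) for s in scores]
    counts = [bins.count(j) for j in range(b)]
    best = counts.index(max(counts))  # assumption: ties go to the lowest-score bin
    return [i for i, j in enumerate(bins) if j == best]


def detect(scores, z_test, thresholds, b=10):
    """Return (detection result, detection threshold, is_abnormal) for one x_test."""
    selected = histogram_select(scores, b)
    result = sum(z_test[i] for i in selected) / len(selected)
    threshold = sum(thresholds[i] for i in selected) / len(selected)
    return result, threshold, result >= threshold
```

With performance scores `[0.9, 0.91, 0.92, 0.1, 0.5]`, the three detectors scoring about 0.9 share the fullest bin, so only their Z scores and thresholds enter the averages.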
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110529491.9A CN113112188B (en) | 2021-05-14 | 2021-05-14 | Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110529491.9A CN113112188B (en) | 2021-05-14 | 2021-05-14 | Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113112188A CN113112188A (en) | 2021-07-13 |
CN113112188B true CN113112188B (en) | 2022-05-17 |
Family
ID=76722231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110529491.9A Active CN113112188B (en) | 2021-05-14 | 2021-05-14 | Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113112188B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113591400B (en) * | 2021-08-23 | 2023-06-27 | 北京邮电大学 | Power dispatching monitoring data anomaly detection method based on characteristic correlation partition regression |
CN113822379B (en) * | 2021-11-22 | 2022-02-22 | 成都数联云算科技有限公司 | Process process anomaly analysis method and device, electronic equipment and storage medium |
CN114399407B (en) * | 2022-02-17 | 2024-08-27 | 北京邮电大学 | Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107657288A (en) * | 2017-10-26 | 2018-02-02 | 国网冀北电力有限公司 | A kind of power scheduling flow data method for detecting abnormality based on isolated forest algorithm |
CN109543765A (en) * | 2018-08-23 | 2019-03-29 | 江苏海平面数据科技有限公司 | A kind of industrial data denoising method based on improvement IForest |
WO2020244893A1 (en) * | 2019-06-04 | 2020-12-10 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and arrangement for detecting anomalies in network data traffic |
CN112181706A (en) * | 2020-10-23 | 2021-01-05 | 北京邮电大学 | Power dispatching data anomaly detection method based on logarithmic interval isolation |
2021-05-14 CN CN202110529491.9A (CN113112188B), active
Non-Patent Citations (1)
Title |
---|
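"基于孤立森林算法的电力调度流" (Isolation-forest-based power dispatching flow…); 李新鹏 (Li Xinpeng); 《电网技术》 (Power System Technology); 2019-04-30; Vol. 43, No. 4; full text * |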
Also Published As
Publication number | Publication date |
---|---|
CN113112188A (en) | 2021-07-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113112188B (en) | Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration | |
CN107657288B (en) | Power dispatching flow data anomaly detection method based on isolated forest algorithm | |
CN114090396B (en) | Cloud environment multi-index unsupervised anomaly detection and root cause analysis method | |
Zheng et al. | Raw wind data preprocessing: a data-mining approach | |
CN113298297B (en) | Wind power output power prediction method based on isolated forest and WGAN network | |
CN112181706B (en) | Power dispatching data anomaly detection method based on logarithmic interval isolation | |
CN111796957B (en) | Transaction abnormal root cause analysis method and system based on application log | |
CN111191720B (en) | Service scene identification method and device and electronic equipment | |
CN110297469B (en) | Production line fault judgment method based on resampling integrated feature selection algorithm | |
CN109409444B (en) | Multivariate power grid fault type discrimination method based on prior probability | |
CN105930629A (en) | On-line fault diagnosis method based on massive amounts of operating data | |
CN112363896A (en) | Log anomaly detection system | |
CN114201374A (en) | Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning | |
CN112257784A (en) | Electricity stealing detection method based on gradient boosting decision tree | |
CN113408659A (en) | Building energy consumption integrated analysis method based on data mining | |
CN114202243A (en) | Engineering project management risk early warning method and system based on random forest | |
CN115617784A (en) | Data processing system and processing method for informationized power distribution | |
CN113608968A (en) | Power dispatching monitoring data anomaly detection method based on density and distance comprehensive decision | |
CN114399407B (en) | Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration | |
CN111984514A (en) | Prophet-bLSTM-DTW-based log anomaly detection method | |
CN116541780A (en) | Power transmission line galloping early warning method, device, equipment and storage medium | |
CN114676931B (en) | Electric quantity prediction system based on data center technology | |
CN114167837B (en) | Intelligent fault diagnosis method and system for railway signal system | |
CN116304814A (en) | Method and system for analyzing working condition of monitoring object based on classification algorithm | |
CN113128913B (en) | Power dispatching monitoring data anomaly detection method based on reversal information entropy dynamic integration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |