CN114399407A - Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration - Google Patents


Info

Publication number
CN114399407A
Authority
CN
China
Prior art keywords
data
detector
output
base
meta
Legal status
Pending
Application number
CN202210147086.5A
Other languages
Chinese (zh)
Inventor
高欣
傅世元
薛冰
于家豪
黄子健
黄旭
张光耀
李康生
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202210147086.5A
Publication of CN114399407A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The embodiment of the invention provides a power dispatching monitoring data anomaly detection method based on dynamic and static selection integration, comprising the following steps: training a number of base detectors on historical power dispatching monitoring data; using an isolation forest to reject poorly performing base detectors; generating a pseudo ground truth for the historical data by averaging the outputs of the remaining base detectors, and converting both the pseudo ground truth and the base detector outputs into binary labels; removing the historical data with the smallest pseudo-ground-truth values, and extracting meta-features and meta-labels of the base detectors on the remaining historical data; training a random forest on the meta-features and meta-labels; and, for data to be detected, extracting the base detectors' meta-features, feeding them into the random forest, selecting base detectors according to the random forest's output, and taking the maximum output of the selected base detectors as the detection result. The technical scheme provided by the embodiment of the invention can improve the accuracy of anomaly detection on power dispatching monitoring data.

Description

Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration
[Technical field]
The invention relates to anomaly detection for power dispatching monitoring data, and in particular to a power dispatching monitoring data anomaly detection method based on dynamic and static selection integration.
[Background of the invention]
A unified, strong smart grid is a new type of power grid built on the physical grid by tightly integrating modern sensing and measurement, communication, information, computer, and control technologies with it; it covers power generation, transmission, transformation, distribution, consumption, and dispatching. In the actual operation of a power system, dispatching commands, monitors, and manages power production and is an important guarantee of safe operation. As the grid grows in scale, the requirements on its safe and stable operation keep rising, and anomaly detection on grid dispatching monitoring data becomes increasingly important. Because the monitoring system generates a large volume of monitoring data in a short time while the grid is running, manually labeling the data as normal or abnormal (for example, by consulting experts) is practically impossible, so stored historical dispatching monitoring data usually lack accurate labels. Unsupervised anomaly detection methods, which do not use label information from the training data, are therefore better suited to this situation.
In existing unsupervised anomaly detection methods based on dynamic selection integration, the pseudo ground truth is generated from all initially trained base detectors and is therefore biased by the poorly performing ones, so the base detector performance scores computed from it are not accurate enough. In addition, existing dynamic selection integration methods measure base detector performance with a single evaluation index, which limits their generality: they perform poorly when that index does not suit the data. The invention therefore first statically selects and rejects part of the poorly performing base detectors to produce a more accurate pseudo ground truth, and then, following a meta-learning idea, combines several indexes to evaluate detector performance comprehensively and select base detectors dynamically. This dynamic and static selection integration method improves the accuracy of ensemble-based anomaly detection on power dispatching monitoring data, which is significant for strengthening grid state monitoring and ensuring grid security.
[Summary of the invention]
In view of this, the invention provides a power dispatching monitoring data anomaly detection method based on dynamic and static selection integration to improve the accuracy of anomaly detection on power dispatching monitoring data.
The method comprises the following steps:
(1) Train a number of base detectors on historical power dispatching monitoring data. Specifically:
All historical power monitoring data are used as the training set $X_{TR}$. On this training set, m base detectors are trained with different unsupervised anomaly detection algorithms, generally with m ≥ 50; the pool of all base detectors is denoted $P_O$. Each base detector outputs an anomaly score, and a larger score means the input is more anomalous. The anomaly scores output by each base detector in $P_O$ are converted to Z-scores by Z-score normalization. Let $s_{i,j}$ denote the anomaly score output by the ith base detector in $P_O$ on the jth historical data $x_j$ of $X_{TR}$; its Z-score $z_{i,j}$ is:
$$z_{i,j} = \frac{s_{i,j} - \mu_i}{\sigma_i}$$
where i = 1, 2, ..., m; j = 1, 2, ..., n; n is the number of historical data in $X_{TR}$; $\mu_i$ is the mean of the anomaly scores output by the ith base detector over all historical data; and $\sigma_i$ is the standard deviation of those anomaly scores.
The input of each base detector is the real-time resource usage of the processes related to power dispatching system services, as collected by the power dispatching monitoring system: process CPU usage, memory usage, disk IO, network IO, number of threads, and number of network connections. If the Z-score output by the ith base detector is less than its classification threshold $z_i^{thr}$, the input data is normal; if it is greater than or equal to $z_i^{thr}$, the input data is abnormal. To set the threshold, the Z-scores output by the ith base detector on all training data $X_{TR}$ are sorted in descending order, and $z_i^{thr}$ is the smallest Z-score in the top $R_{DA}$% after sorting. $R_{DA}$% is the preset base detector output conversion ratio, generally 10%.
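Step (1) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the patented implementation: it assumes the population standard deviation is meant by $\sigma_i$, and that the number of scores in the "top $R_{DA}$%" is rounded and clamped to at least one. The function names are hypothetical.

```python
import statistics


def zscore_normalize(raw_scores):
    """Convert one base detector's raw anomaly scores on X_TR into Z-scores.

    Assumption: population standard deviation; the text does not say which
    form of the standard deviation is used.
    """
    mu = statistics.fmean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    if sigma == 0:
        raise ValueError("detector output is constant; Z-scores are undefined")
    return [(s - mu) / sigma for s in raw_scores]


def top_ratio_threshold(values, ratio=0.10):
    """Smallest value among the top `ratio` share of `values` (descending order).

    Assumption: the size of the top R% is rounded and clamped to [1, len].
    """
    ranked = sorted(values, reverse=True)
    k = min(len(ranked), max(1, round(len(ranked) * ratio)))
    return ranked[k - 1]


def binary_label(z, threshold):
    """1 = abnormal (Z-score >= threshold), 0 = normal."""
    return 1 if z >= threshold else 0


# Hypothetical raw scores of one detector on five training points.
z = zscore_normalize([1.0, 2.0, 3.0, 4.0, 10.0])
thr = top_ratio_threshold(z, 0.20)          # 20% of 5 points -> top 1
print([binary_label(v, thr) for v in z])    # [0, 0, 0, 0, 1]
```

With the default 10% ratio, each detector flags roughly its top tenth of training scores as abnormal, which is what the threshold $z_i^{thr}$ encodes.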
(2) Use an isolation forest to reject poorly performing base detectors. Specifically:
The Z-scores output by all m base detectors in $P_O$ on all n historical data in the training set $X_{TR}$ form the matrix $Score_{m\times n}$, whose rth row $Score_r$ holds the Z-scores of the rth base detector. An isolation forest of n_itree isolation trees is trained on $Score_{m\times n}$, with n_itree typically 100. To construct each isolation tree, ψ rows are sampled uniformly without replacement from $Score_{m\times n}$, generally with ψ = [formula image not reproduced]; these ψ n-dimensional rows, $Score_{\psi\times n}$, form the training sample of the tree. At each node, a dimension is chosen at random, and a split value is chosen at random between the minimum and maximum of the node's samples in that dimension; samples whose value in that dimension is smaller than the split value go to the left child, and the rest go to the right child, yielding a split condition and left and right data sets. The process is repeated on the left and right data sets until one of two termination conditions is reached:
1) the data set contains only one sample, or all its samples are identical;
2) the height of the tree reaches $\log_2(\psi)$.
All trained isolation trees form the isolation forest IForest, whose output is a continuous value; the smaller the output, the more anomalous the input.
Each row $Score_r$ (r = 1, 2, ..., m) is fed into IForest. The m outputs of IForest on $Score_{m\times n}$ are sorted in ascending order, and the base detectors corresponding to the first $R_D$% of outputs are marked abnormal; $R_D$% is generally 10%. The base detectors marked abnormal are removed from $P_O$, and the pool of the m' base detectors that remain is denoted $P_F$.
(3) Generate a pseudo ground truth for the historical data by averaging the outputs of the remaining base detectors, and convert both the pseudo ground truth and the base detector outputs into binary labels. Specifically:
Let $\mathbf{z}_j = (z_{1,j}, z_{2,j}, \dots, z_{m',j})$ denote the Z-scores output by all m' base detectors in $P_F$ on the jth historical data $x_j$ of the training set $X_{TR}$. The mean of all Z-scores in $\mathbf{z}_j$ is taken as the pseudo ground truth of $x_j$:
$$g_j = \frac{1}{m'}\sum_{a=1}^{m'} z_{a,j}$$
The pseudo-ground-truth set of all historical data in $X_{TR}$ is $G = \{g_1, g_2, \dots, g_n\}$.
The values in G are sorted in descending order, and the threshold $PScore_{thr}$ is the smallest value in the top $R_{GA}$% after sorting; $R_{GA}$% is the preset pseudo-ground-truth conversion ratio, generally 20%. If the pseudo ground truth $g_j$ of the jth historical data $x_j$ is greater than or equal to $PScore_{thr}$, its pseudo label $y_j$ is 1; otherwise it is 0. The pseudo-label set of all historical data in $X_{TR}$ is $Y = \{y_1, y_2, \dots, y_n\}$.
If the Z-score $z_{a,j}$ output by the ath base detector in $P_F$ (a = 1, 2, ..., m') on historical data $x_j$ is greater than or equal to its classification threshold $z_a^{thr}$, the binary label $b_{a,j}$ output on $x_j$ is 1; otherwise it is 0. The binary labels output by the ath base detector on $X_{TR}$ are $B_a = \{b_{a,1}, b_{a,2}, \dots, b_{a,n}\}$, and the binary label set of all base detectors on $X_{TR}$ is $B = \{B_1, B_2, \dots, B_{m'}\}$.
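Step (3) amounts to a column mean followed by two thresholding rules. A minimal sketch, assuming the Z-scores of $P_F$ are held as a list of rows (one row per detector, one column per historical data) and that the "top $R_{GA}$%" count is rounded and clamped to at least one; the function names are hypothetical:

```python
def pseudo_ground_truth(z_matrix):
    """g_j = mean of z_matrix[a][j] over the m' detectors in P_F."""
    m_prime = len(z_matrix)
    return [sum(col) / m_prime for col in zip(*z_matrix)]


def pseudo_labels(g, ratio=0.20):
    """y_j = 1 if g_j >= PScore_thr (smallest value in the top R_GA%), else 0."""
    ranked = sorted(g, reverse=True)
    k = min(len(ranked), max(1, round(len(ranked) * ratio)))  # rounding: assumption
    thr = ranked[k - 1]
    return [1 if v >= thr else 0 for v in g]


def binary_labels(z_matrix, thresholds):
    """b_{a,j} = 1 if z_{a,j} >= z_a^thr, else 0, for every detector a in P_F."""
    return [[1 if z >= thr else 0 for z in row]
            for row, thr in zip(z_matrix, thresholds)]


z = [[0.0, 1.0, 2.0, 3.0, 4.0],    # hypothetical Z-scores of detector 1 on x_1..x_5
     [0.0, 1.0, 2.0, 3.0, 6.0]]    # hypothetical Z-scores of detector 2
g = pseudo_ground_truth(z)          # [0.0, 1.0, 2.0, 3.0, 5.0]
print(pseudo_labels(g))             # [0, 0, 0, 0, 1]
print(binary_labels(z, [3.0, 6.0])) # [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
```

Comparing each row of `binary_labels` with `pseudo_labels` is what later tells the method whether a detector "correctly judges" a data point.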
(4) Remove the historical data with the smallest pseudo-ground-truth values, and extract the meta-features and meta-labels of the base detectors on the remaining historical data. Specifically:
The pseudo-ground-truth values of all historical data, G, are sorted in ascending order, and the historical data corresponding to the first $R_S$% of values are removed. The remaining n' historical data are denoted $X_{STR}$; their pseudo-label set and binary label set are $Y_S$ and $B_S$ respectively, and the Z-scores of the remaining base detectors on $X_{STR}$ are $Z_S$.
For the tth historical data $x^S_t$ in $X_{STR}$, compute its Euclidean distance to the jth historical data $x_j$ of the original training set $X_{TR}$:
$$d_{t,j} = \sqrt{\sum_{l=1}^{u}\left(x^S_{t,l} - x_{j,l}\right)^2}$$
where t = 1, 2, ..., n'; j = 1, 2, ..., n; l = 1, 2, ..., u; u is the dimension of the historical data; $x^S_{t,l}$ is the value of $x^S_t$ in the lth dimension; and $x_{j,l}$ is the value of $x_j$ in the lth dimension.
The historical data in $X_{TR}$ are sorted in ascending order of their Euclidean distance to $x^S_t$, and the first $K_{RC}$ are taken as the performance evaluation set $RC_t$ of $x^S_t$, generally with $10 \le K_{RC} \le 30$.
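The filtering and the performance evaluation set are a sort plus a k-nearest-neighbour query in the raw resource-usage space. A minimal sketch with hypothetical names, assuming the historical data are lists of floats and that the count for $R_S$% is rounded (the text gives no default for $R_S$%). Since $x^S_t$ is itself a member of $X_{TR}$, it appears in its own neighbourhood at distance 0.

```python
import math


def drop_low_pseudo_truth(g, ratio):
    """Indices j of X_TR kept in X_STR after removing the lowest R_S% of g."""
    order = sorted(range(len(g)), key=g.__getitem__)
    k = round(len(g) * ratio)  # rounding of R_S% is an assumption
    return sorted(order[k:])


def k_nearest(query, points, k):
    """Indices of the k points closest to `query` by Euclidean distance."""
    ranked = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return ranked[:k]


# Hypothetical 2-D resource-usage records standing in for X_TR.
X_TR = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.15, 0.25]]
g = [0.5, 2.0, -1.0, 0.3]
kept = drop_low_pseudo_truth(g, 0.25)                 # [0, 1, 3]
rc = {t: k_nearest(X_TR[t], X_TR, 2) for t in kept}  # K_RC = 2 only for illustration
print(kept, rc)
```

In practice the features (CPU usage, thread counts, and so on) have very different scales, so normalizing them before computing Euclidean distances is worth considering; the text does not say whether this is done.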
For $x^S_t$, let $\mathbf{o}^S_t = (z_1(x^S_t), z_2(x^S_t), \dots, z_{m'}(x^S_t))$ denote the Z-scores output by all base detectors in $P_F$ on it. For $x_j$, let $\mathbf{z}_j = (z_{1,j}, z_{2,j}, \dots, z_{m',j})$ denote the Z-scores output by all base detectors in $P_F$ on it. The Euclidean distance between $\mathbf{o}^S_t$ and $\mathbf{z}_j$ is:
$$d^{O}_{t,j} = \sqrt{\sum_{a=1}^{m'}\left(z_a(x^S_t) - z_{a,j}\right)^2}$$
where $z_a(x^S_t)$ is the Z-score output by the ath base detector in $P_F$ on $x^S_t$, and $z_{a,j}$ is the Z-score output by the ath base detector in $P_F$ on $x_j$.
The historical data in $X_{TR}$ are sorted in ascending order of the Euclidean distance between their base detector Z-score vectors and $\mathbf{o}^S_t$, and the first $K_{SOP}$ are taken as the approximate output set $SOP_t$ of $x^S_t$, generally with $10 \le K_{SOP} \le 30$.
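The approximate output set uses the same nearest-neighbour idea, but in the space of detector outputs: each historical data is represented by the vector of Z-scores that the m' detectors in $P_F$ assign to it. A minimal sketch with hypothetical names, assuming the Z-scores are stored one row per detector:

```python
import math


def output_vector(z_matrix, j):
    """Z-scores of all m' detectors in P_F on x_j (column j of z_matrix)."""
    return [row[j] for row in z_matrix]


def approximate_output_set(query_z, z_matrix, k_sop):
    """Indices of the k_sop points of X_TR whose Z-score vectors are closest to query_z."""
    n = len(z_matrix[0])
    return sorted(range(n),
                  key=lambda j: math.dist(query_z, output_vector(z_matrix, j)))[:k_sop]


z = [[0.0, 1.0, 5.0],   # hypothetical Z-scores of detector 1 on x_1..x_3
     [0.0, 1.0, 5.0]]   # hypothetical Z-scores of detector 2
print(approximate_output_set([0.9, 1.1], z, 2))  # [1, 0]
```

Points that look different in raw features can still be "similar" here if the detector pool scores them alike, which is why the method evaluates detectors on both neighbourhoods.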
The following six groups of meta-features are extracted for the ath base detector in $P_F$ on $x^S_t$. Here a detector "correctly judges" a historical data when its binary label on that data equals the data's pseudo label.
1) The number of historical data in the performance evaluation set $RC_t$ whose binary label output by the base detector equals their pseudo label, divided by $K_{RC}$; this group contains 1 feature.
2) The number of historical data in the approximate output set $SOP_t$ whose binary label output by the base detector equals their pseudo label, divided by $K_{SOP}$; this group contains 1 feature.
3) Whether the base detector correctly judges each historical data in $RC_t$ as normal or abnormal: if it correctly judges the qth historical data in $RC_t$ (q = 1, 2, ..., $K_{RC}$), the qth feature of this group is 0, otherwise 1; this group contains $K_{RC}$ features.
4) Whether the base detector correctly judges each historical data in $SOP_t$: if it correctly judges the pth historical data in $SOP_t$ (p = 1, 2, ..., $K_{SOP}$), the pth feature of this group is 0, otherwise 1; this group contains $K_{SOP}$ features.
5) For each historical data in $RC_t$, the absolute difference between the Z-score output by the base detector and its classification threshold $z_a^{thr}$; this group contains $K_{RC}$ features.
6) The absolute difference between the Z-score output by the base detector on $x^S_t$ (the data whose meta-features are being extracted) and its own classification threshold $z_a^{thr}$; this group contains 1 feature.
The six groups contain M meta-features in total, with $M = 3 + 2K_{RC} + K_{SOP}$. The meta-features of every base detector in $P_F$ on every historical data in $X_{STR}$ are extracted in this way and form the meta-feature set $X_{TRM}$, which contains n' × m' meta-feature records.
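The six groups can be assembled into one feature vector per (detector, data point) pair. The sketch below is illustrative only; the argument layout (Z-scores and binary labels stored one row per detector, index lists for $RC_t$ and $SOP_t$) and the function name are assumptions, not the patent's data structures.

```python
def meta_features(a, t_z, rc, sop, z_matrix, thresholds, binary, pseudo):
    """Meta-features of detector a on one data point.

    a           index of the detector within P_F
    t_z         Z-score of detector a on the data point being described
    rc, sop     index lists of the performance evaluation set and approximate output set
    z_matrix    z_matrix[a][j]: Z-score of detector a on x_j in X_TR
    thresholds  classification thresholds z_a^thr
    binary      binary[a][j]: binary label of detector a on x_j
    pseudo      pseudo[j]: pseudo label of x_j
    """
    wrong_rc = [0 if binary[a][q] == pseudo[q] else 1 for q in rc]
    wrong_sop = [0 if binary[a][p] == pseudo[p] else 1 for p in sop]
    feats = [wrong_rc.count(0) / len(rc),      # group 1: accuracy on RC_t
             wrong_sop.count(0) / len(sop)]    # group 2: accuracy on SOP_t
    feats.extend(wrong_rc)                     # group 3: K_RC correctness flags
    feats.extend(wrong_sop)                    # group 4: K_SOP correctness flags
    feats.extend(abs(z_matrix[a][q] - thresholds[a]) for q in rc)  # group 5
    feats.append(abs(t_z - thresholds[a]))     # group 6: margin on the point itself
    return feats


# Hypothetical single detector, three training points, K_RC = K_SOP = 2.
f = meta_features(0, 1.5, [0, 1], [1, 2], [[0.5, 2.0, -1.0]], [1.0],
                  [[0, 1, 0]], [0, 0, 0])
print(f, len(f))  # [0.5, 0.5, 0, 1, 1, 0, 0.5, 1.0, 0.5] 9
```

The length check matches the count in the text: 2 + K_RC + K_SOP + K_RC + 1 = 3 + 2K_RC + K_SOP.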
For the ath base detector in $P_F$, compare its binary label on $x^S_t$ with the pseudo label of $x^S_t$. If they are equal, the meta-label of the ath base detector on $x^S_t$ is 0, meaning the detector judges $x^S_t$ correctly; otherwise it is 1, meaning the detector cannot judge $x^S_t$ correctly. Computing this for every base detector in $P_F$ on every historical data in $X_{STR}$ gives the meta-label set $L_{TRM}$, which contains n' × m' meta-labels.
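The meta-label is the same agreement test as in meta-feature groups 3) and 4), applied to the point itself. A minimal sketch (hypothetical name; binary labels stored one row per detector, `kept` holding the indices of $X_{STR}$ within $X_{TR}$):

```python
def meta_labels(binary, pseudo, kept):
    """Meta-label for each (detector a, x_t^S): 0 = judged correctly, 1 = wrong."""
    return [[0 if binary[a][j] == pseudo[j] else 1 for j in kept]
            for a in range(len(binary))]


# Hypothetical: 2 detectors, 3 training points, X_STR = points 0 and 2.
print(meta_labels([[0, 1, 1], [1, 1, 0]], [0, 1, 0], [0, 2]))  # [[0, 1], [1, 0]]
```

Note the inverted convention: 0 marks a competent detector, which is why step (6) later keeps detectors whose predicted meta-label is 0.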
(5) Train a random forest on the meta-features and meta-labels. Specifically:
A random forest of n_dtree decision trees is trained on the meta-feature set $X_{TRM}$ and the meta-label set $L_{TRM}$, with n_dtree generally 100. To construct each decision tree, N records are sampled uniformly with replacement from $X_{TRM}$ as the tree's training sample, generally with N = n' × m'. When splitting a tree's sample, M' of the M dimensions are chosen at random, generally with M' = [formula image not reproduced], and the best split dimension and split point among them are chosen by the Gini index to divide the samples in two: samples whose value in that dimension is smaller than the split point go to the left child, and the rest go to the right child, yielding a split condition and left and right data sets. The process is repeated on the left and right data sets until a data set contains only one sample or all of its samples have the same meta-label. All trained decision trees form the random forest RFC, whose output is a binary label, 0 or 1, indicating whether the corresponding base detector can correctly judge the corresponding data.
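A random forest like the one in step (5) combines bootstrap sampling, random feature subsets, and Gini-based splits. The pure-Python sketch below illustrates those three ingredients; it is not the authors' implementation, and a library such as scikit-learn's `RandomForestClassifier` would normally be used instead. Assumptions: the feature subset is redrawn at every node (as in Breiman's random forest), M' defaults to $\lfloor\sqrt{M}\rfloor$ because the patent's value is not recoverable here, split points are midpoints between consecutive distinct values, and votes are combined by majority. All names are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """(feature, threshold) with the lowest weighted Gini among `features`, or None."""
    best, best_score = None, math.inf
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] < thr]
            right = [lab for row, lab in zip(X, y) if row[f] >= thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def _majority(y):
    return Counter(y).most_common(1)[0][0]


def build_tree(X, y, m_try, rng):
    """Grow until a node has one sample or a single meta-label."""
    if len(y) <= 1 or len(set(y)) == 1:
        return _majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), m_try))
    if split is None:  # sampled features are constant here: stop (assumption)
        return _majority(y)
    f, thr = split
    li = [i for i, row in enumerate(X) if row[f] < thr]
    ri = [i for i, row in enumerate(X) if row[f] >= thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], m_try, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], m_try, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] < thr else right
    return node


def train_forest(X, y, n_dtree=100, m_try=None, seed=0):
    rng = random.Random(seed)
    m_try = m_try or max(1, int(math.sqrt(len(X[0]))))  # assumed default for M'
    forest = []
    for _ in range(n_dtree):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap, N = len(X)
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m_try, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees (ties go to the label seen first)."""
    return _majority([predict_tree(t, x) for t in forest])
```

For example, `train_forest(X_TRM, L_TRM)` on flattened meta-feature rows and meta-labels would yield a forest whose `predict_forest` output is the 0/1 meta-label described above.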
(6) Extract the meta-features of the base detectors on the data to be detected, feed them into the random forest, select base detectors according to the random forest's output, and take the maximum output of the selected base detectors as the detection result, thereby realizing anomaly detection of power dispatching monitoring data. Specifically:
For data to be detected $x_{TE}$, the M meta-features of each base detector in $P_F$ on $x_{TE}$ are extracted by the same method as in step (4), forming the detection meta-feature set $X_{TEM}$. $X_{TEM}$ is fed into the random forest RFC trained in step (5), giving the detection meta-label set $L_{TEM}$ of m' binary labels.
For each base detector in $P_F$, if its detection meta-label is 0, the detector is considered able to judge the data to be detected correctly and is added to the selected base detector pool $P_S$. The maximum Z-score output by the base detectors in $P_S$ on $x_{TE}$ is taken as the detection result for $x_{TE}$, and the maximum of the classification thresholds of the base detectors in $P_S$ is taken as the detection threshold. If the detection result is greater than or equal to the detection threshold, $x_{TE}$ is judged abnormal, which realizes anomaly detection of power dispatching monitoring data.
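The final decision in step (6) is a filter followed by two maxima. A minimal sketch with a hypothetical function name; the inputs are assumed to be aligned lists indexed by detector in $P_F$. The text does not say what happens if the random forest rejects every detector, so the fallback to the whole of $P_F$ below is an assumption.

```python
def detect(test_z, thresholds, meta_labels_test):
    """Dynamic selection plus maximum combination for one test point.

    test_z[a]            Z-score of detector a (in P_F) on x_TE
    thresholds[a]        classification threshold z_a^thr
    meta_labels_test[a]  RFC output for detector a: 0 = judged competent
    Returns (detection result, detection threshold, is_abnormal).
    """
    selected = [a for a, lab in enumerate(meta_labels_test) if lab == 0]  # P_S
    if not selected:
        # Not covered by the text: falling back to all of P_F is an assumption.
        selected = list(range(len(test_z)))
    score = max(test_z[a] for a in selected)
    thr = max(thresholds[a] for a in selected)
    return score, thr, score >= thr


# Hypothetical outputs of three detectors on one x_TE.
print(detect([0.2, 2.5, 1.0], [1.2, 1.5, 1.0], [0, 0, 1]))  # (2.5, 1.5, True)
```

Taking the maximum threshold alongside the maximum score makes the combined decision more conservative than any single selected detector's threshold would be on its own.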
According to the above technical scheme, the invention has the following beneficial effects:
In the technical scheme of the invention, an isolation forest removes part of the base detectors that perform poorly on the whole training data before dynamic selection takes place. This improves the accuracy of the generated pseudo ground truth, so the performance of the base detectors can be evaluated more accurately. During dynamic selection, a meta-learning approach combines several evaluation indexes to assess base detector performance comprehensively; this addresses the poor performance that dynamic selection integration methods show in some cases because a single index has limited generality, and it improves the accuracy of ensemble-based anomaly detection on power dispatching monitoring data.
[Description of the drawings]
To illustrate the technical solution of the invention more clearly, the drawings used in the invention are briefly described below. The drawings described below are only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of the framework of the power dispatching monitoring data anomaly detection method based on dynamic and static selection integration provided by the invention;
FIG. 2 is a schematic diagram of the power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to the invention;
FIG. 3 is a schematic diagram of the input data and output results of a base detector used in the invention.
[Detailed description of the embodiments]
For a better understanding of the technical solutions of the invention, the invention is described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from the embodiments given here without creative effort fall within the protection scope of the invention.
The invention provides a power dispatching monitoring data anomaly detection method based on dynamic and static selection integration. For anomaly detection of power dispatching monitoring data, the invention screens base detectors with an isolation forest, combines several evaluation indexes to measure base detector performance comprehensively, and uses a random forest to select the better-performing base detectors to examine the data to be detected.
FIG. 1 is a schematic flow chart of the framework of the power dispatching monitoring data anomaly detection method based on dynamic and static selection integration provided by the invention. The method comprises the following steps:
Step 101: train a number of base detectors on historical power dispatching monitoring data.
Specifically, all historical power monitoring data are used as the training set $X_{TR}$, on which m base detectors are trained with different unsupervised anomaly detection algorithms, generally with m ≥ 50; the pool of all base detectors is denoted $P_O$. Each base detector outputs an anomaly score, and a larger score means the input is more anomalous. The anomaly scores output by each base detector in $P_O$ are converted to Z-scores by Z-score normalization. Let $s_{i,j}$ denote the anomaly score output by the ith base detector in $P_O$ on the jth historical data $x_j$ of $X_{TR}$; its Z-score $z_{i,j}$ is:
$$z_{i,j} = \frac{s_{i,j} - \mu_i}{\sigma_i}$$
where i = 1, 2, ..., m; j = 1, 2, ..., n; n is the number of historical data in $X_{TR}$; $\mu_i$ is the mean of the anomaly scores output by the ith base detector over all historical data; and $\sigma_i$ is the standard deviation of those anomaly scores.
The input of each base detector is the real-time resource usage of the processes related to power dispatching system services, as collected by the power dispatching monitoring system: process CPU usage, memory usage, disk IO, network IO, number of threads, and number of network connections. If the Z-score output by the ith base detector is less than its classification threshold $z_i^{thr}$, the input data is normal; if it is greater than or equal to $z_i^{thr}$, the input data is abnormal. The Z-scores output by the ith base detector on all training data $X_{TR}$ are sorted in descending order, and $z_i^{thr}$ is the smallest Z-score in the top $R_{DA}$% after sorting. $R_{DA}$% is the preset base detector output conversion ratio, generally 10%.
And 102, using the isolated forest to reject the base detector with poor performance.
Specifically, the Z-scores that all m base detectors in $P_O$ output on all n historical records of the training set $X_{TR}$ form the matrix $Score_{m\times n}$. Row i of this matrix holds the Z-scores of the ith base detector. The matrix is used to train an isolation forest of n_itree isolation trees, with n_itree typically 100.

To build each isolation tree, ψ rows are sampled from $Score_{m\times n}$ uniformly without replacement, with ψ generally set to a recommended value. These ψ n-dimensional rows, $Score_{\psi\times n}$, form the training sample for that tree. At each node, a dimension is chosen at random, and a split value is drawn at random between the minimum and maximum of the node's samples in that dimension. Samples whose value in that dimension is less than the split value go to the left child. Samples whose value is greater than or equal to it go to the right child. This yields a split condition plus a left and a right data subset. The procedure is repeated recursively on each subset until one of two termination conditions is met:
1) the subset contains only one sample, or all its samples are identical;
2) the tree height reaches $\log_2(\psi)$.

All trained isolation trees together form the isolation forest IForest. Its output is a continuous value; the smaller the output, the more anomalous the input.
Each row $Score_r$ of $Score_{m\times n}$ (r = 1, 2, ..., m) is the vector of Z-scores that the rth base detector outputs on all historical data. Each row is fed into IForest, which yields m outputs. These m outputs are sorted in ascending order, and the base detectors corresponding to the first $R_D\%$ of the sorted outputs are marked abnormal; $R_D\%$ is typically 10%. The base detectors marked abnormal are removed from $P_O$. The pool of the m' base detectors that remain after screening is denoted $P_F$.
Algorithm 1 gives the pseudocode for this step (figure not reproduced).
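Step 102 can be sketched with a small, dependency-free isolation forest that treats each base detector's row of Z-scores as one sample. This is a hedged illustration with hypothetical function names, and it makes three assumptions:
- ψ defaults to min(256, m), the usual iForest choice, because the patent's own formula for ψ is not reproduced here.
- The height limit $\log_2(\psi)$ is rounded up to an integer.
- The forest output is the mean path length, so a shorter path means a more anomalous detector. This matches the text's convention that a smaller output means a more anomalous input.

scikit-learn's `IsolationForest.decision_function` follows the same "lower is more abnormal" convention and could replace this hand-written forest in practice.

```python
import math
import random


def _c(n):
    """Expected path length of an unsuccessful BST search; normalizes truncated leaves."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(rows, depth, max_depth, rng):
    # Stop at one sample, all-identical samples, or the height limit.
    if len(rows) <= 1 or depth >= max_depth or all(r == rows[0] for r in rows):
        return ("leaf", len(rows))
    # Random dimension (among those that still vary), random split value in [min, max).
    dims = [d for d in range(len(rows[0]))
            if min(r[d] for r in rows) < max(r[d] for r in rows)]
    dim = rng.choice(dims)
    split = rng.uniform(min(r[dim] for r in rows), max(r[dim] for r in rows))
    left = [r for r in rows if r[dim] < split]    # smaller values go left
    right = [r for r in rows if r[dim] >= split]  # greater-or-equal values go right
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    return _path_length(x, left if x[dim] < split else right, depth + 1)


def iforest_outputs(score_matrix, n_itree=100, psi=None, seed=0):
    """score_matrix[r]: Z-scores of base detector r on all n records (one row per detector).
    Returns one output per detector; smaller means more anomalous."""
    rng = random.Random(seed)
    m = len(score_matrix)
    psi = psi or min(256, m)  # assumption: standard iForest default, not the patent's formula
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [_build(rng.sample(score_matrix, psi), 0, max_depth, rng)  # without replacement
             for _ in range(n_itree)]
    return [sum(_path_length(row, t) for t in trees) / n_itree for row in score_matrix]


def prune_detectors(score_matrix, r_d=0.10, **kwargs):
    """Indices of detectors kept in P_F after dropping the r_d share with the smallest outputs."""
    out = iforest_outputs(score_matrix, **kwargs)
    order = sorted(range(len(out)), key=lambda i: out[i])  # ascending: most anomalous first
    dropped = set(order[:round(len(out) * r_d)])
    return [i for i in range(len(out)) if i not in dropped]


# Usage: 19 detectors that broadly agree and one (index 19) whose Z-scores are far off.
gen = random.Random(1)
scores = [[gen.gauss(0, 0.1) for _ in range(5)] for _ in range(19)] + [[10.0] * 5]
kept = prune_detectors(scores, r_d=0.05, n_itree=50, seed=3)
print(19 in kept, len(kept))  # False 19
```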
Step 103: Use an averaging method to generate a pseudo ground truth for each historical record from the outputs of the remaining base detectors. Then convert both the pseudo ground truths and the base detector outputs into binary labels.
Specifically, for the jth historical record $x_j$ of the training set $X_{TR}$, the Z-scores output by all m' base detectors in $P_F$ form the vector $Z_j$. The mean of all Z-scores in $Z_j$ is taken as the pseudo ground truth $PScore_j$ of $x_j$. The set of pseudo ground truths for all historical records in $X_{TR}$ is $PScore = \{PScore_1, \dots, PScore_n\}$.

The values in PScore are sorted in descending order. The threshold $PScore_{thr}$ is the smallest value within the top $R_{GA}\%$ of the sorted list, where $R_{GA}\%$ is the preset pseudo-ground-truth conversion ratio, typically 20%. If the pseudo ground truth $PScore_j$ of the jth historical record is greater than or equal to $PScore_{thr}$, its pseudo label $PL_j$ is 1; otherwise it is 0. The set of pseudo labels for all historical records in $X_{TR}$ is $PL = \{PL_1, \dots, PL_n\}$.
Let $z_{a,j}$ be the Z-score output by the ath base detector in $P_F$ on historical record $x_j$ (a = 1, 2, ..., m'). If $z_{a,j}$ is greater than or equal to the detector's classification threshold $\theta_a$, the binary label $l_{a,j}$ that the detector outputs on $x_j$ is 1; otherwise it is 0. The binary labels output by the ath base detector on the training set $X_{TR}$ are denoted $L_a = \{l_{a,1}, \dots, l_{a,n}\}$. The set of binary labels output by all base detectors on $X_{TR}$ is $L = \{L_1, \dots, L_{m'}\}$.
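This sketch covers step 103: averaging each record's Z-scores into a pseudo ground truth, thresholding at the top $R_{GA}\%$, and deriving each detector's binary labels from its own threshold. It assumes `z_matrix[a][j]` holds the Z-score of detector a in $P_F$ on record j. All names are hypothetical, and the rounding of the top-ratio count is an assumption.

```python
def top_ratio_threshold(values, ratio):
    """Smallest value among the top `ratio` share of `values` sorted in descending order."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * ratio))  # assumption: round to nearest, keep at least one
    return ranked[k - 1]


def pseudo_ground_truth(z_matrix, r_ga=0.20):
    """Returns (pseudo ground truths, pseudo labels), one entry per record."""
    m, n = len(z_matrix), len(z_matrix[0])
    pscore = [sum(z_matrix[a][j] for a in range(m)) / m for j in range(n)]  # average method
    thr = top_ratio_threshold(pscore, r_ga)  # PScore_thr
    return pscore, [1 if p >= thr else 0 for p in pscore]


def binary_labels(z_matrix, thresholds):
    """Binary label of detector a on record j: 1 if its Z-score reaches its own threshold."""
    return [[1 if z >= t else 0 for z in row] for row, t in zip(z_matrix, thresholds)]


# Usage: two detectors, five records.
z = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 6]]
pscore, pl = pseudo_ground_truth(z)
print(pscore, pl)                # [0.0, 1.0, 2.0, 3.0, 5.0] [0, 0, 0, 0, 1]
print(binary_labels(z, [3, 6]))  # [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
```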
Step 104: Remove the historical records with overly small pseudo ground truths. Then extract the meta-features and meta-labels of the base detectors on the remaining records.
Specifically, the pseudo ground truths PScore of all historical records are sorted in ascending order, and the records corresponding to the first $R_S\%$ of the sorted values are removed. The remaining n' historical records are denoted $X_{STR}$. Their pseudo labels and binary labels are denoted $PL_S$ and $L_S$, respectively. The Z-scores that the remaining base detectors output on $X_{STR}$ are denoted $Z_S$.
For the tth historical record $x^S_t$ of $X_{STR}$, compute its Euclidean distance to the jth historical record $x_j$ of the original training set $X_{TR}$:
$$d(x^S_t, x_j) = \sqrt{\sum_{l=1}^{u} \left(x^S_{t,l} - x_{j,l}\right)^2}$$
where t = 1, 2, ..., n' and l = 1, 2, ..., u. Here u is the dimension of the historical data, $x^S_{t,l}$ is the value of $x^S_t$ in the lth dimension, and $x_{j,l}$ is the value of $x_j$ in the lth dimension.
Sort the historical records of the original training set $X_{TR}$ in ascending order of their Euclidean distance to $x^S_t$. The first $K_{RC}$ records form the performance evaluation set $RC_t$ of $x^S_t$, typically with $10 \le K_{RC} \le 30$.
For $x^S_t$, the Z-scores output on it by all base detectors in $P_F$ form the vector $Z^S_t = (z^S_{1,t}, \dots, z^S_{m',t})$. For $x_j$, the Z-scores output on it by all base detectors in $P_F$ form the vector $Z_j = (z_{1,j}, \dots, z_{m',j})$. The Euclidean distance between $Z^S_t$ and $Z_j$ is
$$d(Z^S_t, Z_j) = \sqrt{\sum_{a=1}^{m'} \left(z^S_{a,t} - z_{a,j}\right)^2}$$
where $z^S_{a,t}$ is the Z-score output by the ath base detector in $P_F$ on $x^S_t$, and $z_{a,j}$ is the Z-score output by the ath base detector in $P_F$ on $x_j$.
Sort the historical records of the original training set $X_{TR}$ in ascending order of the Euclidean distance between their base-detector Z-score vectors and $Z^S_t$. The first $K_{SOP}$ records form the approximate output set $SOP_t$ of $x^S_t$, typically with $10 \le K_{SOP} \le 30$.
Six groups of meta-features are extracted for the ath base detector in $P_F$ on $x^S_t$:
1) Count the historical records in the performance evaluation set $RC_t$ for which the binary label output by the base detector equals the corresponding pseudo label. Take the ratio of this count to $K_{RC}$ as a feature. This group contains 1 feature.
2) Count the historical records in the approximate output set $SOP_t$ for which the binary label output by the base detector equals the corresponding pseudo label. Take the ratio of this count to $K_{SOP}$ as a feature. This group contains 1 feature.
3) For each historical record in $RC_t$, check whether the base detector judges it correctly as normal or abnormal. If the base detector correctly judges the qth record in $RC_t$ (q = 1, 2, ..., $K_{RC}$), the qth feature of this group is 0; otherwise it is 1. This group contains $K_{RC}$ features.
4) For each historical record in $SOP_t$, check whether the base detector judges it correctly as normal or abnormal. If the base detector correctly judges the pth record in $SOP_t$ (p = 1, 2, ..., $K_{SOP}$), the pth feature of this group is 0; otherwise it is 1. This group contains $K_{SOP}$ features.
5) For each historical record in $RC_t$, compute the absolute difference between the Z-score output by the base detector and the base detector's classification threshold $\theta_a$. This group contains $K_{RC}$ features.
6) Compute the absolute difference between the Z-score that the base detector outputs on $x^S_t$ (the record whose meta-features are being extracted) and its own classification threshold $\theta_a$. This group contains 1 feature.
The six groups contain M meta-features in total, where $M = 3 + 2K_{RC} + K_{SOP}$. Extracting the meta-features of every base detector in $P_F$ on every historical record in $X_{STR}$ in this way gives the meta-feature set $X_{TRM}$, which contains n' × m' meta-feature vectors.
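The six feature groups can be assembled as below. This is a sketch under stated assumptions, not the authors' code. It assumes "judges correctly" means the detector's binary label equals the record's pseudo label, which is consistent with the meta-label definition in this step. It also assumes the RC and SOP sets are given as record indices into $X_{TR}$. The function and argument names are hypothetical.

```python
def meta_features(a, t_rec, rc, sop, bin_labels, pseudo, z_matrix, thresholds):
    """Meta-feature vector (length 3 + 2*K_RC + K_SOP) of detector a on record t_rec.

    rc, sop: record indices of the performance evaluation and approximate output sets.
    bin_labels[a][j], z_matrix[a][j]: binary label / Z-score of detector a on record j.
    pseudo[j]: pseudo label of record j; thresholds[a]: classification threshold of a.
    """
    def correct(j):
        # "Correct judgment" = the detector's binary label matches the pseudo label.
        return bin_labels[a][j] == pseudo[j]

    g1 = [sum(correct(j) for j in rc) / len(rc)]            # 1) agreement ratio on RC
    g2 = [sum(correct(j) for j in sop) / len(sop)]          # 2) agreement ratio on SOP
    g3 = [0 if correct(j) else 1 for j in rc]               # 3) error flags on RC
    g4 = [0 if correct(j) else 1 for j in sop]              # 4) error flags on SOP
    g5 = [abs(z_matrix[a][j] - thresholds[a]) for j in rc]  # 5) distance to threshold on RC
    g6 = [abs(z_matrix[a][t_rec] - thresholds[a])]          # 6) distance to threshold on t_rec
    return g1 + g2 + g3 + g4 + g5 + g6


# Usage with K_RC = K_SOP = 2 (toy sizes).
f = meta_features(0, 0, rc=[0, 1], sop=[1, 2], bin_labels=[[1, 0, 1]],
                  pseudo=[1, 1, 1], z_matrix=[[2.0, 0.5, 3.0]], thresholds=[1.0])
print(f)  # [0.5, 0.5, 0, 1, 1, 0, 1.0, 0.5, 1.0]
```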
Compare the binary label that the ath base detector in $P_F$ outputs on $x^S_t$ with the pseudo label of $x^S_t$. If they are equal, the meta-label of the ath base detector on $x^S_t$ is 0, meaning the detector judges $x^S_t$ correctly. Otherwise the meta-label is 1, meaning the detector cannot judge $x^S_t$ correctly. Computing this for every base detector in $P_F$ on every historical record in $X_{STR}$ gives the meta-label set $L_{TRM}$, which contains n' × m' meta-labels.
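The meta-label rule reduces to a label comparison, as in this minimal sketch. `meta_labels` is a hypothetical name. The sketch assumes `kept_records` lists the $X_{TR}$ indices that survive the $R_S\%$ cut, and that `bin_labels[a][j]` is detector a's binary label on record j.

```python
def meta_labels(bin_labels, pseudo, kept_records):
    """Meta-labels over X_STR: one row per kept record, one entry per detector in P_F.
    0 = the detector's binary label matches the pseudo label (judged correctly), 1 = it does not."""
    return [[0 if bin_labels[a][j] == pseudo[j] else 1 for a in range(len(bin_labels))]
            for j in kept_records]


# Usage: two detectors, three records, of which records 0 and 2 survive the R_S% cut.
print(meta_labels([[1, 0, 1], [0, 0, 1]], pseudo=[1, 1, 1], kept_records=[0, 2]))  # [[0, 1], [0, 0]]
```

Flattening the result row by row gives the n' × m' entries of $L_{TRM}$, in the same order as the meta-feature vectors when those are built with the same loops.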
Step 105: Train a random forest on the meta-features and meta-labels.
Specifically, the meta-feature set $X_{TRM}$ and meta-label set $L_{TRM}$ are used to train a random forest of n_dtree decision trees, with n_dtree typically 100. To build each decision tree, N records are sampled from $X_{TRM}$ uniformly with replacement as the tree's training sample, typically with N = n' × m'. Within each tree's sample, M' of the M dimensions are chosen at random, with M' generally set to a recommended value. The best split dimension and split point among these M' dimensions are selected by the Gini index, and the samples are split in two. Samples whose value in that dimension is less than the split value go to the left child, and samples whose value is greater than or equal to it go to the right child. This yields a split condition plus a left and a right data subset. The procedure is repeated recursively on each subset until the subset contains only one sample or all its samples share the same meta-label.

All trained decision trees together form the random forest RFC. Its output is a binary label, 0 or 1, indicating whether the corresponding base detector can correctly judge the corresponding data (0: correct; 1: incorrect).
Algorithm 2 gives the pseudocode for steps 103 to 105 (figure not reproduced).
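Two building blocks of step 105 are sketched below: bootstrap sampling and Gini-based split selection over M' random dimensions. This is an illustration, not a full random forest. The names are hypothetical, and using each distinct value except the smallest as a candidate split point is an assumption.

An off-the-shelf alternative is scikit-learn's `RandomForestClassifier(n_estimators=100, criterion="gini", bootstrap=True)`, whose `max_features` parameter plays the role of M'. Note that scikit-learn draws the feature subset at every node.

```python
import random


def gini(labels):
    """Gini index of a list of 0/1 meta-labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def bootstrap(X, y, n, rng):
    """Draw n rows uniformly with replacement: one decision tree's training sample."""
    idx = [rng.randrange(len(X)) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def best_split(X, y, n_sub, rng):
    """Among n_sub randomly chosen dimensions, return (weighted Gini, dim, value) of the split
    with the lowest weighted Gini index; rows with X[dim] < value go left, the rest go right.
    Returns None when no chosen dimension can split the rows."""
    best = None
    for d in rng.sample(range(len(X[0])), n_sub):
        for value in sorted({row[d] for row in X})[1:]:  # both sides stay non-empty
            left = [yi for row, yi in zip(X, y) if row[d] < value]
            right = [yi for row, yi in zip(X, y) if row[d] >= value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, d, value)
    return best


# Usage: meta-feature 0 separates the meta-labels perfectly, meta-feature 1 is constant.
rng = random.Random(0)
X = [[0, 5], [1, 5], [2, 5], [3, 5]]
y = [0, 0, 1, 1]
print(best_split(X, y, n_sub=2, rng=rng))  # (0.0, 0, 2)
Xb, yb = bootstrap(X, y, n=len(X), rng=rng)
```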
Step 106: Extract the meta-features of the base detectors on the data to be detected and feed them into the random forest. Select base detectors according to the random forest's output, and take the maximum output of the selected base detectors as the detection result for the data to be detected. This completes anomaly detection for power dispatching monitoring data.
Specifically, for the data to be detected $x_{TE}$, the M meta-features of each base detector in $P_F$ on $x_{TE}$ are extracted with the same method as in step 104, forming the detection meta-feature set $X_{TEM}$. $X_{TEM}$ is fed into the random forest RFC trained in step 105, which yields the detection meta-label set $L_{TEM}$ containing m' binary labels.

If a base detector in $P_F$ has a detection meta-label of 0, it is considered able to judge the data to be detected correctly, and it is added to the selected base detector pool $P_S$. The maximum Z-score that the base detectors in $P_S$ output on $x_{TE}$ is taken as the detection result for $x_{TE}$. The maximum classification threshold among the base detectors in $P_S$ is taken as the detection threshold for this detection. If the detection result is greater than or equal to the detection threshold, $x_{TE}$ is judged abnormal.
Algorithm 3 gives the pseudocode for step 106 (figure not reproduced).
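The selection-and-combination rule of step 106 can be sketched as follows. The function name is hypothetical. The source does not say what happens if the random forest selects no detector, so this sketch falls back to all of $P_F$; that fallback is an assumption.

```python
def detect(z_test, thresholds, meta_labels_test):
    """z_test[a]: Z-score of detector a (in P_F) on x_TE; thresholds[a]: its classification
    threshold; meta_labels_test[a]: RFC output (0 = expected to judge x_TE correctly).
    Returns (detection result, detection threshold, is_abnormal)."""
    selected = [a for a, lab in enumerate(meta_labels_test) if lab == 0]  # pool P_S
    if not selected:  # assumption: empty P_S is not covered by the source; fall back to P_F
        selected = list(range(len(z_test)))
    result = max(z_test[a] for a in selected)              # max Z-score of selected detectors
    threshold = max(thresholds[a] for a in selected)       # max threshold of selected detectors
    return result, threshold, result >= threshold


# Usage: detector 1 is rejected by the random forest, detectors 0 and 2 are kept.
print(detect([0.5, 2.0, 1.2], [1.0, 1.5, 1.1], [0, 1, 0]))  # (1.2, 1.1, True)
```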
Fig. 2 is a schematic diagram of the proposed anomaly detection method for power dispatching monitoring data based on dynamic and static selection integration. The method proceeds as follows:
1) A number of base detectors are trained on historical power dispatching monitoring data. An isolation forest is trained on the Z-scores that all base detectors output on all historical data, and the base detectors that receive the smallest isolation forest outputs are removed.
2) An averaging method generates a pseudo ground truth for each historical record from the Z-scores of the remaining base detectors, and the pseudo ground truths are converted into pseudo labels.
3) Records with small pseudo ground truths are removed. The meta-features of each base detector on the remaining records form a meta-feature set. A meta-label set records whether each base detector's output label on those records matches the corresponding pseudo label.
4) A random forest is trained on the meta-feature set and the meta-label set.
5) The detection meta-features of each base detector on the data to be detected are fed into the random forest to obtain the detection meta-labels, and base detectors are selected accordingly. The maximum Z-score of the selected base detectors is the detection result, and the maximum of their classification thresholds is the detection threshold. Data whose detection result is greater than or equal to the detection threshold is judged abnormal.
Fig. 3 is a schematic diagram of the input data and output results of the base detectors used in the invention. Each base detector takes as input real-time process resource-usage data related to power dispatching system services, collected by the power dispatching monitoring system. This data covers process CPU usage, memory usage, disk I/O, network I/O, number of threads and number of network connections. If the Z-score output by the ith base detector is less than its classification threshold $\theta_i$, the input data is normal; if it is greater than or equal to $\theta_i$, the input data is abnormal. To obtain $\theta_i$, sort the Z-scores that the ith base detector outputs on all training data $X_{TR}$ in descending order; $\theta_i$ is the smallest Z-score within the top $R_{DA}\%$. $R_{DA}\%$ is the preset base-detector output conversion ratio, typically 10%.
In a specific embodiment, the system monitoring data come from a smart grid dispatching control system (the D5000 system) and cover three anomaly types: data jump, application network disconnection, and telemetry table not refreshed.

Data jump: the D5000 system periodically collects process data for each telemetry point. If the difference between adjacent sampling points exceeds a manually set threshold, a data jump is considered to have occurred. A data jump distorts the power generation that the dispatching center allocates to subordinate grid companies, which affects the grid's dispatching plan. It also distorts the electricity reports, which affects electricity billing.

Application network disconnection: the network connection of a server running a D5000 application is interrupted, or its network card fails. Key D5000 processes then run slowly or even stop, the services under that application cannot execute their tasks normally, and grid dispatching suffers.

Telemetry table not refreshed: the grid automation system fails to update telemetry data in time. Dispatchers can adjust the grid's operating conditions promptly and accurately only if they receive real-time, accurate telemetry data. When the grid state changes, the corresponding telemetry data should reach the dispatching center immediately. If the telemetry table is not updated for a long time, dispatchers lose their overall grasp of the grid's operating state.
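The data jump rule is a check on consecutive samples, as in this minimal sketch. `data_jumps` and the threshold value in the usage line are hypothetical. "Exceeds" is read as strictly greater, following the text's "larger than".

```python
def data_jumps(samples, threshold):
    """Indices i where the change from the previous sample exceeds the set threshold."""
    return [i for i in range(1, len(samples)) if abs(samples[i] - samples[i - 1]) > threshold]


# Usage: periodic samples of one telemetry point, with a hypothetical threshold of 2.0.
print(data_jumps([1.0, 1.2, 5.0, 5.1, 0.0], threshold=2.0))  # [2, 4]
```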
Table 1 lists the details of the system monitoring data for the three anomaly types.
TABLE 1 Details of the system monitoring data when the three anomaly types occur (table not reproduced)
Table 2 lists the base detector algorithms and parameters used in the embodiment of the invention.
TABLE 2 Base detector algorithms and parameters used in the embodiment (table not reproduced)
To verify the effectiveness of the proposed algorithm, the dynamic and static selection integration method is compared with three groups of ensemble anomaly detection methods:
- direct integration: Average, Max, AOM and MOA;
- static selection integration: HEnS, SS-FS and BoostSelect;
- dynamic selection integration: LSCP and ELSCP.
The embodiment uses AUC for evaluation. The area under the ROC curve (AUC) is a standard measure of anomaly detection performance: the closer the AUC is to 1, the better the algorithm performs.
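For reference, AUC equals the probability that a randomly chosen abnormal record receives a higher score than a randomly chosen normal one, with ties counted as one half. The quadratic-time sketch below computes it that way. It is meant for small illustrations; `sklearn.metrics.roc_auc_score` gives the same value on the usage example.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of abnormal (label 1) and normal (label 0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage: two normal and two abnormal records.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```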
In the embodiment, $R_{DA}\%$ is set to 10%, $R_{GA}\%$ and $R_S\%$ are both set to 20%, $K_{RC}$ and $K_{SOP}$ are both set to 30, and n_itree and n_dtree are both set to 100.
Table 3 shows the AUC results of the invention and the comparison methods on the D5000 monitoring data. The proposed method achieves the highest AUC on the data jump anomaly and the highest mean AUC over the three anomalies. On the other two anomalies it remains close to the best result (0.9885 vs. 0.9908, and 0.9970 vs. 1.0000). These results indicate that the method detects anomalies in dispatching monitoring data more accurately than the existing methods compared.
TABLE 3 AUC results on the three anomaly types
Anomaly type                       Average Max     AOM     MOA     HEnS    SS-FS   BoostSelect  LSCP    ELSCP   The invention
Application network disconnection  0.9908  0.9848  0.9872  0.9904  0.9795  0.9862  0.9603       0.9672  0.9757  0.9885
Data jump                          0.7571  0.8132  0.7844  0.7604  0.7506  0.7840  0.6099       0.7874  0.8095  0.8575
Telemetry table not refreshed      0.9979  0.9971  0.9971  0.9977  0.5840  0.9978  1.0000       0.9957  0.9966  0.9970
Mean AUC                           0.9153  0.9317  0.9229  0.9162  0.7714  0.9227  0.8567       0.9168  0.9272  0.9477
In summary, the embodiments of the invention have the following beneficial effects.
The technical solution proceeds in six stages:
1) A number of base detectors are trained with different unsupervised anomaly detection algorithms on the original historical power dispatching monitoring data, and an isolation forest removes the poorly performing ones.
2) An averaging method generates a pseudo ground truth for each historical record from the Z-scores of the remaining base detectors, and the pseudo ground truths are converted into pseudo labels.
3) Records with small pseudo ground truths are removed. The meta-features of each base detector on the remaining records form a meta-feature set. A meta-label set records whether each base detector's output label on those records matches the corresponding pseudo label.
4) A random forest is trained on the meta-feature and meta-label sets.
5) The detection meta-features of each base detector on the data to be detected are fed into the random forest to obtain detection meta-labels, and base detectors are selected accordingly.
6) The maximum Z-score of the selected base detectors is taken as the detection result, and the maximum of their classification thresholds as the detection threshold. Data whose detection result is greater than or equal to the detection threshold is judged abnormal.

On power dispatching monitoring data, the embodiment achieved a higher mean AUC than the other ensemble-based anomaly detection methods compared (Table 3).
The above describes only preferred embodiments of the invention and does not limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims (7)

1. A power dispatching monitoring data anomaly detection method based on dynamic and static selection integration, characterized by comprising the following steps:
(1) training a number of base detectors on historical power dispatching monitoring data;
(2) using an isolation forest to eliminate poorly performing base detectors;
(3) using an averaging method to generate pseudo ground truths for the historical data from the outputs of the remaining base detectors, and converting the pseudo ground truths and the base detector outputs into binary labels;
(4) removing historical data with overly small pseudo ground truths, and extracting the meta-features and meta-labels of the base detectors on the remaining historical data;
(5) training a random forest on the meta-features and meta-labels;
(6) extracting the meta-features of the base detectors on the data to be detected, inputting them into the random forest, selecting base detectors according to the random forest's output, and taking the maximum output of the selected base detectors as the detection result for the data to be detected, thereby detecting anomalies in power dispatching monitoring data.
2. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (1) a number of base detectors are trained on historical power dispatching monitoring data, specifically:
all historical power monitoring data is used as the training set $X_{TR}$; m base detectors are trained on it with different unsupervised anomaly detection algorithms, generally with m ≥ 50; the pool of all base detectors is denoted $P_O$; each base detector outputs an anomaly score, a larger score indicating a more anomalous input; the anomaly score output by each base detector in $P_O$ is Z-score normalized; denoting by $s_{i,j}$ the anomaly score output by the ith base detector in $P_O$ on the jth historical record $x_j$ of $X_{TR}$, its Z-score $z_{i,j}$ is:
$$z_{i,j} = \frac{s_{i,j} - \mu_i}{\sigma_i}$$
where i = 1, 2, ..., m; j = 1, 2, ..., n; n is the number of historical records in $X_{TR}$; $\mu_i$ is the mean of the anomaly scores output by the ith base detector over all historical data; and $\sigma_i$ is the standard deviation of the anomaly scores output by the ith base detector over all historical data;
the input of each base detector is real-time process resource-usage data related to power dispatching system services, collected by the power dispatching monitoring system, comprising process CPU usage, memory usage, disk I/O, network I/O, number of threads and number of network connections; if the Z-score output by the ith base detector is less than its classification threshold $\theta_i$, the input data is normal; if it is greater than or equal to $\theta_i$, the input data is abnormal; the Z-scores output by the ith base detector on all training data $X_{TR}$ are sorted in descending order, and $\theta_i$ is the smallest Z-score within the top $R_{DA}\%$; $R_{DA}\%$ is the preset base-detector output conversion ratio, typically 10%.
3. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (2) an isolation forest is used to eliminate poorly performing base detectors, specifically:
the Z-scores output by all m base detectors in $P_O$ on all n historical records of the training set $X_{TR}$ form the matrix $Score_{m\times n}$, which is used to train an isolation forest of n_itree isolation trees, n_itree typically being 100; to build each isolation tree, ψ rows are sampled from $Score_{m\times n}$ uniformly without replacement, with ψ generally set to a recommended value, and these ψ n-dimensional rows, $Score_{\psi\times n}$, form the training sample for that tree; at each node, a dimension is chosen at random, a split value is drawn at random between the minimum and maximum of the node's samples in that dimension, and the samples are split in two, those with a value less than the split value going to the left child and those with a value greater than or equal to it going to the right child, yielding a split condition and a left and a right data subset; the procedure is repeated on each subset until one of two termination conditions is met:
1) the subset contains only one sample, or all its samples are identical;
2) the tree height reaches $\log_2(\psi)$;
all trained isolation trees form the isolation forest IForest, whose output is a continuous value; the smaller the output, the more anomalous the input;
the rth row $Score_r$ of $Score_{m\times n}$ (r = 1, 2, ..., m) is fed into IForest, yielding m outputs; these m outputs are sorted in ascending order, and the base detectors corresponding to the first $R_D\%$ of the sorted outputs are marked abnormal, $R_D\%$ typically being 10%; the base detectors marked abnormal are removed from $P_O$, and the pool of the m' base detectors remaining after screening is denoted $P_F$.
4. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (3) an averaging method is used to generate pseudo ground truths for the historical data from the outputs of the remaining base detectors, and the pseudo ground truths and the base detector outputs are converted into binary labels, specifically:
the Z-scores output by all m' base detectors in $P_F$ on the jth historical record $x_j$ of the training set $X_{TR}$ form the vector $Z_j$; the mean of all Z-scores in $Z_j$ is taken as the pseudo ground truth $PScore_j$ of $x_j$; the set of pseudo ground truths for all historical records in $X_{TR}$ is $PScore = \{PScore_1, \dots, PScore_n\}$; the values in PScore are sorted in descending order, and the threshold $PScore_{thr}$ is the smallest value within the top $R_{GA}\%$, $R_{GA}\%$ being the preset pseudo-ground-truth conversion ratio, typically 20%; if the pseudo ground truth $PScore_j$ of the jth historical record is greater than or equal to $PScore_{thr}$, its pseudo label $PL_j$ is 1, otherwise it is 0; the set of pseudo labels for all historical records in $X_{TR}$ is $PL = \{PL_1, \dots, PL_n\}$;
if the Z-score $z_{a,j}$ output by the ath base detector in $P_F$ on historical record $x_j$ is greater than or equal to its classification threshold $\theta_a$, the binary label $l_{a,j}$ it outputs on $x_j$ is 1, otherwise it is 0; the binary labels output by the ath base detector on the training set $X_{TR}$ are denoted $L_a = \{l_{a,1}, \dots, l_{a,n}\}$, and the set of binary labels output by all base detectors on $X_{TR}$ is $L = \{L_1, \dots, L_{m'}\}$.
5. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (4) historical data with overly small pseudo ground truths is removed, and the meta-features and meta-labels of the base detectors on the remaining historical data are extracted, specifically:
the pseudo ground truths PScore of all historical records are sorted in ascending order, and the records corresponding to the first $R_S\%$ of the sorted values are removed; the remaining n' historical records are denoted $X_{STR}$, their pseudo labels and binary labels are denoted $PL_S$ and $L_S$, respectively, and the Z-scores output by the remaining base detectors on $X_{STR}$ are denoted $Z_S$;
for the tth historical record $x^S_t$ of $X_{STR}$, its Euclidean distance to the jth historical record $x_j$ of the original training set $X_{TR}$ is computed as:
$$d(x^S_t, x_j) = \sqrt{\sum_{l=1}^{u} \left(x^S_{t,l} - x_{j,l}\right)^2}$$
where t = 1, 2, ..., n'; l = 1, 2, ..., u; u is the dimension of the historical data; $x^S_{t,l}$ is the value of $x^S_t$ in the lth dimension; and $x_{j,l}$ is the value of $x_j$ in the lth dimension;
the historical records of the original training set $X_{TR}$ are sorted in ascending order of their Euclidean distance to $x^S_t$, and the first $K_{RC}$ records are taken as the performance evaluation set $RC_t$ of $x^S_t$, typically with $10 \le K_{RC} \le 30$;
the Z-scores output by all base detectors in $P_F$ on $x^S_t$ form the vector $Z^S_t = (z^S_{1,t}, \dots, z^S_{m',t})$, and the Z-scores output by all base detectors in $P_F$ on $x_j$ form the vector $Z_j = (z_{1,j}, \dots, z_{m',j})$; the Euclidean distance between $Z^S_t$ and $Z_j$ is:
$$d(Z^S_t, Z_j) = \sqrt{\sum_{a=1}^{m'} \left(z^S_{a,t} - z_{a,j}\right)^2}$$
where $z^S_{a,t}$ is the Z-score output by the ath base detector in $P_F$ on $x^S_t$, and $z_{a,j}$ is the Z-score output by the ath base detector in $P_F$ on $x_j$;
Sort the historical data in the original training set $X_{TR}$ in ascending order of the Euclidean distance between the Z-scores output on them by the base detectors in $P_F$ and the Z-scores output on $x_t^{STR}$, and take the first $K_{SOP}$ of them as the approximate output set of $x_t^{STR}$; generally $10 \le K_{SOP} \le 30$;
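The approximate output set is the same kind of neighbour query, carried out on the vectors of base-detector Z-scores instead of the raw features. A minimal sketch, assuming each Z-score vector is a list with one entry per base detector in $P_F$; the function name and the toy values are hypothetical.

```python
import math


def approximate_output_set(z_t, z_tr, k_sop):
    """Indices of the K_SOP samples of X_TR whose Z-score vectors (one entry
    per base detector in P_F) are closest to z_t, the Z-score vector of x_t."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    ranked = sorted(range(len(z_tr)), key=lambda j: dist(z_t, z_tr[j]))
    return ranked[:k_sop]


# Three base detectors (m' = 3) and three training samples.
z_t = [1.0, -0.5, 2.0]
z_tr = [[1.1, -0.4, 1.9], [-1.0, 0.0, 0.0], [0.9, -0.6, 2.2]]
print(approximate_output_set(z_t, z_tr, k_sop=2))  # [0, 2]
```

Because it compares output profiles, this set can contain samples that are far from $x_t^{STR}$ in feature space but on which the detector pool behaved similarly.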
Extract the following six groups of meta-features of the $a$-th base detector in $P_F$ on $x_t^{STR}$:
1) In the performance evaluation set of $x_t^{STR}$, count the historical data for which the binary label output by the base detector is the same as the corresponding pseudo-label, and take the ratio of this count to $K_{RC}$ as a feature; this group contains 1 feature;
2) In the approximate output set of $x_t^{STR}$, count the historical data for which the binary label output by the base detector is the same as the corresponding pseudo-label, and take the ratio of this count to $K_{SOP}$ as a feature; this group contains 1 feature;
3) For each historical datum in the performance evaluation set of $x_t^{STR}$, determine whether the base detector correctly judges it as normal or abnormal: if the base detector correctly judges the $q$-th historical datum ($q = 1, 2, \dots, K_{RC}$), the $q$-th feature in this group is 0, otherwise it is 1; this group contains $K_{RC}$ features;
4) For each historical datum in the approximate output set of $x_t^{STR}$, determine whether the base detector correctly judges it as normal or abnormal: if the base detector correctly judges the $p$-th historical datum ($p = 1, 2, \dots, K_{SOP}$), the $p$-th feature in this group is 0, otherwise it is 1; this group contains $K_{SOP}$ features;
5) For each historical datum in the performance evaluation set of $x_t^{STR}$, compute the absolute value of the difference between the Z-score output by the base detector and the base detector's classification threshold; this group contains $K_{RC}$ features;
6) Compute the absolute value of the difference between the Z-score output by the base detector on the datum whose meta-features are being extracted, $x_t^{STR}$, and the base detector's own classification threshold; this group contains 1 feature;
The six groups contain $M$ meta-features in total, where $M = 3 + 2 \times K_{RC} + K_{SOP}$. Using the above method, extract the meta-features of each base detector in $P_F$ on every historical datum in $X_{STR}$ to form the meta-feature set $X_{TRM}$, which contains $n' \times m'$ meta-feature records;
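The six groups can be assembled into one vector per (detector, sample) pair as sketched below. This is an illustration under assumptions: the function and argument names are hypothetical, "correctly judges" is taken to mean that the detector's binary label equals the pseudo-label (mirroring the meta-label definition), and the toy sets use $K_{RC} = K_{SOP} = 2$.

```python
def meta_features(rc_idx, sop_idx, det_labels, pseudo_labels, det_scores,
                  threshold, score_t):
    """Six meta-feature groups for one base detector on one sample x_t.

    rc_idx, sop_idx: indices (into X_TR) of the performance evaluation set
        and the approximate output set of x_t.
    det_labels, det_scores: the detector's binary labels and Z-scores on X_TR.
    pseudo_labels: pseudo-labels of X_TR.
    threshold: the detector's classification threshold.
    score_t: the detector's Z-score on x_t itself.
    """
    # Assumption: "correct judgement" = binary label equals pseudo-label.
    def correct(j):
        return det_labels[j] == pseudo_labels[j]

    g1 = [sum(correct(j) for j in rc_idx) / len(rc_idx)]     # group 1
    g2 = [sum(correct(j) for j in sop_idx) / len(sop_idx)]   # group 2
    g3 = [0 if correct(j) else 1 for j in rc_idx]            # group 3
    g4 = [0 if correct(j) else 1 for j in sop_idx]           # group 4
    g5 = [abs(det_scores[j] - threshold) for j in rc_idx]    # group 5
    g6 = [abs(score_t - threshold)]                          # group 6
    return g1 + g2 + g3 + g4 + g5 + g6


feats = meta_features(rc_idx=[0, 1], sop_idx=[1, 2],
                      det_labels=[0, 1, 1], pseudo_labels=[0, 0, 1],
                      det_scores=[0.2, 1.5, 2.5], threshold=1.0, score_t=1.8)
print(len(feats))  # 9 = 3 + 2*2 + 2
```

The length check reproduces $M = 3 + 2 \times K_{RC} + K_{SOP}$: groups 1, 2 and 6 contribute one feature each, groups 3 and 5 contribute $K_{RC}$ each, and group 4 contributes $K_{SOP}$.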
Compare whether the binary label output by the $a$-th base detector in $P_F$ on $x_t^{STR}$ is the same as the pseudo-label of $x_t^{STR}$. If they are the same, the meta-label of the $a$-th base detector on $x_t^{STR}$ is 0, indicating that the $a$-th base detector can correctly judge $x_t^{STR}$; otherwise it is 1, indicating that the $a$-th base detector cannot correctly judge $x_t^{STR}$. Using the above method, compute the meta-label of each base detector in $P_F$ on every historical datum in $X_{STR}$ to form the meta-label set $L_{TRM}$, which contains $n' \times m'$ meta-labels.
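The meta-label rule is a per-pair comparison. A minimal sketch, assuming the detectors' binary labels on $X_{STR}$ are stored as one list per detector; the function name and the nested-list layout (one row per sample, one column per detector) are hypothetical choices.

```python
def meta_labels(det_labels, pseudo_labels):
    """det_labels[a][t]: binary label of the a-th base detector in P_F on the
    t-th sample of X_STR; pseudo_labels[t]: that sample's pseudo-label.
    Returns one row per sample and one 0/1 meta-label per detector, i.e.
    n' x m' values (0 = the detector judged the sample correctly)."""
    n_prime = len(pseudo_labels)
    m_prime = len(det_labels)
    return [[0 if det_labels[a][t] == pseudo_labels[t] else 1
             for a in range(m_prime)]
            for t in range(n_prime)]


# Two base detectors, three samples in X_STR.
print(meta_labels([[0, 1, 1], [1, 1, 0]], [0, 1, 0]))  # [[0, 1], [0, 0], [1, 0]]
```

Flattening the rows gives the $n' \times m'$ meta-labels of $L_{TRM}$, aligned with the meta-feature records of $X_{TRM}$.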
6. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (5), a random forest is trained with the meta-features and meta-labels, specifically:
Use the meta-feature set $X_{TRM}$ and the meta-label set $L_{TRM}$ to train a random forest consisting of n_dtree decision trees, where n_dtree is generally 100. When constructing each decision tree, draw $N$ records from $X_{TRM}$ uniformly with replacement as the training samples of that tree, generally with $N = n' \times m'$. For each decision tree, randomly select $M'$ of the $M$ dimensions (typically […]). On the selected $M'$ dimensions, choose the optimal split dimension and split point according to the Gini index and split the samples in two: samples whose value in that dimension is less than the split value go to the left of the node, and samples whose value is greater than or equal to it go to the right, yielding a split condition and the left and right data sets. Repeat this process on the left and right data sets separately until a data set contains only one sample or all of its samples have the same meta-label. All trained decision trees together form the random forest classifier (RFC), whose output is a binary label, 0 or 1, indicating whether the corresponding base detector can correctly judge the corresponding data.
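The training procedure can be sketched in pure Python as below. It is an illustration, not a production random forest, and several details are assumptions rather than taken from the claim: the default $M' = \lfloor\sqrt{M}\rfloor$ is a common convention, the forest aggregates trees by majority vote, and a tree that cannot split mixed-label data (identical features) falls back to a majority-label leaf. All function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of 0/1 meta-labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, dims):
    """Lowest weighted-Gini (dim, value) split over the given dims.
    Samples with x[dim] < value go left, the rest go right."""
    best = None
    for d in dims:
        for v in sorted({row[d] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[d] < v]
            right = [y[i] for i, row in enumerate(X) if row[d] >= v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, d, v)
    return best


def build_tree(X, y, dims):
    """Grow until one sample remains or all meta-labels agree (as claimed).
    Leaves are 0/1 labels; internal nodes are (dim, value, left, right)."""
    if len(y) == 1 or len(set(y)) == 1:
        return y[0]
    split = best_split(X, y, dims)
    if split is None:  # assumed safeguard: identical features, mixed labels
        return Counter(y).most_common(1)[0][0]
    _, d, v = split
    li = [i for i, row in enumerate(X) if row[d] < v]
    ri = [i for i, row in enumerate(X) if row[d] >= v]
    return (d, v,
            build_tree([X[i] for i in li], [y[i] for i in li], dims),
            build_tree([X[i] for i in ri], [y[i] for i in ri], dims))


def predict_tree(node, x):
    """Follow split conditions from the root down to a 0/1 leaf."""
    while isinstance(node, tuple):
        d, v, left, right = node
        node = left if x[d] < v else right
    return node


def train_forest(X, y, n_dtree=100, m_prime=None, seed=0):
    """X: meta-feature rows (X_TRM); y: meta-labels (L_TRM)."""
    rng = random.Random(seed)
    n, M = len(X), len(X[0])
    if m_prime is None:
        m_prime = max(1, int(M ** 0.5))  # assumption, not from the claim
    forest = []
    for _ in range(n_dtree):
        idx = [rng.randrange(n) for _ in range(n)]  # N = n' x m' draws with replacement
        dims = rng.sample(range(M), m_prime)        # M' of the M dimensions per tree
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], dims))
    return forest


def predict_forest(forest, x):
    """Majority vote of the trees (aggregation rule assumed)."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]


X = [[0.1, 1.0], [0.2, 1.0], [0.9, 1.0], [0.8, 1.0]]
y = [0, 0, 1, 1]
rfc = train_forest(X, y, n_dtree=25, m_prime=2)
print(predict_forest(rfc, [0.0, 1.0]), predict_forest(rfc, [1.0, 1.0]))
```

Note that scikit-learn's `RandomForestClassifier` (with `criterion="gini"`, `bootstrap=True`) draws its `max_features` subset at every split rather than once per tree, so it is close to, but not identical with, the per-tree dimension sampling described here.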
7. The power dispatching monitoring data anomaly detection method based on dynamic and static selection integration according to claim 1, wherein in step (6), the meta-features of the base detectors on the data to be detected are extracted and input into the random forest, base detectors are selected according to the output of the random forest, and the maximum output of the selected base detectors is taken as the detection result for the data to be detected, thereby realizing anomaly detection of power dispatching monitoring data, specifically:
For the data to be detected $x_{TE}$, extract the $M$ meta-features of each base detector in $P_F$ on $x_{TE}$ by the same method as in step (4), forming the detection meta-feature set $X_{TEM}$; input $X_{TEM}$ into the random forest RFC trained in step (5) to obtain the detection meta-label set $L_{TEM}$, which contains $m'$ binary labels.
For each base detector in $P_F$, if its detection meta-label is 0, meaning that the detector is considered able to correctly judge the data to be detected, add it to the selected base detector pool $P_S$. Take the maximum of the Z-scores output by all base detectors in $P_S$ on $x_{TE}$ as the detection result for $x_{TE}$, and take the maximum of the classification thresholds of all base detectors in $P_S$ as the detection threshold for this detection. If the detection result is greater than or equal to the detection threshold, $x_{TE}$ is judged to be abnormal data, thereby realizing anomaly detection of power dispatching monitoring data.
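The dynamic selection and final decision for one test sample can be sketched as follows. The function name and argument layout are hypothetical, and the fallback to the whole of $P_F$ when no detector is selected is an assumption: the claim does not say what happens if $P_S$ is empty.

```python
def detect(z_scores, thresholds, meta_labels):
    """Dynamic selection and final decision for one sample x_TE.

    z_scores[a], thresholds[a]: Z-score on x_TE and classification threshold
    of the a-th base detector in P_F; meta_labels[a]: RFC output for it
    (0 = judged able to classify x_TE correctly).
    Returns (detection result, detection threshold, is_anomaly).
    """
    selected = [a for a, lab in enumerate(meta_labels) if lab == 0]  # pool P_S
    if not selected:
        # Assumption: the claim does not cover an empty P_S; fall back to P_F.
        selected = list(range(len(z_scores)))
    result = max(z_scores[a] for a in selected)
    threshold = max(thresholds[a] for a in selected)
    return result, threshold, result >= threshold


print(detect([0.5, 2.3, 1.1], [1.0, 2.5, 0.9], [0, 1, 0]))  # (1.1, 1.0, True)
```

In the example, the second detector is excluded by its meta-label, so its high Z-score of 2.3 does not enter the maximum; the decision compares 1.1 against the larger of the remaining thresholds, 1.0.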
CN202210147086.5A 2022-02-17 2022-02-17 Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration Pending CN114399407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210147086.5A CN114399407A (en) 2022-02-17 2022-02-17 Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration

Publications (1)

Publication Number Publication Date
CN114399407A true CN114399407A (en) 2022-04-26

Family

ID=81234250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210147086.5A Pending CN114399407A (en) 2022-02-17 2022-02-17 Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration

Country Status (1)

Country Link
CN (1) CN114399407A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081966A (en) * 2022-08-22 2022-09-20 南通俊朗智能科技有限公司 Abnormal state monitoring method and aluminum alloy extrusion process controller applying same

Similar Documents

Publication Publication Date Title
CN107169628B (en) Power distribution network reliability assessment method based on big data mutual information attribute reduction
CN106909933A (en) A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features
CN112181706B (en) Power dispatching data anomaly detection method based on logarithmic interval isolation
CN109409444B (en) Multivariate power grid fault type discrimination method based on prior probability
CN114723285B (en) Power grid equipment safety evaluation prediction method
CN111191720B (en) Service scene identification method and device and electronic equipment
CN113112188B (en) Power dispatching monitoring data anomaly detection method based on pre-screening dynamic integration
CN114201374A (en) Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning
CN114065605A (en) Intelligent electric energy meter running state detection and evaluation system and method
CN112241606A (en) Cooperative decision-making method for operation and maintenance of ship intelligent equipment based on CPS decision-making module
CN113569462A (en) Distribution network fault level prediction method and system considering weather factors
CN117394337A (en) Power grid load early warning method and system thereof
CN114202243A (en) Engineering project management risk early warning method and system based on random forest
CN114611738A (en) Load prediction method based on user electricity consumption behavior analysis
CN115617784A (en) Data processing system and processing method for informationized power distribution
CN114399407A (en) Power dispatching monitoring data anomaly detection method based on dynamic and static selection integration
CN112508254B (en) Method for determining investment prediction data of transformer substation engineering project
CN113608968A (en) Power dispatching monitoring data anomaly detection method based on density and distance comprehensive decision
CN113689079A (en) Transformer area line loss prediction method and system based on multivariate linear regression and cluster analysis
CN117592656A (en) Carbon footprint monitoring method and system based on carbon data accounting
CN115034278A (en) Performance index abnormality detection method and device, electronic equipment and storage medium
CN112434886A (en) Method for predicting client mortgage loan default probability
CN109635008B (en) Equipment fault detection method based on machine learning
CN116383645A (en) Intelligent system health degree monitoring and evaluating method based on anomaly detection
CN114167837B (en) Intelligent fault diagnosis method and system for railway signal system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination