CN114580580B

CN114580580B - Intelligent operation and maintenance abnormity detection method and device

Info

Publication number: CN114580580B
Application number: CN202210492320.8A
Authority: CN
Inventors: 朱松涛; 邵俊
Original assignee: Shenzhen Suoxinda Data Technology Co ltd
Current assignee: Shenzhen Suoxinda Data Technology Co ltd
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2022-08-16
Anticipated expiration: 2042-05-07
Also published as: CN114580580A

Abstract

The invention discloses an intelligent operation and maintenance anomaly detection method and device. The method comprises:
- acquiring operation and maintenance data and performing dimension reduction to obtain samples;
- building independent trees from the samples to form an independent forest;
- calculating a preliminary anomaly score for each sample according to the independent trees and the independent forest, and marking samples whose preliminary anomaly score is greater than a preset value as preliminary abnormal points;
- marking a subset of the positive samples;
- identifying valid trees according to the marked preliminary abnormal points;
- assigning scores to the features of the preliminary abnormal points identified in the valid trees, and calculating a total score according to the number of independent trees in which each abnormal point is identified and the number of marked positive samples;
- calculating feature selection probabilities from the total scores and reconstructing the independent trees and the independent forest;
- performing anomaly detection with the reconstructed independent trees and independent forest.

Because the independent trees and independent forest are reconstructed from the preliminarily identified abnormal points, the method achieves high anomaly detection efficiency and accuracy.

Description

Intelligent operation and maintenance abnormity detection method and device

Technical Field

The invention relates to the field of anomaly detection, and in particular to an intelligent operation and maintenance anomaly detection method and device.

Background

In intelligent operation and maintenance scenarios, operators often need to promptly capture and diagnose abnormal signals among the many metrics related to system transactions, so that faults can be troubleshot quickly and incidents avoided. These metrics include page loading latency, user click rate, CPU utilization and the like. This scenario poses several typical challenges:
- the metrics to be tracked are very high-dimensional;
- outliers are difficult to capture in time;
- no labels indicate whether a sample is an outlier.

In existing anomaly detection technology, conventional unsupervised training has poor accuracy, while manually labeling every sample point is very costly.

For example, patent document CN111026925A discloses a Flink-based anomaly detection method and device that parallelizes the isolation forest algorithm. A data set to be tested is extracted from historical data and used to construct binary trees, which together form an independent forest. Anomalies are then scored according to the depth of a sample point in each independent binary tree, and the anomaly score determines whether a sample in the data set is abnormal.

This scheme applies an unsupervised detection algorithm to the samples and uses the independent trees to score each sample point's degree of abnormality, so that abnormal points can be identified in time. However, abnormal points are determined solely from the abnormality score given by the independent trees, which is inefficient and inaccurate.

Disclosure of Invention

The invention provides an intelligent operation and maintenance anomaly detection method and device. They reconstruct the independent trees and independent forest from the preliminarily identified abnormal points, which integrates the unsupervised independent forest algorithm with supervised learning and yields high anomaly detection efficiency and accuracy.

An intelligent operation and maintenance abnormity detection method comprises the following steps:

acquiring operation and maintenance data and performing dimension reduction to obtain samples of the operation and maintenance data;

building independent trees from the samples to form an independent forest;

calculating a preliminary anomaly score for each sample according to the independent trees and the independent forest, and marking the samples whose preliminary anomaly score is greater than a preset value as preliminary abnormal points;

marking a subset of the positive samples;

identifying valid trees according to the marked preliminary abnormal points;

assigning scores to the features of the preliminary outliers identified in the valid trees, and calculating a total score according to the number of independent trees in which each preliminary outlier is identified and the number of marked positive samples;

calculating feature selection probabilities according to the total scores and reconstructing the independent trees and the independent forest;

and performing anomaly detection according to the reconstructed independent trees and independent forest.

Further, collecting the operation and maintenance data and performing dimension reduction comprises:

arranging the operation and maintenance data by columns into a matrix;

zero-centering each row of the matrix;

computing the covariance matrix of the zero-mean matrix;

computing the eigenvalues of the covariance matrix and their corresponding eigenvectors;

and arranging the eigenvectors by rows, in descending order of their eigenvalues, into an eigenvector matrix from which the samples are obtained.

Further, building the independent trees from the samples and forming the independent forest comprises:

randomly selecting a feature as the root node;

selecting a split value between the minimum and maximum values of the root-node feature as the splitting criterion, and creating two child nodes;

dividing the samples into two groups that enter the two child nodes respectively;

repeating the following until the path reaches a preset length or a child node contains only one sample, thereby forming an independent tree: at each child node, selecting a value of one feature as the splitting criterion to create two further child nodes, and dividing the remaining samples into two groups that enter those child nodes;

and forming the independent forest from the independent trees generated with different features as root nodes.

Further, the preliminary anomaly score of each sample is calculated by the following formula:

$$S(p, n) = 2^{-\frac{E(L(p))}{c(n)}},$$

wherein,

$S(p, n)$ denotes the preliminary anomaly score, $L(p)$ denotes the path length of the leaf node where sample p is located in an independent tree, and $E(L(p))$ denotes the average of the path lengths of sample p over the independent trees in the independent forest;

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.

Further, identifying the valid tree according to the marked preliminary abnormal points comprises:

and determining, as a valid tree, each independent tree in which the preliminary abnormal point is identified with a path length not exceeding a preset value.

Further, the total score is calculated by the following formulas:

$$W_m(P) = \sum_{i=1}^{N} w_i(P),$$

$$T_m = \sum_{j=1}^{n} W_m(P_j),$$

wherein,

$w_i(P)$ denotes the score assigned, in the i-th valid tree, to a feature $f_m$ on the path of the preliminary outlier P (zero when $f_m$ is not on that path), and N denotes the number of independent trees in which P is identified,

$W_m(P)$ denotes the sum of the scores that the preliminary outlier P assigns to feature $f_m$,

$T_m$ denotes the total score of feature $f_m$, and n denotes the number of marked positive samples.

Further, the feature selection probability is calculated by the following formula:

$$p_m = \frac{T_m}{\sum_{k} T_k},$$

wherein,

$p_m$ denotes the selection probability of the m-th feature,

$T_m$ denotes the total score of the m-th feature $f_m$,

and the sum in the denominator runs over all features.

Further, calculating the feature selection probabilities according to the total scores and reconstructing the independent trees and the independent forest comprises:

sampling a random variable U that is uniformly distributed between 0 and 1;

selecting the i-th feature $f_i$ as the root node, where $f_i$ satisfies

$$\sum_{m=1}^{i-1} p_m < U \le \sum_{m=1}^{i} p_m,$$

wherein $p_m$ denotes the selection probability of the m-th feature;

dividing the samples into two groups that enter the two child nodes respectively;

repeating the following until the path reaches the preset length or a child node contains only one sample: at each child node, randomly selecting a feature and a value of that feature as the splitting criterion to create two further child nodes, and dividing the remaining samples into two groups that enter those child nodes;

and recombining the independent trees generated with different features as root nodes into an independent forest.

Further, performing anomaly detection according to the reconstructed independent trees and independent forest comprises:

calculating the final anomaly score of each sample according to the reconstructed independent trees and independent forest, and marking the samples whose final anomaly score is greater than a preset value as abnormal points;

the final anomaly score is calculated by the following formula:

$$S'(p, n) = 2^{-\frac{E(L'(p))}{c(n)}},$$

wherein,

$S'(p, n)$ denotes the final anomaly score,

$L'(p)$ denotes the path length of the leaf node where sample p is located in a reconstructed independent tree,

and $E(L'(p))$ denotes the average of the path lengths of sample p over the independent trees in the reconstructed independent forest;

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.

An intelligent operation and maintenance abnormity detection device comprises:

the data processing module is used for acquiring operation and maintenance data and performing dimension reduction to obtain samples of the operation and maintenance data;

the preliminary forest establishment module is used for building independent trees from the samples and forming an independent forest;

the preliminary judgment module is used for calculating the preliminary anomaly score of each sample according to the independent trees and the independent forest, and marking the samples whose preliminary anomaly score is greater than a preset value as preliminary abnormal points;

the marking module is used for marking a subset of the positive samples;

the identification module is used for identifying valid trees according to the marked preliminary abnormal points;

the total score calculating module is used for assigning scores to the features of the preliminary abnormal points identified in the valid trees, and calculating the total score according to the number of independent trees in which each preliminary abnormal point is identified and the number of marked positive samples;

the reconstruction module is used for calculating the feature selection probabilities according to the total scores and reconstructing the independent trees and the independent forest;

and the anomaly detection module is used for performing anomaly detection according to the reconstructed independent trees and independent forest.

The intelligent operation and maintenance abnormity detection method and device provided by the invention at least have the following beneficial effects:

(1) The operation and maintenance data are reduced in dimension before anomaly detection. This simplifies the sample data used for anomaly detection, saves computation time and improves the efficiency of the anomaly detection algorithm.

(2) Part of the positive samples are marked manually, which adds labeled supervised learning to the unsupervised independent forest algorithm. The advantages of both approaches are combined, so accuracy improves while efficiency is maintained.

(3) All features involved with the positive samples are scored through the multiple valid trees of the multiple positive samples. Each feature's total score describes how much that feature contributes to anomaly detection, and it serves as the basis for selecting root nodes when the independent trees are reconstructed, which improves the identification accuracy of the reconstructed independent forest.

(4) Root nodes are selected by sampling a uniformly distributed random variable. This guarantees that each feature is selected with exactly its feature selection probability, thereby ensuring the accuracy of the reconstructed independent forest.

Drawings

Fig. 1 is a flowchart of an embodiment of an intelligent operation and maintenance anomaly detection method provided by the present invention.

Fig. 2 is a flowchart of an embodiment of a method for reconstructing an independent tree and an independent forest in the method provided by the present invention.

Fig. 3 is a schematic structural diagram of an embodiment of the intelligent operation and maintenance abnormality detection apparatus provided in the present invention.

Fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.

Reference numerals: 1-a processor, 2-a storage device, 101-a data processing module, 102-a preliminary forest establishment module, 103-a preliminary judgment module, 104-a marking module, 105-an identification module, 106-a total score calculation module, 107-a reconstruction module and 108-an abnormity detection module.

Detailed Description

For a better understanding, the technical solution is described in detail below with reference to the drawings and specific embodiments.

Referring to fig. 1, in some embodiments, an intelligent operation and maintenance anomaly detection method is provided, including:

S1, collecting operation and maintenance data and performing dimension reduction to obtain samples of the operation and maintenance data;

S2, building independent trees from the samples to form an independent forest;

S3, calculating a preliminary anomaly score for each sample according to the independent trees and the independent forest, and marking the samples whose preliminary anomaly score is greater than a preset value as preliminary abnormal points;

S4, marking a subset of the positive samples;

S5, identifying valid trees according to the marked preliminary abnormal points;

S6, assigning scores to the features of the preliminary outliers identified in the valid trees, and calculating the total score according to the number of independent trees in which each preliminary outlier is identified and the number of marked positive samples;

S7, calculating the feature selection probabilities according to the total scores, and reconstructing the independent trees and the independent forest;

and S8, performing anomaly detection according to the reconstructed independent trees and independent forest.

The intelligent operation and maintenance data comprise multiple features related to the operation of devices, systems and the network environment, including but not limited to network latency, number of concurrent requests and database capacity. In the collected data, each dimension corresponds to one feature. The data are therefore multidimensional and must be reduced in dimension before anomaly detection.

Specifically, in step S1, collecting the operation and maintenance data and performing dimension reduction comprise:

S11, arranging the operation and maintenance data by columns into a matrix;

S12, zero-centering each row of the matrix;

S13, computing the covariance matrix of the zero-mean matrix;

S14, computing the eigenvalues of the covariance matrix and their corresponding eigenvectors;

and S15, arranging the eigenvectors by rows, in descending order of their eigenvalues, into an eigenvector matrix from which the samples are obtained.

As a preferred embodiment, the operation and maintenance data are reduced in dimension by PCA (Principal Component Analysis), here reducing k samples of M-dimensional data to m dimensions. First, the original operation and maintenance data are arranged by columns into a matrix $X_0$ with M rows and k columns. Next, the mean of each row of $X_0$ is subtracted from that row, giving the zero-mean matrix X, whose covariance matrix is computed as

$$C = \frac{1}{k} X X^{T}.$$

The eigenvalues and corresponding eigenvectors of C are then computed. The eigenvectors are arranged as rows from top to bottom in descending order of their eigenvalues, and the first m rows form the matrix P. Projecting the data gives the samples reduced to m dimensions, $Y = PX$, and the features after dimension reduction are $f_1, f_2, \ldots, f_m$.
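As an illustration only, the PCA procedure of steps S11 to S15 can be sketched in Python with NumPy. The sketch makes three choices that are assumptions, not the applicant's stated implementation:
- the covariance is normalized as $C = XX^{T}/k$, where k is the number of samples (columns);
- `numpy.linalg.eigh` is used because the covariance matrix is symmetric;
- the function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X0: np.ndarray, m: int) -> np.ndarray:
    """Reduce k samples of M-dimensional data (an M x k matrix X0) to m dimensions."""
    # S12: subtract the mean of each row (feature) to zero-center the data.
    X = X0 - X0.mean(axis=1, keepdims=True)
    # S13: covariance matrix C = X X^T / k (assumed normalization), k = number of columns.
    C = X @ X.T / X.shape[1]
    # S14: eigen-decomposition; eigh applies because C is symmetric (eigenvalues ascending).
    eigvals, eigvecs = np.linalg.eigh(C)
    # S15: order eigenvectors by descending eigenvalue and keep the first m as the rows of P.
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:m]].T
    # Project the centered data: each column of the result is one m-dimensional sample.
    return P @ X
```

For data that lie along a single direction, keeping one component (m = 1) preserves all of the variance.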

In step S2, building the independent trees from the samples and forming the independent forest comprises:

S21, randomly selecting a feature as the root node;

S22, selecting a split value between the minimum and maximum values of the root-node feature as the splitting criterion, and creating two child nodes;

S23, dividing the samples into two groups that enter the two child nodes respectively;

S24, repeating the following until the path reaches the preset length or a child node contains only one sample, thereby forming an independent tree: at each child node, selecting a value of one feature as the splitting criterion to create two further child nodes, and dividing the remaining samples into two groups that enter those child nodes;

and S25, forming the independent forest from the independent trees generated with different features as root nodes.
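Steps S21 to S25 can be sketched as follows. This is a minimal sketch, and the following choices are assumptions not stated in the text:
- trees are grown on all samples without subsampling;
- the split value is drawn uniformly between the node's minimum and maximum;
- a node that is constant on the chosen feature simply becomes a leaf;
- each tree's root feature is drawn uniformly at random, or from optional weights `root_probs`, which would correspond to the weighted reconstruction of step S7.

The function names `grow_tree` and `grow_forest` are hypothetical. Besides each sample's leaf depth L(p), the sketch records the features on each sample's path, which the scoring in step S6 relies on.

```python
import numpy as np


def grow_tree(X, max_depth, rng, root_feature):
    """Grow one independent tree on the rows of X (n samples x d features).

    Returns each sample's leaf depth L(p) and the set of features on its path.
    """
    n, d = X.shape
    depth = np.zeros(n, dtype=int)
    path_features = [set() for _ in range(n)]

    def split(idx, level, feature):
        # Stop at the preset path length or when a node holds at most one sample.
        if level >= max_depth or len(idx) <= 1:
            depth[idx] = level
            return
        values = X[idx, feature]
        lo, hi = values.min(), values.max()
        if lo == hi:
            # Simplification: a node that is constant on the chosen feature becomes a leaf.
            depth[idx] = level
            return
        cut = rng.uniform(lo, hi)  # split value between the node's min and max
        for i in idx:
            path_features[i].add(int(feature))
        # Each child node picks a new random feature for its own split.
        split(idx[values < cut], level + 1, rng.integers(d))
        split(idx[values >= cut], level + 1, rng.integers(d))

    split(np.arange(n), 0, root_feature)
    return depth, path_features


def grow_forest(X, n_trees=100, max_depth=8, root_probs=None, seed=0):
    """Build n_trees trees; root features are uniform unless root_probs is given."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    depths, paths = [], []
    for _ in range(n_trees):
        root = rng.choice(d, p=root_probs)  # p=None draws uniformly
        depth, path_features = grow_tree(X, max_depth, rng, root)
        depths.append(depth)
        paths.append(path_features)
    return np.stack(depths), paths  # depths has shape (n_trees, n_samples)
```

Averaging the returned `depths` over axis 0 gives E(L(p)) for every sample.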

The anomaly detection method of this embodiment uses the independent forest algorithm (that is, the isolation forest, or iForest, algorithm), an unsupervised anomaly detection method suited to continuous data that detects anomalous values by isolating sample points. Each independent tree is essentially a decision tree: a sample flows from the root node down through child nodes according to the node splits and finally lands on a leaf node. There is no fixed rule for how many independent trees to generate, and that number is not directly related to the number of samples. Because the trees are built independently of each other, the anomaly score must combine the judgments that all independent trees make on a sample.

In steps S21-S25, abnormal data samples are relatively isolated from the other samples, so an abnormal sample needs fewer splits to be separated on its own. In other words, its path length in an independent tree is relatively short. The likelihood that a sample is abnormal can therefore be judged from the path length at which it is separated. This likelihood is expressed as the preliminary anomaly score, and samples whose preliminary anomaly score is greater than the preset value are marked as preliminary abnormal points.

Specifically, the preliminary anomaly score of each sample in step S3 is calculated by the following formula:

$$S(p, n) = 2^{-\frac{E(L(p))}{c(n)}},$$

wherein,

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.

As a preferred embodiment, samples whose preliminary anomaly score calculated by the above formula is greater than 0.9 are marked as preliminary abnormal points.
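The score of step S3 can be computed as in the sketch below. It assumes the standard isolation forest form $S(p, n) = 2^{-E(L(p))/c(n)}$ with $c(n) = 2H(n-1) - 2(n-1)/n$ and $H(i) \approx \ln i + 0.5772156649$. Two further choices are assumptions rather than statements of this document:
- the special case c(2) = 1 follows the usual isolation forest convention;
- the function names are hypothetical.

The same function would apply to the final score of step S8 when given the average path length from the reconstructed forest.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Normalizing constant c(n) = 2H(n-1) - 2(n-1)/n, with H(i) ~ ln(i) + gamma."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0  # assumed convention from the standard isolation forest
    raise ValueError("at least two samples are required")


def anomaly_score(mean_path_length: float, n: int) -> float:
    """S(p, n) = 2 ** (-E(L(p)) / c(n)); values close to 1 indicate anomalies."""
    return 2.0 ** (-mean_path_length / c(n))


def preliminary_outliers(mean_path_lengths, n, threshold=0.9):
    """Indices of samples whose score exceeds the preset value (0.9 in the embodiment)."""
    return [i for i, e in enumerate(mean_path_lengths) if anomaly_score(e, n) > threshold]
```

A sample whose average path length equals c(n) scores exactly 0.5. Scores approach 1 as the average path length approaches 0.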

In step S4, a small number of positive samples are marked manually, and the manually marked positive samples are denoted $\{P_1, P_2, \ldots, P_n\}$. Marking part of the positive samples lays the foundation for integrating the unsupervised independent forest algorithm with supervised learning. The advantages of the two approaches can then be combined, improving accuracy while keeping the algorithm efficient. Compared with labeling all samples, this also saves manual labeling cost.

Because the preliminary abnormal points are identified with limited precision, the independent trees and independent forest are then reconstructed.

In step S5, identifying a valid tree according to the marked preliminary abnormal point includes:

and determining, as a valid tree, each independent tree in which the preliminary abnormal point is identified with a path length not exceeding the preset value.

In step S6, assigning scores to the features of the preliminary outliers identified in the valid trees, and calculating the total score according to the number of independent trees in which each preliminary outlier is identified and the number of marked positive samples, comprises:

S61, assigning each feature an initial score of zero;

S62, performing the following for the preliminary abnormal points until all valid trees and all preliminary abnormal points have been traversed, to obtain the total score of a given feature: in each valid tree i that identifies a preliminary outlier P, assigning a score $w_i(P)$ to the features on P's path, wherein $w_i(P)$ is determined by $L_i(P)$, the path length of the preliminary outlier P in valid tree i;

S63, performing step S62 for all features to obtain the total scores of all features.

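In step S62, the total score is calculated by the following formulas:

$$W_m(P) = \sum_{i=1}^{N} w_i(P),$$

$$T_m = \sum_{j=1}^{n} W_m(P_j),$$

wherein,

$w_i(P)$ denotes the score assigned to a feature $f_m$ of the preliminary outlier P in the i-th valid tree (zero when $f_m$ is not on P's path in that tree), N denotes the number of independent trees in which the preliminary outlier P is identified, and $W_m(P)$ denotes the sum of these scores for feature $f_m$,

$T_m$ denotes the total score of feature $f_m$, and n denotes the number of marked positive samples;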

In certain embodiments, the maximum path length of each independent tree is D. The independent trees that identify the preliminary outlier P with a path length not exceeding $D-1$ are determined to be valid trees, and P has N valid trees in total. Every feature starts with a score of 0. For the i-th independent tree that validly identifies P, each feature involved in the path that detects P is assigned the score $w_i(P)$, which is determined by $L_i(P)$, the path length of point P in the i-th independent tree.

For example, suppose the features involved in detecting P are $f_a$, $f_b$ and $f_c$. For the i-th independent tree that detects P, all three features receive the score $w_i(P)$. Over the N valid trees, the total score that feature $f_a$ receives from P is therefore $W_a(P) = \sum_{i=1}^{N} w_i(P)$, counting only the valid trees in which $f_a$ lies on P's path.

The features of all the positive samples are scored in this way, and the total score of feature $f_m$ is finally $T_m = \sum_{j=1}^{n} W_m(P_j)$. Note that if a feature is never used to detect any preliminary outlier, its score remains zero.
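Steps S5 and S6, together with the probability formula $p_m = T_m / \sum_k T_k$ used in step S72, can be sketched as below. Here $w_i(P)$ is the score given in valid tree i to each feature on the path of a marked positive sample P, and $T_m$ is feature m's total. The text only says that $w_i(P)$ is determined by the path length $L_i(P)$, so the sketch takes it as a parameter. Three choices are assumptions for illustration:
- the placeholder $1/L_i(P)$, which rewards shorter paths and is hypothetical;
- the input layout;
- the function names.

```python
def hypothetical_tree_score(path_length: int) -> float:
    # HYPOTHETICAL placeholder for w_i(P): the document only states that the score
    # is determined by the path length L_i(P). Assumes path_length >= 1.
    return 1.0 / path_length


def total_feature_scores(positive_paths, max_depth, n_features,
                         tree_score=hypothetical_tree_score):
    """Compute the total score T_m of every feature.

    positive_paths[j] lists, for each tree, a pair
    (L_i(P_j), features used on the path of positive sample P_j in that tree).
    """
    totals = [0.0] * n_features  # S61: every feature starts at zero
    for per_tree in positive_paths:  # loop over the n marked positive samples
        for path_length, features in per_tree:
            # S5: a tree is valid only if it isolates P within a path length of D - 1.
            if path_length > max_depth - 1:
                continue
            w = tree_score(path_length)
            # S62: each distinct feature on P's path receives the score w_i(P).
            for f in set(features):
                totals[f] += w
    return totals


def selection_probabilities(totals):
    """p_m = T_m / sum_k T_k (assumes at least one feature has a positive total)."""
    z = sum(totals)
    return [t / z for t in totals]
```

A feature that never appears on a valid path keeps a total of zero, and so has a selection probability of zero.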

Referring to fig. 2, in step S7, calculating feature selection probabilities according to the total scores and reconstructing an independent tree and an independent forest includes:

S71, sampling a random variable U that is uniformly distributed between 0 and 1;

S72, selecting the i-th feature $f_i$ as the root node, where $f_i$ satisfies

$$\sum_{m=1}^{i-1} p_m < U \le \sum_{m=1}^{i} p_m,$$

wherein $p_m$ denotes the selection probability of the m-th feature;

S73, selecting a split value between the minimum and maximum values of the root-node feature as the splitting criterion, and creating two child nodes;

S74, dividing the samples into two groups that enter the two child nodes respectively;

S75, repeating the following until the path reaches the preset length or a child node contains only one sample: at each child node, randomly selecting a feature and a value of that feature as the splitting criterion to create two further child nodes, and dividing the remaining samples into two groups that enter those child nodes;

and S76, recombining the independent trees generated with different features as root nodes into an independent forest.

In step S72, the feature selection probability is calculated by the following formula:

$$p_m = \frac{T_m}{\sum_{k} T_k},$$

wherein,

$p_m$ denotes the selection probability of the m-th feature,

$T_m$ denotes the total score of the m-th feature $f_m$,

and the sum in the denominator runs over all features, so that $\sum_m p_m = 1$.

The procedure for reconstructing the independent trees in step S7 is essentially the same as the initial construction in step S2. The only difference is how the root feature is chosen. In the initial construction every feature is selected at random with equal probability. During reconstruction the selection probability is determined by the feature's total score, so the higher the total score, the more likely the feature is to become the root node of a reconstructed independent tree. Because U is uniformly distributed and the root node is selected by the condition in step S72, each feature $f_m$ is selected with probability exactly $p_m$. In particular, a feature that has never been used to detect any preliminary outlier has a total score of zero, and hence a selection probability of zero.
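The root-selection rule of step S72, $\sum_{m<i} p_m < U \le \sum_{m \le i} p_m$ with $p_m = T_m / \sum_k T_k$, is inverse-transform sampling over the cumulative selection probabilities. A minimal Python sketch follows, with two implementation choices worth noting:
- it compares U with the cumulative unnormalized total scores $T_m$, which is equivalent after multiplying both sides of the condition by $\sum_k T_k$;
- it draws U on (0, 1] so that the strict lower bound can always be met.

The function names are hypothetical.

```python
import bisect
import itertools
import random


def select_root_feature(total_scores, u):
    """Return i with sum(p[:i]) < u <= sum(p[:i+1]), where p_m = T_m / sum(T).

    Works on cumulative unnormalized scores, which is equivalent after scaling
    by sum(T). u must lie in (0, 1].
    """
    cumulative = list(itertools.accumulate(total_scores))
    target = u * cumulative[-1]
    # bisect_left returns the first index whose cumulative score reaches the target.
    return bisect.bisect_left(cumulative, target)


def draw_root_feature(total_scores, rng=random):
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return select_root_feature(total_scores, u)
```

Features with zero total score add nothing to the cumulative sum, so they can never satisfy the strict inequality. This matches the statement that their selection probability is zero.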

In step S8, performing anomaly detection according to the reconstructed independent trees and independent forest comprises:

calculating the final anomaly score by the following formula:

$$S'(p, n) = 2^{-\frac{E(L'(p))}{c(n)}},$$

wherein,

$S'(p, n)$ denotes the final anomaly score, $L'(p)$ denotes the path length of the leaf node where sample p is located in a reconstructed independent tree, and $E(L'(p))$ denotes the average of these path lengths over the reconstructed independent forest;

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.

As a preferred embodiment, samples whose final anomaly score calculated by the above formula is greater than 0.9 are marked as final abnormal points. In the independent trees and forest reconstructed according to the feature selection probabilities, the features that contributed more to detecting the preliminary abnormal points appear more often as root nodes. This improves the accuracy of anomaly detection with the reconstructed independent trees and independent forest.

Referring to fig. 3, in some embodiments, an intelligent operation and maintenance anomaly detection device is provided, including:

the data processing module 101 is configured to acquire operation and maintenance data and perform dimension reduction processing to obtain a sample of the operation and maintenance data;

a preliminary forest establishment module 102, configured to establish an independent tree according to the sample and form an independent forest;

a preliminary judgment module 103, configured to calculate a preliminary abnormal score of each sample according to the independent tree and the independent forest, and mark a sample with the preliminary abnormal score larger than a preset value as a preliminary abnormal point;

a marking module 104, configured to mark a subset of the positive samples;

an identification module 105, configured to identify the valid trees according to the marked preliminary abnormal points;

a total score calculating module 106, configured to assign scores to the features of the preliminary outliers identified in the valid trees, and calculate the total score according to the number of independent trees in which each preliminary outlier is identified and the number of marked positive samples;

a reconstruction module 107, configured to calculate a feature selection probability according to the total score and reconstruct an independent tree and an independent forest;

and an anomaly detection module 108, configured to perform anomaly detection according to the reconstructed independent tree and the independent forest.

Wherein the data processing module 101 is further configured to:

arranging the operation and maintenance data by columns into a matrix;

zero-centering each row of the matrix;

computing the covariance matrix of the zero-mean matrix;

computing the eigenvalues of the covariance matrix and their corresponding eigenvectors;

The preliminary forest establishment module 102 is further configured to build independent trees from the samples and form an independent forest, including:

randomly selecting a feature as the root node;

dividing the samples into two groups that enter the two child nodes respectively;

and forming the independent forest from the independent trees generated with different features as root nodes.

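In the preliminary judgment module 103, the preliminary anomaly score of each sample is calculated by the following formula:

$$S(p, n) = 2^{-\frac{E(L(p))}{c(n)}},$$

wherein $L(p)$ is the path length of sample p in an independent tree, $E(L(p))$ is its average over the independent forest,

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.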

The identification module 105 is further configured to:

and determining, as a valid tree, each independent tree in which the preliminary abnormal point is identified with a path length not exceeding the preset value.

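In the total score calculating module 106, the total score is calculated by the following formulas:

$$W_m(P) = \sum_{i=1}^{N} w_i(P),$$

$$T_m = \sum_{j=1}^{n} W_m(P_j),$$

wherein,

$w_i(P)$ denotes the score assigned to feature $f_m$ by the outlier P in its i-th valid tree, and $W_m(P)$ denotes the sum of these scores over the N valid trees of P,

$T_m$ denotes the total score of feature $f_m$, and n denotes the number of marked positive samples.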

In the reconstruction module 107, the feature selection probability is calculated by the following formula:

$$p_m = \frac{T_m}{\sum_{k} T_k},$$

wherein,

$p_m$ denotes the selection probability of the m-th feature,

$T_m$ denotes the total score of the m-th feature $f_m$,

and the sum in the denominator runs over all features.

The reconstruction module 107 is further configured to:

selecting the i-th feature $f_i$ as the root node, where $f_i$ satisfies

$$\sum_{m=1}^{i-1} p_m < U \le \sum_{m=1}^{i} p_m,$$

wherein $p_m$ denotes the selection probability of the m-th feature and U is a random variable uniformly distributed between 0 and 1;

dividing the samples into two groups that enter the two child nodes respectively;

and recombining the independent trees generated with different features as root nodes into an independent forest.

The anomaly detection module 108 is further configured to:

calculating the final anomaly score of each sample according to the reconstructed independent trees and independent forest, and marking the samples whose final anomaly score is greater than a preset value as abnormal points;

the final anomaly score is calculated by the following formula:

$$S'(p, n) = 2^{-\frac{E(L'(p))}{c(n)}},$$

wherein,

$S'(p, n)$ denotes the final anomaly score,

$L'(p)$ denotes the path length of the leaf node where sample p is located in a reconstructed independent tree, and $E(L'(p))$ denotes the average of these path lengths over the reconstructed independent forest;

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \quad H(i) \approx \ln i + 0.5772156649,$$

and n denotes the number of samples.

Referring to fig. 4, in some embodiments, an electronic device is provided, comprising a processor 1 and a storage device 2. The storage device 2 stores a plurality of instructions, and the processor 1 is configured to read the instructions and execute the above method.

The intelligent operation and maintenance anomaly detection method and device offer the following advantages:
- Reducing the operation and maintenance data in dimension before anomaly detection simplifies the sample data used for detection, saves computation time and improves the efficiency of the anomaly detection algorithm.
- Manually marking positive samples adds labeled supervised learning to the unsupervised independent forest algorithm, combining the advantages of both so that accuracy improves while efficiency is preserved.
- All features involved with the positive samples are scored through their multiple valid trees. Each feature's total score describes its contribution to anomaly detection and serves as the basis for selecting root nodes when the independent trees are reconstructed, improving the identification accuracy of the reconstructed independent forest.
- Selecting root nodes by sampling a uniformly distributed random variable guarantees that each feature is selected with exactly its feature selection probability, thereby ensuring the accuracy of the reconstructed independent forest.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An intelligent operation and maintenance anomaly detection method, characterized by comprising the following steps:

labeling a portion of the samples as positive samples;

performing anomaly detection using the reconstructed independent trees and independent forest;

calculating the feature selection probabilities from the total scores and reconstructing the independent trees and the independent forest, which comprises the following steps:

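selecting the $i$-th feature $f_i$ as the root node, where the feature $f_i$ satisfies the following condition:

$$\sum_{m=1}^{i-1} P_m < u \le \sum_{m=1}^{i} P_m,$$

wherein $u$ is a random variable sampled from the uniform distribution on $[0, 1]$ and $P_m$ represents the selection probability of the $m$-th feature;

dividing the samples into two groups that enter the two child nodes respectively;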
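The cumulative-probability rule for choosing the root feature can be sketched in Python as a roulette-wheel draw. This is a minimal illustration that assumes the condition has the form sum(P_1..P_{i-1}) < u ≤ sum(P_1..P_i) with u uniform; the function name `select_root_feature` is hypothetical, and indices are 0-based in the code but 1-based in the claim.

```python
import bisect
import itertools
import random


def select_root_feature(probs, rng=None):
    """Pick a feature index with probability proportional to probs[index].

    Implements the cumulative rule sum(probs[:i]) < u <= sum(probs[:i + 1]),
    with u drawn uniformly from (0, total]; indices here are 0-based.
    """
    if rng is None:
        rng = random.Random()
    cumulative = list(itertools.accumulate(probs))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one feature needs a positive probability")
    u = (1.0 - rng.random()) * total  # uniform on (0, total], so u > 0
    idx = bisect.bisect_left(cumulative, u)  # first index with cumulative >= u
    return min(idx, len(probs) - 1)  # guard against floating-point rounding
```

Drawing u from (0, total] rather than [0, 1) means a feature with zero selection probability can never satisfy the strict left inequality, and scaling by `total` tolerates probabilities that sum to slightly less or more than 1.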

2. The method of claim 1, wherein collecting operation and maintenance data and performing dimension reduction processing comprises:

arranging the operation and maintenance data into a matrix, one record per column;

zero-centering each row of the matrix (subtracting the row mean from every entry);

computing the covariance matrix of the zero-centered matrix;

computing the eigenvalues of the covariance matrix and their corresponding eigenvectors;
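The dimension-reduction steps of claim 2 follow classical principal component analysis (PCA). A minimal NumPy sketch is given below. The final steps (sorting eigenvectors by eigenvalue, keeping the top k, and projecting the data) do not appear in the claim text above and are assumed from standard PCA; the function name `pca_reduce` and the parameter `k` are hypothetical.

```python
import numpy as np


def pca_reduce(records, k):
    """Reduce records of shape (n_samples, n_features) to shape (n_samples, k)."""
    # Each record becomes one column, so X has shape (n_features, n_samples).
    X = np.asarray(records, dtype=float).T
    # Zero-center each row (i.e. each feature).
    X = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix of the centered data, shape (n_features, n_features).
    cov = X @ X.T / X.shape[1]
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Assumed PCA completion: keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order].T  # shape (k, n_features), one eigenvector per row
    return (P @ X).T  # projected samples, shape (n_samples, k)
```

Dividing the covariance by n rather than n - 1 rescales the eigenvalues but leaves the eigenvectors, and therefore the projection, unchanged.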

3. The method of claim 2, wherein building independent trees from the samples and forming the independent forest comprises:

randomly selecting a feature as the root node;

dividing the samples into two groups that enter the two child nodes respectively;
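The tree-building step of claim 3 resembles the construction of an isolation tree in the isolation forest algorithm, which the term "independent tree" appears to translate. The Python sketch below is a minimal illustration under several assumptions not stated in the claim: the split value is drawn uniformly between the minimum and maximum of the chosen feature, only features that still vary are eligible, the height limit is ceil(log2(sample_size)), each tree is grown on a random subsample (256 by default, a common isolation-forest choice), and path lengths are plain edge counts without the usual leaf-size correction. The names `build_tree`, `path_length` and `build_forest` are hypothetical.

```python
import math
import random


def build_tree(samples, max_depth, rng, depth=0):
    """Grow one tree as nested dicts: random feature, random split value."""
    if depth >= max_depth or len(samples) <= 1:
        return {"size": len(samples)}  # leaf: depth limit reached or sample isolated
    # Only features that still vary can separate the samples.
    candidates = [f for f in range(len(samples[0]))
                  if min(s[f] for s in samples) < max(s[f] for s in samples)]
    if not candidates:
        return {"size": len(samples)}
    feature = rng.choice(candidates)  # random feature at this node
    values = [s[feature] for s in samples]
    threshold = rng.uniform(min(values), max(values))  # random split value
    left = [s for s in samples if s[feature] < threshold]
    right = [s for s in samples if s[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": build_tree(left, max_depth, rng, depth + 1),
            "right": build_tree(right, max_depth, rng, depth + 1)}


def path_length(x, node):
    """Number of edges from the root to the leaf that sample x falls into."""
    depth = 0
    while "feature" in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
        depth += 1
    return depth


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Grow n_trees trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size)) if size > 1 else 0
    return [build_tree(rng.sample(data, size), max_depth, rng) for _ in range(n_trees)]
```

An isolated point tends to be separated near the root, so its average `path_length` over the forest is shorter than that of a point inside a dense cluster; that average is what the anomaly score is computed from.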

4. The method of claim 1, wherein the preliminary anomaly score of each sample is calculated by the formula:

$$S(p, n) = 2^{-\frac{E(L(p))}{c(n)}}$$

wherein $S(p, n)$ represents the preliminary anomaly score, $L(p)$ represents the path length of the leaf node where sample $p$ is located in an independent tree, and $E(L(p))$ represents the average of the path lengths of sample $p$ over the independent trees of the independent forest;

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln(i) + 0.5772156649,$$

where $n$ represents the number of samples.
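To illustrate the scale of the score, assume the standard isolation-forest form $S(p, n) = 2^{-E(L(p))/c(n)}$ with $c(n) = 2H(n-1) - 2(n-1)/n$ and $H(i) \approx \ln(i) + 0.5772$, and take a hypothetical subsample of $n = 256$ samples with two hypothetical mean path lengths:

$$
\begin{aligned}
c(256) &= 2\bigl(\ln 255 + 0.5772\bigr) - \frac{2 \cdot 255}{256} \approx 2(6.1185) - 1.9922 \approx 10.245,\\
E(L(p)) = 4 &\;\Rightarrow\; S(p, 256) = 2^{-4/10.245} \approx 0.763,\\
E(L(p)) = 12 &\;\Rightarrow\; S(p, 256) = 2^{-12/10.245} \approx 0.444.
\end{aligned}
$$

In general, $E(L(p)) = c(n)$ gives $S = 0.5$, and $S \to 1$ as $E(L(p)) \to 0$, so samples isolated after only a few splits receive scores close to 1 and are the ones that exceed the preset value.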

5. The method of claim 4, wherein identifying valid trees from the labeled preliminary outliers comprises:

6. The method of claim 4, wherein the total score is calculated by the formula:

；

；

wherein,

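7. The method of claim 1, wherein performing anomaly detection based on the reconstructed independent trees and independent forest comprises:

calculating the final anomaly score by the following formula:

$$S'(p, n) = 2^{-\frac{E(L'(p))}{c(n)}}$$

wherein $S'(p, n)$ denotes the final anomaly score, $E(L'(p))$ denotes the average path length of sample $p$ over the reconstructed independent trees, and

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln(i) + 0.5772156649,$$

where $n$ denotes the number of samples.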

8. An intelligent operation and maintenance anomaly detection device applied to the method of any one of claims 1-7, comprising:

the marking module is configured to label a portion of the samples as positive samples;

the anomaly detection module is configured to perform anomaly detection using the reconstructed independent trees and independent forest;

the reconstruction module is further configured to:

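selecting the $i$-th feature $f_i$ as the root node, where the feature $f_i$ satisfies the following condition:

$$\sum_{m=1}^{i-1} P_m < u \le \sum_{m=1}^{i} P_m,$$

wherein $u$ is a random variable sampled from the uniform distribution on $[0, 1]$ and $P_m$ represents the selection probability of the $m$-th feature;

dividing the samples into two groups that enter the two child nodes respectively;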

repeating the following step until the path reaches the preset length or a child node contains only one sample: in each child node, randomly selecting a value of a feature as the split criterion and dividing the remaining samples of that node into two groups that enter its two child nodes;
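The reconstruction procedure (a probability-weighted root feature, then random splits until the preset depth is reached or a node holds a single sample) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the hypothetical `probs` list holds the feature selection probabilities, the root draw uses Python's `random.choices`, which performs an equivalent cumulative-weight draw, child nodes pick their feature uniformly, the split value is drawn uniformly between the node's minimum and maximum, and a node whose chosen feature is constant simply becomes a leaf. The function name `build_reconstructed_tree` is hypothetical.

```python
import random


def build_reconstructed_tree(samples, probs, max_depth, rng, depth=0):
    """Grow one reconstructed tree as nested dicts.

    Root: feature drawn with probability proportional to probs.
    Other nodes: feature and split value drawn uniformly at random.
    """
    if depth >= max_depth or len(samples) <= 1:
        return {"size": len(samples)}  # leaf: preset length reached or sample isolated
    n_features = len(samples[0])
    if depth == 0:
        # Cumulative-weight draw over the feature selection probabilities.
        feature = rng.choices(range(n_features), weights=probs, k=1)[0]
    else:
        feature = rng.randrange(n_features)
    values = [s[feature] for s in samples]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: a constant feature cannot split, so stop here.
        return {"size": len(samples)}
    threshold = rng.uniform(lo, hi)  # random split value inside the observed range
    left = [s for s in samples if s[feature] < threshold]
    right = [s for s in samples if s[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_reconstructed_tree(left, probs, max_depth, rng, depth + 1),
        "right": build_reconstructed_tree(right, probs, max_depth, rng, depth + 1),
    }
```

Only the root uses the learned probabilities here, which matches the claim's emphasis on root-node selection; a fuller implementation might also retry when the drawn feature is constant instead of stopping.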