CN107682319A - Method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor - Google Patents
- Publication number: CN107682319A
- Application number: CN201710823063.0A
- Authority: CN (China)
- Prior art keywords: data, cluster, point, factor, points
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
Disclosed is a method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor, comprising the following steps: 1) processing the real-time data stream; 2) setting the data set S in the sliding window; 3) initializing the parameters k, r, ξ; 4) obtaining the distance matrix dist; 5) obtaining the r-neighborhood point sets; 6) obtaining the angle factor and local density of the r-neighborhood point sets; 7) obtaining the dissimilarity; 8) obtaining the cluster center factor of each data point; 9) obtaining the attribution matrix; 10) determining the cluster centers and clustering; 11) performing anomaly detection on each cluster after clustering; 12) multiple verification. The method applies sliding window and basic window techniques to construct an efficient data stream processing model; it reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy, and low time complexity.
Description
Technical Field
The invention relates to data stream anomaly detection and data clustering, and in particular to a method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor.
Background
The rapid development of network technology and the continuing informatization of society have led to explosive growth in the amount of information, so that many fields, such as network intrusion monitoring, commercial transaction management and analysis, video surveillance, and sensor network monitoring, generate massive, high-speed, dynamic stream data. Because data streams are real-time, unbounded, and dynamic, traditional static anomaly detection methods cannot accurately and effectively analyze and process such large-scale, dynamically growing stream data, so constructing a real-time, effective anomaly detection method suited to data streams has become particularly important.
Researchers have proposed different data stream anomaly detection methods for the practical problems encountered at different stages. Existing methods can be roughly classified into density-based, angle-based, and clustering-based data stream anomaly detection methods. Density-based methods use density as the basic anomaly measure and construct a dynamically updatable anomaly factor that measures the degree of anomaly of the data. Pokrajac et al. brought the static anomaly detection method LOF to data streams and studied INCLOF, an incremental local anomaly detection method applicable to dynamic data streams; INCLOF deletes historical data and dynamically updates the anomaly factor of each data point as new data is inserted. Ke Gao et al. improved INCLOF by introducing the sliding window idea and proposed the n-INCLOF method, which updates only the anomaly factors of the data objects in the sliding window at the current moment. In some cases a data point is abnormal at one moment but not at the next; to address this, Karimian S H et al. proposed the I-IncLOF method, which introduces the idea of multiple verification and judges as anomaly points only those data objects that remain abnormal throughout the sliding of the window. I-IncLOF greatly reduces the misjudgment rate, but it is less effective in the multidimensional case. Xinjie Lu et al. proposed the INCLOCI method, which introduces the multi-granularity anomaly factor MDEF and can detect not only scattered outliers but also abnormal clusters.
To address the reduced effectiveness of similarity measures such as distance and density in high-dimensional data spaces, some researchers have proposed angle-based measures. The basic idea of angle similarity measurement is that the angles formed by an anomaly point with other points are generally small and fluctuate within a narrow range, whereas the angles formed by a normal point with other points vary from large to small and fluctuate within a wide range. H. P. Kriegel et al. proposed the angle-based anomaly detection method ABOD, which takes the variance of the angles as the anomaly factor ABOF measuring the degree of anomaly of a data point; ABOD retains high detection accuracy in high-dimensional spaces. Ye H proposed the angle-based data stream anomaly detection method DSABOD, which dynamically updates the anomaly factor of each data point relative to its neighborhood points as the data points of the stream continuously flow into memory. DSABOD offers a new approach to anomaly detection in high-dimensional data streams, but traditional angle-based data stream anomaly detection methods suffer from a low anomaly detection rate.
Clustering-based data stream anomaly detection methods comprise two stages: clustering the data points and performing anomaly detection on the data points within each cluster. Elahi M et al. proposed a clustering-based data stream anomaly detection method that combines K-Means with LOF; the anomaly factor in this method is defined by region, which improves its anomaly detection accuracy. Thakran Y et al. proposed a method that combines DBSCAN with W-K-Means. It applies DBSCAN to cluster the data block of the current moment, obtaining candidate anomaly points and initial clusters; it then merges in the candidate anomaly points from the previous moment that await multiple verification and applies W-K-Means to cluster again, obtaining the candidate anomaly points and normal point clusters of the current moment. It also applies multiple verification to the candidate anomaly points, deleting misjudged anomaly points to free memory, and dynamically adjusts the DBSCAN parameters MinPts and ε and the attribute weights of W-K-Means throughout. The method achieves high anomaly detection accuracy, but it requires too many manually set parameters, involves heavy manual intervention, has high complexity, and is less effective in multidimensional spaces.
Data stream anomaly detection is currently a research hotspot and a difficult problem in the field of data mining; its main aim is to accurately detect, in real time, information that does not conform to the normal pattern in a complex, dynamically changing data environment.
Disclosure of Invention
To address the problems of traditional methods, such as high time complexity, large memory occupation, low efficiency, excessive manual parameter intervention, and low effectiveness in multidimensional data environments, the invention provides a method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor. The method reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy, and low time complexity.
The technical scheme for realizing the purpose of the invention is as follows:
A method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor comprises the following steps:
1) Processing the real-time data stream: processing the various real-time data streams acquired by a data acquisition terminal;
2) Setting the data set S in the sliding window: the processing of step 1) yields the data set S in the current sliding window; let $S = \{X_1, X_2, \dots, X_n\}$ contain n data points, each represented by its attributes, for subsequent clustering and anomaly detection;
3) Initializing the parameters k, r, ξ: set the initialization parameters, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the adjustment coefficient of the anomaly decision threshold; the anomaly decision threshold is $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) Obtaining the distance matrix dist: using the data set S of step 2), calculate the distances between all data points to obtain an n × n distance matrix $dist = [d_{ij}]_{n \times n}$, computed by formula (1);
5) Obtaining the r-neighborhood point sets: according to the spatial neighborhood radius r, obtain the r-neighborhood point set of each data point, namely the set of all data points enclosed by a circle of radius r centered at that point;
6) Obtaining the angle factor and local density of the r-neighborhood point sets: using the distance matrix dist, obtain the angle factor of the r-neighborhood point set and the local density of the r-neighborhood point set;
7) Obtaining the dissimilarity $\delta(x_i)$: sort the local densities of the r-neighborhood point sets obtained in step 6), then calculate the corresponding dissimilarity $\delta(x_i)$;
8) Obtaining the cluster center factor $\tau(x_i)$ of each data point: combine step 6) and step 7) to obtain the cluster center factor $\tau(x_i)$, computed by formula (5); the cluster center factor $\tau(x_i)$ measures the degree to which a data point lies at a cluster center;
9) Obtaining the attribution matrix: sort the cluster center factors of all data points obtained in step 8) in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \dots \ge \tau(p_n)$, and thereby obtain the attribution matrix $F = [f_1, f_2, \dots, f_n]$ used for clustering;
10) Determining the cluster centers and clustering: use the cluster center factor and the attribution matrix to determine the cluster centers of the data set S and cluster it; all data points with the same class label form a set, i.e., a cluster, giving m ($m = C_{center\_id}$) clusters $C_1, C_2, \dots, C_m$ and completing the clustering of the data set S;
11) Performing anomaly detection on each cluster after clustering: for the clusters $C_i$ ($i = 1, 2, \dots, m$) obtained in step 10), anomaly detection is performed separately on each cluster $C_1, C_2, \dots, C_m$ of the clustered data set S to obtain the anomaly point set $O_i$ of each cluster, and finally the set of all anomaly points in the data set S, $O = \{O_1, \dots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is given by formula (7);
the local increment value $H(X_j)$ is given by formula (8);
the distance sum of the k nearest neighbors, $L(X_j)$, is given by formula (9),
where the k-neighborhood of data point $X_j$ consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is given by formula (10),
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance between data point $X_j$ and its cluster center, $V_{C_i}(X_j)$ denotes the angle factor of each data point in cluster $C_i$ ($i = 1, 2, \dots, m$) relative to that cluster, and $H(X_j)$ is the local increment value;
12) Multiple verification: all candidate anomaly points are verified multiple times; candidates that still appear abnormal after a limited number of verifications are judged to be confirmed anomaly points and are output and stored, while candidates that appear as normal points during verification are discarded directly, which increases the accuracy of anomaly detection.
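The anomaly decision threshold of step 3), θ = μ + ξ·δ, can be sketched in a few lines of Python. The sketch assumes that δ is the population standard deviation and that a data point becomes a candidate anomaly when its enhanced angle anomaly factor exceeds θ; both choices are assumptions made for illustration, since the text does not spell them out here. The function names and the sample factor values are hypothetical.

```python
import statistics


def anomaly_threshold(factors, xi):
    """theta = mu + xi * delta over all anomaly-factor values."""
    mu = statistics.fmean(factors)
    delta = statistics.pstdev(factors)  # population std. dev. (assumption)
    return mu + xi * delta


def candidate_anomalies(factors, xi):
    """Indices whose factor exceeds theta (assumed decision rule)."""
    theta = anomaly_threshold(factors, xi)
    return [i for i, f in enumerate(factors) if f > theta]


# Hypothetical enhanced angle anomaly factor values for six data points.
eaof = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0]
flagged = candidate_anomalies(eaof, xi=1.5)
```

A larger ξ raises the threshold and flags fewer points, so ξ trades missed detections against false alarms.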
The processing in step 1) means that the data acquired by the data acquisition terminal is cached in stream form and the cached data is divided into data blocks $E_0, E_1, E_2, \dots$, where each data block represents one basic window and each sliding window W contains ε (ε = 2) basic windows. The insertion and deletion of data are realized by combining the basic window and the sliding window, as follows: at the transition from time $T_i$ to time $T_{i+1}$, the sliding window slides from $W_i$ to $W_{i+1}$, accompanied by the merging of the new basic window $E_{i+1}$ and the removal of the historical basic window $E_{i-1}$; at the same time, the candidate anomaly points detected in $W_i$ at time $T_i$ are merged into $W_{i+1}$ for multiple verification.
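The window mechanics described above can be sketched as follows. This is a minimal illustration, assuming data points are hashable values and that carried-over candidates are simply appended to the data set of the new window; the class name `SlidingWindow` and its methods are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Sliding window holding the latest `epsilon` basic windows."""

    def __init__(self, epsilon=2):
        # A bounded deque drops the oldest basic window automatically.
        self.blocks = deque(maxlen=epsilon)
        self.candidates = []  # candidate anomalies carried forward

    def slide(self, new_block, detected_candidates=()):
        """Merge a new basic window and carry over candidates
        detected in the previous sliding window."""
        self.candidates = list(detected_candidates)
        self.blocks.append(list(new_block))

    def data_set(self):
        """Data set S: points of all basic windows plus carried-over
        candidates that are no longer inside the window."""
        points = [p for block in self.blocks for p in block]
        extra = [c for c in self.candidates if c not in points]
        return points + extra


w = SlidingWindow(epsilon=2)
w.slide([1, 2])                           # E_0
w.slide([3, 4])                           # E_1, window holds E_0 and E_1
w.slide([5, 6], detected_candidates=[2])  # E_2 arrives, E_0 leaves
s = w.data_set()
```

Only ε basic windows are held at any time, which is what bounds the memory use; the candidate from the removed block survives solely so that it can be re-verified.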
The angle factor of the r-neighborhood point set in step 6) is computed by formula (2).
The local density of the r-neighborhood point set in step 6) is computed by formula (3).
The local density is related to the number of neighborhood data points and to their positions: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
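Since the local density depends on the neighborhood data points, it may help to see how the distance matrix of step 4) and the r-neighborhood point sets of step 5) could be obtained. The sketch below assumes Euclidean distance and assumes that a point's neighborhood excludes the point itself and includes points at distance exactly r; it does not implement formula (3). All names and coordinates are hypothetical.

```python
import math


def distance_matrix(points):
    """n x n matrix of pairwise Euclidean distances (assumed metric)."""
    return [[math.dist(p, q) for q in points] for p in points]


def r_neighborhoods(dist, r):
    """For each point i, the indices j != i with dist[i][j] <= r."""
    n = len(dist)
    return [[j for j in range(n) if j != i and dist[i][j] <= r]
            for i in range(n)]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
dist = distance_matrix(points)
neighbors = r_neighborhoods(dist, r=1.5)
```

The isolated point at (5, 5) ends up with an empty neighborhood, while the three points near the origin are mutual neighbors.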
The dissimilarity $\delta(x_i)$ in step 7) is obtained after sorting the local densities of all data points in descending order and is computed by formula (4).
The attribution matrix $F = [f_1, f_2, \dots, f_n]$ in step 9) records the attribution relationships between data points; each element is given by formula (6),
where $\{p_i\}$ denotes the original subscript sequence numbers after the cluster center factors $\tau(x_i)$ are sorted in descending order.
The data flow abnormity detection method is divided into 2 processes, namely a data flow processing process and a data flow abnormity detection process. In the data flow processing process, dynamic data flow is converted into static data blocks, so that subsequent abnormal detection is facilitated, and the real-time performance and the high efficiency of the whole detection are ensured; the data flow abnormity detection process is used for carrying out abnormity detection on the static data set processed in the data flow processing process, and in order to improve the abnormity detection accuracy, a method of clustering firstly and then carrying out abnormity detection is adopted. In the technical scheme, the real-time data stream processing method combining the sliding window and the basic window is the core of the data stream processing process, the memory occupancy rate is reduced, the quality of subsequent abnormal detection is improved, the cluster center factor and the attribution matrix are two parameters which are newly introduced in the technical scheme and used for determining the cluster center and clustering, the cluster center of the multidimensional data space can be rapidly and effectively determined, and the clustering is accurately performed according to the determined cluster center; the enhanced angle anomaly factor is another important parameter in the technical scheme, makes up for partial defects of the traditional anomaly factor, retains the effectiveness of an angle measurement mode in a multi-dimensional space, and is the core of an anomaly detection part.
The method applies sliding window and basic window technologies, constructs an efficient data stream processing model, reduces the occupancy rate of the memory, and has good real-time performance, high accuracy of abnormal detection and low time complexity.
Drawings
FIG. 1 is a schematic flow chart of the method in the embodiment;
FIG. 2 is a schematic diagram of the distribution of data points in the sliding window at time $t_1$ in the embodiment;
FIG. 3 is a schematic diagram of the distribution of data points in the sliding window at time $t_2$ in the embodiment;
FIG. 4 is a schematic diagram of processing the real-time data stream by combining the sliding window and the basic window, together with the multiple verification process, in the embodiment;
FIG. 5 is a schematic diagram of the angle measurement of data points in the embodiment;
FIG. 6 is a schematic diagram of the data point distribution of U-shaped cluster data under the conventional angle measurement method in the embodiment;
FIG. 7 is a schematic diagram of the data point distribution of multi-cluster data misjudged by the conventional angle measurement method in the embodiment;
FIG. 8 is a schematic diagram of the distribution of the original coordinates of the data set in the embodiment;
FIG. 9 is a schematic diagram of the local density-dissimilarity distribution in the embodiment;
FIG. 10 is a schematic diagram of the distribution of the cluster center factors in the embodiment;
FIG. 11a is a schematic diagram of the distribution of data set 1 in the embodiment;
FIG. 11b is a schematic diagram of the distribution of outliers in data set 1 in the embodiment;
FIG. 11c is a schematic diagram marking the anomaly points detected by anomaly detection on data set 1 in the embodiment;
FIG. 11d is a schematic diagram marking the normal points of data set 1 that anomaly detection falsely detected as anomaly points in the embodiment;
FIG. 12a is a schematic diagram of the distribution of data set 2 in the embodiment;
FIG. 12b is a schematic diagram of the distribution of data set 2 in the embodiment;
FIG. 12c is a schematic diagram marking the anomaly points detected by anomaly detection on data set 2 in the embodiment;
FIG. 12d is a schematic diagram marking the anomaly points of data set 2 that anomaly detection detected as normal points in the embodiment.
Detailed Description
The invention will be further illustrated, but not limited, by the following description of the embodiments with reference to the accompanying drawings.
Referring to fig. 1, a method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factors includes the following steps:
1) Processing the real-time data stream: the various real-time data streams acquired by the data acquisition terminal are processed. Real-time data streams are dynamic and changeable: some data objects appear abnormal in the current sliding window but appear as normal points in the sliding window at the next moment, as shown in FIG. 2 and FIG. 3. FIG. 2 shows the distribution of data points in the sliding window at time $t_1$, where point P' appears abnormal; as data points continue to flow in, more and more data points accumulate around P'. FIG. 3 shows the distribution of data points in the sliding window at time $t_2$, where point P' appears normal;
2) Setting the data set S in the sliding window: the processing of step 1) yields the data set S in the current sliding window; let $S = \{X_1, X_2, \dots, X_n\}$ contain n data points, each represented by its attributes, for subsequent clustering and anomaly detection;
3) Initializing the parameters k, r, ξ: set the initialization parameters, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the adjustment coefficient of the anomaly decision threshold; the anomaly decision threshold is $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) Obtaining the distance matrix dist: using the data set S of step 2), calculate the distances between all data points to obtain an n × n distance matrix $dist = [d_{ij}]_{n \times n}$, computed by formula (1);
5) Obtaining the r-neighborhood point sets: according to the spatial neighborhood radius r, obtain the r-neighborhood point set of each data point, namely the set of all data points enclosed by a circle of radius r centered at that point;
6) Obtaining the angle factor and local density of the r-neighborhood point sets: using the distance matrix dist, obtain the angle factor and the local density of the r-neighborhood point set. As shown in FIG. 5, the angle measurement idea is to calculate the angles between a data point and every pair of other data points and then take their variance. For a core region point $A_1$, the angles it forms with other point pairs vary over a wide range, so the variance is large; for an anomaly point $A_3$, the angles it forms with other point pairs vary over a very narrow range, so the variance is small; for a boundary point $A_2$, the range of angles lies between those of $A_1$ and $A_3$, so its variance lies between that of a core region point and that of an anomaly point. This measure has some defects, however, as shown in FIG. 6 and FIG. 7. The anomaly point $B_1$ in FIG. 6 is located at the center of the U-shaped cluster, and the angles it forms with the surrounding point pairs vary over a wide range, i.e., the variance is large, whereas the angles that the edge point $B_2$ forms with other point pairs vary over a narrow range, i.e., the variance is small. Similarly, the anomaly point $D_1$ in FIG. 7 is located between the two clusters, and the angles it forms with point pairs from the two clusters vary over a wide range, whereas the angles formed by the edge point $D_2$ vary over a narrower range. The results are the opposite of the actual situation, leading to missed detections and misjudgments;
7) Obtaining the dissimilarity $\delta(x_i)$: sort the local densities of the r-neighborhood point sets obtained in step 6), then calculate the corresponding dissimilarity $\delta(x_i)$;
8) Obtaining the cluster center factor $\tau(x_i)$ of each data point: combine step 6) and step 7) to obtain the cluster center factor $\tau(x_i)$, computed by formula (5). The cluster center factor $\tau(x_i)$ measures the degree to which a data point lies at a cluster center. In the method of this embodiment it is an improved parameter for quickly and effectively determining the cluster centers of a multidimensional data space, and computing it is a crucial step in clustering. The process is illustrated in FIG. 8, FIG. 9, and FIG. 10. FIG. 8 shows that the data set consists of two clusters, with point 13 and point 25 as their respective cluster centers; FIG. 9 shows the ρ-δ (local density-dissimilarity) distribution of the points in the data set obtained from formulas (3) and (4), where both the local density and the dissimilarity of points 13 and 25 are large; FIG. 10 shows the distribution of the points sorted by descending cluster center factor according to formula (5), where points 13 and 25 have the largest cluster center factors and are therefore the most likely cluster centers;
9) Obtaining the attribution matrix: sort the cluster center factors of all data points obtained in step 8) in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \dots \ge \tau(p_n)$, and thereby obtain the attribution matrix $F = [f_1, f_2, \dots, f_n]$ used for clustering;
10) Determining the cluster centers and clustering: use the cluster center factor and the attribution matrix to determine the cluster centers of the data set S and cluster it; all data points with the same class label form a set, i.e., a cluster, giving m ($m = C_{center\_id}$) clusters $C_1, C_2, \dots, C_m$ and completing the clustering of the data set S;
11) Performing anomaly detection on each cluster after clustering: for the clusters $C_i$ ($i = 1, 2, \dots, m$) obtained in step 10), anomaly detection is performed separately on each cluster $C_1, C_2, \dots, C_m$ of the clustered data set S to obtain the anomaly point set $O_i$ of each cluster, and finally the set of all anomaly points in the data set S, $O = \{O_1, \dots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is given by formula (7);
the local increment value $H(X_j)$ is given by formula (8);
the distance sum of the k nearest neighbors, $L(X_j)$, is given by formula (9),
where the k-neighborhood of data point $X_j$ consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is given by formula (10),
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance between data point $X_j$ and its cluster center, $V_{C_i}(X_j)$ denotes the angle factor of each data point in cluster $C_i$ ($i = 1, 2, \dots, m$) relative to that cluster, and $H(X_j)$ is the local increment value;
12) Multiple verification: all candidate anomaly points are verified multiple times; candidates that still appear abnormal after a limited number of verifications are judged to be confirmed anomaly points and are output and stored, while candidates that appear as normal points during verification are discarded directly, which improves the accuracy of anomaly detection.
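The multiple verification of step 12) can be sketched as a small bookkeeping routine that runs once per sliding window. The sketch assumes that a candidate is confirmed after appearing abnormal in `required` consecutive windows and is dropped as soon as it appears normal; the text only says "a limited number of verifications", so the value 2, the function name `verify_candidates`, and the identifiers are hypothetical.

```python
def verify_candidates(pending, abnormal_now, required=2):
    """Run one verification round.

    pending      -- dict: candidate id -> rounds it has appeared abnormal
    abnormal_now -- set of ids judged abnormal in the current window
    Returns (still_pending, confirmed).
    """
    still_pending, confirmed = {}, []
    for cid in abnormal_now:
        count = pending.get(cid, 0) + 1
        if count >= required:
            confirmed.append(cid)       # abnormal often enough: confirmed
        else:
            still_pending[cid] = count  # keep for the next window
    # Candidates missing from abnormal_now appeared normal and are discarded.
    return still_pending, sorted(confirmed)


pending = {}
pending, confirmed_1 = verify_candidates(pending, {"a", "b"})
pending, confirmed_2 = verify_candidates(pending, {"a", "c"})
```

Here "a" is abnormal in both windows and is confirmed, "b" turns normal in the second window and is discarded, and "c" remains a pending candidate.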
The processing in step 1) means that the data acquired by the data acquisition terminal is cached in stream form and the cached data is divided into data blocks $E_0, E_1, E_2, \dots$, where each data block represents one basic window and each sliding window W contains ε (ε = 2) basic windows. The insertion and deletion of data are realized by combining the basic window and the sliding window, as shown in FIG. 4: at the transition from time $T_i$ to time $T_{i+1}$, the sliding window slides from $W_i$ to $W_{i+1}$, accompanied by the merging of the new basic window $E_{i+1}$ and the removal of the historical basic window $E_{i-1}$; at the same time, the candidate anomaly points detected in $W_i$ at time $T_i$ are merged into $W_{i+1}$ for multiple verification.
The angle factor of the r-neighborhood point set in step 6) is computed by formula (2).
The local density of the r-neighborhood point set in step 6) is computed by formula (3).
The local density is related to the number of neighborhood data points and to their positions: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
The dissimilarity $\delta(x_i)$ in step 7) is obtained after sorting the local densities of all data points in descending order. The dissimilarity measures the likelihood that data points belong to different clusters. For a given data set S, the local densities obtained in step 6) are sorted in descending order, where $\{p_i\}$ denotes the original subscript sequence numbers after the local densities are sorted in descending order and $d(p_i, p_j)$ denotes the Euclidean distance between data points $p_i$ and $p_j$; the dissimilarity of a data point $p_i$ is then defined by formula (4).
The attribution matrix $F = [f_1, f_2, \dots, f_n]$ in step 9) records the attribution relationships between data points; each element is given by formula (6),
where $\{p_i\}$ denotes the original subscript sequence numbers after the cluster center factors $\tau(x_i)$ are sorted in descending order.
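The subscript sequence $\{p_i\}$ is simply the list of original indices ordered by descending factor value. A minimal Python sketch follows, with a hypothetical function name and made-up factor values; ties keep their original relative order, which is an implementation choice rather than something the text specifies.

```python
def descending_order(values):
    """Original indices p_1, ..., p_n such that
    values[p_1] >= values[p_2] >= ... >= values[p_n]."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


tau = [0.2, 0.9, 0.1, 0.7]   # hypothetical cluster center factors
p = descending_order(tau)    # indices of the points, largest factor first
```

The same helper would apply to the local densities when the dissimilarity of step 7) is computed.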
Determining the cluster centers and clustering in step 10) proceeds as follows. The sequence number of the cluster centers is defined as $C_{center\_id}$ and the label of a data point as $C_{cluster\_label}$, and the cluster center sequence number is initialized to 1, i.e., $C_{center\_id} = 1$; the data point with the largest cluster center factor obtained in step 8) is likewise labeled 1. Then the whole data set S is traversed conditionally in the order of the descending subscript sequence numbers $\{p_i\}$ obtained in step 8): if the distances between the points satisfy the condition on the neighborhood radius (where r is the neighborhood radius set as an initial parameter), the point is redefined as a new cluster center and its class label is increased by 1, and all cluster centers are obtained in this way. Next, based on the cluster centers obtained, the attribution matrix $F = [f_1, f_2, \dots, f_n]$ of step 9) is used to attach the same label (i.e., class label) to the points belonging to the same cluster center, as follows: the whole data set S is traversed conditionally in the order of the descending subscript sequence numbers $\{p_i\}$ obtained in step 9); if $p_i$ is not a cluster center, the label of the corresponding point given by the attribution matrix is assigned to $p_i$; otherwise the label of $p_i$ is its own. Finally, all data points with the same class label form a set, i.e., a cluster, giving m ($m = C_{center\_id}$) clusters $C_1, C_2, \dots, C_m$ and completing the clustering of the data set S.
Step 11) performs anomaly detection on each cluster obtained by the clustering. The anomaly detection specifically includes the following steps:
(1) For any cluster C_i (i = 1, 2, …, m), calculate the angle factor V_Ci(X_j) of each data point within the cluster relative to the cluster, as in formula (7), where C_i (i = 1, 2, …, m) denotes any cluster after clustering;
(2) Calculate the local increment value H(X_j) of each data point within its spatial r-neighborhood in the cluster, as in formula (8). The local increment reflects the density of data points within the spatial neighborhood of the data point in the cluster to which it belongs, and depends on the number of data points in the r-neighborhood of data point X_j within its cluster;
(3) Calculate the distance dist(o, X_j) between each data point and the cluster center of its cluster, using the cluster centers determined in step 10);
(4) Calculate the sum of distances L(X_j) from each data point to its k nearest neighbors, as in formula (9), where the k-neighborhood of data point X_j consists of its k nearest neighbors within the cluster to which it belongs. The sum of distances L(X_j) reflects how far the data point is from the surrounding data points, which avoids the defect of the angle-based anomaly factor illustrated by B_1 in FIG. 6;
(5) Calculate the enhanced angle anomaly factor EAOF(X_j) of each data point, as in formula (10), where o is the cluster center of the cluster of data point X_j, dist(o, X_j) is the distance between data point X_j and its cluster center, V_Ci(X_j) is the angle factor of each data point within cluster C_i (i = 1, 2, …, m) relative to the cluster, and H(X_j) is the local increment value. The enhanced angle anomaly factor EAOF retains the good measurement performance of angle-based measures in multi-dimensional space and also introduces the ideas of distance and density, which makes up for the defects of the conventional angle anomaly factor method;
(6) Calculate the mean μ and the standard deviation δ of the enhanced angle anomaly factors of all data points obtained in step (5), and use them to calculate the anomaly determination threshold θ, where θ = μ + ξ·δ and ξ is the initially set anomaly determination threshold adjustment coefficient;
(7) Compare the enhanced angle anomaly factor EAOF(X_j) of each point obtained in step (5) with the determination threshold θ obtained in step (6); if EAOF(X_j) > θ, mark the point as a candidate abnormal object of the cluster and store it in the candidate abnormal point set O_i of the cluster.
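Sub-steps (4), (6) and (7) can be illustrated with a minimal Python sketch. The function names `knn_distance_sum` and `candidate_outliers` are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator is meant), and the scores passed to the threshold step are placeholder values rather than EAOF values computed by formula (10).

```python
import math
import statistics


def knn_distance_sum(cluster, k):
    """For every point of one cluster, sum the distances to its k nearest
    neighbors inside that cluster (the quantity called L(X_j) in the text)."""
    sums = []
    for j, x in enumerate(cluster):
        # Distances from x to every other point of the same cluster, ascending.
        dists = sorted(math.dist(x, y) for i, y in enumerate(cluster) if i != j)
        sums.append(sum(dists[:k]))
    return sums


def candidate_outliers(scores, xi):
    """Return (indices, theta) where theta = mu + xi * delta and the indices
    are those whose score is strictly greater than theta."""
    mu = statistics.fmean(scores)
    delta = statistics.pstdev(scores)  # assumption: population standard deviation
    theta = mu + xi * delta
    return [j for j, s in enumerate(scores) if s > theta], theta


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    print(knn_distance_sum(cluster, k=2))
    # Placeholder scores; xi is the threshold adjustment coefficient.
    print(candidate_outliers([1.0, 1.0, 1.0, 1.0, 10.0], xi=1.5))
```

With very few points a single extreme score cannot exceed a large multiple of the standard deviation, so the small demonstration uses ξ = 1.5; the coefficient should be chosen with the window size in mind.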
This embodiment provides a data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor. The method combines a sliding window with basic windows to process real-time data streams efficiently and introduces the enhanced angle anomaly factor. It thereby addresses the high memory occupancy and low data processing efficiency of conventional methods while providing high real-time performance, high anomaly detection accuracy, and low time complexity.
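The combination of a sliding window with basic windows can be sketched as follows. This is an illustrative sketch only: the class name `SlidingWindow` and its `push` method are hypothetical, the default sizes (basic window length 20, two basic windows per sliding window) are the values used in the verification experiment, and the carrying over of candidate abnormal points into the next window is not modeled.

```python
from collections import deque


class SlidingWindow:
    """Sliding window made of a fixed number of basic windows."""

    def __init__(self, basic_len=20, n_basic=2):
        self.basic_len = basic_len
        # A bounded deque drops the oldest basic window when a new one is added.
        self.windows = deque(maxlen=n_basic)
        self.current = []  # basic window that is currently being filled

    def push(self, point):
        """Buffer one point. When a basic window is full, slide the window
        and return its contents; otherwise return None."""
        self.current.append(point)
        if len(self.current) < self.basic_len:
            return None
        self.windows.append(self.current)
        self.current = []
        return [p for w in self.windows for p in w]


if __name__ == "__main__":
    sw = SlidingWindow(basic_len=3, n_basic=2)
    for value in range(9):
        window = sw.push(value)
        if window is not None:
            print(window)
```

Each returned list plays the role of the data set S on which clustering and anomaly detection are run for the current time step.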
To verify the effectiveness of the method of this embodiment, simulation results are compared below.
In this embodiment, verification is performed on both artificially generated data sets and a real data set, and the method is compared with two conventional methods: I-IncLOF, and the weighted-clustering-based unsupervised data stream anomaly detection method proposed by Thakran et al. (abbreviated as Method I). Information on the experimental data sets is given in Table 1; the three data sets differ in dimension, data volume, and data characteristics.
The data distribution of artificial data set 1 is shown in FIG. 11a. It contains 1615 data points in total and consists of 5 clusters and 15 discrete points. Cluster 1 consists of 500 data points generated by the Gaussian distribution N_1(μ_1, Σ_1), cluster 2 of 500 data points generated by N_2(μ_2, Σ_2), and cluster 3 of 500 data points generated by N_3(μ_3, Σ_3). Cluster 4 and cluster 5 each consist of 50 data points generated by N_4(μ_4, Σ_4) and N_5(μ_5, Σ_5), respectively; because they contain very few data points, they are regarded as abnormal clusters. In addition, 15 discrete abnormal points are randomly generated according to the distribution characteristics of the data set, so the data set contains 115 abnormal points in total. Their distribution is shown in FIG. 11b, where the abnormal points are marked by circles. During the experiment, the abnormal clusters and the discrete abnormal points are randomly mixed into the normal clusters. The following parameters are used by the Gaussian distributions to generate data set 1:
μ_1 = [+1 +1], μ_2 = [-1 -1], μ_3 = [+1 -1], μ_4 = [-1 +1], μ_5 = [0 0]
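A data set with the same composition can be generated along the following lines. The means and cluster sizes come from the text; everything else is an assumption made only for illustration: the isotropic standard deviation `sigma`, the uniform range used for the 15 discrete points, and the function name `make_dataset_1` are hypothetical, since the covariance matrices are not given here.

```python
import random

MEANS = [(1, 1), (-1, -1), (1, -1), (-1, 1), (0, 0)]  # mu_1 .. mu_5 from the text
SIZES = [500, 500, 500, 50, 50]                        # clusters 4 and 5 are abnormal


def make_dataset_1(sigma=0.2, n_discrete=15, seed=0):
    """Return a shuffled list of (x, y, is_abnormal) tuples.

    sigma and the uniform range below are illustrative assumptions."""
    rng = random.Random(seed)
    data = []
    for idx, ((mx, my), size) in enumerate(zip(MEANS, SIZES)):
        is_abnormal = idx >= 3  # the two 50-point clusters count as abnormal
        for _ in range(size):
            data.append((rng.gauss(mx, sigma), rng.gauss(my, sigma), is_abnormal))
    for _ in range(n_discrete):
        data.append((rng.uniform(-2, 2), rng.uniform(-2, 2), True))
    rng.shuffle(data)  # mix abnormal points into the stream of normal points
    return data


if __name__ == "__main__":
    data = make_dataset_1()
    print(len(data), sum(1 for _, _, abnormal in data if abnormal))
```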
The data distribution of artificial data set 2 is shown in FIG. 12a. It contains 860 data points and consists of 3 normal clusters, 1 abnormal cluster, and 48 discrete abnormal points, where the abnormal cluster consists of 21 abnormal points. The data set therefore contains 69 abnormal points, whose distribution is shown in FIG. 12b.
The real data set Breast Cancer is listed in Table 1. It is taken from the UCI machine learning repository, contains 699 data points, and consists of two normal clusters. To verify the validity of the method, 34 abnormal points are added to the real data set according to statistical characteristics such as the mean and variance, and are used for comparison and verification of the anomaly detection.
In the verification experiment of the method of this embodiment, the length of a basic window is set to 20 and two basic windows form one sliding window. The number of nearest neighbors is k = 3. The spatial neighborhood radius is set to the mean of the first 20% of the distance values between the data points in the sliding window at the current time, taken in descending order. The anomaly determination threshold adjustment coefficient is 2.5 and the number of verifications is 3. The detection rate and the false positive rate, which best reflect the effectiveness of an anomaly detection method, are selected for comparison. FIGS. 11a to 11d and FIGS. 12a to 12d show the visualized experimental results for data set 1 and data set 2.
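The rule for the neighborhood radius can be sketched as below. The sketch follows the wording literally (sort the pairwise distances in descending order and average the first 20%), which is one possible reading of the text; the function name `neighborhood_radius` and the choice to count each unordered pair once are assumptions.

```python
import math
from itertools import combinations


def neighborhood_radius(points, fraction=0.2):
    """Mean of the first `fraction` of pairwise distances in descending order."""
    if len(points) < 2:
        raise ValueError("at least two points are needed to form a distance")
    # Each unordered pair is counted once (an assumption of this sketch).
    dists = sorted((math.dist(a, b) for a, b in combinations(points, 2)),
                   reverse=True)
    top = dists[:max(1, int(len(dists) * fraction))]
    return sum(top) / len(top)


if __name__ == "__main__":
    window = [(0.0,), (1.0,), (3.0,), (6.0,), (10.0,)]
    print(neighborhood_radius(window))
```

For the five one-dimensional points in the demonstration there are ten pairwise distances, so the radius is the mean of the two largest ones.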
For artificial data set 1, FIGS. 11a to 11d show that the method effectively detects the 2 abnormal clusters and the 15 discrete abnormal points, with no missed detections. FIG. 11d shows that 3 normal points are mistakenly detected as abnormal points: these points are generated by the normal Gaussian distributions but lie slightly far from the normal clusters, appear abnormal in 3 consecutive verifications, and are therefore determined to be abnormal points.
For artificial data set 2, FIGS. 12a to 12d show that the method remains effective in the three-dimensional data space. FIGS. 12b, 12c, and 12d show that all points in the abnormal cluster are detected and that 47 of the 48 discrete abnormal points are detected, with one discrete abnormal point missed. The missed point lies close to a normal cluster, so it appears normal in one of the verifications and is therefore determined to be a normal point.
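The counts reported above can be turned into rates with a small worked example. The definitions used here are assumptions, since the text does not define the two measures: the detection rate is taken as detected abnormal points divided by all abnormal points, and the false positive rate as falsely flagged normal points divided by all normal points. The results follow from the narrative counts only and are not taken from Table 2.

```python
def detection_rate(n_abnormal, n_detected):
    """Fraction of true abnormal points that were detected (assumed definition)."""
    return n_detected / n_abnormal


def false_positive_rate(n_normal, n_false_alarms):
    """Fraction of normal points flagged as abnormal (assumed definition)."""
    return n_false_alarms / n_normal


if __name__ == "__main__":
    # Data set 1: 1615 points, 115 abnormal, all detected, 3 normal points flagged.
    print(detection_rate(115, 115), false_positive_rate(1615 - 115, 3))
    # Data set 2: 69 abnormal points, 21 + 47 = 68 of them detected.
    print(detection_rate(69, 21 + 47))
```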
In addition to verifying its effectiveness, the method of this embodiment is compared with the conventional methods to further verify its advantages. Table 2 gives the statistical information of the experimental results, i.e., the detailed results of the comparative experiments of the three methods on the three data sets. As can be seen from Table 2, the method of this embodiment has a high detection rate and a low false positive rate, its effectiveness is significantly better than that of the other two methods, and its advantage becomes more pronounced as the dimension of the data set increases. Method I combines the W-K-Means and DBSCAN methods and dynamically updates the parameters and per-dimension weights required by DBSCAN, so it adapts well to dynamic data streams; however, because it uses a conventional distance- and density-based anomaly measure, its effectiveness decreases as the dimension increases. The I-IncLOF method is based on the idea of local density and is likewise affected by the curse of dimensionality: it performs well when the data dimension is low but poorly when the dimension increases.
Through the verification of different data sets and the comparative analysis with the traditional method, it can be seen that the method for data stream anomaly detection and multi-verification based on the enhanced angle anomaly factor provided by the embodiment has better effectiveness and feasibility.
TABLE 1
TABLE 2
Claims (6)
1. A method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor, characterized by comprising the following steps:
1) Processing the real-time data stream: processing various real-time data streams acquired by a data acquisition terminal;
2) Setting a data set S in a sliding window: the processing of step 1) yields the data set S in the current sliding window; let S = {X_1, X_2, ..., X_n} contain n data points, each data point being represented by its attributes, for use in subsequent clustering and anomaly detection;
3) Initialization parameters k, r, ξ: setting initialization parameters, wherein k represents the number of k nearest neighbors of a data point, r is the spatial neighborhood radius of the data point, ξ is an anomaly decision threshold adjustment coefficient, and an anomaly decision threshold theta = mu + ξ · δ, wherein mu and δ correspond to the mean value and standard deviation of all data point enhanced angle anomaly factors;
4) Obtaining a distance matrix dist: calculating the distances between all data points of the data set S in step 2) to obtain an n × n distance matrix dist = [d_ij]_(n×n), the calculation formula being formula (1):
5) Obtaining a r neighborhood point set: according to the spatial neighborhood radius r, obtaining an r neighborhood point set of each data point, namely a set of all data points encircled at the point by taking the neighborhood radius r as the radius;
6) Obtaining the angle factor and the local density of the r neighborhood point set: obtaining, in combination with the distance matrix dist, the angle factor of the r neighborhood point set and the local density of the r neighborhood point set;
7) Obtaining a dissimilarity δ(x_i): sorting the local densities of the r neighborhood point sets obtained in step 6) and then calculating the corresponding dissimilarity δ(x_i);
8) Obtaining a cluster center factor τ(x_i) for each data point: combining step 6) and step 7) to obtain the cluster center factor τ(x_i), as in formula (5), the cluster center factor τ(x_i) measuring the degree to which a data point lies at a cluster center;
9) Acquiring an attribution matrix: sorting the cluster center factors of all data points obtained in step 8) in descending order to obtain τ(p_1) ≥ τ(p_2) ≥ ... ≥ τ(p_n), thereby obtaining the attribution matrix F = [f_1, f_2, ..., f_n] used for clustering;
10) Determining cluster centers and clustering: performing cluster center determination and clustering on the data set S by using the cluster center factor and the attribution matrix, and grouping all data points with the same class label into a set, i.e., a cluster, to obtain m (m = C_center_id) clusters C_1, C_2, ..., C_m, thereby completing the clustering of the data set S;
11) Performing anomaly detection on each cluster: for the clusters C_i (i = 1, 2, ..., m) obtained in step 10), performing anomaly detection on each of the clusters C_1, C_2, ..., C_m of the clustered data set S to obtain the abnormal point set O_i of each cluster, and finally obtaining the set of all abnormal points O = {O_1, ..., O_m} in the data set S; the formulas involved in the anomaly detection are: the intra-cluster angle factor, as in formula (7);
the local increment value H(X_j), as in formula (8);
the sum of distances to the k nearest neighbors L(X_j), as in formula (9),
wherein the k-neighborhood of data point X_j consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor EAOF(X_j), as in formula (10),
wherein o is the cluster center of the cluster of data point X_j, dist(o, X_j) is the distance between data point X_j and its cluster center, the angle factor is that of each data point within cluster C_i (i = 1, 2, ..., m) relative to the cluster, and H(X_j) is the local increment value;
12) Multiple verification: verifying all candidate abnormal points multiple times; a candidate abnormal point that still appears abnormal after a limited number of verifications is determined to be an abnormal point and is output and stored, while a candidate abnormal point that appears normal during the verification is discarded directly, thereby improving the accuracy of the anomaly detection.
2. The method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factor as claimed in claim 1, wherein said processing in step 1) means that data collected by the data collection terminal is buffered in stream form and the buffered data is divided into basic windows E_0, E_1, E_2, ...; at the transition from time T_i to time T_{i+1}, the sliding window slides from W_i to W_{i+1}: the new basic window E_{i+1} is merged in and the historical basic window E_{i-1} is removed, while the candidate abnormal points detected in W_i at time T_i are incorporated into W_{i+1} for multiple verification.
3. The method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factor as claimed in claim 1, wherein the calculation formula of the angle factor of the r neighborhood point set in step 6) is formula (2):
4. The method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factor as claimed in claim 1, wherein said local density calculation formula of r neighborhood point set in step 6) is formula (3):
the local density is related to the number of neighborhood data points and the position of the neighborhood data points, and the more the number of the neighborhood data points is, the more the neighborhood data points are positioned in the center of the data set, the larger the local density is.
5. The method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factor as claimed in claim 1, wherein said dissimilarity δ(x_i) in step 7) is obtained after the local densities of all data points are sorted in descending order, the calculation formula of the dissimilarity δ(x_i) being formula (4):
6. The method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factor as claimed in claim 1, wherein said attribution matrix F = [f_1, f_2, ..., f_n] in step 9) records the attribution relationship between data points, each element being expressed by formula (6):
wherein {p_i} denotes the original subscript numbers after the cluster center factors τ(x_i) are sorted in descending order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710823063.0A CN107682319B (en) | 2017-09-13 | 2017-09-13 | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710823063.0A CN107682319B (en) | 2017-09-13 | 2017-09-13 | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107682319A true CN107682319A (en) | 2018-02-09 |
CN107682319B CN107682319B (en) | 2020-07-03 |
Family
ID=61136410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710823063.0A Active CN107682319B (en) | 2017-09-13 | 2017-09-13 | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107682319B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108667684A (en) * | 2018-03-30 | 2018-10-16 | 桂林电子科技大学 | A kind of data flow anomaly detection method based on partial vector dot product density |
CN109978070A (en) * | 2019-04-03 | 2019-07-05 | 北京市天元网络技术股份有限公司 | A kind of improved K-means rejecting outliers method and device |
CN110311879A (en) * | 2018-03-20 | 2019-10-08 | 重庆邮电大学 | A kind of data flow anomaly recognition methods based on accidental projection angular distribution |
CN111125470A (en) * | 2019-12-25 | 2020-05-08 | 成都康赛信息技术有限公司 | Method for improving abnormal data mining and screening |
CN111680751A (en) * | 2020-06-09 | 2020-09-18 | 南京农业大学 | Grain yield map abnormal data detection algorithm |
CN112286951A (en) * | 2020-11-26 | 2021-01-29 | 杭州数梦工场科技有限公司 | Data detection method and device |
CN112381181A (en) * | 2020-12-11 | 2021-02-19 | 桂林电子科技大学 | Dynamic detection method for building energy consumption abnormity |
CN112800101A (en) * | 2019-11-13 | 2021-05-14 | 中国信托登记有限责任公司 | FP-growth algorithm based abnormal behavior detection method and model applying same |
CN113225391A (en) * | 2021-04-27 | 2021-08-06 | 东莞中山大学研究院 | Atmospheric environment monitoring quality monitoring method based on sliding window anomaly detection and computing equipment |
CN113537061A (en) * | 2021-07-16 | 2021-10-22 | 中天通信技术有限公司 | Format identification method, device and storage medium for two-dimensional quadrature amplitude modulation signal |
CN115271003A (en) * | 2022-09-30 | 2022-11-01 | 江苏云天新材料制造有限公司 | Abnormal data analysis method and system for automatic environment monitoring equipment |
CN116089846A (en) * | 2023-04-03 | 2023-05-09 | 北京智蚁杨帆科技有限公司 | New energy settlement data anomaly detection and early warning method based on data clustering |
CN116502169A (en) * | 2023-06-28 | 2023-07-28 | 深圳特力自动化工程有限公司 | Centrifugal dehydrator working state detection method based on data detection |
CN116628729B (en) * | 2023-07-25 | 2023-09-29 | 天津市城市规划设计研究总院有限公司 | Method and system for improving data security according to data characteristic differentiation |
CN117313957A (en) * | 2023-11-28 | 2023-12-29 | 威海华创软件有限公司 | Intelligent prediction method for production flow task amount based on big data analysis |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080253229A1 (en) * | 2007-04-16 | 2008-10-16 | Acellent Technologies, Inc. | Methods and apparatus for extracting first arrival wave packets in a structural health monitoring system |
CN103336906A (en) * | 2013-07-15 | 2013-10-02 | 哈尔滨工业大学 | Sampling GPR method of continuous anomaly detection in collecting data flow of environment sensor |
CN103974311A (en) * | 2014-05-21 | 2014-08-06 | 哈尔滨工业大学 | Condition monitoring data stream anomaly detection method based on improved gaussian process regression model |
CN104283737A (en) * | 2014-09-30 | 2015-01-14 | 杭州华为数字技术有限公司 | Data flow processing method and device |
CN104809594A (en) * | 2015-05-13 | 2015-07-29 | 中国电力科学研究院 | Distribution network data online cleaning method based on dynamic outlier detection |
CN104902509A (en) * | 2015-05-19 | 2015-09-09 | 浙江农林大学 | Abnormal data detection method based on top-k(sigma) algorithm |
CN106506556A (en) * | 2016-12-29 | 2017-03-15 | 北京神州绿盟信息安全科技股份有限公司 | A kind of network flow abnormal detecting method and device |
Non-Patent Citations (1)
Title |
---|
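Su Xiaoke (苏晓珂): "Research on Clustering-Based Outlier Mining Algorithms" (基于聚类的异常挖掘算法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库信息科技辑》) * |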
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110311879A (en) * | 2018-03-20 | 2019-10-08 | 重庆邮电大学 | A kind of data flow anomaly recognition methods based on accidental projection angular distribution |
CN110311879B (en) * | 2018-03-20 | 2022-02-22 | 重庆邮电大学 | Data flow abnormity identification method based on random projection angle distribution |
CN108667684A (en) * | 2018-03-30 | 2018-10-16 | 桂林电子科技大学 | A kind of data flow anomaly detection method based on partial vector dot product density |
CN108667684B (en) * | 2018-03-30 | 2021-04-30 | 桂林电子科技大学 | Data flow anomaly detection method based on local vector dot product density |
CN109978070A (en) * | 2019-04-03 | 2019-07-05 | 北京市天元网络技术股份有限公司 | A kind of improved K-means rejecting outliers method and device |
CN112800101A (en) * | 2019-11-13 | 2021-05-14 | 中国信托登记有限责任公司 | FP-growth algorithm based abnormal behavior detection method and model applying same |
CN111125470A (en) * | 2019-12-25 | 2020-05-08 | 成都康赛信息技术有限公司 | Method for improving abnormal data mining and screening |
CN111680751A (en) * | 2020-06-09 | 2020-09-18 | 南京农业大学 | Grain yield map abnormal data detection algorithm |
CN112286951A (en) * | 2020-11-26 | 2021-01-29 | 杭州数梦工场科技有限公司 | Data detection method and device |
CN112381181A (en) * | 2020-12-11 | 2021-02-19 | 桂林电子科技大学 | Dynamic detection method for building energy consumption abnormity |
CN112381181B (en) * | 2020-12-11 | 2022-10-04 | 桂林电子科技大学 | Dynamic detection method for building energy consumption abnormity |
CN113225391A (en) * | 2021-04-27 | 2021-08-06 | 东莞中山大学研究院 | Atmospheric environment monitoring quality monitoring method based on sliding window anomaly detection and computing equipment |
CN113225391B (en) * | 2021-04-27 | 2022-11-08 | 东莞中山大学研究院 | Atmospheric environment monitoring quality monitoring method based on sliding window anomaly detection and computing equipment |
CN113537061A (en) * | 2021-07-16 | 2021-10-22 | 中天通信技术有限公司 | Format identification method, device and storage medium for two-dimensional quadrature amplitude modulation signal |
CN113537061B (en) * | 2021-07-16 | 2024-03-26 | 中天通信技术有限公司 | Method, device and storage medium for identifying format of two-dimensional quadrature amplitude modulation signal |
CN115271003A (en) * | 2022-09-30 | 2022-11-01 | 江苏云天新材料制造有限公司 | Abnormal data analysis method and system for automatic environment monitoring equipment |
CN116089846A (en) * | 2023-04-03 | 2023-05-09 | 北京智蚁杨帆科技有限公司 | New energy settlement data anomaly detection and early warning method based on data clustering |
CN116502169A (en) * | 2023-06-28 | 2023-07-28 | 深圳特力自动化工程有限公司 | Centrifugal dehydrator working state detection method based on data detection |
CN116502169B (en) * | 2023-06-28 | 2023-08-22 | 深圳特力自动化工程有限公司 | Centrifugal dehydrator working state detection method based on data detection |
CN116628729B (en) * | 2023-07-25 | 2023-09-29 | 天津市城市规划设计研究总院有限公司 | Method and system for improving data security according to data characteristic differentiation |
CN117313957A (en) * | 2023-11-28 | 2023-12-29 | 威海华创软件有限公司 | Intelligent prediction method for production flow task amount based on big data analysis |
CN117313957B (en) * | 2023-11-28 | 2024-02-27 | 威海华创软件有限公司 | Intelligent prediction method for production flow task amount based on big data analysis |
Also Published As
Publication number | Publication date |
---|---|
CN107682319B (en) | 2020-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107682319B (en) | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method | |
CN108667684B (en) | Data flow anomaly detection method based on local vector dot product density | |
Potamias et al. | Sampling trajectory streams with spatiotemporal criteria | |
CN115577275A (en) | Time sequence data anomaly monitoring system and method based on LOF and isolated forest | |
CN108985380B (en) | Point switch fault identification method based on cluster integration | |
CN110895526A (en) | Method for correcting data abnormity in atmosphere monitoring system | |
CN103473540B (en) | The modeling of intelligent transportation system track of vehicle increment type and online method for detecting abnormality | |
CN109977895B (en) | Wild animal video target detection method based on multi-feature map fusion | |
CN110134719B (en) | Identification and classification method for sensitive attribute of structured data | |
CN110991475A (en) | Moving object track clustering method based on multi-dimensional distance measurement | |
CN111046968B (en) | Road network track clustering analysis method based on improved DPC algorithm | |
CN110879881B (en) | Mouse track recognition method based on feature component hierarchy and semi-supervised random forest | |
CN109597757B (en) | Method for measuring similarity between software networks based on multidimensional time series entropy | |
CN106570104B (en) | Multi-partition clustering preprocessing method for stream data | |
CN111079788A (en) | K-means clustering method based on density Canopy | |
CN110751076A (en) | Vehicle detection method | |
CN110458094B (en) | Equipment classification method based on fingerprint similarity | |
CN113537321B (en) | Network flow anomaly detection method based on isolated forest and X mean value | |
CN114997276A (en) | Heterogeneous multi-source time sequence data abnormity identification method for compression molding equipment | |
CN112100435A (en) | Automatic labeling method based on edge end traffic audio and video synchronization sample | |
CN111368867B (en) | File classifying method and system and computer readable storage medium | |
Fan et al. | Adaptive crowd segmentation based on coherent motion detection | |
CN108537249B (en) | Industrial process data clustering method for density peak clustering | |
CN113128584A (en) | Mode-level unsupervised sorting method of multifunctional radar pulse sequence | |
CN106203526B (en) | Goal behavior mode online classification method based on multidimensional characteristic |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||