AU2021106594A4 - Online anomaly detection method and system for streaming data - Google Patents
Online anomaly detection method and system for streaming data
- Publication number
- AU2021106594A4
- Authority
- AU
- Australia
- Prior art keywords
- matrix
- data
- hash
- model
- sketch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/40—Data acquisition and logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/805—Real-time
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to the technical field of streaming data mining, and in particular,
to an online anomaly detection method and system for streaming data. The online anomaly
detection method for streaming data includes: processing a data block by using a matrix sketching
model to obtain a sketch matrix, where the data block is transmitted at a high speed; importing the
sketch matrix to a hash learning model to obtain an optimal model parameter and a feature hash
table for the current moment; and constructing an anomaly score calculation model based on the
optimal model parameter and the feature hash table, importing to-be-detected sample data to the
anomaly score calculation model for detection, and determining whether the to-be-detected sample
data is abnormal. The present disclosure uses matrix sketching and hash learning technologies. This
reduces data sizes and feature dimensions and improves a detection speed and storage efficiency. In
addition, a detection model can be updated online to adapt to dynamic changes of data distribution.
Therefore, when a large amount of high-dimensional streaming data is transmitted at a high speed,
anomalies in the streaming data can be efficiently detected in real time.
[FIG. 2: technical roadmap. Labels: normal data block D_t; matrix sketching model (matrix sketching-based sub model); sketch matrix B; hash learning model (hash learning-based sub model); feature hash table H; anomaly score calculation model (coupling model for detecting anomalies in streaming data); abnormal data; normal data; normal data block.]
Description
[01] The present disclosure relates to the technical field of streaming data mining, and in particular, to an online anomaly detection method and system for streaming data.
[02] Streaming data (SD) is a continuous flow of sequential data that is transmitted in a large volume and at a high speed. Anomaly detection methods are used to detect anomalies in streaming data and are an essential part of data mining.
[03] There is a growing demand for detecting anomalies in streaming data with limited storage and computing resources. Key technologies based on distance, density, incremental learning, or ensemble learning have been proposed for online anomaly detection on large amounts of high-dimensional, high-speed streaming data. In addition, various technologies that integrate incremental learning and ensemble learning have been developed to reduce computing and storage overheads.
[04] However, these existing technologies are based on space division and use multiple detectors to detect anomalies in streaming data. This incurs large storage and computing overheads, which reduces the efficiency of detecting anomalies in high-dimensional streaming data. In addition, these technologies ignore the encoding characteristics of streaming data. Therefore, an online anomaly detection method for streaming data is urgently required.
[05] To resolve the technical issues in the prior art, the present disclosure provides an online anomaly detection method for streaming data. The method includes: obtaining a normal data block that is transmitted at a high speed and importing data in the normal data block to an online anomaly detection model for training; importing to-be-detected sample data to the trained online anomaly detection model and then identifying whether the to-be-detected sample data is normal data; and if the to-be-detected sample data is normal data, updating the normal data block to generate a new normal data block, and using the new normal data block as training data for a next anomaly detection; or if the to-be-detected sample data is abnormal data, labeling the abnormal data. The online anomaly detection model consists of a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
[06] An online anomaly detection system for streaming data includes a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module.
[07] The present disclosure combines a matrix sketching technology with a hash learning technology and proposes a new solution for detecting anomalies in a large amount of high-dimensional high-speed streaming data online. This facilitates online detection of anomalies in a large amount of high-dimensional high-speed streaming data and in 5G scenarios, and provides technical support for achieving ultra-high speed and performance, ultra-low latency, and ultra-high computing and storage efficiency.
[08] FIG. 1 is a structural block diagram of an online anomaly detection method for streaming data according to the present disclosure; and
[09] FIG. 2 is a technical roadmap of an online anomaly detection method for streaming data according to the present disclosure.
[10] The following clearly and completely describes the technical solutions in the embodiments of the present disclosure with reference to the accompanying drawings of the present disclosure.
[11] FIG. 1 is a structural block diagram of an online anomaly detection method for streaming data. The online anomaly detection method consists of two sub models. The upper part of FIG. 1 represents the matrix sketching-driven sub model, which is developed by the matrix sketching-based anomaly detection technology. The lower part of FIG. 1 represents the hash learning-driven sub model, which is constructed by the hash learning-based anomaly detection technology. The two sub models are bidirectionally connected by a coupling operator that provides the flexibility to represent various forms of interaction between the two sub models. Data is imported to and processed by the sub models. Then, normal data and abnormal data are obtained. $X_{t+1}$ represents streaming data that is imported at moment $t+1$. The two outputs represent, respectively, the normal data and the abnormal data that are detected in real time by the sub models at moment $t+1$.
[12] In the present disclosure, a large amount of high-dimensional high-speed streaming data is abstracted as an ever-increasing dynamic data set in which data is continuously generated with time, namely, $SD = \{D_t \in \mathbb{R}^{d \times n},\ t = 1, 2, \ldots\}$. $D_t$ represents a normal data block that is transmitted at a high speed at moment $t$. $d$ and $n$ represent the feature space dimension and the sample data size of the data block $D_t$, respectively.
[13] The online anomaly detection method for streaming data includes the following steps: Obtain a normal data block that is transmitted at a high speed and import data in the normal data block to an online anomaly detection model for training. Import to-be-detected sample data to the trained online anomaly detection model and identify whether the to-be-detected sample data is normal data. If the to-be-detected sample data is normal data, update the normal data to generate a new normal data block, and use the new normal data block as training data for a next anomaly detection. If the to-be-detected sample data is abnormal data, label the abnormal data. The online anomaly detection model includes a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
[14] In an implementation of the online anomaly detection method for streaming data, the following steps are included: Obtain the normal data block that is transmitted at a high speed. Process the normal data block by using the modified matrix sketching model to obtain a sketch matrix. Import the sketch matrix to the hash learning model and optimize the sketch matrix by using a hash objective function, to obtain an optimal model parameter and a feature hash table $H_t$ for the current moment. Obtain to-be-detected sample data of a next moment and import the to-be-detected sample data and the feature hash table $H_t$ to the anomaly score calculation model, to calculate an anomaly score for the to-be-detected sample data. Specify an anomaly score threshold (whose default value is 0.5) and compare the anomaly score of the to-be-detected sample data with the anomaly score threshold. If the calculated anomaly score is greater than the specified anomaly score threshold, the to-be-detected sample data is abnormal data. If the calculated anomaly score is less than or equal to the specified anomaly score threshold, the to-be-detected sample data is normal data.
[15] In a preferred embodiment, the to-be-detected sample data is imported to the trained online anomaly detection model for detection, as shown in FIG. 2. This process includes the following steps:
[16] S1: Import the data in the normal data block to the modified matrix sketching model to obtain a sketch matrix.
[17] S2: Import the sketch matrix to the hash learning model, optimize the sketch matrix by using a hash objective function to obtain an optimal model parameter $W_t^{*}$, and then obtain a hash projection matrix based on the optimal model parameter.
[18] S3: Map the sketch matrix by using the hash projection matrix to obtain a feature hash table $H_t$.
[19] S4: Obtain the to-be-detected sample data.
[20] S5: Import the to-be-detected sample data to the anomaly score calculation model to identify whether the to-be-detected sample data is abnormal data.
[21] A process of processing the data in the normal data block by using the modified matrix sketching model includes the following steps:
[22] S11: Construct a data matrix $Z$ based on the data in the normal data block and select a precision parameter $\varepsilon$. The data matrix $Z \in \mathbb{R}^{d \times n}$, where $\mathbb{R}^{d \times n}$ represents a real number space of $d \times n$.
[23] Optionally, a value range of the selected precision parameter $\varepsilon$ is $(0, 1]$.
[24] S12: Specify a number of iterations based on the data matrix $Z$.
[25] The data matrix $Z$ belongs to a real number space of $d \times n$. Therefore, the specified number of iterations equals the number of columns in the data matrix $Z$. In other words, the specified number of iterations is $n$.
[26] S13: Initialize a zero matrix $B$ of $d \times \ell$ based on the precision parameter $\varepsilon$, where $B = [b_1, b_2, \ldots, b_i, \ldots, b_\ell]$.
[27] The selected precision parameter is $\varepsilon$. Therefore, the number of columns in the initialized zero matrix is obtained by rounding up the reciprocal of the precision parameter, that is, $\ell \leftarrow \lceil 1/\varepsilon \rceil$, where $\lceil \cdot \rceil$ represents a round up operation.
[28] S14: Replace the last column in the matrix $B$ with the $i$th column in the data matrix $Z$ to obtain a new matrix $T$, where $T \leftarrow [b_1, \ldots, b_{\ell-1}, z_i]$ and $i \in \{1, 2, \ldots, n\}$.
[29] S15: Perform singular value decomposition (SVD) on the new matrix $T$ to obtain the singular values, the left singular matrix $U$, and the diagonal matrix $\Sigma$ of the matrix $T$. A formula for performing SVD on the new matrix $T$ is as follows: $[U, \Sigma, V] \leftarrow \mathrm{SVD}(T)$
[30] $\Sigma = \mathrm{diag}([\sigma_1, \ldots, \sigma_\ell])$, $\sigma_1 \ge \cdots \ge \sigma_\ell$
[31] $U$, $\Sigma$, and $V$ represent the left singular matrix, the diagonal matrix, and the right singular matrix of the matrix $T$, respectively. $\mathrm{diag}$ represents a diagonal matrix whose diagonal elements are $(\sigma_1, \ldots, \sigma_\ell)$. $\sigma_\ell$ represents the $\ell$th singular value of the matrix $T$.
[32] S16: Select the minimum singular value $\delta$ of the matrix $T$, and scan and update the diagonal matrix of the matrix $T$ based on the minimum singular value.
[33] A formula for selecting the minimum singular value is as follows:
[34] $\delta \leftarrow \sigma_\ell$
[35] A formula for scanning and updating the diagonal matrix of the matrix T based on the minimum singular value is as follows:
[36] $\hat{\Sigma} \leftarrow \max(\Sigma - I_\ell\,\delta,\ 0)$
[37] $I_\ell$ represents an identity matrix of $\ell \times \ell$ and $\delta$ represents the minimum singular value.
[38] S17: Construct and update the sketch matrix $B$ based on the updated diagonal matrix $\hat{\Sigma}$ and the left singular matrix $U$, and add one to the value of $i$. A formula for updating the sketch matrix is as follows:
[39] $B \leftarrow U\hat{\Sigma}$
[40] S18: Compare the value of $i$ with the number of iterations, and export the current sketch matrix $B$ if the value of $i$ is greater than the number of iterations or return to S14 if the value of $i$ is less than or equal to the number of iterations.
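Steps S11 to S18 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `sketch_block` is hypothetical, the code assumes that the number of sketch columns does not exceed the feature dimension, and it reads the update in S16 as subtracting the minimum singular value directly from each singular value. The widely known Frequent Directions algorithm shrinks the squared singular values instead, so that variant may be what is intended.

```python
import math
import numpy as np


def sketch_block(Z: np.ndarray, eps: float) -> np.ndarray:
    """Sketch a d x n data matrix Z into a d x ell matrix B (steps S11-S18)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    d, n = Z.shape
    ell = math.ceil(1.0 / eps)            # S13: number of sketch columns
    if ell > d:
        raise ValueError("this sketch assumes ell <= d")  # assumption, see lead-in
    B = np.zeros((d, ell))                # S13: zero-initialized sketch
    for i in range(n):                    # S12/S18: one pass over the n columns
        T = B.copy()
        T[:, -1] = Z[:, i]                # S14: last column <- z_i
        # S15: thin SVD; singular values come back in descending order
        U, s, _ = np.linalg.svd(T, full_matrices=False)
        delta = s[-1]                     # S16: minimum singular value
        s_hat = np.maximum(s - delta, 0.0)  # S16: shrink the diagonal (assumed form)
        B = U * s_hat                     # S17: B <- U diag(s_hat), last column is zero
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 50))      # d = 8 features, n = 50 samples
    B = sketch_block(Z, eps=0.25)         # ell = 4 columns
    print(B.shape)
```

Because the smallest singular value is shrunk to zero in every pass, the last column of the sketch is always free to receive the next data column, which is why S14 can overwrite it without losing information already summarized.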
[41] In an implementation of processing the sketch matrix by using the hash learning model, the following steps are included: Process data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column. Obtain the optimal model parameter by maximizing an objective function that is defined on the hash projection vectors, the sketch matrix, and the projection matrix. The optimal model parameter is the projection matrix at which the hash objective function attains its maximum.
[42] The hash learning model is constructed by using the following linear hash projection method:
[43] $h_k(b_i) = \mathrm{sgn}(w_k^{T} b_i)$
[44] $h_k$ represents the $k$th hash function in the hash function group $H_t = [h_1, h_2, \ldots, h_k, \ldots, h_r]$. $w_k \in \mathbb{R}^{d}$ represents the $k$th projection vector in the hash projection matrix $W_t = [w_1, w_2, \ldots, w_k, \ldots, w_r] \in \mathbb{R}^{d \times r}$. $\mathrm{sgn}(\cdot)$ represents a sign function, $B_t = [b_1, b_2, \ldots, b_i, \ldots, b_\ell] \in \mathbb{R}^{d \times \ell}$ represents the sketch matrix of the data block $D_t$, and $b_i$ represents the $i$th column in the sketch matrix.
[45] The feature hash table is calculated by using the linear hash projection method based on the following formula:
[46] $H_t = \mathrm{sgn}(W_t^{T} B_t)$
[47] $W_t$ represents the hash projection matrix, $T$ represents transposition, and $B_t$ represents the sketch matrix of the data block $D_t$.
[48] Optimization of the hash objective function is to maximize the objective function and obtain the optimal model parameter $W_t^{*}$. A formula for maximizing the objective function is as follows: $W_t^{*} \leftarrow \arg\max_{W_t} \operatorname{tr}(W_t^{T} B_t B_t^{T} W_t)$ s.t. $W_t^{T} W_t = I_r$
[49] $W_t \in \mathbb{R}^{d \times r}$
[50] $\mathbb{R}^{d \times r}$ represents a real number space of $d \times r$, $B_t$ represents the sketch matrix, $W_t$ represents the projection matrix, $T$ represents transposition, $\operatorname{tr}(\cdot)$ represents a matrix trace, and $I_r$ represents an identity matrix of $r \times r$.
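The text does not state how the constrained maximization is solved. One standard way, shown here as an assumption rather than as the disclosed procedure, uses the fact that a trace of this form under an orthonormality constraint is maximized by the eigenvectors of $B_t B_t^{T}$ that belong to its $r$ largest eigenvalues. The names `learn_hash` and `sign_pm1` are hypothetical, and mapping a zero projection to +1 is an arbitrary choice made so that every code entry is binary.

```python
import numpy as np


def sign_pm1(M: np.ndarray) -> np.ndarray:
    """Sign function with values in {-1, +1}; sending 0 to +1 is an assumption."""
    return np.where(M >= 0, 1, -1).astype(np.int8)


def learn_hash(B: np.ndarray, r: int):
    """Return (W, H) for a d x ell sketch B.

    W (d x r) maximizes tr(W^T B B^T W) subject to W^T W = I_r.
    H (r x ell) is the feature hash table sgn(W^T B).
    """
    d = B.shape[0]
    if not 0 < r <= d:
        raise ValueError("r must satisfy 0 < r <= d")
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(B @ B.T)
    W = eigvecs[:, ::-1][:, :r]           # eigenvectors of the r largest eigenvalues
    H = sign_pm1(W.T @ B)                 # one r-bit code per sketch column
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    B = rng.standard_normal((6, 4))       # d = 6, ell = 4
    W, H = learn_hash(B, r=3)
    print(W.shape, H.shape)
```

Each column of `H` is the hash code of one sketch column, so the table has as many codes as the sketch has columns, independent of how many raw samples the data block contained.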
[51] In an implementation of processing the to-be-detected sample data by using the anomaly score calculation model, the following steps are included:
[52] Step 1: Import a processed to-be-detected sample data matrix $X_{t+1}$, a hash table $H_t$ of normal sample features, and the hash projection matrix $W_t$ to the anomaly score calculation model, where $X_{t+1} \in \mathbb{R}^{d \times n}$, $H_t \in \mathbb{R}^{r \times \ell}$, $W_t \in \mathbb{R}^{d \times r}$, and $r < d$.
[53] Step 2: Specify a threshold $\xi$.
[54] Step 3: Perform binary hash encoding on the data $x_i$ of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$, where $i \in \{1, 2, \ldots, n\}$.
[55] Step 4: Search the hash table of normal sample features for the $K$ hash codes $h_i^{K}$ that are closest to the binary hash code $h_i$.
[56] Step 5: Calculate the average Hamming distance $a_i$ between the binary hash code $h_i$ and the $K$ closest hash codes $h_i^{K}$.
[57] Step 6: Compare the mean value $a_i$ with the specified threshold $\xi$, and determine that the data of the column is normal data if $a_i \le \xi$ or that the data of the column is abnormal data if $a_i > \xi$.
[58] Step 7: Determine whether detection of the to-be-detected sample data is complete. If detection is complete, collectively label all abnormal data and export normal data. If detection is not complete, return to Step 3.
[59] The anomaly score calculation model is constructed based on the average Hamming distance between the binary hash code $h_i$ of the to-be-detected sample data and the $K$ hash codes $h_i^{K}$ in the feature hash table that are closest to the binary hash code.
[60] The binary hash code of the to-be-detected sample data can be expressed as follows:
[61] $h_i = \mathrm{sgn}(W_t^{T} x_i)$
[62] $h_i$ is the binary hash code of $x_i$ in a Hamming space.
[63] A formula for calculating the average Hamming distance is as follows:
[64] $a_i = \frac{1}{K} \sum_{j=1}^{K} \mathrm{HamDist}(h_i, h_i^{j})$
[65] $a_i$ represents the anomaly score of the to-be-detected sample data, $K$ represents the number of closest hash codes, which is specified by a user, and $\mathrm{HamDist}(h_i, h_i^{j})$ represents the Hamming distance between $h_i$ and $h_i^{j}$. $K$ is usually set to 10. A threshold is specified to determine whether the data is abnormal data by using the following formula:
[66] $x_i \text{ is } \begin{cases} \text{normal data}, & a_i \le \xi \\ \text{abnormal data}, & a_i > \xi \end{cases}$
[67] $\xi$ represents the specified threshold.
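The scoring rule in Steps 3 to 6 can be illustrated in plain Python. This is a sketch under stated assumptions: the names `ham_dist`, `anomaly_score`, and `is_abnormal` are hypothetical, hash codes are represented as sequences with entries in {-1, +1}, the hash table is passed as a list of codes (one per column of the feature hash table), and when the table holds fewer than K codes all of them are used.

```python
from typing import List, Sequence

Code = Sequence[int]  # one binary hash code with entries in {-1, +1}


def ham_dist(a: Code, b: Code) -> int:
    """Hamming distance: number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def anomaly_score(h: Code, table: List[Code], k: int = 10) -> float:
    """Average Hamming distance from h to its k closest codes in the table."""
    if not table:
        raise ValueError("hash table is empty")
    dists = sorted(ham_dist(h, row) for row in table)
    k = min(k, len(dists))  # assumption: use every code if fewer than k exist
    return sum(dists[:k]) / k


def is_abnormal(h: Code, table: List[Code], xi: float, k: int = 10) -> bool:
    """Abnormal when the score exceeds xi; a score equal to xi counts as normal."""
    return anomaly_score(h, table, k) > xi


if __name__ == "__main__":
    table = [(1, 1, -1, 1), (1, 1, -1, -1), (1, -1, -1, 1)]
    print(anomaly_score((1, 1, -1, 1), table, k=2))    # 0.5
    print(anomaly_score((-1, -1, 1, -1), table, k=2))  # 3.0
```

The score here is a raw bit count, so a threshold has to be chosen on the same scale; dividing the score by the code length would be one way to place it in [0, 1], which is an option the text does not spell out.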
[68] The online anomaly detection model is updated in real time as sample data accumulates. When the accumulated sample data reaches a specified data size, repeat S1 and S2 and update the model parameter $W_t$, the sketch matrix $B_t$, and the feature hash table $H_t$ online.
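The accumulate-then-update behavior can be sketched as a small buffer. Everything named here is hypothetical: `NormalBlockBuffer`, its `block_size` parameter, and the `retrain` callback, which stands in for whatever reruns matrix sketching and hash learning on the new normal data block.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]


class NormalBlockBuffer:
    """Collects samples judged normal and triggers retraining on a full block."""

    def __init__(self, block_size: int, retrain: Callable[[List[Sample]], None]):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.retrain = retrain
        self._pending: List[Sample] = []

    def add_normal(self, sample: Sample) -> bool:
        """Store one normal sample; return True if this call triggered retraining."""
        self._pending.append(sample)
        if len(self._pending) < self.block_size:
            return False
        # Hand over the full block and start collecting the next one.
        block, self._pending = self._pending, []
        self.retrain(block)
        return True


if __name__ == "__main__":
    blocks = []
    buf = NormalBlockBuffer(block_size=3, retrain=blocks.append)
    for i in range(7):
        buf.add_normal([float(i)])
    print(len(blocks))  # 2 full blocks were handed to the callback
```

Only samples judged normal are fed to the buffer, which matches the method's rule that abnormal data is labeled rather than used as training data.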
[69] An online anomaly detection system for streaming data includes a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module.
[70] The data collection module is configured to collect data and import the collected data to the matrix sketching module.
[71] The matrix sketching module is configured to perform matrix sketching on a large amount of high-dimensional high-speed streaming data, to generate a sketch matrix.
[72] The hash learning module is configured to map data in the sketch matrix to a Hamming space to generate a hash projection matrix and a feature hash table.
[73] The anomaly identification module is configured to: calculate an anomaly score for to-be-detected data based on the hash projection matrix and the feature hash table, and compare the calculated anomaly score with a specified anomaly threshold to obtain a detection result for the to-be-detected data.
[74] The identification result output module is configured to export the detection result.
[75] The model update module is configured to update data attributes and distribution characteristics of a model.
[76] The data collection module includes devices such as a sensor and data collector. These devices can be used to collect network logs, data of industrial sensors, and data in other fields.
[77] A process of processing data in a normal data block by using the matrix sketching module includes the following steps: Construct a data matrix $Z$ based on the data in the normal data block and select a precision parameter $\varepsilon$. Specify a number of iterations based on the data matrix $Z$. Initialize a zero matrix $B$ of $d \times \ell$ based on the precision parameter $\varepsilon$. Replace the last column in the matrix $B$ with the $i$th column in the data matrix $Z$ to obtain a new matrix $T$. Perform SVD on the new matrix $T$ to obtain the singular values, the left singular matrix $U$, and the diagonal matrix $\Sigma$ of the matrix $T$. Select the minimum singular value $\delta$ of the matrix $T$, and scan and update the diagonal matrix of the matrix $T$ based on the minimum singular value. Construct and update the sketch matrix $B$ based on the updated diagonal matrix and the left singular matrix $U$, and add one to the value of $i$. Compare the value of $i$ with the number of iterations, and export the current sketch matrix $B$ if the value of $i$ is greater than the number of iterations or reselect data from the data matrix $Z$ for matrix sketching if the value of $i$ is less than or equal to the number of iterations.
[78] A process of processing data by using the hash learning module includes the following steps: Process data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column. Obtain an optimal model parameter by maximizing an objective function that is defined on the hash projection vectors, the sketch matrix, and the projection matrix. The optimal model parameter is the projection matrix at which the hash objective function attains its maximum.
[79] A process of processing data by using the anomaly identification module includes the following steps: Import a processed to-be-detected sample data matrix, a hash table of normal sample features, and the hash projection matrix to an anomaly score calculation model. Specify a threshold $\xi$. Perform binary hash encoding on the data $x_i$ of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$. Search the hash table of normal sample features for the $K$ hash codes $h_i^{K}$ that are closest to the binary hash code $h_i$. Calculate the average Hamming distance $a_i$ between the binary hash code $h_i$ and the $K$ closest hash codes $h_i^{K}$. Compare the mean value $a_i$ with the specified threshold $\xi$, and determine that the data of the column is normal data if $a_i \le \xi$ or that the data of the column is abnormal data if $a_i > \xi$. Determine whether detection of the to-be-detected sample data is complete. If detection is complete, collectively label all abnormal data and export normal data. If detection is not complete, detect again.
[80] Then, the identification result output module updates and exports the detection result.
[81] A process of updating data by using the model update module includes the following steps: Convert the obtained normal data to a data matrix. Map the sketch matrix that is obtained by using the matrix sketching model to a binary Hamming space by using a linear hash projection method, to obtain an updated hash projection matrix. Package the data matrix and the sketch matrix to generate a new normal data block.
[82] The implementations in the system of the present disclosure are the same as those in the method of the present disclosure.
[83] The objectives, technical solutions, and beneficial effects of the present disclosure are further described in detail in the foregoing specific implementations. It should be understood that the foregoing descriptions are merely specific implementations of the present disclosure, but are not intended to limit the protection scope of the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.
Claims (5)
1. An online anomaly detection method for streaming data, comprising: obtaining a normal data block that is transmitted at a high speed and importing data in the normal data block to an online anomaly detection model for training; importing to-be-detected sample data to the trained online anomaly detection model and determining whether the to-be-detected sample data is normal data; and if the to-be-detected sample data is normal data, updating the normal data to generate a new normal data block, and using the new normal data block as training data for a next anomaly detection; or if the to-be-detected sample data is abnormal data, labeling the abnormal data, wherein the online anomaly detection model comprises a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
2. The online anomaly detection method for streaming data according to claim 1, wherein a process of importing the to-be-detected sample data to the trained online anomaly detection model for detection comprises: Si: importing the data in the normal data block to the modified matrix sketching model to obtain a sketch matrix; S2: importing the sketch matrix to the hash learning model, optimizing the sketch matrix by using a hash objective function to obtain an optimal model parameter , and obtaining a hash projection matrix based on the optimal model parameter; S3: mapping the sketch matrix by using the hash projection matrix to obtain a feature hash table H S4: obtaining the to-be-detected sample data; and S5: importing the to-be-detected sample data to the anomaly score calculation model to determine whether the to-be-detected sample data is abnormal data.
3. The online anomaly detection method for streaming data according to claim 2, wherein a process of processing the data in the normal data block by using the modified matrix sketching model comprises: S11: constructing a data matrix Z based on the data in the normal data block and selecting a precision parameter 8, wherein the data matrix Ze Rdxn and Rdxn representsarealnumber spaceof dxl; S12: specifying a number of iterations based on the data matrix Z; S13: initializing a zero matrix B of dx I based on the precision parameter 8 , wherein B =[bl,b2,---,bi,--
S14: replacing the last column in the zero matrix B with an ith column in the data matrix Z to obtain a new matrix T, wherein $i \in \{1, 2, \ldots, n\}$;
S15: performing singular value decomposition (SVD) on the new matrix T to obtain singular values, a left singular matrix U, and a diagonal matrix $\Sigma$ of the matrix T;
S16: selecting a minimum singular value $\delta$ of the matrix T, and scanning and updating the diagonal matrix of the matrix T based on the minimum singular value;
S17: constructing and updating the sketch matrix B based on the updated diagonal matrix and the left singular matrix U, and adding one to a value of i; and
S18: comparing the value of i with the number of iterations, and exporting the current sketch matrix B if the value of i is greater than the number of iterations or returning to S14 if the value of i is less than or equal to the number of iterations;
wherein a process of processing the sketch matrix by using the hash learning model comprises:
processing data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column; and
obtaining the optimal model parameter $W_t$ based on the hash projection vector and the sketch matrix and the projection matrix based on a maximum objective function, wherein the optimal model parameter is the maximum objective function that is obtained after the hash objective function is optimized;
wherein a formula for the optimal model parameter is as follows:
$W_t^* \leftarrow \max_{W_t \in \mathbb{R}^{d \times r}} \operatorname{tr}(W_t^T B_t B_t^T W_t)$ s.t. $W_t^T W_t = I_r$,
wherein $\mathbb{R}^{d \times r}$ represents a real number space of $d \times r$, $B_t$ represents the sketch matrix, $W_t$ represents the projection matrix, T represents transposition, $\operatorname{tr}(\cdot)$ represents a matrix trace, and $I_r$ represents an identity matrix of $r \times r$;
wherein a formula for obtaining the feature hash table based on the hash projection matrix is as follows:
$H_t = \operatorname{sgn}(W_t^T B_t)$,
wherein $\operatorname{sgn}(\cdot)$ represents a sign function, $W_t$ represents the hash projection matrix, T represents transposition, and $B_t$ represents the sketch matrix;
wherein a process of processing the to-be-detected sample data by using the anomaly score calculation model comprises:
Step 1: importing a processed to-be-detected sample data matrix, a hash table of normal sample features, and the hash projection matrix to the anomaly score calculation model;
Step 2: specifying a threshold;
Step 3: performing binary hash encoding on the data of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$, wherein $i \in \{1, 2, \ldots, n\}$;
Step 4: finding the K hash codes that are closest to the binary hash code $h_i$ in the hash table of normal sample features;
Step 5: calculating an average Hamming distance $a_i$ between the binary hash code $h_i$ and the K closest hash codes $h_j^K$;
Step 6: comparing the mean value $a_i$ with the specified threshold, and determining that the data of the column is normal data if $a_i$ is below the threshold, or that the data of the column is abnormal data if $a_i$ is above the threshold; and
Step 7: determining whether all of the to-be-detected sample data has been detected; if so, collectively labeling all abnormal data and exporting normal data; otherwise, returning to Step 3;
wherein a formula for calculating the average Hamming distance between the binary hash code $h_i$ and the K closest hash codes $h_j^K$ is as follows:
$a_i = \sum_{j=1}^{K} \operatorname{HamDist}(h_i, h_j^K) / K$,
wherein K represents a number of closest hash codes that are specified by a user, and $\operatorname{HamDist}(h_i, h_j^K)$ represents a Hamming distance between $h_i$ and $h_j^K$.
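As an illustration of the sketching steps S14 to S18 and the hash learning step of this claim, the following is a minimal NumPy sketch, not the patented implementation. It rests on several assumptions: the "scanning and updating" of the diagonal matrix is taken to be the usual frequent-directions shrinkage (subtracting the squared minimum singular value from each squared singular value), the trace maximisation is solved by taking the top r left singular vectors of the sketch, and sgn(0) is mapped to +1. The function names `sketch_matrix`, `learn_hash_projection`, and `hash_table`, and the parameter `ell` (number of sketch columns), are hypothetical.

```python
import numpy as np


def sketch_matrix(Z: np.ndarray, ell: int) -> np.ndarray:
    """Sketch the d x n data matrix Z into a d x ell matrix B (S14 to S18)."""
    d, n = Z.shape
    assert ell <= d, "this sketch assumes at most d sketch columns"
    B = np.zeros((d, ell))
    for i in range(n):  # S18: iterate over the n columns of Z
        T = B.copy()
        T[:, -1] = Z[:, i]  # S14: last (zero) column of B <- i-th column of Z
        U, s, _ = np.linalg.svd(T, full_matrices=False)  # S15
        delta = s[-1]  # S16: minimum singular value
        # Assumed shrinkage rule: reduce every squared singular value by delta^2.
        s_new = np.sqrt(np.maximum(s**2 - delta**2, 0.0))
        B = U * s_new  # S17: B = U @ diag(s_new); its last column is zero again
    return B


def learn_hash_projection(B: np.ndarray, r: int) -> np.ndarray:
    """Return a d x r matrix W with W^T W = I_r that maximises tr(W^T B B^T W).

    Assumption: the maximiser is taken as the top-r left singular vectors
    of B, which are the top-r eigenvectors of B B^T.
    """
    assert r <= min(B.shape), "r cannot exceed the number of singular vectors"
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :r]


def hash_table(W: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Feature hash table H = sgn(W^T B), with sgn(0) mapped to +1 (assumption)."""
    return np.where(W.T @ B >= 0, 1, -1).astype(np.int8)
```

Under the assumed shrinkage rule, the last column of B is zero after every iteration, which is what lets S14 overwrite it with the next data column.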
4. The online anomaly detection method for streaming data according to claim 1, wherein a process of updating the normal data comprises: converting the obtained normal data to a data matrix; mapping a sketch matrix that is obtained by using the matrix sketching model to a binary Hamming space by using a linear hash projection method, to obtain an updated hash projection matrix; and packaging the data matrix and the sketch matrix to generate a new normal data block.
5. An online anomaly detection system for streaming data, wherein the system comprises a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module, wherein the data collection module is configured to collect data and import the collected data to the matrix sketching module; the matrix sketching module is configured to perform matrix sketching on a large amount of high-dimensional high-speed streaming data, to generate a sketch matrix; the hash learning module is configured to map data in the sketch matrix to a Hamming space to generate a hash projection matrix and a feature hash table; the anomaly identification module is configured to: calculate an anomaly score for to-be-detected data based on the hash projection matrix and the feature hash table, and compare the calculated anomaly score with a specified anomaly threshold to obtain a detection result for the to-be-detected data; the identification result output module is configured to export the detection result; and the model update module is configured to update data attributes and distribution characteristics of a model.
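The scoring performed by the anomaly identification module can be illustrated with a short NumPy sketch. This is a hedged example rather than the patented implementation: the function names `hamming_to_table`, `anomaly_scores`, and `detect` are hypothetical, sgn(0) is mapped to +1 by assumption, and a sample is flagged as abnormal when its score is strictly greater than the threshold, which is also an assumption because the claims only say that the score is compared with a specified threshold.

```python
import numpy as np


def hamming_to_table(h: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Hamming distance between one code h (r,) and every column of H (r, m)."""
    return np.count_nonzero(H != h[:, None], axis=0)


def anomaly_scores(X: np.ndarray, W: np.ndarray, H_normal: np.ndarray, K: int) -> np.ndarray:
    """Average Hamming distance from each sample's code to its K nearest normal codes.

    X is d x n (one sample per column), W is the d x r hash projection matrix,
    and H_normal is the r x m feature hash table of normal samples.
    """
    assert 1 <= K <= H_normal.shape[1], "K must not exceed the hash table size"
    codes = np.where(W.T @ X >= 0, 1, -1)  # binary hash encoding of each column
    scores = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        dist = hamming_to_table(codes[:, i], H_normal)
        nearest = np.sort(dist)[:K]  # distances to the K closest normal codes
        scores[i] = nearest.mean()  # average Hamming distance a_i
    return scores


def detect(X: np.ndarray, W: np.ndarray, H_normal: np.ndarray, K: int, threshold: float) -> np.ndarray:
    """Boolean mask over columns of X: True marks a sample as abnormal.

    Assumption: "abnormal" means score > threshold (strict inequality).
    """
    return anomaly_scores(X, W, H_normal, K) > threshold
```

Columns whose mask entry is False would then be exported as normal data and passed on to the model update module.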
DRAWINGS
FIG. 1
FIG. 2
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021106594A AU2021106594A4 (en) | 2021-08-23 | 2021-08-23 | Online anomaly detection method and system for streaming data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021106594A AU2021106594A4 (en) | 2021-08-23 | 2021-08-23 | Online anomaly detection method and system for streaming data |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021106594A4 (en) | 2021-11-11 |
Family
ID=78480100
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021106594A Ceased AU2021106594A4 (en) | 2021-08-23 | 2021-08-23 | Online anomaly detection method and system for streaming data |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021106594A4 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115909741A (en) * | 2022-11-30 | 2023-04-04 | 山东高速股份有限公司 | Method, device and medium for judging traffic state |
2021-08-23: AU application AU2021106594A, patent AU2021106594A4, not active (Ceased)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115909741A (en) * | 2022-11-30 | 2023-04-04 | 山东高速股份有限公司 | Method, device and medium for judging traffic state |
CN115909741B (en) * | 2022-11-30 | 2024-03-26 | 山东高速股份有限公司 | Traffic state judging method, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109639739B (en) | Abnormal flow detection method based on automatic encoder network | |
CN109299284B (en) | Knowledge graph representation learning method based on structural information and text description | |
CN113326187B (en) | Data-driven memory leakage intelligent detection method and system | |
CN111667015B (en) | Method and device for detecting state of equipment of Internet of things and detection equipment | |
CN109376797B (en) | Network traffic classification method based on binary encoder and multi-hash table | |
CN111523667B (en) | RFID positioning method based on neural network | |
CN110765277A (en) | Online equipment fault diagnosis platform of mobile terminal based on knowledge graph | |
AU2021106594A4 (en) | Online anomaly detection method and system for streaming data | |
CN114491082A (en) | Plan matching method based on network security emergency response knowledge graph feature extraction | |
CN115033895A (en) | Binary program supply chain safety detection method and device | |
Dong et al. | Mining data correlation from multi-faceted sensor data in the Internet of Things | |
CN117093260B (en) | Fusion model website structure analysis method based on decision tree classification algorithm | |
CN117827508A (en) | Abnormality detection method based on system log data | |
CN113098848A (en) | Flow data anomaly detection method and system based on matrix sketch and Hash learning | |
CN111898134A (en) | Intelligent contract vulnerability detection method and device based on LSTM and BiLSTM | |
CN111767546A (en) | Deep learning-based input structure inference method and device | |
CN115268994B (en) | Code feature extraction method based on TBCNN and multi-head self-attention mechanism | |
CN114722388B (en) | Database data information security monitoring method | |
CN116467720A (en) | Intelligent contract vulnerability detection method based on graph neural network and electronic equipment | |
CN115842861A (en) | Edge connection device adaptation method, device and computer readable storage medium | |
CN117390130A (en) | Code searching method based on multi-mode representation | |
CN117439800B (en) | Network security situation prediction method, system and equipment | |
CN112597155B (en) | Data search optimization method, device, medium and computer program product | |
CN117792801B (en) | Network security threat identification method and system based on multivariate event analysis | |
CN114943430B (en) | Smart grid key node identification method based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FGI | Letters patent sealed or granted (innovation patent) | |
| MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry | |