AU2021106594A4 - Online anomaly detection method and system for streaming data - Google Patents

Online anomaly detection method and system for streaming data

Publication number
AU2021106594A4
AU2021106594A4 AU2021106594A AU2021106594A AU2021106594A4 AU 2021106594 A4 AU2021106594 A4 AU 2021106594A4 AU 2021106594 A AU2021106594 A AU 2021106594A AU 2021106594 A AU2021106594 A AU 2021106594A AU 2021106594 A4 AU2021106594 A4 AU 2021106594A4
Authority
AU
Australia
Prior art keywords
matrix
data
hash
model
sketch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021106594A
Inventor
Xingrong FAN
Zhiwei Guo
Yu Shen
Jianhui Wang
Xianming ZHANG
Dujiang ZHAO
Xiaolong ZHAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Engineering Research Center for Waste Oil Recovery Technology and Equipment Ministry of Education Chongqing Technology and Business University
Chongqing Technology and Business University S&T Developing Ltd
Original Assignee
Engineering Research Center for Waste Oil Recovery Technology and Equipment Ministry of Education Chongqing Technology and Business University
Chongqing Technology and Business University S&T Developing Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/40Data acquisition and logging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/805Real-time

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the technical field of streaming data mining, and in particular, to an online anomaly detection method and system for streaming data. The online anomaly detection method for streaming data includes: processing a data block by using a matrix sketching model to obtain a sketch matrix, where the data block is transmitted at a high speed; importing the sketch matrix to a hash learning model to obtain an optimal model parameter and a feature hash table for the current moment; and constructing an anomaly score calculation model based on the optimal model parameter and the feature hash table, importing to-be-detected sample data to the anomaly score calculation model for detection, and determining whether the to-be-detected sample data is abnormal. The present disclosure uses matrix sketching and hash learning technologies. This reduces data sizes and feature dimensions and improves detection speed and storage efficiency. In addition, the detection model can be updated online to adapt to dynamic changes of data distribution. Therefore, when a large amount of high-dimensional streaming data is transmitted at a high speed, anomalies in the streaming data can be efficiently detected in real time.

Description

[FIG. 2 (drawing sheet 2/2): a normal data block $D_t$ is processed by the matrix sketching model (matrix sketching-based sub model) to give the sketch matrix $B_t$; the hash learning model (hash learning-based sub model) gives the feature hash table $H_t$; the anomaly score calculation model (coupling model for detecting anomalies in streaming data) separates incoming data into abnormal data and normal data, and the normal data forms a new normal data block.]
ONLINE ANOMALY DETECTION METHOD AND SYSTEM FOR STREAMING DATA
TECHNICAL FIELD
[01] The present disclosure relates to the technical field of streaming data mining, and in particular, to an online anomaly detection method and system for streaming data.
BACKGROUND ART
[02] Streaming data (SD) is a continuous flow of sequential data that is transmitted in a large volume and at a high speed. An anomaly detection method can be used to detect anomalies in streaming data and is essential to data mining.
[03] Demand is growing for detecting anomalies in streaming data with limited storage and computing resources. Key technologies based on distance, density, incremental learning, or ensemble learning have been proposed to perform online anomaly detection on large amounts of high-dimensional, high-speed streaming data. In addition, various technologies that integrate incremental learning and ensemble learning have been developed to reduce computing and storage overheads.
[04] However, these existing technologies are based on space division and use multiple detectors to detect anomalies in streaming data. As a result, they incur large storage and computing overheads, which reduces the efficiency of detecting anomalies in high-dimensional streaming data. In addition, these technologies ignore the encoding characteristics of streaming data. Therefore, an online anomaly detection method for streaming data is urgently required.
SUMMARY
[05] To resolve the technical issues in the prior art, the present disclosure provides an online anomaly detection method for streaming data. The method includes: obtaining a normal data block that is transmitted at a high speed and importing data in the normal data block to an online anomaly detection model for training; importing to-be-detected sample data to the trained online anomaly detection model and then identifying whether the to-be-detected sample data is normal data; and if the to-be-detected sample data is normal data, updating the normal data block to generate a new normal data block, and using the new normal data block as training data for a next anomaly detection; or if the to-be-detected sample data is abnormal data, labeling the abnormal data; the online anomaly detection model mentioned above consists of a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
[06] An online anomaly detection system for streaming data includes a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module.
[07] The present disclosure combines a matrix sketching technology with a hash learning technology and proposes a new solution for detecting anomalies in a large amount of high-dimensional high-speed streaming data online. This facilitates online detection of anomalies in a large amount of high-dimensional high-speed streaming data and in 5G scenarios, and provides technical support for achieving ultra-high speed and performance, ultra-low latency, and ultra-high computing and storage efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
[08] FIG. 1 is a structural block diagram of an online anomaly detection method for streaming data according to the present disclosure; and
[09] FIG. 2 is a technical roadmap of an online anomaly detection method for streaming data according to the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[10] The following clearly and completely describes the technical solutions in the embodiments of the present disclosure with reference to the accompanying drawings of the present disclosure.
[11] FIG. 1 is a structural block diagram of an online anomaly detection method for streaming data. The online anomaly detection method consists of two sub models. The upper part of FIG. 1 represents the matrix sketching-driven sub model, which is developed by the matrix sketching-based anomaly detection technology. The lower part of FIG. 1 represents the hash learning-driven sub model, which is constructed by the hash learning-based anomaly detection technology. The two sub models are bidirectionally connected by a coupling operator that provides the flexibility to represent various forms of interaction between the two sub models. Data is imported to and processed by the sub models. Then, normal data and abnormal data are obtained. $X_{t+1}$ represents streaming data that is imported at moment $t+1$. The two outputs of the sub models at moment $t+1$ are the normal data and the abnormal data that are detected in real time, respectively.
[12] In the present disclosure, a large amount of high-dimensional high-speed streaming data is abstracted as an ever-increasing dynamic data set in which data is continuously generated with time, $SD = \{D_t \in \mathbb{R}^{d \times n} \mid t = 1, 2, \ldots\}$, namely, $D_1, D_2, \ldots, D_t, \ldots$. $D_t$ represents a normal data block that is transmitted at a high speed at moment $t$. $d$ and $n$ represent the feature space dimension and the sample data size of the data block $D_t$, respectively.
[13] The online anomaly detection method for streaming data includes the following steps: Obtain a normal data block that is transmitted at a high speed and import data in the normal data block to an online anomaly detection model for training. Import to-be-detected sample data to the trained online anomaly detection model and identify whether the to-be-detected sample data is normal data. If the to-be-detected sample data is normal data, update the normal data to generate a new normal data block, and use the new normal data block as training data for a next anomaly detection. If the to-be-detected sample data is abnormal data, label the abnormal data. The online anomaly detection model includes a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
[14] In an implementation of the online anomaly detection method for streaming data, the following steps are included: Obtain the normal data block that is transmitted at a high speed. Process the normal data block by using the modified matrix sketching model to obtain a sketch matrix. Import the sketch matrix to the hash learning model and optimize the sketch matrix by using a hash objective function, to obtain an optimal model parameter $W_t^{*}$ and a feature hash table $H_t$ for the current moment. Obtain to-be-detected sample data of the next moment and import the to-be-detected sample data and the feature hash table $H_t$ to the anomaly score calculation model, to calculate an anomaly score for the to-be-detected sample data. Specify an anomaly score threshold (whose default value is 0.5) and compare the anomaly score of the to-be-detected sample data with the anomaly score threshold. If the calculated anomaly score is greater than the specified anomaly score threshold, the to-be-detected sample data is abnormal data. If the calculated anomaly score is less than or equal to the specified anomaly score threshold, the to-be-detected sample data is normal data.
[15] In a preferred embodiment, the to-be-detected sample data is imported to the trained online anomaly detection model for detection, as shown in FIG. 2. This process includes the following steps:
[16] S1: Import the data in the normal data block to the modified matrix sketching model to obtain a sketch matrix.
[17] S2: Import the sketch matrix to the hash learning model, optimize the sketch matrix by using a hash objective function to obtain an optimal model parameter $W_t^{*}$, and then obtain a hash projection matrix based on the optimal model parameter.
[18] S3: Map the sketch matrix by using the hash projection matrix to obtain a feature hash table $H_t$.
[19] S4: Obtain the to-be-detected sample data.
[20] S5: Import the to-be-detected sample data to the anomaly score calculation model to identify whether the to-be-detected sample data is abnormal data.
[21] A process of processing the data in the normal data block by using the modified matrix sketching model includes the following steps:
[22] S11: Construct a data matrix $Z$ based on the data in the normal data block and select a precision parameter $\varepsilon$. The data matrix $Z \in \mathbb{R}^{d \times n}$, where $\mathbb{R}^{d \times n}$ represents the real number space of $d \times n$.
[23] Optionally, the value range of the selected precision parameter $\varepsilon$ is $(0, 1]$.
[24] S12: Specify a number of iterations based on the data matrix $Z$.
[25] The data matrix $Z$ belongs to the real number space of $d \times n$. Therefore, the specified number of iterations equals the number of columns in the data matrix $Z$. In other words, the specified number of iterations is $n$.
[26] S13: Initialize a zero matrix $B$ of $d \times \ell$ based on the precision parameter $\varepsilon$, where $B = [b_1, b_2, \ldots, b_\ell]$.
[27] The selected precision parameter is $\varepsilon$. Therefore, the number of columns in the initialized zero matrix is obtained by rounding up the reciprocal of the precision parameter, that is, $\ell \leftarrow \lceil 1/\varepsilon \rceil$, where $\lceil \cdot \rceil$ represents a round up operation.
[28] S14: Replace the last column in the zero matrix $B$ with the $i$th column in the data matrix $Z$ to obtain a new matrix $T$, where $T \leftarrow [b_1, \ldots, b_{\ell-1}, z_i]$ and $i \in \{1, 2, \ldots, n\}$.
[29] S15: Perform singular value decomposition (SVD) on the new matrix $T$ to obtain the singular values, the left singular matrix $U$, and the diagonal matrix $\Sigma$ of the matrix $T$. A formula for performing SVD on the new matrix $T$ is as follows: $[U, \Sigma, V] \leftarrow \mathrm{SVD}(T)$
[30] $\Sigma = \mathrm{diag}([\sigma_1, \ldots, \sigma_\ell])$, $\sigma_1 \ge \cdots \ge \sigma_\ell$
[31] $U$, $\Sigma$, and $V$ represent the left singular matrix, the diagonal matrix, and the right singular matrix of the matrix $T$ respectively. $\mathrm{diag}$ represents a diagonal matrix whose diagonal elements are $(\sigma_1, \ldots, \sigma_\ell)$. $\sigma_\ell$ represents the $\ell$th singular value of the matrix $T$.
[32] S16: Select the minimum singular value $\delta$ of the matrix $T$, and scan and update the diagonal matrix of the matrix $T$ based on the minimum singular value.
[33] A formula for selecting the minimum singular value is as follows:
[34] $\delta \leftarrow \sigma_\ell$
[35] A formula for scanning and updating the diagonal matrix of the matrix $T$ based on the minimum singular value is as follows:
[36] $\tilde{\Sigma} \leftarrow \max(\Sigma - \delta I_\ell, 0)$
[37] $I_\ell$ represents an identity matrix of $\ell \times \ell$ and $\delta$ represents the minimum singular value.
[38] S17: Construct and update the sketch matrix $B$ based on the updated diagonal matrix $\tilde{\Sigma}$ and the left singular matrix $U$, and add one to the value of $i$. A formula for updating the sketch matrix is as follows:
[39] $B \leftarrow U \tilde{\Sigma}$
[40] S18: Compare the value of $i$ with the number of iterations, and export the current sketch matrix $B$ if the value of $i$ is greater than the number of iterations or return to S14 if the value of $i$ is less than or equal to the number of iterations.
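Steps S11 to S18 closely resemble the Frequent Directions matrix sketching algorithm. The following Python sketch, which assumes NumPy is available, illustrates that standard algorithm and not necessarily the modified model of the disclosure: it shrinks the squared singular values by the smallest squared singular value, which may differ from the shrinkage rule of S16. The function name `frequent_directions` and its argument names are illustrative.

```python
import math
import numpy as np

def frequent_directions(Z, eps):
    """Column-wise Frequent Directions sketch of a d x n matrix Z.

    Returns a d x ell matrix B with ell = ceil(1 / eps) columns.
    """
    d, n = Z.shape
    ell = math.ceil(1.0 / eps)           # S13: number of sketch columns
    B = np.zeros((d, ell))               # S13: zero-initialized sketch
    for i in range(n):                   # S12/S18: one pass over the columns
        T = B.copy()
        T[:, -1] = Z[:, i]               # S14: last column of B is always zero here
        U, s, _ = np.linalg.svd(T, full_matrices=False)   # S15
        k = s.shape[0]                   # k = min(d, ell)
        # S16 (standard variant, an assumption): shrink squared singular values.
        delta = s[-1] ** 2 if k == ell else 0.0
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = np.zeros((d, ell))
        B[:, :k] = U * s_shrunk          # S17: B <- U * shrunken diagonal
    return B

rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 50))
B = frequent_directions(Z, eps=0.25)     # sketch has 4 columns instead of 50
```

For this standard variant, the spectral norm of $Z Z^{T} - B B^{T}$ is at most $\lVert Z \rVert_F^2 / \ell$, so a smaller precision parameter gives a wider and more accurate sketch.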
[41] In an implementation of processing the sketch matrix by using the hash learning model, the following steps are included: Process the data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column. Obtain the optimal model parameter based on the hash projection vectors, the sketch matrix, and the projection matrix by maximizing the objective function. The optimal model parameter is the value obtained when the hash objective function is maximized.
[42] The hash learning model is constructed by using the following linear hash projection method:
[43] $h_k = \operatorname{sgn}(w_k^{T} b_i)$
[44] $h_k$ represents the $k$th hash function in a hash function group $H_t = [h_1, h_2, \ldots, h_r]$, $w_k \in \mathbb{R}^{d}$ represents the $k$th projection vector in the hash projection matrix $W_t = [w_1, w_2, \ldots, w_k, \ldots, w_r] \in \mathbb{R}^{d \times r}$, $\operatorname{sgn}(\cdot)$ represents a sign function, $B_t = [b_1, b_2, \ldots, b_i, \ldots, b_\ell] \in \mathbb{R}^{d \times \ell}$ represents the sketch matrix of a data block $D_t$, and $b_i$ represents the $i$th column in the sketch matrix.
[45] The feature hash table is calculated by using the linear hash projection method based on the following formula:
[46] $H_t = \operatorname{sgn}(W_t^{T} B_t)$
[47] $W_t$ represents the hash projection matrix, $T$ represents transposition, and $B_t$ represents the sketch matrix of the data block $D_t$.
[48] Optimization of the hash objective function is to maximize the objective function and obtain the optimal model parameter $W_t^{*}$. A formula for maximizing the objective function is as follows:
[49] $W_t^{*} \leftarrow \max_{W_t \in \mathbb{R}^{d \times r}} \operatorname{tr}(W_t^{T} B_t B_t^{T} W_t)$ s.t. $W_t^{T} W_t = I_r$
[50] $\mathbb{R}^{d \times r}$ represents a real number space of $d \times r$, $B_t$ represents the sketch matrix, $W_t$ represents the projection matrix, $T$ represents transposition, $\operatorname{tr}(\cdot)$ represents a matrix trace, and $I_r$ represents an identity matrix of $r \times r$.
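A trace objective of this form under an orthonormality constraint is maximized by the eigenvectors of the sketch's Gram matrix that belong to its $r$ largest eigenvalues, which are the leading left singular vectors of the sketch matrix. The Python sketch below, assuming NumPy, uses that closed form and then applies the sign function to build the feature hash table. The function names and the convention of mapping a zero projection to +1 are assumptions made for illustration; the disclosure does not state how the optimization is carried out.

```python
import numpy as np

def learn_hash_projection(B, r):
    """Return a d x r matrix W with orthonormal columns maximizing tr(W^T B B^T W)."""
    # Leading left singular vectors of B = top eigenvectors of B B^T.
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    if r > U.shape[1]:
        raise ValueError("r must not exceed min(d, number of sketch columns)")
    return U[:, :r]

def hash_codes(W, X):
    """Binary codes sgn(W^T X) with entries in {-1, +1}; one column per sample."""
    # Assumption: a projection of exactly zero is mapped to +1.
    return np.where(W.T @ X >= 0, 1, -1)

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 4))      # stand-in for a d x ell sketch matrix
W = learn_hash_projection(B, r=2)    # hash projection matrix
H = hash_codes(W, B)                 # r x ell feature hash table
```

The same `hash_codes` function can encode to-be-detected samples, because the binary hash code of a sample is defined with the same projection matrix.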
[51] In an implementation of processing the to-be-detected sample data by using the anomaly score calculation model, the following steps are included:
[52] Step 1: Import a processed to-be-detected sample data matrix $X_{t+1}$, a hash table $H_t$ of normal sample features, and the hash projection matrix $W_t$ to the anomaly score calculation model, where $X_{t+1} \in \mathbb{R}^{d \times n}$, $H_t \in \mathbb{R}^{r \times \ell}$, $W_t \in \mathbb{R}^{d \times r}$, and $r < d$.
[53] Step 2: Specify a threshold $\zeta$.
[54] Step 3: Perform binary hash encoding on the data $x_i$ of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$, where $i \in \{1, 2, \ldots, n\}$.
[55] Step 4: Find the $K$ hash codes $h_i^{j}$ ($j = 1, \ldots, K$) in the hash table of normal sample features that are closest to the binary hash code $h_i$.
[56] Step 5: Calculate the average Hamming distance $a_i$ between the binary hash code $h_i$ and the $K$ closest hash codes $h_i^{j}$.
[57] Step 6: Compare the mean value $a_i$ with the specified threshold $\zeta$, and determine that the data of the column is normal data if $a_i \le \zeta$ or that the data of the column is abnormal data if $a_i > \zeta$.
[58] Step 7: Determine whether all the to-be-detected sample data has been detected. If so, collectively label all abnormal data and export normal data. If not, return to Step 3.
[59] The anomaly score calculation model is constructed based on the average Hamming distance between the binary hash code $h_i$ of the to-be-detected sample data and the $K$ hash codes $h_i^{j}$ in the feature hash table that are closest to the binary hash code.
[60] The binary hash code of the to-be-detected sample data can be expressed as follows:
[61] $h_i = \operatorname{sgn}(W_t^{T} x_i)$
[62] $h_i$ is the binary hash code of $x_i$ in a Hamming space.
[63] A formula for calculating the average Hamming distance is as follows:
[64] $a_i = \frac{1}{K} \sum_{j=1}^{K} \operatorname{HamDist}(h_i, h_i^{j})$
[65] $a_i$ represents the anomaly score of the to-be-detected sample data, $K$ represents the number of closest hash codes that is specified by a user, and $\operatorname{HamDist}(h_i, h_i^{j})$ represents the Hamming distance between $h_i$ and $h_i^{j}$. $K$ is usually set to 10. A threshold is specified to determine whether the data is abnormal data by using the following formulas:
[66] $x_i$ is normal data if $a_i \le \zeta$; $x_i$ is abnormal data if $a_i > \zeta$
[67] $\zeta$ represents the specified threshold.
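The scoring rule can be illustrated with a short Python sketch, assuming NumPy and hash codes stored as columns with entries in {-1, +1}. The function names are illustrative. Dividing the average Hamming distance by the code length is an assumption made here so that the score lies in [0, 1] and can be compared with a default threshold of 0.5; the disclosure does not state whether the distance is normalized.

```python
import numpy as np

def anomaly_scores(H_ref, H_new, K=10, normalize=True):
    """Average Hamming distance from each new code to its K nearest reference codes.

    H_ref: r x m codes of normal samples; H_new: r x n codes to be scored.
    """
    r, m = H_ref.shape
    K = min(K, m)                                    # cannot use more neighbors than exist
    # Pairwise Hamming distances (n x m): number of differing bits.
    dist = (H_new.T[:, None, :] != H_ref.T[None, :, :]).sum(axis=2)
    nearest = np.sort(dist, axis=1)[:, :K]           # K smallest distances per sample
    a = nearest.mean(axis=1)
    return a / r if normalize else a                 # assumption: scale to [0, 1]

def is_abnormal(a, threshold=0.5):
    """A sample is abnormal when its score is strictly greater than the threshold."""
    return a > threshold

# Three reference codes (columns) of length 4, and two samples to score.
H_ref = np.array([[1, 1, 1], [1, 1, -1], [1, 1, 1], [1, -1, 1]])
H_new = np.array([[1, -1], [1, -1], [1, -1], [1, -1]])
scores = anomaly_scores(H_ref, H_new, K=2)
flags = is_abnormal(scores)
```

In this toy example the first sample coincides with a reference code and is scored as normal, while the second differs from every reference code in at least three of four bits and is flagged as abnormal.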
[68] The online anomaly detection model is updated in real time based on the accumulation of sample data. If the sample data accumulates to a specified data size, repeat Steps 1 and 2 and update the model parameter $W_t$, sketch matrix $B_t$, and feature hash table $H_t$ online.
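One possible way to organize this update is a small buffer that accumulates samples judged normal and triggers retraining once a block is full. The Python sketch below is only an illustration under that assumption: the class `NormalBlockBuffer`, the `block_size` parameter, and the `retrain` callback are hypothetical names, and the callback stands in for recomputing the sketch matrix, the model parameter, and the feature hash table.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]

class NormalBlockBuffer:
    """Collects samples judged normal and triggers retraining per full block."""

    def __init__(self, block_size: int, retrain: Callable[[List[Sample]], None]):
        if block_size < 1:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.retrain = retrain            # hypothetical hook: rebuild sketch, W, and H
        self._pending: List[Sample] = []

    def add_normal(self, sample: Sample) -> bool:
        """Store one normal sample; return True if this call triggered retraining."""
        self._pending.append(sample)
        if len(self._pending) < self.block_size:
            return False
        # A full block has accumulated: hand it over and start a fresh buffer.
        block, self._pending = self._pending, []
        self.retrain(block)
        return True

# Usage: record each completed block instead of actually retraining.
blocks: List[List[Sample]] = []
buffer = NormalBlockBuffer(block_size=3, retrain=blocks.append)
triggered = [buffer.add_normal([float(i)]) for i in range(7)]
```

Samples flagged as abnormal are labeled and never reach the buffer, so only normal data contributes to the next training block.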
[69] An online anomaly detection system for streaming data includes a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module.
[70] The data collection module is configured to collect data and import the collected data to the matrix sketching module.
[71] The matrix sketching module is configured to perform matrix sketching on a large amount of high-dimensional high-speed streaming data, to generate a sketch matrix.
[72] The hash learning module is configured to map data in the sketch matrix to a Hamming space to generate a hash projection matrix and a feature hash table.
[73] The anomaly identification module is configured to: calculate an anomaly score for to-be-detected data based on the hash projection matrix and the feature hash table, and compare the calculated anomaly score with a specified anomaly threshold to obtain a detection result for the to-be-detected data.
[74] The identification result output module is configured to export the detection result.
[75] The model update module is configured to update data attributes and distribution characteristics of a model.
[76] The data collection module includes devices such as a sensor and data collector. These devices can be used to collect network logs, data of industrial sensors, and data in other fields.
[77] A process of processing data in a normal data block by using the matrix sketching module includes the following steps: Construct a data matrix $Z$ based on the data in the normal data block and select a precision parameter $\varepsilon$. Specify a number of iterations based on the data matrix $Z$. Initialize a zero matrix $B$ of $d \times \ell$ based on the precision parameter $\varepsilon$. Replace the last column in the zero matrix $B$ with the $i$th column in the data matrix $Z$ to obtain a new matrix $T$. Perform SVD on the new matrix $T$ to obtain the singular values, the left singular matrix $U$, and the diagonal matrix $\Sigma$ of the matrix $T$. Select the minimum singular value $\delta$ of the matrix $T$, and scan and update the diagonal matrix of the matrix $T$ based on the minimum singular value. Construct and update the sketch matrix $B$ based on the updated diagonal matrix and the left singular matrix $U$, and add one to the value of $i$. Compare the value of $i$ with the number of iterations, and export the current sketch matrix $B$ if the value of $i$ is greater than the number of iterations or reselect data from the data matrix $Z$ for matrix sketching if the value of $i$ is less than or equal to the number of iterations.
[78] A process of processing data by using the hash learning module includes the following steps: Process the data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column. Obtain an optimal model parameter based on the hash projection vectors, the sketch matrix, and the projection matrix by maximizing the objective function. The optimal model parameter is the value obtained when the hash objective function is maximized.
[79] A process of processing data by using the anomaly identification module includes the following steps: Import a processed to-be-detected sample data matrix, a hash table of normal sample features, and the hash projection matrix to an anomaly score calculation model. Specify a threshold $\zeta$. Perform binary hash encoding on the data $x_i$ of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$. Find the $K$ hash codes $h_i^{j}$ that are closest to the binary hash code $h_i$ in the hash table of normal sample features. Calculate the average Hamming distance $a_i$ between the binary hash code $h_i$ and the $K$ closest hash codes $h_i^{j}$. Compare the mean value $a_i$ with the specified threshold $\zeta$, and determine that the data of the column is normal data if $a_i \le \zeta$ or that the data of the column is abnormal data if $a_i > \zeta$. Determine whether all the to-be-detected sample data has been detected. If so, collectively label all abnormal data and export normal data. If not, detect again.
[80] Then, the identification result output module updates and exports the detection result.
[81] A process of updating data by using the model update module includes the following steps: Convert the obtained normal data to a data matrix. Map the sketch matrix that is obtained by using the matrix sketching model to a binary Hamming space by using a linear hash projection method, to obtain an updated hash projection matrix. Package the data matrix and the sketch matrix to generate a new normal data block.
[82] The implementations in the system of the present disclosure are the same as those in the method of the present disclosure.
[83] The objectives, technical solutions, and beneficial effects of the present disclosure are further described in detail in the foregoing specific implementations. It should be understood that the foregoing descriptions are merely specific implementations of the present disclosure, but are not intended to limit the protection scope of the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (5)

WHAT IS CLAIMED IS:
1. An online anomaly detection method for streaming data, comprising: obtaining a normal data block that is transmitted at a high speed and importing data in the normal data block to an online anomaly detection model for training; importing to-be-detected sample data to the trained online anomaly detection model and determining whether the to-be-detected sample data is normal data; and if the to-be-detected sample data is normal data, updating the normal data to generate a new normal data block, and using the new normal data block as training data for a next anomaly detection; or if the to-be-detected sample data is abnormal data, labeling the abnormal data, wherein the online anomaly detection model comprises a modified matrix sketching model, a hash learning model, and an anomaly score calculation model.
2. The online anomaly detection method for streaming data according to claim 1, wherein a process of importing the to-be-detected sample data to the trained online anomaly detection model for detection comprises: S1: importing the data in the normal data block to the modified matrix sketching model to obtain a sketch matrix; S2: importing the sketch matrix to the hash learning model, optimizing the sketch matrix by using a hash objective function to obtain an optimal model parameter $W_t^{*}$, and obtaining a hash projection matrix based on the optimal model parameter; S3: mapping the sketch matrix by using the hash projection matrix to obtain a feature hash table $H_t$; S4: obtaining the to-be-detected sample data; and S5: importing the to-be-detected sample data to the anomaly score calculation model to determine whether the to-be-detected sample data is abnormal data.
3. The online anomaly detection method for streaming data according to claim 2, wherein a process of processing the data in the normal data block by using the modified matrix sketching model comprises: S11: constructing a data matrix Z based on the data in the normal data block and selecting a precision parameter 8, wherein the data matrix Ze Rdxn and Rdxn representsarealnumber spaceof dxl; S12: specifying a number of iterations based on the data matrix Z; S13: initializing a zero matrix B of dx I based on the precision parameter 8 , wherein B =[bl,b2,---,bi,--
S14: replacing the last column in the zero matrix B with the i-th column in the data matrix Z to obtain a new matrix T, wherein $i \in \{1, 2, \ldots, n\}$;
S15: performing singular value decomposition (SVD) on the new matrix T to obtain the singular values, a left singular matrix U, and a diagonal matrix $\Sigma$ of the matrix T;
S16: selecting a minimum singular value $\delta$ of the matrix T, and scanning and updating the diagonal matrix of the matrix T based on the minimum singular value;
S17: constructing and updating the sketch matrix B based on the updated diagonal matrix and the left singular matrix U, and adding one to the value of i; and
S18: comparing the value of i with the number of iterations, and exporting the current sketch matrix B if the value of i is greater than the number of iterations, or returning to S14 if the value of i is less than or equal to the number of iterations;
wherein a process of processing the sketch matrix by using the hash learning model comprises: processing data in each column of the sketch matrix by using a hash projection method, to obtain a hash projection vector for the data in each column; and obtaining the optimal model parameter $W_t^*$ based on the hash projection vector, the sketch matrix, and the projection matrix by maximizing an objective function, wherein the optimal model parameter is the parameter that maximizes the hash objective function;
wherein a formula for the optimal model parameter is as follows:
$$W_t^* \leftarrow \arg\max_{W_t \in \mathbb{R}^{d \times r}} \operatorname{tr}\left(W_t^T B_t B_t^T W_t\right) \quad \text{s.t.} \quad W_t^T W_t = I_r,$$
wherein $\mathbb{R}^{d \times r}$ represents the real number space of dimension $d \times r$, $B_t$ represents the sketch matrix, $W_t$ represents the projection matrix, $T$ represents transposition, $\operatorname{tr}(\cdot)$ represents a matrix trace, and $I_r$ represents an identity matrix of size $r \times r$;
wherein a formula for obtaining the feature hash table based on the hash projection matrix is as follows:
$$H_t = \operatorname{sgn}\left(W_t^T B_t\right),$$
wherein $\operatorname{sgn}(\cdot)$ represents a sign function, $W_t$ represents the hash projection matrix, $T$ represents transposition, and $B_t$ represents the sketch matrix;
wherein a process of processing the to-be-detected sample data by using the anomaly score calculation model comprises:
Step 1: importing a processed to-be-detected sample data matrix, a hash table of normal sample features, and the hash projection matrix to the anomaly score calculation model;
Step 2: specifying a threshold;
Step 3: performing binary hash encoding on the data of each column in the to-be-detected sample data matrix based on the hash projection matrix to obtain a binary hash code $h_i$, wherein $i \in \{1, 2, \ldots, n\}$;
Step 4: searching the hash table of normal sample features for the K hash codes that are closest to the binary hash code;
Step 5: calculating an average Hamming distance $a_i$ between the binary hash code $h_i$ and the K closest hash codes $h_j^K$;
Step 6: comparing the mean value $a_i$ with the specified threshold, and determining that the data of the column is normal data if $a_i$ does not exceed the threshold, or that the data of the column is abnormal data if $a_i$ exceeds the threshold; and
Step 7: determining whether all of the to-be-detected sample data has been detected; and if so, collectively labeling all abnormal data and exporting normal data, or if not, returning to Step 3;
wherein a formula for calculating the average Hamming distance between the binary hash code $h_i$ and the K closest hash codes $h_j^K$ is as follows:
$$a_i = \frac{1}{K} \sum_{j=1}^{K} \operatorname{HamDist}\left(h_i, h_j^K\right),$$
wherein K represents the number of closest hash codes, which is specified by a user, and $\operatorname{HamDist}(h_i, h_j^K)$ represents a Hamming distance between $h_i$ and $h_j^K$.
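Steps 1 to 7 of the anomaly score calculation can be illustrated with a minimal Python sketch. It assumes NumPy, hash codes stored as entries in {-1, +1}, and sgn(0) mapped to +1. The function names `hash_encode`, `anomaly_scores`, and `detect`, and the choice to treat a score equal to the threshold as normal, are illustrative assumptions and are not taken from the claims.

```python
import numpy as np


def hash_encode(W, X):
    """Step 3: binary codes sgn(W^T X), one column per sample, entries in {-1, +1}."""
    # Assumption: sgn(0) is mapped to +1 so that every bit is binary.
    return np.where(W.T @ X >= 0, 1, -1).astype(np.int8)


def anomaly_scores(W, H, X, K):
    """Steps 4-5: average Hamming distance to the K closest codes in the hash table H."""
    codes = hash_encode(W, X)                                  # shape (r, n)
    # Hamming distance = number of differing bits, for every (sample, table code) pair.
    dist = (codes[:, :, None] != H[:, None, :]).sum(axis=0)    # shape (n, m)
    nearest = np.sort(dist, axis=1)[:, :K]                     # K smallest distances per sample
    return nearest.mean(axis=1)


def detect(W, H, X, K, threshold):
    """Step 6: flag a column as abnormal when its score exceeds the threshold."""
    scores = anomaly_scores(W, H, X, K)
    return scores, scores > threshold


if __name__ == "__main__":
    W = np.eye(3)                                  # toy projection matrix (d = r = 3)
    H = np.array([[1, 1, 1, 1],
                  [1, 1, 1, -1],
                  [1, 1, -1, 1]], dtype=np.int8)   # toy table of 4 normal codes
    X = np.array([[0.5, -1.0],
                  [2.0, -2.0],
                  [1.0, -3.0]])                    # two samples to check, one per column
    scores, flags = detect(W, H, X, K=2, threshold=1.0)
    print(scores.tolist(), flags.tolist())         # [0.0, 2.0] [False, True]
```

In the toy run, the first column encodes to a code already present in the table, so its score is 0, while the second column differs from its two closest codes in two bits each and is flagged.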
4. The online anomaly detection method for streaming data according to claim 1, wherein a process of updating the normal data comprises: converting the obtained normal data to a data matrix; mapping a sketch matrix that is obtained by using the matrix sketching model to a binary Hamming space by using a linear hash projection method, to obtain an updated hash projection matrix; and packaging the data matrix and the sketch matrix to generate a new normal data block.
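The update of claim 4 can be sketched in Python under several stated assumptions. The sketching loop follows S14 to S18 but uses the standard frequent-directions shrinkage rule (subtracting the squared minimum singular value), which the claims do not spell out. The trace maximization is solved in closed form by taking the top r left singular vectors of the sketch, which is one standard way to solve it, not necessarily the one used in the source. The names `sketch_matrix`, `learn_hash`, `update_normal_block`, and `NormalDataBlock` are hypothetical, and the sketch width `ell` is assumed to be no larger than the data dimension d.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NormalDataBlock:
    """Hypothetical container for a packaged normal data block."""
    data: np.ndarray     # d x n matrix of normal samples
    sketch: np.ndarray   # d x ell sketch of that matrix


def sketch_matrix(Z, ell):
    """Sketch a d x n matrix Z into a d x ell matrix B, following S14-S18."""
    d, n = Z.shape
    if ell > d:
        raise ValueError("this sketch assumes ell <= d")
    B = np.zeros((d, ell))
    for i in range(n):
        B[:, -1] = Z[:, i]                                   # S14: last column <- i-th data column
        U, s, _ = np.linalg.svd(B, full_matrices=False)      # S15: SVD of the new matrix
        delta = s[-1] ** 2                                   # S16: squared minimum singular value (assumed rule)
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))  # shrink every singular value
        B = U * s_shrunk                                     # S17: B = U diag(s_shrunk); last column is zero again
    return B                                                 # S18: export after all n columns


def learn_hash(B, r):
    """Maximize tr(W^T B B^T W) s.t. W^T W = I_r, then build H = sgn(W^T B)."""
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    W = U[:, :r]                                             # top-r left singular vectors of B
    # Assumption: sgn(0) -> +1; the zero last column of B therefore encodes to all +1.
    H = np.where(W.T @ B >= 0, 1, -1).astype(np.int8)
    return W, H


def update_normal_block(samples, ell, r):
    """Convert normal samples to a matrix, re-sketch, re-learn the hash, and package."""
    Z = np.column_stack(samples)                             # each sample becomes one column
    B = sketch_matrix(Z, ell)
    W, H = learn_hash(B, r)
    return NormalDataBlock(data=Z, sketch=B), W, H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = [rng.normal(size=6) for _ in range(40)]
    block, W, H = update_normal_block(normal, ell=4, r=2)
    print(block.data.shape, block.sketch.shape, W.shape, H.shape)
```

With this shrinkage rule the sketch satisfies the usual frequent-directions guarantee: the spectral norm of $Z Z^T - B B^T$ is at most $\|Z\|_F^2 / \ell$, so the hash projection learned from B approximates one learned from the full data matrix.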
5. An online anomaly detection system for streaming data, wherein the system comprises a data collection module, a matrix sketching module, a hash learning module, an anomaly identification module, an identification result output module, and a model update module, wherein the data collection module is configured to collect data and import the collected data to the matrix sketching module; the matrix sketching module is configured to perform matrix sketching on a large amount of high-dimensional high-speed streaming data, to generate a sketch matrix; the hash learning module is configured to map data in the sketch matrix to a Hamming space to generate a hash projection matrix and a feature hash table; the anomaly identification module is configured to: calculate an anomaly score for to-be-detected data based on the hash projection matrix and the feature hash table, and compare the calculated anomaly score with a specified anomaly threshold to obtain a detection result for the to-be-detected data; the identification result output module is configured to export the detection result; and the model update module is configured to update data attributes and distribution characteristics of a model.
DRAWINGS
FIG. 1
FIG. 2
AU2021106594A 2021-08-23 2021-08-23 Online anomaly detection method and system for streaming data Ceased AU2021106594A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021106594A AU2021106594A4 (en) 2021-08-23 2021-08-23 Online anomaly detection method and system for streaming data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021106594A AU2021106594A4 (en) 2021-08-23 2021-08-23 Online anomaly detection method and system for streaming data

Publications (1)

Publication Number Publication Date
AU2021106594A4 (en) 2021-11-11

Family

ID=78480100

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
AU2021106594A Ceased AU2021106594A4 (en) 2021-08-23 2021-08-23 Online anomaly detection method and system for streaming data

Country Status (1)

Country Link
AU (1) AU2021106594A4 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909741A (en) * 2022-11-30 2023-04-04 山东高速股份有限公司 Method, device and medium for judging traffic state
CN115909741B (en) * 2022-11-30 2024-03-26 山东高速股份有限公司 Traffic state judging method, equipment and medium

Similar Documents

Publication Publication Date Title
CN109639739B (en) Abnormal flow detection method based on automatic encoder network
CN109299284B (en) Knowledge graph representation learning method based on structural information and text description
CN113326187B (en) Data-driven memory leakage intelligent detection method and system
CN111667015B (en) Method and device for detecting state of equipment of Internet of things and detection equipment
CN109376797B (en) Network traffic classification method based on binary encoder and multi-hash table
CN111523667B (en) RFID positioning method based on neural network
CN110765277A (en) Online equipment fault diagnosis platform of mobile terminal based on knowledge graph
AU2021106594A4 (en) Online anomaly detection method and system for streaming data
CN114491082A (en) Plan matching method based on network security emergency response knowledge graph feature extraction
CN115033895A (en) Binary program supply chain safety detection method and device
Dong et al. Mining data correlation from multi-faceted sensor data in the Internet of Things
CN117093260B (en) Fusion model website structure analysis method based on decision tree classification algorithm
CN117827508A (en) Abnormality detection method based on system log data
CN113098848A (en) Flow data anomaly detection method and system based on matrix sketch and Hash learning
CN111898134A (en) Intelligent contract vulnerability detection method and device based on LSTM and BiLSTM
CN111767546A (en) Deep learning-based input structure inference method and device
CN115268994B (en) Code feature extraction method based on TBCNN and multi-head self-attention mechanism
CN114722388B (en) Database data information security monitoring method
CN116467720A (en) Intelligent contract vulnerability detection method based on graph neural network and electronic equipment
CN115842861A (en) Edge connection device adaptation method, device and computer readable storage medium
CN117390130A (en) Code searching method based on multi-mode representation
CN117439800B (en) Network security situation prediction method, system and equipment
CN112597155B (en) Data search optimization method, device, medium and computer program product
CN117792801B (en) Network security threat identification method and system based on multivariate event analysis
CN114943430B (en) Smart grid key node identification method based on deep reinforcement learning

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry