CN114969761A - Log anomaly detection method based on LDA theme characteristics
- Publication number: CN114969761A (application number CN202210689100.4A)
- Authority: CN (China)
- Prior art keywords: log, template, topic, model, sequence
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F21/577: Assessing vulnerabilities and evaluating computer system security
- G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
- G06F21/566: Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06F2221/033: Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Virology (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a log anomaly detection method based on LDA topic features. The method comprises two stages: model training and anomaly detection. In the model training stage, a log parser parses the system log into a log template set and a log triple set. The log template set is used to train an LDA (latent Dirichlet allocation) model, yielding the log template topic classification model LDA-CM. The LDA-CM model then converts the log triples into per-process log template topic sequences, a sliding-window mechanism constructs training samples from these sequences, and the samples are finally input into an LSTM (long short-term memory) model, which is trained to produce the log anomaly detection model LSTM-ADM. In the anomaly detection stage, the log of the process to be checked is converted into its template topic sequence, which is then input into the LSTM-ADM model to detect anomalies in that process log.
Description
Technical Field
The invention belongs to the field of security technology and relates in particular to a log anomaly detection method based on LDA topic features.
Background
In system security, detecting software or system anomalies through logs is a common protective measure. Bugs are inevitable in software systems of every size, from small, simple programs to large, complex systems such as distributed file systems and high-performance cloud computing management platforms, and these bugs can cause the system to behave abnormally. Furthermore, attackers may exploit vulnerabilities in software and systems to launch attacks that compromise the system. Detecting such anomalies promptly and accurately is therefore crucial to building secure and trustworthy systems. However, existing anomaly detection methods cannot accurately learn the semantic differences between normal and abnormal logs, so they generalize poorly and perform unsatisfactorily in practice.
Logs are a common and primary data source for anomaly detection in almost all computer systems; they record a series of significant events that describe the running state of software and systems. Existing methods that analyze system logs for anomaly detection fall into four categories: statistical detection on log data based on principal component analysis (PCA), detection based on invariant mining (IM), which captures recurring patterns in logs, workflow-based detection, and deep-learning-based detection. The first three categories can achieve good results in specific application scenarios but cannot be used to detect different kinds of attacks. Deep-learning methods classify log templates in order to learn the behavior patterns within log sequences. However, current deep-learning methods cannot accurately learn the semantic relationships among logs, and their stability suffers greatly when new log templates are injected, to the point that the model may fail. In addition, their precision, recall, F1 score (the harmonic mean of precision and recall), and related metrics still need improvement to cope with complex and ever-changing software and systems.
Disclosure of Invention
To learn the semantic relationships among logs more accurately and to detect abnormal behavior of processes or systems more effectively from unstructured log records, the invention provides a log anomaly detection method based on LDA topic features.
To achieve this purpose, the invention adopts the following technical solution:
The log anomaly detection method based on LDA topic features of the invention consists of two main stages. The first is the model training stage, in which training samples are constructed by extracting template topic features from log data and an anomaly detection model is then trained on them; the second is the anomaly detection stage, in which the trained anomaly detection model is used to detect anomalies in process logs.
Model training stage:
(1) Acquire the system log data $L = \{log_1, log_2, \ldots, log_n\}$ and the corresponding process set $P = \{p_1, p_2, \ldots, p_f\}$, where the logs in L are generated by the processes in P. Parse the logs in L with a log parser to generate the log template set $K = \{k_1, k_2, \ldots, k_m\}$ and the log triple set $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = (k, pid, ts)$ is the log triple corresponding to $log_i$: k is the log template, pid is the process identifier, and ts is the timestamp at which the log was generated.
(2) Preprocess the log template set K, then use the preprocessed data and the preset topic set $T = \{t_1, t_2, \ldots, t_r\}$ to train an LDA model, generating the LDA-based log template topic classification model LDA-CM.
(3) Initialize the log template topic mapping dictionary TD. Use the LDA-CM model to compute, for each log template $k_i$ in K, the topic probability distribution vector $\Theta_i$, whose length equals the number of topics in T and whose element $\Theta_i[j]$ is the probability that $k_i$ belongs to topic $t_j$. Then find the topic $t_x$ corresponding to the maximum probability value in $\Theta_i$ and add the mapping $\{k_i \rightarrow t_x\}$ to TD.
(4) Process the log triple set D by process in P and use TD to build the log template topic sequence of each process in P; denote the resulting sequence set as $S = \{S_1, S_2, \ldots, S_f\}$. The specific steps are as follows:
(4a) Divide D into subsets by the ID of each process in P, i.e., by the pid in the log triples, and sort the log triples in each subset by timestamp ts, thereby generating the corresponding log triple sequence $D_i$ for each process $p_i$. Then extract the log template from each log triple in $D_i$ to obtain the log template sequence of process $p_i$.
(4b) For each log template $k_{i,j}$ in the log template sequence of each process $p_i$ in P, use the log template topic mappings in TD to determine its topic $t_x$, and thereby build the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ for process $p_i$. The log template topic sequences of all processes in P form the set $S = \{S_1, S_2, \ldots, S_f\}$.
(5) Use a sliding-window mechanism to process the log template topic sequence $S_i$ of each process $p_i$ in S, generating the training sample set TP. The specific steps are as follows:
(5a) Initialize the sliding window length to h, the sliding step to 1, and the training sample set TP to empty;
(5b) Process the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ of each process $p_i$ in S. If $S_i$ contains no more than h log template topics, i.e., $q \le h$, the log template topic window set $W_i$ of $S_i$ is empty. Otherwise, move the sliding window along $S_i$ to construct the window set $W_i = \{w_{i,1}, w_{i,2}, \ldots, w_{i,y}\}$, where $w_{i,1} = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,h} \rangle$, $w_{i,2} = \langle t_{i,2}, t_{i,3}, \ldots, t_{i,h+1} \rangle$, …, $w_{i,y} = \langle t_{i,q-h}, t_{i,q-h+1}, \ldots, t_{i,q-1} \rangle$ (so $y = q - h$). Then construct the training sample pairs $(w_{i,1}, t_{i,h+1})$, $(w_{i,2}, t_{i,h+2})$, …, $(w_{i,y}, t_{i,q})$ and add them to TP.
(6) Train the LSTM model with the training samples in TP to generate the LSTM-based process log anomaly detection model LSTM-ADM.
Anomaly detection stage:
(1) Let the log sequence of the process p to be checked be $L_p = \langle log_{p,1}, log_{p,2}, \ldots, log_{p,v} \rangle$. Process the logs in $L_p$ in order with the log parser to generate the log template sequence of the process, $K_p = \langle k_{p,1}, k_{p,2}, \ldots, k_{p,v} \rangle$.
(2) Use the log template topic mapping dictionary TD to convert $K_p$ into the log template topic sequence $S_p$. The specific steps are as follows:
(2a) Initialize $S_p$ as an empty sequence;
(2b) Process each log template $k_{p,i}$ in $K_p$ in order. Check whether TD contains a topic mapping $\{k_{p,i} \rightarrow t_j\}$ for $k_{p,i}$; if it does, append $t_j$ to $S_p$. Otherwise, input $k_{p,i}$ into the LDA-CM model to obtain its log template topic probability distribution vector $\Theta_{p,i}$ and find its maximum probability value, denoted $\Theta_{p,i}[x]$; the topic of log template $k_{p,i}$ is then $t_x$. Append $t_x$ to $S_p$, and at the same time create the new log template topic mapping $\{k_{p,i} \rightarrow t_x\}$ and add it to TD. The result is the log template topic sequence $S_p = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,v} \rangle$ corresponding to $K_p$.
(3) Use the sliding-window mechanism to process $S_p$, perform anomaly detection with the LSTM-ADM model, and return the detection result. The specific steps are as follows:
(3a) Initialize the sliding window length to h and the sliding step to 1, and initialize the log template topic window set $W_p$ and the detection pair set DP as empty;
(3b) Check whether the number of log template topics in $S_p$ exceeds the sliding window length. If it does not, i.e., $v \le h$, anomaly detection for process p ends; otherwise, continue with the next step;
(3c) Move the sliding window along $S_p$ to construct the log template topic window set $W_p = \{w_{p,1}, w_{p,2}, \ldots, w_{p,y}\}$, where $w_{p,1} = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,h} \rangle$, $w_{p,2} = \langle t_{p,2}, t_{p,3}, \ldots, t_{p,h+1} \rangle$, …, $w_{p,y} = \langle t_{p,v-h}, t_{p,v-h+1}, \ldots, t_{p,v-1} \rangle$; then construct the detection pair set $DP = \{(w_{p,1}, t_{p,h+1}), (w_{p,2}, t_{p,h+2}), \ldots, (w_{p,y}, t_{p,v})\}$;
(3d) For each detection pair $(w_{p,i}, t_{p,h+i})$ in DP, input the log template topic window $w_{p,i}$ into LSTM-ADM to obtain the predicted log template topic probability distribution vector $V_{p,i}$ of the log that follows $w_{p,i}$. The length of $V_{p,i}$ equals the number of topics in T, and $V_{p,i}[j]$ is the probability that the log template topic of the next log after $w_{p,i}$ is $t_j$. Then take the topics corresponding to the g largest probability values in $V_{p,i}$ to form the predicted log template topic set $CS_{p,i}$. If …, process p is judged abnormal and detection ends;
(3e) If all detection pairs in DP have been processed without detecting any anomaly, process p is judged normal and detection ends.
The beneficial effects of the invention are as follows:
The invention is the first to propose a log anomaly detection method based on LDA topic features. The method extracts features from logs and converts each log into a log template topic, thereby overcoming the shortcomings of existing anomaly detection methods based on log templates.
The LDA topic model used by the method is unsupervised: it needs only log template data as the corpus and a specified number of topics, and the LDA-CM topic classification model can be trained without labels, which makes the method easy to implement;
In addition, the LDA-CM topic classification model can match a newly added log template to the most relevant log template topic, which solves the robustness problem that existing methods face when new log templates are injected.
Drawings
FIG. 1 is a diagram of log data preprocessing according to the present invention.
FIG. 2 is a flowchart of the overall framework of the log anomaly detection method based on LDA topic features according to the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous implementation details are set forth in order to provide a thorough understanding of the embodiments of the invention. It should be understood, however, that these implementation details are not to be interpreted as limiting the invention. That is, in some embodiments of the invention, such implementation details are not necessary.
As shown in FIG. 2, the present invention is a log anomaly detection method based on LDA topic features that comprises two stages:
Model training stage: first, a log parser parses the system log into a log template set and a log triple set, and the log template set is used to train an LDA (latent Dirichlet allocation) model, yielding the log template topic classification model LDA-CM; then the LDA-CM model converts the log triples into per-process log template topic sequences, a sliding-window mechanism constructs training samples from them, and the training samples are finally input into an LSTM model, which is trained to produce the log anomaly detection model LSTM-ADM;
Anomaly detection stage: first, the log of the process to be checked is converted into its template topic sequence, which is then input into the LSTM-ADM model obtained in the model training stage to detect anomalies in the process log.
The invention is described in further detail below with reference to FIGS. 1 and 2.
The log anomaly detection method based on LDA topic features provided by the invention comprises the following steps:
Model training stage:
(1) Acquire the system log data $L = \{log_1, log_2, \ldots, log_n\}$ and the process set $P = \{p_1, p_2, \ldots, p_f\}$, where the logs in L are generated by the processes in P. Parse the logs in L with a log parser to generate the log template set $K = \{k_1, k_2, \ldots, k_m\}$ and the log triple set $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i = (k, pid, ts)$ is the log triple corresponding to $log_i$: k is the log template, pid is the process identifier, and ts is the timestamp at which the log was generated.
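A minimal Python sketch of step (1) follows. The source does not name the log parser, so this assumes a hypothetical raw log format `<ts> <pid> <message>` and a naive masking rule (any token containing a digit or starting with "/" becomes the placeholder `<*>`) in place of a real parser such as Drain. The names `parse_logs`, `extract_template`, and `LogTriple` are illustrative only.

```python
import re
from typing import List, NamedTuple, Set, Tuple

class LogTriple(NamedTuple):
    """Log triple d_i = (k, pid, ts)."""
    k: str      # log template
    pid: int    # process identifier
    ts: float   # timestamp at which the log was generated

# Hypothetical masking rule standing in for a real log parser:
# a token that starts with "/" or contains a digit is treated as a variable.
_VARIABLE = re.compile(r"^/|\d")

def extract_template(message: str) -> str:
    return " ".join("<*>" if _VARIABLE.search(tok) else tok for tok in message.split())

def parse_logs(lines: List[str]) -> Tuple[Set[str], List[LogTriple]]:
    """Parse lines of the assumed form '<ts> <pid> <message>' into (K, D)."""
    templates: Set[str] = set()
    triples: List[LogTriple] = []
    for line in lines:
        ts, pid, message = line.split(maxsplit=2)
        k = extract_template(message)
        templates.add(k)
        triples.append(LogTriple(k, int(pid), float(ts)))
    return templates, triples

K, D = parse_logs([
    "100.0 7 Opened file /tmp/a.txt size 1024",
    "100.5 7 Opened file /tmp/b.txt size 2048",
    "101.0 9 Connection closed by peer",
])
# K == {"Opened file <*> size <*>", "Connection closed by peer"}
```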
(2) Split each log template in K into words to obtain a word list WL, convert every word in WL to lowercase, and filter out stop words and identifiers that carry no semantic meaning; then convert the word list into a corpus entry with a bag-of-words model and add it to the corpus list CL. Finally, use CL and the preset topic set $T = \{t_1, t_2, \ldots, t_r\}$ to train the LDA model, generating the LDA-based log template topic classification model LDA-CM.
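The preprocessing part of step (2) can be sketched as follows. The stop-word list, the alphabetic-only tokenization rule, and the function names are assumptions for illustration. Each corpus entry is a list of (word_id, count) pairs, the same bag-of-words format produced by gensim's `Dictionary.doc2bow`, so CL could then be passed as the `corpus` argument of an LDA implementation such as gensim's `LdaModel` with `num_topics=r`; the LDA training itself is not shown.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative stop-word list (an assumption; the source does not give one).
STOP_WORDS = {"a", "an", "the", "of", "to", "by", "for", "in", "on", "is", "was"}

def tokenize(template: str) -> List[str]:
    """Build the word list WL for one template: lowercase it, keep alphabetic
    words only (which drops '<*>' placeholders and numeric identifiers),
    and filter out stop words."""
    words = re.findall(r"[a-z]+", template.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_corpus(templates: List[str]) -> Tuple[Dict[str, int], List[List[Tuple[int, int]]]]:
    """Convert each template into a bag-of-words entry of (word_id, count) pairs."""
    vocab: Dict[str, int] = {}
    corpus: List[List[Tuple[int, int]]] = []   # the corpus list CL
    for template in templates:
        counts = Counter(tokenize(template))
        # setdefault assigns the next free id to words not seen before
        bow = [(vocab.setdefault(word, len(vocab)), n) for word, n in counts.items()]
        corpus.append(sorted(bow))
    return vocab, corpus

vocab, CL = build_corpus(["Opened file <*> size <*>", "Closed the file <*>"])
# vocab == {"opened": 0, "file": 1, "size": 2, "closed": 3}
# CL == [[(0, 1), (1, 1), (2, 1)], [(1, 1), (3, 1)]]
```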
(3) Initialize the log template topic mapping dictionary TD. Use the LDA-CM model to compute, for each log template $k_i$ in K, the topic probability distribution vector $\Theta_i$, whose length equals the number of topics in T and whose element $\Theta_i[j]$ is the probability that $k_i$ belongs to topic $t_j$. Then find the topic $t_x$ corresponding to the maximum probability value in $\Theta_i$ and add the mapping $\{k_i \rightarrow t_x\}$ to TD.
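Step (3) reduces each topic distribution $\Theta_i$ to its most probable topic. In this sketch `lda_cm` is a hypothetical callable standing in for the trained LDA-CM model and returning $\Theta_i$ as a dense list; topics are identified by 0-based indices, so index j stands for $t_{j+1}$.

```python
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical interface: LDA-CM maps a template to its dense topic vector Theta_i.
TopicModel = Callable[[str], Sequence[float]]

def most_probable_topic(theta: Sequence[float]) -> int:
    """Return the index x of the maximum value Theta_i[x] (the first one on ties)."""
    return max(range(len(theta)), key=theta.__getitem__)

def build_topic_dictionary(templates: Iterable[str], lda_cm: TopicModel) -> Dict[str, int]:
    """Build TD by mapping every template k_i in K to its most probable topic t_x."""
    return {k: most_probable_topic(lda_cm(k)) for k in templates}

# Usage with a stub model in place of a trained LDA-CM:
fake_thetas = {"Opened file <*>": [0.1, 0.7, 0.2], "Connection closed": [0.6, 0.3, 0.1]}
TD = build_topic_dictionary(fake_thetas, fake_thetas.__getitem__)
# TD == {"Opened file <*>": 1, "Connection closed": 0}
```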
(4) Process the log triple set D by process in P and use TD to build the log template topic sequence of each process in P; denote the resulting sequence set as $S = \{S_1, S_2, \ldots, S_f\}$. The specific steps are as follows:
(4a) Divide D into subsets by the ID of each process in P, i.e., by the pid in the log triples, and sort the log triples in each subset by timestamp ts, thereby generating the corresponding log triple sequence $D_i$ for each process $p_i$. Then extract the log template from each log triple in $D_i$ to obtain the log template sequence of process $p_i$.
(4b) For each log template $k_{i,j}$ in the log template sequence of each process $p_i$ in P, use the log template topic mappings in TD to determine its topic $t_x$, and thereby build the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ for process $p_i$. The log template topic sequences of all processes in P form the set $S = \{S_1, S_2, \ldots, S_f\}$.
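Step (4) amounts to a group-by on pid, a sort on ts, and a dictionary lookup. A minimal sketch, assuming triples are plain `(k, pid, ts)` tuples and TD maps each template to a topic index; `topic_sequences` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, int, float]  # (k, pid, ts)

def topic_sequences(triples: Iterable[Triple], td: Dict[str, int]) -> Dict[int, List[int]]:
    """Return {pid: S_i}, the time-ordered log template topic sequence of each process."""
    by_pid: Dict[int, List[Triple]] = defaultdict(list)
    for triple in triples:
        by_pid[triple[1]].append(triple)        # step (4a): split D by pid
    sequences: Dict[int, List[int]] = {}
    for pid, d_i in by_pid.items():
        d_i.sort(key=lambda t: t[2])            # order D_i by timestamp ts
        templates = [k for k, _, _ in d_i]      # log template sequence of p_i
        sequences[pid] = [td[k] for k in templates]  # step (4b): topics via TD
    return sequences

D = [("B", 7, 2.0), ("A", 9, 1.0), ("A", 7, 1.0), ("C", 7, 3.0)]
TD = {"A": 0, "B": 1, "C": 2}
S = topic_sequences(D, TD)
# S == {7: [0, 1, 2], 9: [0]}
```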
(5) Use a sliding-window mechanism to process the log template topic sequence $S_i$ of each process $p_i$ in S, generating the training sample set TP. The specific steps are as follows:
(5a) Initialize the sliding window length to h, the sliding step to 1, and the training sample set TP to empty;
(5b) Process the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ of each process $p_i$ in S. If $S_i$ contains no more than h log template topics, i.e., $q \le h$, the log template topic window set $W_i$ of $S_i$ is empty. Otherwise, move the sliding window along $S_i$ to construct the window set $W_i = \{w_{i,1}, w_{i,2}, \ldots, w_{i,y}\}$, where $w_{i,1} = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,h} \rangle$, $w_{i,2} = \langle t_{i,2}, t_{i,3}, \ldots, t_{i,h+1} \rangle$, …, $w_{i,y} = \langle t_{i,q-h}, t_{i,q-h+1}, \ldots, t_{i,q-1} \rangle$ (so $y = q - h$). Then construct the training sample pairs $(w_{i,1}, t_{i,h+1})$, $(w_{i,2}, t_{i,h+2})$, …, $(w_{i,y}, t_{i,q})$ and add them to TP.
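The sliding-window construction of step (5) is sketched below; `sliding_window_pairs` and `build_training_set` are illustrative names, and topics are integer indices. For a sequence of length q it yields exactly $y = q - h$ pairs, and none when $q \le h$.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Tuple[int, ...], int]  # (window w_{i,j}, topic that follows it)

def sliding_window_pairs(seq: Sequence[int], h: int) -> List[Pair]:
    """Window length h, step 1: pair each window with the next topic."""
    q = len(seq)
    if q <= h:                      # too short: W_i is empty
        return []
    return [(tuple(seq[j:j + h]), seq[j + h]) for j in range(q - h)]

def build_training_set(sequences: Dict[int, Sequence[int]], h: int) -> List[Pair]:
    """Collect the pairs of every process sequence S_i into TP."""
    tp: List[Pair] = []
    for seq in sequences.values():
        tp.extend(sliding_window_pairs(seq, h))
    return tp

TP = build_training_set({7: [0, 1, 2, 3], 9: [5]}, h=2)
# TP == [((0, 1), 2), ((1, 2), 3)]
```

Each window would then be encoded (for example as topic indices or one-hot vectors) and fed to the LSTM of step (6), with the paired topic as the classification target.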
(6) Train the LSTM model with the training samples in TP to generate the LSTM-based process log anomaly detection model LSTM-ADM.
Anomaly detection stage:
(1) Let the log sequence of the process p to be checked be $L_p = \langle log_{p,1}, log_{p,2}, \ldots, log_{p,v} \rangle$. Process the logs in $L_p$ in order with the log parser to generate the log template sequence of the process, $K_p = \langle k_{p,1}, k_{p,2}, \ldots, k_{p,v} \rangle$.
(2) Use the log template topic mapping dictionary TD to convert $K_p$ into the log template topic sequence $S_p$. The specific steps are as follows:
(2a) Initialize $S_p$ as an empty sequence;
(2b) Process each log template $k_{p,i}$ in $K_p$ in order. Query TD for a topic mapping $\{k_{p,i} \rightarrow t_j\}$ of $k_{p,i}$; if one exists, append $t_j$ to $S_p$. Otherwise, input $k_{p,i}$ into the LDA-CM model to obtain its log template topic probability distribution vector $\Theta_{p,i}$ and find its maximum probability value $\Theta_{p,i}[x]$; the topic of log template $k_{p,i}$ is then $t_x$. Append $t_x$ to $S_p$, and at the same time create the new log template topic mapping $\{k_{p,i} \rightarrow t_x\}$ and add it to TD. The result is the log template topic sequence $S_p = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,v} \rangle$ corresponding to $K_p$.
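Step (2b) is a cache lookup with a fallback to LDA-CM for templates never seen during training, which is what lets the method absorb newly injected templates. A minimal sketch, again with a hypothetical `lda_cm` callable returning $\Theta_{p,i}$ as a dense list and 0-based topic indices; `to_topic_sequence` is an illustrative name.

```python
from typing import Callable, Dict, List, Sequence

def to_topic_sequence(k_p: Sequence[str], td: Dict[str, int],
                      lda_cm: Callable[[str], Sequence[float]]) -> List[int]:
    """Convert K_p into S_p, inferring and caching topics for unseen templates."""
    s_p: List[int] = []                      # step (2a): empty sequence
    for k in k_p:                            # step (2b): templates in order
        if k not in td:
            theta = lda_cm(k)                # Theta_{p,i} from LDA-CM
            td[k] = max(range(len(theta)), key=theta.__getitem__)  # add k -> t_x to TD
        s_p.append(td[k])
    return s_p

# Usage with a stub model that records which templates it was asked about:
calls: List[str] = []

def stub_lda_cm(k: str) -> List[float]:
    calls.append(k)
    return [0.2, 0.5, 0.3]

TD = {"Connection closed": 0}
S_p = to_topic_sequence(["Connection closed", "Disk <*> full", "Disk <*> full"], TD, stub_lda_cm)
# S_p == [0, 1, 1]; the new template is inferred once and then served from TD
```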
(3) Use the sliding-window mechanism to process $S_p$, perform anomaly detection with the LSTM-ADM model, and return the detection result. The specific steps are as follows:
(3a) Initialize the sliding window length to h and the sliding step to 1, and initialize the log template topic window set $W_p$ and the detection pair set DP as empty;
(3b) Check whether the number of log template topics in $S_p$ exceeds the sliding window length. If it does not, i.e., $v \le h$, anomaly detection for process p ends; otherwise, continue with the next step;
(3c) Move the sliding window along $S_p$ to construct the log template topic window set $W_p = \{w_{p,1}, w_{p,2}, \ldots, w_{p,y}\}$, where $w_{p,1} = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,h} \rangle$, $w_{p,2} = \langle t_{p,2}, t_{p,3}, \ldots, t_{p,h+1} \rangle$, …, $w_{p,y} = \langle t_{p,v-h}, t_{p,v-h+1}, \ldots, t_{p,v-1} \rangle$; then generate the detection pair set $DP = \{(w_{p,1}, t_{p,h+1}), (w_{p,2}, t_{p,h+2}), \ldots, (w_{p,y}, t_{p,v})\}$;
(3d) For each detection pair $(w_{p,i}, t_{p,h+i})$ in DP, input the log template topic window $w_{p,i}$ into LSTM-ADM to obtain the predicted log template topic probability distribution vector $V_{p,i}$ of the log that follows $w_{p,i}$. The length of $V_{p,i}$ equals the number of topics in T, and $V_{p,i}[j]$ is the probability that the log template topic of the next log after $w_{p,i}$ is $t_j$. Then take the topics corresponding to the g largest probability values in $V_{p,i}$ to form the predicted log template topic set $CS_{p,i}$. If …, process p is judged abnormal and detection ends;
(3e) If all detection pairs in DP have been processed without detecting any anomaly, process p is judged normal and detection ends.
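Steps (3b) to (3e) can be sketched as below. `predict` is a hypothetical stand-in for LSTM-ADM that maps a window of topic indices to $V_{p,i}$. The source text does not legibly give the test applied to each detection pair, so this sketch assumes a top-g rule: the process is flagged when the actual next topic $t_{p,h+i}$ is not in $CS_{p,i}$.

```python
import heapq
from typing import Callable, List, Sequence

# Hypothetical stand-in for LSTM-ADM: window of topic indices -> V_{p,i}.
Predictor = Callable[[Sequence[int]], Sequence[float]]

def detect_process(s_p: Sequence[int], h: int, g: int, predict: Predictor) -> bool:
    """Return True if process p is judged abnormal, False otherwise."""
    v = len(s_p)
    if v <= h:                                   # step (3b): too short to check
        return False
    for i in range(v - h):                       # step (3c): pairs (w_{p,i}, t_{p,h+i})
        window, actual = s_p[i:i + h], s_p[i + h]
        probs = predict(window)                  # step (3d): V_{p,i}
        # CS_{p,i}: indices of the g largest probabilities
        candidates = set(heapq.nlargest(g, range(len(probs)), key=probs.__getitem__))
        if actual not in candidates:             # assumed top-g condition
            return True                          # abnormal: stop early
    return False                                 # step (3e): no anomaly found

# Usage with a stub predictor that expects topic (last + 1) mod 3 next:
def stub_predict(window: Sequence[int]) -> List[float]:
    nxt = (window[-1] + 1) % 3
    return [0.8 if j == nxt else 0.1 for j in range(3)]

print(detect_process([0, 1, 2, 0, 1], h=2, g=1, predict=stub_predict))  # False
print(detect_process([0, 1, 0], h=2, g=1, predict=stub_predict))        # True
```

A larger g tolerates more variation in the next topic and so lowers false alarms at the cost of missing some anomalies.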
The anomaly detection method based on LDA topic features therefore offers better robustness and is easier to implement.
The above description is only an embodiment of the present invention, and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (7)
1. A log anomaly detection method based on LDA topic features, characterized in that the method comprises the following steps:
Step 1, model training: in the model training stage, a log parser first parses the system log into a log template set and a log triple set, and the log template set is used to train an LDA (latent Dirichlet allocation) model to obtain the log template topic classification model LDA-CM; then the LDA-CM model converts the log triples into per-process log template topic sequences, a sliding-window mechanism constructs training samples from them, and the training samples are finally input into an LSTM model, which is trained to generate the log anomaly detection model LSTM-ADM;
Step 2, anomaly detection: in the anomaly detection stage, the log of the process to be checked is first converted into its template topic sequence, which is then input into the LSTM-ADM model of step 1 to detect anomalies in the process log.
2. The log anomaly detection method based on LDA topic features according to claim 1, characterized in that the model training of step 1 specifically comprises the following steps:
Step 1-1: acquire the system log data $L = \{log_1, log_2, \ldots, log_n\}$ and the corresponding process set $P = \{p_1, p_2, \ldots, p_f\}$, where the logs in L are generated by the processes in P; parse the logs in L with the log parser to generate the log template set $K = \{k_1, k_2, \ldots, k_m\}$ and the log triple set $D = \{d_1, d_2, \ldots, d_n\}$ corresponding to L, where $d_i = (k, pid, ts)$ is the log triple corresponding to $log_i$, k is the log template, pid is the process identifier, and ts is the timestamp at which the log was generated;
Step 1-2: preprocess the log template set K, and use the preprocessed data and the preset topic set $T = \{t_1, t_2, \ldots, t_r\}$ to train the LDA model, generating the LDA-based log template topic classification model LDA-CM;
Step 1-3: initialize the log template topic mapping dictionary TD; use the LDA-CM model generated in step 1-2 to compute, for each log template $k_i$ in the log template set K, the topic probability distribution vector $\Theta_i$, whose length equals the number of topics in T and whose element $\Theta_i[j]$ is the probability that $k_i$ belongs to topic $t_j$; then find the topic $t_x$ corresponding to the maximum probability value in $\Theta_i$ and add the mapping $\{k_i \rightarrow t_x\}$ to TD;
Step 1-4: process the log triple set D by process in the process set P, use the log template topic mapping dictionary TD to build the log template topic sequence of each process in P, and denote the resulting sequence set as $S = \{S_1, S_2, \ldots, S_f\}$;
Step 1-5: use a sliding-window mechanism to process the log template topic sequence $S_i$ of each process $p_i$ in the sequence set S formed in step 1-4, generating the training sample set TP;
Step 1-6: train an LSTM model with the training samples in the training sample set TP generated in step 1-5, generating the LSTM-based process log anomaly detection model LSTM-ADM.
3. The log anomaly detection method based on LDA topic features according to claim 2, characterized in that the log triple set D is processed in step 1-4 by the following specific steps:
Step 1-4-1: divide D into subsets by the ID of each process in the process set P, i.e., by the pid in the log triples, and sort the log triples in each subset by timestamp ts, thereby constructing the corresponding log triple sequence $D_i$ for each process $p_i$; then extract the log template from each log triple in $D_i$ to obtain the log template sequence of process $p_i$;
Step 1-4-2: for each log template $k_{i,j}$ in the log template sequence of each process $p_i$ in the process set P, use the log template topic mappings in the log template topic mapping dictionary TD to determine its topic $t_x$, and thereby build the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ for process $p_i$; the log template topic sequences of all processes in P form the set $S = \{S_1, S_2, \ldots, S_f\}$.
4. The log anomaly detection method based on LDA topic features according to claim 2, characterized in that the training sample set TP is generated in step 1-5 by the following specific steps:
Step 1-5-1: initialize the sliding window length to h, the sliding step to 1, and the training sample set TP to empty;
Step 1-5-2: process the log template topic sequence $S_i = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,q} \rangle$ of each process $p_i$ in the sequence set S. If $S_i$ contains no more than h log template topics, i.e., $q \le h$, the log template topic window set $W_i$ of $S_i$ is empty; otherwise, move the sliding window along $S_i$ to construct the window set $W_i = \{w_{i,1}, w_{i,2}, \ldots, w_{i,y}\}$, where $w_{i,1} = \langle t_{i,1}, t_{i,2}, \ldots, t_{i,h} \rangle$, $w_{i,2} = \langle t_{i,2}, t_{i,3}, \ldots, t_{i,h+1} \rangle$, …, $w_{i,y} = \langle t_{i,q-h}, t_{i,q-h+1}, \ldots, t_{i,q-1} \rangle$, then construct the training sample pairs $(w_{i,1}, t_{i,h+1})$, $(w_{i,2}, t_{i,h+2})$, …, $(w_{i,y}, t_{i,q})$ and add them to TP.
5. The log anomaly detection method based on LDA topic features according to claim 1, characterized in that the anomaly detection of step 2 specifically comprises the following steps:
Step 2-1: let the log sequence of the process p to be checked be $L_p = \langle log_{p,1}, log_{p,2}, \ldots, log_{p,v} \rangle$; process the logs in $L_p$ in order with the log parser to generate the log template sequence of the process, $K_p = \langle k_{p,1}, k_{p,2}, \ldots, k_{p,v} \rangle$;
Step 2-2: use the log template topic mapping dictionary TD to convert the log template sequence $K_p$ into the log template topic sequence $S_p = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,v} \rangle$;
Step 2-3: use the sliding-window mechanism to process the sequence $S_p = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,v} \rangle$ obtained in step 2-2, perform anomaly detection with the LSTM-ADM model, and return the detection result.
6. The log anomaly detection method based on LDA topic features according to claim 5, characterized in that converting the log template sequence $K_p$ into the log template topic sequence $S_p$ in step 2-2 comprises the following specific steps:
Step 2-2-1: initialize $S_p$ as an empty sequence;
Step 2-2-2: process each log template $k_{p,i}$ in $K_p$ in order. Query the log template topic mapping dictionary TD for a topic mapping $\{k_{p,i} \rightarrow t_j\}$ of $k_{p,i}$; if one exists, append $t_j$ to $S_p$; otherwise, input $k_{p,i}$ into the LDA-CM model to obtain its log template topic probability distribution vector $\Theta_{p,i}$ and find its maximum probability value, denoted $\Theta_{p,i}[x]$, so that the topic of log template $k_{p,i}$ is $t_x$; then append $t_x$ to $S_p$, and at the same time create the new log template topic mapping $\{k_{p,i} \rightarrow t_x\}$ and add it to TD, finally obtaining the log template topic sequence $S_p = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,v} \rangle$ corresponding to $K_p$.
7. The log anomaly detection method based on LDA subject characteristics as claimed in claim 5, wherein: subject sequence S by log template for process p in step 2-3 p =<t p,1 ,t p,2 ,…,t p,v >The method realizes the process log abnormity detection, and comprises the following specific steps:
Step 2-3-1: initialize a sliding window of length $h$ with a sliding step of 1, set the log template topic window set $W_p$ to empty, and set the detection pair set DP to empty;
Step 2-3-2: determine whether the number of log template topics in $S_p$ is no greater than the sliding window length, i.e., whether $v \le h$; if so, the anomaly detection of process $p$ ends; otherwise, continue to the next step;
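Step 2-3-3: by moving the sliding window, construct the log template topic window set corresponding to $S_p$, $W_p = \{w_{p,1}, w_{p,2}, \ldots, w_{p,y}\}$, where $w_{p,1} = \langle t_{p,1}, t_{p,2}, \ldots, t_{p,h} \rangle$, $w_{p,2} = \langle t_{p,2}, t_{p,3}, \ldots, t_{p,h+1} \rangle$, $\ldots$, $w_{p,y} = \langle t_{p,v-h}, t_{p,v-h+1}, \ldots, t_{p,v-1} \rangle$, so that $y = v - h$; then construct the detection pair set $DP = \{(w_{p,1}, t_{p,h+1}), (w_{p,2}, t_{p,h+2}), \ldots, (w_{p,y}, t_{p,v})\}$, which pairs each window with the topic of the log that immediately follows it;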
Step 2-3-4: for each detection pair $(w_{p,i}, t_{p,h+i})$ in DP, input the log template topic window $w_{p,i}$ into the log anomaly detection model LSTM-ADM to obtain $V_{p,i}$, the predicted log template topic probability distribution vector for the log following window $w_{p,i}$. The dimension of $V_{p,i}$ equals the number of topics in $T$, and $V_{p,i}[j]$ is the probability that the log template topic of the log following $w_{p,i}$ is topic $t_j$. The topics corresponding to the $g$ largest probability values in $V_{p,i}$ form the predicted log template topic set $CS_{p,i}$. If $t_{p,h+i} \notin CS_{p,i}$, process $p$ is anomalous and the detection ends;
Step 2-3-5: if all detection pairs in DP have been processed and no anomaly has been detected, process $p$ is not anomalous and the detection ends.
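Claim 7 combines a length-$h$, step-1 sliding window with a top-$g$ membership test. The sketch below is a minimal reading of steps 2-3-1 to 2-3-5 under stated assumptions: topics are integer indices, `predict` is a hypothetical stand-in for the trained LSTM-ADM that returns $V_{p,i}$ for a window, and a process is flagged as anomalous when the observed next topic $t_{p,h+i}$ is not among the $g$ most probable predicted topics (our reading of the test in step 2-3-4). The function names are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, ...]


def build_detection_pairs(S_p: Sequence[int], h: int) -> List[Tuple[Window, int]]:
    """Steps 2-3-1 to 2-3-3: slide a length-h window with step 1 over S_p.

    With 0-based indexing, pair i couples S_p[i:i+h] (the claim's w_{p,i+1})
    with S_p[i+h] (the claim's t_{p,h+i+1}). There are y = v - h pairs,
    and none at all when v <= h (step 2-3-2).
    """
    v = len(S_p)
    if v <= h:
        return []
    return [(tuple(S_p[i:i + h]), S_p[i + h]) for i in range(v - h)]


def detect_process(
    S_p: Sequence[int],
    h: int,
    g: int,
    predict: Callable[[Window], Sequence[float]],
) -> bool:
    """Steps 2-3-4 and 2-3-5: return True if process p is judged anomalous.

    `predict` is a hypothetical stand-in for LSTM-ADM: given a window it
    returns V, one probability per topic in T.
    """
    for window, actual in build_detection_pairs(S_p, h):
        V = predict(window)
        # CS_{p,i}: indices of the g largest predicted probabilities
        CS = set(sorted(range(len(V)), key=V.__getitem__, reverse=True)[:g])
        if actual not in CS:  # assumed test: observed topic outside the top-g set
            return True       # anomaly found, detection ends early
    return False              # every pair checked, no anomaly
```

As a usage example, a toy predictor that puts most probability on topic `(last + 1) % 3` accepts `[0, 1, 2, 0, 1, 2]` with `h=2, g=1` but flags `[0, 1, 2, 2]`, since topic 2 is observed where topic 0 was expected. Raising `g` widens $CS_{p,i}$ and makes the detector more tolerant; with `g` equal to the number of topics nothing is ever flagged.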
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210689100.4A CN114969761A (en) | 2022-06-17 | 2022-06-17 | Log anomaly detection method based on LDA theme characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210689100.4A CN114969761A (en) | 2022-06-17 | 2022-06-17 | Log anomaly detection method based on LDA theme characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114969761A (en) | 2022-08-30 |
Family ID: 82963994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210689100.4A Pending CN114969761A (en) | 2022-06-17 | 2022-06-17 | Log anomaly detection method based on LDA theme characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114969761A (en) |
- 2022-06-17: application CN202210689100.4A filed in CN (publication CN114969761A, status: active, pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116841650A (en) * | 2023-08-31 | 2023-10-03 | 腾讯科技(深圳)有限公司 | Sample construction method, device, equipment and storage medium |
CN116841650B (en) * | 2023-08-31 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Sample construction method, device, equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN107315956B (en) | A graph-theoretical approach for quickly and accurately detecting zero-day malware |
CN109918505B (en) | Network security event visualization method based on text processing |
CN113434357A (en) | Log anomaly detection method and device based on sequence prediction |
CN109670318B (en) | Vulnerability detection method based on cyclic verification of nuclear control flow graph |
US11533373B2 | Global iterative clustering algorithm to model entities' behaviors and detect anomalies |
CN111598179A (en) | Power monitoring system user abnormal behavior analysis method, storage medium and equipment |
CN117081858B (en) | Intrusion behavior detection method, system, equipment and medium based on multi-decision tree |
CN112100137A (en) | Unmanned aerial vehicle anomaly detection method based on multi-log collaborative analysis |
CN116107834A (en) | Log anomaly detection method, device, equipment and storage medium |
Liu et al. | FewM-HGCL: Few-shot malware variants detection via heterogeneous graph contrastive learning |
CN114969761A (en) | Log anomaly detection method based on LDA theme characteristics |
Xie et al. | An attention-based gru network for anomaly detection from system logs |
CN112583847B (en) | Method for network security event complex analysis for medium and small enterprises |
CN111786999B (en) | Intrusion behavior detection method, device, equipment and storage medium |
CN115221013B (en) | Method, device and equipment for determining log mode |
CN116467720A (en) | Smart contract vulnerability detection method based on graph neural network and electronic equipment |
CN112733144B (en) | Intelligent malicious program detection method based on deep learning technology |
CN115278752A (en) | AI (artificial intelligence) detection method for abnormal logs of 5G communication system |
CN111079145B (en) | Malicious program detection method based on graph processing |
CN111565192A (en) | Credibility-based multi-model cooperative defense method for internal network security threats |
Xie et al. | Industrial Internet Vulnerability Detection Method Based on CBAM-CNN-SVM |
Zheng et al. | Using complex network communities to evaluate the correctness of object detection |
Nandakumar et al. | A Novel Approach to User Agent String Parsing for Vulnerability Analysis Using Multi-Headed Attention |
CN111125699B (en) | Malicious program visual detection method based on deep learning |
Chen et al. | Avminer: Expansible and semantic-preserving anti-virus labels mining method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |