CN114356743B - Abnormal event automatic detection method and system based on sequence reconstruction - Google Patents

Publication number
CN114356743B
Authority
CN
China
Prior art keywords
event
sequence
events
subsequence
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210234545.3A
Other languages
Chinese (zh)
Other versions
CN114356743A (en)
Inventor
杨林
李东阳
马琳茹
王晓磊
张洪广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Original Assignee
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Priority to CN202210234545.3A
Publication of CN114356743A
Application granted
Publication of CN114356743B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention provides an abnormal event automatic detection method and system based on sequence reconstruction. The method comprises the following steps: step S1, determining a discrete event sequence from a plurality of source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence; step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and embedding characteristics of the plurality of original subsequences to further obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence; step S3, determining the abnormal attribute of the events by using a criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.

Description

Abnormal event automatic detection method and system based on sequence reconstruction
Technical Field
The invention belongs to the field of data detection, and particularly relates to an abnormal event automatic detection method and system based on sequence reconstruction.
Background
The proliferation of log data has increased the demand for anomaly detection in many fields, where it is a fundamental task in building secure, reliable and trustworthy computer systems. Surveys have shown that large modern systems generate logs at an average rate of about 50 GB (about 120 million lines) per hour. Once a system fails, manually identifying the key information in such a volume of logs is very difficult, even with practical search tools such as grep. Meanwhile, many tasks unfold as sequences of events when they are carried out, so behavior traces become more discrete and disordered, which greatly increases the difficulty of anomaly detection. Therefore, to address practical challenges such as the difficulty of rapidly analyzing large volumes of discrete logs, of accurately locating complex abnormal behaviors, and of effectively avoiding false alarms, an accurate and efficient automatic abnormal event detection system is urgently needed.
Current anomaly detection methods for time-series discrete events fall into three main categories: 1) methods based on traditional machine learning, which mainly use quantitative or statistical information about events to detect anomalies, but take insufficient account of the temporal information among events and have a high false alarm rate; 2) workflow-based methods, which assume that a workflow model similar to a finite state machine represents the normal sequence of event state transitions; these are mostly deterministic, cannot capture complex long-term dependencies in the sequence, and therefore provide only limited anomaly detection performance; 3) methods based on deep learning, which use a powerful deep network model to automatically learn the normal sequence patterns hidden in log data and detect anomalies by checking whether a test sample deviates from the normal pattern; this is the mainstream direction for future log analysis and anomaly detection. However, most existing deep-learning-based anomaly detection methods detect anomalies by predicting a single future event. On the one hand, this approach cannot make full use of the temporal characteristics of the existing event sequence; on the other hand, it easily falls into under-fitting or over-fitting when the normal model is constructed, so detection accuracy cannot be guaranteed.
Disclosure of Invention
The application provides an abnormal event automatic detection scheme based on sequence reconstruction. The technical problem to be solved by the invention is: given a group of discrete event log sequences as historical monitoring data, how to construct an automatic abnormal event detection system that can accurately identify whether a subsequent event is abnormal. This breaks down into two parts: 1) how to make full use of the latent temporal dependencies in discrete event sequences to construct a more accurate sequence model; 2) how to overcome over-fitting and under-fitting during model construction so as to improve the accuracy of anomaly detection.
A first aspect of the invention discloses an abnormal event automatic detection method based on sequence reconstruction. The method comprises the following steps:
step S1, determining a discrete event sequence from a plurality of source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and embedding characteristics of the plurality of original subsequences to further obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
and step S3, judging the abnormal attributes of the events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the method of the first aspect of the present invention, the step S1 specifically includes:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
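The preprocessing in step S1 can be illustrated with a minimal sketch. The template names, the regular expressions and the `timestamp message` log layout below are hypothetical stand-ins chosen for illustration; the source does not specify the template format.

```python
import re
from collections import defaultdict

# Hypothetical event templates: event ID -> regex. Real templates would be
# extracted from the audit logs; these patterns are illustrative only.
TEMPLATES = {
    "LOGIN": re.compile(r"user (?P<user>\w+) logged in"),
    "OPEN": re.compile(r"user (?P<user>\w+) opened file \S+"),
    "LOGOUT": re.compile(r"user (?P<user>\w+) logged out"),
}

def parse_line(line):
    """Return (timestamp, user_id, event_id), or None if no template matches."""
    timestamp, _, message = line.partition(" ")
    for event_id, pattern in TEMPLATES.items():
        match = pattern.search(message)
        if match:
            return int(timestamp), match.group("user"), event_id
    return None

def build_event_sequences(lines):
    """Group parsed events by user ID and splice them in time order."""
    per_user = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            timestamp, user, event_id = parsed
            per_user[user].append((timestamp, event_id))
    return {
        user: [event_id for _, event_id in sorted(events)]
        for user, events in per_user.items()
    }

logs = [
    "30 user bob logged in",
    "10 user alice logged in",
    "20 user alice opened file /tmp/a.txt",
    "40 user alice logged out",
    "35 unrelated kernel message",
]
sequences = build_event_sequences(logs)
```

Lines that match no template are simply dropped here; a real system might instead map them to a dedicated "unknown" event.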
According to the method of the first aspect of the present invention, in said step S2:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the characteristic embedding of the plurality of original subsequences specifically comprises the following steps: and converting each original subsequence into a feature vector with a smaller dimension through dimension reduction processing to serve as the input subsequence.
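The sliding window split can be sketched as follows. The `step` parameter (window stride) is an assumption of this sketch; the source only states that a preset window length is used.

```python
def sliding_windows(events, window_length, step=1):
    """Split an event sequence into overlapping subsequences of fixed length."""
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    return [
        events[start:start + window_length]
        for start in range(0, len(events) - window_length + 1, step)
    ]

sequence = ["LOGIN", "OPEN", "READ", "WRITE", "LOGOUT"]
windows = sliding_windows(sequence, window_length=3)
```

A sequence shorter than the window length yields no subsequences in this sketch; padding such sequences would be an alternative design choice.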
According to the method of the first aspect of the present invention, in step S2, the generating a reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
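The stopping condition for training can be sketched as below. The source does not define the similarity measure, so this sketch assumes a simple position-wise match ratio averaged over the training pairs; the function names are hypothetical.

```python
def position_match_ratio(original, reconstructed):
    """Fraction of positions where the reconstruction reproduces the original event."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == r for o, r in zip(original, reconstructed))
    return matches / len(original)

def training_finished(pairs, first_threshold):
    """True when the mean similarity over (original, reconstructed) pairs exceeds the threshold."""
    mean_similarity = sum(position_match_ratio(o, r) for o, r in pairs) / len(pairs)
    return mean_similarity > first_threshold

pairs = [
    (["A", "B", "C", "D"], ["A", "B", "C", "D"]),  # perfect reconstruction
    (["A", "B", "C", "D"], ["A", "B", "X", "D"]),  # one position differs
]
```

In a real model the comparison would more likely be made on feature vectors or through a reconstruction loss; the match ratio is used here only because it is easy to inspect.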
According to the method of the first aspect of the present invention, in said step S2:
the LSTM encoder comprises a plurality of layers of LSTM networks, and the plurality of layers of LSTM networks are used for extracting the timing dependence relation of the training sequence so as to compress the training sequence into an abstract representation based on the timing dependence relation;
and generating the abstract representation to be reconstructed by taking a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component so as to eliminate the adverse effect of the potential abnormal sample in the training data.
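The sampling performed by the variation component resembles the reparameterization step of a standard variational autoencoder. The sketch below assumes that reading: the mean and standard deviation are plain lists with illustrative values, whereas in the described model they would be produced from the LSTM encoder output.

```python
import random

def sample_latent(mean, std, rng):
    """Draw h' = mean + std * eps element-wise, with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
mean = [0.5, -1.0, 2.0]         # illustrative values, not from the source
std = [0.1, 0.0, 0.3]
latent = sample_latent(mean, std, rng)
```

Because the decoder sees a sample from the modeled distribution rather than the exact encoding, an isolated abnormal training sample has less influence on what the model learns to reconstruct.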
According to the method of the first aspect of the present invention, the step S3 specifically includes:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing a position of a current event in the event sequence thereof, wherein the M events contain the following cases:
M independent events different from each other; or
N mutually different independent events and M-N repeated events, wherein each repeated event repeats one of the N mutually different independent events;
calculating the occurrence probability of each event in M events at M positions by using a single-layer full-connection classifier model, and judging the abnormal attribute of the M events by using the judgment criterion based on the occurrence probability by using an abnormal evaluation component;
wherein M and N are positive integers, and M is more than or equal to N.
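A single-layer fully connected classifier that yields a probability for each candidate event at one position can be sketched as follows. The softmax normalization, the weights and the feature values are assumptions of this sketch; the source names the classifier but gives no parameters.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def event_probabilities(feature, weights, bias):
    """One fully connected layer followed by softmax: one probability per event type."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

# Illustrative numbers: a 2-dimensional reconstructed feature at one position
# and three candidate event types (one weight row per event type).
feature = [1.0, 2.0]
weights = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
probs = event_probabilities(feature, weights, bias)
```

Applying `event_probabilities` at each of the M positions gives the per-position probabilities that the anomaly evaluation component consumes.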
According to the method of the first aspect of the present invention, the determining, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability by using the criterion specifically includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein i and K are positive integers, i is more than or equal to 1 and less than or equal to M, and K is more than or equal to 1 and less than or equal to N.
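The two-stage criterion above can be sketched for a single position as follows. The dictionary layout and the tie-handling rule (only strictly higher probabilities push the rank down) are assumptions of this sketch.

```python
def is_abnormal(position_probs, observed_event, second_threshold, top_k):
    """Apply the two-stage criterion to one position of the reconstructed subsequence.

    position_probs maps each candidate event to its probability at this position.
    """
    p_observed = position_probs[observed_event]
    # Stage 1: probability of the observed event below the second threshold.
    if p_observed < second_threshold:
        return True
    # Stage 2: rank = 1 + number of other events with strictly higher probability.
    rank = 1 + sum(
        1 for event, p in position_probs.items()
        if event != observed_event and p > p_observed
    )
    return rank > top_k

probs = {"LOGIN": 0.05, "OPEN": 0.60, "READ": 0.25, "WRITE": 0.10}
```

With a threshold of 0.08 and K = 2, `OPEN` and `READ` pass both stages, `LOGIN` fails the threshold test, and `WRITE` passes the threshold but ranks third, so it is also judged abnormal.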
A second aspect of the invention discloses an abnormal event automatic detection system based on sequence reconstruction. The system comprises:
the first processing unit is configured to determine a discrete event sequence from the multiple source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence;
a second processing unit configured to perform a splitting process on the discrete event sequence to obtain a number of original subsequences, and further obtain an input subsequence of an unsupervised detection model by performing feature embedding on the number of original subsequences, where the unsupervised detection model includes an LSTM encoder, a variation component, and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
a third processing unit configured to determine abnormal properties of the plurality of events using a criterion of evaluation based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the system of the second aspect of the invention, the first processing unit is specifically configured to:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
According to the system of the second aspect of the invention, the second processing unit is specifically configured to:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the characteristic embedding of the plurality of original subsequences specifically comprises the following steps: and converting each original subsequence into a feature vector with a smaller dimension through dimension reduction processing to serve as the input subsequence.
According to the system of the second aspect of the present invention, the second processing unit is specifically configured to generate a reconstructed subsequence of the discrete event sequence based on the input subsequence, specifically including:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
According to the system of the second aspect of the present invention, the LSTM encoder includes a plurality of layers of LSTM networks, where the plurality of layers of LSTM networks are configured to extract a timing dependency relationship of the training sequence, so as to compress the training sequence into an abstract representation based on the timing dependency relationship; the second processing unit is specifically configured to generate the abstract representation to be reconstructed by using a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component, so as to eliminate adverse effects of potential abnormal samples in the training data.
According to the system of the second aspect of the invention, the third processing unit is specifically configured to:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing a position of a current event in the event sequence thereof, wherein the M events contain the following cases:
M independent events different from each other; or
N mutually different independent events and M-N repeated events, wherein each repeated event repeats one of the N mutually different independent events;
calculating the occurrence probability of each event in M events at M positions by using a single-layer full-connection classifier model, and judging the abnormal attribute of the M events by using the judgment criterion based on the occurrence probability by using an abnormal evaluation component;
wherein M and N are positive integers, and M is more than or equal to N.
According to the system of the second aspect of the invention, the third processing unit is specifically configured to determine, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability by using the criterion, which specifically includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein i and K are positive integers, i is more than or equal to 1 and less than or equal to M, and K is more than or equal to 1 and less than or equal to N.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatically detecting abnormal events based on sequence reconstruction according to any one of the first aspect of the disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps in a method for automatic detection of abnormal events based on sequence reconstruction according to any one of the first aspect of the present disclosure.
In conclusion, the technical scheme provided by the invention can fully utilize the potential time sequence dependency relationship in the log sequence data, and relieve the problems of over-fitting and under-fitting in the unsupervised detection process by means of probability modeling and sequence reconstruction, thereby improving the accuracy of the abnormal event automatic detection system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an abnormal event automatic detection method based on sequence reconstruction according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an abnormal event automatic detection process according to an embodiment of the present invention;
Fig. 3 is a block diagram of an abnormal event automatic detection system based on sequence reconstruction according to an embodiment of the present invention;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A first aspect of the invention discloses an abnormal event automatic detection method based on sequence reconstruction. Fig. 1 is a flowchart of an abnormal event automatic detection method based on sequence reconstruction according to an embodiment of the present invention; as shown in Fig. 1, the method includes:
step S1, determining a discrete event sequence from a plurality of source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and embedding the characteristics of the original subsequences to further obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
and step S3, judging the abnormal attributes of the events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
Fig. 2 is a schematic diagram of an abnormal event automatic detection process according to an embodiment of the present invention; as shown in Fig. 2, the method mainly includes three steps: data preprocessing (S1), unsupervised detection (S2), and anomaly output (S3).
Data preprocessing (S1) is mainly responsible for parsing the discrete event logs and converting them into sequences; the resulting event sequence is fed to the unsupervised detection module and the anomaly output module for subsequent processing. After receiving the preprocessed discrete event sequence, unsupervised detection (S2) obtains a plurality of subsequences by means of a sliding window; each subsequence is encoded by the feature embedding component and then fed to the LSTM variational autoencoder for sequence reconstruction. In this process, the stacked LSTM network extracts the latent temporal dependencies in the event sequence, and the variational autoencoder model performs probabilistic modeling on the abstract representation of these temporal relationships and generates a reconstructed version of each original subsequence based on the resulting probability distribution. Anomaly output (S3) takes the original subsequence and its reconstructed version as input and automatically obtains the final detection result by constructing a classifier and an anomaly evaluation component.
At step S1, a discrete event sequence is determined from the multiple source logs using a predefined event template, the discrete event sequence being formed by multiple event logs of the same user being spliced in time sequence.
In some embodiments, the step S1 specifically includes: analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs; and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
Specifically, a data preprocessing process is invoked: a group of predefined event templates is extracted from the audit logs, and lengthy, disordered log entries are parsed into concise, standardized events e; at the same time, according to the user ID recorded in the logs, the corresponding logs are aggregated in chronological order into a discrete event sequence.
In step S2, the discrete event sequence is split to obtain a plurality of original sub-sequences, and an input sub-sequence of an unsupervised detection model is further obtained by feature embedding the plurality of original sub-sequences, wherein the unsupervised detection model includes an LSTM encoder, a variation component, and an LSTM decoder, and is used for generating a reconstructed sub-sequence of the discrete event sequence based on the input sub-sequence.
In some embodiments, in said step S2: splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences; the characteristic embedding of the plurality of original subsequences is specifically as follows: converting each original subsequence into a feature vector with a smaller dimension through dimension reduction processing to be used as the input subsequence.
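One common way to realize such feature embedding is a lookup table that maps each event type to a short dense vector, instead of a sparse one-hot vector as long as the event vocabulary. The sketch below assumes that reading; the table is randomly initialized here, whereas in practice it would be learned, and all names are hypothetical.

```python
import random

def build_embedding_table(vocabulary, dim, seed=0):
    """Assign each event type a dense vector of length dim (random here, learned in practice)."""
    rng = random.Random(seed)
    return {event: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for event in vocabulary}

def embed_subsequence(subsequence, table):
    """Replace every event in the window by its dense vector."""
    return [table[event] for event in subsequence]

vocabulary = ["LOGIN", "OPEN", "READ", "WRITE", "LOGOUT"]
table = build_embedding_table(vocabulary, dim=2)
embedded = embed_subsequence(["LOGIN", "OPEN", "LOGIN"], table)
```

Here each event is represented by 2 values rather than the 5 a one-hot encoding of this vocabulary would need, and repeated events map to identical vectors.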
In some embodiments, in the step S2, the generating a reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
(1) dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps: (i) compressing the training sequence into an abstract representation using the LSTM encoder; (ii) calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed; (iii) decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
(2) inputting the test sequence into the trained unsupervised detection model, and obtaining the reconstructed subsequence of the test process as the reconstructed subsequence of the discrete event sequence after the test sequence has been processed by the LSTM encoder, the variation component and the LSTM decoder.
In some embodiments, in said step S2: the LSTM encoder comprises a plurality of layers of LSTM networks, and the plurality of layers of LSTM networks are used for extracting the timing dependence relation of the training sequence so as to compress the training sequence into an abstract representation based on the timing dependence relation; and generating the abstract representation to be reconstructed by adopting a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component so as to eliminate the adverse effect of the potential abnormal sample in the training data.
Specifically, an unsupervised detection process is invoked. First, according to a sliding window mechanism and a preset window length $W$, the event sequence is split into a plurality of subsequences; the subsequences are encoded by a feature embedding component, and the encoded sequence vectors serve as the input of the subsequent unsupervised detection model. Feature embedding converts an event subsequence of longer length ($W$), $[e_1, e_2, \dots, e_W]$, into a feature vector $X_W$ of smaller dimension ($d$). The encoded feature vectors are used to train an unsupervised detection model based on LSTM and a variational autoencoder. Specifically, the model mainly comprises three parts: an LSTM encoder, a variation component and an LSTM decoder. The LSTM encoder compresses the encoded feature vector $X_W$ into a latent abstract representation $h_{enc}$; the variation component simulates the data distribution of the abstract representation by calculating the mean $\mu$ and the standard deviation $\delta$ of the compressed representation, and generates the abstract representation to be reconstructed, $h_{enc}'$, using a standard normal random number $\epsilon$ as a seed; the LSTM decoder then decodes the generated abstract representation $h_{enc}'$ to obtain the reconstructed feature vector. In this process, the stacked multi-layer LSTM network is intended to extract the latent temporal dependencies in the event sequence, and the variational mechanism eliminates the adverse effects of abnormal samples that may be present in the training data by modeling the data distribution of the abstract representation of the event sequence.
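Under the usual variational autoencoder formulation, which the description appears to follow, the generation of the abstract representation to be reconstructed can be written as below. The element-wise product and the identity covariance are assumptions of this sketch; the source states only that the mean μ, the standard deviation δ and a standard normal random number ε are used.

```latex
h_{enc}' = \mu + \delta \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
```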
In step S3, based on the original subsequence and the reconstructed subsequence of the discrete event sequence, the abnormal attributes of a plurality of events are determined using an evaluation criterion.
In some embodiments, step S3 specifically includes: aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of which contains M events, each of the M events corresponding to an event position that characterizes the position of that event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating, by a single-layer fully connected classifier model, the occurrence probability of each of the M events at each of the M positions, and judging, by an anomaly evaluation component, the abnormal attributes of the M events based on the occurrence probabilities using the evaluation criterion;
wherein M and N are positive integers, and M ≥ N.
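As a rough illustration of how a single-layer fully connected classifier could turn a reconstructed feature vector at one position into occurrence probabilities over N event types, the sketch below applies one linear layer followed by a softmax. The softmax normalization, the function names and the weight values are assumptions made for illustration; the text only states that a single-layer fully connected classifier yields the probabilities.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def event_probabilities(x, weights, bias):
    """One linear layer plus softmax.

    x: feature vector of length d for one position (hypothetical input).
    weights: N rows of length d; bias: length N. Returns N probabilities.
    """
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)

# Hypothetical parameters for N = 3 event types and d = 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
probs = event_probabilities([2.0, 0.0], weights, bias)
```

Applying `event_probabilities` to each of the M positions gives the per-position probabilities that the anomaly evaluation component then examines.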
In some embodiments, the determining, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability using the evaluation criterion includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the i-th event occurs at its corresponding fixed position is lower than a second threshold, judging the i-th event to be an abnormal event;
when the probability that the i-th event occurs at its corresponding fixed position is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at that fixed position as a probability set, and determining the rank of the i-th event's probability relative to the probability set:
when the rank is within the top K, judging the i-th event to be a normal event;
otherwise, judging the i-th event to be an abnormal event;
wherein i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
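The two-stage judgment above can be sketched as a small function. The dictionary layout, the function name `is_abnormal` and the example probabilities are hypothetical; ties are resolved in favour of the true event, which is an assumption the text does not address.

```python
def is_abnormal(probs, true_event, second_threshold, top_k):
    """Judge one position of the reconstructed subsequence.

    probs: mapping from event ID to its probability at this position.
    true_event: the event found at this position in the original subsequence.
    """
    p_true = probs.get(true_event, 0.0)
    # Criterion 1: the probability of the true event must reach the threshold.
    if p_true < second_threshold:
        return True
    # Criterion 2: the true event must rank within the top K at this position.
    rank = 1 + sum(1 for event, p in probs.items()
                   if event != true_event and p > p_true)
    return rank > top_k

# Hypothetical probabilities at one position.
position_probs = {"login": 0.5, "read": 0.3, "delete": 0.2}
```

With a second threshold of 0.25 and K = 1, "login" is judged normal, "read" passes the threshold but ranks second and is judged abnormal, and "delete" fails the threshold check directly.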
Specifically, an anomaly output procedure is invoked. The procedure takes the original event subsequence and the reconstructed subsequence generated by the unsupervised detection module as input, constructs a single-layer fully connected classifier model to obtain the probability of each event at each position in the sequence, and then evaluates these probabilities with an embedded anomaly evaluation component to generate the final detection result. The anomaly evaluation component applies two parallel criteria: one checks whether the probability of the correct event in the reconstructed sequence is lower than a given threshold, and the other checks whether the correct event at each position of the reconstructed sequence ranks within the top K by reconstruction probability. If the probability is below the threshold, or the correct event is not within the top K, the subsequence is considered abnormal and an alert is raised.
A second aspect of the invention discloses an automatic abnormal event detection system based on sequence reconstruction. FIG. 3 is a block diagram of an automatic abnormal event detection system based on sequence reconstruction according to an embodiment of the present invention. As shown in FIG. 3, the system 300 includes:
a first processing unit 301, configured to determine a discrete event sequence from multi-source logs by using a predefined event template, where the discrete event sequence is formed by splicing multiple event logs of the same user in time order;
a second processing unit 302, configured to split the discrete event sequence to obtain a number of original subsequences, and to obtain an input subsequence of an unsupervised detection model by performing feature embedding on the original subsequences, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
a third processing unit 303, configured to determine the abnormal attributes of a plurality of events by using an evaluation criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the system of the second aspect of the present invention, the first processing unit 301 is specifically configured to:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
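The aggregation and splicing performed by the first processing unit can be sketched as follows. The record fields `timestamp`, `user_id` and `event` are hypothetical stand-ins for whatever the event template extraction produces; template parsing itself is not shown.

```python
from collections import defaultdict

def build_event_sequences(event_logs):
    """Group event logs by user ID and splice each user's events in time order."""
    per_user = defaultdict(list)
    for log in event_logs:
        per_user[log["user_id"]].append(log)
    return {
        user: [log["event"] for log in sorted(logs, key=lambda log: log["timestamp"])]
        for user, logs in per_user.items()
    }

# Hypothetical parsed event logs from several sources, in arbitrary order.
event_logs = [
    {"timestamp": 3, "user_id": "u1", "event": "E7"},
    {"timestamp": 1, "user_id": "u1", "event": "E2"},
    {"timestamp": 2, "user_id": "u2", "event": "E5"},
    {"timestamp": 2, "user_id": "u1", "event": "E4"},
]
sequences = build_event_sequences(event_logs)
```

Here `sequences["u1"]` is the discrete event sequence for user u1, ordered by timestamp regardless of which source log each record came from.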
According to the system of the second aspect of the present invention, the second processing unit 302 is specifically configured to:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
performing feature embedding on the plurality of original subsequences, which specifically comprises: converting each original subsequence, through dimension reduction, into a feature vector of smaller dimension to serve as the input subsequence.
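The sliding window split can be sketched as below. The `step` parameter (stride) is an assumption, since the text fixes only the window length; a stride of 1 is used by default. The feature embedding that follows the split is not shown.

```python
def sliding_windows(events, window_length, step=1):
    """Split an event sequence into subsequences of a preset window length."""
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    last_start = len(events) - window_length
    return [events[i:i + window_length] for i in range(0, last_start + 1, step)]

# Hypothetical discrete event sequence of one user, window length W = 3.
events = ["E2", "E4", "E7", "E4", "E1"]
subsequences = sliding_windows(events, window_length=3)
```

A sequence shorter than the window length yields no subsequences in this sketch; whether such sequences should be padded instead is left open by the text.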
According to the system of the second aspect of the present invention, the second processing unit 302 is specifically configured to generate a reconstructed subsequence of the discrete event sequence based on the input subsequence, and specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
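The stopping condition of the training process can be sketched as a similarity check between input vectors and their reconstructions. The text does not define the similarity measure; cosine similarity averaged over the batch is used here purely as an assumption, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def training_finished(inputs, reconstructions, first_threshold):
    """Return True when the mean similarity exceeds the first threshold."""
    sims = [cosine_similarity(x, r) for x, r in zip(inputs, reconstructions)]
    return sum(sims) / len(sims) > first_threshold

# Hypothetical input vectors and two candidate sets of reconstructions.
inputs = [[1.0, 0.0], [0.0, 1.0]]
good = [[1.0, 0.0], [0.0, 1.0]]
bad = [[0.0, 1.0], [1.0, 0.0]]
```

In a training loop, `training_finished` would be evaluated after each pass over the training sequence, and the trained model would then be applied to the test sequence.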
According to the system of the second aspect of the present invention, the LSTM encoder includes a multi-layer LSTM network, and the multi-layer LSTM network is configured to extract a timing dependency relationship of the training sequence, so as to compress the training sequence into an abstract representation based on the timing dependency relationship; the second processing unit 302 is specifically configured to generate the abstract representation to be reconstructed by using a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component, so as to eliminate adverse effects of potential abnormal samples in the training data.
According to the system of the second aspect of the present invention, the third processing unit 303 is specifically configured to:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of which contains M events, each of the M events corresponding to an event position that characterizes the position of that event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating, by a single-layer fully connected classifier model, the occurrence probability of each of the M events at each of the M positions, and judging, by an anomaly evaluation component, the abnormal attributes of the M events based on the occurrence probabilities using the evaluation criterion;
wherein M and N are positive integers, and M ≥ N.
According to the system of the second aspect of the present invention, the third processing unit 303 is specifically configured to:
judge, by the anomaly evaluation component, the abnormal attributes of the M events based on the occurrence probability using the evaluation criterion, which specifically includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the i-th event occurs at its corresponding fixed position is lower than a second threshold, judging the i-th event to be an abnormal event;
when the probability that the i-th event occurs at its corresponding fixed position is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at that fixed position as a probability set, and determining the rank of the i-th event's probability relative to the probability set:
when the rank is within the top K, judging the i-th event to be a normal event;
otherwise, judging the i-th event to be an abnormal event;
wherein i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatically detecting abnormal events based on sequence reconstruction according to any one of the first aspect of the disclosure when executing the computer program.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device, which are connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, Near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that the structure shown in FIG. 4 is only a partial block diagram related to the technical solution of the present disclosure and does not limit the electronic device to which the solution of the present application is applied; a specific electronic device may include more or fewer components than those shown in the drawings, combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps in a method for automatic detection of abnormal events based on sequence reconstruction according to any one of the first aspect of the present disclosure.
In conclusion, the technical solution provided by the invention makes full use of the latent temporal dependencies in log sequence data and, by means of probabilistic modeling and sequence reconstruction, alleviates the over-fitting and under-fitting problems in the unsupervised detection process, thereby improving the accuracy of the automatic abnormal event detection system. The technical features and notable effects of the invention are as follows. First, the invention provides an abnormal event detection model based on sequence reconstruction, which makes full use of the latent temporal dependencies in a discrete event sequence and alleviates the over-fitting problem of traditional prediction-based methods, thereby improving the accuracy of anomaly detection. Second, the invention provides a new dual-sequence anomaly detection evaluation standard, which reduces missed detections while increasing the accuracy of the detection results, and provides a new guiding principle for anomaly judgment in anomaly detection schemes.
It should be noted that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description. The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. For a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. An abnormal event automatic detection method based on sequence reconstruction is characterized by comprising the following steps:
step S1, determining a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in time order;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and embedding the characteristics of the original subsequences to further obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
step S3, based on the original subsequence and the reconstructed subsequence of the discrete event sequence, judging the abnormal attributes of a plurality of events by using an evaluation criterion;
wherein, in the step S3:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of which contains M events, each of the M events corresponding to an event position that characterizes the position of that event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating, by a single-layer fully connected classifier model, the occurrence probability of each of the M events at each of the M positions, and judging, by an anomaly evaluation component, the abnormal attributes of the M events based on the occurrence probabilities using the evaluation criterion, which specifically comprises:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the i-th event occurs at its corresponding fixed position is lower than a second threshold, judging the i-th event to be an abnormal event;
when the probability that the i-th event occurs at its corresponding fixed position is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at that fixed position as a probability set, and determining the rank of the i-th event's probability relative to the probability set:
when the rank is within the top K, judging the i-th event to be a normal event;
otherwise, judging the i-th event to be an abnormal event;
wherein M and N are positive integers, and M ≥ N; i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
2. The method for automatically detecting abnormal events based on sequence reconstruction as claimed in claim 1, wherein the step S1 specifically includes:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
3. The method for automatically detecting abnormal events based on sequence reconstruction as claimed in claim 2, wherein in said step S2:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
performing feature embedding on the plurality of original subsequences, which specifically comprises: converting each original subsequence, through dimension reduction, into a feature vector of smaller dimension to serve as the input subsequence.
4. The method according to claim 3, wherein in the step S2, the step of generating the reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
5. The method for automatically detecting abnormal events based on sequence reconstruction according to claim 4, wherein in the step S2:
the LSTM encoder comprises a plurality of layers of LSTM networks, and the plurality of layers of LSTM networks are used for extracting the timing dependence relation of the training sequence so as to compress the training sequence into an abstract representation based on the timing dependence relation;
and generating the abstract representation to be reconstructed by adopting a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component so as to eliminate the adverse effect of the potential abnormal sample in the training data.
6. An abnormal event automatic detection system based on sequence reconstruction, which is characterized by comprising:
a first processing unit configured to determine a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in time order;
a second processing unit configured to perform a splitting process on the discrete event sequence to obtain a number of original subsequences, and further obtain an input subsequence of an unsupervised detection model by performing feature embedding on the number of original subsequences, where the unsupervised detection model includes an LSTM encoder, a variation component, and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
a third processing unit configured to determine the abnormal attributes of a plurality of events by using an evaluation criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence;
wherein the third processing unit is specifically configured to perform:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of which contains M events, each of the M events corresponding to an event position that characterizes the position of that event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating, by a single-layer fully connected classifier model, the occurrence probability of each of the M events at each of the M positions, and judging, by an anomaly evaluation component, the abnormal attributes of the M events based on the occurrence probabilities using the evaluation criterion, which specifically comprises:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the i-th event occurs at its corresponding fixed position is lower than a second threshold, judging the i-th event to be an abnormal event;
when the probability that the i-th event occurs at its corresponding fixed position is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at that fixed position as a probability set, and determining the rank of the i-th event's probability relative to the probability set:
when the rank is within the top K, judging the i-th event to be a normal event;
otherwise, judging the i-th event to be an abnormal event;
wherein M and N are positive integers, and M ≥ N; i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
7. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatic detection of abnormal events based on sequence reconstruction according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the method for automatic detection of abnormal events based on sequence reconstruction as claimed in any one of claims 1 to 5.
CN202210234545.3A 2022-03-11 2022-03-11 Abnormal event automatic detection method and system based on sequence reconstruction Active CN114356743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234545.3A CN114356743B (en) 2022-03-11 2022-03-11 Abnormal event automatic detection method and system based on sequence reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210234545.3A CN114356743B (en) 2022-03-11 2022-03-11 Abnormal event automatic detection method and system based on sequence reconstruction

Publications (2)

Publication Number Publication Date
CN114356743A CN114356743A (en) 2022-04-15
CN114356743B true CN114356743B (en) 2022-06-07

Family

ID=81095192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234545.3A Active CN114356743B (en) 2022-03-11 2022-03-11 Abnormal event automatic detection method and system based on sequence reconstruction

Country Status (1)

Country Link
CN (1) CN114356743B (en)

Also Published As

Publication number Publication date
CN114356743A (en) 2022-04-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant