CN114356743B - Abnormal event automatic detection method and system based on sequence reconstruction
- Publication number
- CN114356743B, CN202210234545.3A
- Authority
- CN
- China
- Prior art keywords
- event
- sequence
- events
- subsequence
- abnormal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention provides an abnormal event automatic detection method and system based on sequence reconstruction. The method comprises the following steps: step S1, determining a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in chronological order; step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and performing feature embedding on the plurality of original subsequences to obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variational component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence; step S3, determining the abnormal attributes of a plurality of events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
Description
Technical Field
The invention belongs to the field of data detection, and particularly relates to an abnormal event automatic detection method and system based on sequence reconstruction.
Background
The proliferation of log data has created a growing demand for anomaly detection in many fields; anomaly detection is a fundamental task in building secure, reliable and trustworthy computer systems. Investigations have shown that large modern systems generate logs at an average rate of about 50 GB (about 120 million lines) per hour. Once a system fails, manually identifying the key information for anomaly detection in such a huge volume of logs becomes very difficult, even with practical search tools such as grep. Meanwhile, many tasks unfold as sequences of events when they are carried out, so behavior traces become more discrete and disordered, which greatly increases the difficulty of anomaly detection. Therefore, to meet practical challenges such as the difficulty of rapidly analyzing large volumes of discrete logs, of accurately locating complex abnormal behaviors, and of effectively avoiding system false alarms, an accurate and efficient automatic abnormal event detection system is urgently needed.
Current anomaly detection methods for time-ordered discrete events fall mainly into three categories: 1) methods based on traditional machine learning, which mainly use quantitative or statistical information about events to detect anomalies, but take insufficient account of the temporal information among events and have a high false alarm rate; 2) workflow-based methods, which assume that a workflow model similar to a finite state machine represents the normal sequence of event state transitions; these methods are mostly deterministic and cannot capture complex long-term dependencies in the sequence, and therefore provide only limited anomaly detection performance; 3) methods based on deep learning, which use a powerful deep network model to automatically learn the normal sequence patterns hidden in log data and detect anomalies by checking whether a test sample deviates from the normal patterns; this is the mainstream direction for future log analysis and anomaly detection. However, most existing deep-learning-based anomaly detection methods detect anomalies by predicting a single future event. On the one hand, this approach cannot make full use of the temporal characteristics of the existing event sequence; on the other hand, it is prone to under-fitting or over-fitting when the normal model is constructed, so detection accuracy cannot be guaranteed.
Disclosure of Invention
The application provides an abnormal event automatic detection scheme based on sequence reconstruction. The technical problem to be solved by the invention is as follows: given a group of discrete event log sequences as historical monitoring data, how to construct an automatic abnormal event detection system that can accurately identify whether a subsequent event is abnormal. This problem is further broken down into two parts: 1) how to make full use of the latent temporal dependencies in discrete event sequences to construct a more accurate sequence model; 2) how to overcome over-fitting and under-fitting during model construction so as to improve the accuracy of anomaly detection.
A first aspect of the invention discloses an abnormal event automatic detection method based on sequence reconstruction. The method comprises the following steps:
step S1, determining a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in chronological order;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and performing feature embedding on the plurality of original subsequences to obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variational component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
and step S3, determining the abnormal attributes of a plurality of events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the method of the first aspect of the present invention, the step S1 specifically includes:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
According to the method of the first aspect of the present invention, in said step S2:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the feature embedding of the plurality of original subsequences specifically comprises: converting each original subsequence, through dimension reduction processing, into a feature vector with a smaller dimension to serve as the input subsequence.
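The sliding-window split described above can be sketched as follows. This is a minimal illustration only: the function name `split_into_subsequences` and the `step` parameter are assumptions introduced here (the text specifies only a preset window length), and events are represented as plain strings.

```python
from typing import List, Sequence


def split_into_subsequences(
    events: Sequence[str], window: int, step: int = 1
) -> List[List[str]]:
    """Slide a fixed-length window over one user's event sequence.

    `step` is an illustrative assumption; the source only fixes the window length.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(events) - window
    # A sequence shorter than the window yields no subsequence at all.
    return [list(events[i:i + window]) for i in range(0, last_start + 1, step)]


sequence = ["login", "open", "read", "write", "logout"]
subsequences = split_into_subsequences(sequence, window=3)
```

With a window of 3 and a step of 1, neighbouring subsequences overlap by two events, so every event is seen in several positional contexts.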
According to the method of the first aspect of the present invention, in step S2, the generating a reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model, and the training comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the mean and the standard deviation of the compressed representation by using the variational component to model the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstructed subsequence of the training process, and finishing the training process when the similarity between the reconstructed subsequence of the training process and the input subsequence is higher than a first threshold;
and inputting the test sequence into the trained unsupervised detection model, and obtaining, after the test sequence is processed by the LSTM encoder, the variational component and the LSTM decoder, a reconstructed subsequence of the test process as the reconstructed subsequence of the discrete event sequence.
According to the method of the first aspect of the present invention, in said step S2:
the LSTM encoder comprises a multi-layer LSTM network, and the multi-layer LSTM network is used for extracting the temporal dependencies of the training sequence so as to compress the training sequence into an abstract representation based on the temporal dependencies;
and the abstract representation to be reconstructed is generated by using a standard normal random number as a seed, based on the data distribution of the abstract representation modeled by the variational component, so as to eliminate the adverse effect of potential abnormal samples in the training data.
According to the method of the first aspect of the present invention, the step S3 specifically includes:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing the position of the current event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events;
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each of the M events at the M positions by using a single-layer fully connected classifier model, and determining, by an anomaly evaluation component, the abnormal attributes of the M events by using the judgment criterion based on the occurrence probability;
wherein M and N are positive integers, and M ≥ N.
According to the method of the first aspect of the present invention, the determining, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability by using the criterion specifically includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the i-th event occurs at the fixed position corresponding to the i-th event is lower than a second threshold, determining that the i-th event is an abnormal event;
when the probability that the i-th event occurs at the fixed position corresponding to the i-th event is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at the fixed position corresponding to the i-th event as a probability set, and determining the ranking position, within the probability set, of the probability that the i-th event occurs at its corresponding fixed position:
when the ranking position is within the top K, determining that the i-th event is a normal event;
otherwise, determining that the i-th event is an abnormal event;
wherein i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
A second aspect of the invention discloses an abnormal event automatic detection system based on sequence reconstruction. The system comprises:
a first processing unit configured to determine a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in chronological order;
a second processing unit configured to split the discrete event sequence to obtain a plurality of original subsequences, and to obtain an input subsequence of an unsupervised detection model by performing feature embedding on the plurality of original subsequences, wherein the unsupervised detection model includes an LSTM encoder, a variational component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
a third processing unit configured to determine the abnormal attributes of a plurality of events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the system of the second aspect of the invention, the first processing unit is specifically configured to:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
According to the system of the second aspect of the invention, the second processing unit is specifically configured to:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the feature embedding of the plurality of original subsequences specifically comprises: converting each original subsequence, through dimension reduction processing, into a feature vector with a smaller dimension to serve as the input subsequence.
According to the system of the second aspect of the present invention, the second processing unit is specifically configured to generate a reconstructed subsequence of the discrete event sequence based on the input subsequence, specifically including:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model, and the training comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the mean and the standard deviation of the compressed representation by using the variational component to model the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstructed subsequence of the training process, and finishing the training process when the similarity between the reconstructed subsequence of the training process and the input subsequence is higher than a first threshold;
and inputting the test sequence into the trained unsupervised detection model, and obtaining, after the test sequence is processed by the LSTM encoder, the variational component and the LSTM decoder, a reconstructed subsequence of the test process as the reconstructed subsequence of the discrete event sequence.
According to the system of the second aspect of the present invention, the LSTM encoder includes a multi-layer LSTM network, where the multi-layer LSTM network is configured to extract the temporal dependencies of the training sequence, so as to compress the training sequence into an abstract representation based on the temporal dependencies; the second processing unit is specifically configured to generate the abstract representation to be reconstructed by using a standard normal random number as a seed, based on the data distribution of the abstract representation modeled by the variational component, so as to eliminate the adverse effects of potential abnormal samples in the training data.
According to the system of the second aspect of the invention, the third processing unit is specifically configured to:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing the position of the current event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events;
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each of the M events at the M positions by using a single-layer fully connected classifier model, and determining, by an anomaly evaluation component, the abnormal attributes of the M events by using the judgment criterion based on the occurrence probability;
wherein M and N are positive integers, and M ≥ N.
According to the system of the second aspect of the invention, the third processing unit is specifically configured such that the determining, by the anomaly evaluation component, of the abnormal attributes of the M events based on the occurrence probability by using the judgment criterion specifically includes:
acquiring fixed positions of the M events in the original subsequence;
in the reconstructed subsequence of the discrete event sequence:
when the probability that the i-th event occurs at the fixed position corresponding to the i-th event is lower than a second threshold, determining that the i-th event is an abnormal event;
when the probability that the i-th event occurs at the fixed position corresponding to the i-th event is not lower than the second threshold, acquiring the probabilities that events other than the i-th event occur at the fixed position corresponding to the i-th event as a probability set, and determining the ranking position, within the probability set, of the probability that the i-th event occurs at its corresponding fixed position:
when the ranking position is within the top K, determining that the i-th event is a normal event;
otherwise, determining that the i-th event is an abnormal event;
wherein i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatically detecting abnormal events based on sequence reconstruction according to any one of the first aspect of the disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps in a method for automatic detection of abnormal events based on sequence reconstruction according to any one of the first aspect of the present disclosure.
In conclusion, the technical scheme provided by the invention can fully utilize the potential time sequence dependency relationship in the log sequence data, and relieve the problems of over-fitting and under-fitting in the unsupervised detection process by means of probability modeling and sequence reconstruction, thereby improving the accuracy of the abnormal event automatic detection system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of an abnormal event automatic detection method based on sequence reconstruction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an abnormal event automatic detection process according to an embodiment of the present invention;
FIG. 3 is a block diagram of an abnormal event automatic detection system based on sequence reconstruction according to an embodiment of the present invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A first aspect of the invention discloses an abnormal event automatic detection method based on sequence reconstruction. FIG. 1 is a flowchart of an abnormal event automatic detection method based on sequence reconstruction according to an embodiment of the present invention; as shown in FIG. 1, the method includes:
step S1, determining a discrete event sequence from multi-source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user in chronological order;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and performing feature embedding on the plurality of original subsequences to obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variational component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
and step S3, determining the abnormal attributes of a plurality of events by using a judgment criterion based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
FIG. 2 is a schematic diagram of an abnormal event automatic detection process according to an embodiment of the present invention; as shown in FIG. 2, the method mainly includes three steps: data preprocessing (S1), unsupervised detection (S2), and anomaly output (S3).
The data preprocessing (S1) is mainly responsible for parsing the discrete event logs and converting them into sequences; the resulting event sequence is fed into the unsupervised detection module and the anomaly output module for subsequent processing. After receiving a preprocessed discrete event sequence, the unsupervised detection (S2) obtains a plurality of subsequences by means of a sliding window; each subsequence is encoded by the feature embedding component and then fed to the LSTM variational autoencoder for sequence reconstruction. In this process, the stacked LSTM network is responsible for extracting the latent temporal dependencies in the event sequence, and the variational autoencoder model performs probabilistic modeling on the abstract representation of these temporal relationships and generates a reconstructed version of each original subsequence based on the probability distribution. The anomaly output (S3) takes the original subsequence and its reconstructed version as input, and automatically obtains the final detection result by constructing a classifier and an anomaly evaluation component.
At step S1, a discrete event sequence is determined from the multiple source logs using a predefined event template, the discrete event sequence being formed by multiple event logs of the same user being spliced in time sequence.
In some embodiments, the step S1 specifically includes: analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs; and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
Specifically, the data preprocessing process is invoked: a group of predefined event templates is extracted from the audit logs, and the lengthy, disordered log entries are parsed into concise, standardized events e; at the same time, according to the user ID recorded in the logs, the corresponding logs are aggregated in chronological order into a discrete event sequence.
In step S2, the discrete event sequence is split to obtain a plurality of original subsequences, and an input subsequence of an unsupervised detection model is obtained by performing feature embedding on the plurality of original subsequences, wherein the unsupervised detection model includes an LSTM encoder, a variational component, and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence.
In some embodiments, in said step S2: the discrete event sequence is split by using a sliding window mechanism with a preset window length to obtain a plurality of original subsequences; the feature embedding of the plurality of original subsequences is specifically as follows: each original subsequence is converted, through dimension reduction processing, into a feature vector with a smaller dimension to be used as the input subsequence.
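The preprocessing described above (template matching followed by per-user, time-ordered aggregation) can be sketched as follows. This is a minimal illustration under stated assumptions: the log line layout (an integer timestamp followed by a message), the three regular-expression templates, and the names `TEMPLATES`, `parse_line` and `build_sequences` are all hypothetical and are not taken from the patent.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical event templates: event ID -> pattern with a named "user" group.
TEMPLATES = {
    "E1": re.compile(r"user=(?P<user>\w+) action=login"),
    "E2": re.compile(r"user=(?P<user>\w+) action=open file=\S+"),
    "E3": re.compile(r"user=(?P<user>\w+) action=logout"),
}


def parse_line(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (timestamp, user_id, event_id), or None if no template matches."""
    ts_text, _, message = line.partition(" ")
    for event_id, pattern in TEMPLATES.items():
        match = pattern.search(message)
        if match:
            return int(ts_text), match.group("user"), event_id
    return None


def build_sequences(lines: List[str]) -> Dict[str, List[str]]:
    """Group parsed events by user ID and splice them in chronological order."""
    per_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ts, user, event_id = parsed
            per_user[user].append((ts, event_id))
    return {
        user: [event for _, event in sorted(items, key=lambda item: item[0])]
        for user, items in per_user.items()
    }


logs = [
    "3 user=alice action=logout",
    "1 user=alice action=login",
    "2 user=bob action=login",
    "2 user=alice action=open file=/tmp/a",
]
sequences = build_sequences(logs)
```

Lines that match no template are dropped in this sketch; a real deployment might instead map them to a dedicated "unknown" event.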
In some embodiments, in the step S2, the generating a reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
(1) dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model, and the training comprises the following steps: (i) compressing the training sequence into an abstract representation using the LSTM encoder; (ii) calculating the mean and the standard deviation of the compressed representation by using the variational component to model the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed; (iii) decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstructed subsequence of the training process, and finishing the training process when the similarity between the reconstructed subsequence of the training process and the input subsequence is higher than a first threshold;
(2) inputting the test sequence into the trained unsupervised detection model, and obtaining, after the test sequence is processed by the LSTM encoder, the variational component and the LSTM decoder, a reconstructed subsequence of the test process as the reconstructed subsequence of the discrete event sequence.
In some embodiments, in said step S2: the LSTM encoder comprises a multi-layer LSTM network, and the multi-layer LSTM network is used for extracting the temporal dependencies of the training sequence so as to compress the training sequence into an abstract representation based on the temporal dependencies; and the abstract representation to be reconstructed is generated by using a standard normal random number as a seed, based on the data distribution of the abstract representation modeled by the variational component, so as to eliminate the adverse effect of potential abnormal samples in the training data.
Specifically, the unsupervised detection process is invoked. First, according to a sliding window mechanism and a preset window length W, the event sequence is split into a plurality of subsequences; the subsequences are encoded by a feature embedding component, and the encoded sequence vectors serve as input to the subsequent unsupervised detection model. Feature embedding converts an event subsequence of longer length (W), [e_1, e_2, …, e_W], into a feature vector X_W of smaller dimension (d). The encoded feature vectors are used to train an unsupervised detection model based on LSTM and a variational encoder. Specifically, the model mainly comprises three parts: an LSTM encoder, a variational component and an LSTM decoder. The LSTM encoder compresses the encoded feature vector X_W into a latent abstract representation h_enc. The variational component models the data distribution of the abstract representation by calculating the mean μ and the standard deviation δ of the compressed representation, and generates the abstract representation to be reconstructed, h_enc', by using a standard normal random number ε as a seed. The LSTM decoder then decodes the generated abstract representation h_enc' to obtain the reconstructed feature vector. In this process, the stacked multi-layer LSTM network extracts the latent temporal dependencies in the event sequence, and the variational mechanism eliminates the adverse effects of abnormal samples that may be present in the training data by modeling the data distribution of the abstract representation of the event sequence.
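The stopping condition in step (iii) can be illustrated with a small sketch. The text does not say how similarity is measured, so cosine similarity averaged over a batch is an assumption made here purely for illustration, as are the names `cosine_similarity` and `should_stop`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_stop(
    inputs: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
    first_threshold: float,
) -> bool:
    """Assumed rule: stop when the mean similarity exceeds the first threshold."""
    sims = [cosine_similarity(x, r) for x, r in zip(inputs, reconstructions)]
    return sum(sims) / len(sims) > first_threshold


stop_now = should_stop([[1.0, 0.0]], [[1.0, 0.0]], first_threshold=0.9)
keep_training = not should_stop([[1.0, 0.0]], [[0.0, 1.0]], first_threshold=0.9)
```

In practice such a check would typically be evaluated once per epoch on the training sequences, alongside whatever loss the model is optimised with.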
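The sampling step of the variational component can be sketched as h_enc' = μ + δ · ε with ε drawn from a standard normal distribution, which is the usual reparameterization form for variational autoencoders. The sketch below is an assumption-laden illustration: μ and δ are passed in as plain lists rather than computed from h_enc by the model, and the function name `reparameterize` is introduced here for illustration only.

```python
import random
from typing import List


def reparameterize(mu: List[float], delta: List[float], rng: random.Random) -> List[float]:
    """Sample h_enc' = mu + delta * epsilon, with epsilon ~ N(0, 1) per dimension."""
    if len(mu) != len(delta):
        raise ValueError("mu and delta must have the same length")
    return [m + d * rng.gauss(0.0, 1.0) for m, d in zip(mu, delta)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 2.0]
delta = [0.1, 0.0, 0.3]
h_enc_prime = reparameterize(mu, delta, rng)
```

A dimension with δ = 0 is reproduced exactly, while larger δ values spread the sampled representation around μ; this sampling is what lets the decoder reconstruct from the modeled distribution rather than from a single memorised point.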
In step S3, based on the original subsequence and the reconstructed subsequence of the discrete event sequence, the abnormal attributes of the plurality of events are determined by using a judgment criterion.
In some embodiments, the step S3 specifically includes: aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing the position of the current event in its event sequence, wherein the M events fall into one of the following cases:
M mutually different independent events;
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each of the M events at the M positions by using a single-layer fully connected classifier model, and determining, by an anomaly evaluation component, the abnormal attributes of the M events by using the judgment criterion based on the occurrence probability;
wherein M and N are positive integers, and M ≥ N.
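A single-layer fully connected classifier that yields a probability for each candidate event at each position can be sketched as one linear map followed by a softmax. The softmax normalisation, the weight layout (one row per distinct event), and the names `softmax` and `position_probabilities` are assumptions for illustration; the text specifies only that the classifier is single-layer and fully connected.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def position_probabilities(
    features: List[List[float]],  # M reconstructed feature vectors, each of dimension d
    weights: List[List[float]],   # N rows (one per distinct event), each of dimension d
    bias: List[float],            # N bias terms
) -> List[List[float]]:
    """Return an M x N table: probability of each of N events at each of M positions."""
    table = []
    for x in features:
        logits = [
            sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)
        ]
        table.append(softmax(logits))
    return table


features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]
probs = position_probabilities(features, weights, bias)
```

Each row of the resulting table sums to one, so the entries at a position can be compared against a threshold and ranked against each other.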
In some embodiments, the determining, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability using the evaluation criterion includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein i and K are positive integers, 1 ≤ i ≤ M, and 1 ≤ K ≤ N.
Specifically, an anomaly output process is invoked. The process takes the original event subsequence and the reconstructed subsequence generated by the unsupervised detection module as input, and constructs a single-layer fully connected classifier model to obtain the probability of each event in the sequence at each position. An embedded anomaly evaluation component then evaluates these probabilities comprehensively to generate the final detection result. The anomaly evaluation component includes two parallel criteria: one checks whether the probability of the correct event in the reconstructed sequence is lower than a given threshold, and the other checks whether the correct event at each position of the reconstructed sequence ranks within the top K reconstruction probabilities. If the correct event fails either check, that is, its probability falls below the threshold or it does not rank within the top K, the subsequence is considered abnormal and an alert is raised.
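The two-part evaluation criterion can be sketched as follows. This is a hedged illustration under stated assumptions: the per-position probabilities are given as a dictionary keyed by event identifier, ties in probability are resolved in favor of the correct event, and the names `is_abnormal` and `abnormal_positions` are hypothetical.

```python
def is_abnormal(probabilities, true_event, threshold, top_k):
    """Apply the threshold check and the top-K ranking check at one position.

    probabilities: dict mapping event id -> probability at this position
    true_event:    the event found at this position in the original subsequence
    threshold:     the "second threshold" of the description
    top_k:         K, the number of leading ranks regarded as normal
    """
    p_true = probabilities.get(true_event, 0.0)
    # Criterion 1: the correct event must not fall below the threshold.
    if p_true < threshold:
        return True
    # Criterion 2: rank the correct event against all other events.
    # Assumption: only strictly higher probabilities push the rank down.
    rank = 1 + sum(
        1 for event, p in probabilities.items()
        if event != true_event and p > p_true
    )
    return rank > top_k


def abnormal_positions(original, prob_table, threshold, top_k):
    """Return the zero-based positions whose events are judged abnormal."""
    return [
        i
        for i, (event, probs) in enumerate(zip(original, prob_table))
        if is_abnormal(probs, event, threshold, top_k)
    ]
```

A subsequence would then be flagged whenever `abnormal_positions` returns a non-empty list. Note that the returned positions are zero-based, whereas the description counts events from i = 1.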
A second aspect of the invention discloses an abnormal event automatic detection system based on sequence reconstruction. FIG. 3 is a block diagram of an abnormal event automatic detection system based on sequence reconstruction according to an embodiment of the present invention; as shown in FIG. 3, the system 300 includes:
a first processing unit 301, configured to determine a discrete event sequence from multiple source logs by using a predefined event template, where the discrete event sequence is formed by splicing multiple event logs of the same user in a time sequence;
a second processing unit 302 configured to perform a splitting process on the discrete event sequence to obtain a number of original sub-sequences, and further obtain an input sub-sequence of an unsupervised detection model by performing feature embedding on the number of original sub-sequences, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed sub-sequence of the discrete event sequence based on the input sub-sequence;
a third processing unit 303 configured to determine abnormal properties of the plurality of events by using a criterion of evaluation based on the original subsequence and the reconstructed subsequence of the discrete event sequence.
According to the system of the second aspect of the present invention, the first processing unit 301 is specifically configured to:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
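The aggregation and splicing performed by the first processing unit can be sketched as below. The sketch assumes that template parsing has already turned each log line into a `(user_id, timestamp, event_id)` tuple; that tuple layout and the function name `build_event_sequences` are illustrative assumptions, not details taken from the source.

```python
from collections import defaultdict


def build_event_sequences(event_logs):
    """Group parsed event logs by user ID and splice each group in time order.

    event_logs: iterable of (user_id, timestamp, event_id) tuples
    Returns a dict mapping user_id -> list of event ids ordered by timestamp.
    """
    per_user = defaultdict(list)
    for user_id, timestamp, event_id in event_logs:
        per_user[user_id].append((timestamp, event_id))
    return {
        user_id: [event_id for _, event_id in sorted(entries, key=lambda p: p[0])]
        for user_id, entries in per_user.items()
    }
```

Because Python's sort is stable, events of one user that share a timestamp keep the order in which they were read.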
According to the system of the second aspect of the present invention, the second processing unit 302 is specifically configured to:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the feature embedding of the plurality of original subsequences specifically comprises: converting each original subsequence, through dimension reduction processing, into a feature vector of smaller dimension to serve as the input subsequence.
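The sliding window split can be illustrated with a short sketch. The source fixes only the window length; the `step` parameter (the stride between windows, defaulting to 1) and the function name are assumptions added for illustration.

```python
def split_into_subsequences(events, window_length, step=1):
    """Split a discrete event sequence into fixed-length subsequences.

    events:        the spliced event sequence of one user
    window_length: the preset window length W
    step:          stride between consecutive windows (assumed, default 1)
    """
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    last_start = len(events) - window_length
    # A sequence shorter than W yields no subsequence at all.
    return [
        list(events[start:start + window_length])
        for start in range(0, last_start + 1, step)
    ]
```

For a sequence of five events and W = 3, the sketch yields three overlapping subsequences, each shifted by one event.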
According to the system of the second aspect of the present invention, the second processing unit 302 is specifically configured to generate a reconstructed subsequence of the discrete event sequence based on the input subsequence, and specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
According to the system of the second aspect of the present invention, the LSTM encoder includes a multi-layer LSTM network, and the multi-layer LSTM network is configured to extract a timing dependency relationship of the training sequence, so as to compress the training sequence into an abstract representation based on the timing dependency relationship; the second processing unit 302 is specifically configured to generate the abstract representation to be reconstructed by using a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component, so as to eliminate adverse effects of potential abnormal samples in the training data.
According to the system of the second aspect of the present invention, the third processing unit 303 is specifically configured to:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing a position of a current event in the event sequence thereof, wherein the M events contain the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each event in M events at M positions by using a single-layer full-connection classifier model, and judging the abnormal attribute of the M events by using the judgment criterion based on the occurrence probability by using an abnormal evaluation component;
wherein M and N are positive integers, and M is more than or equal to N.
According to the system of the second aspect of the present invention, the third processing unit 303 is specifically configured to:
the determining, by the anomaly evaluation component, the anomaly attributes of the M events based on the occurrence probability by using the evaluation criterion specifically includes:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein i and K are positive integers, i is more than or equal to 1 and less than or equal to M, and K is more than or equal to 1 and less than or equal to N.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatically detecting abnormal events based on sequence reconstruction according to any one of the first aspect of the disclosure when executing the computer program.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device, which are connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, Near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that the structure shown in fig. 4 is only a partial block diagram related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the solution of the present application is applied, and a specific electronic device may include more or less components than those shown in the drawings, or combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program, which when executed by a processor, implements the steps in a method for automatic detection of abnormal events based on sequence reconstruction according to any one of the first aspect of the present disclosure.
In conclusion, the technical scheme provided by the invention makes full use of the latent timing dependencies in log sequence data and, by means of probability modeling and sequence reconstruction, alleviates the problems of over-fitting and under-fitting in the unsupervised detection process, thereby improving the accuracy of the abnormal event automatic detection system. The technical features and notable effects of the invention are as follows. First, the invention provides an abnormal event detection model based on sequence reconstruction, which makes full use of the latent timing dependencies in a discrete event sequence and alleviates the over-fitting problem of traditional prediction-based methods, thereby improving the accuracy of anomaly detection. Second, the invention provides a new dual-sequence anomaly detection evaluation criterion, which reduces missed detections by the system while increasing the accuracy of the detection result, and provides a new guiding principle for anomaly judgment in anomaly detection schemes.
It should be noted that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description. The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it is not to be construed as limiting the scope of the invention. For a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (8)
1. An abnormal event automatic detection method based on sequence reconstruction is characterized by comprising the following steps:
step S1, determining a discrete event sequence from a plurality of source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence;
step S2, splitting the discrete event sequence to obtain a plurality of original subsequences, and embedding the characteristics of the original subsequences to further obtain an input subsequence of an unsupervised detection model, wherein the unsupervised detection model comprises an LSTM encoder, a variation component and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
step S3, based on the original subsequence and the reconstructed subsequence of the discrete event sequence, judging the abnormal attribute of the events by using a judgment criterion;
wherein, in the step S3:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing a position of a current event in the event sequence thereof, wherein the M events contain the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each event in M events at M positions by using a single-layer full-connection classifier model, and judging the abnormal attribute of the M events by using the judgment criterion based on the occurrence probability by using an abnormal evaluation component; the method specifically comprises the following steps:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein M and N are positive integers, and M is more than or equal to N; i and K are positive integers, i is more than or equal to 1 and less than or equal to M, and K is more than or equal to 1 and less than or equal to N.
2. The method for automatically detecting abnormal events based on sequence reconstruction as claimed in claim 1, wherein the step S1 specifically includes:
analyzing the multi-source logs by using the predefined event template extracted from the audit logs to obtain a plurality of event logs;
and classifying and aggregating the event logs based on the user ID to determine a plurality of event logs of the same user, and splicing the event logs of the same user according to a time sequence to obtain the discrete event sequence.
3. The method for automatically detecting abnormal events based on sequence reconstruction as claimed in claim 2, wherein in said step S2:
splitting the discrete event sequence by using a sliding window mechanism and adopting a preset window length to obtain a plurality of original subsequences;
the feature embedding of the plurality of original subsequences specifically comprises: converting each original subsequence, through dimension reduction processing, into a feature vector of smaller dimension to serve as the input subsequence.
4. The method according to claim 3, wherein in the step S2, the step of generating the reconstructed subsequence of the discrete event sequence based on the input subsequence specifically includes:
dividing the input subsequence into a training sequence and a test sequence, wherein the training sequence is used for training the unsupervised detection model and comprises the following steps:
compressing the training sequence into an abstract representation using the LSTM encoder;
calculating the average value and the standard deviation of the compressed representation by using the variation component to simulate the data distribution of the abstract representation, and generating the abstract representation to be reconstructed by using a standard normal random number as a seed;
decoding the abstract representation to be reconstructed by using the LSTM decoder to obtain a reconstruction subsequence of a training process, and finishing the training process when the similarity between the reconstruction subsequence of the training process and the input subsequence is higher than a first threshold value;
and inputting the test sequence into a trained unsupervised detection model, and obtaining a reconstruction subsequence of the test process as a reconstruction subsequence of the discrete event sequence after the test sequence is processed by the LSTM encoder, the variation component and the LSTM decoder.
5. The method for automatically detecting abnormal events based on sequence reconstruction according to claim 4, wherein in the step S2:
the LSTM encoder comprises a plurality of layers of LSTM networks, and the plurality of layers of LSTM networks are used for extracting the timing dependence relation of the training sequence so as to compress the training sequence into an abstract representation based on the timing dependence relation;
and generating the abstract representation to be reconstructed by adopting a standard normal random number as a seed based on the data distribution of the abstract representation simulated by the variation component so as to eliminate the adverse effect of the potential abnormal sample in the training data.
6. An abnormal event automatic detection system based on sequence reconstruction, which is characterized by comprising:
the first processing unit is configured to determine a discrete event sequence from the multiple source logs by using a predefined event template, wherein the discrete event sequence is formed by splicing a plurality of event logs of the same user according to a time sequence;
a second processing unit configured to perform a splitting process on the discrete event sequence to obtain a number of original subsequences, and further obtain an input subsequence of an unsupervised detection model by performing feature embedding on the number of original subsequences, where the unsupervised detection model includes an LSTM encoder, a variation component, and an LSTM decoder, and is used for generating a reconstructed subsequence of the discrete event sequence based on the input subsequence;
a third processing unit configured to determine abnormal attributes of the plurality of events by using a criterion of evaluation based on the original subsequence and the reconstructed subsequence of the discrete event sequence;
wherein the third processing unit is specifically configured to:
aligning the original subsequence and the reconstructed subsequence of the discrete event sequence, each of the original subsequence and the reconstructed subsequence of the discrete event sequence containing M events, each of the M events corresponding to an event position, the event position characterizing a position of a current event in the event sequence thereof, wherein the M events contain the following cases:
M mutually different independent events; or
N mutually different independent events and M-N repeated events, wherein the repeated events are repetitions of events among the N mutually different independent events;
calculating the occurrence probability of each event in M events at M positions by using a single-layer full-connection classifier model, and judging the abnormal attribute of the M events by using the judgment criterion based on the occurrence probability by using an abnormal evaluation component; the method specifically comprises the following steps:
acquiring fixed positions of M events in the original subsequence;
in the reconstructed subsequence of the discrete sequence of events:
when the probability that the ith event occurs on the fixed position corresponding to the ith event is lower than a second threshold value, judging that the ith event is an abnormal event;
when the probability that the ith event appears at the fixed position corresponding to the ith event is not lower than a second threshold, acquiring the probability that other events except the ith event appear at the fixed position corresponding to the ith event as a probability set, and judging the ranking position of the probability that the ith event appears at the fixed position corresponding to the ith event in the probability set:
when the ranking position is within the top K, judging the ith event as a normal event;
otherwise, judging the ith event as an abnormal event;
wherein M and N are positive integers, and M is more than or equal to N; i and K are positive integers, i is more than or equal to 1 and less than or equal to M, and K is more than or equal to 1 and less than or equal to N.
7. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method for automatic detection of abnormal events based on sequence reconstruction according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the method for automatic detection of abnormal events based on sequence reconstruction as claimed in any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234545.3A CN114356743B (en) | 2022-03-11 | 2022-03-11 | Abnormal event automatic detection method and system based on sequence reconstruction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210234545.3A CN114356743B (en) | 2022-03-11 | 2022-03-11 | Abnormal event automatic detection method and system based on sequence reconstruction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114356743A (en) | 2022-04-15 |
CN114356743B (en) | 2022-06-07 |
Family
ID=81095192
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210234545.3A | Active | CN114356743B (en) | 2022-03-11 | 2022-03-11 | Abnormal event automatic detection method and system based on sequence reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114356743B (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021100179A1 (en) * | 2019-11-21 | 2021-05-27 | 日本電信電話株式会社 | Abnormality detection device, abnormality detection method, and program |
CN112784965B (en) * | 2021-01-28 | 2022-07-29 | 广西大学 | Large-scale multi-element time series data anomaly detection method oriented to cloud environment |
CN113568774B (en) * | 2021-07-27 | 2024-01-16 | 东华大学 | Multi-dimensional time sequence data real-time abnormality detection method using unsupervised deep neural network |
CN113868006B (en) * | 2021-10-09 | 2024-03-01 | 中国建设银行股份有限公司 | Time sequence detection method and device, electronic equipment and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114356743A (en) | 2022-04-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||