CN112202726A - System anomaly detection method based on context sensing - Google Patents

System anomaly detection method based on context sensing

Info

Publication number: CN112202726A (application CN202010948293.1A; granted as CN112202726B)
Authority: CN (China)
Prior art keywords: model, behavior, system behavior, detection, sequence
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 师斌, 杨圆哲, 郑庆华, 董博
Original and current assignee: Xian Jiaotong University
Application filed by Xian Jiaotong University; priority to CN202010948293.1A

Classifications

    • H04L63/1425 Traffic logging, e.g. anomaly detection (network security; detecting or protecting against malicious traffic by monitoring network traffic)
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G06N3/045 Combinations of networks (neural network architectures)
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods (neural networks)


Abstract

The invention discloses a context-aware system anomaly detection method that builds an anomaly detection model around a long short-term memory (LSTM) neural network to detect anomalous system behavior. The method obtains the context of each system call from the system stack at run time and builds a context information list; it then intercepts fixed-length system behavior sequences from the training data, compresses the state space with word-embedding vectors, and constructs an LSTM-based anomaly detection model which, once trained, can detect anomalous system call sequences. The invention further controls the false alarm rate through an adjustable parameter, and optimizes the model online by collecting falsely reported detection results and updating the model accordingly, thereby achieving efficient and accurate system anomaly detection.

Description

System anomaly detection method based on context sensing
Technical Field
The invention belongs to the field of system anomaly detection, and particularly relates to a context-aware system anomaly detection method.
Background
System anomaly detection techniques identify states, events, or other variables in a system that do not match expectations. Their advantage is that unknown attacks and anomalies can be defended against proactively, without relying on attack signatures. However, as system scale and application complexity grow, systems and applications contain ever more bugs and defects, and the attacks launched against them become increasingly sophisticated. This makes anomaly detection harder, and many conventional methods based on known anomaly signatures are no longer effective. In response, much research proposes modeling the system's normal behavior using features of normal operation rather than features observed under attack, and detecting potentially anomalous behavior by comparing the current behavior pattern against that model. Existing models of normal system behavior fall into two categories: deterministic models and probabilistic models. A deterministic model records all normal behaviors the system has ever exhibited and flags any unknown behavior seen during detection as anomalous; because it cannot distinguish behaviors by their probability of occurrence, it suffers from a high false alarm rate and a low detection rate. A probabilistic model instead computes, from historical data, the probability of each currently observed behavior, and judges a behavior anomalous if its probability falls below a confidence-interval threshold. The following patents provide reference points for system anomaly detection methods:
Document 1. A distributed system anomaly detection method (201110093278.4)
Document 2. An anomaly detection system based on neural networks in cloud computing (201210559741.4)
Document 3. A method and system for log stream anomaly detection (201710607485.4)
Document 1 determines the historical relevance of measured attributes from historical information, generates a partitioned attribute-relationship network model, computes relevance for newly acquired data, updates the network model, and judges from the partition result whether an anomaly has occurred. The problem with this method is that it considers only the single aspect of similarity, so the model features are insufficient and the information provided is one-sided.
Document 2 collects logs on the monitored hosts, defines anomalous event types with a neural network algorithm, updates the agent rule base, and generates a response when a defined anomaly signature is matched. The problem with this approach is its large training scale.
Document 3 performs anomaly detection on a log stream with a detection model: it extracts data features from static statistics, feeds the features into an initial model, trains the model, then acquires the log stream and updates the data. The problem is that only static statistical features are extracted and context information is not considered.
All of the above documents fail to make good use of context information, and therefore cannot detect the many attacks that reuse the system's own normal code. In general, current anomaly detection work lacks context information, which limits the data features it can draw on and creates a bottleneck for detection performance. There is thus an urgent need to incorporate context data into anomaly detection models.
Disclosure of Invention
The invention aims to provide a context-aware system anomaly detection method that can learn and update its parameters in real time, solving the problem that existing models cannot be updated online.
The invention is realized by adopting the following technical scheme:
a system anomaly detection method based on context awareness comprises the following steps:
1) obtaining contextual system behavior information from system calls;
2) representing different system behaviors with different hash values;
3) compressing the state with word-embedding vectors, constructing an anomaly detection model with a long short-term memory neural network, and training the model on the hash-valued system behaviors;
4) performing anomaly detection with the trained model.
In a further development of the invention, the method further comprises: adjusting a parameter to control the false alarm rate, collecting falsely reported detection results, and updating the existing model with these results to improve accuracy.
The invention further improves the method, and the specific implementation steps of the step 1) are as follows:
step1, acquiring system call name and program address
Each time a system call occurs, obtain the system call name and the current PC (program counter); the PC gives the program address from which the system call was invoked;
step2. build context information list
Collect all return addresses in the system stack, extract them into a context information list, and add the current PC as an element to the list; the context information list records the system's call-structure information;
step3. recursive function processing
If a pair of identical return addresses is found in the stack, which may indicate a recursive call, the system removes all return addresses between them from the context information list.
The invention further improves the method, and the specific implementation steps of the step 2) are as follows:
step1. System behavior representation
Define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i denoting one system behavior. Define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i is the behavior at position i in the sequence; m_i takes any one of the N elements of S, and its value depends on the system behaviors that occurred before m_i;
The input of the model is the most recent system history, and the output is a probability distribution over the n system behaviors in S, each entry giving the probability that the next system behavior is s_i. If the next system behavior to be predicted is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, each m_i taking its value in S (different elements of different w sequences may take the same value). The output of the model is the conditional probability distribution P[m_t = s_i | w] over the possible values of m_t, where s_i ∈ S, i = 1, ..., n. If, during detection, the actual m_t falls outside the confidence interval of this conditional probability, it is judged anomalous;
step2. intercepting system behavior sequence
Using a stepping method, sequentially intercept system behavior sequences in a time window of length h from the training data, and update the model with each sequence to obtain, for every s_i ∈ S, its probability of being the next system behavior;
step3. anomaly type determination
The detection stage of the model is online and real-time: the system behavior log captured in real time is streamed into the detection model. Suppose the method is to detect whether an incoming system behavior m_t is anomalous; it then sends w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} into the model as input. The output is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_n: p_n}, describing the probability that each system behavior in S is m_t. The following anomalies may occur:
(1) the program that acquires system behavior in real time fails; this anomaly is a data acquisition anomaly;
(2) the training data may not cover all system behaviors, so the value of m_t in the detection stage may not be contained in the behavior set S; this anomaly is a system behavior anomaly;
(3) in general the m_t of the detection stage can take many values; the probabilities P(m_t = s_i | w), s_i ∈ S, i = 1, ..., n of the different values are sorted, and if the actually detected m_t is among the top k values by probability, the method considers it normal; otherwise it is considered anomalous, and this anomaly is a behavior sequence anomaly.
The invention is further improved in that the specific implementation steps of the step 3) are as follows:
step1. model simplification
Model simplification is performed first, assuming that the probability of a system behavior is determined by its nearest N predecessor behaviors rather than by the entire history; this corresponds to the assumption P(m_t = s_i | m_1, ..., m_{t-1}) = P(m_t = s_i | m_{t-N}, ..., m_{t-1}). Under this assumption, the probability is estimated by its maximum likelihood estimate, the relative frequency in the training data;
Given system behavior sequence training data W, training the LSTM reduces the deviation between each predicted next system behavior and the system behavior that actually occurs in the training data; that is, it learns a probability distribution P(m_t = s_i | m_1, ..., m_{t-1}) that maximizes the probability of the training data as a whole system behavior sequence;
step2. transfer history information
The state of each element in an LSTM node contains a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state. The state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and the output at that time;
step3. model calculation
In computing from the previous time node's state and the current data input, the LSTM has three stages: a forgetting stage, which selectively forgets the input passed from the previous state; a select-memory stage, which memorizes the current input; and an output stage, which determines through the computed output gate parameters which contents become the output of the current state;
step4. model training
During training, a gradient descent method is used to reduce the loss function of each input/output pair; the input contains a window w = {m_{t-h}, ..., m_{t-1}} composed of h system behaviors, the output training label is the value s_i of the next system behavior following window w, and the loss function is computed with categorical cross-entropy. After training, the output for a behavior sequence window w can be predicted, each system behavior in w being fed to its corresponding LSTM block.
The invention further improves the method, and the specific implementation steps of the step 4) are as follows:
The conditional probability distribution of the m_t values is sorted, and the top k results by probability are judged normal. The smaller k is, the higher the detection rate, but at the cost of a higher false alarm rate; the larger k is, the lower the false alarm rate, but genuine anomalies may go undetected.
The invention further improves the method, and the specific implementation steps of the step 5) are as follows:
Each anomaly is diagnosed manually; if it is a false alarm, the parameters in the original model are replaced with newly trained parameters to update the model online;
For a system behavior anomaly, the value of m_t in the detection stage is not contained in the behavior set S of the training data. If such a new m_t value is diagnosed as normal, it is added to S, and the embedding-layer and hidden-layer models are updated with this data as a training label; for this purpose, when the method fixes the input-layer dimension from the number of elements in S at the initial stage, it reserves room for the growth that can occur during real-time detection;
For a behavior sequence anomaly that manual diagnosis finds to be a false alarm, the data is used as a training label to update the hidden-layer weights, so that the next time this system behavior sequence is given, the probability of its next system behavior is increased; the model is updated on its existing basis, adjusting the parameter weights with the new training data.
The invention has at least the following beneficial technical effects:
according to the system abnormity detection method based on context awareness, the context characteristic information can generate a more expressive behavior model, so that the detection accuracy is improved. The method is a probabilistic model anomaly detection system trained based on a system call sequence. First, to address the problem of insufficient features of the solution model, the method proposes to enrich the existing system call information with stack information. The method can obtain the context information of the system call from the stack, and then the same system call with different contexts is treated differently, so that a more accurate normal behavior model is generated. Secondly, aiming at the problem of overlarge training scale, inspired by the function and performance advantages of a natural language processing model based on RNN (recurrent neural network), the method provides the purposes of compressing the state by using word embedding vectors and predicting the probabilities of different system calls by using an LSTM (long short term memory) neural network, thereby achieving the purpose of detecting abnormal system call sequences.
In conclusion, the invention is the first to propose compressing the state space with word-embedding vectors built from context information and training a probabilistic model with a long short-term memory neural network; the method captures the latent nonlinear, high-dimensional dependencies of each system call and thus detects system anomalies more effectively.
Drawings
FIG. 1 is a diagram illustrating the general structure of a context-aware system call exception detection mechanism.
Fig. 2 is a schematic overall framework flow diagram.
FIG. 3 is a flow chart of the application of the long short-term memory neural network.
FIG. 4 is a schematic diagram of the temporal unrolling of the LSTM model.
FIG. 5 is a schematic diagram of the overall LSTM neural network structure.
Detailed Description
The invention is further described below with reference to the following figures and examples.
The method models anomaly detection under the Linux operating system based on the open-source neural network library Keras. To illustrate the technical solution more clearly, the context-aware system anomaly detection method of the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the workflow of the context-aware system call anomaly detection mechanism mainly comprises system call interception and context acquisition, model training, and anomaly detection. In the left part of the figure, a large amount of system behavior sequence data is acquired during normal operation of the system, different system behaviors are represented by different hash values, and the anomaly detection model is trained on these behavior sequences. In the right part of the figure, the method feeds newly intercepted system behavior hash values into the detection model to test whether they are normal, and records the result. Since the training data may not cover all cases, and the system's normal behavior model may change over time, falsely reported detection results are also collected and used to update the existing model. As shown in fig. 2, the method proceeds as follows:
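The mapping from system behaviors to hash values can be sketched as follows. This is an illustrative sketch only: the key format (call name joined with the context return addresses) and the use of MD5 are assumptions, since the patent does not specify the hash construction.

```python
import hashlib

def behavior_hash(syscall_name, context):
    # One "system behavior" = a system call plus its context (the list of
    # return addresses ending with the current PC); hash it to a stable id.
    key = syscall_name + "|" + ",".join(hex(a) for a in context)
    return hashlib.md5(key.encode()).hexdigest()[:8]

# The same system call reached through different call paths yields different
# behavior ids, which is exactly why context makes the model more expressive.
h1 = behavior_hash("read", [0x400123, 0x400456, 0x4007ab])
h2 = behavior_hash("read", [0x400123, 0x4009cd, 0x4007ab])
```

The detection model then operates on these ids rather than on raw call names.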
s201, obtaining context information
Context information is highly effective for detecting execution-flow anomalies; a context-insensitive detection model has difficulty detecting anomalies in system functions that appear within a normal call sequence. Because code-reuse attacks can leverage existing code anywhere in process memory, the attacked system calls may be issued from any code segment while still conforming to the normal system call order; the incorrect function call stack, however, is easily identified through context information.
Step1, acquiring system call name and program address
Each time a system call occurs, the system call name and the current PC are acquired. On a 32-bit Linux system the system call entry address is stored in the IA32_SYSENTER_EIP MSR (model-specific register); on a 64-bit system it is stored in the LSTAR MSR. System calls are obtained by registering an interception point at the entry function; after interception, the system call number and other useful parameters can be read from the call's arguments. On 32-bit systems the system call parameters are stored in EAX, EBX, ECX, EDX, ESI, and EDI; on 64-bit systems they are stored in RAX, RDI, RSI, RDX, R10, R8, and R9; the parameter information is obtained by reading these registers directly.
In Linux, most context information hangs off the task_struct of the current process, so this structure must be acquired first. When a process is created, the Linux kernel reserves space in the kernel address space for process-related information; this space is a union occupying 2 contiguous memory pages, and the process's task_struct pointer is stored in the thread_info within it. From the address of thread_info, the address of task_struct can be obtained. The process name is stored in the comm member of task_struct; the process number in the pid member; the user stack address in mm->start_stack for a process and in thread->sp for a thread. Once the user stack address is obtained, the backtrace function of glibc can be emulated to extract call stack information.
Step2. build context information list
The method collects and extracts all return addresses in the system stack into a context information list Context = {a_0, a_1, ..., a_{n-1}}, where n is the number of call stack levels and a_{n-1} is the return address of the function that issued the system call. The current PC is then appended to the list Context as a_n. This context information list mainly records the system's call-structure information.
Step3. recursive function processing
For recursive functions, the recursion depth is often strongly correlated with data such as parameters. The same recursive procedure thus tends to reach different depths under different parameters, yielding many different context lists; this can make training harder to converge or raise the false alarm rate. Once a pair of identical return addresses is found in the stack, they are likely the result of a recursive call, and the system removes all return addresses between them from the context information list.
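The recursion pruning described above can be sketched in Python. Collapsing the list back to the earlier occurrence of a repeated return address is one plausible reading of "removes all return addresses between them"; the exact handling of the duplicate pair is an assumption.

```python
def prune_recursion(context):
    # Walk the context list; when a return address repeats (a likely
    # recursive call), truncate back to its earlier occurrence so that
    # different recursion depths map to the same pruned context.
    pruned = []
    for addr in context:
        if addr in pruned:
            pruned = pruned[:pruned.index(addr) + 1]
        else:
            pruned.append(addr)
    return pruned
```

With this pruning, a function that recurses 3 times and one that recurses 30 times produce the same context list, avoiding an explosion of distinct behaviors.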
S202, data preprocessing
Step1. System behavior representation
The system call is the main interface through which applications access system resources and expresses system behavior well. The normal behaviors of a system are finite; define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i representing one system behavior. The sequence of system behaviors, in turn, represents the system's execution flow well. Define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i is the behavior at position i in the sequence. The value of m_i may be any of the N elements of S and depends strongly on the system behaviors that occurred before m_i.
The method models system call sequence anomaly detection as a multi-class classification problem in which each distinct system behavior is a class. The input of the model is recent system history; the output is a probability distribution over the n system behaviors in S, each entry giving the probability that the next system behavior is s_i. If the next system behavior to predict is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, each m_i taking its value in S (different elements of different w sequences may take the same value). The output of the model is the conditional probability distribution P[m_t = s_i | w], with s_i ∈ S (i = 1, ..., n). If, during detection, the actual m_t falls outside the confidence interval of this conditional probability, it is considered anomalous.
Step2. intercepting system behavior sequence
The training phase of the model relies on system behavior sequences captured while the system executes normally. The invention uses a stepping method to sequentially intercept system behavior sequences in a time window of length h from the training data and updates the model with each sequence to obtain, for every s_i ∈ S, its probability of being the next system behavior. Suppose the contiguous system behavior data intercepted during normal execution is {..., s36, s127, s87, s4, s10, s319, ...} and the window size h is 3. The input sequence and output label pairs used to train the model are then: {s36, s127, s87 → s4}, {s127, s87, s4 → s10}, {s87, s4, s10 → s319}.
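The stepping window and the worked example above can be sketched as:

```python
def make_training_pairs(seq, h):
    # Step a window of length h through the behavior sequence; each window
    # is a model input and the behavior that follows it is the training label.
    return [(seq[i:i + h], seq[i + h]) for i in range(len(seq) - h)]

pairs = make_training_pairs(["s36", "s127", "s87", "s4", "s10", "s319"], 3)
# pairs[0] is (["s36", "s127", "s87"], "s4"), matching the example above.
```

Each pair is one supervised example for the multi-class classifier: the window is the input, the following behavior is the class label.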
Step3. anomaly type determination
If the method is to detect whether an incoming system behavior m_t is anomalous, it needs to send w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} into the model as input. The output is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_n: p_n}, describing the probability that each system behavior in S is m_t. The following anomalies may occur:
(1) The program that acquires system behavior in real time fails. This program may fail due to an attack or other causes, so that the system call sequence and its context can no longer be acquired normally and the method has no input to feed the anomaly detection model. This anomaly is a data acquisition anomaly.
(2) The training data may not cover all system behaviors, so the m_t of the detection phase may not be contained in the behavior set S. In fact, the appearance of a system behavior never seen during normal execution is critical and very likely indicates a system anomaly. This anomaly is a system behavior anomaly.
(3) In general the m_t of the detection stage can take many values, all of them normal. The probabilities P(m_t = s_i | w), s_i ∈ S, i = 1, ..., n of the different values of m_t are sorted; if the actually detected m_t is among the top k values by probability, it is considered normal; otherwise it is considered anomalous. This anomaly is a behavior sequence anomaly.
The behavior anomaly type is then obtained from this anomalous-behavior feedback.
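The behavior-sequence decision in (3) reduces to a top-k test over the model's output distribution; a minimal sketch, with illustrative behavior names and probabilities:

```python
def is_normal(prob_dist, observed, k):
    # Sort candidate next behaviors by predicted probability; the observed
    # behavior m_t is normal iff it ranks within the top k candidates.
    ranked = sorted(prob_dist, key=prob_dist.get, reverse=True)
    return observed in ranked[:k]

# Toy output distribution P[m_t | w] for a four-behavior set.
dist = {"s1": 0.55, "s2": 0.30, "s3": 0.10, "s4": 0.05}
```

The parameter k is exactly the knob described later for trading detection rate against false alarm rate.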
S203, application of the long short-term memory (LSTM) neural network mechanism
The application of the long short-term memory neural network mechanism is described below in conjunction with figs. 3-5.
S301, model simplification
Suppose the system behavior sequence training data is W = {m_1, m_2, ..., m_t}. The invention uses the relative frequency of the subsequence w1 = {m_{t-N}, ..., m_{t-1}, m_t = s_i} with respect to the subsequence w2 = {m_{t-N}, ..., m_{t-1}} to estimate the probability that m_t takes the value s_i:
P(m_t = s_i | m_{t-N}, ..., m_{t-1}) = count(w1) / count(w2)
The method counts these frequencies over the entire historical system behavior sequence using a sliding window of size N.
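The relative-frequency (maximum likelihood) estimate can be sketched with plain counting; the behavior names here are illustrative:

```python
def count_subseq(seq, sub):
    # Number of positions where sub occurs as a contiguous subsequence of seq.
    n = len(sub)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == sub)

def ngram_probability(training, window, target):
    # MLE: relative frequency of (window followed by target) among all
    # occurrences of the window in the training sequence.
    denom = count_subseq(training, window)
    return count_subseq(training, window + [target]) / denom if denom else 0.0

history = ["a", "b", "c", "a", "b", "d"]
p = ngram_probability(history, ["a", "b"], "c")  # "ab" occurs twice, "abc" once
```

This is the baseline the LSTM improves on: the n-gram estimate sees only the last N behaviors, while the LSTM can retain longer-range state.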
Compared with traditional n-gram probability models, LSTM-based models can recognize more complex behavior patterns and maintain long memory states over the sequence. The method uses an LSTM neural network in the model to detect anomalies in the system behavior sequence. Given system behavior sequence training data W, training the LSTM minimizes the deviation between each predicted next system behavior and the actual system behavior in the training data; that is, it learns a probability distribution P(m_t = s_i | m_1, ..., m_{t-1}) that maximizes the probability of the training data's entire system behavior sequence.
S302, transferring historical information
Fig. 4 shows the unrolling of LSTM nodes over time. The state of each cell comprises a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state. The state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and output at that time. This is how historical information is propagated in the LSTM model. In this method, each system behavior in the monitoring-data time window (h) occupies one time point, so the single-layer network consists of h temporally unrolled LSTM blocks.
S303. model calculation
The computation the LSTM performs from the previous time node's state and the current data input comprises three main stages: 1) the forgetting stage, which selectively forgets the input passed in from the previous state; specifically, the computed forget gate parameters control which contents of the previous state are kept and which are forgotten; 2) the select-memory stage, which memorizes the current input; specifically, the computed input gate parameters control which important contents of the current input are recorded; 3) the output stage, which determines, through the computed output gate parameters, which contents become the output of the current state. Unlike conventional recurrent neural networks, the output of the LSTM can typically be obtained directly from H_t.
S304 model training
The model training process mainly assigns and adjusts the gate parameter weights so that the final output of the LSTM approaches the label of the training data. During training, gradient descent is used to reduce the loss function for each input/output pair. Each input is a window w = {m_{t-h}, ..., m_{t-1}} consisting of h system behaviors, and the output training label is the value s_i of the next system behavior corresponding to window w; the loss function is computed using categorical cross-entropy. After training is completed, the output for a behavior sequence window w can be predicted, with each system behavior in w placed in its corresponding LSTM block during prediction.
Fig. 5 shows the entire neural network structure. First, at the input layer, each single system behavior m_i in a time window is input to the model in one-hot encoded form; in other words, if S = {s_1, s_2, s_3, ..., s_N} is the set of all system behaviors and m_i takes the value s_j, then the input is an N-dimensional vector whose j-th element is 1 and whose other elements are 0. This input vector is very sparse, so at the embedding layer the method learns an embedding matrix W; multiplying the input by W compresses and embeds it into a 100-dimensional continuous space. At the hidden layer, the LSTM cells maintain an internal state that is updated at each time step. At the output layer, the method uses the softmax activation function to normalize the estimates of the next likely system behavior into a probability distribution, representing P[m_t = s_i | w] for each s_i ∈ S. Only one hidden layer is used in the structure shown in the figure, but more layers can be stacked in a real environment, according to the number of states implied by the data, to form a deep LSTM neural network.
S204, false alarm rate control
For the system-call anomaly detection scenario, directly using the classification result for anomaly detection can cause a high false alarm rate, because in a real scenario there may be multiple values of the prediction result m_t that are normal. For a given time window w = {open@context_1, read@context_2, ..., read@context_{h-1}}, m_t may be read@context_h (probability 0.8) or close@context_h (probability 0.2); both results are normal behavior. Directly using the classification result read@context_h would then classify the normal result close@context_h as abnormal, producing a false positive for the system. The method instead sorts the conditional probability distribution P[m_t = s_i | w] of the m_t values, where s_i ∈ S (i = 1, ..., N), and defines the top k results by probability as normal; all other results are judged abnormal. Obviously, the smaller the value of k, the higher the detection rate, but at the cost of a higher false alarm rate; the larger the value of k, the lower the false alarm rate, but true anomalies may go undetected. Therefore, to balance the detection rate and the false alarm rate, the method adjusts k according to the actual situation.
S205. on-line model update
The invention provides a feedback mechanism to update the model online: each anomaly is diagnosed manually, and if the anomaly is a false alarm, the model is updated online by replacing the parameters in the original model with the newly trained parameters.
In the invention, when the dimension of the input layer is determined in the initial stage according to the number of elements in the set S, extra capacity is reserved for behaviors that may be added during real-time detection. In a system behavior anomaly, the value of m_t at the detection stage is not contained in the set S of system behaviors in the training data; if such a new m_t value is diagnosed as normal, the method adds it to the set S and updates the embedding layer and hidden layer models with this data as a training label. In a behavior sequence anomaly, if manual diagnosis finds the result to be a false alarm, the data is used as a training label to update the weights of the hidden layer model, so that the next time this system behavior sequence is given, the probability corresponding to its next system behavior is increased. The model update only needs to adjust the parameter weights with the new training data on the basis of the original model.
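The reserved-capacity behavior set could be sketched as follows; the class, the `reserve` parameter, and the exhaustion behavior are assumptions for illustration, and the retraining of the embedding and hidden layers is out of scope here:

```python
class BehaviorVocabulary:
    """Sketch of the reserved-capacity set S: the input layer is sized
    with spare slots so that new behaviors diagnosed as normal during
    real-time detection can be added without rebuilding the model."""
    def __init__(self, initial_behaviors, reserve=64):
        self.capacity = len(initial_behaviors) + reserve
        self.index = {b: i for i, b in enumerate(initial_behaviors)}

    def contains(self, behavior):
        return behavior in self.index

    def add_if_normal(self, behavior):
        """Called after manual diagnosis marks a new behavior as normal."""
        if behavior in self.index:
            return self.index[behavior]
        if len(self.index) >= self.capacity:
            raise ValueError("reserved capacity exhausted; retrain with a larger input layer")
        self.index[behavior] = len(self.index)
        return self.index[behavior]
```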

Claims (7)

1. A system anomaly detection method based on context awareness, characterized by comprising the following steps:
1) obtaining context system behavior information according to the system call;
2) different system behaviors are represented by different hash values;
3) compressing the state by using the word embedding vector, constructing an anomaly detection model by using a long-short term memory neural network, and training the model by using a system behavior represented by a hash value;
4) carrying out anomaly detection by using the trained model.
2. The method of claim 1, further comprising: adjusting parameters to control the false alarm rate, collecting false alarm detection results, and updating the existing model according to these results to improve accuracy.
3. The context-aware-based system anomaly detection method according to claim 1 or 2, wherein the specific implementation steps of step 1) are as follows:
step1, acquiring system call name and program address
on every system call, obtaining the system call name and the current PC, namely the program counter, where the PC represents the program address from which the system call is invoked;
step2, constructing a context information list
collecting all return addresses in the system stack and extracting them into a context information list, adding the current PC (program counter) as an element to the list, and using the context information list to record the calling structure information of the system;
step3. recursive function processing
if a pair of identical return addresses is found in the stack, which may indicate a recursive call, the system removes all return addresses between them from the context information list.
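Steps 1-3 of the context construction can be sketched as follows; the function name is illustrative and addresses are plain integers here:

```python
def build_context_list(return_addresses, pc):
    """Build the context information list from the stack's return
    addresses plus the current PC, pruning the frames between any pair
    of identical return addresses (a likely recursive call)."""
    ctx = list(return_addresses)
    i = 0
    while i < len(ctx):
        try:
            # look for a later occurrence of the same return address
            j = ctx.index(ctx[i], i + 1)
        except ValueError:
            i += 1
            continue
        # remove all return addresses strictly between the identical pair
        del ctx[i + 1:j]
        i += 1
    ctx.append(pc)  # the current PC joins the context list as an element
    return ctx
```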
4. The method for detecting system anomaly based on context awareness according to claim 3, wherein the specific implementation steps of step 2) are as follows:
step1. System behavior representation
define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i being a system behavior; define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i represents the behavior at position i in the sequence; then m_i is any one of the N elements in S, and its value depends on the system behaviors that occurred before m_i;
the input to the model is the most recent system behavior history, and the output is the probability distribution over the N system behaviors belonging to S, each value representing the probability that the next system behavior to occur is s_i; if the next system behavior to be predicted is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, the value of each m_i is in the set S, and different elements in different w sequences may have the same value; the output of the model is the conditional probability distribution P[m_t = s_i | w] of the m_t value, where s_i ∈ S, i = 1, ..., N; if, during detection, the actual m_t falls outside the confidence interval of the conditional probability, the result is determined to be abnormal;
step2, intercepting system behavior sequence
using a stepping method, sequentially intercepting system behavior sequences of a time window of length h in the training data, and updating the model accordingly to obtain the probability distribution of each s_i ∈ S as the next system behavior;
step3. abnormality type determination
the detection stage of the model is online and real-time, and the system behavior log captured in real time is streamed into the detection model; suppose the method is to detect whether an incoming system behavior m_t is abnormal; the method takes w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} and sends it as input into the model; the output at this time is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_N: p_N}, describing the probability of each system behavior in S being m_t; the following anomalies may occur:
(1) acquiring program failure of system behavior in real time, wherein the exception is data acquisition exception;
(2) the training data may not cover all system behaviors, so the value of m_t in the detection stage may not be included in the set S of system behaviors; this anomaly is a system behavior anomaly;
(3) in general, m_t in the detection stage has many possible values; the probabilities P(m_t = s_i | w), s_i ∈ S (i = 1, ..., N), of the different values are sorted, and if the actually detected m_t value is among the top k values by probability, the method considers it normal; otherwise it is considered an anomaly, which is a behavior sequence anomaly.
5. The method for detecting system anomaly based on context awareness according to claim 4, wherein the specific implementation steps of step 3) are as follows:
step1 model simplification
model simplification is first performed, assuming that the probability of a system behavior is determined by its nearest N preceding behaviors rather than by all preceding behaviors in the entire history, which is equivalent to the assumption P(m_t = s_i | m_1, ..., m_{t-1}) = P(m_t = s_i | m_{t-N}, ..., m_{t-1}); under this assumption, the probability is computed using the relative frequency in the training data to give its maximum likelihood estimate;
given a sequence of system behaviors training data W, the process of training LSTM is to reduce the deviation of each predicted next system behavior from the actual occurring system behavior in the training data, which is to learn a probability distribution P (m)t=si|m1,...,mt-1) The distribution maximizes the probability of training data as a whole system behavior sequence;
step2, transmitting historical information
the state of each element in the LSTM node contains a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state; the state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and output at that time;
step3. model calculation
the LSTM computation from the previous time node's state and the current data input has three stages: a forgetting stage, which selectively forgets the input passed in from the previous state; a select-memory stage, which memorizes the current input; and an output stage, which determines, through the computed output gate parameters, which contents become the output of the current state;
step4 model training
during training, a gradient descent method is used to reduce the loss function of each input/output pair; the input contains a window w = {m_{t-h}, ..., m_{t-1}} composed of h system behaviors, and the output training label is the value s_i of the next system behavior corresponding to window w; the loss function is computed using categorical cross-entropy; after training is completed, the output of the behavior sequence window w can be predicted, with each system behavior in w placed in its corresponding LSTM block during prediction.
6. The method for detecting system anomaly based on context awareness according to claim 5, wherein the specific implementation steps of step 4) are as follows:
to mtSorting the conditional probability distribution of the values, and judging whether the k results before the probability are normal or not; the smaller the value of k is, the higher the detection rate is, but the higher the false alarm rate is at the cost; the larger the value of k is, the smaller the false alarm rate is, but the true abnormality can not be detected.
7. The method for detecting system anomaly based on context awareness according to claim 6, wherein the specific implementation steps of step 5) are as follows:
manually diagnosing each anomaly, and if the anomaly is a false alarm, replacing the parameters in the original model with the newly trained parameters to update the model online;
in a system behavior anomaly, the value of m_t at the detection stage is not contained in the set S of system behaviors in the training data; if such a new m_t value is diagnosed as normal, it is added to the set S, and the embedding layer model and hidden layer model are updated with this data as a training label; therefore, when the method determines the dimension of the input layer in the initial stage according to the number of elements in the set S, capacity is reserved for the behaviors that may be added during real-time detection;
in a behavior sequence anomaly, if manual diagnosis finds the result to be a false alarm, the data is used as a training label to update the weights of the hidden layer model, so that when the system behavior sequence is given next time, the probability corresponding to its next system behavior is increased; the model update only adjusts the parameter weights with new training data on the basis of the original model.
CN202010948293.1A 2020-09-10 2020-09-10 System anomaly detection method based on context sensing Active CN112202726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010948293.1A CN112202726B (en) 2020-09-10 2020-09-10 System anomaly detection method based on context sensing


Publications (2)

Publication Number Publication Date
CN112202726A true CN112202726A (en) 2021-01-08
CN112202726B CN112202726B (en) 2021-11-19

Family

ID=74015607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010948293.1A Active CN112202726B (en) 2020-09-10 2020-09-10 System anomaly detection method based on context sensing

Country Status (1)

Country Link
CN (1) CN112202726B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711380A (en) * 2019-01-03 2019-05-03 电子科技大学 A kind of timing behavior segment generation system and method based on global context information
CN109977118A (en) * 2019-03-21 2019-07-05 东南大学 A kind of abnormal domain name detection method of word-based embedded technology and LSTM
CN111209168A (en) * 2020-01-14 2020-05-29 中国人民解放军陆军炮兵防空兵学院郑州校区 Log sequence anomaly detection framework based on nLSTM-self attention
CN111291181A (en) * 2018-12-10 2020-06-16 百度(美国)有限责任公司 Representation learning for input classification via topic sparse autoencoder and entity embedding
CN111310583A (en) * 2020-01-19 2020-06-19 中国科学院重庆绿色智能技术研究院 Vehicle abnormal behavior identification method based on improved long-term and short-term memory network
CN111371806A (en) * 2020-03-18 2020-07-03 北京邮电大学 Web attack detection method and device
CN111370122A (en) * 2020-02-27 2020-07-03 西安交通大学 Knowledge guidance-based time sequence data risk prediction method and system and application thereof
WO2020159802A1 (en) * 2019-02-02 2020-08-06 Microsoft Technology Licensing, Llc Deep learning enhanced code completion system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MEI Yudong et al.: "A Software System Anomaly Detection Method Based on Log Information and CNN-text", Chinese Journal of Computers *
WANG Yi et al.: "A Deep Neural Network Language Model Combining an LSTM and CNN Hybrid Architecture", Journal of the China Society for Scientific and Technical Information *
LU Peiyao: "Research and Implementation of an LSTM-based Software System Anomaly Detection Method", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860484A (en) * 2021-01-29 2021-05-28 深信服科技股份有限公司 Container runtime abnormal behavior detection and model training method and related device
CN112882899A (en) * 2021-02-25 2021-06-01 中国烟草总公司郑州烟草研究院 Method and device for detecting log abnormity
CN114244603A (en) * 2021-12-15 2022-03-25 中国电信股份有限公司 Anomaly detection and comparison embedded model training and detection method, device and medium
CN114244603B (en) * 2021-12-15 2024-02-23 中国电信股份有限公司 Anomaly detection and comparison embedded model training and detection method, device and medium
CN116991681A (en) * 2023-09-27 2023-11-03 北京中科润宇环保科技股份有限公司 NLP-combined fly ash fusion processing system abnormality report identification method and server
CN116991681B (en) * 2023-09-27 2024-01-30 北京中科润宇环保科技股份有限公司 NLP-combined fly ash fusion processing system abnormality report identification method and server

Also Published As

Publication number Publication date
CN112202726B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN112202726B (en) System anomaly detection method based on context sensing
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN111210024A (en) Model training method and device, computer equipment and storage medium
CN111652290B (en) Method and device for detecting countermeasure sample
EP3539060A1 (en) Systems and methods for continuously modeling industrial asset performance
CN111881722B (en) Cross-age face recognition method, system, device and storage medium
CN109361648B (en) Method and device for detecting hidden attack of industrial control system
CN113438114B (en) Method, device, equipment and storage medium for monitoring running state of Internet system
CN113328908B (en) Abnormal data detection method and device, computer equipment and storage medium
CN112016097B (en) Method for predicting network security vulnerability time to be utilized
KR102359090B1 (en) Method and System for Real-time Abnormal Insider Event Detection on Enterprise Resource Planning System
CN117041017B (en) Intelligent operation and maintenance management method and system for data center
CN110162958B (en) Method, apparatus and recording medium for calculating comprehensive credit score of device
CN113468520A (en) Data intrusion detection method applied to block chain service and big data server
CN115396204A (en) Industrial control network flow abnormity detection method and device based on sequence prediction
Lee et al. Learning in the wild: When, how, and what to learn for on-device dataset adaptation
CN110166422B (en) Domain name behavior recognition method and device, readable storage medium and computer equipment
You et al. sBiLSAN: Stacked bidirectional self-attention lstm network for anomaly detection and diagnosis from system logs
CN115017015B (en) Method and system for detecting abnormal behavior of program in edge computing environment
JP7331369B2 (en) Abnormal Sound Additional Learning Method, Data Additional Learning Method, Abnormality Degree Calculating Device, Index Value Calculating Device, and Program
CN116842520A (en) Anomaly perception method, device, equipment and medium based on detection model
CN115983087A (en) Method for detecting time sequence data abnormity by combining attention mechanism and LSTM and terminal
CN111221896A (en) User behavior prediction method and device, electronic equipment and storage medium
CN114610613A (en) Online real-time micro-service call chain abnormity detection method
CN112560252A (en) Prediction method for residual life of aircraft engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant