CN112202726A - System anomaly detection method based on context sensing - Google Patents

System anomaly detection method based on context sensing

Info

Publication number: CN112202726A (application CN202010948293.1A; granted as CN112202726B)
Authority: CN (China)
Prior art keywords: model, behavior, system behavior, detection, sequence
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 师斌, 杨圆哲, 郑庆华, 董博
Original and current assignee: Xian Jiaotong University
Application filed by Xian Jiaotong University; priority to CN202010948293.1A

Classifications

    • H04L63/1425 Traffic logging, e.g. anomaly detection (network security; detecting or protecting against malicious traffic by monitoring network traffic)
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G06N3/045 Combinations of networks (neural network architectures)
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods (neural networks)


Abstract

The invention discloses a context-aware system anomaly detection method that builds an anomaly detection model around a long short-term memory (LSTM) neural network to detect anomalous system behavior. The method obtains the context of each system call from the system stack at run time and builds a context information list; it then intercepts fixed-length system behavior sequences from the training data, compresses the state space with word-embedding vectors, and constructs an LSTM-based anomaly detection model which, once trained, can detect anomalous system call sequences. The invention further controls the false alarm rate through an adjustable parameter, and optimizes the model online by collecting falsely reported detection results and updating the model accordingly, thereby achieving efficient and accurate system anomaly detection.

Description

System anomaly detection method based on context sensing
Technical Field
The invention belongs to the field of system anomaly detection, and particularly relates to a context-aware system anomaly detection method.
Background
System anomaly detection techniques identify states, events, or other variables in a system that do not match expectations. Their advantage is that unknown attacks and anomalies can be defended against proactively, without relying on attack signatures. However, as system scale and application complexity grow, systems and applications contain ever more bugs and defects, and the attacks launched against them become increasingly sophisticated. This makes anomaly detection harder, and many conventional methods based on known anomaly signatures are no longer effective. In response, much research proposes modeling the system's normal behavior using features of normal operation rather than features observed under attack, and detecting potentially anomalous behavior by comparing the current behavior pattern against that model. Existing models of normal system behavior fall into two categories: deterministic models and probabilistic models. A deterministic model records all normal behaviors the system has ever exhibited and flags any unknown behavior seen during detection as anomalous; because it cannot distinguish behaviors by their probability of occurrence, it suffers from a high false alarm rate and a low detection rate. A probabilistic model instead computes, from historical data, the probability of each currently observed behavior, and judges a behavior anomalous if its probability falls below a confidence-interval threshold. The following patents provide reference points for system anomaly detection methods:
Document 1. A distributed system anomaly detection method (201110093278.4)
Document 2. An anomaly detection system based on neural networks in cloud computing (201210559741.4)
Document 3. A method and system for log stream anomaly detection (201710607485.4)
Document 1 determines the historical relevance of measured attributes from historical information, generates a partitioned attribute-relationship network model, computes relevance for newly acquired data, updates the network model, and judges from the partition result whether an anomaly has occurred. The problem with this method is that it considers only the single aspect of similarity, so the model features are insufficient and the information provided is one-sided.
Document 2 collects logs on the monitored hosts, defines anomalous event types with a neural network algorithm, updates the agent rule base, and generates a response when a defined anomaly signature is matched. The problem with this approach is its large training scale.
Document 3 performs anomaly detection on a log stream with a detection model: it extracts data features from static statistics, feeds the features into an initial model, trains the model, then acquires the log stream and updates the data. The problem is that only static statistical features are extracted and context information is not considered.
All of the above documents fail to make good use of context information, and therefore cannot detect the many attacks that reuse the system's own normal code. In general, current anomaly detection work lacks context information, which limits the data features it can draw on and creates a bottleneck for detection performance. There is thus an urgent need to incorporate context data into anomaly detection models.
Disclosure of Invention
The invention aims to provide a context-aware system anomaly detection method that can learn and update its parameters in real time, solving the problem that existing models cannot be updated online.
The invention is realized by adopting the following technical scheme:
a system anomaly detection method based on context awareness comprises the following steps:
1) obtaining contextual system behavior information from system calls;
2) representing different system behaviors with different hash values;
3) compressing the state with word-embedding vectors, constructing an anomaly detection model with a long short-term memory neural network, and training the model on the hash-valued system behaviors;
4) performing anomaly detection with the trained model.
In a further development of the invention, the method further comprises: adjusting a parameter to control the false alarm rate, collecting falsely reported detection results, and updating the existing model with these results to improve accuracy.
The invention further improves the method, and the specific implementation steps of the step 1) are as follows:
step1, acquiring system call name and program address
Each time a system call occurs, obtain the system call name and the current PC (program counter); the PC gives the program address from which the system call was invoked;
step2. build context information list
Collect all return addresses in the system stack, extract them into a context information list, and add the current PC as an element to the list; the context information list records the system's call-structure information;
step3. recursive function processing
If a pair of identical return addresses is found in the stack, which may indicate a recursive call, the system removes all return addresses between them from the context information list.
The invention further improves the method, and the specific implementation steps of the step 2) are as follows:
step1. System behavior representation
Define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i denoting one system behavior. Define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i is the behavior at position i in the sequence; m_i takes any one of the N elements of S, and its value depends on the system behaviors that occurred before m_i;
The input of the model is the most recent system history, and the output is a probability distribution over the n system behaviors in S, each entry giving the probability that the next system behavior is s_i. If the next system behavior to be predicted is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, each m_i taking its value in S (different elements of different w sequences may take the same value). The output of the model is the conditional probability distribution P[m_t = s_i | w] over the possible values of m_t, where s_i ∈ S, i = 1, ..., n. If, during detection, the actual m_t falls outside the confidence interval of this conditional probability, it is judged anomalous;
step2. intercepting system behavior sequence
Using a stepping method, sequentially intercept system behavior sequences in a time window of length h from the training data, and update the model with each sequence to obtain, for every s_i ∈ S, its probability of being the next system behavior;
step3. anomaly type determination
The detection stage of the model is online and real-time: the system behavior log captured in real time is streamed into the detection model. Suppose the method is to detect whether an incoming system behavior m_t is anomalous; it then sends w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} into the model as input. The output is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_n: p_n}, describing the probability that each system behavior in S is m_t. The following anomalies may occur:
(1) the program that acquires system behavior in real time fails; this anomaly is a data acquisition anomaly;
(2) the training data may not cover all system behaviors, so the value of m_t in the detection stage may not be contained in the behavior set S; this anomaly is a system behavior anomaly;
(3) in general the m_t of the detection stage can take many values; the probabilities P(m_t = s_i | w), s_i ∈ S, i = 1, ..., n of the different values are sorted, and if the actually detected m_t is among the top k values by probability, the method considers it normal; otherwise it is considered anomalous, and this anomaly is a behavior sequence anomaly.
The invention is further improved in that the specific implementation steps of the step 3) are as follows:
step1. model simplification
Model simplification is performed first, assuming that the probability of a system behavior is determined by its nearest N predecessor behaviors rather than by the entire history; this corresponds to the assumption P(m_t = s_i | m_1, ..., m_{t-1}) = P(m_t = s_i | m_{t-N}, ..., m_{t-1}). Under this assumption, the probability is estimated by its maximum likelihood estimate, the relative frequency in the training data;
Given system behavior sequence training data W, training the LSTM reduces the deviation between each predicted next system behavior and the system behavior that actually occurs in the training data; that is, it learns a probability distribution P(m_t = s_i | m_1, ..., m_{t-1}) that maximizes the probability of the training data as a whole system behavior sequence;
step2. transfer history information
The state of each element in an LSTM node contains a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state. The state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and the output at that time;
step3. model calculation
In computing from the previous time node's state and the current data input, the LSTM has three stages: a forgetting stage, which selectively forgets the input passed from the previous state; a select-memory stage, which memorizes the current input; and an output stage, which determines through the computed output gate parameters which contents become the output of the current state;
step4. model training
During training, a gradient descent method is used to reduce the loss function of each input/output pair; the input contains a window w = {m_{t-h}, ..., m_{t-1}} composed of h system behaviors, the output training label is the value s_i of the next system behavior following window w, and the loss function is computed with categorical cross-entropy. After training, the output for a behavior sequence window w can be predicted, each system behavior in w being fed to its corresponding LSTM block.
The invention further improves the method, and the specific implementation steps of the step 4) are as follows:
The conditional probability distribution of the m_t values is sorted, and the top k results by probability are judged normal. The smaller k is, the higher the detection rate, but at the cost of a higher false alarm rate; the larger k is, the lower the false alarm rate, but genuine anomalies may go undetected.
The invention further improves the method, and the specific implementation steps of the step 5) are as follows:
Each anomaly is diagnosed manually; if it is a false alarm, the parameters in the original model are replaced with newly trained parameters to update the model online;
For a system behavior anomaly, the value of m_t in the detection stage is not contained in the behavior set S of the training data. If such a new m_t value is diagnosed as normal, it is added to S, and the embedding-layer and hidden-layer models are updated with this data as a training label; for this purpose, when the method fixes the input-layer dimension from the number of elements in S at the initial stage, it reserves room for the growth that can occur during real-time detection;
For a behavior sequence anomaly that manual diagnosis finds to be a false alarm, the data is used as a training label to update the hidden-layer weights, so that the next time this system behavior sequence is given, the probability of its next system behavior is increased; the model is updated on its existing basis, adjusting the parameter weights with the new training data.
The invention has at least the following beneficial technical effects:
according to the system abnormity detection method based on context awareness, the context characteristic information can generate a more expressive behavior model, so that the detection accuracy is improved. The method is a probabilistic model anomaly detection system trained based on a system call sequence. First, to address the problem of insufficient features of the solution model, the method proposes to enrich the existing system call information with stack information. The method can obtain the context information of the system call from the stack, and then the same system call with different contexts is treated differently, so that a more accurate normal behavior model is generated. Secondly, aiming at the problem of overlarge training scale, inspired by the function and performance advantages of a natural language processing model based on RNN (recurrent neural network), the method provides the purposes of compressing the state by using word embedding vectors and predicting the probabilities of different system calls by using an LSTM (long short term memory) neural network, thereby achieving the purpose of detecting abnormal system call sequences.
In conclusion, the invention is the first to propose compressing the state space with word-embedding vectors built from context information and training a probabilistic model with a long short-term memory neural network; the method captures the latent nonlinear, high-dimensional dependencies of each system call and thus detects system anomalies more effectively.
Drawings
FIG. 1 is a diagram illustrating the general structure of a context-aware system call exception detection mechanism.
Fig. 2 is a schematic overall framework flow diagram.
FIG. 3 is a flow chart of the application of the long short-term memory neural network.
FIG. 4 is a schematic diagram of the temporal unrolling of the LSTM model.
FIG. 5 is a schematic diagram of the overall LSTM neural network structure.
Detailed Description
The invention is further described below with reference to the following figures and examples.
The method models anomaly detection under the Linux operating system based on the open-source neural network library Keras. To illustrate the technical solution more clearly, the context-aware system anomaly detection method of the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the workflow of the context-aware system call anomaly detection mechanism mainly comprises system call interception and context acquisition, model training, and anomaly detection. In the left part of the figure, a large amount of system behavior sequence data is acquired during normal operation of the system, different system behaviors are represented by different hash values, and the anomaly detection model is trained on these behavior sequences. In the right part of the figure, the method feeds newly intercepted system behavior hash values into the detection model to test whether they are normal, and records the result. Since the training data may not cover all cases, and the system's normal behavior model may change over time, falsely reported detection results are also collected and used to update the existing model. As shown in fig. 2, the method proceeds as follows:
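The mapping from system behaviors to hash values can be sketched as follows. This is an illustrative sketch only: the key format (call name joined with the context return addresses) and the use of MD5 are assumptions, since the patent does not specify the hash construction.

```python
import hashlib

def behavior_hash(syscall_name, context):
    # One "system behavior" = a system call plus its context (the list of
    # return addresses ending with the current PC); hash it to a stable id.
    key = syscall_name + "|" + ",".join(hex(a) for a in context)
    return hashlib.md5(key.encode()).hexdigest()[:8]

# The same system call reached through different call paths yields different
# behavior ids, which is exactly why context makes the model more expressive.
h1 = behavior_hash("read", [0x400123, 0x400456, 0x4007ab])
h2 = behavior_hash("read", [0x400123, 0x4009cd, 0x4007ab])
```

The detection model then operates on these ids rather than on raw call names.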
s201, obtaining context information
Context information is highly effective for detecting execution-flow anomalies; a context-insensitive detection model has difficulty detecting anomalies in system functions that appear within a normal call sequence. Because code-reuse attacks can leverage existing code anywhere in process memory, the attacked system calls may be issued from any code segment while still conforming to the normal system call order; the incorrect function call stack, however, is easily identified through context information.
Step1, acquiring system call name and program address
Each time a system call occurs, the system call name and the current PC are acquired. On a 32-bit Linux system the system call entry address is stored in the IA32_SYSENTER_EIP MSR (model-specific register); on a 64-bit system it is stored in the LSTAR MSR. System calls are obtained by registering an interception point at the entry function; after interception, the system call number and other useful parameters can be read from the call's arguments. On 32-bit systems the system call parameters are stored in EAX, EBX, ECX, EDX, ESI, and EDI; on 64-bit systems they are stored in RAX, RDI, RSI, RDX, R10, R8, and R9; the parameter information is obtained by reading these registers directly.
In Linux, most context information hangs off the task_struct of the current process, so this structure must be acquired first. When a process is created, the Linux kernel reserves space in the kernel address space for process-related information; this space is a union occupying 2 contiguous memory pages, and the process's task_struct pointer is stored in the thread_info within it. From the address of thread_info, the address of task_struct can be obtained. The process name is stored in the comm member of task_struct; the process number in the pid member; the user stack address in mm->start_stack for a process and in thread->sp for a thread. Once the user stack address is obtained, the backtrace function of glibc can be emulated to extract call stack information.
Step2. build context information list
The method collects and extracts all return addresses in the system stack into a context information list Context = {a_0, a_1, ..., a_{n-1}}, where n is the number of call stack levels and a_{n-1} is the return address of the function that issued the system call. The current PC is then appended to the list Context as a_n. This context information list mainly records the system's call-structure information.
Step3. recursive function processing
For recursive functions, the recursion depth is often strongly correlated with data such as parameters. The same recursive procedure thus tends to reach different depths under different parameters, yielding many different context lists; this can make training harder to converge or raise the false alarm rate. Once a pair of identical return addresses is found in the stack, they are likely the result of a recursive call, and the system removes all return addresses between them from the context information list.
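The recursion pruning described above can be sketched in Python. Collapsing the list back to the earlier occurrence of a repeated return address is one plausible reading of "removes all return addresses between them"; the exact handling of the duplicate pair is an assumption.

```python
def prune_recursion(context):
    # Walk the context list; when a return address repeats (a likely
    # recursive call), truncate back to its earlier occurrence so that
    # different recursion depths map to the same pruned context.
    pruned = []
    for addr in context:
        if addr in pruned:
            pruned = pruned[:pruned.index(addr) + 1]
        else:
            pruned.append(addr)
    return pruned
```

With this pruning, a function that recurses 3 times and one that recurses 30 times produce the same context list, avoiding an explosion of distinct behaviors.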
S202, data preprocessing
Step1. System behavior representation
The system call is the main interface through which applications access system resources and expresses system behavior well. The normal behaviors of a system are finite; define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i representing one system behavior. The sequence of system behaviors, in turn, represents the system's execution flow well. Define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i is the behavior at position i in the sequence. The value of m_i may be any of the N elements of S and depends strongly on the system behaviors that occurred before m_i.
The method models system call sequence anomaly detection as a multi-class classification problem in which each distinct system behavior is a class. The input of the model is recent system history; the output is a probability distribution over the n system behaviors in S, each entry giving the probability that the next system behavior is s_i. If the next system behavior to predict is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, each m_i taking its value in S (different elements of different w sequences may take the same value). The output of the model is the conditional probability distribution P[m_t = s_i | w], with s_i ∈ S (i = 1, ..., n). If, during detection, the actual m_t falls outside the confidence interval of this conditional probability, it is considered anomalous.
Step2. intercepting system behavior sequence
The training phase of the model relies on system behavior sequences captured while the system executes normally. The invention uses a stepping method to sequentially intercept system behavior sequences in a time window of length h from the training data and updates the model with each sequence to obtain, for every s_i ∈ S, its probability of being the next system behavior. Suppose the contiguous system behavior data intercepted during normal execution is {..., s36, s127, s87, s4, s10, s319, ...} and the window size h is 3. The input sequence and output label pairs used to train the model are then: {s36, s127, s87 → s4}, {s127, s87, s4 → s10}, {s87, s4, s10 → s319}.
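The stepping window and the worked example above can be sketched as:

```python
def make_training_pairs(seq, h):
    # Step a window of length h through the behavior sequence; each window
    # is a model input and the behavior that follows it is the training label.
    return [(seq[i:i + h], seq[i + h]) for i in range(len(seq) - h)]

pairs = make_training_pairs(["s36", "s127", "s87", "s4", "s10", "s319"], 3)
# pairs[0] is (["s36", "s127", "s87"], "s4"), matching the example above.
```

Each pair is one supervised example for the multi-class classifier: the window is the input, the following behavior is the class label.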
Step3. anomaly type determination
If the method is to detect whether an incoming system behavior m_t is anomalous, it needs to send w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} into the model as input. The output is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_n: p_n}, describing the probability that each system behavior in S is m_t. The following anomalies may occur:
(1) The program that acquires system behavior in real time fails. This program may fail due to an attack or other causes, so that the system call sequence and its context can no longer be acquired normally and the method has no input to feed the anomaly detection model. This anomaly is a data acquisition anomaly.
(2) The training data may not cover all system behaviors, so the m_t of the detection phase may not be contained in the behavior set S. In fact, the appearance of a system behavior never seen during normal execution is critical and very likely indicates a system anomaly. This anomaly is a system behavior anomaly.
(3) In general the m_t of the detection stage can take many values, all of them normal. The probabilities P(m_t = s_i | w), s_i ∈ S, i = 1, ..., n of the different values of m_t are sorted; if the actually detected m_t is among the top k values by probability, it is considered normal; otherwise it is considered anomalous. This anomaly is a behavior sequence anomaly.
The behavior anomaly type is then obtained from this anomalous-behavior feedback.
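The behavior-sequence decision in (3) reduces to a top-k test over the model's output distribution; a minimal sketch, with illustrative behavior names and probabilities:

```python
def is_normal(prob_dist, observed, k):
    # Sort candidate next behaviors by predicted probability; the observed
    # behavior m_t is normal iff it ranks within the top k candidates.
    ranked = sorted(prob_dist, key=prob_dist.get, reverse=True)
    return observed in ranked[:k]

# Toy output distribution P[m_t | w] for a four-behavior set.
dist = {"s1": 0.55, "s2": 0.30, "s3": 0.10, "s4": 0.05}
```

The parameter k is exactly the knob described later for trading detection rate against false alarm rate.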
S203, application of the long short-term memory (LSTM) neural network mechanism
The application of the long short-term memory neural network mechanism is described below in conjunction with figs. 3-5.
S301, model simplification
Suppose the system behavior sequence training data is W = {m_1, m_2, ..., m_t}. The invention uses the relative frequency of the subsequence w1 = {m_{t-N}, ..., m_{t-1}, m_t = s_i} with respect to the subsequence w2 = {m_{t-N}, ..., m_{t-1}} to estimate the probability that m_t takes the value s_i:
P(m_t = s_i | m_{t-N}, ..., m_{t-1}) = count(w1) / count(w2)
The method counts these frequencies over the entire historical system behavior sequence using a sliding window of size N.
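The relative-frequency (maximum likelihood) estimate can be sketched with plain counting; the behavior names here are illustrative:

```python
def count_subseq(seq, sub):
    # Number of positions where sub occurs as a contiguous subsequence of seq.
    n = len(sub)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == sub)

def ngram_probability(training, window, target):
    # MLE: relative frequency of (window followed by target) among all
    # occurrences of the window in the training sequence.
    denom = count_subseq(training, window)
    return count_subseq(training, window + [target]) / denom if denom else 0.0

history = ["a", "b", "c", "a", "b", "d"]
p = ngram_probability(history, ["a", "b"], "c")  # "ab" occurs twice, "abc" once
```

This is the baseline the LSTM improves on: the n-gram estimate sees only the last N behaviors, while the LSTM can retain longer-range state.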
Compared with traditional n-gram probability models, LSTM-based models can recognize more complex behavior patterns and maintain long memory states over the sequence. The method uses an LSTM neural network in the model to detect anomalies in the system behavior sequence. Given system behavior sequence training data W, training the LSTM minimizes the deviation between each predicted next system behavior and the actual system behavior in the training data; that is, it learns a probability distribution P(m_t = s_i | m_1, ..., m_{t-1}) that maximizes the probability of the training data's entire system behavior sequence.
S302, transferring historical information
Fig. 4 shows the unrolling of LSTM nodes over time. The state of each cell comprises a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state. The state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and output at that time. This is how historical information is propagated in the LSTM model. In this method, each system behavior in the monitoring-data time window (h) occupies one time point, so the single-layer network consists of h temporally unrolled LSTM blocks.
S303. model calculation
The computation the LSTM performs from the previous time node's state and the current data input comprises three main stages: 1) the forgetting stage, which selectively forgets the input passed in from the previous state; specifically, the computed forget gate parameters control which contents of the previous state are kept and which are forgotten; 2) the select-memory stage, which memorizes the current input; specifically, the computed input gate parameters control which important contents of the current input are recorded; 3) the output stage, which determines, through the computed output gate parameters, which contents become the output of the current state. Unlike conventional recurrent neural networks, the output of the LSTM can typically be obtained directly from H_t.
S304 model training
The model training process mainly assigns and adjusts the gate parameter weights so that the final output of the LSTM approaches the label of the training data. During training, gradient descent is used to reduce the loss function for each input/output pair. Each input is a window w = {m_{t-h}, ..., m_{t-1}} consisting of h system behaviors, and the output training label is the value s_i of the next system behavior corresponding to window w; the loss function is computed using categorical cross-entropy. After training is completed, the output for a behavior sequence window w can be predicted, with each system behavior in w placed in its corresponding LSTM block during prediction.
Fig. 5 shows the entire neural network structure. First, at the input layer, each single system behavior m_i in a time window is input to the model in one-hot encoded form; in other words, if S = {s_1, s_2, s_3, ..., s_N} is the set of all system behaviors and m_i takes the value s_j, then the input is an N-dimensional vector whose j-th element is 1 and whose other elements are 0. This input vector is very sparse, so at the embedding layer the method learns an embedding matrix W; multiplying the input by W compresses and embeds it into a 100-dimensional continuous space. At the hidden layer, the LSTM cells maintain an internal state that is updated at each time step. At the output layer, the method uses the softmax activation function to normalize the estimates of the next likely system behavior into a probability distribution, representing P[m_t = s_i | w] for each s_i ∈ S. Only one hidden layer is used in the structure shown in the figure, but more layers can be stacked in a real environment, according to the number of states implied by the data, to form a deep LSTM neural network.
S204, false alarm rate control
For the system-call anomaly detection scenario, directly using the classification result for anomaly detection can cause a high false alarm rate, because in a real scenario there may be multiple values of the prediction result m_t that are normal. For a given time window w = {open@context_1, read@context_2, ..., read@context_{h-1}}, m_t may be read@context_h (probability 0.8) or close@context_h (probability 0.2); both results are normal behavior. Directly using the classification result read@context_h would then classify the normal result close@context_h as abnormal, producing a false positive for the system. The method instead sorts the conditional probability distribution P[m_t = s_i | w] of the m_t values, where s_i ∈ S (i = 1, ..., N), and defines the top k results by probability as normal; all other results are judged abnormal. Obviously, the smaller the value of k, the higher the detection rate, but at the cost of a higher false alarm rate; the larger the value of k, the lower the false alarm rate, but true anomalies may go undetected. Therefore, to balance the detection rate and the false alarm rate, the method adjusts k according to the actual situation.
S205. on-line model update
The invention provides a feedback mechanism to update the model online: each anomaly is diagnosed manually, and if the anomaly is a false alarm, the model is updated online by replacing the parameters in the original model with the newly trained parameters.
In the invention, when the dimension of the input layer is determined in the initial stage according to the number of elements in the set S, extra capacity is reserved for behaviors that may be added during real-time detection. In a system behavior anomaly, the value of m_t at the detection stage is not contained in the set S of system behaviors in the training data; if such a new m_t value is diagnosed as normal, the method adds it to the set S and updates the embedding layer and hidden layer models with this data as a training label. In a behavior sequence anomaly, if manual diagnosis finds the result to be a false alarm, the data is used as a training label to update the weights of the hidden layer model, so that the next time this system behavior sequence is given, the probability corresponding to its next system behavior is increased. The model update only needs to adjust the parameter weights with the new training data on the basis of the original model.
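The reserved-capacity behavior set could be sketched as follows; the class, the `reserve` parameter, and the exhaustion behavior are assumptions for illustration, and the retraining of the embedding and hidden layers is out of scope here:

```python
class BehaviorVocabulary:
    """Sketch of the reserved-capacity set S: the input layer is sized
    with spare slots so that new behaviors diagnosed as normal during
    real-time detection can be added without rebuilding the model."""
    def __init__(self, initial_behaviors, reserve=64):
        self.capacity = len(initial_behaviors) + reserve
        self.index = {b: i for i, b in enumerate(initial_behaviors)}

    def contains(self, behavior):
        return behavior in self.index

    def add_if_normal(self, behavior):
        """Called after manual diagnosis marks a new behavior as normal."""
        if behavior in self.index:
            return self.index[behavior]
        if len(self.index) >= self.capacity:
            raise ValueError("reserved capacity exhausted; retrain with a larger input layer")
        self.index[behavior] = len(self.index)
        return self.index[behavior]
```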

Claims (7)

1. A system anomaly detection method based on context awareness, characterized by comprising the following steps:
1) obtaining context system behavior information according to the system call;
2) different system behaviors are represented by different hash values;
3) compressing the state by using the word embedding vector, constructing an anomaly detection model by using a long-short term memory neural network, and training the model by using a system behavior represented by a hash value;
4) carrying out anomaly detection by using the trained model.
2. The method of claim 1, further comprising: adjusting parameters to control the false alarm rate, collecting false alarm detection results, and updating the existing model according to these results to improve accuracy.
3. The context-aware-based system anomaly detection method according to claim 1 or 2, wherein the specific implementation steps of step 1) are as follows:
step1, acquiring system call name and program address
on every system call, obtaining the system call name and the current PC, namely the program counter, where the PC represents the program address from which the system call is invoked;
step2, constructing a context information list
collecting all return addresses in the system stack and extracting them into a context information list, adding the current PC (program counter) as an element to the list, and using the context information list to record the calling structure information of the system;
step3. recursive function processing
if a pair of identical return addresses is found in the stack, which may indicate a recursive call, the system removes all return addresses between them from the context information list.
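Steps 1-3 of the context construction can be sketched as follows; the function name is illustrative and addresses are plain integers here:

```python
def build_context_list(return_addresses, pc):
    """Build the context information list from the stack's return
    addresses plus the current PC, pruning the frames between any pair
    of identical return addresses (a likely recursive call)."""
    ctx = list(return_addresses)
    i = 0
    while i < len(ctx):
        try:
            # look for a later occurrence of the same return address
            j = ctx.index(ctx[i], i + 1)
        except ValueError:
            i += 1
            continue
        # remove all return addresses strictly between the identical pair
        del ctx[i + 1:j]
        i += 1
    ctx.append(pc)  # the current PC joins the context list as an element
    return ctx
```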
4. The method for detecting system anomaly based on context awareness according to claim 3, wherein the specific implementation steps of step 2) are as follows:
step1. System behavior representation
define S = {s_1, s_2, s_3, ..., s_N} as the set of all system behaviors, with element s_i being a system behavior; define w = {m_1, m_2, m_3, ..., m_n} as the sequence of system behaviors over a period of time, where m_i represents the behavior at position i in the sequence; then m_i is any one of the N elements in S, and its value depends on the system behaviors that occurred before m_i;
the input to the model is the most recent system behavior history, and the output is the probability distribution over the N system behaviors belonging to S, each value representing the probability that the next system behavior to occur is s_i; if the next system behavior to be predicted is m_t, the input to the model is the most recently generated system behavior sequence w of length h, where w = {m_{t-h}, ..., m_{t-2}, m_{t-1}}, the value of each m_i is in the set S, and different elements in different w sequences may have the same value; the output of the model is the conditional probability distribution P[m_t = s_i | w] of the m_t value, where s_i ∈ S, i = 1, ..., N; if, during detection, the actual m_t falls outside the confidence interval of the conditional probability, the result is determined to be abnormal;
step2, intercepting system behavior sequence
using a stepping method, sequentially intercepting system behavior sequences of a time window of length h in the training data, and updating the model accordingly to obtain the probability distribution of each s_i ∈ S as the next system behavior;
step3. abnormality type determination
the detection stage of the model is online and real-time, and the system behavior log captured in real time is streamed into the detection model; suppose the method is to detect whether an incoming system behavior m_t is abnormal; the method takes w = {m_{t-h}, ..., m_{t-2}, m_{t-1}} and sends it as input into the model; the output at this time is the probability distribution P[m_t | w] = {s_1: p_1, s_2: p_2, ..., s_N: p_N}, describing the probability of each system behavior in S being m_t; the following anomalies may occur:
(1) acquiring program failure of system behavior in real time, wherein the exception is data acquisition exception;
(2) the training data may not cover all system behaviors, so the value of m_t in the detection stage may not be included in the set S of system behaviors; this anomaly is a system behavior anomaly;
(3) in general, m_t in the detection stage has many possible values; the probabilities P(m_t = s_i | w), s_i ∈ S (i = 1, ..., N), of the different values are sorted, and if the actually detected m_t value is among the top k values by probability, the method considers it normal; otherwise it is considered an anomaly, which is a behavior sequence anomaly.
5. The method for detecting system anomaly based on context awareness according to claim 4, wherein the specific implementation steps of step 3) are as follows:
step1 model simplification
model simplification is first performed, assuming that the probability of a system behavior is determined by its nearest N preceding behaviors rather than by all preceding behaviors in the entire history, which is equivalent to the assumption P(m_t = s_i | m_1, ..., m_{t-1}) = P(m_t = s_i | m_{t-N}, ..., m_{t-1}); under this assumption, the probability is computed using the relative frequency in the training data to give its maximum likelihood estimate;
given a sequence of system behaviors training data W, the process of training LSTM is to reduce the deviation of each predicted next system behavior from the actual occurring system behavior in the training data, which is to learn a probability distribution P (m)t=si|m1,...,mt-1) The distribution maximizes the probability of training data as a whole system behavior sequence;
step2, transmitting historical information
the state of each element in the LSTM node contains a hidden state vector H_{t-i} and a cell state vector C_{t-i}; both are passed to the next time node to initialize its state; the state from the previous time node (H_{t-i}, C_{t-i}) and the data input at the current time (m_{t-i}) are then used together to compute the new state and output at that time;
step3. model calculation
the LSTM computation from the previous time node's state and the current data input has three stages: a forgetting stage, which selectively forgets the input passed in from the previous state; a select-memory stage, which memorizes the current input; and an output stage, which determines, through the computed output gate parameters, which contents become the output of the current state;
step4 model training
during training, a gradient descent method is used to reduce the loss function of each input/output pair; the input contains a window w = {m_{t-h}, ..., m_{t-1}} composed of h system behaviors, and the output training label is the value s_i of the next system behavior corresponding to window w; the loss function is computed using categorical cross-entropy; after training is completed, the output of the behavior sequence window w can be predicted, with each system behavior in w placed in its corresponding LSTM block during prediction.
6. The method for detecting system anomaly based on context awareness according to claim 5, wherein the specific implementation steps of step 4) are as follows:
to mtSorting the conditional probability distribution of the values, and judging whether the k results before the probability are normal or not; the smaller the value of k is, the higher the detection rate is, but the higher the false alarm rate is at the cost; the larger the value of k is, the smaller the false alarm rate is, but the true abnormality can not be detected.
7. The method for detecting system anomaly based on context awareness according to claim 6, wherein the specific implementation steps of step 5) are as follows:
manually diagnosing each anomaly, and if the anomaly is a false alarm, replacing the parameters in the original model with the newly trained parameters to update the model online;
in a system behavior anomaly, the value of m_t at the detection stage is not contained in the set S of system behaviors in the training data; if such a new m_t value is diagnosed as normal, it is added to the set S, and the embedding layer model and hidden layer model are updated with this data as a training label; therefore, when the method determines the dimension of the input layer in the initial stage according to the number of elements in the set S, capacity is reserved for the behaviors that may be added during real-time detection;
in a behavior sequence anomaly, if manual diagnosis finds the result to be a false alarm, the data is used as a training label to update the weights of the hidden layer model, so that when the system behavior sequence is given next time, the probability corresponding to its next system behavior is increased; the model update only adjusts the parameter weights with new training data on the basis of the original model.
CN202010948293.1A 2020-09-10 2020-09-10 System anomaly detection method based on context sensing Active CN112202726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010948293.1A CN112202726B (en) 2020-09-10 2020-09-10 System anomaly detection method based on context sensing


Publications (2)

Publication Number Publication Date
CN112202726A true CN112202726A (en) 2021-01-08
CN112202726B CN112202726B (en) 2021-11-19

Family

ID=74015607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010948293.1A Active CN112202726B (en) 2020-09-10 2020-09-10 System anomaly detection method based on context sensing

Country Status (1)

Country Link
CN (1) CN112202726B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711380A (en) * 2019-01-03 2019-05-03 电子科技大学 A kind of timing behavior segment generation system and method based on global context information
CN109977118A (en) * 2019-03-21 2019-07-05 东南大学 A kind of abnormal domain name detection method of word-based embedded technology and LSTM
CN111209168A (en) * 2020-01-14 2020-05-29 中国人民解放军陆军炮兵防空兵学院郑州校区 Log sequence anomaly detection framework based on nLSTM-self attention
CN111291181A (en) * 2018-12-10 2020-06-16 百度(美国)有限责任公司 Representation learning for input classification via topic sparse autoencoder and entity embedding
CN111310583A (en) * 2020-01-19 2020-06-19 中国科学院重庆绿色智能技术研究院 Vehicle abnormal behavior identification method based on improved long-term and short-term memory network
CN111371806A (en) * 2020-03-18 2020-07-03 北京邮电大学 Web attack detection method and device
CN111370122A (en) * 2020-02-27 2020-07-03 西安交通大学 Knowledge guidance-based time sequence data risk prediction method and system and application thereof
WO2020159802A1 (en) * 2019-02-02 2020-08-06 Microsoft Technology Licensing, Llc Deep learning enhanced code completion system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MEI Yudong et al.: "A Software System Anomaly Detection Method Based on Log Information and CNN-text", Chinese Journal of Computers *
WANG Yi et al.: "A Deep Neural Network Language Model Combining an LSTM and CNN Hybrid Architecture", Journal of the China Society for Scientific and Technical Information *
LU Peiyao: "Research and Implementation of an LSTM-based Software System Anomaly Detection Method", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860484A (en) * 2021-01-29 2021-05-28 深信服科技股份有限公司 Container runtime abnormal behavior detection and model training method and related device
CN112882899A (en) * 2021-02-25 2021-06-01 中国烟草总公司郑州烟草研究院 Method and device for detecting log abnormity
CN114244603A (en) * 2021-12-15 2022-03-25 中国电信股份有限公司 Anomaly detection and comparison embedded model training and detection method, device and medium
CN114244603B (en) * 2021-12-15 2024-02-23 中国电信股份有限公司 Anomaly detection and comparison embedded model training and detection method, device and medium
CN116991681A (en) * 2023-09-27 2023-11-03 北京中科润宇环保科技股份有限公司 NLP-combined fly ash fusion processing system abnormality report identification method and server
CN116991681B (en) * 2023-09-27 2024-01-30 北京中科润宇环保科技股份有限公司 NLP-combined fly ash fusion processing system abnormality report identification method and server

Also Published As

Publication number Publication date
CN112202726B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN112202726B (en) System anomaly detection method based on context sensing
CN111914873A (en) Two-stage cloud server unsupervised anomaly prediction method
CN111210024A (en) Model training method and device, computer equipment and storage medium
CN111652290B (en) Method and device for detecting countermeasure sample
EP3539060A1 (en) Systems and methods for continuously modeling industrial asset performance
CN111881722B (en) Cross-age face recognition method, system, device and storage medium
CN109361648B (en) Method and device for detecting hidden attack of industrial control system
CN113438114B (en) Method, device, equipment and storage medium for monitoring running state of Internet system
CN113328908B (en) Abnormal data detection method and device, computer equipment and storage medium
CN112016097B (en) Method for predicting network security vulnerability time to be utilized
KR102359090B1 (en) Method and System for Real-time Abnormal Insider Event Detection on Enterprise Resource Planning System
CN117041017B (en) Intelligent operation and maintenance management method and system for data center
CN110162958B (en) Method, apparatus and recording medium for calculating comprehensive credit score of device
CN113468520A (en) Data intrusion detection method applied to block chain service and big data server
CN115396204A (en) Industrial control network flow abnormity detection method and device based on sequence prediction
Lee et al. Learning in the wild: When, how, and what to learn for on-device dataset adaptation
CN110166422B (en) Domain name behavior recognition method and device, readable storage medium and computer equipment
You et al. sBiLSAN: Stacked bidirectional self-attention lstm network for anomaly detection and diagnosis from system logs
CN115017015B (en) Method and system for detecting abnormal behavior of program in edge computing environment
JP7331369B2 (en) Abnormal Sound Additional Learning Method, Data Additional Learning Method, Abnormality Degree Calculating Device, Index Value Calculating Device, and Program
CN116842520A (en) Anomaly perception method, device, equipment and medium based on detection model
CN115983087A (en) Method for detecting time sequence data abnormity by combining attention mechanism and LSTM and terminal
CN111221896A (en) User behavior prediction method and device, electronic equipment and storage medium
CN114610613A (en) Online real-time micro-service call chain abnormity detection method
CN112560252A (en) Prediction method for residual life of aircraft engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant