WO2017148196A1 - Anomaly detection method and apparatus - Google Patents

Anomaly detection method and apparatus

Info

Publication number
WO2017148196A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature pattern
feature
mode
system call
sequence
Prior art date
Application number
PCT/CN2016/108764
Other languages
English (en)
French (fr)
Inventor
左焘
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017148196A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Definitions

  • The present invention relates to the field of communications, and in particular to an anomaly detection method and apparatus.
  • Intrusion detection technology detects intrusions into a system by analyzing the system information and user behavior records retained on the computer. It is divided into misuse detection and anomaly detection. Misuse detection identifies intrusions by analyzing known intrusions or attacks. Its disadvantage is that it has difficulty detecting new intrusion methods or variations of existing ones, and its performance depends on the size and architecture of the pattern library.
  • Anomaly detection characterizes the system in its normal state and checks how far the current state deviates from that normal state.
  • One of the current mainstream anomaly detection methods combines feature pattern extraction with statistical feature detection.
  • The system call sequence is first compressed into a sequence composed of system calls and feature patterns.
  • An improved state transition matrix is then built with variable-length patterns as the unit, and anomalies are identified from the occurrence probability of the state sequence.
  • This mainstream method first acquires the determined feature patterns of the system call sequences of key programs and then performs detection according to those patterns. In actual detection, however, the system call sequences of the key programs saved by the system yield very few types of feature pattern. Few patterns can therefore be identified during anomaly detection, complex pattern features cannot be handled, and the behavior of key programs is characterized poorly, which in turn reduces the accuracy of anomaly detection as a whole.
  • Embodiments of the invention provide an anomaly detection method and apparatus, so as to at least solve the problem in the related art that few types of pattern can be identified when anomaly detection combines determined feature patterns with statistical feature detection.
  • An anomaly detection method is provided, including: acquiring a fuzzy feature pattern of a system call sequence and adding the fuzzy feature pattern to a feature pattern library, wherein the fuzzy feature pattern is a feature pattern that includes a determination mode and a fuzzy mode, the determination mode represents a feature pattern composed of a plurality of system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; matching a system call sequence of a training set against the feature patterns in the feature pattern library, and acquiring a state sequence corresponding to the system call sequence of the training set according to a rule corresponding to the matching result; training a Markov model with the state sequence to obtain a trained Markov model; and using the trained Markov model to detect anomalies in a system call sequence to be detected.
  • The determination mode is obtained by performing the following steps until the feature pattern with the longest length is obtained: acquiring a current feature pattern whose length is a first threshold, and adding the current feature pattern to the feature pattern library; connecting the current feature pattern with an adjacent feature pattern to obtain a connected feature pattern; determining whether the connected feature pattern satisfies a support requirement, wherein the support is the probability that a system call short sequence appears as a whole in the running trace of a system process; if the determination result is yes, setting the connected feature pattern as the feature pattern to be acquired and adding it to the feature pattern library, wherein the feature pattern library then does not include the current feature pattern and the adjacent feature pattern; determining whether the feature pattern library includes the feature pattern with the longest length; and if the determination result is yes, taking the feature patterns included in the feature pattern library as the determination mode.
  • If the feature pattern library does not yet include the feature pattern with the longest length, the method further includes: selecting a predetermined number of adjacent feature patterns in the feature pattern library and connecting them to obtain a connected feature pattern; determining whether the connected feature pattern satisfies the support requirement; if the determination result is yes, setting the connected feature pattern as a new feature pattern to be acquired and adding it to the feature pattern library, wherein the feature pattern library then does not include the adjacent feature patterns that compose the new feature pattern; and continuing to determine whether the feature pattern library includes the feature pattern with the longest length.
  • The fuzzy mode is obtained by repeatedly performing the following steps until the feature pattern with the longest length is obtained: merging two determination modes in the feature pattern set whose distance is a first predetermined threshold to obtain a merged feature pattern, wherein what lies between the two determination modes is a single system call or a determination mode; determining whether the merged feature pattern satisfies the support requirement; if the determination result is yes, taking the merged feature pattern as a target feature pattern and adding the target feature pattern to the feature pattern library, wherein the feature pattern library then does not include the feature patterns that took part in the merge; determining whether the feature pattern library includes the target feature pattern with the longest length; and if the determination result is yes, taking the target feature patterns included in the feature pattern library as the fuzzy mode.
  • If the feature pattern library does not yet include the target feature pattern with the longest length, the method further includes: merging two determination modes in the feature pattern library whose distance is the first predetermined threshold to obtain a merged feature pattern, and determining whether the merged feature pattern satisfies the support requirement; or merging two target feature patterns in the feature pattern library, and determining whether the merged feature pattern satisfies the support requirement; or merging a determination mode with a target feature pattern, and determining whether the merged feature pattern satisfies the support requirement.
  • Acquiring the state sequence corresponding to the system call sequence to be detected according to the rule corresponding to the matching result includes: adding the system call sequence to be detected to the tail of an empty queue; matching the system call sequence to be detected in the queue against the determination modes in the feature pattern library; when the system call sequence to be detected matches a determination mode, converting it into the state sequence; and when it does not match any determination mode, acquiring the corresponding state sequence according to the result of matching the system call sequence to be detected against the fuzzy feature pattern.
  • Acquiring the state sequence according to the result of matching the system call sequence to be detected against the fuzzy feature pattern includes: determining whether the head of the system call sequence to be detected matches the first determined part of the fuzzy feature pattern, wherein the head includes at least one system call; and if the determination result is yes, matching the remaining system calls of the sequence against the other determined parts of the fuzzy feature pattern in turn, and acquiring the corresponding state sequence according to the matching result.
  • After the remaining system calls have been matched against the other determined parts of the fuzzy feature pattern, the method further includes: determining whether the system call sequence between two adjacent determined parts in the queue is neither a single system call nor a determination mode, or determining whether the system call sequence between a determined part and the end of the queue is longer than the maximum length of the determination modes in the feature pattern library; if the determination result is yes, dequeuing the first determined part in the queue that matches the fuzzy feature pattern, and converting each system call in that first determined part into a state that is added to the state sequence; and if the determination result is no, continuing to examine the other system calls in the system call sequence to be detected.
  • Using the trained Markov model to detect an anomaly in the system call sequence to be detected includes: acquiring, in the trained Markov model, the number of state sequences whose probability is less than a predetermined threshold among a predetermined number of state sequences corresponding to the system call sequence to be detected, wherein the predetermined threshold is the value corresponding to the degree of abnormality of the system call sequence; and when the number is greater than a second predetermined threshold, determining that the system call sequence to be detected is anomalous.
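The decision rule in this paragraph can be illustrated with a minimal sketch. It assumes the per-window probabilities have already been computed by some trained model; the function name `is_anomalous` and both parameter names are hypothetical and do not come from the patent.

```python
from typing import Sequence


def is_anomalous(window_probs: Sequence[float],
                 prob_threshold: float,
                 count_threshold: int) -> bool:
    """Flag a trace when too many of its state sequences look improbable.

    window_probs    -- probability of each examined state sequence under the
                       trained model (assumed to be given)
    prob_threshold  -- the "predetermined threshold" on probability
    count_threshold -- the "second predetermined threshold" on the count
    """
    # Count the state sequences whose probability falls below the threshold.
    low_count = sum(1 for p in window_probs if p < prob_threshold)
    # The trace is anomalous only when that count exceeds the second threshold.
    return low_count > count_threshold
```

Using a count rather than a single low-probability window makes the decision less sensitive to one unusual but harmless transition.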
  • An anomaly detection apparatus is also provided, including: a first processing module, configured to acquire a fuzzy feature pattern of a system call sequence and add the fuzzy feature pattern to a feature pattern library, wherein the fuzzy feature pattern is a feature pattern that includes a determination mode and a fuzzy mode, the determination mode represents a feature pattern composed of a plurality of system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; a second processing module, configured to match a system call sequence of a training set against the feature patterns in the feature pattern library, and acquire a state sequence corresponding to the system call sequence of the training set according to a rule corresponding to the matching result; a training module, configured to train a Markov model with the state sequence to obtain a trained Markov model; and a detection module, configured to use the trained Markov model to detect anomalies in a system call sequence to be detected.
  • The first processing module is further configured to obtain the determination mode by performing the following steps until the feature pattern with the longest length is obtained: acquiring a current feature pattern whose length is a first threshold, and adding the current feature pattern to the feature pattern library; connecting the current feature pattern with an adjacent feature pattern to obtain a connected feature pattern; determining whether the connected feature pattern satisfies a support requirement; and if the determination result is yes, setting the connected feature pattern as the feature pattern to be acquired and adding it to the feature pattern library, wherein the feature pattern library then does not include the current feature pattern and the adjacent feature pattern.
  • The first processing module is further configured to, when it determines that the feature pattern library does not include the feature pattern with the longest length: select a predetermined number of adjacent feature patterns in the feature pattern library and connect them to obtain a connected feature pattern; determine whether the connected feature pattern satisfies the support requirement; if the determination result is yes, set the connected feature pattern as a new feature pattern to be acquired and add it to the feature pattern library, wherein the feature pattern library then does not include the adjacent feature patterns that compose the new feature pattern; and continue to determine whether the feature pattern library includes the feature pattern with the longest length.
  • The first processing module is further configured to obtain the fuzzy mode by repeatedly performing the following steps until the feature pattern with the longest length is obtained: merging two determination modes in the feature pattern set whose distance is a first predetermined threshold to obtain a merged feature pattern, wherein what lies between the two determination modes is a single system call or a determination mode; determining whether the merged feature pattern satisfies the support requirement; if the determination result is yes, taking the merged feature pattern as a target feature pattern and adding the target feature pattern to the feature pattern library, wherein the feature pattern library then does not include the feature patterns that took part in the merge; determining whether the feature pattern library includes the target feature pattern with the longest length; and if the determination result is yes, taking the target feature patterns included in the feature pattern library as the fuzzy mode.
  • The first processing module is further configured to, when it determines that the feature pattern library does not include the target feature pattern with the longest length: merge two determination modes in the feature pattern library whose distance is the first predetermined threshold to obtain a merged feature pattern, and determine whether the merged feature pattern satisfies the support requirement; or merge two target feature patterns in the feature pattern library, and determine whether the merged feature pattern satisfies the support requirement; or merge a determination mode with a target feature pattern, and determine whether the merged feature pattern satisfies the support requirement.
  • The second processing module includes: an adding unit, configured to add the system call sequence to be detected to the tail of an empty queue; a matching unit, configured to match the system call sequence to be detected in the queue against the determination modes in the feature pattern library; a converting unit, configured to convert the system call sequence to be detected into the state sequence when it matches a determination mode; and a first acquiring unit, configured to acquire, when the system call sequence to be detected does not match any determination mode, the corresponding state sequence according to the result of matching the system call sequence to be detected against the fuzzy feature pattern.
  • The first acquiring unit includes: a first determining subunit, configured to determine whether the head of the system call sequence to be detected matches the first determined part of the fuzzy feature pattern, wherein the head includes at least one system call; and an acquiring subunit, configured to, if the determination result is yes, match the remaining system calls of the sequence against the other determined parts of the fuzzy feature pattern in turn, and acquire the corresponding state sequence according to the matching result.
  • The first acquiring unit further includes: a second determining subunit, configured to determine, after the remaining system calls have been matched against the other determined parts of the fuzzy feature pattern, whether the system call sequence between two adjacent determined parts in the queue is neither a single system call nor a determination mode; or a third determining subunit, configured to determine, after the remaining system calls have been matched against the other determined parts of the fuzzy feature pattern, whether the system call sequence between a determined part and the end of the queue is longer than the maximum length of the determination modes in the feature pattern library; a subunit configured to, if the determination result is yes, dequeue the first determined part in the queue that matches the fuzzy feature pattern and convert each system call in that first determined part into a state that is added to the state sequence; and a subunit configured to, if the determination result is no, continue to examine the other system calls in the system call sequence to be detected.
  • The detection module includes: a second acquiring unit, configured to acquire, in the trained Markov model, the number of state sequences whose probability is less than a predetermined threshold among a predetermined number of state sequences corresponding to the system call sequence to be detected, wherein the predetermined threshold is the value corresponding to the degree of abnormality of the system call sequence; and a determining unit, configured to determine that the system call sequence to be detected is anomalous when the number is greater than a second predetermined threshold.
  • A computer storage medium is further provided, which may store execution instructions for performing the anomaly detection method of the above embodiment.
  • With the invention, a fuzzy feature pattern of the system call sequence is acquired and added to the feature pattern library, wherein the fuzzy feature pattern is a feature pattern that includes a determination mode and a fuzzy mode, and the determination mode represents a feature pattern composed of a plurality of system calls in a determined order; the state sequence corresponding to the system call sequence is acquired by matching against the feature pattern library; the state sequence is used as a training set to train the Markov model and obtain a trained Markov model; and the trained Markov model is used to detect anomalies in the system call sequence to be detected. That is, the invention is based on the Markov model and on the acquisition of fuzzy feature patterns. The acquisition of fuzzy feature patterns relies mainly on the correlations within the system call sequence and mines the feature patterns further to determine the running-trace trend of the system call sequence, which is combined with an online detection algorithm to detect anomalies in the system call sequence of a given process. This overcomes the problem in the related art that few types of pattern can be identified when anomaly detection relies on determined feature patterns, enriches the types of identifiable pattern, and improves the efficiency of anomaly detection.
  • FIG. 1 is a flowchart of anomaly detection according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the structure of an anomaly detection model based on a Markov chain according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of an online detection algorithm according to an embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of an abnormality detecting apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram (1) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • FIG. 6 is a structural block diagram (2) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • FIG. 7 is a structural block diagram (3) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • FIG. 8 is a structural block diagram (4) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of anomaly detection according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
  • Step S102: acquire a fuzzy feature pattern of the system call sequence, and add the fuzzy feature pattern to the feature pattern library;
  • The fuzzy feature pattern is a feature pattern that includes a determination mode and a fuzzy mode.
  • The determination mode represents a feature pattern composed of a plurality of system calls in a determined order. For example, if a system call sequence l ∈ C, where C is the feature pattern library (a set of feature patterns), and l consists of a group of system calls in a determined order, then l is a determination mode. The fuzzy mode represents a feature pattern of a class of system call sequences. For example, if a system call sequence l ∈ C cannot be represented by one specific system call short sequence but instead represents a class of system call short sequences, then l is a fuzzy mode.
  • Step S104: match a system call sequence of the training set against the feature patterns in the feature pattern library, and acquire a state sequence corresponding to the system call sequence of the training set according to a rule corresponding to the matching result;
  • Step S106: train the Markov model with the state sequence to obtain the trained Markov model;
  • Step S108: use the trained Markov model to detect anomalies in the system call sequence to be detected.
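Steps S106 and S108 can be sketched as follows. This is a minimal illustration that assumes a plain first-order Markov chain estimated from transition counts; the patent's own "improved state transition matrix" is not specified here, and the names `train_markov`, `sequence_probability` and the `floor` value for unseen transitions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence

Model = Dict[str, Dict[str, float]]


def train_markov(state_sequences: Iterable[Sequence[str]]) -> Model:
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in state_sequences:
        # Count every observed transition a -> b.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model: Model = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        model[a] = {b: c / total for b, c in successors.items()}
    return model


def sequence_probability(model: Model, seq: Sequence[str],
                         floor: float = 1e-6) -> float:
    """Probability of a state sequence as the product of its transitions.

    Transitions never seen in training get the small `floor` probability
    (an assumption of this sketch) instead of zero.
    """
    prob = 1.0
    for a, b in zip(seq, seq[1:]):
        prob *= model.get(a, {}).get(b, floor)
    return prob
```

A sequence made only of transitions seen during training scores relatively high, while any unseen transition pulls the product down sharply, which is what the later thresholding relies on.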
  • The application scenarios of the anomaly detection method include, but are not limited to: use on a host, where the host monitors the system call sequences of critical programs such as lpr, ftpd and sendmail to detect network or local intrusions; or use on a terminal, where the terminal connects to a server, downloads the trained Markov model, and performs detection locally.
  • In this application scenario, a fuzzy feature pattern of the system call sequence is acquired and added to the feature pattern library, wherein the fuzzy feature pattern is a feature pattern that includes a determination mode and a fuzzy mode, and the determination mode represents a feature pattern composed of a plurality of system calls in a determined order. The system call sequence of the training set is matched against the feature patterns in the feature pattern library, and the corresponding state sequence is acquired according to a rule corresponding to the matching result. The acquisition of fuzzy feature patterns relies mainly on the correlations within the system call sequence and mines the feature patterns further to determine the running-trace trend of the system call sequence; combined with the online detection algorithm, this detects anomalies in the system call sequence of a given process. The method is therefore not subject to the problem in the related art that few types of pattern can be identified when anomaly detection relies on determined feature patterns, which enriches the types of identifiable pattern and improves the efficiency of anomaly detection.
  • Feature patterns are first extracted from the system call sequences generated by the normal behavior of a system process and form the feature pattern library. The system call sequences generated by normal behavior are then compressed according to the feature pattern library, and the compressed sequences are used as the training set for the Markov model, which yields a model of the program's normal operation.
  • Detection uses an online detection algorithm: the system call sequence to be detected is first compressed into a state sequence in real time according to the feature pattern library, its degree of abnormality is then calculated with the Markov model, and an anomaly is detected on that basis.
  • The determination mode is determined as follows:
  • It is determined whether the feature pattern library contains the feature pattern with the longest length. If so, the feature patterns included in the feature pattern library are taken as the determination mode. If not, a predetermined number of adjacent feature patterns in the feature pattern library are selected and connected to obtain a connected feature pattern; it is determined whether the connected feature pattern satisfies the support requirement; if it does, the connected feature pattern is set as a new feature pattern to be acquired and added to the feature pattern library, whereupon the library no longer includes the adjacent feature patterns that compose the new feature pattern; and the check for the longest feature pattern is then repeated.
  • the value of the foregoing first threshold includes but is not limited to: 1.
  • Connecting the feature patterns in this way yields the determination modes, from which the fuzzy feature patterns are then acquired.
  • The fuzzy mode is determined as follows:
  • Two determination modes in the feature pattern set whose distance is the first predetermined threshold are merged to obtain a merged feature pattern, where what lies between the two determination modes is a single system call or a determination mode. It is determined whether the merged feature pattern satisfies the support requirement; if so, the merged feature pattern is taken as a target feature pattern and added to the feature pattern library, whereupon the library no longer includes the feature patterns that took part in the merge. It is then determined whether the feature pattern library includes the target feature pattern with the longest length. If so, the target feature patterns in the library are taken as the fuzzy mode. If not, one of the following is done: two determination modes in the library whose distance is the first predetermined threshold are merged, and the merged feature pattern is checked against the support requirement; or two target feature patterns in the library are merged, and the merged feature pattern is checked against the support requirement; or a determination mode is merged with a target feature pattern, and the merged feature pattern is checked against the support requirement.
  • the foregoing first predetermined threshold may be 2.
  • The acquisition of fuzzy feature patterns is divided into two parts.
  • The first part is determination mode extraction. First, feature patterns of length 1 are found, that is, single system calls that satisfy the minimum support. Adjacent single system calls that satisfy the support requirement are then merged into feature patterns of length 2. On this basis, adjacent feature patterns of length 2 and length 1 that satisfy the support requirement are connected to obtain feature patterns of length 3, and so on, until no new, longer feature pattern appears.
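The level-wise growth described for the first part can be sketched as below. This is an illustration only: it assumes support is the occurrence count of a pattern divided by the trace length (one possible reading of the definition, not confirmed by the text), it grows patterns by one adjacent single call per round, and it keeps every length in the library rather than pruning shorter patterns. The names `support` and `extract_determination_modes` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def support(trace: Sequence[str], pattern: Pattern) -> float:
    """Assumed support: occurrences of `pattern` in `trace` / len(trace)."""
    n = len(pattern)
    hits = sum(1 for i in range(len(trace) - n + 1)
               if tuple(trace[i:i + n]) == pattern)
    return hits / len(trace)


def extract_determination_modes(trace: Sequence[str],
                                min_sup: float) -> Set[Pattern]:
    """Grow frequent contiguous patterns until no longer pattern qualifies."""
    calls: List[str] = list(trace)
    if not calls:
        return set()
    # Length-1 patterns: single system calls that meet the minimum support.
    singles = {(c,) for c in set(calls) if support(calls, (c,)) >= min_sup}
    current = set(singles)
    library = set(singles)
    while current:
        k = len(next(iter(current)))
        candidates: Set[Pattern] = set()
        # Connect each frequent length-k pattern with the adjacent single call.
        for i in range(len(calls) - k):
            left = tuple(calls[i:i + k])
            right = (calls[i + k],)
            if left in current and right in singles:
                candidates.add(left + right)
        # Keep only connected patterns that still meet the support requirement.
        current = {c for c in candidates if support(calls, c) >= min_sup}
        library |= current
    return library
```

On a trace that repeats `open read write close` three times and ends with a single `stat`, a minimum support of 0.2 keeps the four-call pattern and rejects both `stat` and the wrap-around pair `close open`.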
  • The second part is fuzzy mode extraction. Based on the result of the first part, feature patterns whose distance is 2 and that satisfy the support requirement are merged into a new feature pattern; the system call or feature pattern between them is regarded as a random, independent program feature and is called the fuzzy part. This step is then repeated until no new, longer feature pattern appears.
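A single merging pass of the second part might look like the sketch below. It assumes the trace has already been compressed into tokens, where each token is either a raw system call or the identifier of a determination mode, and it again assumes support is an occurrence count divided by the sequence length. The `WILDCARD` marker and the function name `extract_fuzzy_modes` are hypothetical, and the repetition "until no new, longer feature pattern appears" is left out for brevity.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

WILDCARD = "*"  # hypothetical marker for the fuzzy part


def extract_fuzzy_modes(tokens: Sequence[str],
                        determination_modes: Set[str],
                        min_sup: float) -> Set[Tuple[str, str, str]]:
    """Merge determination modes at distance 2 into (head, *, tail) patterns."""
    counts: Counter = Counter()
    for i in range(len(tokens) - 2):
        head, tail = tokens[i], tokens[i + 2]
        # Distance 2: exactly one token (call or mode) sits between the two.
        if head in determination_modes and tail in determination_modes:
            counts[(head, WILDCARD, tail)] += 1
    # Keep merged patterns whose assumed support meets the requirement.
    return {p for p, c in counts.items() if c / len(tokens) >= min_sup}
```

The middle token is deliberately ignored when counting, which is what lets one fuzzy pattern stand for a whole class of concrete sequences.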
  • In the fuzzy feature pattern acquisition method of the above example, the distance between feature patterns is defined as follows: the distance between adjacent feature patterns is 1, the distance between feature patterns separated by one system call or one feature pattern is 2, and so on.
  • The length of a system call sequence is not limited: any sequence that satisfies the minimum support requirement is regarded as a feature pattern.
  • The support sup(l) of a system call short sequence l is defined from two quantities: len(T), the length of the system call sequence T, that is, the number of system calls T contains; and Num(T, l), the number of times the system call short sequence l appears in T.
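The formula for sup(l) itself does not appear in this text. One reading that is consistent with the two quantities named here, offered purely as an assumption and not as the patent's confirmed definition, is the relative frequency of l in T:

```latex
\mathrm{sup}(l) = \frac{\mathrm{Num}(T, l)}{\mathrm{len}(T)}
```

Other normalizations, for example dividing by the number of windows of length len(l) in T, would fit the same description, so any threshold chosen for the support requirement depends on which form is actually used.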
  • A feature pattern is defined as follows: if the support of a system call short sequence l is greater than the threshold, then l is regarded as a feature pattern, denoted ci, where i is the number of the feature pattern.
  • The length priority principle is adopted: if a longer feature pattern contains a shorter feature pattern, the shorter one is deleted from the feature pattern library.
  • The length priority principle reduces the size of the feature pattern library and shortens the training time of the Markov chain model.
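The length priority principle amounts to removing every pattern that occurs as a contiguous run inside a longer pattern. A minimal sketch, with the hypothetical helper names `contains` and `prune_by_length_priority`:

```python
from typing import Iterable, Set, Tuple

Pattern = Tuple[str, ...]


def contains(longer: Pattern, shorter: Pattern) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter
               for i in range(len(longer) - n + 1))


def prune_by_length_priority(library: Iterable[Pattern]) -> Set[Pattern]:
    """Drop each pattern that is contained in a strictly longer pattern."""
    patterns = set(library)
    return {p for p in patterns
            if not any(len(q) > len(p) and contains(q, p) for q in patterns)}
```

For a library holding `(open)`, `(open, read)`, `(open, read, write)` and `(stat)`, only the three-call pattern and `(stat)` survive, since the two shorter `open` patterns are covered by the longest one.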
  • Acquiring the state sequence corresponding to the system call sequence according to the rule corresponding to the matching result includes the following steps:
  • Step S11: add the system call sequence to be detected to the tail of the empty queue;
  • Step S12: match the system call sequence to be detected in the queue against the determination modes in the feature pattern library;
  • Step S13: when the system call sequence to be detected matches a determination mode, convert the system call sequence to be detected into the state sequence;
  • Step S14: when the system call sequence to be detected does not match any determination mode, acquire the state sequence corresponding to the system call sequence to be detected according to the result of matching it against the fuzzy feature pattern.
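The conversion in steps S11 to S13 can be approximated offline with a greedy longest-match pass, sketched below. This is a simplification: it works on a complete list rather than a live queue, it handles only determination modes (the fuzzy matching of step S14 is omitted), and it emits an unmatched system call as a state of its own. The function name `to_state_sequence` and the state labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def to_state_sequence(calls: Sequence[str],
                      modes: Dict[Pattern, str]) -> List[str]:
    """Compress system calls into states using determination modes.

    `modes` maps each determination mode (a tuple of calls) to a state name.
    """
    # Try longer patterns first so the longest match wins at each position.
    by_length = sorted((p for p in modes if p), key=len, reverse=True)
    states: List[str] = []
    i = 0
    while i < len(calls):
        for pattern in by_length:
            if tuple(calls[i:i + len(pattern)]) == pattern:
                states.append(modes[pattern])
                i += len(pattern)
                break
        else:
            # No determination mode starts here: keep the call as its own state.
            states.append(calls[i])
            i += 1
    return states
```

With modes `(open, read) -> c1` and `(write, close) -> c2`, the calls `open read stat write close open` compress to `c1 stat c2 open`.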
  • In this way, the system call sequence of the currently executing system process is first matched against the determination modes in the feature pattern library; if it matches, the current system call sequence is converted into states and added to the state sequence. If it does not match, it is further matched against the fuzzy feature patterns in the feature pattern library before any anomaly is identified. This avoids the problem in the related art that detection accuracy is reduced because the current system call sequence is considered anomalous as soon as it fails to match a determined feature pattern.
  • Acquiring the state sequence corresponding to the system call sequence to be detected according to the result of matching it against the fuzzy feature pattern includes the following steps:
  • Step S21: determine whether the head of the system call sequence to be detected matches the first determined part of the fuzzy feature pattern, where the head of the system call sequence to be detected includes at least one system call;
  • Step S22: if the determination result is yes, match the remaining system calls of the system call sequence to be detected against the other determined parts of the fuzzy feature pattern in turn, and acquire the state sequence corresponding to the system call sequence to be detected according to the matching result.
  • A determined portion of the fuzzy feature pattern may be a portion of the fuzzy mode that can be determined by system calls, and the fuzzy portion of the fuzzy feature pattern may be what remains of the fuzzy mode after the determined portions are removed.
  • In this optional implementation, the head of the system call sequence to be detected is first matched against the first determined portion of the fuzzy feature pattern; if it matches, the other system calls of the sequence are matched in turn against the other determined portions of the fuzzy feature pattern, and the state sequence corresponding to the system call sequence to be detected is obtained according to the matching result. This further addresses the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns, and thereby enriches the types of identifiable patterns.
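  • The matching in steps S21 and S22 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `match_fuzzy`, the representation of a fuzzy feature pattern as a list of determined portions, and the assumption that exactly one fuzzy slot (a single system call or one determined mode) sits between consecutive determined portions are all hypothetical choices made for the example.

```python
def match_fuzzy(calls, parts, determined_modes, pos=0):
    """Return the index just past the match if calls[pos:] starts with the
    fuzzy pattern described by `parts`, otherwise None.

    parts            -- determined portions of the fuzzy pattern, in order
                        (hypothetical representation, each one non-empty)
    determined_modes -- determined modes that may fill a fuzzy slot
    """
    part = tuple(parts[0])
    # S21 (and each later determined portion): must match at `pos`.
    if tuple(calls[pos:pos + len(part)]) != part:
        return None
    pos += len(part)
    if len(parts) == 1:
        return pos
    # Assumed fuzzy slot: a single system call, or one determined mode.
    widths = {1} | {len(m) for m in determined_modes
                    if tuple(calls[pos:pos + len(m)]) == tuple(m)}
    # S22: try each admissible slot width, then match the next portion.
    for width in sorted(widths):
        end = match_fuzzy(calls, parts[1:], determined_modes, pos + width)
        if end is not None:
            return end
    return None
```

  • A return value of `None` means the sequence does not start with the fuzzy pattern; in the full procedure this would lead to the dequeue checks described in the following steps.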
  • Step S31: determine whether the system call sequence between two adjacent determined portions in the queue is neither a single system call nor a determined mode, or determine whether the system call sequence between a determined portion and the end of the queue is longer than the maximum length of the determined modes in the feature pattern library;
  • Step S32: if the determination result is yes, dequeue the first determined portion in the queue that matches the fuzzy feature pattern, and convert each system call in that first determined portion into a state that is added to the state sequence;
  • Step S33: if the determination result is no, continue to process the other system calls in the system call sequence to be detected.
  • In this optional implementation, the dequeue condition is either that the system call sequence between two adjacent determined portions in the queue, checked in real time, is neither a single system call nor a determined mode, or that the system call sequence between a determined portion and the end of the queue is longer than the maximum length of the determined modes in the feature pattern library. Dequeuing under this condition avoids the waste of resources caused by repeatedly examining, during anomaly detection, system call sequences that already satisfy the dequeue condition.
  • Step S301: execute a system call $s_i$ and add it to the tail of queue a;
  • Step S302: match the system call sequence in queue a against the feature patterns in the feature pattern library. If the head of the sequence matches a feature pattern, go to step S304; if the head of the sequence matches the first determined portion of a fuzzy mode, go to step S303; if the sequence matches only the leading part of a feature pattern, go to step S301; if nothing matches, go to step S305;
  • Step S303: locate in turn all determined portions of the fuzzy mode in queue a, and determine the length $d_i$ of the system call sequence between two adjacent determined portions in queue a, or between a determined portion and the end of queue a. If $d_i$ is greater than the maximum length $d_{\max}$ of the determined modes in the feature pattern library, go to step S306; if $d_i$ is not greater than $d_{\max}$ and only some of the determined portions of the fuzzy mode have been located in queue a, go to step S301; if $d_i$ is not greater than $d_{\max}$ and all determined portions of the fuzzy mode have been located in queue a, go to step S304;
  • Step S304: record the corresponding state number $C_i$ and add it to the state sequence, dequeue the matched part of queue a, and go to step S301;
  • Step S305: add the state of each system call in queue a to the state sequence, clear queue a, and go to step S301;
  • Step S306: dequeue the system call sequence in queue a that matches the first determined portion of the fuzzy mode, add the state corresponding to each system call in that sequence to the state sequence, and go to step S302.
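  • The queue-driven flow above can be illustrated with a reduced sketch that handles determined modes only (steps S301, S302, S304 and S305); the fuzzy branch (S303, S306) is deliberately left out. The function name `compress` and the dictionary mapping pattern tuples to state numbers are assumptions made for this example.

```python
def compress(calls, patterns):
    """Compress a system call stream into a state sequence.

    patterns -- dict mapping a determined mode (tuple of calls) to its
                state number; hypothetical representation.
    Calls that belong to no pattern are emitted as their own state.
    """
    queue, states = [], []
    for call in calls:
        queue.append(call)                        # S301: enqueue the call
        while queue:
            head = tuple(queue)
            full = [p for p in patterns if head[:len(p)] == p]
            if full:                              # S302 -> S304: full match
                best = max(full, key=len)
                states.append(patterns[best])
                del queue[:len(best)]             # dequeue the matched part
            elif any(p[:len(head)] == head for p in patterns):
                break                             # prefix only: back to S301
            else:                                 # S305: no match at all
                states.extend(queue)
                queue.clear()
    states.extend(queue)  # flush whatever is still waiting at end of input
    return states
```

  • Emitting the whole queue in the no-match branch follows step S305 literally, so a pattern that starts in the middle of a discarded queue is not recovered; a production matcher would need the fuzzy branch and a more careful dequeue policy.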
  • Using the trained Markov model to detect abnormalities of the system call sequence to be detected includes the following steps:
  • Step S41: using the trained Markov model, obtain, among a predetermined number of state sequences corresponding to the system call sequence to be detected, the number of state sequences whose probability is less than a predetermined threshold, where the predetermined threshold is a value corresponding to the degree of abnormality of a system call sequence;
  • Step S42: when this number is greater than a second predetermined threshold, determine that the system call sequence to be detected is abnormal.
  • The value of the second predetermined threshold includes, but is not limited to, 2 or 3; no limitation is imposed here.
  • In this optional implementation, detecting abnormalities of the system call sequence to be detected in this way further addresses the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns, and thereby enriches the types of identifiable patterns.
  • State set $Q = \{q_1, \dots, q_i, \dots, q_n\}$.
  • Each system call and each feature pattern corresponds one-to-one to a state $q_i$, where $1 \le i \le n$.
  • State transition probability matrix $A = (a_{ij})_{n \times n}$, where $a_{ij} = P(X_{t+1} = q_j \mid X_t = q_i)$ indicates the probability of being in state $q_j$ at time $t+1$ given state $q_i$ at time $t$.
  • The estimation method is $a_{ij} = N_{ij} / \sum_{k=1}^{n} N_{ik}$, where $N_{ij}$ indicates the number of times the state pair $q_i q_j$ occurs.
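  • Training the transition matrix from compressed state sequences can be sketched as a simple counting procedure. The sketch assumes the usual count-ratio estimate (the number of observed transitions from one state to another, divided by the number of transitions leaving the first state); the function name `estimate_transitions` and the sparse dictionary representation are hypothetical.

```python
from collections import Counter


def estimate_transitions(state_sequences):
    """Estimate transition probabilities from normal state sequences.

    Returns a dict {(q_i, q_j): probability}; pairs never observed are
    simply absent (treated as probability 0 by the caller).
    """
    pair_counts = Counter()   # how often q_i is followed by q_j
    from_counts = Counter()   # how often q_i is followed by anything
    for seq in state_sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}
```

  • A dictionary keyed by state pairs keeps the model small when most transitions never occur, which is typical once frequent call runs have been folded into single states.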
  • V(Seq) ranges from 0 to 1. If the V value is small, indicating that the Seq has a low degree of matching with the normal system call sequence, then the Seq is highly likely to be abnormal. Conversely, the possibility of Seq anomalies is small. Therefore, the V value also indicates the degree of abnormality of the system call short sequence.
  • the degree of abnormality of successive L states can be detected using the trained Markov model.
  • During detection, a sliding window of length k follows the detection point and slides forward continuously, and the number of the most recent k state sequences whose probability is less than the threshold v is counted. When the count is greater than 2, the sequence is marked as abnormal.
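  • The sliding-window decision can be sketched as below. The formula for V(Seq) is not reproduced in this text, so the sketch uses the product of transition probabilities along a short sequence of L states as a stand-in with the same 0 to 1 range; that choice, the function names, and the `max_low` parameter (the count limit of 2 mentioned above) are assumptions for illustration only.

```python
import math


def score(window, trans):
    """Stand-in for V(Seq): product of transition probabilities (assumed)."""
    return math.prod(trans.get(pair, 0.0) for pair in zip(window, window[1:]))


def is_abnormal(states, trans, L, k, v, max_low=2):
    """Flag the state stream once more than `max_low` of the last k
    short sequences (each of L states) score below threshold v."""
    scores = [score(states[i:i + L], trans)
              for i in range(len(states) - L + 1)]
    for end in range(len(scores)):
        recent = scores[max(0, end - k + 1):end + 1]   # last k scores
        if sum(s < v for s in recent) > max_low:
            return True
    return False
```

  • Requiring several low-scoring short sequences inside one window, rather than a single one, makes the decision less sensitive to an isolated rare transition.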
  • the feature pattern can be more deeply mined according to the correlation within the system call sequence.
  • the mined feature patterns are not limited to a set of determined system call sequences, that is, the feature patterns can be ambiguous.
  • The method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware; in many cases, the former is the better implementation.
  • The part of the technical solution of the present invention that is essential, or that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
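  • The term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.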
  • FIG. 4 is a structural block diagram of an abnormality detecting apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes:
  • the first processing module 42 is configured to acquire a fuzzy feature pattern of the system call sequence and add the fuzzy feature pattern to the feature pattern library, where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences;
  • The above fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode.
  • The determined mode represents a feature pattern composed of multiple system calls in a determined order. For example, if a system call sequence $l \in C$, where C is the feature pattern library (a set of feature patterns), and l consists of a group of system calls in a determined order, then l is a determined mode. The fuzzy mode represents a feature pattern of a class of system call sequences. For example, if a system call sequence $l \in C$ cannot be represented by one specific system call short sequence but instead represents a class of system call short sequences, then l is a fuzzy mode.
  • the second processing module 44 is configured to match the system call sequence of the training set against the feature patterns included in the feature pattern library, and obtain the state sequence corresponding to the system call sequence of the training set according to the rule corresponding to the matching result;
  • an obtaining module 46 configured to train the Markov model with the state sequence to obtain a trained Markov model;
  • the detection module 48 is configured to detect abnormalities of the system call sequence to be detected using the trained Markov model.
  • Application scenarios of the abnormality detecting method include, but are not limited to, the following: when used on a host, the host monitors the system call sequences of critical programs such as lpr, ftpd, and sendmail to detect network or local intrusions; when used on a terminal, the terminal connects to a server, downloads the trained Markov model, and performs the detection locally.
  • A fuzzy feature pattern of the system call sequence is acquired and added to the feature pattern library, where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; the system call sequence of the training set is matched against the feature patterns included in the feature pattern library, and the corresponding state sequence is obtained according to the rule corresponding to the matching result; the Markov model is trained with the state sequence; and the trained Markov model is used to detect abnormalities of the system call sequence to be detected.
  • The acquisition of the fuzzy feature pattern is mainly based on the correlation within the system call sequence: feature patterns are mined more deeply to determine a running-trajectory trend of the system call sequence, and an online detection algorithm is then used to detect abnormalities of the system call sequence in a given process. This is not limited by the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns; it enriches the types of identifiable patterns and further improves the efficiency of anomaly detection.
  • Feature patterns are first extracted from the system call sequences generated by the normal behavior of a system process to form a feature pattern library. Then, according to the feature pattern library, the system call sequences generated by normal behavior are compressed, and the compressed sequences are used as the training set for the Markov model, yielding a model of the normal operation of the program.
  • An online detection algorithm is used during detection: the system call sequence to be detected is first compressed into a state sequence in real time according to the feature pattern library, the degree of abnormality is then calculated according to the Markov model, and abnormalities are detected on that basis.
  • The first processing module 42 is further configured to obtain the determined mode by performing the following steps until the longest feature pattern is obtained: acquire a current feature pattern whose length is a first threshold, and add the current feature pattern to the feature pattern library; connect the current feature pattern with an adjacent feature pattern to obtain a connected feature pattern; determine whether the connected feature pattern satisfies the support requirement; if so, set the connected feature pattern as a feature pattern to be acquired and add it to the feature pattern library, where the feature pattern library does not include the current feature pattern and the adjacent feature pattern, and the support is the probability that a system call short sequence appears as a whole in the running trajectory of the system process; determine whether the feature pattern library includes the longest feature pattern; if so, take the feature patterns included in the feature pattern library as the determined modes; if not, select a predetermined number of adjacent feature patterns in the feature pattern library and connect them to obtain a connected feature pattern, determine whether this connected feature pattern satisfies the support requirement, and, if so, set it as a new feature pattern to be acquired and add it to the feature pattern library, where the feature pattern library does not contain the adjacent feature patterns used to form the new feature pattern to be acquired; then continue to determine whether the feature pattern library includes the longest feature pattern.
  • the value of the foregoing first threshold includes but is not limited to: 1.
  • the determination mode is obtained by connecting all feature patterns to further acquire the fuzzy feature pattern.
  • The first processing module 42 is further configured to obtain the fuzzy mode by repeatedly performing the following steps until the longest feature pattern is obtained: merge two determined modes in the feature pattern set whose distance is a first predetermined threshold to obtain a merged feature pattern, where the feature pattern located between the two determined modes is a single system call sequence or a determined mode; determine whether the merged feature pattern satisfies the support requirement; if so, take the merged feature pattern as a target feature pattern and add it to the feature pattern library, where the feature pattern library does not include the feature patterns that took part in the merge; determine whether the feature pattern library includes the longest target feature pattern; if so, take the target feature patterns included in the feature pattern library as the fuzzy modes; if not, merge two determined modes in the feature pattern library whose distance is the first predetermined threshold and determine whether the merged feature pattern satisfies the support requirement, or merge two target feature patterns in the feature pattern library and determine whether the merged feature pattern satisfies the support requirement, or merge a determined mode with a target feature pattern and determine whether the merged feature pattern satisfies the support requirement.
  • the foregoing first predetermined threshold may be 2.
  • the acquisition of the fuzzy feature pattern is divided into two parts.
  • The first part is determined-mode extraction. Specifically, first find the feature patterns of length 1, that is, the single system calls that satisfy the minimum support; then merge adjacent single system calls that satisfy the support requirement into feature patterns of length 2; on this basis, connect adjacent feature patterns of lengths 2 and 1 that satisfy the support requirement to obtain feature patterns of length 3; and so on, until no new, longer feature patterns appear.
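  • The level-by-level growth of determined modes can be sketched as follows. This is a simplified illustration that enumerates contiguous runs of increasing length instead of joining adjacent patterns pairwise, and it assumes the support of a run is its occurrence count divided by the trace length; the function name `extract_determined_modes` and the parameter `min_sup` are hypothetical.

```python
from collections import Counter


def extract_determined_modes(trace, min_sup):
    """Return {pattern tuple: support} for every contiguous run of system
    calls whose (assumed) support count / len(trace) reaches min_sup."""
    modes = {}
    n = 1
    while n <= len(trace):
        # Count every contiguous run of exactly n system calls.
        counts = Counter(tuple(trace[i:i + n])
                         for i in range(len(trace) - n + 1))
        frequent = {run: c / len(trace) for run, c in counts.items()
                    if c / len(trace) >= min_sup}
        if not frequent:          # no new, longer feature pattern appears
            break
        modes.update(frequent)
        n += 1
    return modes
```

  • Stopping at the first length with no qualifying run is safe under this support definition, because a longer run cannot occur more often than the shorter runs it contains.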
  • The second part is fuzzy-mode extraction. Specifically, based on the previous results, feature patterns whose distance is 2 and that satisfy the support requirement are merged into a new feature pattern, and the system call or feature pattern between them is regarded as a random, independent program feature, called the fuzzy portion. This step is then repeated until no new, longer feature patterns appear.
  • The distance between feature patterns used in the above fuzzy feature pattern acquisition is defined as follows: the distance between adjacent feature patterns is 1, the distance between feature patterns separated by one system call or feature pattern is 2, and so on.
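  • One round of fuzzy-mode extraction at distance 2 can be sketched as below. The sketch works on an already compressed token stream in which determined modes appear as single tokens, treats whatever sits between two determined modes as a wildcard, and assumes support is the occurrence count divided by the stream length. The function name, the `"*"` wildcard marker and the token labels are hypothetical.

```python
from collections import Counter


def extract_fuzzy_modes(tokens, determined, min_sup):
    """Return the set of (left, "*", right) fuzzy modes whose two
    determined modes occur at distance 2 often enough.

    tokens     -- compressed sequence of system calls and mode labels
    determined -- set of labels that denote determined modes
    """
    counts = Counter(
        (left, right)
        for left, _middle, right in zip(tokens, tokens[1:], tokens[2:])
        if left in determined and right in determined
    )
    return {(left, "*", right)
            for (left, right), c in counts.items()
            if c / len(tokens) >= min_sup}
```

  • Repeating the round on a stream in which the new fuzzy modes have themselves been folded into single tokens would correspond to the "and so on" step, which this sketch does not attempt.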
  • the length of the system call sequence is not limited, as long as the minimum support requirement is met, it is regarded as a feature mode.
  • The support sup(l) of a system call short sequence l is defined as $\mathrm{sup}(l) = \mathrm{Num}(T, l) / \mathrm{len}(T)$, where len(T) represents the length of the system call sequence T, that is, the number of system calls T contains, and Num(T, l) represents the number of times the system call short sequence l appears in T.
  • A feature pattern is defined as follows: if the support of a system call short sequence l is greater than the threshold requirement, l is considered a feature pattern, denoted $c_i$, where i is the number of the feature pattern.
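  • The support test can be sketched as a small helper. It assumes that support is Num(T, l) divided by len(T), with overlapping occurrences counted; both that reading of the definition and the function names are assumptions made for illustration.

```python
def support(trace, short_seq):
    """Assumed support: occurrences of short_seq in trace / len(trace)."""
    n = len(short_seq)
    target = tuple(short_seq)
    num = sum(1 for i in range(len(trace) - n + 1)
              if tuple(trace[i:i + n]) == target)
    return num / len(trace)


def is_feature_pattern(trace, short_seq, threshold):
    """A short sequence is a feature pattern if its support exceeds
    the threshold."""
    return support(trace, short_seq) > threshold
```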
  • the length priority principle is adopted. Specifically, if the longer feature mode includes a shorter feature mode, the shorter feature mode is deleted from the feature mode library.
  • the length priority principle reduces the size of the feature library and reduces the training time of the Markov chain model.
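  • The length priority principle can be sketched as a pruning pass over the extracted patterns: a pattern is dropped when a strictly longer pattern contains it as a contiguous run. The function names and the tuple representation are hypothetical.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(tuple(longer[i:i + n]) == tuple(shorter)
               for i in range(len(longer) - n + 1))


def prune_by_length_priority(modes):
    """Keep only patterns not contained in a strictly longer pattern."""
    modes = [tuple(m) for m in modes]
    return [m for m in modes
            if not any(len(other) > len(m) and contains(other, m)
                       for other in modes)]
```

  • Fewer patterns means fewer states in the Markov model, which is where the reduction in training time comes from.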
  • FIG. 5 is a structural block diagram (1) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • the second processing module 44 includes:
  • adding unit 52 configured to add the to-be-detected system call sequence to the tail of the empty queue
  • the matching unit 54 is configured to match the to-be-detected system call sequence in the empty queue with the determined mode in the feature pattern library;
  • the converting unit 56 is configured to convert the to-be-detected system call sequence into the state sequence when the to-be-detected system call sequence matches the determined mode;
  • the first obtaining unit 58 is configured to, when the system call sequence to be detected does not match the determined mode, obtain the state sequence corresponding to the system call sequence to be detected according to the result of matching the system call sequence to be detected against the fuzzy feature pattern.
  • In this optional implementation, the system call sequence of the currently executing system process is first matched against the determined modes in the feature pattern library; if it matches, the current system call sequence is converted into a state sequence. If it does not match, it is matched against the fuzzy feature patterns in the feature pattern library to further identify abnormal states. This avoids the problem in the related art that detection accuracy is reduced when a system call sequence is considered abnormal merely because it fails to match a determined mode.
  • FIG. 6 is a structural block diagram (2) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • the first obtaining unit 58 includes:
  • the first determining sub-unit 62 is configured to determine whether the header of the to-be-detected system call sequence matches the first determining portion of the fuzzy feature mode, wherein the header of the to-be-detected system call sequence includes at least one system call;
  • the obtaining sub-unit 64 is configured to, if the determination result is yes, match the other system calls of the system call sequence to be detected in turn against the other determined portions of the fuzzy feature pattern, and obtain the state sequence corresponding to the system call sequence to be detected according to the matching result.
  • A determined portion of the fuzzy feature pattern may be a portion of the fuzzy mode that can be determined by system calls, and the fuzzy portion of the fuzzy feature pattern may be what remains of the fuzzy mode after the determined portions are removed.
  • In this optional implementation, the head of the system call sequence to be detected is first matched against the first determined portion of the fuzzy feature pattern; if it matches, the other system calls of the sequence are matched in turn against the other determined portions of the fuzzy feature pattern, and the state sequence corresponding to the system call sequence to be detected is obtained according to the matching result. This further addresses the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns, and thereby enriches the types of identifiable patterns.
  • FIG. 7 is a structural block diagram (3) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • In addition to the subunits described above, the first obtaining unit 58 further includes:
  • the second determining sub-unit 72, configured to determine, after the other system calls of the system call sequence to be detected have been matched in turn against the other determined portions of the fuzzy feature pattern, whether the system call sequence between two adjacent determined portions in the queue is neither a single system call nor a determined mode; alternatively, the second determining sub-unit 72 may be equivalently replaced by a third determining sub-unit, configured to determine, after that matching, whether the system call sequence between a determined portion and the end of the queue is longer than the maximum length of the determined modes in the feature pattern library;
  • the processing sub-unit 74, configured to, if the determination result is yes, dequeue the first determined portion in the queue that matches the fuzzy feature pattern, and convert each system call in that first determined portion into a state that is added to the state sequence;
  • the determining sub-unit 76, configured to continue to process the other system calls in the system call sequence to be detected if the determination result is no.
  • In this optional implementation, the dequeue condition is either that the system call sequence between two adjacent determined portions in the queue, checked in real time, is neither a single system call nor a determined mode, or that the system call sequence between a determined portion and the end of the queue is longer than the maximum length of the determined modes in the feature pattern library. Dequeuing under this condition avoids the waste of resources caused by repeatedly examining, during anomaly detection, system call sequences that already satisfy the dequeue condition.
  • FIG. 8 is a structural block diagram (4) of an abnormality detecting apparatus according to an embodiment of the present invention.
  • the detecting module 48 includes:
  • a second obtaining unit 82, configured to obtain, using the trained Markov model, among a predetermined number of state sequences corresponding to the system call sequence to be detected, the number of state sequences whose probability is less than a predetermined threshold, where the predetermined threshold is a value corresponding to the degree of abnormality of a system call sequence;
  • the determining unit 84, configured to determine that the system call sequence to be detected is abnormal when this number is greater than a second predetermined threshold.
  • the value of the second predetermined threshold includes, but is not limited to, 2, 3, and is not limited herein.
  • In this optional implementation, detecting abnormalities of the system call sequence to be detected in this way further addresses the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns, and thereby enriches the types of identifiable patterns.
  • each of the above modules may be implemented by software or hardware.
  • This may be achieved in, but is not limited to, the following ways: the foregoing modules are all located in the same processor; or the modules are located in multiple processors.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium may be configured to store program code for performing the following steps:
  • S1 Obtain a fuzzy feature pattern of the system call sequence, and add the fuzzy feature pattern to the feature pattern library, where the fuzzy feature mode is a feature mode including a determining mode and a fuzzy mode, where the determining mode represents multiple system calls a feature pattern composed in a determined order, the fuzzy mode representing a feature pattern of a type of system call sequence;
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, and a magnetic memory.
  • the processor executes the foregoing steps S1, S2, S3, and S4 according to the stored program code in the storage medium.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the one given here, or they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • Through the embodiments of the present invention, a fuzzy feature pattern of the system call sequence is acquired and added to the feature pattern library, where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; the system call sequence to be detected is matched against the feature patterns included in the feature pattern library, and the state sequence corresponding to the system call sequence to be detected is obtained according to the rule corresponding to the matching result; the state sequence is used as a training set to train the Markov model, yielding a trained Markov model; and the trained Markov model is used to detect abnormalities of the system call sequence to be detected.
  • That is to say, the present invention is based on the Markov model combined with the acquisition of fuzzy feature patterns. The acquisition of the fuzzy feature pattern is mainly based on the correlation within the system call sequence: feature patterns are mined more deeply to determine a running-trajectory trend of the system call sequence, and an online detection algorithm is then used to detect abnormalities of the system call sequence in a given process. This is not limited by the problem in the related art that few pattern types can be identified when anomaly detection relies only on determined feature patterns; it enriches the types of identifiable patterns and further improves the efficiency of anomaly detection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Image Analysis (AREA)

Abstract

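An abnormality detection method and apparatus. The method includes: acquiring a fuzzy feature pattern of a system call sequence and adding the fuzzy feature pattern to a feature pattern library (S102), where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; matching the system call sequence of a training set against the feature patterns included in the feature pattern library, and obtaining the state sequence corresponding to the system call sequence of the training set according to the rule corresponding to the matching result (S104); training a Markov model with the state sequence to obtain a trained Markov model (S106); and using the trained Markov model to detect abnormalities of a system call sequence to be detected (S108). This method solves the problem in the related art that few pattern types can be identified when anomaly detection relies on determined feature patterns.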

Description

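Abnormality detection method and apparatus
Technical Field
The present invention relates to the field of communications, and in particular to an abnormality detection method and apparatus.
Background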
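In the related art, many local intrusions and remote intrusions over the Internet exploit critical programs that contain vulnerabilities. Attacks on these critical programs are the main means of intruding into a system. Intrusion detection technology analyzes the system information retained in a computer and the information that user behavior generates in the system in order to detect intrusions into the system. Intrusion detection is divided into misuse detection and anomaly detection. Misuse detection detects intrusions based on knowledge obtained by analyzing intrusions or attacks. Its disadvantage is that new intrusion methods, or variants of existing ones, are difficult to detect; moreover, its performance depends on the size and architecture of the pattern library. Analysis of intrusion behavior and intrusion methods shows that an intrusion ultimately manifests itself in a series of illegal or abnormal system calls, and a variety of anomaly detection methods have been proposed on this basis. Anomaly detection checks how far the current state deviates from the normal state, based on the system characteristics in the normal state.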
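One of the current mainstream anomaly detection methods combines feature pattern extraction with statistical feature detection. During detection, the system call sequence is first compressed into a sequence composed of a mixture of system calls and feature patterns; on this basis, an improved state transition matrix is built with variable-length patterns as units, and anomalies are distinguished by the probability with which state sequences occur.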
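Thus, current mainstream anomaly detection methods first acquire the determined feature patterns of the system call sequences of critical programs and then perform detection according to those determined feature patterns. In actual detection, however, the system stores very few types of determined feature patterns for the system call sequences of critical programs. As a result, few patterns can be identified during anomaly detection and complex pattern features cannot be handled, which leads to a poor ability to characterize the behavior of critical programs and in turn affects the accuracy of the whole anomaly detection.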
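No effective solution has yet been proposed for the problem in the related art that few patterns can be identified when anomaly detection combines determined feature patterns with statistical feature detection.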
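Summary of the Invention
Embodiments of the present invention provide an abnormality detection method and apparatus, to at least solve the problem in the related art that few patterns can be identified when anomaly detection combines determined feature patterns with statistical feature detection.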
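According to one embodiment of the present invention, an abnormality detection method is provided, including: acquiring a fuzzy feature pattern of a system call sequence and adding the fuzzy feature pattern to a feature pattern library, where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; matching the system call sequence of a training set against the feature patterns included in the feature pattern library, and obtaining the state sequence corresponding to the system call sequence of the training set according to the rule corresponding to the matching result; training a Markov model with the state sequence to obtain a trained Markov model; and using the trained Markov model to detect abnormalities of a system call sequence to be detected.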
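Optionally, the determined mode is obtained by performing the following steps until the longest feature pattern is obtained: acquiring a current feature pattern whose length is a first threshold, and adding the current feature pattern to the feature pattern library; connecting the current feature pattern with an adjacent feature pattern to obtain a connected feature pattern; determining whether the connected feature pattern satisfies the support requirement; if so, setting the connected feature pattern as a feature pattern to be acquired and adding it to the feature pattern library, where the feature pattern library does not include the current feature pattern and the adjacent feature pattern, and the support is the probability that a system call short sequence appears as a whole in the running trajectory of the system process; determining whether the feature pattern library includes the longest feature pattern; and if so, taking the feature patterns included in the feature pattern library as the determined modes.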
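Optionally, when it is determined that the feature pattern library does not include the longest feature pattern, the method further includes: selecting a predetermined number of adjacent feature patterns in the feature pattern library and connecting them to obtain a connected feature pattern; determining whether this connected feature pattern satisfies the support requirement; if so, setting it as a new feature pattern to be acquired and adding the new feature pattern to be acquired to the feature pattern library, where the feature pattern library does not contain the adjacent feature patterns used to form the new feature pattern to be acquired; and continuing to determine whether the feature pattern library includes the longest feature pattern.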
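Optionally, the fuzzy mode is obtained by repeatedly performing the following steps until the longest feature pattern is obtained: merging two determined modes in the feature pattern set whose distance is a first predetermined threshold to obtain a merged feature pattern, where the feature pattern located between the two determined modes is a single system call sequence or a determined mode; determining whether the merged feature pattern satisfies the support requirement; if so, taking the merged feature pattern as a target feature pattern and adding the target feature pattern to the feature pattern library, where the feature pattern library does not include the feature patterns that took part in the merge; determining whether the feature pattern library includes the longest target feature pattern; and if so, taking the target feature patterns included in the feature pattern library as the fuzzy modes.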
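Optionally, when it is determined that the feature pattern library does not include the longest target feature pattern, the method further includes: merging two determined modes in the feature pattern library whose distance is the first predetermined threshold to obtain a merged feature pattern, and determining whether the merged feature pattern satisfies the support requirement; or merging two target feature patterns in the feature pattern library, and determining whether the merged feature pattern satisfies the support requirement; or merging a determined mode with a target feature pattern, and determining whether the merged feature pattern satisfies the support requirement.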
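Optionally, obtaining the state sequence corresponding to the system call sequence to be detected according to the rule corresponding to the matching result includes: adding the system call sequence to be detected to the tail of an empty queue; matching the system call sequence to be detected in the queue against the determined modes in the feature pattern library; when the system call sequence to be detected matches a determined mode, converting the system call sequence to be detected into the state sequence; and when the system call sequence to be detected does not match a determined mode, obtaining the state sequence corresponding to the system call sequence to be detected according to the result of matching it against the fuzzy feature pattern.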
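Optionally, obtaining the state sequence corresponding to the system call sequence to be detected according to the result of matching it against the fuzzy feature pattern includes: determining whether the head of the system call sequence to be detected matches the first determined portion of the fuzzy feature pattern, where the head of the system call sequence to be detected includes at least one system call; and if so, matching the other system calls of the system call sequence to be detected in turn against the other determined portions of the fuzzy feature pattern, and obtaining the state sequence corresponding to the system call sequence to be detected according to the matching result.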
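Optionally, after the other system calls of the system call sequence to be detected have been matched in turn against the other determined portions of the fuzzy feature pattern, the method further includes: determining whether the system call sequence between two adjacent determined portions in the queue is neither a single system call nor a determined mode, or determining whether the system call sequence between a determined portion and the end of the queue is longer than the maximum length of the determined modes in the feature pattern library; if so, dequeuing the first determined portion in the queue that matches the fuzzy feature pattern, and converting each system call in that first determined portion into a state that is added to the state sequence; and if not, continuing to process the other system calls in the system call sequence to be detected.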
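Optionally, using the trained Markov model to detect abnormalities of the system call sequence to be detected includes: using the trained Markov model, obtaining, among a predetermined number of state sequences corresponding to the system call sequence to be detected, the number of state sequences whose probability is less than a predetermined threshold, where the predetermined threshold is a value corresponding to the degree of abnormality of a system call sequence; and when this number is greater than a second predetermined threshold, determining that the system call sequence to be detected is abnormal.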
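According to another embodiment of the present invention, an abnormality detection apparatus is provided, including: a first processing module, configured to acquire a fuzzy feature pattern of a system call sequence and add the fuzzy feature pattern to a feature pattern library, where the fuzzy feature pattern is a feature pattern that includes a determined mode and a fuzzy mode, the determined mode represents a feature pattern composed of multiple system calls in a determined order, and the fuzzy mode represents a feature pattern of a class of system call sequences; a second processing module, configured to match the system call sequence of a training set against the feature patterns included in the feature pattern library, and obtain the state sequence corresponding to the system call sequence of the training set according to the rule corresponding to the matching result; an obtaining module, configured to train a Markov model with the state sequence to obtain a trained Markov model; and a detection module, configured to use the trained Markov model to detect abnormalities of a system call sequence to be detected.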
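Optionally, the first processing module is further configured to obtain the determined mode by performing the following steps until the longest feature pattern is obtained: acquiring a current feature pattern whose length is a first threshold, and adding the current feature pattern to the feature pattern library; connecting the current feature pattern with an adjacent feature pattern to obtain a connected feature pattern; determining whether the connected feature pattern satisfies the support requirement; if so, setting the connected feature pattern as a feature pattern to be acquired and adding it to the feature pattern library, where the feature pattern library does not include the current feature pattern and the adjacent feature pattern, and the support is the probability that a system call short sequence appears as a whole in the running trajectory of the system process; determining whether the feature pattern library includes the longest feature pattern; and if so, taking the feature patterns included in the feature pattern library as the determined modes.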
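Optionally, the first processing module is further configured to, when it is determined that the feature pattern library does not include the longest feature pattern: select a predetermined number of adjacent feature patterns in the feature pattern library and concatenate them to obtain a concatenated feature pattern; determine whether the concatenated feature pattern meets the support requirement; if so, set the concatenated feature pattern as a new feature pattern to be obtained and add it to the feature pattern library, where the feature pattern library does not contain the adjacent feature patterns that make up the new feature pattern to be obtained; and continue to determine whether the feature pattern library includes the longest feature pattern.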
可选地,所述第一处理模块还设置为通过以下方式获取所述模糊模式:重复执行以下步骤,直至得到长度最长的特征模式:将所述特征模式集合中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,其中,位于所述两个确定模式中间的特征模式为单个系统调用序列或者所述确定模式;判断所述合并后的特征模式是否满足支持度要求;在判断结果为是的情况下,将所述合并后的特征模式作为目标特征模式,并将所述目标特征模式添加至所述特征模式库中,其中,所述特征模式库中不包括参与合并的特征模式;判断所述特征模式库中是否包括长度最长的目标特征模式;在判断结果为是的情况下,将所述特征模式库中所包括的所述目标特征模式作为所述模糊模式。
可选地,所述第一处理模块还设置为在判断所述特征模式库中不包括长度最长的目标特征模式时,将所述特征模式库中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,并判断所述合并后的特征模式是否满足支持度要求;或者,将所述特征模式库中两个所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求;或者,将所述确定模式与所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求。
可选地,所述第二处理模块包括:添加单元,设置为将所述待检测系统调用序列添加至空队列的尾部;匹配单元,设置为将所述空队列中的所述待检测系统调用序列与所述特征模式库中的确定模式进行匹配;转换单元,设置为在所述待检测系统调用序列与所述确定模式匹配时,将所述待检测系统调用序列转换为所述状态序列;第一获取单元,设置为在所述待检测系统调用序列与所述确定模式不匹配时,根据所述待检测系统调用序列与所述模糊特征模式的匹配结果获取所述待检测系统调用序列对应的状态序列。
可选地,所述第一获取单元包括:第一判断子单元,设置为判断所述待检测系统调用序列的首部是否匹配所述模糊特征模式的第一个确定部分,其中,所述待检测系统调用序列的首部至少包括一个系统调用;获取子单元,设置为在判断结果为是的情况下,依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配,并根据匹配结果获取所述待检测系统调用序列对应的状态序列。
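Optionally, the first obtaining unit further includes: a second judging subunit, configured to determine, after the remaining system calls of the system call sequence under test have been matched in turn against the remaining deterministic parts of the fuzzy feature pattern, whether the system call sequence between two adjacent deterministic parts in the queue is neither a single system call nor the deterministic pattern; or a third judging subunit, configured to determine, after that matching, whether the system call sequence between the deterministic part and the tail of the queue is longer than the maximum length of the deterministic patterns in the feature pattern library; a processing subunit, configured to, if the determination result is yes, dequeue the first deterministic part in the queue that matched the fuzzy feature pattern, convert each system call in that first deterministic part into a state, and append these states to the state sequence; and a determining subunit, configured to, if the determination result is no, continue with the remaining system calls of the system call sequence under test.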
可选地,所述检测模块包括:第二获取单元,设置为在所述训练后的马尔可夫模型中,获取所述待检测系统调用序列对应的预定数量个状态序列中概率小于预定阈值的个数,其中,所述预定阈值为与系统调用序列的异常度对应的值;确定单元,设置为在判断所述个数大于第二预定阈值时,则确定所述待检测调用序列出现异常。
在本发明实施例中,还提供了一种计算机存储介质,该计算机存储介质可以存储有执行指令,该执行指令用于执行上述实施例中的异常检测方法的实现。
通过本发明实施例,获取系统调用序列的模糊特征模式,并将该模糊特征模式添加至特征模式库中,其中,模糊特征模式为包括确定模式和模糊模式的特征模式,确定模式表示通过多个系统调用按照确定顺序组成的特征模式,模糊模式表示一类系统调用序列的特征模式;将待检测系统调用序列与该特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取该待检测系统调用序列对应的状态序列;将该状态序列作为训练集训练马尔可夫模型,得到训练后的马尔可夫模型;使用该训练后的马尔可夫模型,检测该待检测系统调用序列的异常。也就是说,本发明基于马尔可夫模型以及结合模糊特征模式的获取,其中该模糊特征模式的获取主要是根据系统调用序列内部的相关性,更深入挖掘特征模式,来判断系统调用序列的一种运行轨迹趋势,并结合在线的检测算法进一步检测某一进程中的系统调用序列的异常,而不仅仅局限于相关技术中通过确定的特征模式进行异常检测所导致的可识别模式种类较少的问题,进而丰富了可识别模式种类,进一步达到提高异常检测效率的效果。
附图说明
此处所说明的附图用来提供对本发明的进一步理解,构成本申请的一部分,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。在附图中:
图1是根据本发明实施例的异常检测流程图;
图2是根据本发明实施例的基于马尔可夫链的异常检测模型结构框图;
图3是根据本发明实施例的在线检测算法流程图;
图4是根据本发明实施例的异常检测装置的结构框图;
图5是根据本发明实施例的异常检测装置的结构框图(一);
图6是根据本发明实施例的异常检测装置的结构框图(二);
图7是根据本发明实施例的异常检测装置的结构框图(三);
图8是根据本发明实施例的异常检测装置的结构框图(四)。
具体实施方式
下文中将参考附图并结合实施例来详细说明本发明。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。
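Embodiment 1
This embodiment provides an anomaly detection method. Figure 1 is a flowchart of anomaly detection according to an embodiment of the present invention. As shown in Figure 1, the flow includes the following steps:
Step S102: obtain the fuzzy feature patterns of the system call sequences and add them to the feature pattern library;
Note that a fuzzy feature pattern is a feature pattern that includes deterministic patterns and fuzzy patterns. A deterministic pattern is a feature pattern composed of multiple system calls in a definite order. For example, if a system call sequence l ∈ C and l consists of a group of system calls in a definite order, then l is a deterministic pattern, where C is the feature pattern library, a set made up of feature patterns. A fuzzy pattern is a feature pattern that represents a class of system call sequences. For example, if a system call sequence l ∈ C cannot be expressed by one definite short system call sequence but instead stands for a class of short system call sequences, then l is a fuzzy pattern.
Step S104: match the system call sequences of the training set against the feature patterns in the feature pattern library, and obtain the state sequences corresponding to the training-set system call sequences according to the rule that corresponds to the matching result;
Note that, in this embodiment, a system process may include multiple system calls.
Step S106: train a Markov model with the state sequences to obtain a trained Markov model;
Step S108: use the trained Markov model to detect anomalies in the system call sequence under test.
Optionally, in this embodiment, application scenarios of the anomaly detection method include, but are not limited to, the following. On a host, the host monitors the system call sequences of key programs such as lpr, ftpd, and sendmail to detect intrusions from the network or from the local machine. On a terminal, the terminal connects to a server, downloads the trained Markov model, and performs detection locally. In these scenarios, the fuzzy feature patterns of the system call sequences are obtained and added to the feature pattern library; the training-set system call sequences are matched against the library and converted into state sequences according to the rule that corresponds to the matching result; a Markov model is trained on the state sequences; and the trained model is used to detect anomalies in the system call sequence under test. In other words, this embodiment combines a Markov model with fuzzy feature pattern extraction. The extraction exploits the correlations within a system call sequence to mine feature patterns more deeply and thereby capture the trend of the sequence's execution trace, and an online detection algorithm then detects anomalies in the system call sequence of a given process. The method is thus not confined to deterministic feature patterns, which in the related art limit the kinds of patterns that can be recognized; it enlarges the set of recognizable patterns and, in turn, improves the efficiency of anomaly detection.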
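This embodiment is illustrated below with a concrete example.
This example is an anomaly detection model based on a Markov chain; Figure 2 shows the overall model structure. As shown in Figure 2, feature patterns are first extracted from the system call sequences produced by the normal behavior of a running system process, and these form the feature pattern library. The system call sequences produced by normal behavior are then compressed according to the feature pattern library, and the compressed sequences serve as the training set for the Markov model, yielding a model of normal program operation. Detection uses an online detection algorithm: the system call sequence under test is first compressed into a state sequence in real time according to the feature pattern library, and the anomaly degree is then computed with the Markov model to detect anomalies.
Note that the program execution trace in Figure 2 may be denoted $T$, where $T = (sc_1, sc_2, \ldots, sc_i, \ldots)$ and $sc_i$ is the $i$-th system call in the sequence.
$T'$ is the program trace obtained by compressing $T$. It is a sequence that mixes system calls and feature patterns, written $T' = (s_1, s_2, \ldots, s_i, \ldots)$, where each $s_i$ may be a system call $sc_i$ or a feature pattern $c_i$. Clearly, $\mathrm{len}(T') < \mathrm{len}(T)$.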
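In an optional implementation, the deterministic patterns are determined as follows:
Perform the following steps until the longest feature pattern is obtained: obtain the current feature patterns whose length equals a first threshold and add them to the feature pattern library; concatenate the current feature pattern with an adjacent feature pattern to obtain a concatenated feature pattern; determine whether the concatenated feature pattern meets the support requirement; if so, set the concatenated feature pattern as the feature pattern to be obtained and add it to the feature pattern library, where the library then no longer includes the current feature pattern and the adjacent feature pattern, and the support is the probability that a short system call sequence occurs as a whole in the execution trace of the system process; determine whether the feature pattern library includes the longest feature pattern; if so, take the feature patterns in the library as the deterministic patterns; if not, select a predetermined number of adjacent feature patterns in the library and concatenate them to obtain a concatenated feature pattern; determine whether it meets the support requirement; if so, set it as a new feature pattern to be obtained and add it to the library, where the library does not contain the adjacent feature patterns that make up the new feature pattern; and continue to determine whether the library includes the longest feature pattern.
Note that the first threshold may take values including, but not limited to, 1.
In this optional implementation, the deterministic patterns are obtained by concatenating feature patterns, as a basis for then obtaining the fuzzy feature patterns.
In an optional implementation, the fuzzy patterns are determined as follows:
Repeat the following steps until the longest feature pattern is obtained: merge two deterministic patterns in the feature pattern set whose distance equals a first predetermined threshold to obtain a merged feature pattern, where what lies between the two deterministic patterns is a single system call or a deterministic pattern; determine whether the merged feature pattern meets the support requirement; if so, take the merged feature pattern as a target feature pattern and add it to the feature pattern library, where the library does not include the feature patterns that took part in the merge; determine whether the library includes the longest target feature pattern; if so, take the target feature patterns in the library as the fuzzy patterns; if not, merge two deterministic patterns in the library whose distance equals the first predetermined threshold and determine whether the merged feature pattern meets the support requirement; or merge two target feature patterns in the library and determine whether the merged feature pattern meets the support requirement; or merge a deterministic pattern with a target feature pattern and determine whether the merged feature pattern meets the support requirement.
Note that, in this optional implementation, the first predetermined threshold may be 2.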
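The methods for determining the deterministic patterns and the fuzzy patterns in this embodiment are illustrated below with a concrete example.
Obtaining the fuzzy feature patterns has two parts. The first part is deterministic pattern extraction. First, find the feature patterns of length 1, that is, the single system calls that meet the minimum support. Next, merge adjacent single system calls into feature patterns of length 2 where the result meets the support requirement. On that basis, concatenate adjacent feature patterns of lengths 2 and 1 that meet the support requirement to obtain feature patterns of length 3. Continue in this way until no new, longer feature pattern appears. The second part is fuzzy pattern extraction. Starting from the previous result, merge feature patterns at distance 2 that meet the support requirement into a new feature pattern; the system call or feature pattern between them is treated as a random, independent program feature and is called the fuzzy part. Repeat this step until no new, longer feature pattern appears.
Note that the fuzzy feature pattern extraction above relies on a notion of distance between feature patterns. Adjacent feature patterns in a system call sequence are defined to be at distance 1, feature patterns separated by one system call or one feature pattern are at distance 2, and so on. No limit is placed on the length of a system call sequence: any sequence that meets the minimum support requirement is regarded as a feature pattern.
The support sup(l) of a short system call sequence l is defined as: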
Figure PCTCN2016108764-appb-000001
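Here len(T) is the length of the system call sequence T, that is, the number of system calls T contains, and num(T, l) is the number of times the short system call sequence l occurs in T.
A feature pattern is defined as follows: if the support of a short system call sequence l exceeds the threshold, l is regarded as a feature pattern and denoted c_i, where i is the index of the feature pattern.
In addition, this embodiment applies a longest-first principle: if a longer feature pattern contains a shorter one, the shorter feature pattern is deleted from the feature pattern library. This reduces the size of the library and shortens the training time of the Markov chain model.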
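The deterministic-pattern extraction and the longest-first pruning described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The support equation is published only as an image, so the sketch assumes sup(l) = num(T, l) / len(T), built from the two quantities the text defines. It also grows patterns by counting contiguous runs of k calls rather than by explicitly concatenating neighbours, and treats "meets the support requirement" as `>=`. The function names and the example trace are hypothetical.

```python
from collections import Counter


def ngram_counts(trace, k):
    """Count every contiguous run of k system calls in the trace."""
    return Counter(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def support(trace, pattern):
    """Assumed support: occurrences of the pattern divided by len(trace)."""
    pattern = tuple(pattern)
    return ngram_counts(trace, len(pattern))[pattern] / len(trace)


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def mine_deterministic_patterns(trace, min_support):
    """Grow patterns one call at a time until none is frequent, then prune."""
    library = []
    k = 1
    while k <= len(trace):
        frequent = [p for p, c in ngram_counts(trace, k).items()
                    if c / len(trace) >= min_support]
        if not frequent:
            break  # no new, longer feature pattern appears
        library.extend(frequent)
        k += 1
    # Longest-first principle: drop patterns contained in a longer pattern.
    kept = [p for p in library
            if not any(len(q) > len(p) and contains(q, p) for q in library)]
    return sorted(kept, key=lambda p: (-len(p), p))


# Hypothetical trace: a repeated open/read/write/close cycle plus one stray call.
trace = ["open", "read", "write", "close"] * 5 + ["mmap"]
patterns = mine_deterministic_patterns(trace, min_support=0.2)
```

With this trace and threshold, the four-call cycle is the only pattern left after pruning, because every shorter frequent run is contained in it.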
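In an optional implementation, obtaining the state sequence corresponding to the system call sequence according to the rule that corresponds to the matching result includes the following steps:
Step S11: append the system call sequence under test to the tail of an empty queue;
Step S12: match the system call sequence under test in the queue against the deterministic patterns in the feature pattern library;
Step S13: when the sequence under test matches a deterministic pattern, convert it into the state sequence;
Step S14: when the sequence under test does not match a deterministic pattern, obtain its state sequence according to the result of matching it against the fuzzy feature patterns.
Optionally, in this embodiment, the system call sequence of the currently executing system process is first matched against the deterministic patterns in the feature pattern library. On a match, the current system call sequence is converted into states and appended to the state sequence. Otherwise, it is matched against the fuzzy feature patterns in the library to further identify anomalous states. This avoids the loss of detection accuracy in the related art, where the current system call sequence is deemed anomalous as soon as it fails to match a deterministic pattern.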
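In an optional implementation, obtaining the state sequence corresponding to the system call sequence under test according to the result of matching it against the fuzzy feature pattern includes the following steps:
Step S21: determine whether the head of the system call sequence under test matches the first deterministic part of the fuzzy feature pattern, where the head of the sequence under test includes at least one system call;
Step S22: if so, match the remaining system calls of the sequence under test against the remaining deterministic parts of the fuzzy feature pattern in turn, and obtain the state sequence corresponding to the sequence under test according to the matching result.
Note that, in this optional implementation, the deterministic part of a fuzzy feature pattern is the part of the fuzzy pattern that can be expressed by definite system calls, and the fuzzy part is whatever remains of the fuzzy pattern once the deterministic parts are removed.
Optionally, in this implementation, the head of the sequence under test is first matched against the first deterministic part of the fuzzy feature pattern; on a match, the remaining system calls are matched in turn against the remaining deterministic parts, and the state sequence is obtained from the result. This addresses the problem in the related art that detection based only on deterministic feature patterns recognizes few kinds of patterns, and so enlarges the set of recognizable patterns.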
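In an optional implementation, after the remaining system calls of the current system call sequence have been matched in turn against the remaining deterministic parts of the fuzzy feature pattern, the method further includes the following steps:
Step S31: determine whether the system call sequence between two adjacent deterministic parts in the queue is neither a single system call nor a deterministic pattern, or determine whether the system call sequence between a deterministic part and the tail of the queue is longer than the maximum length of the deterministic patterns in the feature pattern library;
Step S32: if so, dequeue the first deterministic part in the queue that matched the fuzzy feature pattern, convert each system call in that first deterministic part into a state, and append these states to the state sequence;
Step S33: if not, continue with the remaining system calls of the system call sequence under test.
Optionally, in this implementation, the dequeue condition is either the relation, observed in real time, between the system call sequence lying between two adjacent deterministic parts in the queue and a single system call or deterministic pattern, or the relation between the system call sequence lying between a deterministic part and the tail of the queue and the maximum length of the deterministic patterns in the feature pattern library. This avoids the waste of resources that would result from repeatedly checking system call sequences that already satisfy the dequeue condition.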
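Matching a call sequence against a fuzzy feature pattern, as in steps S21, S22 and S31, can be illustrated with a short Python sketch. It rests on stated assumptions rather than on the patent's own data structures: a fuzzy pattern is modelled as an ordered list of deterministic parts, and the fuzzy part between two consecutive deterministic parts is taken to be any run of 1 to `d_max` system calls, where `d_max` stands for the maximum deterministic-pattern length. The function name `match_fuzzy` is hypothetical.

```python
def match_fuzzy(calls, parts, d_max):
    """Match `calls` from its head against a fuzzy pattern.

    parts: deterministic parts of the pattern, as tuples, in order.
    d_max: assumed upper bound on the length of each fuzzy gap.
    Returns the index just past the match, or None if there is no match.
    """
    def step(pos, idx):
        part = parts[idx]
        # The deterministic part must appear exactly at this position.
        if tuple(calls[pos:pos + len(part)]) != part:
            return None
        pos += len(part)
        if idx == len(parts) - 1:
            return pos  # every deterministic part has been located
        # Try each admissible gap length before the next deterministic part.
        for gap in range(1, d_max + 1):
            end = step(pos + gap, idx + 1)
            if end is not None:
                return end
        return None  # gap too long or next part missing

    return step(0, 0)


# Hypothetical fuzzy pattern: (open, read) <fuzzy part> (write, close).
parts = [("open", "read"), ("write", "close")]
end = match_fuzzy(["open", "read", "lseek", "write", "close"], parts, d_max=2)
```

A gap of zero is deliberately excluded: two deterministic parts with nothing between them would simply form a longer deterministic pattern.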
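This embodiment is illustrated below with a concrete example.
This example is an online detection method that converts a system call sequence into a state sequence. As shown in Figure 3, it mainly includes the following steps:
Step S301: execute a system call s_i and append it to the tail of queue a, which is initially empty;
Step S302: match the system call sequence in queue a against the feature patterns in the feature library. If the head of the sequence exactly matches a feature pattern, go to step S304; if the head of the sequence matches the first deterministic part of some fuzzy pattern α, go to step S303; if the sequence matches the leading part of a feature pattern, go to step S301; if no match is possible, go to step S305;
Step S303: locate in turn all deterministic parts of fuzzy pattern α in queue a, and determine the length d_i of the system call sequence between two adjacent deterministic parts in queue a, or between a deterministic part and the tail of queue a. If d_i is greater than the maximum length d_max of the deterministic patterns in the feature pattern library, go to step S306; if d_i is not greater than d_max and only some of the deterministic parts of α have been located in queue a, go to step S301; if d_i is not greater than d_max and all deterministic parts of α have been located in queue a, go to step S304;
Step S304: record the corresponding state number C_i, append it to the state sequence, dequeue the matched part of queue a, and go to step S301;
Step S305: append the state corresponding to each system call in queue a to the state sequence, empty queue a, and go to step S301;
Step S306: dequeue the system call sequence of the first deterministic part in queue a that matched fuzzy pattern α, append the state corresponding to each of its system calls to the state sequence, and go to step S302.
The above steps are repeated until the process ends, at which point the system call sequence has been converted into a state sequence.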
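A simplified Python sketch of this online conversion is given below. It covers only the deterministic branch (steps S301, S302, S304 and S305); the fuzzy branch (S303 and S306) is left out to keep the example short, so it should be read as an illustration under that restriction. The `patterns` dictionary, the state names, and the choice to flush a leftover partial match at the end of the trace are assumptions made for the example.

```python
def compress(trace, patterns):
    """Convert a system call trace into a state sequence.

    patterns: dict mapping a deterministic pattern (tuple of calls) to its
    state name. Calls that belong to no pattern become their own states.
    """
    queue, states = [], []
    for call in trace:
        queue.append(call)                       # S301: enqueue the new call
        head = tuple(queue)
        if head in patterns:                     # S302 -> S304: exact match
            states.append(patterns[head])
            queue.clear()
        elif not any(p[:len(head)] == head for p in patterns):
            states.extend(queue)                 # S305: no match is possible
            queue.clear()
        # Otherwise the queue is a proper prefix of some pattern:
        # keep it and read the next call (back to S301).
    states.extend(queue)                         # flush when the process ends
    return states


# Hypothetical library with a single deterministic pattern named "C1".
patterns = {("open", "read", "write", "close"): "C1"}
trace = ["open", "read", "write", "close", "mmap", "open", "read", "brk"]
states = compress(trace, patterns)
```

For this trace the result is `["C1", "mmap", "open", "read", "brk"]`: the full cycle collapses to one state, while the interrupted second cycle is emitted call by call, as step S305 prescribes.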
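In an optional implementation, using the trained Markov model to detect anomalies in the system call sequence under test includes the following steps:
Step S41: with the trained Markov model, count how many of a predetermined number of state sequences corresponding to the system call sequence under test have a probability below a predetermined threshold, where the predetermined threshold is a value corresponding to the anomaly degree of a system call sequence;
Step S42: when the count exceeds a second predetermined threshold, determine that the system call sequence under test is anomalous.
Note that the second predetermined threshold may take values including, but not limited to, 2 and 3; no limitation is imposed here.
Optionally, in this implementation, using the trained Markov model to detect anomalies in the sequence under test further addresses the problem in the related art that detection based only on deterministic feature patterns recognizes few kinds of patterns, and so enlarges the set of recognizable patterns.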
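This embodiment is illustrated below with a concrete example.
For the Markov chain model $\lambda = (Q, \pi, P)$:
1) The state set is $Q = \{q_1, \ldots, q_i, \ldots, q_n\}$. Each system call and each feature pattern corresponds one-to-one to a state $q_i$, where $1 \le i \le n$.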
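2) $\pi = (\pi_1, \ldots, \pi_n)$ is the initial state distribution (equation images PCTCN2016108764-appb-000002 to -000004). From the accompanying definitions it is estimated as $\pi_i = N_i / N$, where $N_i$ is the number of times the system call or feature pattern corresponding to state $q_i$ occurs in $T'$, and $N$ is the length of $T'$.
3) $P = (p_{ij})_{n \times n}$ is the state transition probability matrix (equation images PCTCN2016108764-appb-000005 to -000008), where $p_{ij}$ is the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$. From the accompanying definitions it is estimated as $p_{ij} = N_{ij} / N_i$, where $N_{ij}$ is the number of occurrences of the state pair $q_i q_j$.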
The matching factor of a short state sequence $Seq = (q_1, q_2, \ldots, q_i, \ldots, q_L)$ of length $L$ is:
Figure PCTCN2016108764-appb-000009
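V(Seq) takes values between 0 and 1. A small V indicates that Seq matches the normal system call sequences poorly, so Seq is more likely to be anomalous; conversely, a large V indicates that Seq is less likely to be anomalous. The value of V therefore also expresses the anomaly degree of the short system call sequence.
In this embodiment, the trained Markov model can be used to evaluate the anomaly degree of L consecutive states. During detection, a sliding window of length k follows the detection point forward, and the number of state sequences among the most recent k whose probability is below the threshold v is recorded. When this count exceeds 2, an anomaly is flagged.
This embodiment can mine feature patterns more deeply on the basis of the correlations within a system call sequence. Moreover, the mined feature patterns are not limited to a fixed set of system call sequences; that is, a feature pattern may be fuzzy.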
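The training and sliding-window detection described above can be sketched in Python as follows. Several points are assumptions, because the published equations are images: the initial distribution is estimated as the relative frequency of each state, transition probabilities are normalized by the number of outgoing transitions of the source state, and the matching factor V(Seq) is replaced by a stand-in, the mean transition probability along the sequence, which also lies between 0 and 1. The parameter names `L`, `k` and `v` follow the text; `max_low` and the function names are hypothetical.

```python
from collections import Counter, deque


def train_markov(states):
    """Estimate the initial distribution and transition probabilities."""
    n = len(states)
    pi = {q: c / n for q, c in Counter(states).items()}
    pair_counts = Counter(zip(states, states[1:]))
    outgoing = Counter(states[:-1])  # transitions leaving each state
    trans = {(a, b): c / outgoing[a] for (a, b), c in pair_counts.items()}
    return pi, trans


def match_factor(seq, trans):
    """Stand-in for V(Seq): mean transition probability (needs len(seq) >= 2)."""
    probs = [trans.get(pair, 0.0) for pair in zip(seq, seq[1:])]
    return sum(probs) / len(probs)


def detect(states, trans, L=3, k=5, v=0.5, max_low=2):
    """Return the indices at which an anomaly is flagged."""
    recent = deque(maxlen=k)  # low/normal flags of the last k short sequences
    alarms = []
    for end in range(L, len(states) + 1):
        score = match_factor(states[end - L:end], trans)
        recent.append(score < v)
        if sum(recent) > max_low:  # more than max_low low-scoring sequences
            alarms.append(end - 1)
    return alarms


# Hypothetical training data: a strictly periodic normal behaviour.
pi, trans = train_markov(["A", "B", "C"] * 10)
alarms = detect(["A", "B", "C", "X", "Y", "Z", "X", "A", "B"], trans)
```

Unseen transitions receive probability 0 here; a practical detector would probably smooth these estimates, which the text does not discuss.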
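From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferable. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.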
实施例2
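This embodiment further provides an anomaly detection apparatus, which implements the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.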
图4是根据本发明实施例的异常检测装置的结构框图,如图4所示,该装置包括:
1)第一处理模块42,设置为获取系统调用序列的模糊特征模式,并将该模糊特征模式添加至特征模式库中,其中,该模糊特征模式为包括确定模式和模糊模式的特征模式,该确定模式表示通过多个系统调用按照确定顺序组成的特征模式,该模糊模式表示一类系统调用序列的特征模式;
需要说明的是,上述模糊特征模式为包括确定模式和模糊模式的特征模式。其中,确定模式表示通过多个系统调用按照确定顺序组成的特征模式。例如,若系统调用序列l∈C,且l由一组系统调用以确定的顺序组成,则l为确定模式,C为特征模式库,是一个由特征模式构成的集合;模糊模式表示一类系统调用序列的特征模式,例如,若系统调用序列l∈C,l无法由某一确定的系统调用短序列表示,而是代表一类系统调用短序列,则l为模糊模式。
2)第二处理模块44,设置为将训练集的系统调用序列与该特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取该训练集的系统调用序列对应的状态序列;
3)获取模块46,设置为用该状态序列训练马尔可夫模型,得到训练后的马尔可夫模型;
4)检测模块48,设置为使用该训练后的马尔可夫模型,检测待检测系统调用序列的异常。
可选地,在本实施例中,上述异常检测方法的应用场景包括但并不限于:在主机上使用时,主机通过监视关键程序的系统调用序列,如lpr,ftpd,sendmail等,以检测来自网络或者本地的入侵,或者,在终端上使用时,终端连接服务器后,下载训练好的马尔可夫模型,并直接在本地实施检测。在上述应用场景下,获取系统调用序列的模糊特征模式,并将该模糊特征模式添加至特征模式库中,其中,模糊特征模式为包括确定模式和模糊模式的特征模式,确定模式表示通过多个系统调用按照确定顺序组成的特征模式,模糊模式表示一类系统调用序列的特征模式;将训练集的系统调用序列与该特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取该训练集的系统调用序列对应的状态序列;用状态序列训练马尔可夫模型,得到训练后的马尔可夫模型;使用该训练后的马尔可夫模型,检测待检测系统调用序列的异常。也就是说,在本实施例中,基于马尔可夫模型以及结合模糊特征模式的获取,其中该模糊特征模式的获取主要是根据系统调用序列内部的相关性,更深入挖掘特征模式,来判断系统调用序列的一种运行轨迹趋势,并结合在线的检测算法进一步检测某一进程中的系统调用序列的异常,而不仅仅局限于相关技术中通过确定的特征模式进行异常检测所导致的可识别模式种类较少的问题,进而丰富了可识别模式种类,进一步达到提高异常检测效率的效果。
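This embodiment is illustrated below with a concrete example.
This example is an anomaly detection model based on a Markov chain; Figure 2 shows the overall model structure. As shown in Figure 2, feature patterns are first extracted from the system call sequences produced by the normal behavior of a running system process, and these form the feature pattern library. The system call sequences produced by normal behavior are then compressed according to the feature pattern library, and the compressed sequences serve as the training set for the Markov model, yielding a model of normal program operation. Detection uses an online detection algorithm: the system call sequence under test is first compressed into a state sequence in real time according to the feature pattern library, and the anomaly degree is then computed with the Markov model to detect anomalies.
Note that the program execution trace in Figure 2 may be denoted $T$, where $T = (sc_1, sc_2, \ldots, sc_i, \ldots)$ and $sc_i$ is the $i$-th system call in the sequence.
$T'$ is the program trace obtained by compressing $T$. It is a sequence that mixes system calls and feature patterns, written $T' = (s_1, s_2, \ldots, s_i, \ldots)$, where each $s_i$ may be a system call $sc_i$ or a feature pattern $c_i$. Clearly, $\mathrm{len}(T') < \mathrm{len}(T)$.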
在一个可选地实施方式中,第一处理模块42还设置为通过以下方式获取该确定模式:执行以下步骤,直至得到长度最长的特征模式:获取长度为第一阈值的当前特征模式,并将该当前特征模式添加至该特征模式库中;将该当前特征模式与相邻特征模式进行连接,得到连接后的特征模式;判断该连接后的特征模式是否满足支持度要求;在判断结果为是的情况下,将该连接后的特征模式设置为待获取特征模式,并将该待获取特征模式添加至该特征模式库中,其中,在该特征模式库中不包括该当前特征模式和该相邻特征模式,该支持度为系统调用短序列作为一个整体在系统进程的运行轨迹中出现的概率;判断该特征模式库中是否包括该长度最长的特征模式;在判断结果为是的情况下,将该特征模式库中所包括的特征模式作为该确定模式;在判断结果为否的情况下,在该特征模式库中选取预定数量的相邻特征模式进行连接,得到连接后的特征模式;判断该连接后的特征模式是否满足支持度要求;在判断结果为是的情况下,将该连接后的特征模式设置为新的待获取特征模式,并将该新的待获取特征模式添加至该特征模式库中,其中,该特征模式库不包含用于组成该新的待获取特征模式的相邻特征模式;继续判断该特征模式库中是否包括该长度最长的特征模式。
需要说明的是,上述第一阈值的取值包括但并不限于:1。
在本可选实施方式中,通过将所有特征模式进行连接得到确定模式,以进一步获取模糊特征模式。
在一个可选地实施方式中,该第一处理模块42还设置为通过以下方式获取该模糊模式:重复执行以下步骤,直至得到长度最长的特征模式:将该特征模式集合中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,其中,位于该两个确定模式中间的特征模式为单个系统调用序列或者该确定模式;判断该合并后的特征模式是否满足支持度要求;在判断结果为是的情况下,将该合并后的特征模式作为目标特征模式,并将该目标特征模式添加至该特征模式库中,其中,该特征模式库中不包括参与合并的特征模式;判断该特征模式库中是否包括长度最长的目标特征模式;在判断结果为是的情况下,将该特征模式库中所包括的该目标特征模式作为该模糊模式;在判断结果为否的情况下,将该特征模式库中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,并判断该合并后的特征模式是否满足支持度要求;或者,将该特征模式库中两个该目标特征模式进行合并,并判断该合并后的特征模式是否满足支持度要求;或者,将该确定模式与该目标特征模式进行合并, 并判断该合并后的特征模式是否满足支持度要求。
需要说明的是,在本可选实施例中,上述第一预定阈值可以为2。
下面结合具体示例,对本实施例中涉及到的确定模式和模糊模式的确定方法进行举例说明。
模糊特征模式的获取分为两个部分。第一部分为确定模式提取,具体为,首先寻找长度为1的特征模式,即满足最低支持度的单个系统调用;然后将相邻的、满足支持度要求的单个系统调用合并成长度为2的特征模式;在此基础上,再将相邻的满足支持度要求的长度为2和1的特征模式进行连接,以求得长度为3的特征模式。依此类推,直到没有新的更长的特征模式出现。第二部分为模糊模式提取,具体为,首先在之前结果的基础上,将距离为2的、满足支持度要求的特征模式合并为新的特征模式,中间的系统调用或者特征模式看成一个随机的、独立的程序特征,称为模糊部分。然后,重复上一步,依此类推,直到没有新的更长的特征模式出现。
需要说明的是,上述示例中包含的模糊特征模式获取方法中涉及到的特征模式的距离概念。定义系统调用序列中相邻的特征模式的距离为1,间隔一个系统调用或者特征模式的特征模式之间距离为2,以此类推。系统调用序列的长度不加限制,只要满足最低支持度的要求,都被看作是特征模式。
对系统调用短序列l的支持度sup(l)定义为:
Figure PCTCN2016108764-appb-000010
其中,len(T)表示系统调用序列T的长度,也表示T含有系统调用的个数。num(T,l)表示系统调用短序列l在T中出现的次数。
特征模式定义为若某个系统调用短序列l的支持度大于阈值要求,则认为l是特征模式之一,记为ci,i表示特征模式的编号。
此外,在实施例中,采用长度优先原则,具体为,若较长的特征模式包含了较短的特征模式,则将较短的特征模式从特征模式库中删除。长度优先原则减小了特征库的大小,降低了马尔可夫链模型的训练时间。
图5是根据本发明实施例的异常检测装置的结构框图(一),如图5所示,第二处理模块44包括:
1)添加单元52,设置为将该待检测系统调用序列添加至空队列的尾部;
2)匹配单元54,设置为将该空队列中的该待检测系统调用序列与该特征模式库中的确定模式进行匹配;
3)转换单元56,设置为在该待检测系统调用序列与该确定模式匹配时,将该待检测系统调用序列转换为该状态序列;
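4) a first obtaining unit 58, configured to, when the system call sequence under test does not match the deterministic pattern, obtain the state sequence corresponding to the sequence under test according to the result of matching it against the fuzzy feature pattern.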
可选地,在本实施例中,首先是将当前执行的系统进程中的系统调用序列与特征模式库中的确定模式进行匹配,如果匹配,则将该当前系统调用序列转换成状态序列并添加至状态序列中;如果不匹配,则将当前执行的系统进程中的系统调用序列与特征模式库中的模糊特征模式进行匹配,以进一步对异常状态进行识别,从而避免了相关技术中,在判断在该当前系统调用序列与该确定模式不匹配时,则认为当前系统调用序列出现异常所造成检测异常的准确度降低的问题。
图6是根据本发明实施例的异常检测装置的结构框图(二),如图6所示,第一获取单元58包括:
1)第一判断子单元62,设置为判断该待检测系统调用序列的首部是否匹配该模糊特征模式的第一个确定部分,其中,该待检测系统调用序列的首部至少包括一个系统调用;
2)获取子单元64,设置为在判断结果为是的情况下,依次将该待检测系统调用序列的其它系统调用与该模糊特征模式的其它确定部分进行匹配,并根据匹配结果获取该待检测系统调用序列对应的状态序列。
需要说明的是,在本可选实施方式中,模糊特征模式中的确定部分可以为模糊模式中可以用系统调用确定表达的部分,而模糊特征模式中所包括的模糊部分,则可以为模糊模式中除去确定部分之外的部分。
可选地,在本可选实施方式中,首先将待检测系统调用序列的首部与模糊特征模式的第一个确定部分进行匹配,在匹配的情况下,依次将待检测系统调用序列的其它系统调用与模糊特征模式的其它确定部分进行匹配,并根据匹配结果获取待检测系统调用序列对应的状态序列,进一步解决了相关技术中通过确定的特征模式进行异常检测所导致的可识别模式种类较少的问题,进而达到了丰富可识别模式种类的效果。
图7是根据本发明实施例的异常检测装置的结构框图(三),如图7所示,第一获取单元58除了包括图6所示的子单元外还包括:
1)第二判断子单元72,设置为在依次将该待检测系统调用序列的其它系统调用与该模糊特征模式的其它确定部分进行匹配之后,判断该空队列中相邻两个确定部分之间的系统调用序列是否不为单个系统调用或者该确定模式;或者,使用第三判断子单元等同替换上述第二判断子单元72,其中,第三判断子单元设置为在依次将该待检测系统调用序列的其它系统调用与该模糊特征模式的其它确定部分进行匹配之后,判断该确定部分与该空队列末端之间的系统调用序列是否大于该特征模式库中确定模式的最大长度;
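2) a processing subunit 74, configured to, if the determination result is yes, dequeue the first deterministic part in the queue that matched the fuzzy feature pattern, convert each system call in that first deterministic part into a state, and append these states to the state sequence;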
3)确定子单元76,设置为在判断结果为否的情况下,继续确定该待检测系统调用序列中的其它系统调用。
可选地,在本可选实施方式中,通过将实时检测到的空队列中相邻两个确定部分之间的系统调用序列与单个系统调用或者确定模式之间的关系,或者,判断确定部分与空队列末端之间的系统调用序列特征模式库中确定模式的最大长度之间的关系作为出队的条件,避免了在异常检测过程中,满足出队条件的系统调用序列重复进行检测所造成的资源浪费问题。
图8是根据本发明实施例的异常检测装置的结构框图(四),如图8所示,检测模块48包括:
1)第二获取单元82,设置为在该训练后的马尔可夫模型中,获取该待检测系统调用序列对应的预定数量个状态序列中概率小于预定阈值的个数,其中,该预定阈值为与系统调用序列的异常度对应的值;
2)确定单元84,设置为在判断该个数大于第二预定阈值时,则确定该待检测调用序列出现异常。
需要说明的是,上述第二预定阈值的取值包括但并不限于:2、3,在此,并不做任何限定。
可选地,在本可选实施方式中,通过使用训练后的马尔可夫模型,检测待检测系统调用序列的异常,进一步解决了相关技术中通过确定的特征模式进行异常检测所导致的可识别模式种类较少的问题,进而达到了提高可识别模式种类的效果。
需要说明的是,上述各个模块是可以通过软件或硬件来实现的,对于后者,可以通过以下方式实现,但不限于此:上述模块均位于同一处理器中;或者,上述模块分别位于多个处理器中。
实施例3
本发明的实施例还提供了一种存储介质。可选地,在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的程序代码:
S1,获取系统调用序列的模糊特征模式,并将该模糊特征模式添加至特征模式库中,其中,该模糊特征模式为包括确定模式和模糊模式的特征模式,该确定模式表示通过多个系统调用按照确定顺序组成的特征模式,该模糊模式表示一类系统调用序列的特征模式;
S2,将训练集的系统调用序列与该特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取该训练集的系统调用序列对应的状态序列;
S3,用状态序列训练马尔可夫模型,得到训练后的马尔可夫模型;
S4,使用训练后的马尔可夫模型,检测待检测系统调用序列的异常。
可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
可选地,在本实施例中,处理器根据存储介质中已存储的程序代码执行上述步骤S1、S2、S3、S4。
可选地,本实施例中的具体示例可以参考上述实施例及可选实施方式中所描述的示例,本实施例在此不再赘述。
显然,本领域的技术人员应该明白,上述的本发明的各模块或各步骤可以用通用的计算装置来实现,它们可以集中在单个的计算装置上,或者分布在多个计算装置所组成的网络上,可选地,它们可以用计算装置可执行的程序代码来实现,从而,可以将它们存储在存储装置中由计算装置来执行,并且在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤,或者将它们分别制作成各个集成电路模块,或者将它们中的多个模块或步骤制作成单个集成电路模块来实现。这样,本发明不限制于任何特定的硬件和软件结合。
以上所述仅为本发明的优选实施例而已,并不用于限制本发明,对于本领域的技术人员来说,本发明可以有各种更改和变化。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。
工业实用性
通过本发明实施例,获取系统调用序列的模糊特征模式,并将该模糊特征模式添加至特征模式库中,其中,模糊特征模式为包括确定模式和模糊模式的特征模式,确定模式表示通过多个系统调用按照确定顺序组成的特征模式,模糊模式表示一类系统调用序列的特征模式;将待检测系统调用序列与该特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取该待检测系统调用序列对应的状态序列;将该状态序列作为训练集训练马尔可夫模型,得到训练后的马尔可夫模型;使用该训练后的马尔可夫模型,检测该待检测系统调用序列的异常。也就是说,本发明基于马尔可夫模型以及结合模糊特征模式的获取,其中该模糊特征模式的获取主要是根据系统调用序列内部的相关性,更深入挖掘特征模式,来判断系统调用序列的一种运行轨迹趋势,并结合在线的检测算法进一步检测某一进程中的系统调用序列的异常,而不仅仅局限于相关技术中通过确定的特征模式进行异常检测所导致的可识别模式种类较少的问题,进而丰富了可识别模式种类,进一步达到提高异常检测效率的效果。

Claims (18)

  1. 一种异常检测方法,包括:
    获取系统调用序列的模糊特征模式,并将所述模糊特征模式添加至特征模式库中,其中,所述模糊特征模式为包括确定模式和模糊模式的特征模式,所述确定模式表示通过多个系统调用按照确定顺序组成的特征模式,所述模糊模式表示一类系统调用序列的特征模式;
    将训练集的系统调用序列与所述特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取所述训练集的系统调用序列对应的状态序列;
    用所述状态序列训练马尔可夫模型,得到训练后的马尔可夫模型;
    使用所述训练后的马尔可夫模型,检测待检测系统调用序列的异常。
  2. 根据权利要求1所述的方法,其中,所述确定模式通过以下方式获取:
    执行以下步骤,直至得到长度最长的特征模式:
    获取长度为第一阈值的当前特征模式,并将所述当前特征模式添加至所述特征模式库中;
    将所述当前特征模式与相邻特征模式进行连接,得到连接后的特征模式;
    判断所述连接后的特征模式是否满足支持度要求;
    在判断结果为是的情况下,将所述连接后的特征模式设置为待获取特征模式,并将所述待获取特征模式添加至所述特征模式库中,其中,在所述特征模式库中不包括所述当前特征模式和所述相邻特征模式,所述支持度为系统调用短序列作为一个整体在系统进程的运行轨迹中出现的概率;
    判断所述特征模式库中是否包括所述长度最长的特征模式;
    在判断结果为是的情况下,将所述特征模式库中所包括的特征模式作为所述确定模式。
  3. 根据权利要求2所述的方法,其中,在判断所述特征模式库中不包括所述长度最长的特征模式时,所述方法还包括:
    在所述特征模式库中选取预定数量的相邻特征模式进行连接,得到连接后的特征模式;
    判断该连接后的特征模式是否满足支持度要求;
    在判断结果为是的情况下,将该连接后的特征模式设置为新的待获取特征模式,并将所述新的待获取特征模式添加至所述特征模式库中,其中,所述特征模式库不包含用于组成所述新的待获取特征模式的相邻特征模式;
    继续判断所述特征模式库中是否包括所述长度最长的特征模式。
  4. 根据权利要求3所述的方法,其中,所述模糊模式通过以下方式获取:
    重复执行以下步骤,直至得到长度最长的特征模式:
    将所述特征模式库中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,其中,位于所述两个确定模式中间的特征模式为单个系统调用序列或者所述确定模式;
    判断所述合并后的特征模式是否满足支持度要求;
    在判断结果为是的情况下,将所述合并后的特征模式作为目标特征模式,并将所述目标特征模式添加至所述特征模式库中,其中,所述特征模式库中不包括参与合并的特征模式;
    判断所述特征模式库中是否包括长度最长的目标特征模式;
    在判断结果为是的情况下,将所述特征模式库中所包括的所述目标特征模式作为所述模糊模式。
  5. 根据权利要求4所述的方法,其中,在判断所述特征模式库中不包括长度最长的目标特征模式时,所述方法还包括:
    将所述特征模式库中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,并判断所述合并后的特征模式是否满足支持度要求;或者,
    将所述特征模式库中两个所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求;或者,
    将所述确定模式与所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求。
  6. 根据权利要求1所述的方法,其中,所述依据与匹配结果对应的规则获取所述待检测系统调用序列对应的状态序列包括:
    将所述待检测系统调用序列添加至空队列的尾部;
    将所述空队列中的所述待检测系统调用序列与所述特征模式库中的确定模式进行匹配;
    在所述待检测系统调用序列与所述确定模式匹配时,将所述待检测系统调用序列转换为所述状态序列;
    在所述待检测系统调用序列与所述确定模式不匹配时,根据所述待检测系统调用序列与所述模糊特征模式的匹配结果获取所述待检测系统调用序列对应的状态序列。
  7. 根据权利要求6所述的方法,其中,所述根据所述待检测系统调用序列与所述模糊特征模式的匹配结果获取所述待检测系统调用序列对应的状态序列包括:
    判断所述待检测系统调用序列的首部是否匹配所述模糊特征模式的第一个确定部分,其中,所述待检测系统调用序列的首部至少包括一个系统调用;
    在判断结果为是的情况下,依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配,并根据匹配结果获取所述待检测系统调用序列对应的状态序列。
  8. 根据权利要求7所述的方法,其中,在依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配之后还包括:
    判断所述空队列中相邻两个确定部分之间的系统调用序列是否不为单个系统调用或者所述确定模式,或者,判断所述确定部分与所述空队列末端之间的系统调用序列是否大于所述特征模式库中确定模式的最大长度;
    在判断结果为是的情况下,所述空队列中与所述模糊特征模式匹配的第一个确定部分出队,并将所述第一个确定部分中每一个系统调用转换成状态添加至所述状态序列中;
    在判断结果为否的情况下,继续确定所述待检测系统调用序列中的其它系统调用。
  9. 根据权利要求1所述的方法,其中,所述使用所述训练后的马尔可夫模型,检测所述待检测系统调用序列的异常包括:
    在所述训练后的马尔可夫模型中,获取所述待检测系统调用序列对应的预定数量个状态序列中概率小于预定阈值的个数,其中,所述预定阈值为与系统调用序列的异常度对应的值;
    在判断所述个数大于第二预定阈值时,则确定所述待检测调用序列出现异常。
  10. 一种异常检测装置,包括:
    第一处理模块,设置为获取系统调用序列的模糊特征模式,并将所述模糊特征模式添加至特征模式库中,其中,所述模糊特征模式为包括确定模式和模糊模式的特征模式,所述确定模式表示通过多个系统调用按照确定顺序组成的特征模式,所述模糊模式表示一类系统调用序列的特征模式;
    第二处理模块,设置为将训练集的系统调用序列与所述特征模式库中所包括的特征模式进行匹配,依据与匹配结果对应的规则获取所述训练集的系统调用序列对应的状态序列;
    获取模块,设置为用所述状态序列训练马尔可夫模型,得到训练后的马尔可夫模型;
    检测模块,设置为使用所述训练后的马尔可夫模型,检测待检测系统调用序列的异常。
  11. 根据权利要求10所述的装置,其中,所述第一处理模块还设置为通过以下方式获取所述确定模式:
    执行以下步骤,直至得到长度最长的特征模式:
    获取长度为第一阈值的当前特征模式,并将所述当前特征模式添加至所述特征模式库中;
    将所述当前特征模式与相邻特征模式进行连接,得到连接后的特征模式;
    判断所述连接后的特征模式是否满足支持度要求;
    在判断结果为是的情况下,将所述连接后的特征模式设置为待获取特征模式,并将所述待获取特征模式添加至所述特征模式库中,其中,在所述特征模式库中不包括所述当前特征模式和所述相邻特征模式,所述支持度为系统调用短序列作为一个整体在系统进程的运行轨迹中出现的概率;
    判断所述特征模式库中是否包括所述长度最长的特征模式;
    在判断结果为是的情况下,将所述特征模式库中所包括的特征模式作为所述确定模式。
  12. 根据权利要求11所述的装置,其中,所述第一处理模块还设置为在判断所述特征模式库中不包括所述长度最长的特征模式时,在所述特征模式库中选取预定数量的相邻特征模式进行连接,得到连接后的特征模式;判断该连接后的特征模式是否满足支持度要求;在判断结果为是的情况下,将该连接后的特征模式设置为新的待获取特征模式,并将所述新的待获取特征模式添加至所述特征模式库中,其中,所述特征模式库不包含用于组成所述新的待获取特征模式的相邻特征模式;继续判断所述特征模式库中是否包括所述长度最长的特征模式。
  13. 根据权利要求12所述的装置,其中,所述第一处理模块还设置为通过以下方式获取所述模糊模式:
    重复执行以下步骤,直至得到长度最长的特征模式:
    将所述特征模式集合中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,其中,位于所述两个确定模式中间的特征模式为单个系统调用序列或者所述确定模式;
    判断所述合并后的特征模式是否满足支持度要求;
    在判断结果为是的情况下,将所述合并后的特征模式作为目标特征模式,并将所述目标特征模式添加至所述特征模式库中,其中,所述特征模式库中不包括参与合并的特征模式;
    判断所述特征模式库中是否包括长度最长的目标特征模式;
    在判断结果为是的情况下,将所述特征模式库中所包括的所述目标特征模式作为所述模糊模式。
  14. 根据权利要求13所述的装置,其中,所述第一处理模块还设置为在判断所述特征模式库中不包括长度最长的目标特征模式时,将所述特征模式库中距离为第一预定阈值的两个确定模式进行合并,得到合并后的特征模式,并判断所述合并后的特征模式是否满足支持度要求;或者,将所述特征模式库中两个所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求;或者,将所述确定模式与所述目标特征模式进行合并,并判断所述合并后的特征模式是否满足支持度要求。
  15. 根据权利要求10所述的装置,其中,所述第二处理模块包括:
    添加单元,设置为将所述待检测系统调用序列添加至空队列的尾部;
    匹配单元,设置为将所述空队列中的所述待检测系统调用序列与所述特征模式库中的确定模式进行匹配;
    转换单元,设置为在所述待检测系统调用序列与所述确定模式匹配时,将所述待检测系统调用序列转换为所述状态序列;
    第一获取单元,设置为在所述待检测系统调用序列与所述确定模式不匹配时,根据所述待检测系统调用序列与所述模糊特征模式的匹配结果获取所述待检测系统调用序列对应的状态序列。
  16. 根据权利要求15所述的装置,其中,所述第一获取单元包括:
    第一判断子单元,设置为判断所述待检测系统调用序列的首部是否匹配所述模糊特征模式的第一个确定部分,其中,所述待检测系统调用序列的首部至少包括一个系统调用;
    获取子单元,设置为在判断结果为是的情况下,依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配,并根据匹配结果获取所述待检测系统调用序列对应的状态序列。
  17. 根据权利要求16所述的装置,其中,所述第一获取单元还包括:
    第二判断子单元,设置为在依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配之后,判断所述空队列中相邻两个确定部分之间的系统调用序列是否不为单个系统调用或者所述确定模式;或者,第三判断子单元,设置为在依次将所述待检测系统调用序列的其它系统调用与所述模糊特征模式的其它确定部分进行匹配之后,判断所述确定部分与所述空队列末端之间的系统调用序列是否大于所述特征模式库中确定模式的最大长度;
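    a processing subunit, configured to, if the determination result is yes, dequeue the first deterministic part in the queue that matched the fuzzy feature pattern, convert each system call in the first deterministic part into a state, and append these states to the state sequence;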
    确定子单元,设置为在判断结果为否的情况下,继续确定所述待检测系统调用序列中的其它系统调用。
  18. 根据权利要求10所述的装置,其中,所述检测模块包括:
    第二获取单元,设置为在所述训练后的马尔可夫模型中,获取所述待检测系统调用序列对应的预定数量个状态序列中概率小于预定阈值的个数,其中,所述预定阈值为与系统调用序列的异常度对应的值;
    确定单元,设置为在判断所述个数大于第二预定阈值时,则确定所述待检测调用序列出现异常。
PCT/CN2016/108764 2016-03-03 2016-12-07 异常检测方法及装置 WO2017148196A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610121003.XA CN107153584A (zh) 2016-03-03 2016-03-03 异常检测方法及装置
CN201610121003.X 2016-03-03

Publications (1)

Publication Number Publication Date
WO2017148196A1 true WO2017148196A1 (zh) 2017-09-08

Family

ID=59742522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/108764 WO2017148196A1 (zh) 2016-03-03 2016-12-07 异常检测方法及装置

Country Status (2)

Country Link
CN (1) CN107153584A (zh)
WO (1) WO2017148196A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113472721A (zh) * 2020-03-31 2021-10-01 华为技术有限公司 一种网络攻击检测方法及装置

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766229B (zh) * 2018-12-05 2022-02-11 华东师范大学 一种面向综合电子系统的异常检测方法
CN111368290B (zh) * 2018-12-26 2023-06-09 中兴通讯股份有限公司 一种数据异常检测方法、装置及终端设备
CN109960631B (zh) * 2019-03-19 2020-01-03 山东九州信泰信息科技股份有限公司 一种安全事件异常的实时侦测方法
CN110413345A (zh) * 2019-07-26 2019-11-05 云湾科技(嘉兴)有限公司 程序验证方法、装置、计算设备及计算机存储介质
CN111526164B (zh) * 2020-07-03 2020-10-30 北京每日优鲜电子商务有限公司 一种用于电商平台的网络攻击检测方法及系统
CN112036622B (zh) * 2020-08-18 2023-12-26 国网上海能源互联网研究院有限公司 一种基于图谱分析确定配电终端运行状态的方法及系统
CN112506748B (zh) * 2021-02-04 2021-07-09 连连(杭州)信息技术有限公司 一种异常日志分析方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1777119A (zh) * 2005-12-06 2006-05-24 南京邮电大学 一种类似生物免疫机制的入侵检测方法
CN101051953A (zh) * 2007-05-14 2007-10-10 中山大学 基于模糊神经网络的异常检测方法
US20080086434A1 (en) * 2006-10-09 2008-04-10 Radware, Ltd. Adaptive Behavioral HTTP Flood Protection
CN104113544A (zh) * 2014-07-18 2014-10-22 重庆大学 基于模糊隐条件随机场模型的网络入侵检测方法及系统
CN104955149A (zh) * 2015-06-10 2015-09-30 重庆邮电大学 基于模糊规则更新的室内wlan被动入侵检测定位方法

Also Published As

Publication number Publication date
CN107153584A (zh) 2017-09-12

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16892374

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16892374

Country of ref document: EP

Kind code of ref document: A1