CN113901455A - Abnormal operation behavior detection method, device, equipment and medium - Google Patents

Publication number
CN113901455A
Authority
CN
China
Prior art keywords
sequence
occurrence probability
abnormal
operation occurrence
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111194488.2A
Other languages
Chinese (zh)
Inventor
刘柱
鲍青波
张楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, and Beijing Topsec Software Co Ltd
Priority to CN202111194488.2A
Publication of CN113901455A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a method, apparatus, device, and medium for detecting abnormal operation behavior. The method comprises: acquiring a plurality of command sequence streams in the same user group; extracting text features from the command sequence streams to obtain a plurality of byte fragment sequences; calculating the operation occurrence probability of each byte fragment sequence through a hidden Markov model; isolating the abnormal target operation occurrence probability from the operation occurrence probabilities by using an isolated forest model; and determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior. The method and the device can effectively improve the efficiency and accuracy of abnormal operation behavior detection.

Description

Abnormal operation behavior detection method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for detecting abnormal operation behavior.
Background
Malicious attacks allow an unauthorized attacker to gain access to a target computer and cause further damage. Intrusion detection is an important network security technology that can discover intrusion behavior in a computer system.
At present, intrusion detection mainly works as follows: the behavior information of a user is acquired and matched multiple times against historical abnormal behavior information in order to identify abnormal behavior. However, the multiple matching passes are performed serially, which makes the judgment too absolute and the identification of abnormal behavior inaccurate. In addition, the historical abnormal behavior information depends on manual labeling, which consumes a large amount of manpower, and differences in the evaluators' criteria can lead to misjudged abnormal behavior. The abnormal behavior determined by this intrusion detection method is therefore of low accuracy.
Disclosure of Invention
In order to solve the technical problem described above or at least partially solve the technical problem, the present disclosure provides an abnormal operation behavior detection method, apparatus, device, and medium.
The present disclosure provides an abnormal operation behavior detection method, including:
acquiring a plurality of command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands;
extracting text characteristics of the command sequence stream to obtain a plurality of byte fragment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command;
calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model;
isolating abnormal target operation occurrence probability from the operation occurrence probability by utilizing an isolated forest model;
and determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior.
Optionally, the acquiring a plurality of command sequence streams in the same user group includes:
recording a plurality of shell commands executed by a user through a bash program in a Linux system;
sorting the plurality of shell commands by execution time to obtain an initial sequence stream;
and dividing the initial sequence stream into groups according to operator category and a preset time window size to obtain a plurality of user groups, wherein each user group comprises a plurality of command sequence streams.
Optionally, before the calculating, by the hidden Markov model, the operation occurrence probability of the byte fragment sequence, the method further includes: encoding the byte fragment sequence by constructing a dictionary.
Optionally, the method further includes:
selecting a sample from the byte fragment sequence as an observation sequence of a hidden Markov model;
identifying the observation sequence according to the current model parameters through the hidden Markov model so as to output a state sequence; wherein the model parameters include: an initial state probability vector, a state transition probability matrix and an observation probability matrix;
and updating the model parameters of the hidden Markov model by adopting a Baum-Welch algorithm according to the state sequence.
Optionally, the isolating the abnormal target operation occurrence probability from the operation occurrence probabilities by using the isolated forest model includes:
calculating the average path length of each operation occurrence probability in the isolated forest model;
calculating the abnormal score of each operation occurrence probability according to the average path length of each operation occurrence probability in the isolated forest model;
judging whether the abnormal score exceeds a preset score threshold value;
and if so, determining the operation occurrence probability as the abnormal target operation occurrence probability.
Optionally, the method further includes:
repeating the following separation tree building operation: randomly selecting a subset of samples from the operation occurrence probabilities and placing the selected samples at the root node of the tree; taking the attributes corresponding to the multiple dimensions of the operation occurrence probability as features, randomly selecting one of them as the partition attribute, and selecting a value between the minimum and maximum of the partition attribute as the split value between the left and right subtrees, thereby building a separation tree; wherein the dimensions are related to the sliding window size of the N-gram;
and obtaining a plurality of separation trees by repeating the building operation, wherein the plurality of separation trees form the isolated forest model.
Optionally, the extracting text features from the command sequence stream includes: extracting text features from the command sequence stream through an N-gram language model.
The present disclosure provides an abnormal operation behavior detection apparatus, including:
the acquisition module is used for acquiring a plurality of command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands;
the extraction module is used for extracting text characteristics of the command sequence stream to obtain a plurality of byte fragment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command;
the calculation module is used for calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model;
the isolation module is used for isolating the abnormal target operation occurrence probability from the operation occurrence probability by utilizing an isolated forest model;
and the determining module is used for determining the operation behavior of the shell command corresponding to the target operation occurrence probability as the abnormal operation behavior.
The present disclosure provides an electronic device, the electronic device including:
a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method.
The present disclosure provides a computer-readable storage medium having stored thereon a computer program for executing the above-mentioned method.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the abnormal operation behavior detection method, device, equipment and medium provided by the embodiment of the disclosure comprise the following steps: firstly, extracting text characteristics of an obtained command sequence stream to obtain a plurality of byte fragment sequences; then calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model; and isolating the abnormal target operation occurrence probability from the operation occurrence probability by using the isolated forest model; and finally, determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior.
Compared with the prior art, this technical scheme does not require labeled data: it can use the hidden Markov model and the isolated forest model directly to detect abnormal operation behaviors in application environments where labeled data is scarce, which widens the range of applications of intrusion detection. By combining behavior-habit features through the hidden Markov model, the scheme takes the probabilistic relationship among operation behaviors into account, so abnormal operation behaviors are judged more accurately; and the parallel computation of the isolated forest model improves processing efficiency and detection performance. Therefore, the technical scheme can effectively improve the efficiency and accuracy of abnormal operation behavior detection.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a method for detecting abnormal operation behavior according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a model training method according to an embodiment of the present disclosure;
FIG. 3 is a block diagram of an abnormal operation behavior detection apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The main intrusion detection method at present matches the behavior information of a user against historical abnormal behavior information multiple times: a first matching screens out candidate abnormal behaviors, and a second matching is then performed on the sequences corresponding to those candidates to finally determine the abnormal behavior information. However, this solution has the following problems:
First, historical abnormal behavior information must be acquired and labeled manually, which consumes a great deal of manpower, and differences in the evaluators' criteria can lead to misjudged abnormal behavior. Second, the judgment flow is serial and therefore too absolute: a behavior is judged abnormal only when both matchings are satisfied, and the information from the two matchings is never fused. Third, the method processes behavior information and behavior sequences by string matching, which only considers whether they occur and ignores how probable the abnormal information is. If a piece of abnormal information appears more frequently in the historical abnormal library, behavior that matches it is more likely to be abnormal, but the method does not take this into account.
In recent years, intrusion detection that uses shell commands as audit data has been widely studied and applied. In a Linux system, each legitimate user has a particular style and habit of using commands. Because the content and order of executed shell commands are shaped by the executor's work and habits, an intruder, being an outsider, can be expected to execute commands whose content and order differ from those of the host's original user group. Group analysis can therefore separate the individual intruder from the original user group, detect the intruder's abnormal operation behavior, and thereby identify the intruder.
Based on the above, aiming at the problems of the existing intrusion detection technology, the present disclosure considers shell command data based on user behavior audit, and provides a method, an apparatus, a device and a medium for detecting abnormal operation behaviors based on a group analysis technology.
Referring to the flowchart of the abnormal operation behavior detection method shown in fig. 1, the abnormal operation behavior detection method provided in this embodiment includes the following steps:
Step S102, obtaining a plurality of command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands.
In this embodiment, Linux shell commands historically executed by users serve as audit data. The bash program in a Linux system automatically records the commands that each user executes in a shell terminal into a designated file of that user, and the shell commands are collected from that file.
The shell commands historically executed by different users are obtained and discretized as follows: the shell commands are sorted by execution time to obtain an initial sequence stream; the initial sequence stream is then divided into groups according to operator category and a preset time window size, yielding a plurality of user groups, each of which contains a plurality of command sequence streams.
In this embodiment, to analyze users' habits in executing shell commands, the shell commands are preprocessed into command sequence streams. A raw shell command contains a command name, optional parameters, the object the command operates on, and so on. Because the command name is the representative part, only the command name is kept, and the shell commands are ordered by time. In practice, users fall into groups such as front-end, back-end, and operation and maintenance groups, and the command sequence streams are grouped accordingly. In this embodiment, features may be extracted from the sorted initial sequence stream using a time window of 100 commands to obtain the operator category of each shell command; the shell commands in the initial sequence stream are then grouped by operator category, for example into command sequence streams for the operation and maintenance group, the front-end group, the back-end group, and so on.
For the command sequence streams within one user group, the streams of the individuals in the group are compared with one another. If an individual's command sequence stream deviates from the operation of the group as a whole, that individual can be identified as abnormal. This is group-analysis anomaly detection.
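The preprocessing described in step S102 can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the record layout `(timestamp, user, group, raw_command)` and the function name `build_command_streams` are assumptions, and the window is interpreted as a fixed count of commands per user.

```python
from collections import defaultdict


def build_command_streams(records, window_size=100):
    """Turn raw command records into per-group command sequence streams.

    records: iterable of (timestamp, user, group, raw_command) tuples.
    This layout is hypothetical; the source only says commands are
    collected from each user's history file.
    """
    per_user = defaultdict(list)
    # Sort by execution time to form the initial sequence stream.
    for _ts, user, group, raw in sorted(records, key=lambda r: r[0]):
        parts = raw.split()
        if not parts:
            continue
        # Keep only the command name; drop options and operands.
        per_user[(group, user)].append(parts[0])

    streams = defaultdict(list)
    for (group, user), names in per_user.items():
        # Cut each user's commands into windows of `window_size` commands.
        for start in range(0, len(names), window_size):
            streams[group].append((user, names[start:start + window_size]))
    return dict(streams)
```

Each value in the returned dictionary holds the command sequence streams of one user group, which is the unit on which the later comparison within a group operates.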
Step S104, extracting text characteristics of the command sequence flow to obtain a plurality of byte fragment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command.
In one embodiment, text features are extracted from the command sequence stream with an N-gram language model. N-gram feature extraction captures not only the types of operations a user performs but also features such as the order of commands in the stream and the frequency of operations, so multi-dimensional features are fused and the user's behavior habits are reflected.
Step S106, calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model.
The hidden Markov model is a dynamic Bayesian network that can statistically model byte fragment sequences over time and, given known model parameters, calculate the probability of the operations corresponding to a byte fragment sequence. In this embodiment, the hidden Markov model fuses features of the byte fragment sequence such as sequence order and operation frequency and calculates the operation occurrence probability of the sequence. Because normal operation behaviors occur with high probability and abnormal operation behaviors with low probability, the calculated operation occurrence probability can be used for a further, comprehensive judgment of abnormal operation behavior. By combining behavior-habit features through the hidden Markov model, this embodiment takes the probabilistic relationship among operation behaviors into account, so abnormal operation behaviors are judged more accurately.
Step S108, isolating the abnormal target operation occurrence probability from the operation occurrence probability by using the isolated forest model.
In this embodiment, the input of the isolated forest model is the operation occurrence probabilities of the plurality of byte fragment sequences output by the hidden Markov model, and its output is the abnormal data among those probabilities, that is, the target operation occurrence probability. The isolated forest model is an unsupervised model for detecting outliers. It isolates the abnormal target operation occurrence probability by exploiting the facts that abnormal points are few and that abnormal data differs greatly from normal data.
The isolated forest model judges whether the operation occurrence probability of a byte fragment sequence is abnormal. Because the trees in the isolated forest model are independent and can be evaluated in parallel, computation is efficient, which improves the efficiency of detecting abnormal operation behavior.
Step S110, determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior. In a specific implementation, the byte fragment sequence corresponding to the target operation occurrence probability may be determined first, then the shell command corresponding to that byte fragment sequence, and the operation behavior of that shell command is determined to be abnormal operation behavior.
The abnormal operation behavior detection method provided by the embodiment of the disclosure comprises the following steps: firstly, extracting text characteristics of an obtained command sequence stream to obtain a plurality of byte fragment sequences; then calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model; and isolating the abnormal target operation occurrence probability from the operation occurrence probability by using the isolated forest model; and finally, determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior.
Compared with the prior art, this technical scheme does not require labeled data: it can use the hidden Markov model and the isolated forest model directly to detect abnormal operation behaviors in application environments where labeled data is scarce, which widens the range of applications of intrusion detection. By combining behavior-habit features through the hidden Markov model, the scheme takes the probabilistic relationship among operation behaviors into account, so abnormal operation behaviors are judged more accurately; and the parallel computation of the isolated forest model improves processing efficiency and detection performance. Therefore, the technical scheme can effectively improve the efficiency and accuracy of abnormal operation behavior detection.
In order to better understand the abnormal operation behavior detection method, the following detailed description is made on the embodiments of the present disclosure.
For the above step S104, this embodiment uses N-gram to extract text features from the command sequence stream. N-gram is an algorithm based on a statistical language model: it performs a sliding window operation of size N over the text-type command sequence stream to form byte fragment sequences of length N (in the example below, each unit is one command). This embodiment uses 1-gram, 2-gram, and 3-gram for sequence feature extraction. Assuming the original shell sequence is ls, cd, mv, cat, the 1-gram extraction results are {ls}, {cd}, {mv}, {cat}; the 2-gram extraction results are {ls, cd}, {cd, mv}, {mv, cat}; and the 3-gram extraction results are {ls, cd, mv}, {cd, mv, cat}.
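The sliding window in this example can be reproduced with a few lines of Python. The function names `ngrams` and `extract_features` are illustrative only and do not come from the disclosure.

```python
def ngrams(commands, n):
    """Slide a window of size n over a list of command names."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def extract_features(commands, sizes=(1, 2, 3)):
    """Collect the fragments for every window size used in the embodiment."""
    return {n: ngrams(commands, n) for n in sizes}


features = extract_features(["ls", "cd", "mv", "cat"])
# features[2] == [("ls", "cd"), ("cd", "mv"), ("mv", "cat")]
```

A stream shorter than the window size simply yields no fragments for that size.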
To feed the byte fragment sequences extracted by N-gram into the hidden Markov model, they must be encoded as numbers. In this embodiment, the byte fragment sequences are encoded by constructing a dictionary that maps each shell command to a number, for example ls to 1, cd to 2, mv to 3, and cat to 4. In that case, the 1-gram extraction results {ls}, {cd}, {mv}, {cat} are converted into {1}, {2}, {3}, {4}, respectively.
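A dictionary encoding of this kind might look like the sketch below. The helper names and the choice of 0 as the code for commands missing from the dictionary are assumptions; the source does not say how unseen commands are handled.

```python
def build_dictionary(streams):
    """Assign each distinct command name an integer id, starting at 1."""
    vocab = {}
    for stream in streams:
        for cmd in stream:
            if cmd not in vocab:
                vocab[cmd] = len(vocab) + 1
    return vocab


def encode(fragments, vocab, unknown=0):
    """Replace command names in each fragment with their ids.

    `unknown` is a hypothetical fallback id for commands not in the dictionary.
    """
    return [tuple(vocab.get(cmd, unknown) for cmd in frag) for frag in fragments]


vocab = build_dictionary([["ls", "cd", "mv", "cat"]])
encoded = encode([("ls",), ("cd",), ("mv",), ("cat",)], vocab)
```

With the example sequence, `vocab` maps ls, cd, mv, and cat to 1 through 4 in order of first appearance, matching the mapping in the text.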
So that the hidden Markov model can be used directly to calculate the operation occurrence probability of a byte fragment sequence, it needs to be trained in advance. A hidden Markov model (HMM) is a probabilistic model of time series: a hidden Markov chain randomly generates an unobservable state sequence, and each state then generates an observation, yielding an observation sequence. In this embodiment, the observation sequence consists of the shell commands input by a user, and the hidden state is whether the operation behavior corresponding to a shell command is abnormal. On this basis, as shown in fig. 2, this embodiment provides a hidden Markov model training method comprising the following steps:
Step S202, selecting a sample from the byte fragment sequence as an observation sequence of the hidden Markov model.
Step S204, identifying the observation sequence according to the current model parameters through the hidden Markov model so as to output a state sequence. The model parameters λ comprise the initial state probability vector π, the state transition probability matrix A, and the observation probability matrix B, so the model can be written as the triple λ = (A, B, π). The initial state probability vector π and the state transition probability matrix A determine the state sequence, and the observation probability matrix B determines the observation sequence.
Step S206, updating the model parameters of the hidden Markov model according to the state sequence by using the Baum-Welch algorithm.
Since the training data in this embodiment contains only observation sequences (the byte fragment sequences) and no hidden state information, the Baum-Welch algorithm is used for model training to learn the parameters A, B, and π of λ. The Baum-Welch algorithm is an unsupervised, iterative learning algorithm. It is a special case of the expectation-maximization (EM) algorithm, which is commonly used to estimate the parameters of models with hidden variables. Training the hidden Markov model with the unsupervised Baum-Welch algorithm reduces the work of labeling abnormal operation behaviors and saves human resources.
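For reference, one Baum-Welch re-estimation step can be written as follows. This is the standard textbook form rather than text from the disclosure. The forward and backward variables α and β, the sequence length T, and the symbol index k are notation introduced here; a_ij and b_j(k) denote the entries of A and B.

```latex
\begin{aligned}
\gamma_t(i) &= \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{P(O \mid \lambda)},\\[4pt]
\pi_i &\leftarrow \gamma_1(i), \qquad
a_{ij} \leftarrow \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}, \qquad
b_j(k) \leftarrow \frac{\sum_{t=1,\;o_t=k}^{T}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}.
\end{aligned}
```

Here γ_t(i) is the posterior probability of being in state i at time t and ξ_t(i, j) is the posterior probability of moving from state i to state j at time t. The updates are repeated until P(O | λ) stops improving.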
After the model parameters λ = (A, B, π) of the hidden Markov model have been estimated with the Baum-Welch algorithm, the byte fragment sequence corresponding to the shell commands is input into the hidden Markov model as an observation sequence, and the operation occurrence probability P(O | λ) of the observation sequence is calculated from the model parameters. The operation occurrence probability P(O | λ) can represent the degree of abnormality of the byte fragment sequence.
Judging abnormal operation behavior from the operation occurrence probability alone is one-sided and not very accurate. Therefore, the isolated forest model is further used to isolate the abnormal target operation occurrence probability from the operation occurrence probabilities, so that abnormal operation behavior is judged more accurately.
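The source does not say how P(O | λ) is evaluated. A common choice is the forward algorithm, sketched below in plain Python under that assumption. Observation symbols are 0-based indices here, so dictionary ids starting at 1 would need to be shifted by one; the parameter values in the usage lines are made-up illustration data, not values from the disclosure.

```python
def forward_probability(obs, pi, A, B):
    """Return P(O | lambda) for lambda = (A, B, pi) via the forward algorithm.

    obs:  list of observation indices (0-based)
    pi:   pi[i] is the initial probability of hidden state i
    A:    A[i][j] is the transition probability from state i to state j
    B:    B[i][k] is the probability of emitting symbol k in state i
    """
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Illustrative parameters only: three hidden states, two observation symbols.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
p = forward_probability([0, 1, 0], pi, A, B)
```

For long sequences the raw product underflows, so practical implementations usually rescale alpha at each step or work with log probabilities.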
The embodiment provides a method for establishing an isolated forest model, which comprises the following steps:
the following split tree set-up operations were repeated: randomly selecting a part of samples from the operation occurrence probability, and taking the selected samples as root nodes of the tree; and taking a plurality of attributes of a plurality of dimensions corresponding to the operation occurrence probability as a characteristic point, randomly selecting a partition attribute from the plurality of attributes, selecting a value in the most value range of the partition attribute as the standard values of the left and right subtrees, and establishing a separation tree. The above-mentioned dimension is related to the size of the sliding window of the N-gram, such as 1-gram, 2-gram, and 3-gram in the foregoing embodiments, and the corresponding operation probability has three dimensions, i.e., N-1, N-2, and N-3. And obtaining a plurality of separation trees according to the establishing operation of the separation trees, wherein the plurality of separation trees form an isolated forest model.
Specifically, in this embodiment the input of the isolated forest model is the operation occurrence probability output by the hidden Markov model, that is, P(O | λ). Since 1-gram, 2-gram and 3-gram features are each extracted from the command sequence stream, the output of the hidden Markov model is {f1, f2, f3}, where f1, f2 and f3 represent the operation occurrence probabilities of the byte fragment sequences extracted with the 1-gram, the 2-gram and the 3-gram, respectively. The input of the isolated forest model is therefore an operation occurrence probability with three dimensions. The isolated forest model constructs a binary tree by randomly selecting a partition attribute and a partition attribute value until a preset tree depth is reached. For example, if f1 is selected as the partition attribute and 0.5 as the partition attribute value, samples whose f1 is less than 0.5 are assigned to the left subtree and samples whose f1 is greater than 0.5 are assigned to the right subtree.
The node positions, in the isolated forest model, of the {f1, f2, f3} output by the hidden Markov model are called feature points. Abnormal points are sparse in the feature space, whereas normal points are dense. When the isolated forest model partitions the data, an abnormal point can therefore be separated with fewer partitions, so its node lies close to the root node and its distance from the root node is short; conversely, a normal point lies farther from the root node.
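The partitioning behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names (`build_tree`, `path_length`, `average_path_length`), the depth limit of 8, the 100 trees and the synthetic {f1, f2, f3} points are all hypothetical, and the leaf-size correction used by full isolation forest implementations is omitted.

```python
import random


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random attribute and a random split value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}          # leaf node
    attr = rng.randrange(len(points[0]))       # random partition attribute
    lo = min(p[attr] for p in points)
    hi = max(p[attr] for p in points)
    if lo == hi:
        return {"size": len(points)}          # cannot split identical values
    split = rng.uniform(lo, hi)                # value between min and max
    left = [p for p in points if p[attr] < split]
    right = [p for p in points if p[attr] >= split]
    return {"attr": attr, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Number of edges from the root to the leaf that x falls into."""
    if "attr" not in tree:
        return depth
    child = tree["left"] if x[tree["attr"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def average_path_length(forest, x):
    return sum(path_length(tree, x) for tree in forest) / len(forest)


rng = random.Random(0)
# Synthetic {f1, f2, f3} points: a dense normal cluster plus one sparse outlier.
normal = [[rng.uniform(0.45, 0.55) for _ in range(3)] for _ in range(50)]
outlier = [0.01, 0.02, 0.01]
points = normal + [outlier]

# Each tree here is built on the whole shuffled toy set; a real forest would
# typically draw a smaller random subsample per tree.
forest = [build_tree(rng.sample(points, len(points)), 0, 8, rng)
          for _ in range(100)]

outlier_avg = average_path_length(forest, outlier)
normal_avg = average_path_length(forest, normal[0])
```

With data of this shape, `outlier_avg` comes out clearly smaller than `normal_avg`, which is the property the anomaly score relies on.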
When the abnormal target operation occurrence probability is isolated from the operation occurrence probability by using the isolated forest model, the method can be specifically realized by the following steps:
(1) Calculating the average path length of each operation occurrence probability in the isolated forest model.
(2) Calculating the abnormal score of each operation occurrence probability according to the average path length of each operation occurrence probability in the isolated forest model by referring to the following formula:
$$s(x) = 2^{-\frac{E(h(x))}{c}}$$
where E(h(x)) represents the average path length of the operation occurrence probability x over the trees of the isolated forest model, and c represents the average path length of a binary tree in the isolated forest model, used as a normalizing constant.
(3) Judging whether the abnormal score exceeds a preset score threshold value or not;
(4) If so, the operation occurrence probability is determined as the abnormal target operation occurrence probability, and the operation behavior of the shell command corresponding to the target operation occurrence probability is determined as abnormal operation behavior. If not, the operation behavior of the shell command corresponding to the operation occurrence probability is a normal operation behavior.
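Steps (2) to (4) can be sketched as follows, assuming that the score takes the standard isolation forest form 2^(-E(h(x))/c(n)) and that c(n) is the usual average path length of an unsuccessful binary search tree lookup over n samples. The function names, the sample size of 256 and the threshold of 0.6 are illustrative assumptions; the embodiment does not state concrete values.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(avg_path_length, n):
    """Score in (0, 1]; values near 1 indicate a short path, i.e. an anomaly."""
    return 2.0 ** (-avg_path_length / c(n))


def flag_abnormal(avg_path_lengths, n, threshold):
    """Indices of samples whose anomaly score exceeds the preset threshold."""
    return [i for i, e in enumerate(avg_path_lengths)
            if anomaly_score(e, n) > threshold]


# Hypothetical average path lengths E(h(x)) for three operation occurrence
# probabilities, each tree having been built from 256 samples.
abnormal_indices = flag_abnormal([1.5, 9.0, 10.0], n=256, threshold=0.6)
```

A sample whose average path length equals c(n) scores exactly 0.5, so thresholds are normally chosen somewhat above that value.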
After determining the abnormal operation behavior, the present embodiment may further determine a user corresponding to the abnormal operation behavior.
In summary, in the abnormal operation behavior detection method provided by the embodiment of the disclosure, the hidden Markov model performs feature fusion on the command sequence stream of shell commands, so that both the sequence information and the operation frequency information of the sequence are extracted and multi-dimensional features are fused. The abnormal operation behaviors are finally judged by the isolated forest model; because the trees of the isolated forest model are independent and can be processed in parallel, calculation efficiency improves, which in turn improves the detection efficiency of abnormal operation behaviors.
The present embodiment provides an abnormal operation behavior detection apparatus, which is configured to implement the above abnormal operation behavior detection method, as shown in fig. 3, and includes:
an obtaining module 302, configured to obtain multiple command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands;
an extracting module 304, configured to extract text features from the command sequence stream to obtain a plurality of byte segment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command;
a calculating module 306, configured to calculate an operation occurrence probability of the byte segment sequence through a hidden Markov model;
an isolating module 308, configured to isolate an abnormal target operation occurrence probability from the operation occurrence probabilities by using an isolated forest model;
and a determining module 310, configured to determine the operation behavior of the shell command corresponding to the target operation occurrence probability as an abnormal operation behavior.
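The way the five modules hand data to one another can be sketched as a small pipeline. The class name, field names and the stand-in callables below are hypothetical and only show the data flow; a real device would plug in the N-gram extraction, the trained hidden Markov model and the isolated forest model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AbnormalBehaviorDetector:
    obtain: Callable[[], List[list]]              # role of obtaining module 302
    extract: Callable[[list], List[list]]         # role of extracting module 304
    calculate: Callable[[list], float]            # role of calculating module 306
    isolate: Callable[[List[List[float]]], List[int]]  # role of isolating module 308

    def detect(self) -> List[list]:
        streams = self.obtain()
        # One feature vector per stream: one probability per extracted sequence.
        features = [[self.calculate(seq) for seq in self.extract(stream)]
                    for stream in streams]
        abnormal = self.isolate(features)
        # Role of determining module 310: map flagged indices back to streams.
        return [streams[i] for i in abnormal]


detector = AbnormalBehaviorDetector(
    obtain=lambda: [["ls", "cd", "ls"], ["wget", "chmod", "sh"]],
    extract=lambda stream: [stream],                     # stand-in for N-gram extraction
    calculate=lambda seq: 0.9 if "ls" in seq else 0.01,  # stand-in for P(O | lambda)
    isolate=lambda feats: [i for i, f in enumerate(feats) if f[0] < 0.1],  # stand-in
)
abnormal_streams = detector.detect()
```

Keeping each module behind a plain callable makes it possible to test the wiring separately from the statistical models.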
The device provided by this embodiment has the same implementation principle and technical effect as the foregoing method embodiments. For brevity, where the device embodiment does not mention a detail, reference may be made to the corresponding content in the method embodiments.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device 400 includes one or more processors 401 and memory 402.
The processor 401 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 400 to perform desired functions.
Memory 402 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 401 to implement the abnormal operation behavior detection method of the embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 400 may further include: an input device 403 and an output device 404, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 403 may include, for example, a keyboard, a mouse, and the like.
The output device 404 may output various information to the outside, including the determined distance information, direction information, and the like. The output devices 404 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 400 relevant to the present disclosure are shown in fig. 4, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 400 may include any other suitable components depending on the particular application.
Further, the present embodiment also provides a computer-readable storage medium, in which a computer program is stored, and the computer program is used for executing the above abnormal operation behavior detection method.
The computer program product of the abnormal operation behavior detection method, apparatus, electronic device and medium provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for specific implementations, refer to the method embodiments, which are not repeated here.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An abnormal operation behavior detection method, comprising:
acquiring a plurality of command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands;
extracting text characteristics of the command sequence stream to obtain a plurality of byte fragment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command;
calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model;
isolating abnormal target operation occurrence probability from the operation occurrence probability by utilizing an isolated forest model;
and determining the operation behavior of the shell command corresponding to the target operation occurrence probability as abnormal operation behavior.
2. The method of claim 1, wherein obtaining multiple command sequence streams within the same user group comprises:
recording a plurality of shell commands executed by a user through a bash program in a linux system;
sequencing the plurality of shell commands according to the sequence of the execution time to obtain an initial sequence flow;
and according to the category of operators and the size of a preset time window, carrying out group division on the initial sequence stream to obtain a plurality of user groups, wherein each user group comprises a plurality of command sequence streams.
3. The method of claim 1, wherein, before calculating the operation occurrence probability of the byte fragment sequence through a hidden Markov model, the method further comprises:
and encoding the byte fragment sequence by constructing a dictionary.
4. The method of claim 1, further comprising:
selecting a sample from the byte fragment sequence as an observation sequence of a hidden Markov model;
identifying the observation sequence according to the current model parameters through the hidden Markov model so as to output a state sequence; wherein the model parameters include: an initial state probability vector, a state transition probability matrix and an observation probability matrix;
and updating the model parameters of the hidden Markov model by adopting a Baum-Welch algorithm according to the state sequence.
5. The method as claimed in claim 1, wherein the isolating of the abnormal target operation occurrence probability from the operation occurrence probabilities by using the isolated forest model comprises:
calculating the average path length of each operation occurrence probability in the isolated forest model;
calculating the abnormal score of each operation occurrence probability according to the average path length of each operation occurrence probability in the isolated forest model;
judging whether the abnormal score exceeds a preset score threshold value;
and if so, determining the operation occurrence probability as the abnormal target operation occurrence probability.
6. The method of claim 1, further comprising:
repeating the following separation tree establishment operation: randomly selecting a subset of samples from the operation occurrence probabilities, and taking the selected samples as the root node of a tree; taking the attributes of the multiple dimensions corresponding to an operation occurrence probability together as one feature point, randomly selecting a partition attribute from these attributes, selecting a value between the minimum and maximum of the partition attribute as the split value separating the left and right subtrees, and establishing a separation tree; wherein the dimensions are related to the size of the sliding window of the N-gram;
and obtaining a plurality of separation trees according to the establishing operation of the separation trees, wherein the plurality of separation trees form the isolated forest model.
7. The method of claim 1, wherein the extracting text features from the command sequence stream comprises:
and extracting text characteristics from the command sequence stream through a Chinese language model N-gram.
8. An abnormal operation behavior detection apparatus, characterized by comprising:
the acquisition module is used for acquiring a plurality of command sequence streams in the same user group; wherein the command sequence stream comprises a plurality of shell commands;
the extraction module is used for extracting text characteristics of the command sequence stream to obtain a plurality of byte fragment sequences; the byte fragment sequence is used for representing the behavior habit of a user for operating the shell command;
a calculation module for calculating the operation occurrence probability of the byte segment sequence by a hidden Markov model;
the isolation module is used for isolating the abnormal target operation occurrence probability from the operation occurrence probability by utilizing an isolated forest model;
and the determining module is used for determining the operation behavior of the shell command corresponding to the target operation occurrence probability as the abnormal operation behavior.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method of any of the preceding claims 1-7.
CN202111194488.2A 2021-10-13 2021-10-13 Abnormal operation behavior detection method, device, equipment and medium Pending CN113901455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111194488.2A CN113901455A (en) 2021-10-13 2021-10-13 Abnormal operation behavior detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111194488.2A CN113901455A (en) 2021-10-13 2021-10-13 Abnormal operation behavior detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113901455A true CN113901455A (en) 2022-01-07

Family

ID=79191935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111194488.2A Pending CN113901455A (en) 2021-10-13 2021-10-13 Abnormal operation behavior detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113901455A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969738A (en) * 2022-05-27 2022-08-30 天翼爱音乐文化科技有限公司 Interface abnormal behavior monitoring method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination