CN115378702B - Attack detection system based on Linux system call - Google Patents

Attack detection system based on Linux system call

Publication number
CN115378702B
Authority
CN
China
Prior art keywords
sequence
detection
system call
attack
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211004258.XA
Other languages
Chinese (zh)
Other versions
CN115378702A (en)
Inventor
万邦睿
何雨多
钱鹰
黄江平
金霜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211004258.XA
Publication of CN115378702A
Application granted
Publication of CN115378702B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic

Abstract

The invention belongs to the technical field of computer security and relates in particular to an attack detection method and system based on Linux system calls, comprising the following steps: acquiring the system calls generated by a system, splitting the system call sequence into equal-length subsequences that serve as sequences to be detected, and converting the subsequences into detection sequences in word vector form; making a preliminary judgment of the category of each word-vector detection sequence with a deep learning detection model; if the detection sequence is judged to be abnormal, placing it in the attack library and updating the detection matching library; if it is judged to be normal, comparing its matching degree against the detection matching library; and, for a sequence whose category cannot be determined from the matching library, determining its category by cluster computation, thereby obtaining the detection result. The invention uses a deep learning model together with a matching library to detect derivative attacks and uses cluster-based detection for unknown attacks, thereby addressing redundant calls and overlong sequences in system call sequences as well as the missed-detection rate of intrusion detection.

Description

Attack detection system based on Linux system call
Technical Field
The invention belongs to the technical field of computer security, and particularly relates to an attack detection system based on Linux system call.
Background
The "Xinchuang" (information technology application innovation) industry is currently developing rapidly, and domestic operating systems built on the Linux kernel are being adopted by more and more individuals and enterprises thanks to their distinctive advantages. Malicious attacks against these domestic operating systems are increasing as well, so intrusion detection research for Linux-based systems is urgently needed.
A Linux system call is an application programming interface implemented by the Linux kernel for interaction between applications and the kernel; that is, an application running on a Linux system interacts with the kernel through system calls. Because the system call sequences produced by normal operation and by malicious operation differ markedly, an intrusion detection system can be built on Linux system calls.
Existing attack detection based on Linux system calls works by analyzing the system calls of a program. The collected Linux system call sequences are analyzed to establish either a baseline of normal system calls or a feature profile of abnormal system calls, and a sequence is judged to be an attack when it deviates from the normal baseline or matches the abnormal call features.
The prior art has the following problems:
(1) Attack detection research on Linux system calls mainly targets derivative attacks of known attack types, so current intrusion detection systems have a certain missed-detection rate.
(2) Attack detection research on Linux system calls remains mostly theoretical. In practical applications, Linux system call sequences suffer from redundant calls, overlong sequences and similar problems, and these problems are not handled.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an attack detection system based on Linux system calls.
An attack detection method based on Linux system calls comprises the following steps:
S1: acquiring the system calls generated while the system is running, and acquiring attack sequences and normal sequences from an existing database to construct a data set;
S2: removing redundant system calls from the calls generated while the system is running, generating a system call sequence for a specified time period, splitting the system call sequence into equal-length subsequences, and splitting the sequences in the data set into equal-length subsequences;
S3: dividing the equal-length subsequences in the data set into attack sequences and normal sequences according to data type, and storing them in two separate sequence libraries to obtain the detection sequence matching library;
S4: taking the equal-length subsequences of the system call sequence as sequences to be detected, and converting them into detection sequences in word vector form;
S5: making a preliminary judgment of the category of each word-vector detection sequence with a deep learning model; if the detection sequence is judged to be abnormal, placing it in the attack library and updating the detection matching library; if it is judged to be normal, passing it on for further judgment;
S6: comparing the matching degree of the preliminarily judged sequence against the detection matching library, and determining the category of the sequence;
S7: for a sequence whose category cannot be determined from the matching library, determining its category by cluster computation, thereby obtaining the detection result.
Preferably, splitting the system call sequence into equal-length subsequences specifically comprises:
removing redundant system calls by statistical analysis, generating a system call sequence, and cutting the generated sequence into fixed-length call sequences: a call sequence longer than the fixed length is split with a sliding window, and a generated sequence shorter than the fixed length is padded with 0 at its tail.
Preferably, removing redundant system calls by statistical analysis specifically comprises:
using the sequences of the same type in the data set, calculating the TF-IDF value of every system call for each sequence of the two types; sorting the TF-IDF values within each type in descending order and selecting the last 40 system calls of each type; selecting, from the last 40 system calls of each type, the system calls that occur repeatedly; comparing the two selections and taking the system calls common to both types as redundant system calls; and checking all system calls generated while the system is running against the selected redundant system calls, and removing every system call identical to a redundant one.
Further, the TF-IDF value of a system call in each system call sequence is calculated as:
$$\mathrm{TF\text{-}IDF}_{a_1} = \frac{TF_{a_1,a}}{\sum_k a_{1,k}} \times \log\frac{|N|}{|\{N_{a_1}\}|}$$
where $\mathrm{TF\text{-}IDF}_{a_1}$ denotes the TF-IDF value of system call $a_1$; $TF_{a_1,a}$ denotes how frequently system call $a_1$ occurs in sequence $a$; $\sum_k a_{1,k}$ denotes the total count of system call $a_1$ over the generated sequences $k$; $|N|$ denotes the total number of sequences of the normal type or the attack type among the generated sequences; and $|\{N_{a_1}\}|$ denotes the number of sequences of the current class in which system call $a_1$ occurs.
Preferably, comparing the matching degree of a sequence preliminarily judged to be normal with the sequences in the matching library, and determining the type of the sequence, specifically comprises:
setting the matching degree to 0.8 and comparing the similarity between the sequence to be detected and the sequences in the two matching libraries: if its similarity to a sequence in the normal sequence matching library is greater than 0.8, the sequence is a normal sequence; if its similarity to a sequence in the attack sequence matching library is greater than 0.8, the sequence is an attack sequence; and if its similarity to the sequences in both the normal sequence matching library and the attack sequence matching library is less than 0.8, the type of the sequence cannot be identified.
Preferably, the cluster computation specifically comprises:
S1: marking a detection sequence that the detection unit cannot identify as seq01, and converting seq01 into a sequence seqm01 in word vector form;
S2: selecting the sequence seq11 generated by host h1 of the intranet-connected host group in the same time period as seq01, converting it into a sequence seqm11 in word vector form, and calculating the Euclidean distance d(seqm01, seqm11);
S3: setting a threshold distance; if d(seqm01, seqm11) > distance, determining that the sequence generated by the host is similar to the detection sequence; repeating S2-S3 and counting the number of hosts, hostNumber, that have a sequence similar to the detection sequence; if hostNumber >= threshold, determining that the detection sequence is an attack sequence, and if hostNumber < threshold, determining that it is a normal sequence, where threshold is the set threshold on the number of hosts with a similar sequence.
Further, the Euclidean distance $d(seqm_{01}, seqm_{11})$ is calculated as:
$$d(seqm_{01}, seqm_{11}) = \sqrt{\sum_{i}\left(m_i - n_i\right)^2}$$
where $seqm_{01}$ denotes the detection sequence in word vector form; $seqm_{11}$ denotes the word-vector sequence generated by host h1 of the intranet-connected host group in the same time period as the detection sequence; $m_i$ denotes the i-th system call of sequence $seqm_{01}$; and $n_i$ denotes the i-th system call of sequence $seqm_{11}$.
An attack detection system based on Linux system calls comprises a collection module, a training module and a detection module;
the collection module comprises a system call acquisition unit, a processing unit and a data transmission unit;
the system call acquisition unit collects information about the system calls made while a designated process executes, and acquires attack sequences and normal sequences from an existing database to construct a data set;
the related information includes: call time, system call name, process name and thread name;
the processing unit processes the system call information collected by the system call acquisition unit according to length or call time to generate a system call sequence for a specified time period, splits the system call sequence into equal-length subsequences, and splits the sequences in the data set into equal-length subsequences;
the data transmission unit transmits the subsequences generated from the system call information to the detection module as detection sequences, and transmits the subsequences in the data set to the training module;
the training module comprises a data unit, a conversion unit and a training unit;
the data unit divides the equal-length subsequences in the data set into attack sequences and normal sequences according to data type and stores them in two sequence libraries, an attack library and a normal library, to obtain the detection sequence matching library;
the conversion unit converts the sequences stored in the sequence libraries into a word vector matrix;
the training unit trains the detection system on the word vector matrix using deep learning;
the detection module comprises a conversion unit, a detection unit, a matching unit, a calculation unit and an identification unit;
the conversion unit converts the sequence to be detected into word vector form, using the word vector representation generated by the conversion unit of the training module;
the detection unit makes a preliminary classification of the word-vector sequence with the deep learning model; a sequence judged to be abnormal is sent to the training module to update the matching library, and a sequence judged to be normal is sent to the matching unit for further judgment;
the matching unit rechecks each sequence that the detection unit preliminarily judged to be normal by comparing its matching degree with the sequences in the matching library, and determines the type of the sequence;
when the detection unit cannot identify the type of the detection sequence, the calculation unit calculates the similarity between that sequence and the sequences generated in the same time period by the other hosts in the intranet-connected host group, decides whether they are similar according to a set similarity threshold, and judges the sequence to be an attack sequence if it is similar to sequences on multiple hosts in the group, thereby obtaining the detection result.
The beneficial effects of the invention are as follows: attacks of both known and unknown types can be detected at the level of the system call sequence, which reduces the missed-detection rate and makes the method better suited to practical application scenarios.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a flow chart of a sequence process of the present invention;
FIG. 3 is a flow chart of the detection of the invention;
FIG. 4 is a schematic diagram of the system of the present invention;
FIG. 5 is a system call sample of the system of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
An attack detection method based on Linux system calls, as shown in FIG. 1, is characterized by comprising the following steps:
S1: acquiring the system calls generated while the system is running, and acquiring attack sequences and normal sequences from an existing database to construct a data set;
S2: removing redundant system calls from the calls generated while the system is running, generating a system call sequence for a specified time period, splitting the system call sequence into equal-length subsequences, and splitting the sequences in the data set into equal-length subsequences;
S3: dividing the equal-length subsequences in the data set into attack sequences and normal sequences according to data type, and storing them in two separate sequence libraries to obtain the detection sequence matching library;
S4: taking the equal-length subsequences of the system call sequence as sequences to be detected, and converting them into word vector form;
S5: making a preliminary judgment of the category of each word-vector sequence with a deep learning model; if the sequence is judged to be abnormal, placing it in the attack library and updating the detection matching library; if it is judged to be normal, passing it on for further judgment;
S6: comparing the matching degree of the preliminarily judged sequence against the detection matching library, and determining the category of the sequence;
S7: for a sequence whose category cannot be determined from the matching library, determining its category by cluster computation, thereby obtaining the detection result.
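The control flow of steps S5 to S7 can be sketched as a three-stage cascade. This is a minimal illustration, not the patented implementation: the function name `detect`, the three callables and the string labels "attack", "normal" and "unknown" are assumptions introduced here.

```python
from typing import Callable, List, Sequence


def detect(
    seq: Sequence[int],
    classify: Callable[[Sequence[int]], str],  # hypothetical S5 model wrapper
    match: Callable[[Sequence[int]], str],     # hypothetical S6 library matcher
    cluster: Callable[[Sequence[int]], str],   # hypothetical S7 cluster check
    attack_library: List[List[int]],
) -> str:
    """Run one sequence through the S5 -> S6 -> S7 cascade."""
    # S5: preliminary judgment by the deep learning model.
    if classify(seq) == "attack":
        # An abnormal sequence goes into the attack library,
        # which updates the detection matching library.
        attack_library.append(list(seq))
        return "attack"
    # S6: a sequence judged normal is rechecked against the matching library.
    verdict = match(seq)
    if verdict in ("normal", "attack"):
        return verdict
    # S7: the matching library could not decide, so fall back to clustering.
    return cluster(seq)
```

Passing the three stages in as callables keeps the sketch independent of any particular model, similarity measure or host group.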
Splitting the system call sequence into equal-length subsequences specifically comprises the following:
the information captured by the system call acquisition unit is processed into a system call sequence, either by length or by call time; that is, the system call name and generation time are extracted from the information, and a fixed-length system call sequence for a specified time period is generated. As shown in FIG. 2, redundant system calls are removed during processing and equal-length call sequences of length 500 are generated; a sequence longer than 500 is split with a sliding window whose step is set to 250. If the sequence is shorter than 500 and the waiting time during collection exceeds 20 seconds, zeros are appended to the sequence to pad it.
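The windowing and padding described above can be sketched as follows. The function name `split_fixed_length` is hypothetical, and the sketch assumes system calls are encoded as integers in which 0 is reserved for padding. The 20-second waiting rule belongs to the collection loop and is not modeled here.

```python
from typing import List, Sequence


def split_fixed_length(
    calls: Sequence[int], length: int = 500, step: int = 250, pad: int = 0
) -> List[List[int]]:
    """Cut a call sequence into windows of `length`, advancing by `step`.

    Assumption: call IDs are remapped so that `pad` (0) never denotes
    a real system call.
    """
    if not calls:
        return []
    windows: List[List[int]] = []
    start = 0
    while True:
        chunk = list(calls[start:start + length])
        # Pad the tail of a short window so every window has equal length.
        chunk += [pad] * (length - len(chunk))
        windows.append(chunk)
        if start + length >= len(calls):
            break  # this window already reached the end of the sequence
        start += step
    return windows
```

With the embodiment's values (length 500, step 250), consecutive windows overlap by half, so a pattern that straddles one window boundary still appears whole in the neighbouring window.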
In this embodiment, removing redundant system calls by statistical analysis specifically comprises:
using the sequences of the same type in the data set, the TF-IDF value of every system call is calculated for each sequence of the two types, and the TF-IDF values within each type are sorted in descending order. Because the smaller a system call's TF-IDF value, the less it influences the type of the sequence it belongs to, the last 40 system calls of each type are selected, and from those last 40 the system calls that occur repeatedly are selected. The two selections are compared, and the system calls common to both types are taken as redundant system calls. Finally, all system calls generated while the system is running are checked against the selected redundant system calls, and every system call identical to a redundant one is removed.
The TF-IDF value of a system call in each system call sequence is calculated as follows:
$$\mathrm{TF\text{-}IDF}_{a_1} = \frac{TF_{a_1,a}}{\sum_k a_{1,k}} \times \log\frac{|N|}{|\{N_{a_1}\}|}$$
where $\mathrm{TF\text{-}IDF}_{a_1}$ denotes the TF-IDF value of system call $a_1$; $TF_{a_1,a}$ denotes how frequently system call $a_1$ occurs in sequence $a$; $\sum_k a_{1,k}$ denotes the total count of system call $a_1$ over the generated sequences $k$; $|N|$ denotes the total number of sequences of the normal type or the attack type among the generated sequences; and $|\{N_{a_1}\}|$ denotes the number of sequences of the current class in which system call $a_1$ occurs.
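The redundant-call screening can be illustrated with a short sketch. It uses the conventional TF-IDF definition (term count divided by sequence length, times the log of the inverse document frequency), which may differ in detail from the formula of this embodiment. The function names, the `min_repeats` parameter and the reading of "repeatedly occurring" as "in the bottom-k list of at least two sequences" are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set


def tfidf_per_sequence(sequences: Sequence[Sequence[Hashable]]) -> List[Dict[Hashable, float]]:
    """Conventional TF-IDF of every call, computed within one class."""
    n = len(sequences)
    doc_freq: Counter = Counter()
    for seq in sequences:
        doc_freq.update(set(seq))  # number of sequences containing each call
    scores = []
    for seq in sequences:
        counts = Counter(seq)
        scores.append({
            call: (cnt / len(seq)) * math.log(n / doc_freq[call])
            for call, cnt in counts.items()
        })
    return scores


def low_value_calls(sequences, bottom_k: int = 40, min_repeats: int = 2) -> Set[Hashable]:
    """Calls that repeatedly land among the bottom_k TF-IDF values."""
    hits: Counter = Counter()
    for scores in tfidf_per_sequence(sequences):
        # Descending TF-IDF; ties broken by name for determinism.
        ranked = sorted(scores, key=lambda c: (-scores[c], str(c)))
        hits.update(ranked[-bottom_k:])
    return {call for call, h in hits.items() if h >= min_repeats}


def redundant_calls(normal, attack, bottom_k: int = 40, min_repeats: int = 2) -> Set[Hashable]:
    """Calls that are low-value in BOTH classes are treated as redundant."""
    return (low_value_calls(normal, bottom_k, min_repeats)
            & low_value_calls(attack, bottom_k, min_repeats))


def strip_redundant(calls: Sequence[Hashable], redundant: Set[Hashable]) -> List[Hashable]:
    """Remove redundant calls from a live call stream."""
    return [c for c in calls if c not in redundant]
```

Taking the intersection of the two classes matters: a call that is uninformative only within normal traffic may still help to distinguish an attack, so it is kept.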
Converting a sequence to be detected into a sequence in word vector form specifically comprises the following:
each system call in a sequence is treated as a word and the whole system call sequence as a sentence; the word vector dimension is set to 250; word vectors are trained with the Gensim tool over the distinct system call numbers in the word vector corpus; and the set of system call sequences is fed into the word vector model to generate a word vector matrix, giving the sequence in word vector form.
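The lookup step that turns a fixed-length call sequence into a matrix can be sketched as below. Only the lookup is shown: random vectors stand in for the vectors a trained Gensim model would supply, and the names `build_embedding_table` and `to_matrix` are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Sequence


def build_embedding_table(
    vocabulary: Iterable[int], dim: int = 250, seed: int = 0
) -> Dict[int, List[float]]:
    """Stand-in embedding table: one random vector per system call number.

    Assumption: a real deployment would replace these random vectors
    with trained word vectors.
    """
    rng = random.Random(seed)
    return {call: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for call in vocabulary}


def to_matrix(
    sequence: Sequence[int], table: Dict[int, List[float]], dim: int = 250, pad: int = 0
) -> List[List[float]]:
    """Map each call to its vector; padding and unknown calls map to zeros."""
    zero = [0.0] * dim
    return [
        list(zero) if call == pad else list(table.get(call, zero))
        for call in sequence
    ]
```

A sequence of length 500 with 250-dimensional vectors yields the 500x250 matrix used as model input.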
Making the preliminary judgment of the category of the word-vector sequence with the deep learning model specifically comprises the following:
the generated 500x250 word vector matrix is used as the input sequence, with the following settings: training batch size 20, learning rate 0.01, input vector dimension 250, embedding layer dimension 125, hidden layer dimension 125, output dimension 125, ReLU activation function and 2 network layers; the output of the last hidden layer is taken as the classification result. The deep learning detection model includes, but is not limited to, CNN, RNN, LSTM and GRU models.
Comparing the matching degree of a sequence preliminarily judged to be normal with the sequences in the matching library and determining the type of the sequence, as shown in FIG. 3, specifically comprises the following:
the matching degree is set to 0.8 and the similarity between the sequence to be detected and the sequences in the two matching libraries is compared: if its similarity to a sequence in the normal sequence matching library is greater than 0.8, the sequence is a normal sequence; if its similarity to a sequence in the attack sequence matching library is greater than 0.8, the sequence is an attack sequence; and if its similarity to the sequences in both the normal sequence matching library and the attack sequence matching library is less than 0.8, the type of the sequence cannot be identified.
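The matching rule can be sketched as follows. The text does not specify the similarity measure, so this sketch assumes the ratio from Python's `difflib.SequenceMatcher`. The function names and the "unknown" label are hypothetical. The case where both libraries exceed the threshold is resolved here by checking the normal library first, following the order of the text.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity measure: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def match_against_libraries(
    seq: Sequence[int],
    normal_lib: List[Sequence[int]],
    attack_lib: List[Sequence[int]],
    degree: float = 0.8,
) -> str:
    """Return 'normal', 'attack', or 'unknown' for one sequence."""
    best_normal = max((similarity(seq, s) for s in normal_lib), default=0.0)
    best_attack = max((similarity(seq, s) for s in attack_lib), default=0.0)
    if best_normal > degree:
        return "normal"
    if best_attack > degree:
        return "attack"
    # Neither library matches closely enough: defer to cluster computation.
    return "unknown"
```

A sequence that returns "unknown" here is the one handed to the cluster computation.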
The cluster computation specifically comprises the following steps:
S1: marking a detection sequence that the detection unit cannot identify as seq01, and converting seq01 into a sequence seqm01 in word vector form;
S2: selecting the sequence seq11 generated by host h1 of the intranet-connected host group in the same time period as seq01, converting it into a sequence seqm11 in word vector form, and calculating the Euclidean distance d(seqm01, seqm11);
S3: setting a threshold distance; if d(seqm01, seqm11) > distance, determining that the sequence generated by the host is similar to the detection sequence; repeating S2-S3 and counting the number of hosts, hostNumber, that have a sequence similar to the detection sequence; if hostNumber >= threshold, determining that the detection sequence is an attack sequence, and if hostNumber < threshold, determining that it is a normal sequence, where threshold is the set threshold on the number of hosts with a similar sequence.
The Euclidean distance $d(seqm_{01}, seqm_{11})$ is calculated as:
$$d(seqm_{01}, seqm_{11}) = \sqrt{\sum_{i}\left(m_i - n_i\right)^2}$$
where $seqm_{01}$ denotes the detection sequence in word vector form; $seqm_{11}$ denotes the word-vector sequence generated by host h1 of the intranet-connected host group in the same time period as the detection sequence; $m_i$ denotes the i-th system call of sequence $seqm_{01}$; and $n_i$ denotes the i-th system call of sequence $seqm_{11}$.
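The cluster computation can be sketched as a count of hosts whose same-period sequence lies close to the unidentified one. One point is an assumption that departs from the literal wording: the text states the similarity test as d > distance, whereas this sketch treats a smaller Euclidean distance as more similar (d <= distance). The function names and the dictionary of per-host matrices are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # one row vector per system call


def euclidean(m: Matrix, n: Matrix) -> float:
    """Euclidean distance between two equally shaped word-vector sequences."""
    return math.sqrt(sum(
        (x - y) ** 2
        for row_m, row_n in zip(m, n)
        for x, y in zip(row_m, row_n)
    ))


def cluster_verdict(
    seqm01: Matrix, host_sequences: Dict[str, Matrix], distance: float, threshold: int
) -> str:
    """Label seqm01 by how many hosts produced a similar sequence."""
    # ASSUMPTION: "similar" means distance at or below the threshold distance.
    host_number = sum(
        1 for seqm in host_sequences.values()
        if euclidean(seqm01, seqm) <= distance
    )
    return "attack" if host_number >= threshold else "normal"
```

The verdict reflects the idea that the same unfamiliar behaviour appearing on several intranet hosts in the same time period suggests a spreading attack rather than one host's idiosyncratic workload.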
An attack detection system based on Linux system calls, as shown in FIG. 4, comprises a collection module, a training module and a detection module;
the collection module comprises a system call acquisition unit, a processing unit and a data transmission unit;
the system call acquisition unit collects information about the system calls made while a designated process executes, and acquires attack sequences and normal sequences from an existing database to construct a data set;
the related information includes: call time, system call name, process name and thread name;
the processing unit processes the system call information collected by the system call acquisition unit according to length or call time to generate a system call sequence for a specified time period, splits the system call sequence into equal-length subsequences, and splits the sequences in the data set into equal-length subsequences;
the data transmission unit transmits the subsequences generated from the system call information to the detection module as detection sequences, and transmits the subsequences in the data set to the training module;
the training module comprises a data unit, a conversion unit and a training unit;
the data unit divides the equal-length subsequences in the data set into attack sequences and normal sequences according to data type and stores them in two sequence libraries, an attack library and a normal library, to obtain the detection sequence matching library;
the conversion unit converts the sequences stored in the sequence libraries into a word vector matrix;
the training unit trains the detection system on the word vector matrix using deep learning;
the detection module comprises a conversion unit, a detection unit, a matching unit, a calculation unit and an identification unit;
the conversion unit converts the sequence to be detected into word vector form, using the word vector representation generated by the conversion unit of the training module;
the detection unit makes a preliminary classification of the word-vector sequence by deep learning; a sequence judged to be abnormal is sent to the training module to update the matching library, and a sequence judged to be normal is sent to the matching unit for further judgment;
the matching unit rechecks each sequence that the detection unit preliminarily judged to be normal by comparing its matching degree with the sequences in the matching library, and determines the type of the sequence;
when the detection unit cannot identify the type of the detection sequence, the calculation unit calculates the similarity between that sequence and the sequences generated in the same time period by the other hosts in the intranet-connected host group, decides whether they are similar according to a set similarity threshold, and judges the sequence to be an attack sequence if it is similar to sequences on multiple hosts in the group, thereby obtaining the detection result.
The collection module collects the system call names and system call numbers invoked while a designated process executes, generates a system call sequence in the order in which the calls were captured, splits the call sequence into equal-length subsequences as input for the training and detection modules, and stores the resulting sequences locally; the module comprises a system call acquisition unit, a processing unit and a receiving and transmitting unit.
The system call acquisition unit collects information about the system calls generated while a designated process executes; as shown in FIG. 5, the information includes the call time, system call name, process name, thread number and the like.
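The collected fields and the selection of a sequence for a specified time period can be sketched as below. The record layout of FIG. 5 is not reproduced in the text, so the whitespace-separated line format, the `SyscallRecord` class and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SyscallRecord:
    time: float    # call time in seconds (assumed unit)
    syscall: str   # system call name
    process: str   # process name
    thread: int    # thread number


def parse_record(line: str) -> SyscallRecord:
    """Parse one line of the ASSUMED format: '<time> <syscall> <process> <thread>'."""
    time_s, syscall, process, thread = line.split()
    return SyscallRecord(float(time_s), syscall, process, int(thread))


def sequence_in_period(
    records: Iterable[SyscallRecord], start: float, end: float
) -> List[str]:
    """System call names with start <= time < end, in chronological order."""
    ordered = sorted(records, key=lambda r: r.time)
    return [r.syscall for r in ordered if start <= r.time < end]
```

The resulting list of names is the raw sequence from which redundant calls are then removed and fixed-length subsequences are cut.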
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. An attack detection method based on Linux system calls, characterized by comprising the following steps:
S1: acquiring the system calls generated while the system is running, and acquiring attack sequences and normal sequences from an existing database to construct a data set;
S2: removing redundant system calls from the calls generated while the system is running, generating a system call sequence for a specified time period, splitting the system call sequence into equal-length subsequences, and splitting the sequences in the data set into equal-length subsequences;
S3: dividing the equal-length subsequences in the data set into attack sequences and normal sequences according to data type, and storing them in two separate sequence libraries to obtain the detection sequence matching library;
S4: taking the equal-length subsequences of the system call sequence as sequences to be detected, and converting them into detection sequences in word vector form;
S5: making a preliminary judgment of the category of each word-vector detection sequence with a deep learning detection model; if the detection sequence is judged to be abnormal, placing it in the attack library and updating the detection matching library; if it is judged to be normal, passing it on for further judgment;
S6: comparing the matching degree of the preliminarily judged sequence against the detection matching library, and determining the category of the sequence;
S7: for a sequence whose category cannot be determined, determining its category by cluster computation to obtain the detection result;
the cluster computation specifically comprises the following steps:
S71: marking a detection sequence that the detection unit cannot identify as seq01, and converting seq01 into a sequence seqm01 in word vector form;
S72: selecting the sequence seq11 generated by host h1 of the intranet-connected host group in the same time period as seq01, converting it into a sequence seqm11 in word vector form, and calculating the Euclidean distance d(seqm01, seqm11);
S73: setting a threshold distance; if d(seqm01, seqm11) > distance, determining that the sequence generated by the host is similar to the detection sequence; repeating S72-S73 and counting the number of hosts, hostNumber, that have a sequence similar to the detection sequence; if hostNumber >= threshold, determining that the detection sequence is an attack sequence, and if hostNumber < threshold, determining that it is a normal sequence, where threshold is the set threshold on the number of hosts with a similar sequence.
2. The attack detection method based on Linux system call according to claim 1, wherein the system call sequence is intercepted into subsequences with equal length, specifically comprising:
removing redundant system calls by a statistical analysis method to generate a system call sequence, and intercepting the generated system call sequence into call sequences of fixed length: a call sequence whose length exceeds the fixed length is intercepted using a sliding window technique, and if the length of a generated sequence does not reach the fixed length, the tail of the sequence is padded with 0.
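The fixed-length interception described in claim 2 might look like the sketch below. It assumes system calls have already been encoded as positive integers, so that 0 is free to serve as the padding value; the function name `to_fixed_length` and the `stride` parameter (default 1) are illustrative assumptions, since the claim does not state a window step.

```python
from typing import List


def to_fixed_length(calls: List[int], length: int, stride: int = 1) -> List[List[int]]:
    """Cut a system call sequence into subsequences of exactly `length` items."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    if len(calls) <= length:
        # Sequence is too short (or exactly long enough): pad the tail with 0.
        return [calls + [0] * (length - len(calls))]
    # Sequence is too long: slide a window of size `length` over it.
    return [calls[i:i + length] for i in range(0, len(calls) - length + 1, stride)]
```

A three-call trace `[5, 3, 9]` with `length=5` yields the single padded sequence `[5, 3, 9, 0, 0]`, while `[1, 2, 3, 4]` with `length=3` yields the two windows `[1, 2, 3]` and `[2, 3, 4]`.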
3. The attack detection method based on Linux system call according to claim 2, wherein removing redundant system calls by the statistical analysis method specifically comprises:
according to the sequences of known type in the data set, calculating the TF-IDF values of the system calls of each sequence in each of the two types of sequences; sorting the TF-IDF values of each sequence in each of the two types in descending order; selecting the last 40 system calls in each of the two types of sequences; screening out the repeated system calls from among the last 40 system calls of each type; comparing the two screened sets of system calls and taking the system calls common to both types as redundant system calls; and comparing the selected redundant system calls against all the system calls generated during system operation, finding the system calls that are the same as the selected redundant system calls, and removing them.
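One way to read the redundant-call selection of claim 3 is sketched below. Several points are assumptions rather than statements from the claim: the conventional TF-IDF definition is used, per-sequence scores are averaged into one score per system call within each class, and the "last 40" are taken per class. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tfidf_by_call(sequences: List[List[str]]) -> Dict[str, float]:
    """Mean TF-IDF of each call over the sequences of one class that contain it."""
    n = len(sequences)
    doc_freq = Counter(call for seq in sequences for call in set(seq))
    totals: Dict[str, float] = defaultdict(float)
    for seq in sequences:
        for call, count in Counter(seq).items():
            tf = count / len(seq)               # frequency of the call in this sequence
            idf = math.log(n / doc_freq[call])  # rarity of the call across the class
            totals[call] += tf * idf
    return {call: totals[call] / doc_freq[call] for call in totals}


def lowest_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Sort calls by descending score and keep the last k (the lowest-scoring ones)."""
    if k <= 0:
        return set()
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return set(ranked[-k:])


def redundant_calls(normal: List[List[str]], attack: List[List[str]], k: int = 40) -> Set[str]:
    """Calls that are low-scoring in both classes are treated as redundant."""
    return lowest_k(tfidf_by_call(normal), k) & lowest_k(tfidf_by_call(attack), k)


def strip_redundant(trace: List[str], redundant: Set[str]) -> List[str]:
    """Remove the redundant calls from a trace collected at run time."""
    return [call for call in trace if call not in redundant]
```

The intersection step reflects the idea that a call scoring low in both the normal and the attack class carries little discriminating information.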
4. The attack detection method based on Linux system call according to claim 3, wherein the TF-IDF value of the system call is calculated as:
wherein TF-IDF_{a1} represents the TF-IDF value of system call a1, TF_{a1,a} represents how frequently system call a1 occurs in sequence a, Σ a_{1,k} represents the total number of sequences in which system call a1 occurs in the generated sequence k, |N| represents the total number of sequences of normal type or attack type in the generated sequences, and |{N_{a1}}| represents the number of sequences in which system call a1 occurs in the current class.
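The formula referenced by claim 4 does not survive in this text. Assuming the conventional TF-IDF construction (a term frequency normalised by a sum over the sequence, multiplied by the logarithm of an inverse document frequency), one form consistent with the symbols defined above would be the following. The placement of the sum in the denominator and the unspecified logarithm base are assumptions, not a reproduction of the granted claim.

```latex
\mathrm{TF\text{-}IDF}_{a1}
  = \frac{\mathrm{TF}_{a1,a}}{\sum_{k} a_{1,k}}
    \times \log \frac{|N|}{\left|\{N_{a1}\}\right|}
```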
5. The attack detection method based on Linux system call according to claim 1, wherein comparing the sequence which is preliminarily determined to be normal with the sequence in the matching library in matching degree, and determining the type of the sequence, specifically comprises:
setting the matching degree threshold to 0.8 and comparing the similarity between the sequence to be detected and the sequences in the two matching libraries: if the similarity between the sequence to be detected and a sequence in the normal sequence matching library is greater than 0.8, the sequence is a normal sequence; if the similarity between the sequence to be detected and a sequence in the attack sequence matching library is greater than 0.8, the sequence is an attack sequence; and if the similarity between the sequence to be detected and the sequences in both the normal sequence matching library and the attack sequence matching library is less than 0.8, the type of the sequence cannot be identified.
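The matching step of claim 5 can be illustrated as below. The claim does not name a similarity measure, so this sketch substitutes the ratio from Python's standard `difflib.SequenceMatcher` purely as a stand-in; the function names and the `"unknown"` label are likewise assumptions. The normal library is checked first, mirroring the order in which the claim lists the cases.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def best_similarity(seq: Sequence[int], library: List[Sequence[int]]) -> float:
    """Highest similarity between seq and any sequence in the library (0.0 if empty)."""
    return max(
        (SequenceMatcher(None, seq, ref).ratio() for ref in library),
        default=0.0,
    )


def match(
    seq: Sequence[int],
    normal_lib: List[Sequence[int]],
    attack_lib: List[Sequence[int]],
    degree: float = 0.8,
) -> str:
    """Return 'normal', 'attack', or 'unknown' according to the matching degree."""
    if best_similarity(seq, normal_lib) > degree:
        return "normal"
    if best_similarity(seq, attack_lib) > degree:
        return "attack"
    # Neither library matches strongly enough; a similarity of exactly 0.8
    # is not covered by the claim and is treated as unidentified here.
    return "unknown"
```

Sequences that come back as `"unknown"` are the ones handed on to the cluster calculation.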
6. The attack detection method based on Linux system call according to claim 1, wherein the Euclidean distance d(seqm01, seqm11) is calculated as:
wherein seqm01 represents the detection sequence in word vector form, seqm11 represents the sequence in word vector form generated by host h1 in the intranet connection host group in the same time period as the detection sequence, m_{mi} represents the i-th system call of sequence seqm01, and n_{ni} represents the i-th system call of sequence seqm11.
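The formula referenced by claim 6 is likewise missing from this text. The standard Euclidean distance, written with the symbols defined above, would read as follows; the index range (summing over every position i of the two equal-length sequences) is an assumption.

```latex
d(\mathrm{seqm}_{01}, \mathrm{seqm}_{11})
  = \sqrt{\sum_{i} \left( m_{mi} - n_{ni} \right)^{2}}
```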
7. An attack detection system based on Linux system call, comprising: a collection module, a training module and a detection module;
the collection module includes: the system call acquisition unit, the processing unit and the data transmission unit;
the system call acquisition unit is used for collecting related information on the system calls made during execution of a designated process, and for acquiring attack sequences and normal sequences from an existing database to construct a data set;
the related information includes: calling time, system calling name, process name and thread name;
the processing unit processes the system call information collected by the system call acquisition unit according to the length or the call time to generate a system call sequence in a specified time period, intercepts the system call sequence into equal-length subsequences, and intercepts sequences in the data set into equal-length subsequences;
the data transmission unit transmits the subsequences generated by processing the system call information to the detection module as detection sequences, and transmits the subsequences in the data set to the training module;
the training module comprises: a data unit, a conversion unit and a training unit;
the data unit divides the equal-length subsequences in the data set into an attack sequence and a normal sequence according to the data type, and stores the attack sequence and the normal sequence into two sequence libraries of an attack library and a normal library respectively to obtain a detection sequence matching library;
the conversion unit converts the sequences stored in the sequence library into a word vector matrix;
the training unit adopts a deep learning technology to train the detection system according to the word vector matrix;
the detection module comprises: a conversion unit, a detection unit, a matching unit, a calculation unit and an identification unit;
the conversion unit converts the sequence to be detected into a sequence in the form of a word vector according to the form of the word vector generated by the training module conversion unit;
the detection unit performs preliminary classification judgment on the sequence in the form of the word vector converted from the sequence to be detected through the deep learning model, if the sequence is judged to be an abnormal sequence, the sequence is sent to the training module for updating a matching library, and if the sequence is judged to be a normal sequence, the sequence is sent to the matching unit for further judgment;
the matching unit rechecks the sequence preliminarily determined to be normal by the detection unit, compares the matching degree of that sequence with the sequences in the matching library, and determines the type of the sequence;
when the detection unit cannot identify the type of the detection sequence, the calculation unit calculates the similarity between the sequence and the sequences generated by other hosts in the intranet connection host group in the same time period, determines whether the sequences are similar according to a set similarity threshold, and determines the sequence to be an attack sequence if it is similar to sequences from a plurality of hosts in the intranet connection host group, so as to obtain a detection result.
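The order in which the detection module's units hand a sequence to one another can be sketched as a three-stage cascade. This is only an illustration of the control flow: the deep learning model, the library matcher and the cross-host calculation are passed in as callables, and every name here (`detect`, `model_says_abnormal`, `match_library`, `cluster_check`) is hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def detect(
    vec: Vector,
    model_says_abnormal: Callable[[Vector], bool],
    match_library: Callable[[Vector], str],
    cluster_check: Callable[[Vector], str],
    attack_library: List[List[float]],
) -> str:
    """Run one word-vector sequence through detection, matching, then clustering."""
    # Detection unit: preliminary judgement by the trained model.
    if model_says_abnormal(vec):
        attack_library.append(list(vec))  # update the matching library
        return "attack"
    # Matching unit: recheck sequences the model considered normal.
    verdict = match_library(vec)
    if verdict in ("normal", "attack"):
        return verdict
    # Calculation unit: fall back to the cross-host comparison.
    return cluster_check(vec)
```

Only sequences that the matcher cannot place in either library reach the cross-host comparison, which keeps the most expensive step off the common path.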
CN202211004258.XA 2022-08-22 2022-08-22 Attack detection system based on Linux system call Active CN115378702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211004258.XA CN115378702B (en) 2022-08-22 2022-08-22 Attack detection system based on Linux system call

Publications (2)

Publication Number Publication Date
CN115378702A CN115378702A (en) 2022-11-22
CN115378702B CN115378702B (en) 2024-04-02

Family

ID=84066677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211004258.XA Active CN115378702B (en) 2022-08-22 2022-08-22 Attack detection system based on Linux system call

Country Status (1)

Country Link
CN (1) CN115378702B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1649312A (en) * 2005-03-23 2005-08-03 北京首信科技有限公司 Program grade invasion detecting system and method based on sequency mode evacuation
CN102521534A (en) * 2011-12-03 2012-06-27 南京大学 Intrusion detection method based on crude entropy property reduction
CN107493258A (en) * 2017-04-19 2017-12-19 安徽华脉科技发展有限公司 A kind of intruding detection system based on network security
CN112613032A (en) * 2020-12-15 2021-04-06 中国科学院信息工程研究所 Host intrusion detection method and device based on system call sequence
CN113094713A (en) * 2021-06-09 2021-07-09 四川大学 Self-adaptive host intrusion detection sequence feature extraction method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL197477A0 (en) * 2009-03-08 2009-12-24 Univ Ben Gurion System and method for detecting new malicious executables, based on discovering and monitoring of characteristic system call sequences
US10586044B2 (en) * 2017-12-12 2020-03-10 Institute For Information Industry Abnormal behavior detection model building apparatus and abnormal behavior detection model building method thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
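Asma Razgallah; Raphaël Khoury. Behavioral classification of Android applications using system calls. 2021 28th Asia-Pacific Software Engineering Conference (APSEC). 2022, full text. *
Shaohua Lv; Jian Wang; Yinqi Yang; Jiqiang Liu. Intrusion Prediction With System-Call Sequence-to-Sequence Model. IEEE Access. 2018, full text. *
Host sequence sample generation and anomaly detection; 卢逸君; China Master's Theses Full-text Database (Information Science and Technology); 20220315; full text *
Research on intrusion detection technology based on system calls; 陈仲磊; Network Security Technology & Application; 20220331; full text *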

Also Published As

Publication number Publication date
CN115378702A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN111027069B (en) Malicious software family detection method, storage medium and computing device
CN109005145B (en) Malicious URL detection system and method based on automatic feature extraction
CN111639497B (en) Abnormal behavior discovery method based on big data machine learning
CN110704840A (en) Convolutional neural network CNN-based malicious software detection method
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN111314329B (en) Traffic intrusion detection system and method
CN112905421A (en) Container abnormal behavior detection method of LSTM network based on attention mechanism
CN110879881B (en) Mouse track recognition method based on feature component hierarchy and semi-supervised random forest
CN109194677A (en) A kind of SQL injection attack detection, device and equipment
CN102521534B (en) Intrusion detection method based on crude entropy property reduction
CN111915437A (en) RNN-based anti-money laundering model training method, device, equipment and medium
CN110011990B (en) Intelligent analysis method for intranet security threats
CN116910752B (en) Malicious code detection method based on big data
CN112199670A (en) Log monitoring method for improving IFOREST (entry face detection sequence) to conduct abnormity detection based on deep learning
CN111400713B (en) Malicious software population classification method based on operation code adjacency graph characteristics
CN116150757A (en) Intelligent contract unknown vulnerability detection method based on CNN-LSTM multi-classification model
CN113283901A (en) Byte code-based fraud contract detection method for block chain platform
CN115378702B (en) Attack detection system based on Linux system call
WO2021262344A1 (en) Method and apparatus to detect scripted network traffic
CN111612531A (en) Click fraud detection method and system
CN111368894A (en) FCBF feature selection method and application thereof in network intrusion detection
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium
CN116107834A (en) Log abnormality detection method, device, equipment and storage medium
CN111079143B (en) Trojan horse detection method based on multi-dimensional feature map
CN114528908A (en) Network request data classification model training method, classification method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant