CN117640190A - Botnet detection method based on multi-mode stacking automatic encoder - Google Patents
- Publication number: CN117640190A
- Application number: CN202311596885.1A
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- Y02D30/50 — Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate (climate change mitigation technologies in ICT)
Abstract
The invention discloses a botnet detection method based on a multi-modal stacked automatic encoder. The method comprises the following steps: acquiring executable files of application programs; performing dynamic analysis and static analysis on a data set containing benign programs and bot programs, and extracting flow-based dynamic features and static features based on printable string information (PSI) graphs; pre-training two stacked automatic encoders to encode the flow-based and graph-based features respectively and extract deep features; fusing the dynamic and static features with a multi-modal automatic encoder; fine-tuning the multi-modal stacked automatic encoder model; and using the encoder of the trained multi-modal stacked automatic encoder model as a feature extractor, feeding the output of the shared hidden layer into a softmax layer for bot detection. Through the improved multi-modal stacked automatic encoder, the invention automatically fuses static and dynamic features, learns the complex relationship between the two modalities, fully exploits the advantages of hybrid analysis, and improves the accuracy of botnet detection.
Description
Technical Field
The invention belongs to the field of network security and machine learning, and particularly relates to a botnet detection method based on a multi-mode stacking automatic encoder.
Background
For data acquisition and feature extraction, the botnet detection field mainly adopts two methods: static analysis and dynamic analysis. Static methods extract static features by analyzing the binary code of bot instances without executing the malware. Dynamic methods execute a given bot instance, typically in a sandboxed environment, and extract dynamic features that characterize botnet behavior. Most existing botnet detection methods rely on static features or dynamic features alone. Static analysis is simple and fast, but is susceptible to obfuscation techniques such as encryption. In contrast, dynamic analysis reflects the behavior of the program as it runs, is relatively resistant to obfuscation, and generalizes better to unknown attacks and attack variants, although its data collection process is time-consuming. Static analysis excels at capturing the structure of malware, while dynamic analysis can detect obfuscated malware. Therefore, merging the two types of features in a suitable way can improve the accuracy of botnet detection. Although previous methods have considered fusing multiple features, they perform feature-level or single-modality fusion; simply concatenating the features cannot learn the complex relationship between the two feature types, so the advantages of hybrid analysis are not fully exploited.
Disclosure of Invention
In order to solve the above problems, the invention provides a botnet detection method based on a multi-modal stacked automatic encoder. The method combines the multi-modal features extracted by static analysis and dynamic analysis, learns the complex relationship between the two modalities with a stacked multi-modal automatic encoder, fully exploits the advantages of hybrid analysis, and achieves stronger detection of bot programs.
In order to achieve the technical purpose and the technical effect, the invention is realized by the following technical scheme:
a botnet detection method based on a multi-mode stacking automatic encoder comprises the following steps:
(1) Acquiring an executable file of an application program and storing the executable file in an ELF format;
(2) Respectively carrying out dynamic analysis and static analysis on a data set containing benign programs and bots, and extracting dynamic characteristics based on streams and static characteristics based on Printable String Information (PSI) graphs;
(3) Pre-training two stacked automatic encoders (SAE) to encode the dynamic and static features respectively and extract deep complex features;
(4) Fusing the dynamic and static features based on a multi-modal automatic encoder (MAE);
(5) Fine-tuning the multi-modal stacked automatic encoder (MSAE);
(6) And taking the encoder of the MSAE model with complete training as a feature extractor, and classifying the shared hidden layer output of the model as the input of a softmax layer to realize the detection of the bot program.
The following alternatives are provided not as additional limitations on the overall scheme above, but only as further additions or preferences; each may be combined with the overall scheme individually, or multiple alternatives may be combined with one another, provided no technical or logical contradiction arises.
Preferably, the step (2) of dynamically analyzing the data set containing benign programs and bots specifically includes the following steps:
1) Analyzing network behaviors of the ELF file through a Cuckoo sandbox, and recording network traffic in a pcap format;
2) According to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, dividing the network traffic recorded in the pcap file into flows, aggregating data packets with the same five-tuple into flow data f = {p_1, p_2, ..., p_i}, where p_i represents a data packet having that five-tuple;
3) Aggregating the flow data again: the flow data collected from the same program run are merged by set union to form the flow record of the corresponding ELF file, F = f_1 ∪ f_2 ∪ … ∪ f_k;
4) Extracting statistical features based on the flow records, including the average, maximum and minimum of the total number of data packets contained in a flow; the average, maximum and minimum of the communication duration of a flow; and the average, maximum and minimum of the number of bytes contained in the packets of a flow, for 9 feature dimensions in total, obtaining a flow-based dynamic feature set X_d = {x_d^(1), x_d^(2), ..., x_d^(n)}, where n represents the number of ELF samples and x_d^(i) represents the flow-based features extracted from the i-th ELF sample.
Preferably, in the step (2), the static analysis is performed on the data set containing the benign program and the zombie program, and specifically the method comprises the following steps:
1) Checking whether the ELF file is packed using the packer-detection tool DiE, and then unpacking and disassembling the binary using UPX and IDAPro;
2) Constructing a Function Call Graph (FCG) and a Printable String Information (PSI) graph according to a function caller-callee relationship in the assembly code;
3) Thereafter, the graph2vec graph embedding technique is used to convert the PSI graph into numerical vector data, obtaining a static feature set X_s = {x_s^(1), x_s^(2), ..., x_s^(n)}, where x_s^(i) represents the PSI-graph-based features extracted from the i-th ELF sample.
Preferably, the function call graph is defined as a directed graph G = (V, E), composed of a vertex set V = {v_1, v_2, ..., v_m} and an edge set E = {e_12, e_13, ..., e_ij}, where m represents the number of vertices and e_ij represents function v_i calling function v_j. Vertices in the FCG correspond to the functions contained in the assembly code of the program, and edges represent caller-callee relationships between two functions.
Preferably, the construction process of the function call graph is summarized as follows:
a) Extracting a set of identified functions from the assembly code;
b) Then determining an entry point function;
c) Building the FCG using a breadth-first search algorithm: if functions v_i and v_j are identified as having a caller-callee relationship, vertices v_i and v_j are added to the vertex set V and edge e_ij is added to the edge set E.
Preferably, in order to minimize computational complexity, the PSI graph is constructed by selecting, from the FCG, the functions and relationships close to the bot program's operation steps, specifically:
a) Extracting all printable string information (PSI) present in the binary file through an IDAPro plug-in, and selecting PSI at least three characters in length;
b) Selecting the PSI containing important semantic information (which may reveal the intention of an attacker) to form a set P = {psi_1, psi_2, ..., psi_k};
c) For a vertex v_i in the function call graph, if the function represented by v_i contains at least one important printable string psi_i, adding vertex v_i to the vertex set V' of the PSI graph and continuing with step d); otherwise skipping step d);
d) Traversing all edges e_ij representing call relations of function v_i: if function v_j also contains at least one psi_i and e_ij ∉ E', adding vertex v_j to the vertex set V' of the PSI graph and edge e_ij to the edge set E' of the PSI graph;
e) Repeating steps c) and d) until all vertices in the function call graph have been traversed, finally outputting the PSI graph G' = (V', E').
Preferably, two SAE are pre-trained in the step (3), specifically:
pre-training an SAE by using dynamic characteristics, wherein an encoder consists of two full connection layers and a ReLU activation function, and the decoder and the encoder are of symmetrical structures;
the other SAE is pre-trained by using static characteristics, the encoder consists of two convolution layers, a full connection layer and a ReLU activation function, and the decoder and the encoder are also symmetrical structures;
the pre-trained two SAEs are used to encode dynamic and static data, respectively, to obtain a potential representation of the two modality data.
Preferably, in the step (4), dynamic features and static features are fused based on the multi-mode automatic encoder, specifically:
the final hidden layer outputs of the encoders of the two pre-trained SAE are connected in series to be used as the input of the multi-mode automatic encoder;
fusing the potential representations of the two modal data based on a hidden layer of the multi-modal automatic encoder to generate a shared potential representation;
finally, a stacked multi-modal automatic encoder (MSAE) with all pre-training layers and shared hidden layers is constructed.
Preferably, the fine tuning process in the step (5) specifically includes:
the goal of an automatic encoder is to minimize reconstruction errors of the input and output, let the shared hidden layer learn a shared potential representation of the bimodal data, define a loss function as:
wherein,and->Respectively the dynamic and static feature vectors of the input,/->And->Is the corresponding reconstructed vector of MSAE output;
and fixing parameters of the pre-training layer, training by adopting a gradient descent algorithm, and updating only the weight and the parameters of the shared hidden layer.
Preferably, the encoder of the MSAE model is used as a feature extractor in the step (6), and classified by using a softmax layer, specifically:
unfolding the stacked automatic encoder, adding a softmax output layer on top of the shared hidden layer, and outputting the corresponding predictive label of the ith input
Where W represents the weight of the softmax layer, b represents the bias of the softmax layer, T is the number of object tag categories, z (i) Is the ith output of the shared hidden layer.
Preferably, in order to improve bot detection accuracy, the classification error is also added to the loss function of the fine-tuning stage and minimized with a cross-entropy loss:

L_c = −(1/n) Σ_{i=1}^{n} Σ_{t=1}^{T} y_t^(i) log ŷ_t^(i)

where y^(i) is the true label of the i-th input sample and ŷ^(i) is the corresponding predicted label;
the final MSAE is minimized with respect to the reconstruction error L r And classification error L c Is a weighted sum of:
L=αL r +βL c +λR (5)
wherein R is a regularization term, which is realized by carrying out L2 regularization on the weights of all layers in the network; alpha, beta and lambda are weighting factors.
Preferably, the weighting factors α and β are adaptively calculated using a softmax function.
compared with the prior art, the invention has the following beneficial effects:
1. The invention combines static and dynamic analysis methods, extracting flow-based features and PSI-graph-based features to detect bot programs; by exploiting the complementary advantages of the two analysis methods, it achieves higher accuracy than either feature type alone.
2. The invention utilizes the strong autonomous learning capability of the multi-modal automatic encoder to automatically fuse static and dynamic features through iterative training of the network model; compared with simply fusing the two modalities by direct concatenation, it can extract the complex relationship between the two modalities and fully exploit the advantages of hybrid analysis.
3. The invention trains the MSAE model by pre-training and fine-tuning, which does not require a large labeled data set, and adds a classification-error penalty to the loss function of the fine-tuning stage, further enhancing the performance of the network model.
Drawings
FIG. 1 is a training and testing flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic diagram of a bot detection scheme according to one embodiment of the present invention;
FIG. 3 is a diagram of a pre-training network and a fine-tuning network architecture according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an MSAE network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. The described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The principle of application of the invention is described in detail below with reference to the accompanying drawings.
In the embodiment of the invention, the multi-modal features extracted by static and dynamic analysis are combined: a stacked multi-modal automatic encoder model is constructed from the pre-trained stacked automatic encoders and the fine-tuned multi-modal automatic encoder and used to automatically fuse static and dynamic features, after which accurate bot detection is realized based on the fused features. The embodiment provides a botnet detection method based on a multi-modal stacked automatic encoder which, as shown in fig. 1, comprises the following steps:
(1) An executable file of the application program is obtained and stored in an ELF format.
(2) And respectively carrying out dynamic analysis and static analysis on the data set containing the benign program and the zombie program, and extracting dynamic characteristics based on the stream and static characteristics based on the PSI graph.
(2.1) the specific steps of dynamic feature extraction are as follows:
(2.1.1) network behavior analysis is performed on the ELF file through a Cuckoo sandbox, and network traffic is recorded in a pcap format.
(2.1.2) According to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, dividing the network traffic recorded in the pcap file into flows and aggregating data packets with the same five-tuple into flow data f = {p_1, p_2, ..., p_i}, where p_i represents a data packet having that five-tuple.
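As an illustrative sketch of this flow-division step, packets sharing a five-tuple can be grouped into flows; the packet record layout (`(src_ip, src_port, dst_ip, dst_port, proto, timestamp, bytes)`) is an assumed format, not one prescribed by the patent:

```python
from collections import defaultdict

# Hypothetical packet records: (src_ip, src_port, dst_ip, dst_port, proto, ts, nbytes).
def group_flows(packets):
    """Aggregate packets sharing the same five-tuple into one flow."""
    flows = defaultdict(list)
    for pkt in packets:
        five_tuple = pkt[:5]  # {src IP, src port, dst IP, dst port, protocol}
        flows[five_tuple].append(pkt)
    return dict(flows)

packets = [
    ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP", 0.0, 60),
    ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP", 0.1, 1500),
    ("10.0.0.3", 5353, "224.0.0.251", 5353, "UDP", 0.2, 120),
]
flows = group_flows(packets)  # 2 distinct five-tuples -> 2 flows
```

In practice the packets would be parsed from the pcap file produced by the sandbox before grouping.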
(2.1.3) Re-aggregating the flow data: the flow data collected from the same program run are merged by set union to form the flow record of the corresponding ELF file, F = f_1 ∪ f_2 ∪ … ∪ f_k.
(2.1.4) Extracting statistical features based on the flow records, including the average, maximum and minimum of the total number of data packets contained in a flow; the average, maximum and minimum of the communication duration of a flow; and the average, maximum and minimum of the number of bytes contained in the packets of a flow, for 9 feature elements in total, obtaining a flow-based dynamic feature set X_d = {x_d^(1), x_d^(2), ..., x_d^(n)}, where n represents the number of ELF samples and x_d^(i) represents the flow-based features extracted from the i-th ELF sample.
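The nine statistical features can be sketched as follows; the flow-record input format (each flow as a list of `(timestamp, bytes)` packets) is again an illustrative assumption:

```python
import statistics

def flow_record_features(flow_record):
    """Compute the 9 statistical features of a flow record:
    mean/max/min of packets per flow, of flow duration, and of bytes per packet."""
    pkt_counts = [len(f) for f in flow_record]
    durations = [max(ts for ts, _ in f) - min(ts for ts, _ in f) for f in flow_record]
    byte_counts = [n for f in flow_record for _, n in f]
    feats = []
    for values in (pkt_counts, durations, byte_counts):
        feats += [statistics.mean(values), max(values), min(values)]
    return feats  # 9-dimensional dynamic feature vector x_d^(i)

record = [[(0.0, 60), (0.1, 1500)], [(0.2, 120)]]
feats = flow_record_features(record)
```

Each ELF sample yields one such 9-dimensional vector, forming the set X_d.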
(2.2) the specific steps of static feature extraction are as follows:
(2.2.1) Checking whether the ELF file is packed using the packer-detection tool DiE, and then unpacking and disassembling the binary using UPX and IDAPro.
(2.2.2) constructing a Function Call Graph (FCG) and a Printable String Information (PSI) graph according to the function caller-callee relationship in the assembly code.
The function call graph is defined as a directed graph G = (V, E), composed of a vertex set V = {v_1, v_2, ..., v_m} and an edge set E = {e_12, e_13, ..., e_ij}, where m represents the number of vertices and e_ij represents function v_i calling function v_j. Vertices in the FCG correspond to the unique functions contained in the assembly code of the program, and edges represent caller-callee relationships between two functions. The present embodiment uses an existing function call graph construction method, based on a breadth-first search algorithm, constructing the FCG with a FIFO function queue, specifically:
a) Extracting the boundaries of the identified functions from the assembly code, and storing the functions in a set named FunSet;
b) Then extracting all entry-point functions, storing them in EntryFunSet, and adding all entry-point functions to the vertex set V;
c) Initializing a function queue with the entry-point functions, and setting each queued function's enqueue flag "enQFlag" to true so as to prevent the same vertex from being enqueued repeatedly;
d) While the queue is not empty, dequeuing the head element of the queue, treating the dequeued function v_i as a caller, and then traversing the body of v_i to fetch its set of callees;
e) For each fetched callee, traversing the callee set and checking whether callee v_j already exists in the graph; if not, adding v_j to the vertex set V; then checking whether an edge e_ij from caller v_i to callee v_j already exists in the graph, and if not, adding e_ij to the edge set E;
f) Checking whether the callee has been enqueued; if not, setting its enqueue flag "enQFlag" to true and appending it to the tail of the queue;
g) Repeating steps d), e) and f) until the queue is empty.
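The FIFO-queue breadth-first construction of steps a)-g) might look like this minimal sketch, where `callees_of` stands in for the callee sets that would be extracted from the assembly code:

```python
from collections import deque

def build_fcg(entry_points, callees_of):
    """Breadth-first FCG construction with a FIFO queue (steps a)-g))."""
    V, E = set(entry_points), set()
    queue = deque(entry_points)
    enqueued = set(entry_points)  # plays the role of the 'enQFlag' flags
    while queue:
        caller = queue.popleft()            # dequeue head element v_i
        for callee in callees_of.get(caller, ()):
            V.add(callee)                   # add unseen callee vertex v_j
            E.add((caller, callee))         # add edge e_ij
            if callee not in enqueued:      # prevent re-enqueuing a vertex
                enqueued.add(callee)
                queue.append(callee)
    return V, E

# Hypothetical call relations extracted from disassembly:
calls = {"main": ["connect", "recv"], "connect": ["send"], "recv": ["send"]}
V, E = build_fcg(["main"], calls)
```

Sets make the "already in the graph" checks of step e) constant-time, which matters for the large FCGs the text mentions.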
The function call graph is intended to represent all possible runs of a program. FCGs are therefore often complex, with a large number of nodes and edges, which requires longer computation time and more memory. Although all call relationships of the program are represented in the FCG, some call relationships may never occur during actual running of the program. In order to minimize the computational complexity, the present embodiment selects functions and relationships close to the operation steps of the zombie program from the FCG to construct the PSI graph, specifically:
a) Extracting all printable string information (PSI) present in the binary file through the IDAPro plug-in and, to balance detection precision against computational complexity, selecting PSI at least three characters in length;
b) Then selecting the PSI containing important semantic information (which may reveal the intention of an attacker) to form a set P = {psi_1, psi_2, ..., psi_k};
c) For a vertex v_i in the function call graph, if the function represented by v_i contains at least one important printable string psi_i, adding vertex v_i to the vertex set V' of the PSI graph and continuing with step d); otherwise skipping step d);
d) Traversing all edges e_ij representing call relations of function v_i: if function v_j also contains at least one psi_i and e_ij ∉ E', adding vertex v_j to the vertex set V' of the PSI graph and edge e_ij to the edge set E' of the PSI graph;
e) Repeating steps c) and d) until all vertices in the function call graph have been traversed, finally outputting the PSI graph G' = (V', E').
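A simplified sketch of this PSI-graph projection: it keeps exactly the FCG vertices and edges whose functions contain at least one important string. The traversal order differs from the stepwise description above but yields the same kind of subgraph; the input formats (`strings_of` mapping functions to their printable strings) are assumed for illustration:

```python
def build_psi_graph(fcg_edges, strings_of, important_psi):
    """Project the FCG onto functions containing at least one important PSI."""
    def has_psi(fn):
        return any(s in important_psi for s in strings_of.get(fn, ()))
    V2 = {v for edge in fcg_edges for v in edge if has_psi(v)}          # V'
    E2 = {(vi, vj) for (vi, vj) in fcg_edges if has_psi(vi) and has_psi(vj)}  # E'
    return V2, E2

# Hypothetical FCG and per-function string tables:
fcg_edges = {("main", "connect"), ("connect", "send"), ("recv", "send")}
strings_of = {"main": ["irc.example.com"], "connect": ["PING"]}
important = {"irc.example.com", "PING"}
V2, E2 = build_psi_graph(fcg_edges, strings_of, important)
```

Functions without important strings (here `recv` and `send`) drop out, shrinking the graph as the text intends.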
(2.2.3) Converting the PSI graph into numerical vector data using the graph embedding technique graph2vec, obtaining a static feature set X_s = {x_s^(1), x_s^(2), ..., x_s^(n)}, where x_s^(i) represents the PSI-graph-based features extracted from the i-th ELF sample. The result of this step is a set of numerical vectors of configurable length representing the graph set; in this embodiment, each PSI graph is represented as a numerical vector of length 1024.
(3) Two Stacked Automatic Encoders (SAE) are pre-trained to encode dynamic and static features, respectively, extracting deep complex features.
And dividing the characteristic data set extracted in the step into a training set and a testing set, and then dividing the training set again to obtain a pre-training data set and a fine-tuning data set.
In an unsupervised learning mode, using dynamic features in the pre-training dataset as input, a SAE is pre-trained, the structure of which is shown in FIG. 3 (a). For convenience of explanation, this SAE is referred to as SAE1 in this example. SAE1 is composed of two parts, namely an encoder and a decoder, wherein the encoder is composed of two fully connected layers, a ReLU activation function is adopted between each layer, and the decoder and the encoder are of symmetrical structures. The size of the input layer and the output layer of SAE1 corresponds to the dimension of the dynamic feature, set to 9; the number of neurons of the two concealment layers in its encoder is 8 and 4, respectively, so the output size of the encoder final concealment layer is 4.
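A minimal NumPy sketch of SAE1's forward pass with the stated layer sizes (encoder 9→8→4, mirrored decoder); the random weights stand in for pre-trained parameters, and applying ReLU at every layer including the output is a simplification of the described architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# Layer sizes follow the text: 9 -> 8 -> 4 (encoder), then 8 -> 9 (decoder).
sizes = [9, 8, 4, 8, 9]
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def sae1_forward(x):
    h, acts = x, []
    for W, b in zip(weights, biases):
        h = relu(h @ W + b)
        acts.append(h)
    return acts[1], acts[-1]  # 4-dim latent code, 9-dim reconstruction

x = rng.standard_normal(9)      # one flow-based dynamic feature vector
code, recon = sae1_forward(x)
```

Pre-training would minimize the squared error between `x` and `recon` on the unlabeled pre-training set.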
In an unsupervised learning manner, using static features in the pre-training dataset as input, another SAE, here called SAE2, is pre-trained, the structure of which is shown in FIG. 3 (b). SAE2 consists of two parts, encoder and decoder, which are also symmetrical structures, wherein the structure of the encoder is specifically:
(1) a convolution layer C1, the convolution kernel size is 3×3, the channel number is 16, and the output is 8×8×16;
(2) the pooling layer P1 performs a maximum pooling operation of 2×2 once and outputs 4×4×16;
(3) a convolution layer C2, the convolution kernel size is 3×3, the number of channels is 32, and the output is 4×4×32;
(4) the pooling layer P2 performs a maximum pooling operation of 2×2 once and outputs 2×2×32;
(5) the full connection layer FC1 consists of 128 neurons, adopts a ReLU activation function and outputs 128-dimensional vectors;
(6) the fully connected layer FC2, consisting of 10 neurons, uses the ReLU activation function, so the encoder final hidden layer output size is 10.
And then, respectively encoding the dynamic characteristics and the static characteristics in the fine adjustment data set by using the pre-trained two SAEs to obtain potential representations of the two modal data.
(4) The dynamic characteristics and the static characteristics are fused based on the multi-mode automatic encoder.
To fuse the static and dynamic features, the present embodiment concatenates the final hidden layer outputs of the pre-trained encoders of the two SAEs and takes them as inputs to the multi-mode auto-encoder. The implementation of a multi-modal auto-encoder is essentially based on a hidden layer of another auto-encoder fusing the potential representations of the two modal data, generating a shared potential representation, as shown in fig. 3 (c). A stacked multi-modal automatic encoder with all pre-training layers and a shared hidden layer is finally constructed as shown in fig. 4.
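The fusion step can be sketched as concatenating the two latent codes (4-dim dynamic from SAE1, 10-dim static from SAE2) and passing them through a shared hidden layer; the shared-layer width of 8 is an assumed value, not specified in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

z_dynamic = rng.standard_normal(4)    # latent code from SAE1's encoder
z_static = rng.standard_normal(10)    # latent code from SAE2's encoder
z = np.concatenate([z_dynamic, z_static])  # 14-dim input to the MAE

# Shared hidden layer of the multi-modal auto-encoder (width 8 is illustrative):
W_shared = rng.standard_normal((14, 8)) * 0.1
shared = relu(z @ W_shared)           # shared latent representation
```

The decoder half of the MAE would reconstruct the 14-dim concatenation from `shared`, forcing the layer to capture cross-modal structure.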
(5) A multi-Modal Stacked Automatic Encoder (MSAE) is fine tuned.
The goal of the automatic encoder is to minimize the reconstruction error between input and output, so that the shared hidden layer learns a shared latent representation of the bimodal data; the loss function is defined as:

L_r = (1/n) Σ_{i=1}^{n} ( ‖x_d^(i) − x̂_d^(i)‖² + ‖x_s^(i) − x̂_s^(i)‖² )

where x_d^(i) and x_s^(i) are respectively the input dynamic and static feature vectors, and x̂_d^(i) and x̂_s^(i) are the corresponding reconstructed vectors output by the MSAE;
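Under the assumption that the reconstruction error is the summed squared error over both modalities averaged across the n samples (the exact normalization is not fully recoverable from the text), it can be computed as:

```python
import numpy as np

def reconstruction_loss(xd, xs, xd_hat, xs_hat):
    """L_r: squared reconstruction error over both modalities, averaged over n samples.
    Rows are samples; columns are feature dimensions."""
    n = xd.shape[0]
    return (np.sum((xd - xd_hat) ** 2) + np.sum((xs - xs_hat) ** 2)) / n

xd, xs = np.zeros((2, 3)), np.zeros((2, 5))
loss_zero = reconstruction_loss(xd, xs, xd, xs)        # perfect reconstruction
loss_one = reconstruction_loss(xd, xs, xd + 1.0, xs)   # off by 1 on every dynamic dim
```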
and (3) performing fine adjustment on the MSAE in a semi-supervised learning mode, taking a fine adjustment data set with a label as a model input, fixing parameters of a pre-training layer, and performing optimization updating on the parameters of the shared hidden layer only through an Adam optimization function based on a gradient descent algorithm.
(6) And taking the encoder of the MSAE model with complete training as a feature extractor, and classifying the shared hidden layer output of the model as the input of a softmax layer to realize the detection of the bot program.
The model structure for detecting bot programs based on the MSAE is shown in FIG. 4. Specifically, the stacked automatic encoder is unfolded and a softmax output layer is added on top of the shared hidden layer to output the predicted label of the i-th input:

ŷ^(i) = softmax(W z^(i) + b)

where W represents the weights of the softmax layer, b represents its bias, T is the number of target label categories, and z^(i) is the i-th output of the shared hidden layer.
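A sketch of this softmax output layer on top of the shared hidden layer; the hidden width (8) and the two-class setting (benign vs. bot) are illustrative assumptions:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # subtract max for numerical stability
    return e / e.sum()

def predict(z, W, b):
    """Predicted label distribution: softmax(W z + b) over T classes."""
    return softmax(W @ z + b)

rng = np.random.default_rng(2)
T, hidden = 2, 8  # T label categories (benign / bot); hidden width is assumed
W, b = rng.standard_normal((T, hidden)), np.zeros(T)
probs = predict(rng.standard_normal(hidden), W, b)  # valid probability vector
```

The detection decision is simply the argmax of `probs`.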
In order to improve bot detection precision, this embodiment provides an improved MSAE in which the classification error is added to the loss function of the fine-tuning stage and minimized with a cross-entropy loss:

L_c = −(1/n) Σ_{i=1}^{n} Σ_{t=1}^{T} y_t^(i) log ŷ_t^(i)

where y^(i) is the true label of the i-th input sample and ŷ^(i) is the corresponding predicted label;
the final MSAE is minimized with respect to the reconstruction error L r And divideClass error L c Is a weighted sum of:
wherein R is a regularization term for preventing model overfitting by L2 regularization of weights of layers in the network, L refers to the number of layers of the network, W l Refers to the weight of the corresponding layer; alpha, beta are weighting factors for reconstruction loss and classification loss, respectively, and lambda is a regularization coefficient.
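Formula (5) can be evaluated as in this sketch, with R implemented as the L2 penalty summed over all layer weight matrices:

```python
import numpy as np

def total_loss(L_r, L_c, layer_weights, alpha, beta, lam):
    """L = alpha*L_r + beta*L_c + lambda*R, with R the L2 penalty
    over all layer weight matrices (formula (5))."""
    R = sum(np.sum(W ** 2) for W in layer_weights)
    return alpha * L_r + beta * L_c + lam * R

# Toy values: one 2x2 all-ones weight matrix gives R = 4.
loss_total = total_loss(1.0, 2.0, [np.ones((2, 2))], alpha=0.5, beta=0.5, lam=0.1)
```

During fine-tuning, gradients of this combined objective update the shared hidden layer (and, in the improved MSAE, the softmax layer).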
Further, the weighting factors α and β are adaptively calculated using a softmax function.
as shown in fig. 2, the embodiment combines static analysis and dynamic analysis methods, respectively extracts static features based on PSI graphs and dynamic features based on streams, automatically fuses the static features and the dynamic features based on MSAE, automatically extracts fusion features through iterative training of a network, and detects zombie programs based on the fusion features. Compared with the prior art, the method can extract the complex relation between the bimodal features and fully exert the advantages of mixed analysis.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (10)
1. A botnet detection method based on a multi-modal stacked automatic encoder, comprising the steps of:
(1) Acquiring an executable file of an application program and storing the executable file in an ELF format;
(2) Respectively carrying out dynamic analysis and static analysis on a data set containing benign programs and bot programs, and extracting flow-based dynamic features and static features based on printable string information (PSI) graphs;
(3) Pre-training two stacking automatic encoders to encode dynamic characteristics and static characteristics respectively and extract deep complex characteristics;
(4) Fusing the dynamic characteristics and the static characteristics based on the multi-mode automatic encoder;
(5) Fine-tuning the multi-modal stacked automatic encoder;
(6) Taking the encoder of the fully trained multi-modal stacked automatic encoder model as a feature extractor, and feeding the shared hidden layer output of the model to a softmax layer for classification, thereby realizing detection of bot programs.
2. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein in step (2) the data set containing benign programs and bot programs is dynamically analyzed, specifically:
2-1) analyzing network behaviors of the ELF file through a Cuckoo sandbox, and recording network traffic in a pcap format;
2-2) dividing the network traffic recorded in the pcap file into flows according to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, and aggregating the data packets with the same five-tuple into flow data f = {p_1, p_2, ..., p_i}, where p_i represents a data packet having that five-tuple;
2-3) re-aggregating the flow data: the flow data collected from the same program at runtime are merged to form the flow record of the corresponding ELF file;
2-4) extracting statistical features based on the flow records: the mean, maximum, and minimum of the number of data packets contained in a flow; the mean, maximum, and minimum of the communication duration of a flow; and the mean, maximum, and minimum of the number of bytes in the data packets of a flow, for 9 feature dimensions in total; obtaining a flow-based dynamic feature set in which each of the n ELF samples contributes one flow-based feature vector extracted from that sample.
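Steps 2-2) through 2-4) can be sketched as follows; the packet dictionary keys (`src`, `sport`, `ts`, `size`, etc.) are hypothetical names for illustration, not fields defined by the patent:

```python
from collections import defaultdict
from statistics import mean

def aggregate_flows(packets):
    # Group packets by the five-tuple {src IP, src port, dst IP, dst port, protocol}.
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])
        flows[key].append(pkt)
    return list(flows.values())

def flow_features(flows):
    # 9-dimensional statistical feature vector per ELF sample:
    # mean/max/min of packets per flow, of flow duration, and of packet sizes.
    pkt_counts = [len(f) for f in flows]
    durations = [max(p["ts"] for p in f) - min(p["ts"] for p in f) for f in flows]
    sizes = [p["size"] for f in flows for p in f]
    feats = []
    for vals in (pkt_counts, durations, sizes):
        feats += [mean(vals), max(vals), min(vals)]
    return feats
```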
3. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein in step (2) the data set containing benign programs and bot programs is statically analyzed, specifically:
3-1) checking whether the ELF file is packed using the shell-detection tool DiE, then unpacking the binary with UPX and disassembling it with IDA Pro;
3-2) constructing a function call graph and a printable character string information graph according to the relation between a function caller and a callee in the assembly code;
3-3) converting the printable string information graph into numerical vector data by adopting the graph embedding technique graph2vec, obtaining a static feature set in which each ELF sample contributes one feature vector extracted from its printable string information graph;
the function call graph is defined as a directed graph G = (V, E), composed of a vertex set V = {v_1, v_2, ..., v_m} and an edge set E = {e_12, e_13, ..., e_ij}, where m represents the number of vertices and e_ij represents function v_i calling function v_j; vertices in the function call graph correspond to the functions contained in the assembly code of the program, and edges represent the caller-callee relationship between two functions.
4. The botnet detection method based on the multi-modal stacked automatic encoder of claim 2, wherein the construction process of the function call graph is specifically:
4-1) extracting a set of identified functions from the assembly code;
4-2) then determining an entry point function;
4-3) building the function call graph using a breadth-first search algorithm: if functions v_i and v_j are identified as having a caller-callee relationship, vertices v_i and v_j are added to the vertex set V and edge e_ij is added to the edge set E.
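A minimal sketch of the breadth-first construction in steps 4-1) to 4-3), assuming the caller-callee relations have already been recovered from the assembly code as a `callees` mapping (a hypothetical input format):

```python
from collections import deque

def build_fcg(entry, callees):
    # Breadth-first construction of the function call graph G = (V, E).
    # `callees` maps each function to the functions it calls.
    V, E = {entry}, set()
    queue = deque([entry])
    while queue:
        v_i = queue.popleft()
        for v_j in callees.get(v_i, []):
            E.add((v_i, v_j))  # edge e_ij: v_i calls v_j
            if v_j not in V:
                V.add(v_j)
                queue.append(v_j)
    return V, E
```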
5. The botnet detection method based on the multi-modal stacked automatic encoder of claim 2, wherein the printable string information (PSI) graph is constructed by selecting, from the function call graph, the functions and relations close to the botnet operation steps, specifically:
5-1) extracting all printable string information present in the binary file through the IDA Pro plug-in, and selecting the printable string information that is at least three characters long;
5-2) selecting the printable string information containing important semantic information to compose a set P = {psi_1, psi_2, ..., psi_k};
5-3) for a vertex v_i in the function call graph, if the function represented by v_i contains at least one important printable string psi_i, adding vertex v_i to the vertex set V' of the PSI graph and continuing to execute step 5-4); otherwise, skipping step 5-4);
5-4) traversing all edges e_ij representing call relations of function v_i; if function v_j also contains at least one psi_i, then adding vertex v_j to the vertex set V' of the PSI graph and adding edge e_ij to the edge set E' of the PSI graph;
5-5) repeating steps 5-3) and 5-4) until all vertices in the function call graph are traversed, and finally outputting the PSI graph G' = (V', E').
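The selection logic of steps 5-3) to 5-5) can be sketched as a single filtering pass over the function call graph; the one-pass form is a simplification of the repeat-until-traversed loop described above:

```python
def build_psi_graph(fcg_edges, func_strings, important):
    # Keep only vertices whose function contains at least one important
    # printable string (psi), and edges whose both endpoints qualify.
    # `func_strings` maps each function to the set of strings it references.
    def has_psi(v):
        return bool(func_strings.get(v, set()) & important)
    V = {v for edge in fcg_edges for v in edge if has_psi(v)}
    E = {(vi, vj) for vi, vj in fcg_edges if has_psi(vi) and has_psi(vj)}
    return V, E
```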
6. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein the pre-training of two stacked automatic encoders in step (3) is specifically:
pre-training one stacked automatic encoder using the dynamic features, its encoder consisting of two fully-connected layers with a ReLU activation function, the decoder being symmetrical to the encoder;
pre-training another stacked automatic encoder using the static features, its encoder consisting of two convolutional layers, a fully-connected layer, and a ReLU activation function, the decoder likewise being symmetrical to the encoder;
the two pre-trained stacked automatic encoders are used to encode the dynamic and static data, respectively, obtaining potential representations of both modalities.
7. The botnet detection method based on the multi-modal stacked automatic encoder of claim 1, wherein the dynamic features and the static features are fused based on the multi-modal automatic encoder in step (4), specifically:
concatenating the final hidden-layer codes of the two pre-trained stacked automatic encoders as the input of the multi-modal automatic encoder;
fusing the potential representations of the two modal data based on a hidden layer of the multi-modal automatic encoder to generate a shared potential representation;
a multi-modal stacked automatic encoder with all pre-trained layers and a shared hidden layer is finally built.
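A minimal pure-Python sketch of the fusion step: the final hidden codes of the two pre-trained encoders are concatenated and passed through the shared hidden layer. The plain-list matrix arithmetic stands in for a real deep-learning framework:

```python
def dense(x, W, b, relu=True):
    # One fully-connected layer: y = ReLU(W x + b).
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y

def fuse(h_dyn, h_sta, W_shared, b_shared):
    # Concatenate the final hidden codes of the two pre-trained SAEs and
    # feed them through the shared hidden layer of the multi-modal AE,
    # producing the shared potential representation.
    return dense(h_dyn + h_sta, W_shared, b_shared)
```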
8. The botnet detection method based on the multi-modal stacked automatic encoder of claim 1, wherein the fine-tuning process of step (5) specifically comprises:
the goal of the automatic encoder is to minimize the reconstruction error between input and output, letting the shared hidden layer learn a shared potential representation of the bimodal data; the loss function is defined as:

L_r = ||x_d − x̂_d||² + ||x_s − x̂_s||²

wherein x_d and x_s are respectively the dynamic and static feature vectors of the input, and x̂_d and x̂_s are the corresponding reconstructed vectors output by the MSAE;
and fixing the parameters of the pre-trained layers, training with a gradient descent algorithm, and updating only the weights and parameters of the shared hidden layer.
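A sketch of the fine-tuning reconstruction objective; the squared-error form is an assumption, since the patent's formula image is not reproduced:

```python
def reconstruction_loss(x_dyn, x_sta, r_dyn, r_sta):
    # L_r: sum of squared reconstruction errors over both modalities,
    # comparing each input vector with its MSAE reconstruction.
    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sq(x_dyn, r_dyn) + sq(x_sta, r_sta)
```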
9. The botnet detection method based on the multi-modal stacked automatic encoder as claimed in claim 1, wherein the step (6) specifically comprises:
unfolding the stacked automatic encoder, adding a softmax output layer on top of the shared hidden layer, and outputting the predicted label ŷ^(i) corresponding to the i-th input:

ŷ_t^(i) = exp(w_t·z^(i) + b_t) / ∑_{j=1}^{T} exp(w_j·z^(i) + b_j), t = 1, ..., T

where W represents the weights of the softmax layer, b represents the bias of the softmax layer, T is the number of target label categories, and z^(i) is the output of the shared hidden layer.
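The softmax output layer described above can be sketched as follows, using the W, b, T, and z^(i) named in the claim:

```python
import math

def softmax_predict(z, W, b):
    # y_hat_t = exp(w_t . z + b_t) / sum_j exp(w_j . z + b_j), t = 1..T,
    # where z is the shared hidden layer output for one input sample.
    logits = [sum(w_k * z_k for w_k, z_k in zip(w_t, z)) + b_t
              for w_t, b_t in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]
```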
10. The botnet detection method based on multi-modal stacked automatic encoders of claim 9, wherein, to improve the detection accuracy of bot programs, a classification error is also added to the loss function of the fine-tuning stage and is minimized based on the cross-entropy loss function:

L_c = -(1/n) ∑_{i=1}^{n} ∑_{t=1}^{T} 1{y^(i) = t} log ŷ_t^(i)
wherein y^(i) is the true label of the i-th input sample, and ŷ^(i) is the corresponding predicted label;
the minimization objective of the final multi-modal stacked automatic encoder is a weighted sum of the reconstruction error L_r and the classification error L_c:
L = αL_r + βL_c + λR (5)
wherein R is a regularization term, realized by applying L2 regularization to the weights of all layers in the network; α and β are weighting factors and λ is the regularization coefficient;
the weighting factors α and β for the reconstruction error and the classification error are adaptively calculated using a softmax function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311596885.1A CN117640190A (en) | 2023-11-28 | 2023-11-28 | Botnet detection method based on multi-mode stacking automatic encoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117640190A true CN117640190A (en) | 2024-03-01 |
Family
ID=90022771
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||