CN117640190A - Botnet detection method based on multi-mode stacking automatic encoder - Google Patents


Publication number
CN117640190A
Authority
CN
China
Prior art keywords
encoder
function
graph
automatic encoder
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311596885.1A
Other languages
Chinese (zh)
Inventor
孙宁
陈乐兰
韩光洁
娄星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN202311596885.1A
Publication of CN117640190A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a botnet detection method based on a multi-modal stacked automatic encoder. The method comprises the following steps: acquiring an executable file of an application program; performing dynamic analysis and static analysis on a data set containing benign programs and bots, and extracting flow-based dynamic features and static features based on printable string information graphs; pre-training two stacked automatic encoders to encode the flow-based features and the graph-based features respectively and to extract deep features; fusing the dynamic features and the static features with a multi-modal automatic encoder; fine-tuning the multi-modal stacked automatic encoder model; and using the encoder of the trained model as a feature extractor, with the output of the shared hidden layer fed to a softmax layer for bot detection. The improved multi-modal stacked automatic encoder fuses static and dynamic features automatically and learns the complex relationship between the two modalities, so that the advantages of hybrid analysis are fully exploited and the accuracy of bot detection is improved.

Description

Botnet detection method based on multi-mode stacking automatic encoder
Technical Field
The invention belongs to the field of network security and machine learning, and particularly relates to a botnet detection method based on a multi-mode stacking automatic encoder.
Background
For data acquisition and feature extraction, the botnet detection field mainly uses two methods: static analysis and dynamic analysis. Static methods extract static features by analyzing the binary code of bot instances without executing the malware. Dynamic methods execute a given bot instance, typically in a sandboxed environment, and extract dynamic features that represent botnet behavior. Most existing botnet detection methods detect bots based on static features only or on dynamic features only. Static analysis is simple and fast, but it is susceptible to obfuscation techniques such as encryption. Dynamic analysis, by contrast, reflects the behavior of the program as it runs, is relatively resistant to obfuscation, and generalizes better to unknown attacks and attack variants, but its data collection is time consuming. Static analysis is strong at capturing the structure of malware, while dynamic analysis can more readily detect obfuscated malware. Merging the two types of features in a suitable way can therefore improve the accuracy of botnet detection. Although earlier methods consider fusing multiple features, they perform feature-level or single-modality fusion; simply concatenating the features cannot capture the complex relationship between the two feature types, so the advantages of hybrid analysis are not fully exploited.
Disclosure of Invention
In order to solve the above problems, the invention provides a botnet detection method based on a multi-modal stacked automatic encoder. The method combines the multi-modal features extracted by static analysis and dynamic analysis, learns the complex relationship between the two modalities with a multi-modal stacked automatic encoder, fully exploits the advantages of hybrid analysis, and achieves a higher detection capability for bots.
In order to achieve the technical purpose and the technical effect, the invention is realized by the following technical scheme:
a botnet detection method based on a multi-mode stacking automatic encoder comprises the following steps:
(1) Acquiring an executable file of an application program and storing the executable file in an ELF format;
(2) Respectively carrying out dynamic analysis and static analysis on a data set containing benign programs and bots, and extracting dynamic characteristics based on streams and static characteristics based on Printable String Information (PSI) graphs;
(3) Pre-training two Stacking Automatic Encoders (SAE), respectively encoding dynamic features and static features, and extracting deep complex features;
(4) Fusing dynamic features and static features based on a multi-Modal Automatic Encoder (MAE);
(5) Fine-tuning the multi-modal stacked automatic encoder (MSAE);
(6) Taking the encoder of the fully trained MSAE model as a feature extractor, and feeding the shared hidden layer output of the model into a softmax layer for classification, thereby detecting bots.
Several optional refinements are provided below. They are not additional limitations on the overall scheme above, only further additions or preferences; absent technical or logical contradiction, each may be combined with the overall scheme individually, and several may be combined with one another.
Preferably, the step (2) of dynamically analyzing the data set containing benign programs and bots specifically includes the following steps:
1) Analyzing network behaviors of the ELF file through a Cuckoo sandbox, and recording network traffic in a pcap format;
2) According to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, dividing the network traffic recorded in the pcap file into flows, and aggregating data packets with the same five-tuple into flow data $f = \{p_1, p_2, \ldots, p_i\}$, where $p_i$ represents a data packet having that five-tuple;
3) Aggregating the flow data again: the flows collected during one run of the same program are merged by set union to form the flow record of the corresponding ELF file;
4) Extracting statistical features from the flow record: the average, maximum and minimum of the total number of data packets contained in a flow; the average, maximum and minimum of the communication duration of a flow; and the average, maximum and minimum of the number of bytes contained in the data packets of a flow, 9 feature dimensions in total. This yields the flow-based dynamic feature set $X_d = \{x_d^{(1)}, x_d^{(2)}, \ldots, x_d^{(n)}\}$, where $n$ represents the number of ELF samples and $x_d^{(i)}$ represents the flow-based features extracted from the i-th ELF sample.
Preferably, in the step (2), the static analysis is performed on the data set containing the benign program and the zombie program, and specifically the method comprises the following steps:
1) Checking whether the ELF file is packed using the packer detection tool DiE, and then unpacking and disassembling the binary using UPX and IDA Pro;
2) Constructing a Function Call Graph (FCG) and a Printable String Information (PSI) graph according to a function caller-callee relationship in the assembly code;
3) Then converting the PSI graph into numerical vector data using the graph2vec graph embedding technique, to obtain the static feature set $X_s = \{x_s^{(1)}, x_s^{(2)}, \ldots, x_s^{(n)}\}$, where $x_s^{(i)}$ represents the PSI-graph-based features extracted from the i-th ELF sample.
Preferably, the function call graph is defined as a directed graph $G = (V, E)$ composed of a vertex set $V = \{v_1, v_2, \ldots, v_m\}$ and an edge set $E = \{e_{12}, e_{13}, \ldots, e_{ij}\}$, where $m$ represents the number of vertices and $e_{ij}$ represents function $v_i$ calling function $v_j$. Vertices in the FCG correspond to functions contained in the assembly code of the program, and edges represent caller-callee relationships between two functions.
Preferably, the construction process of the function call graph is summarized as follows:
a) Extracting a set of identified functions from the assembly code;
b) Then determining an entry point function;
c) Building the FCG using a breadth-first search algorithm: if functions $v_i$ and $v_j$ are identified as having a caller-callee relationship, vertices $v_i$ and $v_j$ are added to the vertex set $V$ and edge $e_{ij}$ is added to the edge set $E$.
Preferably, in order to minimize computational complexity, the PSI graph is constructed by selecting from the FCG the functions and relationships that are close to the operating steps of a bot, specifically:
a) Extracting all printable string information (PSI) present in the binary file through an IDA Pro plug-in, and selecting the PSI that are at least three characters long;
b) Selecting the PSI that carry important semantic information (which may reveal the intention of an attacker) to form the set $P = \{psi_1, psi_2, \ldots, psi_k\}$;
c) For a vertex $v_i$ in the function call graph, if the function represented by $v_i$ contains at least one important printable string $psi_i$, adding vertex $v_i$ to the vertex set $V'$ of the PSI graph and continuing with step d); otherwise skipping step d);
d) Traversing every edge $e_{ij}$ that represents a call made by function $v_i$; if function $v_j$ also contains at least one $psi_i$ (a second condition, given as a formula, is not reproduced in this text), adding vertex $v_j$ to the vertex set $V'$ of the PSI graph and edge $e_{ij}$ to the edge set $E'$ of the PSI graph;
e) Repeating steps c) and d) until all vertices in the function call graph have been traversed, and finally outputting the PSI graph $G' = (V', E')$.
Preferably, two SAE are pre-trained in the step (3), specifically:
pre-training one SAE on the dynamic features, its encoder consisting of two fully connected layers with ReLU activation functions, and its decoder being symmetric to the encoder;
pre-training the other SAE on the static features, its encoder consisting of two convolution layers and a fully connected layer with ReLU activation functions, and its decoder likewise being symmetric to the encoder;
using the two pre-trained SAEs to encode the dynamic and static data respectively, obtaining latent representations of the two modalities.
Preferably, in the step (4), dynamic features and static features are fused based on the multi-mode automatic encoder, specifically:
concatenating the final hidden layer outputs of the encoders of the two pre-trained SAEs to serve as the input of the multi-modal automatic encoder;
fusing the latent representations of the two modalities in a hidden layer of the multi-modal automatic encoder to generate a shared latent representation;
finally, constructing a multi-modal stacked automatic encoder (MSAE) containing all the pre-trained layers and the shared hidden layer.
Preferably, the fine tuning process in the step (5) specifically includes:
the goal of an automatic encoder is to minimize the reconstruction error between its input and output, so that the shared hidden layer learns a shared latent representation of the bimodal data; the reconstruction loss $L_r$ is defined by a formula that is not reproduced in this text,
wherein $x_d$ and $x_s$ are respectively the dynamic and static feature vectors of the input, and $\hat{x}_d$ and $\hat{x}_s$ are the corresponding reconstructed vectors output by the MSAE;
fixing the parameters of the pre-trained layers and training with a gradient descent algorithm, updating only the weights and parameters of the shared hidden layer.
Preferably, the encoder of the MSAE model is used as a feature extractor in the step (6), and classified by using a softmax layer, specifically:
unfolding the stacked automatic encoder, adding a softmax output layer on top of the shared hidden layer, and outputting the predicted label $\hat{y}^{(i)}$ corresponding to the i-th input: $\hat{y}^{(i)} = \mathrm{softmax}(W z^{(i)} + b)$,
where $W$ represents the weight of the softmax layer, $b$ represents the bias of the softmax layer, $T$ is the number of target label categories, and $z^{(i)}$ is the i-th output of the shared hidden layer.
Preferably, in order to improve the detection accuracy for bots, the classification error is also added to the loss function of the fine-tuning stage; the classification error $L_c$ is minimized with a cross-entropy loss function (formula not reproduced in this text),
wherein $y^{(i)}$ is the true label of the i-th input sample and $\hat{y}^{(i)}$ is the corresponding predicted label;
the final MSAE is trained to minimize a weighted sum of the reconstruction error $L_r$ and the classification error $L_c$:
$L = \alpha L_r + \beta L_c + \lambda R$ (5)
wherein $R$ is a regularization term realized by L2 regularization of the weights of all layers in the network, and $\alpha$, $\beta$ and $\lambda$ are weighting factors.
Preferably, the weighting factors $\alpha$ and $\beta$ are calculated adaptively using a softmax function (formula not reproduced in this text).
Compared with the prior art, the invention has the following beneficial effects:
1. the invention combines static analysis and dynamic analysis methods, extracts the features based on the stream and the features based on the PSI graph to detect the zombie program, and has higher accuracy compared with a single feature by means of the complementary advantages of the two analysis methods.
2. According to the invention, the strong autonomous learning capability of the multi-mode automatic encoder is utilized, the static features and the dynamic features are automatically fused through the iterative training of the network model, and compared with a simple method for fusing the features of the two modes through direct splicing, the complex relationship between the two modes can be extracted, and the advantages of hybrid analysis are fully exerted.
3. The invention trains the MSAE model by pre-training followed by fine-tuning, which does not require a large labeled data set, and adds a penalty on classification errors to the loss function of the fine-tuning stage, further enhancing the performance of the network model.
Drawings
FIG. 1 is a training and testing flow diagram of one embodiment of the present invention;
FIG. 2 is a schematic diagram of a bot detection scheme according to one embodiment of the present invention;
FIG. 3 is a diagram of a pre-training network and a fine-tuning network architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an MSAE network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. The described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The principle of application of the invention is described in detail below with reference to the accompanying drawings.
In the embodiment of the invention, the multi-modal features extracted by static analysis and dynamic analysis are combined; a multi-modal stacked automatic encoder model is constructed from the pre-trained stacked automatic encoders and the fine-tuned multi-modal automatic encoder and is used to fuse the static and dynamic features automatically; bots are then detected accurately on the basis of the fused features. The embodiment provides a botnet detection method based on a multi-modal stacked automatic encoder which, as shown in FIG. 1, comprises the following steps:
(1) An executable file of the application program is obtained and stored in an ELF format.
(2) And respectively carrying out dynamic analysis and static analysis on the data set containing the benign program and the zombie program, and extracting dynamic characteristics based on the stream and static characteristics based on the PSI graph.
(2.1) the specific steps of dynamic feature extraction are as follows:
(2.1.1) network behavior analysis is performed on the ELF file through a Cuckoo sandbox, and network traffic is recorded in a pcap format.
(2.1.2) The network traffic recorded in the pcap file is divided into flows according to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, and the data packets with the same five-tuple are aggregated into flow data $f = \{p_1, p_2, \ldots, p_i\}$, where $p_i$ represents a packet having that five-tuple.
(2.1.3) The flow data are aggregated again: the flows collected during one run of the same program are merged by set union to form the flow record of the corresponding ELF file.
(2.1.4) Statistical features are extracted from the flow record: the average, maximum and minimum of the total number of data packets contained in a flow; the average, maximum and minimum of the communication duration of a flow; and the average, maximum and minimum of the number of bytes contained in the data packets of a flow, 9 feature elements in total. This yields the flow-based dynamic feature set $X_d = \{x_d^{(1)}, x_d^{(2)}, \ldots, x_d^{(n)}\}$, where $n$ represents the number of ELF samples and $x_d^{(i)}$ represents the flow-based features extracted from the i-th ELF sample.
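The flow division and feature extraction of steps (2.1.2) to (2.1.4) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the pcap has already been parsed into dictionaries with the hypothetical keys `src`, `sport`, `dst`, `dport`, `proto`, `ts` (timestamp in seconds) and `size` (bytes), and it reads the byte statistic as total bytes per flow, which is only one possible reading of the text.

```python
from collections import defaultdict
from statistics import mean


def flow_features(packets):
    """Return the 9 flow statistics for the packets of one program run.

    `packets` is a non-empty list of dicts with the hypothetical keys
    src, sport, dst, dport, proto, ts, size.
    """
    # Step (2.1.2): group packets that share the same five-tuple into flows.
    flows = defaultdict(list)
    for p in packets:
        key = (p["src"], p["sport"], p["dst"], p["dport"], p["proto"])
        flows[key].append(p)

    # Step (2.1.3): all flows of this run together form the flow record.
    counts, durations, byte_totals = [], [], []
    for pkts in flows.values():
        times = [p["ts"] for p in pkts]
        counts.append(len(pkts))
        durations.append(max(times) - min(times))
        byte_totals.append(sum(p["size"] for p in pkts))

    # Step (2.1.4): mean, max and min of each per-flow quantity -> 9 values.
    features = []
    for series in (counts, durations, byte_totals):
        features += [mean(series), max(series), min(series)]
    return features


packets = [
    {"src": "10.0.0.2", "sport": 4000, "dst": "10.0.0.9", "dport": 80,
     "proto": "tcp", "ts": 0.0, "size": 100},
    {"src": "10.0.0.2", "sport": 4000, "dst": "10.0.0.9", "dport": 80,
     "proto": "tcp", "ts": 2.0, "size": 300},
    {"src": "10.0.0.2", "sport": 4001, "dst": "10.0.0.9", "dport": 53,
     "proto": "udp", "ts": 1.0, "size": 60},
]
features = flow_features(packets)
```

Here the three packets form two flows, so each of the three per-flow quantities is summarized over two values.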
(2.2) the specific steps of static feature extraction are as follows:
(2.2.1) The packer detection tool DiE is used to check whether the ELF file is packed, and the binary is then unpacked and disassembled using UPX and IDA Pro.
(2.2.2) constructing a Function Call Graph (FCG) and a Printable String Information (PSI) graph according to the function caller-callee relationship in the assembly code.
The function call graph is defined as a directed graph $G = (V, E)$ composed of a vertex set $V = \{v_1, v_2, \ldots, v_m\}$ and an edge set $E = \{e_{12}, e_{13}, \ldots, e_{ij}\}$, where $m$ represents the number of vertices and $e_{ij}$ represents function $v_i$ calling function $v_j$. Vertices in the FCG correspond to unique functions contained in the assembly code of the program, and edges represent caller-callee relationships between two functions. The present embodiment uses an existing function call graph construction method, based on a breadth-first search algorithm, that builds the FCG with a FIFO function queue, specifically:
a) Extracting a group of boundaries of the identified functions from the assembly code, and storing the functions into a function set named as FunSet;
b) Then extracting all the entry point functions, storing the entry point functions into EntryFunSet, and adding all the entry point functions into a vertex set V;
c) Initializing the function queue with the entry point functions, and setting the enqueue flag 'enQFlag' of each of them to true so that the same vertex is never enqueued twice;
d) While the queue is not empty, dequeuing the head element of the queue and treating that function $v_i$ as the caller, then traversing function $v_i$ to obtain its callee set;
e) Once the callees are obtained, traversing the callee set and checking whether callee $v_j$ already exists in the graph; if not, adding callee $v_j$ to the vertex set $V$; then checking whether an edge $e_{ij}$ from caller $v_i$ to callee $v_j$ already exists in the graph; if not, adding edge $e_{ij}$ to the edge set $E$;
f) Checking whether the callee has already been enqueued; if not, setting its enqueue flag 'enQFlag' to true and appending the callee to the tail of the queue;
g) Repeating steps d), e), f) until the queue is empty.
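Steps a) to g) can be sketched in a few lines. This is a hedged illustration only: the input format (a dictionary `callees` mapping each function name to the functions it calls, standing in for what a disassembler would provide) and the function name `build_fcg` are hypothetical, and a set plays the role of the 'enQFlag'.

```python
from collections import deque


def build_fcg(callees, entry_points):
    """Build a function call graph by breadth-first search.

    callees: dict mapping a function name to the list of functions it calls
             (hypothetical stand-in for disassembler output).
    entry_points: list of distinct entry point function names.
    Returns (vertices, edges) with edges as (caller, callee) pairs.
    """
    vertices = list(entry_points)      # step b): entry points seed V
    edges = []
    known_vertices = set(vertices)
    known_edges = set()
    enqueued = set(entry_points)       # step c): acts as the enQFlag
    queue = deque(entry_points)

    while queue:                       # steps d) to g)
        caller = queue.popleft()
        for callee in callees.get(caller, []):
            if callee not in known_vertices:       # step e): new vertex
                known_vertices.add(callee)
                vertices.append(callee)
            if (caller, callee) not in known_edges:  # step e): new edge
                known_edges.add((caller, callee))
                edges.append((caller, callee))
            if callee not in enqueued:             # step f): enqueue once
                enqueued.add(callee)
                queue.append(callee)
    return vertices, edges


callees = {"main": ["a", "b"], "a": ["b", "main"], "b": [], "dead": ["a"]}
vertices, edges = build_fcg(callees, ["main"])
```

In this example the function `dead` is never reached from the entry point, so it does not appear in the graph.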
The function call graph is intended to represent all possible runs of a program. FCGs are therefore often complex, with a large number of nodes and edges, which requires longer computation time and more memory. Although all call relationships of the program are represented in the FCG, some call relationships may never occur during actual running of the program. In order to minimize the computational complexity, the present embodiment selects functions and relationships close to the operation steps of the zombie program from the FCG to construct the PSI graph, specifically:
a) Extracting all Printable String Information (PSI) existing in the binary file through the IDAPro plug-in, and selecting PSI with at least three characters in length for balancing detection precision and calculation complexity;
b) Then selecting the PSI that carry important semantic information (which can reveal the intention of an attacker) to form the set $P = \{psi_1, psi_2, \ldots, psi_k\}$;
c) For a vertex $v_i$ in the function call graph, if the function represented by $v_i$ contains at least one important printable string $psi_i$, adding vertex $v_i$ to the vertex set $V'$ of the PSI graph and continuing with step d); otherwise skipping step d);
d) Traversing every edge $e_{ij}$ that represents a call made by function $v_i$; if function $v_j$ also contains at least one $psi_i$ (a second condition, given as a formula, is not reproduced in this text), adding vertex $v_j$ to the vertex set $V'$ of the PSI graph and edge $e_{ij}$ to the edge set $E'$ of the PSI graph;
e) Repeating steps c) and d) until all vertices in the function call graph have been traversed, and finally outputting the PSI graph $G' = (V', E')$.
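The PSI graph filtering in steps c) to e) can be sketched as below. This is a simplified illustration under stated assumptions: `strings_of` (function name to the set of printable strings found in it) and `important` (the set P) are hypothetical inputs, and only the condition that survives in the text is applied in step d), namely that the callee also contains an important string.

```python
def build_psi_graph(vertices, edges, strings_of, important):
    """Filter a function call graph down to a PSI graph.

    vertices, edges: the FCG, with edges as (caller, callee) pairs.
    strings_of: dict mapping a function to the printable strings it contains
                (hypothetical input).
    important: the set P of semantically important strings.
    """
    def has_psi(function):
        # Only strings of at least three characters are considered.
        return any(len(s) >= 3 and s in important
                   for s in strings_of.get(function, ()))

    psi_vertices, psi_edges = [], []
    for v in vertices:                       # step e): visit every vertex
        if not has_psi(v):                   # step c): skip step d)
            continue
        if v not in psi_vertices:
            psi_vertices.append(v)
        for caller, callee in edges:         # step d): calls made by v
            if caller == v and has_psi(callee):
                if callee not in psi_vertices:
                    psi_vertices.append(callee)
                if (caller, callee) not in psi_edges:
                    psi_edges.append((caller, callee))
    return psi_vertices, psi_edges


fcg_vertices = ["main", "a", "b"]
fcg_edges = [("main", "a"), ("main", "b"), ("a", "b")]
strings_of = {"main": {"wget http://"}, "a": {"ok"}, "b": {"/bin/busybox"}}
important = {"wget http://", "/bin/busybox"}
psi_vertices, psi_edges = build_psi_graph(
    fcg_vertices, fcg_edges, strings_of, important)
```

The function `a` carries no important string, so both it and the edges touching it are dropped from the PSI graph.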
(2.2.3) The PSI graph is converted into numerical vector data using a graph embedding technique named graph2vec, to obtain the static feature set $X_s = \{x_s^{(1)}, x_s^{(2)}, \ldots, x_s^{(n)}\}$, where $x_s^{(i)}$ represents the PSI-graph-based features extracted from the i-th ELF sample. The result of this step is a set of numerical vectors, one per graph, whose length can be chosen freely. In this embodiment, each PSI graph is represented as a numerical vector of length 1024.
(3) Two Stacked Automatic Encoders (SAE) are pre-trained to encode dynamic and static features, respectively, extracting deep complex features.
The feature data set extracted in the preceding steps is divided into a training set and a test set, and the training set is then divided again into a pre-training data set and a fine-tuning data set.
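The two-stage split can be sketched as follows. The text does not give the split ratios, so `test_frac`, `finetune_frac` and the fixed seed are assumptions chosen purely for illustration.

```python
import random


def split_dataset(samples, test_frac=0.2, finetune_frac=0.5, seed=0):
    """Split samples into (pretrain, finetune, test) subsets.

    The fractions are hypothetical; the source does not specify them.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)

    # First split: training set versus test set.
    n_test = int(len(items) * test_frac)
    test, train = items[:n_test], items[n_test:]

    # Second split: the training set into fine-tuning and pre-training parts.
    n_finetune = int(len(train) * finetune_frac)
    finetune, pretrain = train[:n_finetune], train[n_finetune:]
    return pretrain, finetune, test


pretrain, finetune, test = split_dataset(range(100))
```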
In an unsupervised manner, one SAE is pre-trained using the dynamic features in the pre-training data set as input; its structure is shown in FIG. 3 (a). For convenience of explanation, this SAE is referred to as SAE1 in this example. SAE1 consists of an encoder and a decoder; the encoder consists of two fully connected layers, each followed by a ReLU activation function, and the decoder is symmetric to the encoder. The size of the input layer and the output layer of SAE1 corresponds to the dimension of the dynamic features and is set to 9; the two hidden layers of its encoder have 8 and 4 neurons respectively, so the output size of the final hidden layer of the encoder is 4.
In an unsupervised learning manner, using static features in the pre-training dataset as input, another SAE, here called SAE2, is pre-trained, the structure of which is shown in FIG. 3 (b). SAE2 consists of two parts, encoder and decoder, which are also symmetrical structures, wherein the structure of the encoder is specifically:
(1) a convolution layer C1, the convolution kernel size is 3×3, the channel number is 16, and the output is 8×8×16;
(2) the pooling layer P1 performs a maximum pooling operation of 2×2 once and outputs 4×4×16;
(3) a convolution layer C2, the convolution kernel size is 3×3, the number of channels is 32, and the output is 4×4×32;
(4) the pooling layer P2 performs a maximum pooling operation of 2×2 once and outputs 2×2×32;
(5) the full connection layer FC1 consists of 128 neurons, adopts a ReLU activation function and outputs 128-dimensional vectors;
(6) the fully connected layer FC2, consisting of 10 neurons, uses the ReLU activation function, so the encoder final hidden layer output size is 10.
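The layer sizes listed above can be checked with a short shape calculation. The text does not state the strides, the padding, or how the length-1024 static vector is arranged as a 2-D input, so the values below (a 32×32 single-channel input, stride 4 without padding for C1, stride 1 with padding 1 for C2) are hypothetical: they are simply one combination that reproduces the stated output sizes.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


side = 32                                   # assumption: 1024 values as 32 x 32 x 1
c1 = conv_out(side, 3, stride=4, pad=0)     # C1, 16 channels
p1 = conv_out(c1, 2, stride=2)              # P1, 2 x 2 max pooling
c2 = conv_out(p1, 3, stride=1, pad=1)       # C2, 32 channels
p2 = conv_out(c2, 2, stride=2)              # P2, 2 x 2 max pooling
flattened = p2 * p2 * 32                    # values passed on to FC1

shapes = [(c1, c1, 16), (p1, p1, 16), (c2, c2, 32), (p2, p2, 32)]
```

Under these assumptions the flattened output of P2 has exactly as many values as FC1 has neurons.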
The two pre-trained SAEs are then used to encode the dynamic features and the static features in the fine-tuning data set respectively, obtaining latent representations of the two modalities.
(4) The dynamic characteristics and the static characteristics are fused based on the multi-mode automatic encoder.
To fuse the static and dynamic features, the present embodiment concatenates the final hidden layer outputs of the encoders of the two pre-trained SAEs and uses the result as the input to the multi-modal automatic encoder. In essence, the multi-modal automatic encoder uses the hidden layer of a further automatic encoder to fuse the latent representations of the two modalities and generate a shared latent representation, as shown in FIG. 3 (c). Finally, a multi-modal stacked automatic encoder containing all the pre-trained layers and the shared hidden layer is constructed, as shown in FIG. 4.
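The fusion step can be illustrated with a dependency-free forward pass. This is a sketch under stated assumptions: the 4-dimensional and 10-dimensional codes match the encoder output sizes given for SAE1 and SAE2, but the shared hidden layer size (`SHARED = 8`), the ReLU on the shared layer, and the random weights are hypothetical, since the text does not specify them.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse(z_dynamic, z_static, weights, bias):
    """Concatenate the two latent codes and map them to the shared layer."""
    joint = list(z_dynamic) + list(z_static)
    return relu(dense(joint, weights, bias))


SHARED = 8                                   # hypothetical shared layer size
rng = random.Random(0)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(4 + 10)]
           for _ in range(SHARED)]
bias = [0.0] * SHARED

z_dynamic = [0.1, 0.4, 0.0, 0.9]             # 4-dim code from SAE1
z_static = [0.2] * 10                        # 10-dim code from SAE2
shared = fuse(z_dynamic, z_static, weights, bias)
```

Because the shared layer sees both codes at once, its weights can capture interactions between the modalities that a plain concatenation fed directly to a classifier would leave to the classifier alone.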
(5) A multi-Modal Stacked Automatic Encoder (MSAE) is fine tuned.
The goal of an automatic encoder is to minimize the reconstruction error between its input and output, so that the shared hidden layer learns a shared latent representation of the bimodal data. The reconstruction loss $L_r$ is defined by a formula that is not reproduced in this text,
wherein $x_d$ and $x_s$ are respectively the dynamic and static feature vectors of the input, and $\hat{x}_d$ and $\hat{x}_s$ are the corresponding reconstructed vectors output by the MSAE;
The MSAE is fine-tuned in a semi-supervised manner: the labeled fine-tuning data set is used as the model input, the parameters of the pre-trained layers are fixed, and only the parameters of the shared hidden layer are updated, using the Adam optimizer, which is based on gradient descent.
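Since the formula for the reconstruction loss is missing from this text, the following is offered only as one plausible form, not as the formula of the invention. It assumes a squared-error loss summed over both modalities and averaged over the $n$ fine-tuning samples; the normalization and the symbols $x_d^{(i)}$, $x_s^{(i)}$ (input dynamic and static feature vectors of sample $i$) and $\hat{x}_d^{(i)}$, $\hat{x}_s^{(i)}$ (their reconstructions) are assumptions.

```latex
L_r = \frac{1}{n} \sum_{i=1}^{n}
      \left( \left\lVert x_d^{(i)} - \hat{x}_d^{(i)} \right\rVert_2^2
           + \left\lVert x_s^{(i)} - \hat{x}_s^{(i)} \right\rVert_2^2 \right)
```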
(6) And taking the encoder of the MSAE model with complete training as a feature extractor, and classifying the shared hidden layer output of the model as the input of a softmax layer to realize the detection of the bot program.
The model structure for detecting bots based on the MSAE is shown in FIG. 4. Specifically, the stacked automatic encoder is unfolded and a softmax output layer is added on top of the shared hidden layer to output the predicted label $\hat{y}^{(i)}$ corresponding to the i-th input: $\hat{y}^{(i)} = \mathrm{softmax}(W z^{(i)} + b)$,
where $W$ represents the weight of the softmax layer, $b$ represents the bias of the softmax layer, $T$ is the number of target label categories, and $z^{(i)}$ is the i-th output of the shared hidden layer.
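The classification step can be sketched with a standard softmax layer applied to the shared hidden layer output. The helper names and the toy weights are hypothetical; the number of rows in `W` plays the role of T, the number of label categories.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(z, W, b):
    """Class probabilities for one shared-layer output z.

    W has one row per class (T rows) and b one bias per class.
    """
    logits = [sum(w * zi for w, zi in zip(row, z)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)


# Two classes (for example benign and bot) and a 2-dim shared representation.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
probs = predict([1.0, 2.0], W, b)
label = probs.index(max(probs))
```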
In order to improve the detection accuracy for bots, the embodiment provides an improved MSAE in which the classification error is added to the loss function of the fine-tuning stage; the classification error $L_c$ is minimized with a cross-entropy loss function (formula not reproduced in this text),
wherein $y^{(i)}$ is the true label of the i-th input sample and $\hat{y}^{(i)}$ is the corresponding predicted label;
the final MSAE is trained to minimize a weighted sum of the reconstruction error $L_r$ and the classification error $L_c$: $L = \alpha L_r + \beta L_c + \lambda R$,
wherein $R$ is a regularization term that prevents the model from overfitting by L2-regularizing the weights of every layer in the network, $W_l$ denoting the weight of layer $l$ with $l$ running over all layers of the network; $\alpha$ and $\beta$ are the weighting factors for the reconstruction loss and the classification loss respectively, and $\lambda$ is the regularization coefficient.
Further, the weighting factors $\alpha$ and $\beta$ are calculated adaptively using a softmax function (formula not reproduced in this text).
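The combined objective $L = \alpha L_r + \beta L_c + \lambda R$ can be sketched as below. Several parts are assumptions, because the corresponding formulas are missing from this text: the cross-entropy is taken per sample as the negative log-probability of the true class, the adaptive weights are computed as a softmax over the two loss values themselves, and the value of `lam` is arbitrary.

```python
import math


def cross_entropy(true_class, probs):
    """Negative log-probability of the true class (assumed form of L_c)."""
    return -math.log(probs[true_class])


def adaptive_weights(l_r, l_c):
    """Assumed scheme: softmax over the two loss values gives alpha, beta."""
    peak = max(l_r, l_c)
    e_r, e_c = math.exp(l_r - peak), math.exp(l_c - peak)
    return e_r / (e_r + e_c), e_c / (e_r + e_c)


def total_loss(l_r, l_c, layer_weights, lam=1e-4):
    """L = alpha * L_r + beta * L_c + lam * R, with R the sum of squared weights."""
    alpha, beta = adaptive_weights(l_r, l_c)
    reg = sum(w * w for layer in layer_weights for row in layer for w in row)
    return alpha * l_r + beta * l_c + lam * reg


l_c = cross_entropy(0, [0.5, 0.5])
alpha, beta = adaptive_weights(1.0, 1.0)
loss = total_loss(1.0, 1.0, [[[1.0, 2.0]]], lam=0.1)
```

With this assumed scheme the larger of the two losses receives the larger weight, and the two weights always sum to one; other adaptive schemes would fit the text equally well.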
As shown in FIG. 2, the embodiment combines static analysis and dynamic analysis, extracts PSI-graph-based static features and flow-based dynamic features respectively, fuses them automatically with the MSAE through iterative training of the network, and detects bots on the basis of the fused features. Compared with the prior art, the method can extract the complex relationship between the bimodal features and fully exploit the advantages of hybrid analysis.
The foregoing has shown and described the basic principles and main features of the present invention and the advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A botnet detection method based on a multi-modal stacked automatic encoder, comprising the steps of:
(1) Acquiring an executable file of an application program and storing the executable file in an ELF format;
(2) Respectively carrying out dynamic analysis and static analysis on a data set containing benign programs and bots, and extracting dynamic characteristics based on streams and static characteristics based on printable character string information graphs;
(3) Pre-training two stacking automatic encoders to encode dynamic characteristics and static characteristics respectively and extract deep complex characteristics;
(4) Fusing the dynamic characteristics and the static characteristics based on the multi-mode automatic encoder;
(5) Fine-tuning the multi-modal stacked automatic encoder;
(6) Taking the encoder of the fully trained multi-modal stacked automatic encoder model as a feature extractor, and feeding the shared hidden layer output of the model into a softmax layer for classification, thereby detecting bots.
2. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein in step (2) the data set comprising benign programs and bot programs is dynamically analyzed, specifically:
2-1) analyzing the network behavior of the ELF file in a Cuckoo sandbox, and recording the network traffic in pcap format;
2-2) splitting the network traffic recorded in the pcap file into flows according to the five-tuple {source IP address, source port number, destination IP address, destination port number, protocol}, and aggregating the data packets with the same five-tuple into flow data $f = \{p_1, p_2, \ldots, p_i\}$, where $p_i$ represents a data packet having the same five-tuple;
2-3) re-aggregating the flow data, merging the flows collected from the same running program to form the flow record of the corresponding ELF file;
2-4) extracting statistical features from the flow records, namely the average, maximum, and minimum of the total number of data packets contained in a flow, the average, maximum, and minimum of the communication duration of a flow, and the average, maximum, and minimum of the number of bytes contained in the data packets of a flow, for 9 feature dimensions in total; and obtaining a flow-based dynamic feature set with one feature vector per sample, wherein n represents the number of ELF samples and the i-th feature vector represents the flow-based features extracted from the i-th ELF sample.
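Steps 2-2) to 2-4) can be illustrated with a minimal sketch. The packet tuple layout, the function names, and the zero vector for a sample without traffic are assumptions made for illustration; the byte statistics are taken over individual packets, which is one possible reading of the claim.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet record (not from the patent):
# (src_ip, src_port, dst_ip, dst_port, proto, timestamp, n_bytes)

def split_flows(packets):
    """Group packets that share the same five-tuple into flows."""
    flows = defaultdict(list)
    for src_ip, src_port, dst_ip, dst_port, proto, ts, n_bytes in packets:
        flows[(src_ip, src_port, dst_ip, dst_port, proto)].append((ts, n_bytes))
    return dict(flows)

def _stats(values):
    """Return [mean, max, min] of a non-empty sequence."""
    return [mean(values), max(values), min(values)]

def flow_features(packets):
    """Build a 9-dimensional feature vector from all flows of one ELF sample."""
    flows = split_flows(packets)
    if not flows:
        return [0.0] * 9  # assumption: no traffic maps to an all-zero vector
    counts = [len(pkts) for pkts in flows.values()]
    durations = [max(t for t, _ in pkts) - min(t for t, _ in pkts)
                 for pkts in flows.values()]
    sizes = [b for pkts in flows.values() for _, b in pkts]
    return _stats(counts) + _stats(durations) + _stats(sizes)
```

Calling `flow_features` once per sample and stacking the results would give the n-by-9 dynamic feature set described in step 2-4).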
3. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein in step (2) the data set comprising benign programs and bot programs is statically analyzed, specifically:
3-1) checking whether the ELF file is packed using the packer-detection tool DiE, and then unpacking and disassembling the binary using UPX and IDA Pro;
3-2) constructing a function call graph and a printable string information graph according to the relationships between function callers and callees in the assembly code;
3-3) converting the printable string information graph into numerical vector data using the graph embedding technique graph2vec to obtain a static feature set, wherein the i-th feature vector represents the features extracted from the i-th ELF sample based on the printable string information graph;
the function call graph is defined as a directed graph $G = (V, E)$ composed of a vertex set $V = \{v_1, v_2, \ldots, v_m\}$ and an edge set $E = \{e_{12}, e_{13}, \ldots, e_{ij}\}$, where $m$ represents the number of vertices and $e_{ij}$ represents function $v_i$ calling function $v_j$; vertices in the function call graph correspond to functions contained in the assembly code of the program, and edges represent caller-callee relationships between two functions.
4. The botnet detection method based on the multi-mode stacking automatic encoder as claimed in claim 2, wherein the construction process of the function call graph is specifically as follows:
4-1) extracting a set of identified functions from the assembly code;
4-2) then determining an entry point function;
4-3) building the function call graph using a breadth-first search algorithm: if functions $v_i$ and $v_j$ are identified as having a caller-callee relationship, vertices $v_i$ and $v_j$ are added to the vertex set $V$ and edge $e_{ij}$ is added to the edge set $E$.
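The construction in steps 4-1) to 4-3) can be sketched as a breadth-first traversal. The `calls` mapping (caller name to list of callee names) is a hypothetical stand-in for the relations recovered from the disassembly, and `build_call_graph` is an illustrative name, not part of any tool mentioned in the claims.

```python
from collections import deque

def build_call_graph(calls, entry):
    """Breadth-first construction of G = (V, E) starting at the entry point.

    calls: hypothetical mapping caller -> iterable of callees.
    Returns (vertices, edges) with edges stored as (caller, callee) pairs.
    """
    vertices = {entry}          # the entry point is kept even if it calls nothing
    edges = set()
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in calls.get(caller, ()):
            edges.add((caller, callee))      # e_ij: v_i calls v_j
            if callee not in vertices:       # visit each function only once
                vertices.add(callee)
                queue.append(callee)
    return vertices, edges
```

Functions that are never reached from the entry point are left out of the graph in this sketch; whether the method keeps them is not stated in the claim.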
5. The botnet detection method based on the multi-modal stacked automatic encoder of claim 2, wherein the printable string information (PSI) graph is constructed by selecting, from the function call graph, the functions and relationships closely related to the operating steps of a bot program, specifically:
5-1) extracting all printable string information present in the binary file through an IDA Pro plug-in, and selecting the printable string information that is at least three characters long;
5-2) selecting the printable string information that carries important semantic information to form a set $P = \{psi_1, psi_2, \ldots, psi_k\}$;
5-3) for a vertex $v_i$ in the function call graph, if the function represented by $v_i$ contains at least one important printable string $psi_i$, adding vertex $v_i$ to the vertex set $V'$ of the PSI graph and continuing with step 5-4); otherwise, skipping step 5-4);
5-4) traversing all edges $e_{ij}$ that represent call relationships of function $v_i$; if function $v_j$ also contains at least one $psi_i$ and […], adding vertex $v_j$ to the vertex set $V'$ of the PSI graph and adding edge $e_{ij}$ to the edge set $E'$ of the PSI graph;
5-5) repeating steps 5-3) and 5-4) until all vertices in the function call graph have been traversed, and finally outputting the PSI graph $G' = (V', E')$.
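A minimal sketch of steps 5-1) to 5-5) follows. The keyword-based test for "important semantic information" is a hypothetical criterion, since the claim does not say how important strings are chosen, and the second condition in step 5-4), which is not recoverable from the text, is left out. All names are illustrative.

```python
def select_important(strings, keywords, min_len=3):
    """Keep strings of at least min_len characters that contain a keyword.

    The keyword test is an assumed stand-in for the unspecified
    'important semantic information' criterion.
    """
    return {s for s in strings
            if len(s) >= min_len and any(k in s for k in keywords)}

def build_psi_graph(vertices, edges, strings_of, important):
    """Select the sub-graph of functions that reference important strings.

    strings_of: hypothetical mapping function name -> set of its strings.
    """
    def has_psi(fn):
        return bool(strings_of.get(fn, set()) & important)

    psi_vertices, psi_edges = set(), set()
    for v_i in vertices:
        if not has_psi(v_i):
            continue                      # step 5-3: skip step 5-4
        psi_vertices.add(v_i)
        for caller, v_j in edges:         # step 5-4: edges leaving v_i
            if caller == v_i and has_psi(v_j):
                psi_vertices.add(v_j)
                psi_edges.add((v_i, v_j))
    return psi_vertices, psi_edges
```

The resulting pair `(psi_vertices, psi_edges)` corresponds to G' = (V', E') and would be the input to a graph embedding step such as graph2vec.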
6. The botnet detection method based on multi-modal stacked automatic encoders of claim 1, wherein the pre-training of two stacked automatic encoders in step (3) is specifically:
pre-training one stacked automatic encoder using the dynamic features, the encoder consisting of two fully-connected layers and a ReLU activation function, the decoder being symmetrical to the encoder;
pre-training another stacked automatic encoder using the static features, the encoder consisting of two convolutional layers, a fully-connected layer, and a ReLU activation function, the decoder likewise being symmetrical to the encoder;
the two pre-trained stacked automatic encoders are used to encode the dynamic and static data, respectively, to obtain latent representations of the data of both modalities.
7. The botnet detection method based on the multi-modal stacked automatic encoder as claimed in claim 1, wherein the dynamic features and the static features are fused based on the multi-modal automatic encoder in step (4), specifically:
the final hidden layer codes of the two pre-trained stacked automatic encoders are concatenated and used as the input of the multi-modal automatic encoder;
the latent representations of the data of the two modalities are fused by a hidden layer of the multi-modal automatic encoder to generate a shared latent representation;
a stacked multi-modal automatic encoder with all pre-trained layers and a shared hidden layer is ultimately built.
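The fusion step can be sketched as concatenation followed by one dense layer. The ReLU on the shared layer, the layer sizes, and the helper names are assumptions; the claim only states that the two final hidden codes are joined and passed through a shared hidden layer.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]

def dense(x, weights, bias):
    """Fully-connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def fuse(h_dynamic, h_static, weights, bias):
    """Concatenate both hidden codes and map them to the shared representation.

    weights has one row per shared unit and len(h_dynamic) + len(h_static)
    columns; the ReLU activation is an assumption of this sketch.
    """
    joint = list(h_dynamic) + list(h_static)
    return relu(dense(joint, weights, bias))
```

For example, `fuse([1.0, 2.0], [3.0], [[1, 0, 0], [0, 1, -1]], [0.5, 0.0])` yields a two-dimensional shared code.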
8. The botnet detection method based on the multi-modal stacked automatic encoder as claimed in claim 1, wherein the fine-tuning process of step (5) specifically comprises:
the goal of an automatic encoder is to minimize the reconstruction error between its input and output, so that the shared hidden layer learns a shared latent representation of the bimodal data; the reconstruction loss function is defined over
the input dynamic and static feature vectors and the corresponding reconstructed vectors output by the multi-modal stacked automatic encoder (MSAE);
the parameters of the pre-trained layers are fixed, training is performed with a gradient descent algorithm, and only the weights and parameters of the shared hidden layer are updated.
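The fine-tuning step can be illustrated with a small sketch. The summed squared error averaged over samples is an assumed form of the reconstruction loss, because the formula itself is not preserved in the text; the parameter-group names are hypothetical.

```python
def reconstruction_loss(x_dyn, x_sta, rec_dyn, rec_sta):
    """Mean over samples of the squared reconstruction error of both modalities.

    Assumed form: sum of squared differences for the dynamic and the static
    vector of each sample, averaged over the number of samples.
    """
    total = 0.0
    for xd, xs, rd, rs in zip(x_dyn, x_sta, rec_dyn, rec_sta):
        total += sum((a - b) ** 2 for a, b in zip(xd, rd))
        total += sum((a - b) ** 2 for a, b in zip(xs, rs))
    return total / len(x_dyn)

def sgd_step(params, grads, trainable, lr=0.1):
    """Gradient-descent update that changes only the groups named in trainable.

    params, grads: dicts mapping a group name to a flat list of floats.
    Groups outside `trainable` (the pre-trained layers) are copied unchanged.
    """
    return {
        name: ([w - lr * g for w, g in zip(values, grads[name])]
               if name in trainable else list(values))
        for name, values in params.items()
    }
```

Passing `trainable={"shared"}` mirrors the claim: the pre-trained layers stay frozen while the shared hidden layer is updated.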
9. The botnet detection method based on the multi-modal stacked automatic encoder as claimed in claim 1, wherein the step (6) specifically comprises:
unfolding the stacked automatic encoder, adding a softmax output layer on top of the shared hidden layer, and outputting the predicted label corresponding to the i-th input,
where $W$ represents the weights of the softmax layer, $b$ represents the bias of the softmax layer, $T$ is the number of target label categories, and $z^{(i)}$ is the output of the shared hidden layer.
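A softmax output layer of this kind can be sketched as follows, using the standard softmax definition. `W` is taken to have T rows (one per class) and `b` to have T entries; the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(z, W, b):
    """Return (class probabilities, arg-max label) for a shared representation z."""
    logits = [sum(w * zi for w, zi in zip(row, z)) + bt
              for row, bt in zip(W, b)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs, label
```

For binary detection (T = 2), the two classes would be benign program and bot program.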
10. The botnet detection method based on multi-modal stacked automatic encoders of claim 9, wherein, to improve the detection accuracy for bot programs, a classification error is also added to the loss function of the fine-tuning stage, and the classification error is minimized based on a cross-entropy loss function,
wherein $y^{(i)}$ is the true label of the i-th input sample and the cross entropy is computed against the corresponding predicted label;
the minimization objective of the final multi-modal stacked automatic encoder is a weighted sum of the reconstruction error $L_r$ and the classification error $L_c$:
$$L = \alpha L_r + \beta L_c + \lambda R \tag{5}$$
wherein $R$ is a regularization term, implemented by applying L2 regularization to the weights of all layers in the network, and $\alpha$, $\beta$, and $\lambda$ are weighting factors;
the weighting factors α and β for the reconstruction error and classification error are adaptively calculated using the softmax function:
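The combined objective of equation (5) can be sketched as below. Computing α and β as the softmax of the pair (L_r, L_c) is only one plausible reading of "adaptively calculated using the softmax function", since the formula is not preserved in the text; the cross-entropy form, the default value of `lam`, and all names are likewise assumptions.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-probability of the true class (y_true holds class indices)."""
    return -sum(math.log(max(p[t], eps))
                for t, p in zip(y_true, y_prob)) / len(y_true)

def l2_penalty(weight_groups):
    """Sum of squared weights over all layers (the regularization term R)."""
    return sum(w * w for group in weight_groups for w in group)

def adaptive_weights(l_r, l_c):
    """Assumed rule: (alpha, beta) = softmax([L_r, L_c])."""
    m = max(l_r, l_c)
    e_r, e_c = math.exp(l_r - m), math.exp(l_c - m)
    return e_r / (e_r + e_c), e_c / (e_r + e_c)

def total_loss(l_r, l_c, weight_groups, lam=1e-4):
    """L = alpha * L_r + beta * L_c + lambda * R, as in equation (5)."""
    alpha, beta = adaptive_weights(l_r, l_c)
    return alpha * l_r + beta * l_c + lam * l2_penalty(weight_groups)
```

Under this assumed rule, the larger of the two errors receives the larger weight, so the optimizer is steered toward whichever term is currently worse.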

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311596885.1A CN117640190A (en) 2023-11-28 2023-11-28 Botnet detection method based on multi-mode stacking automatic encoder


Publications (1)

Publication Number Publication Date
CN117640190A true CN117640190A (en) 2024-03-01

Family

ID=90022771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311596885.1A Pending CN117640190A (en) 2023-11-28 2023-11-28 Botnet detection method based on multi-mode stacking automatic encoder

Country Status (1)

Country Link
CN (1) CN117640190A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination