CN115086055B - Detection device and method for encrypting malicious traffic of android mobile device - Google Patents

Info

Publication number
CN115086055B
Authority
CN
China
Prior art keywords
session
node
traffic
encryption
sequence
Prior art date
Legal status
Active
Application number
CN202210732607.3A
Other languages
Chinese (zh)
Other versions
CN115086055A (en)
Inventor
牛伟纳
张小松
周杰
胡佳
任熙璇
周孝笑
陈瑞东
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210732607.3A
Publication of CN115086055A
Application granted
Publication of CN115086055B
Active legal status
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/14 Session management

Abstract

The invention discloses a device and a method for detecting encrypted malicious traffic on Android mobile devices, and belongs to the technical field of traffic detection. The method acquires the sessions to be detected; filters them based on validity and repeatability to obtain session A, and based on the characteristics of their encryption components to obtain session B; extracts the packet-length sequence and the inter-arrival-time sequence of session A and applies direction processing to obtain a time-domain feature vector, in which the maximum transmission unit (MTU) is added to the length of each downlink packet; passes the packet-length and inter-arrival-time sequences without direction processing to a frequency-domain feature processing module, which treats each sequence as a digital signal, converts it from the time domain to the frequency domain, converts the complex values to real values and applies a linear transformation to reduce dimensionality, yielding a frequency-domain feature vector; uses a rule base built from session B to quantify the features of the traffic to be detected, yielding a quantified feature vector; and finally feeds all feature vectors into a classifier module that detects and classifies the encrypted malicious traffic.

Description

Detection device and method for encrypting malicious traffic of android mobile device
Technical Field
The invention relates to the technical field of traffic detection, and provides a device and method for detecting encrypted malicious traffic on Android mobile devices.
Background
With the rapid development of network technology, human society has fully entered an intelligent era; in particular, the demand over the past two years for health codes, online meetings and the like has made smartphones even more widespread. The Android system is favored by developers at home and abroad because it is open source and free, but it has also become a primary target for attackers, and malicious attacks such as APT attacks, privacy theft, remote control and resource occupation emerge endlessly, constantly threatening the security of smartphone users. At the same time, mobile malware detection techniques have developed rapidly, including permission-based detection, decompilation, hash signature analysis, static analysis, dynamic analysis and honeypot techniques. Network traffic anomaly detection, as an effective dynamic detection method for malicious code, can discover unknown attacks to a certain extent and is receiving increasing attention. Encrypted HTTPS transmission has improved the security of cyberspace as a whole, but it has also made network traffic anomaly detection harder: traditional malicious traffic detection based on payload content analysis is largely ineffective, and more and more malware communicates over encrypted network protocols to evade detection by security vendors. Research on methods for detecting encrypted malicious traffic is therefore of great significance.
Malicious network traffic detection has become an effective means of discovering network attacks and maintaining network security, and it plays an important role in Android malware detection. The document "Dissecting android malware: characterization and evolution" notes that 93.0% of malware requires network communication to complete an attack after infecting a smartphone, for example to deliver an attacker's commands from a C&C server to the infected device. The document "Analysis of bayesian classification-based approaches for android malware detection" collected thousands of malicious samples, analyzed their permission requests, and found that more than 93% of the samples request network permissions. This confirms that network communication is a critical link in malware attacks and necessarily leaves traces, so detecting malware through network traffic anomaly analysis is an effective approach, and more and more researchers are focusing on this direction.
Rule-matching-based malicious traffic detection mainly relies on a rule base built manually from researchers' experience, usually based on text information such as ports and URLs, or on deep packet inspection. After studying the network communication of malware, Iland et al., in the document "Detecting android malware on network level", built an abnormal traffic model from the content transmitted while connecting to a C&C server, and thereby detected malicious control and privacy-leakage events. Jyoti et al., in the document "Android malware detection by network traffic analysis", analyze DNS queries and information in HTTP traffic to detect Android malware suspected of leaking sensitive information. Their method builds a rule base from four aspects (APK score, URL confidence, the data content of the traffic payload, and the communication protocol) and detects according to matching rules; for example, if personally identifiable information (PII) appears, the traffic is most likely classified as an information leak. Such methods are generally simple and have some effect on attack types that have already been observed, but their detection quality depends heavily on the defender's static feature library, they are easily bypassed by attackers, and they cannot detect unknown malicious traffic, so they are generally used for rapid traffic screening where accuracy requirements are low.
Machine-learning-based malicious traffic detection requires researchers to decide which features to extract during data processing, usually based on their own experience; in traffic detection these are mainly statistical and behavioral features. The document "Enhanced android malware detection and family classification" works on session-level network traffic: it extracts direction-preserving features such as session five-tuples, packet lengths and packet time intervals, compares three machine learning algorithms (random forest, recursive feature elimination and LightGBM), and retrains on the intersection of features whose importance exceeds a certain threshold. The document "Clustering android malware families by HTTP traffic" extracts the traffic characteristics of Android botnets with two clustering algorithms of different granularity: seven statistical features are extracted from packet-level traffic for coarse-grained clustering, similar HTTP requests are then grouped by single-linkage hierarchical clustering, and the common features of each sample cluster are extracted as that cluster's signature. The document "From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware" proposes a novel botnet detection technique based on KNN clustering: by studying the domain generation algorithms used by botnets, it looks for character similarity between domain names, uses that similarity to cluster the groups of machines that query those domains, and then assigns the resulting clusters to models through algorithmic matching, generating a new model if no match succeeds.
Deep-learning-based malicious traffic detection does not require researchers to have rich prior knowledge; it learns features automatically from raw data and is a machine learning technique based on representation learning. The document "An effective android ransomware detection through multi-factor feature filtration and recurrent neural network" proposes an effective deep-learning-based malware detection model built on an LSTM neural network: it applies eight different feature selection algorithms, such as chi-square and information gain, selects 19 important features such as packet length through a voting mechanism, and detects ransomware in the Android environment. The document "Datanet: Deep learning based encrypted network traffic classification in SDN home gateway" proposes an encrypted traffic classification method named DataNet based on a multilayer perceptron (MLP); because the hidden layers of an MLP have too many parameters to handle high-dimensional input, CNN convolution is also used for training, and the traffic is finally classified with softmax, achieving good results. The document "Detection of encrypted malicious network traffic using machine learning" proposes a method that detects malicious traffic without relying on payload analysis: it uses a CNN for feature extraction together with an SVM with a radial basis function kernel, records the size and direction of TLS records, and finally detects the encrypted traffic.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a method for detecting encrypted malicious traffic on Android mobile devices. The method abstracts network traffic into digital signals, extracts temporal features in the time domain, applies digital-signal noise-processing techniques to perform frequency-domain conversion and convert complex numbers into real numbers, and thereby achieves dual-domain feature extraction from the digital signal. It also provides a method for quantifying malicious traffic features using association rules, and pre-classifies the rule base by studying how Android development affects traffic features. Finally, an LSTM combines the features of these three dimensions to detect the traffic.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A device for detecting encrypted malicious traffic on Android mobile devices comprises the following parts:
Traffic preprocessing module: receives the encrypted traffic data, divides it at session granularity and filters out useless protocols to obtain the sessions to be detected;
on the one hand, it processes the sessions to be detected, builds a benign traffic feature library, and filters out useless network sessions based on validity and repeatability to obtain session A;
on the other hand, it obtains the encryption-component features of each session to be detected and filters out useless sessions based on these features to obtain session B, where the encryption-component features comprise the encryption algorithm list, the protocol version and the certificate authority;
Time-domain feature processing module: first extracts the packet-length sequence and the inter-arrival-time sequence of the packets in session A, which carry no direction information; then introduces the maximum transmission unit (MTU) to distinguish uplink from downlink, and extracts the packet-length sequences and timing feature vectors of the uplink, downlink and bidirectional packets of session A, collectively called the time-domain feature vector;
Frequency-domain feature processing module: takes the undirected packet-length and inter-arrival-time sequences of the packets in session A, abstracts them into digital signals, converts them from the time domain to the frequency domain with the discrete Fourier transform, converts the complex values to real values, and reduces the feature dimension with a linear transform to obtain the frequency-domain feature vector;
Association feature processing module: processes the fields of the key-negotiation phase in session B, including the supported encryption protocols, the certificate and the cipher suites; extracts frequent items by building an FP-tree; builds rule bases for benign and malicious traffic with the family as the rule consequent; divides each rule base into 5 categories according to the characteristics of Android network interaction; and takes the number of matches of each rule as this part's feature vector;
Classifier module: receives the feature vectors from the time-domain feature processing module, the frequency-domain feature processing module and the association rule module, and performs multi-class classification on them with an LSTM neural network.
The encrypted traffic under detection and the test cases pass through session cutting and protocol filtering to obtain the sessions to be detected. On the one hand, the sessions to be detected are processed, a benign traffic feature library is built, and useless network sessions are filtered out based on validity and repeatability to obtain session A; on the other hand, the encryption-component features of each session to be detected are obtained, and useless sessions are filtered out based on these features to obtain session B. The time-series feature processing module extracts the packet-length sequence and the inter-arrival-time sequence of network session A and applies direction processing to obtain the time-domain feature vector, adding the maximum transmission unit (MTU) to the length of each downlink packet. The packet-length and inter-arrival-time sequences without direction processing are then passed to the frequency-domain feature processing module, which treats each sequence as a digital signal, converts it from the time domain to the frequency domain, converts the complex values to real values and applies a linear transformation to reduce dimensionality, yielding the frequency-domain feature vector. The cleaned session B is passed separately to the association rule processing module, which builds a dedicated association rule base and uses it to quantify the features of the traffic to be detected, yielding the quantified feature vector. Finally, all feature vectors are fed into the classifier module to detect and classify the encrypted malicious traffic.
In the above technical solution, the traffic preprocessing module specifically performs the following steps:
five-tuple-based session cutting: SplitCap is used to cut the encrypted traffic data. Because a connection may be interrupted during communication by network fluctuation, server anomalies and the like, valid sessions are determined from the TCP flag bits: for example, a session is judged to have ended when a packet carrying the FIN or RST flag is detected in the TCP stream, and the SYN flag marks the start of a session;
useless protocol filtering: sessions that do not establish a connection through a complete TCP three-way handshake, and sessions in which fewer than three packets carry a payload, are filtered out;
meanwhile, traffic outside the scope of encrypted traffic detection, such as DNS queries and HTTP, is removed;
in addition, a benign traffic feature library is built from the DNS domain-name queries and IP-address accesses of benign traffic, and the initial encrypted traffic is matched against this library; if matching succeeds more than twice, the domain name or IP is considered to carry communication traffic of a benign third-party development library;
on the other hand, the encryption-component features of each session are obtained, namely the encryption algorithm list, the protocol version and the certificate issuer. The encryption-component features associated with each session are one-hot encoded, where 1 indicates that the session involves the component and 0 that it does not, and each element is named by its field name and extracted value; together these form a transaction data set. A minimum support is then set, where the support of an itemset is the number of samples containing that encryption-component feature itemset (i.e., the number of sessions containing a given encryption-component feature, such as a particular certificate issuer) divided by the total number of transactions in the data set; finally, items whose support is below the minimum support are filtered out to obtain the sample transaction data set.
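The one-hot encoding and minimum-support filtering described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`version`, `ciphers`, `issuer`), the `field=value` item naming and all helper names are hypothetical, and each session's one-hot row is held sparsely as the set of items whose bit would be 1.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def to_transactions(sessions: Sequence[Dict[str, object]]) -> List[Set[str]]:
    """Turn each session's encryption components into a set of 'field=value' items."""
    transactions = []
    for session in sessions:
        items: Set[str] = set()
        for field, value in session.items():
            # A list-valued field (e.g. the offered algorithm list) yields one item per entry.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                items.add(f"{field}={v}")
        transactions.append(items)
    return transactions


def filter_by_min_support(transactions: List[Set[str]], min_support: float) -> List[Set[str]]:
    """Drop items whose support (share of transactions containing them) is below min_support."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    keep = {item for item, c in counts.items() if c / n >= min_support}
    return [t & keep for t in transactions]


def one_hot(transactions: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Dense one-hot matrix: 1 if the session involves the item, else 0."""
    vocab = sorted(set().union(*transactions))
    return vocab, [[1 if item in t else 0 for item in vocab] for t in transactions]


sessions = [
    {"version": "TLS1.2", "ciphers": ["c1", "c2"], "issuer": "CA-X"},
    {"version": "TLS1.2", "ciphers": ["c1"], "issuer": "CA-Y"},
    {"version": "TLS1.3", "ciphers": ["c1", "c3"], "issuer": "CA-X"},
]
filtered = filter_by_min_support(to_transactions(sessions), min_support=0.5)
vocab, matrix = one_hot(filtered)
print(vocab)   # ['ciphers=c1', 'issuer=CA-X', 'version=TLS1.2']
print(matrix)  # [[1, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Filtering at the item level (rather than dropping whole sessions) keeps every session in the data set while removing rare components such as a one-off certificate issuer.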
In the above technical solution, the time domain feature processing module specifically includes the following steps:
after the filtered session A is obtained, the length and inter-arrival time of each packet are extracted as numeric sequences, and only the first N packets are kept; sessions with fewer than N packets are zero-padded, giving the undirected packet-length and inter-arrival-time sequences;
the session is divided into uplink, downlink and bidirectional traffic, where uplink is from client to server and downlink is the reverse;
the maximum transmission unit (MTU) is used to encode the direction of communication: an uplink packet keeps its actual length, while a downlink packet's length is recorded as its actual length plus the MTU value. For each packet of session A, the packet length, the time difference from the previous packet and the time difference from the previous packet in the same direction are computed, yielding the directional packet-length sequence and timing vectors, which are collectively called the time-domain feature vector.
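The time-domain steps above can be sketched in Python as follows, assuming each packet is already parsed into a `(timestamp, length, is_uplink)` tuple. The function name, the dictionary keys, the default `n = 8` and the default MTU of 1500 bytes are illustrative assumptions; the text does not fix N or the MTU value.

```python
from typing import Dict, List, Tuple

# (timestamp in seconds, length in bytes, True if client -> server)
Packet = Tuple[float, int, bool]


def time_domain_features(packets: List[Packet], n: int = 8, mtu: int = 1500) -> Dict[str, list]:
    """Undirected and directional sequences for the first n packets, zero-padded to n."""
    pkts = packets[:n]
    feats: Dict[str, list] = {k: [] for k in (
        "lengths", "iats", "dir_lengths", "same_dir_iats", "up_lengths", "down_lengths")}
    prev_ts = None
    last_ts = {}  # direction -> timestamp of the previous packet in that direction
    for ts, length, up in pkts:
        feats["lengths"].append(length)
        feats["iats"].append(0.0 if prev_ts is None else ts - prev_ts)
        # Downlink lengths are shifted by the MTU, so one sequence encodes size and direction.
        feats["dir_lengths"].append(length if up else length + mtu)
        feats["same_dir_iats"].append(ts - last_ts[up] if up in last_ts else 0.0)
        feats["up_lengths" if up else "down_lengths"].append(length)
        prev_ts = ts
        last_ts[up] = ts
    for seq in feats.values():
        seq.extend([0] * (n - len(seq)))  # zero-pad short sessions
    return feats


packets = [(0.0, 100, True), (0.5, 1400, False), (0.75, 60, True), (1.0, 1500, False)]
f = time_domain_features(packets, n=6)
print(f["dir_lengths"])    # [100, 2900, 60, 3000, 0, 0]
print(f["same_dir_iats"])  # [0.0, 0.0, 0.75, 0.5, 0, 0]
```

Because no real packet is longer than the MTU, any value above the MTU in `dir_lengths` can only come from a downlink packet, which is what makes the shift reversible.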
In the above technical solution, the specific implementation steps of the frequency domain feature processing module are as follows:
the undirected packet-length and inter-arrival-time sequences of session A are obtained and abstracted into digital signals; the discrete Fourier transform is computed with the fast Fourier transform (FFT) algorithm to move to the frequency domain; the complex values are converted to real values by taking the modulus of each frequency component to obtain the frequency-domain features; and finally a logarithmic transformation is applied to reduce dimensionality and obtain the frequency-domain feature vector.
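The frequency-domain step can be sketched as follows. To keep the block dependency-free it writes out a direct discrete Fourier transform, which returns the same values an FFT would (in practice `numpy.fft.rfft` would be the usual choice). Keeping only the first n/2 + 1 bins and the optional `keep` cut-off are assumptions about where the dimension reduction happens; the modulus and the log step follow the text, with `log1p` used here (an assumption) so that zero-magnitude bins stay finite.

```python
import cmath
import math
from typing import List, Optional, Sequence


def dft(signal: Sequence[float]) -> List[complex]:
    """Direct O(n^2) discrete Fourier transform; an FFT computes the same values faster."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frequency_features(signal: Sequence[float], keep: Optional[int] = None) -> List[float]:
    spectrum = dft(signal)
    # For a real-valued input the spectrum is conjugate-symmetric, so bins above n/2 are redundant.
    half = spectrum[: len(signal) // 2 + 1]
    magnitudes = [abs(c) for c in half]           # complex -> real via the modulus
    logged = [math.log1p(m) for m in magnitudes]  # log scaling; log1p avoids log(0)
    return logged[:keep] if keep is not None else logged


# An alternating packet-length pattern shows up as energy at DC and at the Nyquist bin.
print([round(x, 4) for x in frequency_features([0, 1, 0, 1])])  # [1.0986, 0.0, 1.0986]
```

The same function applies unchanged to the inter-arrival-time sequence; a periodic heartbeat, for example, concentrates its energy in a few bins regardless of where in the session it starts.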
In the above technical solution, the specific implementation steps of the associated feature processing module are as follows:
after the sample transaction data set is obtained, an FP-tree is constructed. A tree class is first defined as the FP-tree node, with the following attributes: name stores the element name; count stores the node frequency, which is incremented by one each time a path passes through the node; nodeLink links to the next node with the same element name and is used to build the header-table linked list; parent links to the current node's parent and supports upward traversal during frequent-item mining; child stores the child nodes as dictionary-typed data;
an empty treeNode is created as the root node, the elements of each transaction in the sample transaction data set are traversed, and nodes are added recursively, with updates in two cases:
1) If the current element is already a child of the current root node, this path has been added before, and only the child's frequency count needs to be updated. Here the "root node" is not the empty root created at the beginning but the root of the current path during traversal, which is updated to the child just visited after each step;
2) If the current element is not a child of the current root node, it starts a new path: a new branch is created, the node is added to the root node's child variable with its frequency set to one, and the header-table linked list is updated by appending the node to the tail of the chain;
frequent itemsets are then mined as follows: for each element, its conditional pattern base is found on the FP-tree, i.e., the set of prefix paths ending with that element, located through the header-table linked list and the nodes' parent variables; the elements in these paths whose support is greater than or equal to the set minimum support are kept, and these are the frequent items for that element;
finally, association rules are generated and quantified: frequent items serve as rule antecedents and the malicious traffic category (family) given by the test data set serves as the rule consequent; the confidence of a rule is the number of samples containing both the antecedent and the consequent divided by the number of samples containing the antecedent, and all frequent items that reach the minimum confidence are taken as association rules of malicious traffic, forming a malicious traffic cipher-suite rule base. Each rule base is divided into 5 categories according to the characteristics of Android network interaction, and the number of matches of each rule is taken as this part's feature vector.
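The FP-tree construction, the single-level frequent-item mining and the rule quantification above can be sketched together as follows. This is an illustrative sketch, not the patent's code: node insertion is written iteratively rather than recursively, attribute names are Pythonized (`node_link`, `children`), and only one level of mining is shown (full FP-growth would recurse on conditional trees). The `category_of` function is hypothetical, because the text does not spell out how rules are grouped into the 5 Android-interaction categories; per-category match counts are one reading of "the number of matches of each rule".

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Set, Tuple


class TreeNode:
    """FP-tree node with the attributes described above."""

    def __init__(self, name: Optional[str], count: int, parent: Optional["TreeNode"]):
        self.name = name                             # element name (None for the empty root)
        self.count = count                           # number of transactions through this node
        self.node_link: Optional["TreeNode"] = None  # next node with the same name (header chain)
        self.parent = parent                         # walked upward when mining
        self.children: Dict[str, "TreeNode"] = {}    # child nodes keyed by element name


def build_fp_tree(transactions: Sequence[Set[str]], min_count: int):
    counts = Counter(i for t in transactions for i in t)
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root, heads, tails = TreeNode(None, 0, None), {}, {}
    for t in transactions:
        # Frequent items only, most frequent first, ties broken by name.
        ordered = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is not None:        # case 1: existing path, increment its count
                child.count += 1
            else:                        # case 2: new branch, append to the header chain
                child = TreeNode(item, 1, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    heads[item] = child
                tails[item] = child
            node = child                 # the child becomes the root of the remaining path
    return root, heads


def conditional_pattern_base(item: str, heads: Dict[str, TreeNode]) -> List[Tuple[List[str], int]]:
    """Prefix paths ending at `item`, each weighted by its end node's count."""
    paths, node = [], heads.get(item)
    while node is not None:
        path, p = [], node.parent
        while p is not None and p.name is not None:
            path.append(p.name)
            p = p.parent
        if path:
            paths.append((path[::-1], node.count))
        node = node.node_link
    return paths


def frequent_items_for(item: str, heads: Dict[str, TreeNode], min_count: int) -> Dict[str, int]:
    support: Counter = Counter()
    for path, count in conditional_pattern_base(item, heads):
        for other in path:
            support[other] += count
    return {o: c for o, c in support.items() if c >= min_count}


def generate_rules(itemsets: Sequence[FrozenSet[str]], labeled: Sequence[Tuple[Set[str], str]],
                   min_conf: float) -> List[Tuple[FrozenSet[str], str, float]]:
    """confidence = #(antecedent with label) / #(antecedent)."""
    labels = sorted({lab for _, lab in labeled})
    rules = []
    for items in itemsets:
        covered = [lab for t, lab in labeled if items <= t]
        if not covered:
            continue
        for lab in labels:
            conf = covered.count(lab) / len(covered)
            if conf >= min_conf:
                rules.append((frozenset(items), lab, conf))
    return rules


def rule_match_vector(session_items: Set[str], rules, category_of: Callable[[FrozenSet[str]], int],
                      n_categories: int = 5) -> List[int]:
    """Per-category count of rules whose antecedent the session contains."""
    vec = [0] * n_categories
    for antecedent, _label, _conf in rules:
        if antecedent <= session_items:
            vec[category_of(antecedent)] += 1
    return vec
```

For example, with transactions `[{"a","b"}, {"a","b","c"}, {"a","c"}, {"a","d"}]` and a minimum count of 2, item `d` never enters the tree and the conditional pattern base of `c` is the two prefix paths `a -> b` and `a`, so `a` is the only frequent item for `c`.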
Compared with the prior art, the invention has the beneficial effects that:
1. to address the difficulty of extracting sequence features from malicious traffic, network traffic is abstracted into a digital signal, temporal features are extracted in the time domain, frequency-domain conversion is performed using digital-signal noise-processing methods, and complex numbers are converted into real numbers, achieving dual-domain feature extraction from the digital signal;
2. building on the low false-alarm rate of traditional signature-based detection, the invention provides a method for quantifying malicious traffic features with association rules, and pre-classifies the rule base by studying how Android development affects traffic features;
3. the invention uses an LSTM to combine the features of the three dimensions for traffic detection, adds a time-step-based attention mechanism that weights the features at each time step to adapt to changes in encrypted traffic, and classifies the malware in addition to detecting it;
4. the detection method and the designed system are deployed as a network bypass and do not affect normal use by the user.
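The time-step attention mentioned in item 3 can be illustrated with a small pure-Python sketch. The text does not specify the attention form, so this assumes the simplest variant: a dot-product score between each LSTM hidden state and a scoring vector `w` (which would be learned in a real model), a softmax over time steps, and a weighted sum as the context vector. The LSTM itself is not shown, and additive or other scoring functions are equally plausible.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def time_step_attention(hidden: Sequence[Sequence[float]],
                        w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """hidden: T x d LSTM outputs; w: d-dim scoring vector (learned in a real model).
    Returns the per-step weights and the weighted sum of hidden states."""
    scores = [sum(h_j * w_j for h_j, w_j in zip(h, w)) for h in hidden]
    alphas = softmax(scores)
    d = len(hidden[0])
    context = [sum(a * h[j] for a, h in zip(alphas, hidden)) for j in range(d)]
    return alphas, context


alphas, context = time_step_attention([[1.0, 0.0], [0.0, 1.0]], [math.log(3), 0.0])
print([round(a, 3) for a in alphas])  # [0.75, 0.25]
```

The weights sum to one, so time steps with higher scores (for example, the handshake packets of a session) contribute more to the vector passed to the final classification layer.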
Drawings
FIG. 1 is a general architecture diagram of the present invention;
FIG. 2 is a network deployment diagram of the detection system of the present invention;
FIG. 3 is a flow chart of a data preprocessing module;
FIG. 4 is an exemplary diagram of a temporal feature extraction module;
FIG. 5 is an exemplary diagram of an LSTM model implementation;
FIG. 6 is a diagram of the full session determination algorithm.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail. While the invention will be described and illustrated in conjunction with certain specific embodiments, it will be understood that it is not intended to limit the invention to these embodiments alone. On the contrary, the invention is intended to cover modifications and equivalent arrangements included within the scope of the appended claims.
In addition, numerous specific details are set forth in the following description in order to provide a better illustration of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details.
1. Overall system design
The system combines time-domain and frequency-domain features to detect Android malicious encrypted traffic in both domains, and the detection system shown in FIG. 1 is designed to integrate these dual-domain features. The system mainly targets malicious encrypted traffic: it cuts network traffic into session-level units; obtains sequence features and cipher-suite features through cleaning, filtering, integration and other operations; abstracts the sequence features into digital signals and extracts their frequency-domain features; abstracts the cipher suites into feature vectors; and performs detection with the three parts combined.
2. System deployment and experimental environment
The system detects encrypted network traffic. Network communication is monitored through a bypass: communication data are mirrored to a detection machine, so users' daily use is not affected, making this a parallel detection scheme. As shown in FIG. 2, the system can monitor a single mobile phone through a bypass or monitor the network interface of a router, which makes it highly extensible. The system is built on a server running the Ubuntu operating system, with detailed hardware information shown in Table 1. The experiments use Python 3.8.4 and Ubuntu 18.04, and the required tools are Wireshark 3.4.4, tshark 3.0.3, SplitCap and tcpdump 4.5.1.
3. Design and implementation of the system
The detection model is divided into four modules: a traffic data processing module, a frequency-domain feature extraction module, an association analysis module and a classification detection module. The data processing module converts network traffic into flows and sessions and extracts their sequence features and encryption features; the frequency-domain feature extraction module converts the time sequences into frequency-domain features used as classifier input; the association analysis module uses data mining to encode the cipher suites as features; and the classification detection module trains on the combined time-domain and frequency-domain features, adding a suitable attention mechanism according to the characteristics of encrypted traffic. Each module is described in detail below.
Data preprocessing module
The data preprocessing module is almost all the parts involved in flow analysis, and generally comprises operations of flow cleaning, format conversion and the like. The purpose of these works is to make the data purer, clear the interference items in part of the data in advance, normalize the data, and facilitate the subsequent feature extraction, and the specific process flow is shown in fig. 3.
a. Quintuple-based session cutting: the subject of the study herein is a session, and thus network traffic needs to be cut. A session is a directional flow, both flows having the same five-tuple but different directions. Compared with the stream, the session can reflect the interaction information of the client and the server, and meanwhile, the respective working information of the client and the server is reserved. The split Cap is selected to cut network traffic, an original Pcap file is input, all the sessions contained in the file are output, and each session is stored in a Pcap format.
b. Active session determination: HTTPS encrypted transmission runs over TCP, a connection-oriented protocol, and the connection may be interrupted during communication because of network fluctuations, server anomalies and similar causes. Moreover, if sessions are split by five-tuple alone, a user who disconnects and later reconnects with the same five-tuple would be merged into one session, although this should be treated as two sessions, so connection interruption must be determined from the TCP flag bits. When a packet carrying the FIN or RST flag is detected in the TCP stream, the session is judged to have ended, and the SYN flag marks the beginning of a session. The algorithm design is shown in FIG. 6.
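The flag-based rule can be sketched as follows. This is a minimal illustration, not the algorithm of FIG. 6: it assumes each packet of one five-tuple stream is given as a hypothetical `TcpPacket` whose flags are a string of letters (S, A, F, R, P). A FIN or RST marks the current session as closed (teardown packets that follow still belong to it), and a pure SYN after a close starts a new session.

```python
from typing import List, NamedTuple


class TcpPacket(NamedTuple):
    """Hypothetical packet record; flags is a string such as "S", "SA", "FA", "R"."""
    ts: float
    flags: str
    payload_len: int


def split_by_flags(stream: List[TcpPacket]) -> List[List[TcpPacket]]:
    """Split one five-tuple stream into active sessions using SYN/FIN/RST."""
    sessions: List[List[TcpPacket]] = []
    current: List[TcpPacket] = []
    closed = False
    for pkt in stream:
        is_syn = "S" in pkt.flags and "A" not in pkt.flags  # pure SYN, not SYN-ACK
        # A pure SYN after the previous session was closed opens a new session.
        if is_syn and current and closed:
            sessions.append(current)
            current, closed = [], False
        current.append(pkt)
        if "F" in pkt.flags or "R" in pkt.flags:
            closed = True  # end of session; remaining teardown packets stay here
    if current:
        sessions.append(current)
    return sessions
```

Reconnections that happen without any FIN or RST are not split by this simplified sketch.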
c. Traffic filtering: this step determines whether a session carries valid information. Networks typically contain small, incomplete sessions, such as sessions without a TCP connection establishment phase. Unlike in unencrypted traffic detection, such sessions carry little useful information, so at this stage sessions without a complete TCP three-way handshake and sessions with fewer than three payload-carrying packets are filtered out. In addition, some traffic, such as DNS queries and plain HTTP, lies outside the scope of encrypted traffic detection and can be removed. This step also builds a repeat library (the benign traffic feature library) from the DNS domain name queries and IP address accesses observed in benign traffic, and matches the initial malicious traffic against it. If a domain name or IP matches and the match count is greater than two, that domain name or IP is considered to carry traffic from a benign third-party development library. After a domain name match succeeds, the DNS packets must be parsed to find the correct IP address and then to determine whether that IP address carries TLS-encrypted communication, rather than simply discarding the DNS query packets.
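A minimal sketch of the validity filter and the benign-library check is shown below. The packet record, the exact flag strings used to recognize the handshake, and the `benign_counts` dictionary (mapping a domain or IP to how often it appears in benign traffic) are all our assumptions; the threshold of two follows the text.

```python
from typing import Dict, List, NamedTuple, Optional


class Pkt(NamedTuple):
    """Hypothetical packet record with tcpdump-style flag letters."""
    flags: str        # e.g. "S", "SA", "A", "PA"
    payload_len: int


def has_full_handshake(pkts: List[Pkt]) -> bool:
    """True if SYN, SYN-ACK and ACK appear in this order."""
    expected = ["S", "SA", "A"]
    stage = 0
    for p in pkts:
        if stage < 3 and p.flags == expected[stage]:
            stage += 1
    return stage == 3


def is_valid_session(pkts: List[Pkt], min_payload_pkts: int = 3) -> bool:
    """Keep sessions with a full handshake and at least three payload packets."""
    payload_pkts = sum(1 for p in pkts if p.payload_len > 0)
    return has_full_handshake(pkts) and payload_pkts >= min_payload_pkts


def is_benign_library_traffic(domain: Optional[str], ip: str,
                              benign_counts: Dict[str, int],
                              threshold: int = 2) -> bool:
    """Hypothetical rule: domain or IP matched more than `threshold` times."""
    hits = benign_counts.get(domain, 0) if domain else 0
    hits = max(hits, benign_counts.get(ip, 0))
    return hits > threshold
```

In practice the IP would come from parsing the DNS answer for the matched domain, as the text describes; here it is simply passed in.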
On the other hand, the encryption component features of each session are obtained: the encryption algorithm (cipher suite) list, the protocol version and the certificate issuer. These are one-hot encoded, with 1 indicating that the session contains the encryption component and 0 indicating that it does not; each element name consists of the field name and the extracted value. The result is a transaction data set. A minimum support is then set, where the support of an itemset is the ratio of the number of samples containing that encryption component itemset to the total number of transactions, and entries whose support is below the minimum support are filtered out to obtain the sample transaction data set.
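The encoding and support filtering can be sketched as below, assuming the extracted fields for each session arrive as a dictionary of lists. The `field=value` item naming and the choice to filter at the level of single items are our assumptions, not the patent's exact procedure.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def to_items(session_fields: Dict[str, List[str]]) -> Set[str]:
    """Turn extracted fields into 'field=value' items (the one-hot columns set to 1)."""
    return {f"{field}={value}" for field, values in session_fields.items() for value in values}


def one_hot(transactions: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build the 0/1 matrix: 1 if the session contains the component, else 0."""
    columns = sorted(set().union(*transactions))
    matrix = [[1 if c in t else 0 for c in columns] for t in transactions]
    return columns, matrix


def filter_by_support(transactions: List[Set[str]], min_support: float) -> List[Set[str]]:
    """Drop items whose support (fraction of transactions containing them) is too low."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, c in counts.items() if c / n >= min_support}
    filtered = [t & frequent for t in transactions]
    return [t for t in filtered if t]  # discard transactions left empty
```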
Time domain feature extraction module
First, the packet length and timestamp of the first N packets of each session are extracted, and for each packet the length, the time difference from the previous packet, and the time difference from the previous packet in the same direction are computed to obtain the time domain features. Because the system aims to preserve as much as possible the directionality of the time series and the data exchange behavior of client and server, the feature vector of a whole session is divided into four parts: a bidirectional packet length sequence, a unidirectional packet length sequence, a bidirectional time difference sequence and a unidirectional time difference sequence. In each unidirectional sequence, the first half holds the forward sequence and the second half the reverse sequence, as shown in FIG. 4. The MTU in the system is set to 1514.
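The four-part layout can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a hypothetical `Pkt` record with an `upstream` flag (client to server), encodes downlink lengths as length plus MTU in the bidirectional length sequence (as described in the claims), and splits each unidirectional sequence into a forward half and a reverse half of equal size.

```python
from typing import List, NamedTuple

MTU = 1514  # value used in the text


class Pkt(NamedTuple):
    ts: float
    length: int
    upstream: bool  # True = client -> server (hypothetical field)


def pad(seq, n: int) -> list:
    """Zero-pad or truncate a sequence to exactly n entries."""
    return (list(seq) + [0] * n)[:n]


def time_domain_features(pkts: List[Pkt], n: int = 8) -> List[float]:
    """Return [bi_len | uni_len | bi_dt | uni_dt], each of length n."""
    pkts = pkts[:n]  # keep only the first N packets
    bi_len, bi_dt = [], []
    fwd_len, bwd_len, fwd_dt, bwd_dt = [], [], [], []
    last_ts = None
    last_dir_ts = {True: None, False: None}
    for p in pkts:
        # Direction is encoded in the length: downlink packets are shifted by the MTU.
        bi_len.append(p.length if p.upstream else p.length + MTU)
        bi_dt.append(0.0 if last_ts is None else p.ts - last_ts)
        prev = last_dir_ts[p.upstream]
        dt_same = 0.0 if prev is None else p.ts - prev  # gap within the same direction
        if p.upstream:
            fwd_len.append(p.length)
            fwd_dt.append(dt_same)
        else:
            bwd_len.append(p.length)
            bwd_dt.append(dt_same)
        last_ts = p.ts
        last_dir_ts[p.upstream] = p.ts
    half = n // 2
    uni_len = pad(fwd_len, half) + pad(bwd_len, n - half)  # forward half, then reverse half
    uni_dt = pad(fwd_dt, half) + pad(bwd_dt, n - half)
    return pad(bi_len, n) + uni_len + pad(bi_dt, n) + uni_dt
```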
Frequency domain feature extraction module
The system sets the number of samples to sampling_N. Because network traffic is abstracted into a digital signal rather than actually sampled at a fixed time interval, the value of sampling_N affects how well frequency domain features are extracted. The length of each time series produced by the previous module is normalized to sampling_N: if the series is shorter than sampling_N, it is zero-padded at the end; if it is longer, the excess is discarded. The result is exactly sampling_N samples.
The reprocessed time series is converted into frequency domain features by the discrete Fourier transform (DFT), implemented here with the fast Fourier transform (FFT) algorithm. The FFT is an efficient algorithm for computing the DFT that reduces the time complexity from O(n^2) to O(n log n). The FFT output is complex-valued, and the modulus of each complex value is the amplitude spectrum, calculated as shown in Equation 1, where x_t is the t-th sample and N = sampling_N:

$$A_k = |X_k| = \sqrt{\operatorname{Re}(X_k)^2 + \operatorname{Im}(X_k)^2}, \qquad X_k = \sum_{t=0}^{N-1} x_t \, e^{-2\pi i k t / N} \tag{1}$$
The amplitude spectrum is extracted as the frequency domain feature of the network traffic, and a logarithmic transformation is applied to it to compress its dynamic range and prevent floating-point overflow, yielding the final frequency domain features.
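The pad/truncate, FFT, amplitude and log steps can be sketched in pure Python as below. Two choices here are our assumptions: a textbook recursive radix-2 Cooley-Tukey FFT (so `sampling_n` must be a power of two in this sketch), and `log1p` (that is, log(1 + A)) as the logarithmic transform so that zero amplitudes stay finite. The text does not specify either detail.

```python
import cmath
import math
from typing import List, Sequence


def fit_length(seq: Sequence[float], sampling_n: int) -> List[float]:
    """Zero-pad or truncate the time series to exactly sampling_n samples."""
    return (list(seq) + [0.0] * sampling_n)[:sampling_n]


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frequency_features(seq: Sequence[float], sampling_n: int = 8) -> List[float]:
    """Log-amplitude spectrum: log(1 + |X_k|) for k = 0..sampling_n-1."""
    if sampling_n <= 0 or sampling_n & (sampling_n - 1):
        raise ValueError("this sketch needs sampling_n to be a power of two")
    spectrum = fft([complex(v) for v in fit_length(seq, sampling_n)])
    return [math.log1p(abs(c)) for c in spectrum]  # abs() is the modulus in Equation 1
```

For a constant series of four ones, the spectrum is [4, 0, 0, 0], so only the first feature is non-zero (log 5).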
Correlation analysis module
Once the sample transaction data set is obtained, an FP-tree is constructed. A tree class, treeNode, is defined as the FP-tree node, with the following attributes: the variable name stores the element name; the variable count stores the node frequency, which is incremented by one each time a path passes through the node; the variable nodeLink links nodes that carry the same element and is used to build the head pointer linked list; the variable parent links to the current parent node and supports upward traversal during frequent itemset mining; and the variable child stores the child nodes as dictionary-type data.
An empty treeNode is created as the root node. The elements of each transaction in the sample transaction data set are then traversed and added to the tree recursively, with two cases for the update:
1) If the current element is already a child of the root node, the path has been added before, so only that child node's frequency count needs to be incremented. Here the root node does not mean the empty root created at the start but the root of the current path during traversal: after each step, it is updated to the child node just visited;
2) If the current element is not a child of the root node, it starts a new path, so a new branch is created: the node is added to the root node's child variable with its frequency set to one, and the head pointer linked list is updated by appending the node to the tail of the chain;
The frequent itemsets are then mined as follows: for each element, its conditional pattern base on the FP-tree is found, that is, the set of prefix paths ending at that element, by following the head pointer linked list and the parent variables of the nodes. Elements of the conditional pattern base whose support is greater than or equal to the set minimum support are retained; these are the frequent items for that element.
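The construction and prefix-path search can be sketched as below. This is a minimal illustration under our own assumptions: attribute names are adapted to Python style (`node_link`, `children`), insertion is written iteratively rather than recursively (equivalent for this purpose), the minimum support is expressed as an absolute count (`min_support * len(transactions)`), and ties in item frequency are broken by name.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


class TreeNode:
    """FP-tree node mirroring the attributes described above (names adapted)."""

    def __init__(self, name: Optional[str], count: int, parent: Optional["TreeNode"]):
        self.name = name                                # element name (None for the root)
        self.count = count                              # node frequency
        self.node_link: Optional["TreeNode"] = None     # next node with the same name
        self.parent = parent                            # used for upward traversal
        self.children: Dict[str, "TreeNode"] = {}       # child nodes keyed by name


def build_fp_tree(transactions: List[Set[str]], min_count: int):
    """Build the FP-tree and header table (item -> [chain head, chain tail])."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = TreeNode(None, 0, None)
    header: Dict[str, List[TreeNode]] = {}
    for t in transactions:
        # Frequent items only, most frequent first.
        items = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is not None:
                child.count += 1                      # case 1: existing path
            else:
                child = TreeNode(item, 1, node)       # case 2: new branch
                node.children[item] = child
                if item in header:
                    header[item][1].node_link = child  # append to the chain tail
                    header[item][1] = child
                else:
                    header[item] = [child, child]
            node = child  # the visited child becomes the "root" for the next item
    return root, header


def conditional_pattern_base(item: str, header) -> List[Tuple[List[str], int]]:
    """Prefix paths ending at `item`, found via node links and parent pointers."""
    base = []
    node = header[item][0] if item in header else None
    while node is not None:
        path, p = [], node.parent
        while p is not None and p.name is not None:
            path.append(p.name)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.node_link
    return base


def frequent_items_for(item: str, header, min_count: int) -> Dict[str, int]:
    """Keep elements of the conditional pattern base that meet the minimum support."""
    support: Counter = Counter()
    for path, cnt in conditional_pattern_base(item, header):
        for element in path:
            support[element] += cnt
    return {e: c for e, c in support.items() if c >= min_count}
```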
Finally, association rules are generated and quantified. The frequent itemsets are used as rule antecedents and the malicious traffic category given by the test data set as the rule consequent. The confidence of a rule is the support of the antecedent together with the consequent divided by the support of the antecedent, and every frequent itemset whose confidence reaches the minimum confidence becomes an association rule for malicious traffic, forming the malicious traffic cipher suite rule base. The rules in this rule base are divided into 5 types according to the characteristics of Android network interaction, and the match counts of the rules form this part's feature vector, the rule quantization feature vector.
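Rule generation and quantization can be sketched as follows. The text does not say how rules are assigned to the 5 types, so `rule_type` below is a hypothetical mapping supplied by the caller, and counting matched rules per type is our reading of "match counts". Confidence uses the standard definition support(X together with label) / support(X).

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent itemset, family label)


def generate_rules(transactions: List[Set[str]], labels: List[str],
                   itemsets: List[FrozenSet[str]], min_conf: float) -> Dict[Rule, float]:
    """Keep itemset -> label rules whose confidence reaches min_conf."""
    rules: Dict[Rule, float] = {}
    for itemset in itemsets:
        matched = [lab for t, lab in zip(transactions, labels) if itemset <= t]
        if not matched:
            continue
        for label, hits in Counter(matched).items():
            conf = hits / len(matched)  # support(X and label) / support(X)
            if conf >= min_conf:
                rules[(itemset, label)] = conf
    return rules


def rule_features(session_items: Set[str], rules: Dict[Rule, float],
                  rule_type: Dict[Rule, int], n_types: int = 5) -> List[int]:
    """Count how many rules of each (hypothetical) type the session matches."""
    vec = [0] * n_types
    for rule in rules:
        if rule[0] <= session_items:  # antecedent fully contained in the session
            vec[rule_type[rule]] += 1
    return vec
```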
LSTM model implementation
In this module, the time series features, frequency domain features and rule quantization features extracted by the previous three modules are concatenated into a feature vector that serves as the input sequence of the LSTM neural network. Because different dimensions have different value ranges, the feature vector is first standardized, with each dimension processed using Equation 2:

$$x' = \frac{x - \mu}{\operatorname{std}(x)} \tag{2}$$
where x is a single value, std(x) is the standard deviation of the corresponding dimension and μ is its mean. This uniformly reduces the prediction errors caused by large differences in scale between dimensions. The standardized data are fed into the LSTM neural network for training with a step length of timestep; an attention mechanism is added to learn a weight for each input, and the trained feature vectors are finally classified by a softmax layer, as shown in FIG. 5.
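The standardization and the attention-plus-softmax head can be sketched in pure Python as below. The LSTM itself is omitted: `hidden` stands for its per-timestep outputs and is assumed to be given. The dot-product scoring vector `w` and the linear layer `W`, `b` are hypothetical learned parameters; the patent does not specify the attention form. Constant dimensions are mapped to 0 to avoid division by zero.

```python
import math
from typing import List, Tuple


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature dimension: (x - mu) / std, as in Equation 2."""
    n, d = len(rows), len(rows[0])
    mus = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - mus[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - mus[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d)]
            for r in rows]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attention_pool(hidden: List[List[float]], w: List[float]) -> Tuple[List[float], List[float]]:
    """Weight each timestep's hidden state and return (weights, context vector)."""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    alpha = softmax(scores)
    context = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(len(hidden[0]))]
    return alpha, context


def classify(context: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Final linear layer followed by softmax over traffic classes."""
    logits = [sum(wi * ci for wi, ci in zip(row, context)) + bi for row, bi in zip(W, b)]
    return softmax(logits)
```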

Claims (6)

1. A detection device for encrypted malicious traffic of Android mobile devices, characterized by comprising the following parts:
a traffic preprocessing module: receiving the encrypted traffic data, splitting it at session granularity and filtering out irrelevant protocols to obtain the sessions to be detected;
on the one hand, processing the sessions to be detected, establishing a benign traffic feature library, and filtering out useless network sessions based on validity and repetition to obtain session A;
on the other hand, obtaining the encryption component features of each session to be detected and filtering out useless sessions based on these features to obtain session B, wherein the encryption component features comprise an encryption algorithm list, a protocol version and a certificate authority;
the time domain feature processing module: first extracting the packet length sequence and the arrival time interval sequence of the packets in session A, these sequences carrying no direction information; then introducing the maximum transmission unit to distinguish uplink from downlink and extracting, on session A, the packet length sequences and time sequence feature vectors of the uplink, downlink and bidirectional packets, which together are called the time domain feature vector;
the frequency domain feature processing module: processing the undirected packet length sequence and arrival time interval sequence of the session A packets by abstracting them into digital signals, converting them from the time domain to the frequency domain with the discrete Fourier transform while converting the complex values into real numbers, and reducing the feature dimensions with a linear transform to obtain the frequency domain feature vector;
the associated feature processing module: processing the key negotiation fields of session B, including the supported encryption protocols, the certificate and the cipher suite; extracting frequent items by building an FP-tree; establishing rule bases of benign and malicious traffic with families as the rule consequents; dividing each rule base into 5 types according to the characteristics of Android network interaction; and using the match count of each rule as this part's feature vector;
and a classifier module: receiving the feature vectors from the time domain feature processing module, the frequency domain feature processing module and the association rule module, and performing multi-class classification on them with the LSTM neural network;
the traffic preprocessing module specifically performs the following steps:
five-tuple-based session splitting: using SplitCap to split the encrypted traffic data and judging valid sessions by the flag bits, wherein the session is judged to have ended when a packet with the FIN or RST flag is detected in the TCP stream, and the SYN flag marks the beginning of the session;
useless protocol filtering: sessions without a complete TCP three-way handshake and sessions with fewer than three payload-carrying packets are filtered out;
meanwhile, traffic outside the scope of encrypted traffic detection, such as DNS queries and HTTP, is removed; a benign traffic feature library is established from the DNS domain name queries and IP address accesses of benign traffic, the initial encrypted traffic is matched against the library, and if a match succeeds and the count is greater than two, the domain name or IP is considered to carry traffic from a benign third-party development library;
on the other hand, the encryption component features of each session are obtained, comprising an encryption algorithm list, a protocol version and a certificate issuer; these are one-hot encoded, with 1 indicating that the session contains the encryption component and 0 that it does not, each element name consisting of the field name and the extracted value, finally forming a transaction data set; a minimum support is set, the support being the ratio of the number of samples containing a given encryption component itemset to the size of the whole transaction data set, and entries below the minimum support are filtered out to obtain the sample transaction data set;
the associated feature processing module specifically performs the following steps:
once the sample transaction data set is obtained, an FP-tree is constructed; a tree class, treeNode, is defined as the FP-tree node, with the following attributes: the variable name stores the element name; the variable count stores the node frequency, which is incremented by one each time a path passes through the node; the variable nodeLink links nodes that carry the same element and is used to build the head pointer linked list; the variable parent links to the current parent node and supports upward traversal during frequent itemset mining; and the variable child stores the child nodes as dictionary-type data;
an empty treeNode is created as the root node, the elements of each transaction in the sample transaction data set are traversed and added to the tree recursively, with two cases for the update:
1) if the current element is already a child of the root node, the path has been added before, so only that child node's frequency count needs to be incremented, wherein the root node here does not mean the empty root created at the start but the root of the current path during traversal, which after each step is updated to the child node just visited;
2) if the current element is not a child of the root node, it starts a new path, so a new branch is created: the node is added to the root node's child variable with its frequency set to one, and the head pointer linked list is updated by appending the node to the tail of the chain;
the frequent itemsets are then mined, specifically: for each element, its conditional pattern base on the FP-tree, that is, the set of prefix paths ending at that element, is found by following the head pointer linked list and the parent variables of the nodes, and the elements whose support is greater than or equal to the set minimum support are retained as the frequent items for that element;
finally, association rules are generated and quantified: the frequent itemsets are used as rule antecedents and the malicious traffic category given by the test data set as the rule consequent; the confidence of a rule is the support of the antecedent together with the consequent divided by the support of the antecedent, and every frequent itemset whose confidence reaches the minimum confidence becomes an association rule for malicious traffic, forming the malicious traffic cipher suite rule base; the rules in this rule base are divided into 5 types according to the characteristics of Android network interaction, and the match count of each rule is used as this part's feature vector.
2. The detection device for encrypted malicious traffic of Android mobile devices according to claim 1, wherein the time domain feature processing module specifically performs the following steps:
after the filtered session A is obtained, the length and arrival time interval of each packet are extracted as numeric sequences and the information of the first N packets is retained, sessions with fewer than N packets being zero-padded, to obtain the undirected packet length sequence and arrival time interval sequence;
the session is divided into uplink, downlink and bidirectional traffic, wherein uplink runs from the client to the server and downlink runs in the opposite direction;
the maximum transmission unit is used to distinguish the direction of communication: an uplink packet keeps its normal length, while a downlink packet's length is its actual length plus the MTU value; for each packet of session A, its length, its time difference from the previous packet and its time difference from the previous packet in the same direction are calculated, yielding a directional packet length sequence and time sequence vector, which are called the time domain feature vector.
3. The detection device for encrypted malicious traffic of Android mobile devices according to claim 1, wherein the frequency domain feature processing module specifically performs the following steps:
the undirected packet length sequence and arrival time interval sequence of session A are obtained and abstracted into a digital signal; the discrete Fourier transform is computed with the fast Fourier transform algorithm to convert to the frequency domain; the complex values are converted to real numbers by taking the modulus of each frequency component to obtain the frequency domain features; and finally a logarithmic transformation is applied to reduce the dimensionality, yielding the frequency domain feature vector.
4. A detection method for encrypted malicious traffic of Android mobile devices, characterized by comprising the following steps:
traffic preprocessing step: receiving the encrypted traffic data, splitting it at session granularity and filtering out irrelevant protocols to obtain the sessions to be detected;
on the one hand, processing the sessions to be detected, establishing a benign traffic feature library, and filtering out useless network sessions based on validity and repetition to obtain session A;
on the other hand, obtaining the encryption component features of each session to be detected and filtering out useless sessions based on these features to obtain session B, wherein the encryption component features comprise an encryption algorithm list, a protocol version and a certificate authority;
the time domain feature processing step: first extracting the packet length sequence and the arrival time interval sequence of the packets in session A, these sequences carrying no direction information; then introducing the maximum transmission unit to distinguish uplink from downlink and extracting, on session A, the packet length sequences and time sequence feature vectors of the uplink, downlink and bidirectional packets, which together are called the time domain feature vector;
the frequency domain feature processing step: processing the undirected packet length sequence and arrival time interval sequence of the session A packets by abstracting them into digital signals, converting them from the time domain to the frequency domain with the discrete Fourier transform while converting the complex values into real numbers, and reducing the feature dimensions with a linear transform to obtain the frequency domain feature vector;
the associated feature processing step: processing the key negotiation fields of session B, including the supported encryption protocols, the certificate and the cipher suite; extracting frequent items by building an FP-tree; establishing rule bases of benign and malicious traffic with families as the rule consequents; dividing each rule base into 5 types according to the characteristics of Android network interaction; and using the match count of each rule as this part's feature vector;
the classifier step: receiving the feature vectors produced by the time domain feature processing, frequency domain feature processing and associated feature processing steps, and performing multi-class classification on them with the LSTM neural network;
the traffic preprocessing step is as follows:
five-tuple-based session splitting: using SplitCap to split the encrypted traffic data; judging valid sessions by the flag bits, for example judging that the session has ended when a packet with the FIN or RST flag is detected in the TCP stream, and taking the SYN flag as the beginning of the session;
useless protocol filtering: sessions without a complete TCP three-way handshake and sessions with fewer than three payload-carrying packets are filtered out;
meanwhile, traffic outside the scope of encrypted traffic detection, such as DNS queries and HTTP, is removed; a benign traffic feature library is established from the DNS domain name queries and IP address accesses of benign traffic, the initial encrypted traffic is matched against the library, and if a match succeeds and the count is greater than two, the domain name or IP is considered to carry traffic from a benign third-party development library;
on the other hand, the encryption component features of each session are obtained, comprising an encryption algorithm list, a protocol version and a certificate issuer; these are one-hot encoded, with 1 indicating that the session contains the encryption component and 0 that it does not, each element name consisting of the field name and the extracted value, finally forming a transaction data set; a minimum support is set, the support being the ratio of the number of samples containing a given encryption component itemset to the size of the whole transaction data set, and entries below the minimum support are filtered out to obtain the sample transaction data set;
the associated feature processing step is as follows:
once the sample transaction data set is obtained, an FP-tree is constructed; a tree class, treeNode, is defined as the FP-tree node, with the following attributes: the variable name stores the element name; the variable count stores the node frequency, which is incremented by one each time a path passes through the node; the variable nodeLink links nodes that carry the same element and is used to build the head pointer linked list; the variable parent links to the current parent node and supports upward traversal during frequent itemset mining; and the variable child stores the child nodes as dictionary-type data;
an empty treeNode is created as the root node, the elements of each transaction in the sample transaction data set are traversed and added to the tree recursively, with two cases for the update:
1) if the current element is already a child of the root node, the path has been added before, so only that child node's frequency count needs to be incremented, wherein the root node here does not mean the empty root created at the start but the root of the current path during traversal, which after each step is updated to the child node just visited;
2) if the current element is not a child of the root node, it starts a new path, so a new branch is created: the node is added to the root node's child variable with its frequency set to one, and the head pointer linked list is updated by appending the node to the tail of the chain;
the frequent itemsets are then mined, specifically: for each element, its conditional pattern base on the FP-tree, that is, the set of prefix paths ending at that element, is found by following the head pointer linked list and the parent variables of the nodes, and the elements whose support is greater than or equal to the set minimum support are retained as the frequent items for that element;
finally, association rules are generated and quantified: the frequent itemsets are used as rule antecedents and the malicious traffic category given by the test data set as the rule consequent; the confidence of a rule is the support of the antecedent together with the consequent divided by the support of the antecedent, and every frequent itemset whose confidence reaches the minimum confidence becomes an association rule for malicious traffic, forming the malicious traffic cipher suite rule base; the rules in this rule base are divided into 5 types according to the characteristics of Android network interaction, and the match count of each rule is used as this part's feature vector.
5. The detection method for encrypted malicious traffic of Android mobile devices according to claim 4, wherein the time domain feature processing step is as follows:
after the filtered session A is obtained, the length and arrival time interval of each packet are extracted as numeric sequences and the information of the first N packets is retained, sessions with fewer than N packets being zero-padded, to obtain the undirected packet length sequence and arrival time interval sequence;
the session is divided into uplink, downlink and bidirectional traffic, wherein uplink runs from the client to the server and downlink runs in the opposite direction;
the maximum transmission unit is used to distinguish the direction of communication; for each packet of session A, its length, its time difference from the previous packet and its time difference from the previous packet in the same direction are calculated, yielding a directional packet length sequence and time sequence vector, and this time sequence feature vector is called the time domain feature vector.
6. The detection method for encrypted malicious traffic of Android mobile devices according to claim 4, wherein the frequency domain feature processing step is as follows:
the undirected packet length sequence and arrival time interval sequence of session A are obtained and abstracted into a digital signal; the discrete Fourier transform is computed with the fast Fourier transform algorithm to convert to the frequency domain; the complex values are converted to real numbers by taking the modulus of each frequency component to obtain the frequency domain features; and finally a logarithmic transformation is applied to reduce the dimensionality, yielding the frequency domain feature vector.
CN202210732607.3A 2022-06-24 2022-06-24 Detection device and method for encrypting malicious traffic of android mobile device Active CN115086055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210732607.3A CN115086055B (en) 2022-06-24 2022-06-24 Detection device and method for encrypting malicious traffic of android mobile device

Publications (2)

Publication Number Publication Date
CN115086055A CN115086055A (en) 2022-09-20
CN115086055B true CN115086055B (en) 2023-07-18

Family

ID=83255628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210732607.3A Active CN115086055B (en) 2022-06-24 2022-06-24 Detection device and method for encrypting malicious traffic of android mobile device

Country Status (1)

Country Link
CN (1) CN115086055B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116938761B (en) * 2023-09-15 2024-01-12 深圳市扬名伟创信息技术有限公司 Internet of things terminal rapid testing system and method

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107646190A (en) * 2015-03-17 2018-01-30 英国电讯有限公司 Identified using the malice refined net flow of Fourier transformation

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10594707B2 (en) * 2015-03-17 2020-03-17 British Telecommunications Public Limited Company Learned profiles for malicious encrypted network traffic identification
CN107483488B (en) * 2017-09-18 2021-04-30 济南互信软件有限公司 Malicious Http detection method and system
CN112235314A (en) * 2020-10-29 2021-01-15 东巽科技(北京)有限公司 Network flow detection method, device and equipment
CN114158039B (en) * 2021-12-14 2024-04-12 哈尔滨工业大学 Traffic analysis method, system, computer and storage medium for low-power consumption Bluetooth encryption communication
CN114565226A (en) * 2022-01-27 2022-05-31 阿里云计算有限公司 Index processing method, server and storage medium

Also Published As

Publication number Publication date
CN115086055A (en) 2022-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant