CN115426137A

CN115426137A - Malicious encrypted network flow detection tracing method and system

Info

Publication number: CN115426137A
Application number: CN202210969965.6A
Authority: CN
Inventors: 刘敖迪; 杜学绘; 王娜; 吕震昊; 李连成; 于建骁; 王俊杰; 王文娟; 李峰
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-08-12
Filing date: 2022-08-12
Publication date: 2022-12-02

Abstract

The invention belongs to the technical field of cyberspace security, and in particular relates to a malicious encrypted network traffic detection and tracing method and system. For a raw network data flow, the flow is converted into analyzable multi-dimensional network flow statistical features, which are then extracted. The extracted features are input into a trained task analysis model, which identifies whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow. At the same time, the features are input into a trained traffic perception model, which identifies whether the network flow is abnormal traffic and outputs the result. A smart contract of the Fabric blockchain platform is used to store the relevant data of the network flow, so that the position of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain. The method uses several machine learning algorithms to perform fine-grained identification of malicious traffic behavior, improves cyberspace security and is convenient to apply in practical scenarios.

Description

Malicious encrypted network flow detection tracing method and system

Technical Field

The invention belongs to the technical field of network space security, and particularly relates to a malicious encrypted network traffic detection and tracing method and system.

Background

As the cyberspace security situation grows more severe, malware intrusions occur more and more frequently. It is estimated that malware causes losses in the tens of trillions of dollars to the network environment every year, and the attacks launched by malware damage computer systems to varying degrees, for example by stealing private data or injecting malicious code. Therefore, in order to maintain cyberspace security, identifying the malicious behavior of malware has become one of the research hotspots in the network security field.

Fine-grained identification of the specific malicious activities inside malware is extremely difficult, yet necessary. Malware is usually released into the victim's operating system and damages it to varying degrees, and accurately identifying these malicious activities helps provide targeted defense. At present, however, most methods only identify whether software is malicious or detect the category of the malware, and identifying the specific malicious behavior executed inside the malware requires rich security experience. With the prevalence of encrypted malware, the communication traffic generated while malicious behavior executes is mostly encrypted data exchange, and the unencrypted part only reflects the characteristics of the application protocol, without features specific to each behavior, which makes fine-grained identification difficult.

In recent years, research at home and abroad on identifying the malicious behaviors released by malware has focused mainly on matching security log events and API sequences, mostly using machine learning methods. These studies essentially analyze the traces left in the system after a malware attack, or reverse-engineer the malware and analyze the API sequences it executes, and then select appropriate features to train a machine learning algorithm. Such methods achieve good results in identifying malware behavior. However, the existing methods have the following main problems: (1) they analyze the malware after the malicious behavior has been executed, by which time the system has already been damaged to varying degrees, and they are difficult to deploy at the network edge; (2) attackers typically use a variety of anti-detection techniques, such as modifying log files after the attack is completed or adding confusing API sequences to the malicious code, which can prevent the malicious behavior from being identified accurately. Because of these problems, the existing methods cannot rapidly identify the specific malicious behaviors of malware, and because existing research has no traffic data set of malware behaviors, analysis at the traffic level is difficult.

Disclosure of Invention

Therefore, the invention provides a malicious encrypted network traffic detection tracing method and system, which extract a labeled malicious behavior traffic data set by analyzing the statistical information of malicious software which uses an encryption protocol to carry out data communication, and utilize a plurality of machine learning algorithms to carry out fine-grained identification on malicious traffic behaviors, thereby improving the network space security and facilitating the application of practical scenes.

According to the design scheme provided by the invention, a malicious encrypted network flow detection and tracing method is provided, which comprises the following contents:

aiming at an original network data stream, converting the network data stream into analyzable multidimensional network stream statistical characteristics and extracting the characteristics, wherein the statistical characteristics at least comprise byte number and packet interval;

inputting the extracted multi-dimensional network flow statistical characteristics into a trained task analysis model, and analyzing and identifying whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow by using the task analysis model; meanwhile, inputting the statistical characteristics of the multidimensional network flow into a trained flow perception model, and identifying whether the network flow is abnormal flow or not by using the flow perception model and outputting the abnormal flow;

and storing the relevant data of the network flow by using a smart contract of the Fabric blockchain platform, so that the position of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain, wherein the relevant data of the network flow at least comprises the metadata of the network flow, the multi-dimensional network flow statistical features and the forwarding information.

As the malicious encrypted network traffic detection and tracing method, further, when the statistical features of the network flow are extracted, data packets are read one by one from the pcap file, each data packet is added to the corresponding network data flow, and currentFlows is used to store all TCP or UDP flows that have not yet finished, wherein a TCP flow is treated as finished when a FIN flag is seen, and a UDP flow is treated as finished when the configured flowtimeout is exceeded.

As the malicious encrypted network traffic detection tracing method, further, when each data packet is added to the corresponding network data stream, the statistical characteristics of each data stream are updated simultaneously, and the statistical characteristics are written into the csv file.

The malicious encrypted network flow detection and tracing method further takes a TCP flow or a UDP flow as a unit and extracts statistical characteristics in the network data flow through forward and backward statistics, wherein the forward direction is from a source address to a destination address, and the reverse direction is from the destination address to the source address.

As the malicious encrypted network traffic detection tracing method, further, the task analysis model adopts a combined model of a support vector machine and a decision tree to perform multi-task identification, wherein the support vector machine is used for identifying whether the network flow is encrypted, and the decision tree is used for identifying the content type of the encrypted network flow and the application type of the encrypted network flow.

As the malicious encrypted network flow detection tracing method, further, a radial basis function is adopted as a kernel function by a support vector machine, penalty factors representing error tolerance and dimension factors representing the number of support vectors after data are mapped to a new feature space are adjusted by a training process of the support vector machine, and a grid searching method is used in the training process to determine the optimal penalty factors and the value of the dimension factors.

As the malicious encrypted network traffic detection and tracing method, further, the decision tree is trained on the public ISCX VPN-non VPN data set with class labels set, so as to tune the depth of the decision tree and the minimum number of supporting instances.

As the malicious encrypted network traffic detection and tracing method, further, the traffic perception model adopts a KNN model to perceive and identify whether a network flow is normal or abnormal, wherein the KNN model uses the public data sets ISCX-IDS-2012 and CIC-IDS-2017 as training samples, and supervised learning with the KNN algorithm is used to determine the hyperparameters of the KNN model.

As the malicious encrypted network traffic detection and tracing method, further, when network traffic is stored on a blockchain node, the network traffic is hashed with a hash algorithm and the hash value is stored on the blockchain node; the collected raw network traffic is stored off-chain, and network traffic tracing and forensics are performed through on-chain/off-chain cooperation.

Further, the present invention also provides a malicious encrypted network traffic detection traceability system, comprising: a feature extraction module, a flow detection module and a data storage module, wherein,

the characteristic extraction module is used for converting the network data flow into analyzable multidimensional network flow statistical characteristics and extracting the characteristics aiming at the original network data flow, wherein the statistical characteristics at least comprise byte number and packet interval;

the flow detection module is used for inputting the extracted multi-dimensional network flow statistical characteristics into a trained task analysis model, and analyzing and identifying whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow by using the task analysis model; meanwhile, inputting the statistical characteristics of the multidimensional network flow into a trained flow perception model, and identifying whether the network flow is abnormal flow or not by using the flow perception model and outputting the abnormal flow;

and the data storage module is used for storing the relevant data of the network flow by using a smart contract of the Fabric blockchain platform, so that the position of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain, wherein the relevant data of the network flow at least comprises the metadata of the network flow, the multi-dimensional network flow statistical features and the forwarding information.

The invention has the beneficial effects that:

The invention characterizes network traffic through a data fingerprint extraction technique based on multi-dimensional mixed features: the multi-dimensional mixed features of encrypted network traffic are extracted dynamically as the data fingerprint of the traffic, which describes the encrypted traffic comprehensively and converts complex network traffic that cannot be analyzed directly into analyzable multi-dimensional feature data. The encrypted network traffic multi-task identification technique based on the SVM and the decision tree achieves fine-grained classification of encrypted network traffic, and because the multi-dimensional features are produced by the data fingerprint extraction technique, the accuracy of encrypted traffic classification is improved and the false alarm rate is effectively reduced. The abnormal network traffic sensing and discovery model based on the KNN algorithm converts the problem of sensing and discovering abnormal traffic into a binary classification problem over network traffic data fingerprints expressed as multi-dimensional mixed features. The metadata, data fingerprint, forwarding information and other content of the network traffic are stored by a smart contract of the Fabric blockchain platform, while the collected raw network traffic data is stored off-chain. This builds an on-chain/off-chain cooperative storage mechanism for network traffic data that combines blockchain security with off-chain storage efficiency and realizes decentralized storage of network traffic, so that the scheme can be used for the classification, detection and evaluation of malicious encrypted network traffic and is convenient to apply in practical scenarios.

Description of the drawings:

FIG. 1 is a schematic diagram of a malicious encrypted network traffic detection tracing flow in an embodiment;

FIG. 2 is a schematic diagram of a network traffic intelligent detection analysis and forensics platform in an embodiment;

FIG. 3 is a schematic diagram of a mainstream encryption protocol in the embodiment;

FIG. 4 is a schematic diagram of the overall process of feature extraction in the embodiment;

FIG. 5 is a schematic diagram illustrating a detection result of the multi-task identification technology module for encrypted network traffic in the embodiment;

FIG. 6 is a schematic diagram of a detection result of the abnormal network traffic sensing and discovery technology module in the embodiment;

FIG. 7 is an illustration of the mode of operation of the transaction ledger in an embodiment.

The specific implementation mode is as follows:

in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.

An embodiment of the present invention, as shown in fig. 1, provides a malicious encrypted network traffic detection tracing method, including:

S101, for a raw network data flow, converting the network data flow into analyzable multi-dimensional network flow statistical features and extracting the features, wherein the statistical features at least comprise byte count and packet interval;

S102, inputting the extracted multi-dimensional network flow statistical features into a trained task analysis model, and using the task analysis model to identify whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow; at the same time, inputting the multi-dimensional network flow statistical features into a trained traffic perception model, and using the traffic perception model to identify whether the network flow is abnormal traffic and to output the result;

S103, storing the relevant data of the network flow by using a smart contract of the Fabric blockchain platform, so that the position of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain, wherein the relevant data of the network flow at least comprises the metadata of the network flow, the multi-dimensional network flow statistical features and the forwarding information.

A network flow is an ordered group of data packets exchanged by a pair of IP entities on the same pair of ports within a certain time, and mainly comprises TCP/UDP flows and ICMP flows. Each network flow is uniquely identified by a flow ID. A TCP/UDP flow is a network flow whose transport layer protocol type is TCP or UDP; its flow ID is generated from the packet five-tuple <source IP, destination IP, source port number, destination port number, transport layer protocol type>. An ICMP flow is a network flow whose protocol type is ICMP; its flow ID is generated from the packet triplet <source IP, destination IP, ICMP>.
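The five-tuple flow ID described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `FiveTuple` type, the `flow_id` function and the string layout of the ID are hypothetical, and sorting the two endpoints so that both directions of a conversation share one ID is an assumption made here so that forward and backward packets land in the same flow.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Packet five-tuple as listed in the text (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "TCP" or "UDP"


def flow_id(t: FiveTuple) -> str:
    """Build a direction-independent flow ID from the five-tuple.

    Assumption: the two endpoints are sorted so that A->B and B->A packets
    map to the same ID; the source does not specify the ID format.
    """
    first, second = sorted([(t.src_ip, t.src_port), (t.dst_ip, t.dst_port)])
    return f"{first[0]}-{second[0]}-{first[1]}-{second[1]}-{t.protocol}"
```

For example, a packet from 10.0.0.1:5000 to 10.0.0.2:443 and the reply from 10.0.0.2:443 to 10.0.0.1:5000 yield the same ID, while changing the protocol yields a different one.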

As shown in fig. 2, first, for complex encrypted network data flows that cannot be analyzed directly, a multi-dimensional feature extraction technique for encrypted traffic is designed to convert them into analyzable multi-dimensional features, which provides the necessary data preprocessing for the subsequent machine learning classification models. Second, the multi-dimensional features are input into the encrypted traffic multi-task analysis model, where the trained support vector machine model and the trained decision tree model determine whether the flow is encrypted, the content category of the encrypted network flow and the application category of the encrypted network flow, giving a relatively comprehensive multi-task identification result. Third, the multi-dimensional features are input into the abnormal network traffic sensing and discovery model, where the trained KNN model screens and classifies abnormal network traffic. Finally, after abnormal network traffic is captured and stored on a blockchain node, feature attributes such as the hash and timestamp of the network data flow are generated; once the machine learning algorithms have marked the malicious traffic, its position is found by sending an access request to the Fabric blockchain that addresses the hash of the marked malicious traffic, thereby tracing the malicious network traffic.

As a preferred embodiment, when the statistical features of the network flow are extracted, data packets are read one by one from the pcap file, each data packet is added to the corresponding network data flow, and currentFlows is used to store all TCP or UDP flows that have not yet finished, wherein a TCP flow is treated as finished when a FIN flag is seen, and a UDP flow is treated as finished when the configured flowtimeout is exceeded. Further, as each data packet is added to its network data flow, the statistical features of that flow are updated at the same time, and the statistical features are finally written into a csv file.

Encrypted traffic can now be said to account for about half of all network traffic. Traffic is encrypted for reasons such as protecting user privacy, but a great deal of malicious traffic is also encrypted in order to evade detection while attacking. Traditional detection methods do not work properly on encrypted traffic, which brings great uncertainty to network security. Currently there are three mainstream encryption protocols, namely IPsec (IP Security), SSL (Secure Socket Layer)/TLS (Transport Layer Security) and SSH (Secure SHell); the mainstream encryption protocols are shown in fig. 3.

For encrypted traffic, analyzing network flow statistical features such as byte count and packet interval is more meaningful than analyzing the encrypted payload. The specific extraction procedure is as follows. Transport layer statistics are extracted in units of one TCP flow or one UDP flow. A TCP flow ends with the FIN flag; a UDP flow is bounded by the configured flowtimeout and is judged to have ended once that time is exceeded. A TCP flow contains many packets: the three-way handshake, then the data transfer, then the four-way teardown. The statistics collected over one flow are the extracted features. The statistical features are divided into forward and backward features: traffic from the source address to the destination address is defined as forward, and traffic from the destination address to the source address as backward.

Close to 80 feature dimensions can be extracted, which provides ample preparation for the subsequent machine learning, and the required features can be selected and used directly. As shown in fig. 4, packets are read one by one from the pcap file, each packet is added to the corresponding flow, and all TCP and UDP flows that have not yet finished are stored in currentFlows. The statistical features of each flow are updated continuously as packets are added, and are finally written into the csv file. For each newly read packet, it is first determined whether the packet belongs to one of the currently unfinished flows. If it does, the packet is classified as forward or backward, and it is then checked whether the flow has timed out; if it has not timed out, it is checked whether the packet carries a FIN flag; if there is no FIN flag either, a BasicFlow object is declared, the flow corresponding to the current packet is taken from currentFlows by its id, and addPacket is called to add the packet to that flow. If the packet does not belong to any unfinished flow, a new flow containing only the current packet is created and stored in currentFlows. If the packet belongs to an unfinished flow that has timed out or carries a FIN flag, that flow is finished: on timeout, the old flow is removed from currentFlows and a newly created flow is stored in currentFlows; on a FIN flag, the flow is simply removed from currentFlows. A finished flow directly calls the onFlowgenerated function to print and store the flow.
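The packet-to-flow assembly loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patent's code: `Packet`, `BasicFlow` fields, `make_id`, `new_flow`, `assemble` and `flows_to_csv_text` are hypothetical names, the timeout value of 120 seconds is assumed (the text only names a configurable flowtimeout), the timeout is measured from the start of the flow, and only four of the roughly 80 statistics are tracked.

```python
import csv
import io
from dataclasses import dataclass

FLOW_TIMEOUT = 120.0  # seconds; assumed value, the text only names "flowtimeout"


@dataclass
class Packet:
    ts: float          # capture timestamp in seconds
    src: str           # "ip:port" of the sender
    dst: str           # "ip:port" of the receiver
    proto: str         # "TCP" or "UDP"
    size: int          # packet length in bytes
    fin: bool = False  # TCP FIN flag


@dataclass
class BasicFlow:
    flow_id: str
    src: str           # sender of the first packet defines the forward direction
    start: float
    fwd_pkts: int = 0
    bwd_pkts: int = 0
    fwd_bytes: int = 0
    bwd_bytes: int = 0

    def add_packet(self, p: Packet) -> None:
        """Update forward or backward statistics with one packet."""
        if p.src == self.src:
            self.fwd_pkts += 1
            self.fwd_bytes += p.size
        else:
            self.bwd_pkts += 1
            self.bwd_bytes += p.size


def make_id(p: Packet) -> str:
    lo, hi = sorted([p.src, p.dst])  # same ID for both directions
    return f"{lo}-{hi}-{p.proto}"


def new_flow(p: Packet) -> BasicFlow:
    flow = BasicFlow(make_id(p), p.src, p.ts)
    flow.add_packet(p)
    return flow


def assemble(packets):
    """Group packets into flows; return finished flows in the order they end."""
    current_flows = {}  # unfinished flows, keyed by flow ID
    finished = []
    for p in packets:
        fid = make_id(p)
        flow = current_flows.get(fid)
        if flow is None:                          # not in any unfinished flow
            current_flows[fid] = new_flow(p)
        elif p.ts - flow.start > FLOW_TIMEOUT:    # timeout: close old, open new
            finished.append(flow)
            current_flows[fid] = new_flow(p)
        elif p.fin:                               # FIN: add packet, close flow
            flow.add_packet(p)
            finished.append(flow)
            del current_flows[fid]
        else:                                     # ordinary packet
            flow.add_packet(p)
    finished.extend(current_flows.values())       # flush at end of capture
    return finished


def flows_to_csv_text(flows) -> str:
    """Serialize the per-flow statistics as csv text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["flow_id", "fwd_pkts", "bwd_pkts", "fwd_bytes", "bwd_bytes"])
    for f in flows:
        writer.writerow([f.flow_id, f.fwd_pkts, f.bwd_pkts, f.fwd_bytes, f.bwd_bytes])
    return out.getvalue()
```

Reading real pcap files would need a capture-parsing library in front of `assemble`; that part is left out here so the sketch stays self-contained.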

With this feature extraction method, the extraction can be implemented in code, and the specific extracted features are shown in table 1.

The extracted features are stored in csv as shown in FIG. 5.

As a preferred embodiment, further, the task analysis model adopts a combined model of a support vector machine and a decision tree to perform multi-task identification, wherein the support vector machine is used to identify the type of whether the network stream is encrypted, and the decision tree is used to identify the encrypted network stream content type and the encrypted network stream application type.

The encrypted network traffic identification technique based on the SVM and the decision tree is divided into three tasks: identifying whether the traffic is encrypted (Vpn, non_Vpn), identifying the application type of the encrypted traffic (AIM, BitTorrent, Email, Facebook, FTPS, Hangouts, ICQ, Netflix, SFTP, Skype, Spotify, Vimeo, VoipBuster, YouTube), and identifying the content type of the encrypted traffic (Chat, File, VoIP, Streaming).

As a preferred embodiment, further, the support vector machine adopts a radial basis function as a kernel function, a penalty factor for characterizing error tolerance and a dimensionality factor for characterizing the number of support vectors after the data is mapped to a new feature space are adjusted by using a training process of the support vector machine, and a grid search method is used in the training process to determine an optimal penalty factor and dimensionality factor value.

The support vector machine is a linear classification model that performs binary classification by solving for the maximum-margin hyperplane between samples. To handle samples that are not linearly separable, the SVM uses a kernel function to map the original sample space into a higher-dimensional feature space in which the samples become linearly separable. The SVM used here adopts a Radial Basis Function (RBF) as the kernel function, and two hyperparameters need to be tuned during training: the penalty factor C, which represents the tolerance for errors (the larger C is, the more heavily training errors are penalized), and the dimension factor γ, which influences the number of support vectors after the data is mapped to the new feature space. Because two parameters vary, a grid search is needed to determine the optimal values of C and γ.

Further, as a preferred embodiment, the decision tree is trained on the public ISCX VPN-non VPN data set with class labels set, so as to tune the depth of the decision tree and the minimum number of supporting instances.

The decision tree is a tree-structured decision analysis model that splits nodes according to feature probabilities. In the embodiment of the scheme, a decision tree classification model can be built on the C4.5 algorithm, and the hyperparameters to be tuned in training are the depth of the tree and the minimum number of supporting instances. For the classification of the application type and content type of encrypted network flows, only the ISCX VPN-non VPN data set among the public data sets meets the experimental requirements of this study, so experiments are performed on this data set only. The ISCX VPN-non VPN data set is 25 GB in size, of which 22.8 GB is plaintext traffic and the remaining 2.4 GB is traffic encrypted with a VPN. The data set contains 280,540 flows, of which 18,468 are ciphertext flows and 262,072 are plaintext flows, covering 14 types of applications, as shown in table 2.
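For reference, the two hyperparameters discussed above appear in the standard textbook formulation of the RBF-kernel soft-margin SVM as written below. This is the usual general form, given here as background rather than taken from the patent; the symbols w, b, ξ_i, φ and m (number of training samples) are introduced only for this illustration.

```latex
% RBF kernel: \gamma controls how quickly similarity decays with distance
K(x_i, x_j) = \exp\left(-\gamma \,\lVert x_i - x_j \rVert^{2}\right), \qquad \gamma > 0

% Soft-margin objective: C weights the total slack (training error)
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```

A grid search then simply evaluates every pair (C, γ) from two candidate lists on held-out data and keeps the pair with the best validation score.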

TABLE 2 ISCX VPN-non VPN dataset
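C4.5, the algorithm named for the decision tree, selects splits by gain ratio (information gain divided by split information). The sketch below shows that criterion for a discrete feature only; it is a hedged illustration, the function names `entropy` and `gain_ratio` are hypothetical, and the handling of continuous features by thresholds, pruning, and the tree depth and minimum-instance limits mentioned in the text are all omitted.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels) -> float:
    """C4.5 split criterion for one discrete feature: gain / split info."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no split information
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature that separates the classes perfectly scores 1.0 on a balanced two-class sample, while a feature that is independent of the label scores 0.0.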

The model can be trained on the public ISCX VPN-non VPN data set: the data is labeled manually and converted into vectors, and the resulting vectors are fed into the model, whose parameters are then tuned during training. This yields a model capable of multi-task identification of encrypted network traffic. Finally, the model performance is checked on a held-out test set with test_size of 0.25, using three indexes, Accuracy, Precision and Recall, as performance evaluation indexes. The definitions are:

wherein TP_i is the number of class i flows correctly classified as class i, FP_i is the number of non-class-i flows misclassified as class i, FN_i is the number of class i flows misclassified as non-class-i, and n is the total number of classes. The detection result is shown in fig. 6; the multi-task identification of encrypted network traffic is completed well.
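The per-class counts TP_i, FP_i and FN_i defined above can be turned into the three indexes as sketched below. Because the exact formulas are not reproduced in this text, the sketch assumes one common multi-class reading: accuracy as the fraction of correctly classified flows, and precision and recall as unweighted averages over the n classes. The function names are hypothetical.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count TP_i, FP_i and FN_i for every class i."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred received a flow that is not its own
            fn[truth] += 1  # truth lost one of its flows
    return tp, fp, fn


def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall); assumed definitions."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(classes)
    tp, fp, fn = per_class_counts(y_true, y_pred)
    accuracy = sum(tp.values()) / len(y_true)
    precision = sum(
        tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes
    ) / n
    recall = sum(
        tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes
    ) / n
    return accuracy, precision, recall
```

If the original definitions weight classes differently, only the two averaging lines would need to change; the counting step follows the definitions of TP_i, FP_i and FN_i directly.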

As a preferred embodiment, the traffic perception model further adopts a KNN model to perceive and identify whether a network flow is normal or abnormal, wherein the KNN model uses the public data sets ISCX-IDS-2012 and CIC-IDS-2017 as training samples, and supervised learning with the KNN algorithm is used to determine the hyperparameters of the KNN model.

The preprocessed experimental data is imported in csv format into the KNN-based abnormal network traffic sensing and discovery module, and the training data is manually labeled as normal network traffic (Benign_traffic) or abnormal network traffic (Malicious_traffic), the latter covering SSH, FTP, DoS, DDoS, Bot and Web attack traffic. The data is converted into vectors, and the resulting vectors are fed into the machine learning algorithm adopted by the module. The K-Nearest Neighbors (KNN) algorithm is a classification and regression algorithm and one of the most basic and simplest machine learning algorithms; it was proposed by Cover and Hart in 1967 and is applied in fields such as character recognition, text classification and image recognition. KNN is a non-parametric, lazy learning model. Although KNN has no parameters to learn, it has a series of hyperparameters, some of which strongly affect the prediction quality of the model. The most important hyperparameters are the value of K and the distance metric.

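Common distance metrics include the Minkowski distance, Euclidean distance, Chebyshev distance and Manhattan distance, of which the Euclidean distance is the one most often used in the KNN algorithm. For two m-dimensional feature vectors x = (x_1, ..., x_m) and y = (y_1, ..., y_m), these distances are calculated as follows:

Minkowski distance: $$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^{p} \right)^{1/p}$$

Euclidean distance (Minkowski with p = 2): $$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^{2}}$$

Chebyshev distance (Minkowski as p tends to infinity): $$d(x, y) = \max_{1 \le i \le m} |x_i - y_i|$$

Manhattan distance (Minkowski with p = 1): $$d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$$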

K is chosen through cross validation (the sample data is split into training data and validation data in a certain proportion): starting from a small K, the value of K is increased step by step, the error on the validation set is computed for each value, and a suitable K is finally selected. Choosing a smaller K is equivalent to predicting from the training instances in a smaller neighborhood: the approximation error of learning decreases, because only training instances close or similar to the input instance contribute to the prediction, but the estimation error of learning increases. In other words, decreasing K makes the overall model more complex and prone to overfitting. Choosing a larger K is equivalent to predicting from the training instances in a larger neighborhood, which has the advantage of reducing the estimation error of learning but the disadvantage of increasing the approximation error. In that case training instances far from (dissimilar to) the input instance also influence the prediction and can make it wrong; increasing K makes the overall model simpler.

In the embodiment of the present disclosure, the public data sets ISCX-IDS-2012 and CIC-IDS-2017 can be used as training samples, and their data is classified as shown in table 4 and table 5.

TABLE 4 ISCX-IDS-2012 data set
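The K selection procedure described above can be sketched with a small pure-Python KNN. This is a minimal illustration under stated assumptions: a single train/validation split stands in for full cross validation, the validation criterion is the misclassification rate, and the names `minkowski`, `chebyshev`, `knn_predict` and `choose_k` are hypothetical. The labels "Benign" and "Malicious" in any usage are toy data, not results from the data sets named in the text.

```python
import math
from collections import Counter


def minkowski(x, y, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def chebyshev(x, y):
    """Chebyshev distance: largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def knn_predict(train_x, train_y, query, k, dist=minkowski):
    """Majority vote among the k training samples nearest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def choose_k(train_x, train_y, val_x, val_y, candidates):
    """Try K values in increasing order; keep the one with lowest validation error."""
    best_k, best_err = None, math.inf
    for k in candidates:
        wrong = sum(
            knn_predict(train_x, train_y, q, k) != label
            for q, label in zip(val_x, val_y)
        )
        err = wrong / len(val_y)
        if err < best_err:  # strict '<' keeps the smallest K on ties
            best_k, best_err = k, err
    return best_k, best_err
```

Sorting the whole training set for every query is fine for a sketch but scales poorly; on data sets of the size mentioned in the text, an indexed neighbor search would normally be used instead.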

TABLE 5 CIC-IDS-2017 dataset

Finally, the model performance is checked by using a data set with the test _ size of 0.25, and three indexes of Accuracy (Accuracy), precision (Precision) and recall (call) are used as performance evaluation indexes. The definitions are respectively:

wherein, TP _i Is the number of class i streams correctly classified as class i, FP _i Is the number of non-ith class streams misclassified as ith class, and FN _i Is the number of i-th class streams misclassified as non-i-th class streams, and n is the total number of classes. The detection results are shown in FIG. 7.

As a preferred embodiment, further, when storing network traffic, the blockchain node processes the network traffic with a hash algorithm and stores the resulting hash value on the blockchain node; the collected original network traffic is stored off-chain, and network flow tracing and evidence collection are performed through on-chain/off-chain cooperation.
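The on-chain/off-chain cooperation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ledger is simulated with an in-memory dictionary rather than a Fabric smart contract, SHA-256 is chosen as the hash algorithm, and the class name TraceStore and the flow identifiers are hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Fingerprint of the raw traffic bytes."""
    return hashlib.sha256(data).hexdigest()

class TraceStore:
    """Keeps raw traffic off-chain and only its hash value on-chain."""

    def __init__(self):
        self.on_chain = {}    # flow_id -> hash value (stand-in for ledger state)
        self.off_chain = {}   # flow_id -> raw captured bytes

    def store(self, flow_id: str, raw: bytes) -> str:
        digest = sha256_hex(raw)
        self.off_chain[flow_id] = raw
        self.on_chain[flow_id] = digest
        return digest

    def verify(self, flow_id: str) -> bool:
        """True only if the off-chain copy still matches the on-chain hash."""
        raw = self.off_chain.get(flow_id)
        digest = self.on_chain.get(flow_id)
        if raw is None or digest is None:
            return False
        return sha256_hex(raw) == digest

store = TraceStore()
store.store("flow-001", b"captured packet bytes")
print(store.verify("flow-001"))          # the untouched copy matches

store.off_chain["flow-001"] = b"altered packet bytes"
print(store.verify("flow-001"))          # the altered copy no longer matches
```

Only the fixed-length hash value occupies ledger space, while the bulky capture stays in ordinary storage and can still be checked for integrity when evidence is needed.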

The blockchain is tamper-resistant and can provide effective, reliable protection for on-chain data. Network traffic is difficult to trace and evidence is difficult to collect; with traditional methods, even when the data have been obtained, they are often hard to use effectively as evidence. With the security characteristics of the blockchain, resources can be integrated effectively and stored systematically, which facilitates subsequent analysis. This benefits short-term monitoring and the effective capture of small-scale attacks, and for large-scale, purposeful, complex attacks the records can also be organized effectively to provide evidence.

A blockchain is a chained data structure and storage mode composed of multiple blocks. Each block is divided into a block header, which mainly holds the hash value (Hash Value) of the preceding block and thereby links the blocks, and a block body, which mainly contains the transaction ledger, as shown in fig. 8. In this embodiment, unlike a traditional central ledger, all users participating in the blockchain network can obtain detailed information about the transactions occurring in the network, which enhances credibility; a party tampering with information must simultaneously change the ledger information of the other users, which raises the cost of tampering. The hash algorithm makes it possible to compare user ledgers quickly to find abnormal information, greatly improving the efficiency of finding it; the public-key cryptosystem protects users' usage rights and further enhances the security of the blockchain information. Files can be stored directly in the data storage of the blockchain, but ordinary network flow data are huge and on-chain resources are precious. Therefore, in this embodiment, files are processed with a hash algorithm and the hash value is stored on the blockchain, which saves space, allows information fingerprints to be compared, and facilitates whole-process tracing of malicious traffic.
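The header/body structure and the effect of tampering can be illustrated with the following Python sketch. It is a simplified stand-in, not the Fabric block format: the field names (prev_hash, body_hash), the use of SHA-256 over JSON, and the record contents are assumptions made for illustration.

```python
import hashlib
import json

def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of a header or body."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain, records):
    """Add a block whose header stores the hash of the preceding header."""
    prev_hash = digest(chain[-1]["header"]) if chain else "0" * 64
    header = {
        "index": len(chain),
        "prev_hash": prev_hash,        # link to the preceding block
        "body_hash": digest(records),  # fingerprint of this block's body
    }
    chain.append({"header": header, "body": records})

def is_valid(chain) -> bool:
    """Check every body fingerprint and every link between blocks."""
    for i, block in enumerate(chain):
        if block["header"]["body_hash"] != digest(block["body"]):
            return False
        if i > 0 and block["header"]["prev_hash"] != digest(chain[i - 1]["header"]):
            return False
    return True

chain = []
append_block(chain, [{"flow": "flow-001", "label": "normal"}])
append_block(chain, [{"flow": "flow-002", "label": "abnormal"}])
append_block(chain, [{"flow": "flow-003", "label": "normal"}])
print(is_valid(chain))

# Tampering with one record breaks that block's body fingerprint.
chain[1]["body"][0]["label"] = "normal"
print(is_valid(chain))
```

Recomputing the body fingerprint of the altered block to hide the change would alter that block's header, so the prev_hash stored in the next block would no longer match; this is why a tamperer would have to rewrite every later block on every participant's copy.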

Further, based on the foregoing method, an embodiment of the present invention further provides a malicious encrypted network traffic detection traceability system, including: a feature extraction module, a flow detection module and a data storage module, wherein,

Unless specifically stated otherwise, the relative steps, numerical expressions and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.

The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The elements of the various examples and method steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and the components and steps of the examples have been described in a functional generic sense in the foregoing description for clarity of hardware and software interchangeability. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Those skilled in the art will appreciate that all or part of the steps of the above methods can be implemented by a program instructing relevant hardware, and the program can be stored in a computer readable storage medium, such as: read-only memory, magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A malicious encrypted network traffic detection tracing method is characterized by comprising the following contents:

inputting the extracted multi-dimensional network flow statistical characteristics into a trained task analysis model, and analyzing and identifying whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow by using the task analysis model; meanwhile, inputting the statistical characteristics of the multidimensional network flow into a trained flow perception model, and identifying whether the network flow is abnormal flow by using the flow perception model and outputting the abnormal flow;

and storing relevant data information of the network flow by using a smart contract of the Fabric blockchain platform, so that the location of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain, wherein the relevant data information of the network flow at least comprises metadata of the network flow, multidimensional network flow statistical characteristics and forwarding information.

2. The malicious encrypted network traffic detection tracing method according to claim 1, wherein, when the network flow statistical characteristics are extracted, data packets are read one by one from a pcap file and each data packet is added to its corresponding network data flow; currentFlows is used to store all TCP or UDP flows that have not yet ended; a TCP flow is regarded as ended when a FIN flag is encountered, and a UDP flow is judged to have ended when the set flowtimeout time is exceeded.

3. The malicious encrypted network traffic detection tracing method according to claim 2, wherein when each data packet is added to a corresponding network data stream, the statistical characteristics of each data stream are updated at the same time, and the statistical characteristics are written into the csv file.

4. The malicious encrypted network traffic detection tracing method according to claim 1 or 2, wherein statistical features in the network data flow are extracted by forward and backward statistics in units of one TCP flow or one UDP flow, wherein the forward direction is from the source address to the destination address, and the backward direction is from the destination address to the source address.

5. The malicious encrypted network traffic detection tracing method according to claim 1, wherein the task analysis model adopts a combined model of a support vector machine and a decision tree to perform multi-task identification, wherein the support vector machine is used to identify whether the network flow is encrypted, and the decision tree is used to identify the encrypted network flow content type and the encrypted network flow application type.

6. The malicious encrypted network traffic detection tracing method according to claim 5, characterized in that the support vector machine adopts a radial basis function as its kernel function; the training process of the support vector machine adjusts a penalty factor representing error tolerance and a dimensionality factor representing the number of support vectors after the data are mapped to a new feature space; and a grid search method is used in the training process to determine the optimal penalty factor and dimensionality factor values.

7. The malicious encrypted network traffic detection tracing method according to claim 5, wherein the ISCX VPN-nonVPN public data set is used, and the decision tree is trained with labels set on the data so as to adjust the decision tree depth and the minimum number of supporting instances.

8. The malicious encrypted network traffic detection tracing method according to claim 1, wherein the traffic perception model adopts a KNN model to perceive and recognize whether the network flow is a normal network flow or an abnormal network flow, wherein the KNN model selects the public data sets ISCX-IDS-2012 and CIC-IDS-2017 as training samples, and supervised learning is performed by using the KNN algorithm to determine the hyper-parameters of the KNN model.

9. The malicious encrypted network traffic detection tracing method according to claim 1, wherein, when the blockchain nodes store the network traffic, the network traffic is processed by using a hash algorithm and the hash value is stored on the blockchain nodes; and the collected original network traffic is stored off-chain, and network flow tracing and evidence collection are performed through on-chain/off-chain cooperation.

10. A malicious encrypted network traffic detection traceability system is characterized by comprising: a feature extraction module, a flow detection module and a data storage module, wherein,

the feature extraction module is used for converting an original network data flow into analyzable multi-dimensional network flow statistical characteristics and extracting those characteristics, wherein the statistical characteristics at least comprise byte number and packet interval;

the flow detection module is used for inputting the extracted multi-dimensional network flow statistical characteristics into a trained task analysis model, and analyzing and identifying whether the network flow is encrypted, the content type of the encrypted network flow and the application type of the encrypted network flow by using the task analysis model; meanwhile, inputting the statistical characteristics of the multidimensional network flow into a trained flow perception model, and identifying whether the network flow is abnormal flow by using the flow perception model and outputting the abnormal flow;

and the data storage module is used for storing relevant data information of the network flow by using a smart contract of the Fabric blockchain platform, so that the location of marked malicious network traffic can be found by sending an addressing access request to the Fabric blockchain, wherein the relevant data information of the network flow at least comprises metadata of the network flow, multidimensional network flow statistical characteristics and forwarding information.