CN111245860A

CN111245860A - Encrypted malicious flow detection method and system based on two-dimensional characteristics

Info

Publication number: CN111245860A
Application number: CN202010066830.XA
Authority: CN
Inventors: 周志洪; 姚立红; 胡斌; 银鹰; 李建华
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-05

Abstract

The invention discloses a method and a system for detecting encrypted malicious traffic based on two-dimensional features. The method comprises the following steps: step 1, merging data packets with the same five-tuple into bidirectional session flows; step 2, masking the five-tuple features; step 3, extracting the message load features of the session traffic; step 4, extracting the flow fingerprint features of the session traffic; step 5, integrating and standardizing the features extracted in steps 3-4; and step 6, classifying malicious traffic with a logistic regression machine learning model. The method addresses the drop in detection accuracy for encrypted malicious traffic that is caused by unstable five-tuple features under large-scale, complex network conditions. Experimental results show that, without relying on the five-tuple, the method achieves a detection accuracy of 97.86 percent on SSL/TLS-encrypted malicious traffic in a complex network environment, which is 34.45 percent higher than that of the traditional method.

Description

Encrypted malicious flow detection method and system based on two-dimensional characteristics

Technical Field

The invention relates to the intersection of network security and machine learning, and in particular to a method and system for detecting encrypted malicious traffic based on two-dimensional features.

Background

To protect secure communications between users and enterprises, website traffic encryption has become a mainstream measure, and the application of SSL/TLS (secure socket layer/transport layer security) protocol is the main means for encrypting such traffic. Encrypted traffic can protect the confidentiality and integrity of private information to some extent, but also provides shelter to malicious activities on the network.

At present, encrypted malicious traffic is mainly detected with supervised learning methods. However, existing methods often use only one type of feature and cannot detect malicious traffic whose domain names are updated at an extremely high frequency. In a complex network environment with diverse five-tuple information, treating the five-tuple information, which changes frequently for malicious traffic, as an important feature degrades the identification accuracy of the model. Yet if the five-tuple features are removed and the same methods are applied again to detect encrypted malicious traffic, the recognition rate drops sharply.

Therefore, the invention provides an encrypted malicious traffic identification method that, through data preprocessing, divides the features of encrypted traffic into two dimensions: message load and flow fingerprint. Without using five-tuple information, the position of each flow is described by its message load and flow fingerprint, and training and prediction are carried out with a logistic regression machine learning model.

Disclosure of Invention

The invention addresses the problem that, in a complex network environment with many sources of malicious traffic, the network-layer features of the traffic become diverse, the five-tuple features no longer show any regularity, and the detection rate of traditional methods falls. The invention therefore provides a method for detecting SSL/TLS-encrypted malicious traffic that does not depend on the five-tuple features of the traffic. It consolidates the many features of the traffic into a combination of message load features and flow fingerprint features, so that traffic remains distinguishable in a complex network environment. Each flow is described in two dimensions and classified with a logistic regression model, thereby detecting SSL/TLS-encrypted malicious traffic in a complex network environment.

The invention provides an encrypted malicious flow detection method based on two-dimensional characteristics, which extracts message load characteristics and flow fingerprint characteristics of monitored encrypted flow and identifies malicious flow on the basis of the message load characteristics and the flow fingerprint characteristics.

Further, the detection method comprises the following steps:

step 1, merging data packets with the same five-tuple into bidirectional session flows;

step 2, shielding the quintuple characteristics;

step 3, extracting the message load characteristics of the session flow;

step 4, extracting the stream fingerprint characteristics of the session flow;

step 5, integrating and standardizing the flow characteristics extracted in the step 3-4;

step 6, classifying malicious traffic with a logistic regression machine learning model.

Further, the step 1 comprises:

step 1.1, merging packets with the same five-tuple into a session, wherein the five-tuple consists of the source IP address, destination IP address, source port, destination port, and protocol;

step 1.2, merging into one bidirectional flow those sessions for which the source IP address of the inbound flow equals the destination IP address of the outbound flow, the destination IP address of the inbound flow equals the source IP address of the outbound flow, the source port of the inbound flow equals the destination port of the outbound flow, the destination port of the inbound flow equals the source port of the outbound flow, and the two flows use the same protocol.
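
A minimal Python sketch of steps 1.1 and 1.2, under assumptions not stated in the source: packets are represented as dictionaries with hypothetical field names (src_ip, dst_ip, src_port, dst_port, proto), and the two directions are matched by sorting the two endpoints so that both directions produce the same key. This is one possible way to realize the merge, not necessarily the one used by the inventors.

```python
from collections import defaultdict


def session_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent key: both directions of a flow map to the same tuple."""
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (first, second, proto)


def merge_bidirectional(packets):
    """Group packets (dicts with hypothetical five-tuple fields) into bidirectional sessions."""
    sessions = defaultdict(list)
    for pkt in packets:
        key = session_key(
            pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"]
        )
        sessions[key].append(pkt)
    return dict(sessions)


# Illustrative packets: the first two are opposite directions of the same flow.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 50000, "dst_port": 443, "proto": "TCP"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 443, "dst_port": 50000, "proto": "TCP"},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 50001, "dst_port": 443, "proto": "TCP"},
]
sessions = merge_bidirectional(packets)
```

Sorting the endpoints avoids a second pass that would otherwise be needed to pair each inbound session with its outbound counterpart.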

Further, in step 2, the IP address, port, and protocol fields in the session traffic are replaced with all zeros.
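
The masking in step 2 could look like the following Python sketch. The record layout and the field names are hypothetical; the source only states that the IP, port, and protocol fields are overwritten with zeros while the rest of the session is kept.

```python
# Hypothetical names of the five-tuple fields in a session record.
MASKED_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "proto")


def mask_five_tuple(session_record):
    """Return a copy of the record with every five-tuple field replaced by 0."""
    masked = dict(session_record)
    for field in MASKED_FIELDS:
        if field in masked:
            masked[field] = 0
    return masked


# Illustrative record: only the five-tuple fields change, other data is kept.
record = {
    "src_ip": "10.0.0.1",
    "dst_ip": "10.0.0.2",
    "src_port": 50000,
    "dst_port": 443,
    "proto": 6,
    "packet_lengths": [517, 1460],
}
masked = mask_five_tuple(record)
```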

Further, in step 3, five elements of the ClientHello and ServerHello messages are selected as the message load features: TLSVersion (protocol version), Ciphers (supported cipher suites), Extensions (extension fields), EllipticCurves (elliptic curve types), and EllipticCurvePointFormat (elliptic curve point format). The combined data of the five elements forms a fingerprint array specific to the session, X_{payload} = [x₁, x₂, x₃, x₄, x₅], where

x₁: the code of the protocol version,

x₂: the codes of all supported cipher suites,

x₃: the codes of all extension fields,

x₄: the codes of the elliptic curve types,

x₅: the code of the elliptic curve point format.
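
As an illustration, the five elements can be collected from an already parsed hello message as in the Python sketch below. The dictionary keys are hypothetical, the parsing itself is assumed to be done elsewhere, and the source does not specify how each element is encoded; here each element simply keeps the numeric codes in the order they appear.

```python
def payload_fingerprint(hello):
    """Build [x1, x2, x3, x4, x5] from a parsed hello message (hypothetical dict layout)."""
    return [
        hello["tls_version"],              # x1: protocol version code
        tuple(hello["ciphers"]),           # x2: codes of all supported cipher suites
        tuple(hello["extensions"]),        # x3: codes of all extension fields
        tuple(hello["elliptic_curves"]),   # x4: codes of the elliptic curve types
        tuple(hello["ec_point_formats"]),  # x5: code(s) of the point format
    ]


# Illustrative values only, not taken from the source.
client_hello = {
    "tls_version": 0x0303,
    "ciphers": [0xC02F, 0xC030, 0x009C],
    "extensions": [0, 10, 11, 13],
    "elliptic_curves": [23, 24],
    "ec_point_formats": [0],
}
X = payload_fingerprint(client_hello)
```

Before being fed to a classifier, such an array would still need to be mapped to numeric feature values and normalized; that mapping is left open here because the source does not describe it.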

Further, in step 4, the packet length, the packet inter-arrival time, and the byte distribution are extracted as the flow fingerprint features.

Further, for the packet length, the lengths of all packets in each session are discretized into equal-size bins of N bytes: packets with lengths in [0, N) bytes are placed in the first bin, packets with lengths in [N, 2N) bytes in the second bin, and so on. A matrix A is then constructed in which each element A[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin. Finally, each row of A is normalized, so that the rows give the transition probabilities of a Markov chain, and A is used as the packet length feature of the session.
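
The construction of A can be sketched in Python as follows. The number of bins and the clamping of oversized packets into the last bin are assumptions (the source does not say how many bins are used), and N = 150 is only an example value.

```python
def transition_matrix(values, bin_size, num_bins):
    """Row-normalized matrix of transitions between consecutive binned values."""
    # Map each value to its bin; values beyond the last bin are clamped into it.
    bins = [min(int(v // bin_size), num_bins - 1) for v in values]
    counts = [[0] * num_bins for _ in range(num_bins)]
    # Count transitions between each packet and the next one.
    for i, j in zip(bins, bins[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Illustrative packet lengths in bytes for one session.
lengths = [100, 200, 120, 180, 400]
A = transition_matrix(lengths, bin_size=150, num_bins=3)
```

With these lengths the bin sequence is 0, 1, 0, 1, 2, so the first row of A puts all its probability on the second bin and the second row splits evenly between the first and third bins.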

Further, for the packet inter-arrival time, the inter-arrival times of all packets in each session are discretized into equal-size bins of T milliseconds: packets with inter-arrival times in [0, T) milliseconds are placed in the first bin, packets with inter-arrival times in [T, 2T) milliseconds in the second bin, and so on. A matrix B is then constructed in which each element B[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin. Finally, each row of B is normalized, so that the rows give the transition probabilities of a Markov chain, and B is used as the packet inter-arrival time feature of the session.
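
For the inter-arrival time, the only extra step compared with the packet length is turning timestamps into gaps before binning. A small Python sketch, assuming timestamps in seconds and using T = 50 as an example value (the function name is hypothetical):

```python
def interarrival_bins(timestamps, window_ms):
    """Bin index of each gap between consecutive packet timestamps (given in seconds)."""
    gaps_ms = [(t1 - t0) * 1000.0 for t0, t1 in zip(timestamps, timestamps[1:])]
    return [int(gap // window_ms) for gap in gaps_ms]


# Illustrative capture times; the gaps are roughly 10 ms, 70 ms, and 120 ms.
timestamps = [0.0, 0.01, 0.08, 0.2]
bins = interarrival_bins(timestamps, window_ms=50)
```

The resulting bin sequence is then counted and row-normalized in the same way as for the packet length to obtain B.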

Further, the byte distribution is an array of length 256 that counts the occurrences of each byte value in the payloads of the packets in the flow; dividing each count by the total number of bytes in the packet payloads gives the probability of each byte value. The byte distributions of different applications provide a great deal of information about how the application data is encoded. In addition, the byte distribution can indicate the proportion of the whole flow taken up by SSL/TLS handshake packets, the byte composition of the handshake information, and whether any poorly implemented padding has been added.

Further, the packet length data, the packet inter-arrival time data, and the byte distribution data are combined into the flow fingerprint feature specific to the flow.
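
A short Python sketch of the byte distribution, assuming the packet payloads are available as bytes objects (how they are obtained from the capture is outside this example):

```python
def byte_distribution(payloads):
    """Relative frequency of each of the 256 byte values over all payloads of a flow."""
    counts = [0] * 256
    for payload in payloads:
        for value in payload:  # iterating over bytes yields integers 0..255
            counts[value] += 1
    total = sum(counts)
    # An empty flow yields an all-zero distribution instead of dividing by zero.
    return [c / total if total else 0.0 for c in counts]


# Illustrative payloads, seven bytes in total.
payloads = [b"\x16\x03\x01", b"\x16\x03\x03\x00"]
C = byte_distribution(payloads)
```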

The invention provides an encrypted malicious flow detection system based on two-dimensional characteristics, which comprises:

the SSL/TLS flow extraction module is used for capturing flow data from a network;

a bidirectional streaming module for executing the step 1;

a quintuple feature fuzzification processing module for executing the step 2;

the message load characteristic extraction module executes the step 3;

the stream fingerprint feature extraction module executes the step 4;

a logistic regression analysis module for executing the step 5-6;

and the classification result output module outputs a classification result of the flow.

The beneficial effects of the method are as follows: it addresses the drop in detection accuracy for encrypted malicious traffic that is caused by unstable five-tuple features under large-scale, complex network conditions. Experimental results show that, without relying on the five-tuple, the method achieves a detection accuracy of 97.86 percent on SSL/TLS-encrypted malicious traffic in a complex network environment, which is 34.45 percent higher than that of the traditional method.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a diagram of the steps of a detection method according to one embodiment of the present application.

FIG. 2 is a block diagram of a detection system according to an embodiment of the present application.

FIG. 3 is a flowchart of an SSL/TLS protocol referenced when extracting a message payload according to an embodiment of the present application.

FIG. 4 shows the specific values of the evaluation metrics obtained in a single network environment and in a complex network environment when testing with a known data set in an embodiment of the present application.

FIG. 5 is a two-dimensional graph of different session flows in a single network environment, tested with known data sets, in one embodiment of the present application.

FIG. 6 is a two-dimensional graph of different session flows in a complex network environment, tested with a known data set, according to an embodiment of the present application.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

In a complex network environment with diverse five-tuple information, treating the five-tuple information, which changes frequently for malicious traffic, as an important feature degrades the identification accuracy of the model. Yet if the five-tuple features are removed and the existing methods are applied again to detect encrypted malicious traffic, the recognition rate drops sharply.

Therefore, through data preprocessing, the features of encrypted traffic are divided into two dimensions: message load and flow fingerprint. With the five-tuple information masked, the position of each flow is described by its message load and flow fingerprint, and training and detection are carried out with a logistic regression machine learning model.

Example one:

As shown in FIG. 1, in one embodiment of the detection method proposed by the invention, a logistic regression model is trained on a known offline data set, and the trained model is then tested on known offline data with different content to measure its detection accuracy.

The specific steps of the training process are as follows:

step 1: offline PCAP files that have been identified as malicious traffic are used as the malicious part of the training data set; the malicious traffic covers encrypted communication of malicious behaviors such as scanning and probing, brute-force cracking, and C&C communication (for example Neris, Rbot, Virut, Menti, Sogou, Murlo, and NSIS.ay); offline PCAP files that have been identified as benign traffic are used as the benign part of the training data set; all data sets are encrypted SSL/TLS traffic;

step 2: respectively carrying out data preprocessing on the two parts of data sets;

step 2 comprises the following substeps:

step 2.1: splitting the traffic according to the five-tuple of source IP address, destination IP address, source port, destination port, and protocol;

step 2.2: merging the packets with the same five-tuple into a session;

step 2.3: merging into one bidirectional flow those sessions for which the source IP address of the inbound flow equals the destination IP address of the outbound flow, the destination IP address of the inbound flow equals the source IP address of the outbound flow, the source port of the inbound flow equals the destination port of the outbound flow, the destination port of the inbound flow equals the source port of the outbound flow, and the two flows use the same protocol;

step 2.4: masking the five-tuple information of all sessions, that is, replacing the source IP address, destination IP address, source port, destination port, and protocol with hexadecimal 00 while leaving the rest of the session information unchanged;

step 3: extracting the message load features of the session traffic, comprising: TLSVersion (protocol version), Ciphers (supported cipher suites), Extensions (extension fields), EllipticCurves (elliptic curve types), and EllipticCurvePointFormat (elliptic curve point format). The data of the five elements is combined into a fingerprint array specific to the session:

X_{payload} = [x₁, x₂, x₃, x₄, x₅]

where x₁: the code of the protocol version; x₂: the codes of all supported cipher suites; x₃: the codes of all extension fields; x₄: the codes of the elliptic curve types; x₅: the code of the elliptic curve point format.

step 4: extracting the flow fingerprint features of the session traffic, comprising the packet length, the packet inter-arrival time, and the byte distribution data, which provides information on how the application data is encoded;

wherein the packet length and the packet inter-arrival time are each modeled as a Markov chain;

for the packet length, the lengths of all packets in each session are discretized into equal-size bins of 150 bytes: packets with lengths in [0, 150) bytes are placed in the first bin, packets with lengths in [150, 300) bytes in the second bin, and so on;

a matrix A is then constructed in which each element A[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin;

finally, each row of A is normalized, so that the rows give the transition probabilities of a Markov chain, and A is used as the packet length feature of the session, namely A_{packet length};

for the packet inter-arrival time, the inter-arrival times of all packets in each session are discretized into equal-size bins of 50 milliseconds: packets with inter-arrival times in [0, 50) milliseconds are placed in the first bin, packets with inter-arrival times in [50, 100) milliseconds in the second bin, and so on;

a matrix B is then constructed in which each element B[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin;

finally, each row of B is normalized, so that the rows give the transition probabilities of a Markov chain, and B is used as the packet inter-arrival time feature of the session, namely B_{inter-arrival time};

Byte distribution: the byte distribution is an array of length 256 that counts the occurrences of each byte value in the payloads of the packets in the flow. Dividing each count by the total number of bytes in the packet payloads gives the probability of each byte value. The byte distributions of different applications provide a great deal of information about how the application data is encoded; this feature is denoted C_{byte distribution};

The combined data above is taken as the flow fingerprint feature specific to the session:

Y_{flow} = [A_{packet length}, B_{inter-arrival time}, C_{byte distribution}];
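
One way to turn Y into a model input is to flatten the two matrices and the byte distribution into a single vector and then scale each feature across sessions. The Python sketch below does this with min-max scaling, which is an assumption: the source says the features are normalized but does not name the method. The function names and the tiny example matrices are hypothetical (a real C has 256 entries).

```python
def flatten_fingerprint(A, B, C):
    """Concatenate the rows of A, the rows of B, and C into one flat vector."""
    flat = [v for row in A for v in row]
    flat += [v for row in B for v in row]
    flat += list(C)
    return flat


def min_max_scale(rows):
    """Scale each feature column across sessions to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


# Two illustrative sessions with 2x2 matrices and a shortened byte distribution.
Y1 = flatten_fingerprint([[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0], [0.0, 0.0]], [0.25, 0.75])
Y2 = flatten_fingerprint([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]], [0.5, 0.5])
scaled = min_max_scale([Y1, Y2])
```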

step 5: after the message load features and the flow fingerprint features are normalized, the session flows are labeled and input into a logistic regression machine learning model; the logistic regression machine learning model is configured as follows:

the regularization type is 'l1', the tolerance for the iteration stopping criterion is '1e-4', the inverse of the regularization strength is '1.0', the weight of every class is '1', the solver is 'liblinear', the number of iterations is '100', and the loss function is 'Sigmoid';
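
The listed settings resemble the keyword arguments of scikit-learn's LogisticRegression, although the source does not name the library. The Python sketch below shows one hypothetical mapping; the keyword names are an assumption and may differ between library versions. The sigmoid has no keyword of its own because logistic regression applies it by construction.

```python
# Hypothetical mapping of the stated settings onto scikit-learn keyword names.
LOGREG_PARAMS = {
    "penalty": "l1",        # regularization type
    "tol": 1e-4,            # tolerance for the stopping criterion
    "C": 1.0,               # inverse of the regularization strength
    "class_weight": None,   # None leaves every class with weight 1
    "solver": "liblinear",  # optimization algorithm
    "max_iter": 100,        # maximum number of iterations
}


def build_model(params=LOGREG_PARAMS):
    """Create the classifier; requires scikit-learn to be installed."""
    from sklearn.linear_model import LogisticRegression

    return LogisticRegression(**params)
```

Calling build_model().fit(features, labels) on the normalized feature vectors and their malicious or benign labels would then train the classifier.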

Step 6: the training process is completed and the model is saved.

Next, testing the model stored after training, specifically including the following steps:

step 1: offline PCAP files that have been identified as malicious traffic are used as the malicious part of the test data set; the malicious traffic covers encrypted communication of malicious behaviors such as scanning and probing, brute-force cracking, and C&C communication (for example Neris, Rbot, Virut, Menti, Sogou, Murlo, and NSIS.ay). Offline PCAP files that have been identified as benign traffic are used as the benign part of the test data set; all data sets are encrypted SSL/TLS traffic;

steps 2-4 are the same as training steps 2-4, respectively;

step 5: after the message load features and the flow fingerprint features are normalized, they are input into the trained logistic regression model to obtain the model's classification output, which is compared with the actual classification to measure the accuracy of the model.

Example two:

As shown in FIG. 2, one embodiment of the detection system provided by the invention comprises the following modules: an SSL/TLS flow extraction module, a bidirectional streaming module, a quintuple feature fuzzification processing module, a message load feature extraction module, a flow fingerprint feature extraction module, a logistic regression classifier module, and a classification result output module. The detection system is trained and tested, and the processing of each module is as follows:

SSL/TLS traffic extraction module: the test traffic is replayed at the server network card using tcpreplay, and packets are captured at the server network card using tshark, with only SSL/TLS traffic being captured.

A bidirectional streaming module: the traffic is split according to the five-tuple of source IP address, destination IP address, source port, destination port, and protocol. Packets with the same five-tuple are then merged into a session.

A quintuple feature fuzzification processing module: the five-tuple information of all sessions is masked, that is, the source IP address, destination IP address, source port, destination port, and protocol are replaced with hexadecimal 00 while the rest of the session information is left unchanged.

A message load characteristic extraction module: the SSL/TLS handshake proceeds as shown in FIG. 3, and its principle is as follows:

After a TLS session is initiated, the client sends a ClientHello packet to the server; how this packet is generated depends on the software package and method used to build the client application. If the connection is accepted, the server creates a ServerHello packet in response, based on its server-side library, its configuration, and the details in the ClientHello message. The server then sends Certificate, ServerKeyExchange, and ServerHelloDone, which completes the server's hello messages. On receiving these messages, the client uses the public key in the Certificate to exchange the session key via ClientKeyExchange, then sends ChangeCipherSpec to indicate that all messages it sends from this point on are encrypted, and ends with Finished. On receiving these messages, the server sends messages of the same kind as confirmation. Application data is then sent and received according to the previously negotiated SSL protocol parameters;

The messages in the handshake negotiation stage are plaintext, whereas the content of the application data transmission stage is ciphertext. The details in the Hello packets can therefore be used to fingerprint the client application at the message content level. The message load features of the session traffic are extracted, comprising: TLSVersion (protocol version), Ciphers (supported cipher suites), Extensions (extension fields), EllipticCurves (elliptic curve types), and EllipticCurvePointFormat (elliptic curve point format);

The combined data of the five elements forms a fingerprint array specific to the session:

X_{payload} = [x₁, x₂, x₃, x₄, x₅]

where x₁: the code of the protocol version; x₂: the codes of all supported cipher suites; x₃: the codes of all extension fields; x₄: the codes of the elliptic curve types; x₅: the code of the elliptic curve point format.

The flow fingerprint feature extraction module: extracts the flow fingerprint features of the session traffic, comprising:

Packet length and packet inter-arrival time: modeled as Markov chains. The values of packet length and inter-arrival time are discretized into equal-size bins. For packet length, a bin size of 150 bytes is used: lengths in [0, 150) go into the first bin, lengths in [150, 300) into the second bin, and so on. A matrix A is then constructed in which each element A[i, j] gives the transition probability between the i-th bin and the j-th bin. Finally, the rows of A are normalized to ensure a proper Markov chain, and A is used as the packet length feature, namely A_{packet length}. The packet inter-arrival times are processed in the same way to give B_{inter-arrival time};

Byte distribution: the byte distribution is an array of length 256 that counts the occurrences of each byte value in the payloads of the packets in the flow. Dividing each count by the total number of bytes in the packet payloads gives the probability of each byte value. The byte distributions of different applications provide a great deal of information about how the application data is encoded; this feature is denoted C_{byte distribution};

Y_{flow} = [A_{packet length}, B_{inter-arrival time}, C_{byte distribution}].

A logistic regression analysis module: after the message load features and the flow fingerprint features are normalized, the session flows are labeled and input into a logistic regression machine learning model. The logistic regression machine learning model is configured as follows:

A classification result output module: when the model is tested, its performance needs to be evaluated. The evaluation in the invention is based on four categories of outcome: true positive (TP), false positive (FP), true negative (TN), and false negative (FN);

Accuracy (denoted A) is defined as the ratio of the number of correctly classified samples to the total number of samples, i.e.

A = (TP + TN) / (TP + TN + FP + FN)

In addition, the invention also uses the precision rate and the recall rate as evaluation indexes; precision and recall represent the ability of the classifier to work on each category; the accuracy reflects the overall performance of the classifier. F1-measure (denoted as F1) is an evaluation index combining precision and recall.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 · (Precision · Recall) / (Precision + Recall)
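
The evaluation metrics can be computed directly from the four counts, as in this Python sketch. The counts used in the example are made up for illustration and are not results from the source; the function assumes the denominators are non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```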

The Czech Technical University (CTU) data set is used to test the method and system. The SSL traffic generated by C&C communication in the CTU-13 data set is extracted, giving 0.698 GB of SSL traffic in total, and the normal traffic data set is 0.76 GB. The sizes of the positive and negative data sets thus keep the training data balanced.

Comparative experiments were performed on the data sets in both single and complex network environments, resulting in the following tables and figures 4-6:

In this embodiment, all features of the traffic are grouped into message load features and flow fingerprint features, so that each flow is characterized in two dimensions, and the classifier model is trained on the features of these two dimensions. The resulting identification accuracy exceeds 97% in both the single network environment and the complex network environment, and the F1-measure also exceeds 97%.

Example three:

One embodiment of the detection system provided by the invention comprises the following modules: an SSL/TLS flow extraction module, a bidirectional streaming module, a quintuple feature fuzzification processing module, a message load feature extraction module, a flow fingerprint feature extraction module, a logistic regression classifier module, and a classification result output module. The detection system is used for real-time traffic detection, and the processing of each module is as follows:

SSL/TLS traffic extraction module: packets are captured at the server network card using tshark, capturing real-time SSL/TLS traffic.

A bidirectional streaming module: same as the module of the same name in the second embodiment;

a quintuple feature fuzzification processing module: same as the module of the same name in the second embodiment;

a message load characteristic extraction module: same as the module of the same name in the second embodiment;

a logistic regression classifier module: same as the module of the same name in the second embodiment;

a classification result output module: outputs the judgment on the traffic and forms the final traffic classification result, which covers encrypted communication of malicious behaviors such as scanning and probing, brute-force cracking, and C&C communication (for example Neris, Rbot, Virut, Menti, Sogou, Murlo, and NSIS.ay).

In normal traffic in a complex network environment, the SSL/TLS communications come from many different websites with different SSL certificates, so TLSVersion, Ciphers, Extensions, EllipticCurves, and EllipticCurvePointFormat vary widely, and the normalized values are spread across the range 0 to 1. Malicious traffic, by contrast, cannot obtain legitimate SSL certificates through regular channels and can only use older SSL/TLS protocol versions, with few supported cipher suites and extensions, so its normalized values fall within a limited region.

Through encrypted traffic preprocessing and the masking of IP addresses and ports, the embodiments of the invention group the selected features into message load features and flow fingerprint features and describe each flow in two dimensions to meet the training requirements of the model. The results of the embodiments show that, in traditional encrypted traffic research, five-tuple features, which gradually lose effectiveness as malicious behavior evolves, carry a large share of the total classification weight, which lowers the model's identification accuracy on traffic from complex network environments. The proposed identification method based on message load and flow fingerprint features solves this problem, and because the model only needs to be trained on traffic from a single network environment, it generalizes more widely.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A method for detecting encrypted malicious traffic based on two-dimensional features is characterized in that message load features and flow fingerprint features of monitored encrypted traffic are extracted, and malicious traffic is identified on the basis of the message load features and the flow fingerprint features.

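2. The encrypted malicious traffic detection method based on the two-dimensional features as claimed in claim 1, characterized by comprising the following steps:

step 1, merging data packets with the same five-tuple into bidirectional session flows;

step 2, masking the five-tuple features;

step 3, extracting the message load features of the session traffic;

step 4, extracting the flow fingerprint features of the session traffic;

step 5, integrating and standardizing the features extracted in steps 3-4;

step 6, classifying malicious traffic with a logistic regression machine learning model.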

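3. The encrypted malicious traffic detection method based on the two-dimensional features as claimed in claim 2, wherein the step 1 comprises:

step 1.1, merging packets with the same five-tuple into a session, wherein the five-tuple consists of the source IP address, destination IP address, source port, destination port, and protocol;

step 1.2, merging into one bidirectional flow those sessions for which the source IP address of the inbound flow equals the destination IP address of the outbound flow, the destination IP address of the inbound flow equals the source IP address of the outbound flow, the source port of the inbound flow equals the destination port of the outbound flow, the destination port of the inbound flow equals the source port of the outbound flow, and the two flows use the same protocol.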

4. The method according to claim 2, wherein in step 2, the IP address, port, and protocol fields in the session traffic are replaced with all zeros.

5. The method according to claim 2, wherein in step 3, five elements of the ClientHello and ServerHello messages are selected as the message load features: TLSVersion (protocol version), Ciphers (supported cipher suites), Extensions (extension fields), EllipticCurves (elliptic curve types), and EllipticCurvePointFormat (elliptic curve point format); the combined data of the five elements forms a fingerprint array specific to the session, X_{payload} = [x₁, x₂, x₃, x₄, x₅], where

x₁: the code of the protocol version,

x₂: the codes of all supported cipher suites,

x₃: the codes of all extension fields,

x₄: the codes of the elliptic curve types,

x₅: the code of the elliptic curve point format.

6. The encrypted malicious traffic detection method based on the two-dimensional features as claimed in claim 2, wherein in the step 4, the packet length and the packet inter-arrival time, and byte distribution data are extracted as the stream fingerprint features.

7. The encrypted malicious traffic detection method based on the two-dimensional features of claim 6, wherein for the packet length, the lengths of all packets in each session are discretized into equal-size bins of N bytes, packets with lengths in [0, N) bytes being placed in the 1st bin, packets with lengths in [N, 2N) bytes in the 2nd bin, and so on; a matrix A is then constructed in which each element A[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin; and finally each row of A is normalized, so that the rows give the transition probabilities of a Markov chain, and A is used as the packet length feature of the session.

8. The encrypted malicious traffic detection method based on the two-dimensional features of claim 6, wherein for the packet inter-arrival time, the inter-arrival times of all packets in each session are discretized into equal-size bins of T milliseconds, packets with inter-arrival times in [0, T) milliseconds being placed in the 1st bin, packets with inter-arrival times in [T, 2T) milliseconds in the 2nd bin, and so on; a matrix B is then constructed in which each element B[i, j] counts the number of transitions from a packet in the i-th bin to a packet in the j-th bin; and finally each row of B is normalized, so that the rows give the transition probabilities of a Markov chain, and B is used as the packet inter-arrival time feature of the session.

9. The method of claim 6, wherein the byte distribution is a 256-length array that counts each byte value in the payload of each packet in the stream; this count is divided by the total number of bytes found in the packet payload to obtain the probability of each byte value occurring.

10. An encrypted malicious traffic detection system based on two-dimensional features, comprising:

an SSL/TLS flow extraction module for capturing traffic data from a network;

a bidirectional streaming module to perform step 1 as recited in any one of claims 2-9;

a quintuple feature fuzzification processing module for executing the step 2 according to any one of claims 2 to 9;

a message load characteristic extraction module for executing the step 3 according to any one of claims 2-9;

a stream fingerprint feature extraction module performing step 4 as claimed in any one of claims 2-9;

a logistic regression analysis module performing steps 5-6 as claimed in any one of claims 2-9;

and a classification result output module that outputs the classification result of the traffic.