CN110113338A - An encrypted traffic feature extraction method based on feature fusion - Google Patents

An encrypted traffic feature extraction method based on feature fusion

Info

Publication number
CN110113338A
Authority
CN
China
Prior art keywords
burst
feature
data packet
plen
ptime
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910379472.5A
Other languages
Chinese (zh)
Other versions
CN110113338B (en)
Inventor
沈蒙
张晋鹏
祝烈煌
陈偲祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201910379472.5A
Publication of CN110113338A
Application granted
Publication of CN110113338B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Algebra (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to an encrypted traffic feature extraction method based on feature fusion, and belongs to the technical fields of machine learning, network service security, and traffic identification. The method comprises the following steps: step 1, extract the feature values of different dimensions of the encrypted packets in an encrypted flow; step 2, compute the feature contribution degrees and normalize them, then perform feature selection based on the contribution degrees to determine the optimal number n of features participating in fusion, and select the top n features as the optimal features participating in fusion; step 3, group the features of different dimensions based on the optimal fusion feature count n, use a kernel function to raise the dimension of and fuse the optimal features selected in step 2, and output the final feature set participating in classification. The method characterizes the fingerprint of encrypted network traffic more faithfully; it captures the relationships between different features; it quickly determines the number of features participating in fusion, which improves the efficiency of feature fusion; and it achieves higher accuracy.

Description

An encrypted traffic feature extraction method based on feature fusion
Technical field
The present invention relates to an encrypted traffic feature extraction method based on feature fusion, and in particular to raising the dimension of and fusing traffic features of different dimensions, with the aim of providing reliable high-dimensional features for identifying encrypted traffic. It belongs to the technical fields of machine learning, network service security, and traffic identification.
Background
Traffic is the carrier of information transmitted over a network. To protect user privacy, existing network transport protocols transmit data in encrypted form. Analyzing and identifying encrypted network traffic can give network service providers a basis for formulating routing policies and improving the data distribution efficiency of critical transit nodes, which in turn improves the experience of network users. Existing encrypted traffic identification methods depend on network traffic features of a single dimension, such as packet length, packet flag bit information, or packet timing information. Features of a single dimension are of limited help in identifying encrypted traffic; fusing features of different dimensions can improve the classification of encrypted network traffic.
Existing traffic identification methods fall into two categories: plaintext traffic identification and encrypted traffic identification. The main techniques used in plaintext traffic identification are deep packet inspection and port detection. As encryption and port-hopping techniques have come into use, the packets exchanged during network communication are encrypted, and deep packet inspection and port detection have gradually lost their effectiveness. Current research therefore concentrates on encrypted traffic identification.
For classifying and identifying the network traffic of encrypted applications, the two most closely related documents that can be retrieved are the following:
(1) Existing document A proposes an encrypted network flow identification method based on Markov chains. The method uses the flag bit information of SSL/TLS-encrypted packets to build Markov fingerprints of different encrypted applications. When classifying the encrypted traffic of an unknown application, it computes the probability that the unknown application belongs to each of the known applications and decides its class by maximum likelihood. Because the set of flag bit states used to build the Markov fingerprint is finite, the fingerprints of different encrypted applications can be very similar, and partially overlapping fingerprints occur from time to time, which lowers the accuracy of this kind of method in identifying encrypted applications.
(2) Existing document B proposes an encrypted traffic identification method based on packet length features. The method uses the packet length statistics of each encrypted flow, such as the minimum, maximum, median, and mean, 54 statistical feature values in total, to build the fingerprints of different encrypted applications, and then identifies encrypted traffic with a random forest classifier. As the volume and variety of traffic to be classified increase, the classification accuracy of this kind of method drops sharply.
In summary, existing encrypted traffic classification methods build the fingerprint of an encrypted application from features of a single dimension. As the number of applications grows, fingerprints built from single-dimension features can hardly provide enough discriminating information, which reduces the classification accuracy for encrypted applications.
Summary of the invention
The object of the present invention is to overcome the technical defects of existing encrypted traffic feature extraction methods, namely the small number of features and their weak representational power, and to provide highly discriminative traffic features for identifying encrypted applications, thereby assisting the traffic classification of encrypted applications. By extracting and fusing features of different dimensions of the traffic and using the fused features for classification to improve the classification results, the invention proposes an encrypted traffic feature extraction method based on feature fusion.
An encrypted traffic feature extraction method based on feature fusion comprises the following steps:
Step 1: extract the feature values of different dimensions of the encrypted packets in an encrypted flow;
Specifically, an encrypted flow containing i packets is defined by a five-tuple and denoted flow = [pkt_1, …, pkt_i], where the five-tuple consists of the source port, destination port, source IP, destination IP, and transport protocol, and pkt_i denotes the i-th packet;
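The flow definition above can be illustrated with a minimal Python sketch that groups captured packets by five-tuple. The `Packet` record, its field names, and the choice to map both directions of a connection to the same flow are assumptions made for illustration; the source only states that a flow is defined by the five-tuple.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record; the field names are assumptions for illustration.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto length time")

def flow_key(pkt):
    """Return a direction-independent five-tuple key for a packet."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Sort the two endpoints so that both directions map to the same flow
    # (an assumption; the source does not spell this out).
    lo, hi = sorted([a, b])
    return (lo, hi, pkt.proto)

def split_flows(packets):
    """Group packets into flows, keeping capture order inside each flow."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Each value of the returned dictionary then plays the role of flow = [pkt_1, …, pkt_i].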
The feature values of the different dimensions of the packets comprise packet length statistical feature values, packet time information statistical feature values, and packet Burst behavior statistical feature values;
Step 1 comprises the following sub-steps:
Step 1.1: compute the length statistical feature values of the captured packets;
The packet length statistical feature values comprise the packet length statistical feature values in three directions;
Each direction has 19 statistical feature values, so the three directions give 57 packet length statistical feature values in total, denoted Plen = [[plen_1], …, [plen_57]];
The statistical feature values in each direction comprise the minimum Lminimum, maximum Lmaximum, mean Lmean, median absolute deviation Lmedian_absolute_deviation, standard deviation Lstandard_deviation, variance Lvar, skewness Lskew, kurtosis Lkurtosis, the percentiles Lpercentiles10%, Lpercentiles20%, Lpercentiles30%, Lpercentiles40%, Lpercentiles50%, Lpercentiles60%, Lpercentiles70%, Lpercentiles80%, and Lpercentiles90%, the number of packets in the sequence Lnumbers, and the sum of packet lengths Lsum;
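As a rough illustration of step 1.1, the following standard-library Python sketch computes the 19 values for one direction. The linear-interpolation percentile, the population variance, the excess-kurtosis convention, and the all-zero result for an empty direction are assumptions; the source does not state which estimators are used.

```python
import math

def percentile(sorted_vals, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def length_stats(lengths):
    """Return the 19 statistics listed in step 1.1 for one direction."""
    if not lengths:
        return [0.0] * 19  # assumption: an empty direction yields zeros
    s = sorted(lengths)
    n = len(s)
    mean = sum(s) / n
    median = percentile(s, 50)
    mad = percentile(sorted(abs(v - median) for v in s), 50)
    var = sum((v - mean) ** 2 for v in s) / n  # population variance
    std = math.sqrt(var)
    # Standardized third and fourth moments; zero when all values are equal.
    skew = sum((v - mean) ** 3 for v in s) / n / std ** 3 if std else 0.0
    kurt = sum((v - mean) ** 4 for v in s) / n / std ** 4 - 3 if std else 0.0
    pcts = [percentile(s, q) for q in range(10, 100, 10)]
    return [s[0], s[-1], mean, mad, std, var, skew, kurt, *pcts, n, sum(s)]
```

Calling `length_stats` on three packet-length sequences, one per direction, and concatenating the results gives the 57 values of Plen. The source does not name the three directions; treating them as incoming, outgoing, and combined would be an assumption.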
Step 1.2: compute the time information statistical feature values of the captured packets;
The packet time information statistical feature values comprise the packet time information statistical feature values in three directions;
Each direction has 18 statistical feature values, so the three directions give 54 packet time information statistical feature values in total, denoted Ptime = [[ptime_1], …, [ptime_54]];
The statistical feature values in each direction comprise the minimum Tminimum, maximum Tmaximum, mean Tmean, median absolute deviation Tmedian_absolute_deviation, standard deviation Tstandard_deviation, variance Tvar, skewness Tskew, kurtosis Tkurtosis, the percentiles Tpercentiles10%, Tpercentiles20%, Tpercentiles30%, Tpercentiles40%, Tpercentiles50%, Tpercentiles60%, Tpercentiles70%, Tpercentiles80%, and Tpercentiles90%, and the number of elements in the sequence Tnumbers;
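The source does not say how the timing sequences of step 1.2 are built. One plausible reading, shown here purely as an assumption, is to take inter-arrival times between consecutive packets in each of three directions (outgoing, incoming, and both); the 18 statistics are then computed over each sequence.

```python
def inter_arrival_times(timestamps):
    """Differences between consecutive packet timestamps, in seconds."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def timing_sequences(packets):
    """Build three timing sequences from (timestamp, is_outgoing) pairs.

    Assumption: the three directions are outgoing, incoming, and both.
    """
    out_ts = [t for t, outgoing in packets if outgoing]
    in_ts = [t for t, outgoing in packets if not outgoing]
    all_ts = [t for t, _ in packets]
    return {
        "outgoing": inter_arrival_times(out_ts),
        "incoming": inter_arrival_times(in_ts),
        "both": inter_arrival_times(all_ts),
    }
```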
Step 1.3: compute the Burst behavior statistical feature values of the captured packets;
A Burst is a run of packets sent consecutively in the same direction within a flow;
The Burst behavior statistical feature values cover Burst Size and Burst Length, where Burst Size is the number of packets in a Burst and Burst Length is the sum of the lengths of all packets in a Burst;
Burst Size and Burst Length are each considered in the Ingress Burst direction and the Egress Burst direction; the statistical feature values of these four combinations total 72 and are denoted PBurst = [[burst_1], …, [burst_72]];
The statistical feature values of each combination comprise the minimum Bminimum, maximum Bmaximum, mean Bmean, median absolute deviation Bmedian_absolute_deviation, standard deviation Bstandard_deviation, variance Bvariance, skewness Bskew, kurtosis Bkurtosis, the percentiles Bpercentiles10%, Bpercentiles20%, Bpercentiles30%, Bpercentiles40%, Bpercentiles50%, Bpercentiles60%, Bpercentiles70%, Bpercentiles80%, and Bpercentiles90%, and the number of elements in the sequence Bnumbers, 18 in total. The Burst Size and Burst Length of all Bursts, taken over the Ingress Burst and Egress Burst directions, therefore give 72 statistical feature values;
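The Burst definition in step 1.3 can be sketched as follows. The input layout, a list of (direction, length) pairs in capture order, and the direction labels "ingress" and "egress" are assumptions made for illustration.

```python
from itertools import groupby

def extract_bursts(packets):
    """Split a flow into Bursts of consecutive same-direction packets.

    `packets` is a list of (direction, length) pairs in capture order,
    where direction is "ingress" or "egress" (labels assumed here).
    Returns per-direction lists of Burst Size and Burst Length.
    """
    result = {
        "ingress": {"size": [], "length": []},
        "egress": {"size": [], "length": []},
    }
    for direction, group in groupby(packets, key=lambda p: p[0]):
        lengths = [length for _, length in group]
        result[direction]["size"].append(len(lengths))    # packets in the Burst
        result[direction]["length"].append(sum(lengths))  # total length of the Burst
    return result
```

Computing the 18 statistics over each of the four resulting sequences (ingress size, ingress length, egress size, egress length) gives the 72 values of PBurst.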
Step 2: compute the feature contribution degrees and normalize them, then perform feature selection based on the contribution degrees to determine the optimal number n of features participating in fusion, and select the top n features as the optimal features participating in fusion. This specifically comprises the following sub-steps:
Step 2.1: compute the feature contribution degrees;
The feature contribution degree VIM_i of each feature is computed using the Gini index in a random forest;
Here, each feature is one of the features in Plen = [[plen_1], …, [plen_57]], Ptime = [[ptime_1], …, [ptime_54]], and PBurst = [[burst_1], …, [burst_72]] computed in steps 1.1, 1.2, and 1.3;
Here, i denotes the i-th feature; i ranges from 1 to c, where c = 183 is the sum of 57, 54, and 72, the numbers of Plen, Ptime, and PBurst features respectively;
Step 2.2: normalize the feature contribution degrees VIM_j computed in step 2.1 according to formula (1):
where c is the total number of features and VIM_i is the feature contribution degree of the i-th feature;
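Formula (1) is not reproduced in this text. A normalization consistent with the surrounding definitions divides each contribution degree by the sum of all c contribution degrees; the sketch below assumes exactly that and also ranks the features in descending order, as the next step requires. The feature names used as dictionary keys are illustrative.

```python
def normalize_contributions(vim):
    """Scale raw contribution degrees so that they sum to 1.

    Assumption: formula (1) is VIM_j divided by the sum of all VIM_i.
    """
    total = sum(vim.values())
    if total == 0:
        raise ValueError("all contribution degrees are zero")
    return {name: value / total for name, value in vim.items()}

def rank_features(vim_norm):
    """Feature names sorted by contribution degree, largest first."""
    return sorted(vim_norm, key=vim_norm.get, reverse=True)
```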
Step 2.3: compute the feature selection criterion value CFC;
The feature contribution degrees obtained in step 2.2 are sorted in descending order, and the feature selection criterion value CFC of each feature is computed according to formula (2):
where CFC_j is the feature selection criterion value of the j-th feature; j ranges from 1 to c, with c = 183;
Step 2.4: from the CFC values computed in step 2.3, plot the CFC value against the feature index j, find the inflection point in the plot, and denote the j corresponding to this inflection point as n; this n is the optimal number of features participating in fusion;
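The source reads the inflection point off a plot and does not specify an algorithm. One common heuristic, offered here only as an assumption, picks the point of the curve farthest from the straight line joining its first and last points. The function takes an already computed list of CFC values as input, since formula (2) is not reproduced in this text.

```python
def find_knee(values):
    """Return the 1-based index of the point farthest from the chord
    joining the first and last points of the curve."""
    n = len(values)
    if n < 3:
        return n
    x0, y0, x1, y1 = 1, values[0], n, values[-1]
    dx, dy = x1 - x0, y1 - y0
    best_j, best_dist = 1, -1.0
    for j, y in enumerate(values, start=1):
        # Unnormalized perpendicular distance from (j, y) to the chord.
        dist = abs(dy * (j - x0) - dx * (y - y0))
        if dist > best_dist:
            best_j, best_dist = j, dist
    return best_j
```

The result depends on the relative scale of the two axes, so rescaling both to [0, 1] before calling the function may be advisable.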
Step 3: group the features of different dimensions based on the optimal fusion feature count n, use a kernel function to raise the dimension of and fuse the optimal features selected in step 2, and output the final feature set participating in classification;
Step 3 specifically comprises the following sub-steps:
Step 3.1: group the features of different dimensions according to the optimal fusion feature count n obtained in step 2;
The features of different dimensions comprise packet length features, packet time information features, and packet Burst behavior features, numbering i, j, and k respectively. The grouped packet length features are denoted Plen = [[plen_1], …, [plen_i]], the packet time information features are denoted Ptime = [[ptime_1], …, [ptime_j]], and the packet Burst behavior features are denoted Burst = [[burst_1], …, [burst_k]];
Plen = [[plen_1], …, [plen_i]] updates and replaces the packet length statistical features Plen = [[plen_1], …, [plen_57]] of step 1; Ptime = [[ptime_1], …, [ptime_j]] updates and replaces Ptime = [[ptime_1], …, [ptime_54]] of step 1; and Burst = [[burst_1], …, [burst_k]] updates and replaces the packet Burst behavior statistical features PBurst = [[burst_1], …, [burst_72]] of step 1;
Step 3.2: fuse the features of each single dimension using a kernel function, that is, raise the dimension of the single-dimension features. Specifically, let x denote the features of any one dimension in the feature set f = [Plen, Ptime, Burst]; first compute the transpose x' of x according to (3), where x is an n*1 matrix and x' is a 1*n matrix;
x' = x^T (3)
Raise the dimension of the features using the radial basis kernel function (4):
where K(x, x') is an n*n matrix and δ ∈ (0, 1);
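Formula (4) is not reproduced in this text. The radial basis kernel is commonly written as shown below; whether the source uses exactly this scaling of δ is an assumption. Applied entry by entry to the n*1 vector x and its 1*n transpose x', it yields an n*n matrix whose (a, b) entry compares the a-th and b-th feature values.

```latex
K(x, x')_{ab} = \exp\!\left(-\frac{(x_a - x_b)^2}{2\delta^2}\right),
\qquad a, b = 1, \dots, n, \quad \delta \in (0, 1)
```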
After step 3.2, Plen, Ptime, and Burst, whose feature counts were i, j, and k, have i^2, j^2, and k^2 features respectively;
Step 3.3: fuse the i^2, j^2, and k^2 features obtained after the dimension raising of step 3.2. Specifically, traverse the dimension-raised Plen, Ptime, and Burst in turn, add their elements to the Feature matrix, and return Feature as the final feature set participating in classification.
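Steps 3.2 and 3.3 can be sketched in a few lines of Python. The kernel form exp(-(a - b)^2 / (2δ^2)) and the default δ = 0.5 are assumptions, since the text gives only δ ∈ (0, 1); the function names are illustrative.

```python
import math

def rbf_expand(x, delta=0.5):
    """Map n feature values to an n*n matrix of pairwise RBF similarities."""
    return [[math.exp(-((a - b) ** 2) / (2 * delta ** 2)) for b in x] for a in x]

def fuse(groups, delta=0.5):
    """Expand each dimension's features and concatenate the flattened results."""
    feature = []
    for x in groups:  # e.g. [plen, ptime, burst]
        for row in rbf_expand(x, delta):
            feature.extend(row)
    return feature
```

For groups of sizes i, j, and k, `fuse` returns i^2 + j^2 + k^2 values, which matches the feature counts stated after step 3.2.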
Beneficial effects
The invention proposes an encrypted traffic feature extraction method based on feature fusion. Compared with existing encrypted traffic feature extraction methods, it has the following beneficial effects:
1. The invention introduces the statistics of packet length, packet time information, and packet Burst behavior in a network flow, extracting the feature set of encrypted network traffic from multiple dimensions, and can therefore characterize the fingerprint of encrypted network traffic more faithfully;
2. The invention uses a radial basis kernel function to increase the number of features and to capture the relationships between different features;
3. The invention designs a method for determining the optimal number of fusion features, embodied in step 2.3. Selecting the features that are to participate in fusion reduces the interference of useless features, quickly determines the number of features participating in fusion, and improves the efficiency of feature fusion;
4. Extensive experimental data show that, compared with existing encrypted network traffic classification and identification methods, a classifier using the fused features achieves higher accuracy.
Brief description of the drawings
Fig. 1 is the overall flowchart of the encrypted traffic feature extraction method based on feature fusion of the present invention;
Fig. 2 is a schematic diagram of the Burst behavior in step 1 of the encrypted traffic feature extraction method based on feature fusion of the present invention;
Fig. 3 is a schematic diagram of how the CFC value changes with the number of features in step 2 of the encrypted traffic feature extraction method based on feature fusion of the present invention.
Detailed description of the embodiments
The process of the "encrypted traffic feature extraction method based on feature fusion" of the present invention is further described below with reference to the drawings and embodiments, and its advantages are explained. It should be noted that the implementation of the invention is not limited to the following embodiments; any adaptation or change made to the invention falls within its scope of protection.
Embodiment 1
This embodiment is a complete simulation of encrypted traffic feature extraction based on steps 1 to 3 of the invention. The overall flowchart is shown in Fig. 1. Dataset Collection is the data acquisition stage, in which the traffic of websites that transmit data over encrypted protocols, such as Taobao and JD.com, can be collected. Feature extraction is then performed, followed by feature selection and feature fusion, and finally the fused features are used by machine learning classifiers for classification. Features of different dimensions are extracted, and their dimension is raised with a radial basis kernel function to obtain the final feature set participating in classification.
Traffic transmitted over encrypted protocols by Taobao, JD.com, NetEase Cloud, Amazon, Alipay, WeChat, and others is collected and split into flows by five-tuple. Specifically:
First, the statistical feature values of packet length, packet time information, and packet Burst behavior are extracted; the detailed process is shown in Fig. 1. Suppose a captured data flow is expressed as F = (p_1, …, p_n). The packet length statistical features Plen = [[plen_1], …, [plen_57]], the packet time information statistical features Ptime = [[ptime_1], …, [ptime_54]], and the packet Burst behavior statistical features Burst = [[burst_1], …, [burst_72]] of this flow are extracted. The Burst behavior is illustrated in Fig. 2: the Bursts in a flow occur in two directions, Ingress Burst and Egress Burst; Burst Size is the number of packets in a Burst, and Burst Length is the sum of the packet lengths in a Burst.
The contribution degrees of these features are computed using the Gini index in a random forest; the contribution degrees of some of the features are shown in Table 1. The CFC value as a function of the number of features is computed from the contribution degrees and the rank of each feature after sorting, and is plotted in Fig. 3. The inflection point in the plot is selected as the optimal number of fusion features; in this example, 120 is selected as the optimal number of fusion features.
Table 1 Feature contribution degrees of selected features
Feature    Contribution degree    Feature     Contribution degree
plen_18    0.030011               burst_11    0.016430
plen_38    0.027685               plen_35     0.015731
plen_55    0.025450               burst_17    0.015577
plen_47    0.018072               plen_33     0.015150
plen_34    0.017442               plen_40     0.014951
plen_42    0.016791               burst_16    0.014811
The selected features are then dimension-raised and fused according to the method in step 3, and the fused features are used for traffic classification.
Embodiment 2
In this embodiment, the traffic features extracted by the method of the invention are used with a machine learning classifier and compared with other classifiers that use single-dimension features, in order to verify the advantages and effectiveness of the invention. The encrypted traffic feature extraction method based on feature fusion is combined with a conventional machine learning algorithm, random forest, to form the classifier of this method, denoted FFP.
The methods compared against are a Markov classifier that uses only packet flag bits as features (MARK) and a random forest classifier that uses only packet length as features (APPS). The comparison metrics are the classifier's accuracy (Accuracy) and F1-Score, where F1-Score is an evaluation criterion that takes both precision (Precision) and recall (Recall) into account. The comparison results are shown in Table 2.
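For reference, the two metrics can be computed as in the following sketch. The source does not say how F1-Score is aggregated across classes; macro averaging, the unweighted mean of per-class F1, is assumed here.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```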
Table 2 Comparison of classification performance with state-of-the-art traffic classification models
Classification method    MARK      APPS      FFP
Accuracy                 0.5879    0.8080    0.9181
F1-Score                 0.5665    0.7977    0.9175
Table 2 shows that the invention has a clear advantage over the existing traffic classification methods: both its accuracy and its F1-Score are higher than those of the other two classification algorithms. The invention extracts good traffic features from traffic encrypted with cryptographic protocols, helps improve classification accuracy in encrypted traffic classification, and can be put into practical use.
Although the embodiments of this patent are described herein with reference to the accompanying examples, those skilled in the art can make several improvements without departing from the principle of this patent, and such improvements shall also be regarded as falling within the scope of protection of this patent.

Claims (7)

1. An encrypted traffic feature extraction method based on feature fusion, characterized by comprising the following steps:
Step 1: extract the feature values of different dimensions of the encrypted packets in an encrypted flow;
Specifically, an encrypted flow containing i packets is defined by a five-tuple and denoted flow = [pkt_1, …, pkt_i], where pkt_i denotes the i-th packet;
The feature values of the different dimensions of the packets comprise packet length statistical feature values, packet time information statistical feature values, and packet Burst behavior statistical feature values;
Step 1 comprises the following sub-steps:
Step 1.1: compute the length statistical feature values of the captured packets;
The packet length statistical feature values comprise the packet length statistical feature values in three directions;
Each direction has 19 statistical feature values, so the three directions give 57 packet length statistical feature values in total, denoted Plen = [[plen_1], …, [plen_57]];
Step 1.2: compute the time information statistical feature values of the captured packets;
The packet time information statistical feature values comprise the packet time information statistical feature values in three directions;
Each direction has 18 statistical feature values, so the three directions give 54 packet time information statistical feature values in total, denoted Ptime = [[ptime_1], …, [ptime_54]];
Step 1.3: compute the Burst behavior statistical feature values of the captured packets;
A Burst is a run of packets sent consecutively in the same direction within a flow;
The Burst behavior statistical feature values cover Burst Size and Burst Length, where Burst Size is the number of packets in a Burst and Burst Length is the sum of the lengths of all packets in a Burst;
The Burst Size and Burst Length of the Bursts, taken over the Ingress Burst direction and the Egress Burst direction, give 72 statistical feature values in total, denoted PBurst = [[burst_1], …, [burst_72]];
Step 2: compute the feature contribution degrees and normalize them, then perform feature selection based on the contribution degrees to determine the optimal number n of features participating in fusion, and select the top n features as the optimal features participating in fusion. This specifically comprises the following sub-steps:
Step 2.1: compute the feature contribution degrees;
The feature contribution degree VIM_i of each feature is computed using the Gini index in a random forest;
Here, i denotes the i-th feature; i ranges from 1 to c, where c = 183 is the sum of 57, 54, and 72, the numbers of Plen, Ptime, and PBurst features respectively;
Step 2.2: normalize the feature contribution degrees VIM_j computed in step 2.1 according to formula (1):
where c is the total number of features and VIM_i is the feature contribution degree of the i-th feature;
Step 2.3: compute the feature selection criterion value CFC;
The feature contribution degrees obtained in step 2.2 are sorted in descending order, and the feature selection criterion value CFC of each feature is computed according to formula (2):
where CFC_j is the feature selection criterion value of the j-th feature; j ranges from 1 to c, with c = 183;
Step 2.4: from the CFC values computed in step 2.3, plot the CFC value against the feature index j, find the inflection point in the plot, and denote the j corresponding to this inflection point as n; this n is the optimal number of features participating in fusion;
Step 3: group the features of different dimensions based on the optimal fusion feature count n, use a kernel function to raise the dimension of and fuse the optimal features selected in step 2, and output the final feature set participating in classification;
Step 3 specifically comprises the following sub-steps:
Step 3.1: group the features of different dimensions according to the optimal fusion feature count n obtained in step 2;
The features of different dimensions comprise packet length features, packet time information features, and packet Burst behavior features, numbering i, j, and k respectively. The grouped packet length features are denoted Plen = [[plen_1], …, [plen_i]], the packet time information features are denoted Ptime = [[ptime_1], …, [ptime_j]], and the packet Burst behavior features are denoted Burst = [[burst_1], …, [burst_k]];
Step 3.2: fuse the features of each single dimension using a kernel function, that is, raise the dimension of the single-dimension features. Specifically, let x denote the features of any one dimension in the feature set f = [Plen, Ptime, Burst]; first compute the transpose x' of x according to (3), where x is an n*1 matrix and x' is a 1*n matrix;
x' = x^T (3)
Raise the dimension of the features using the radial basis kernel function (4):
where K(x, x') is an n*n matrix and δ ∈ (0, 1);
After step 3.2, Plen, Ptime, and Burst, whose feature counts were i, j, and k, have i^2, j^2, and k^2 features respectively;
Step 3.3: fuse the i^2, j^2, and k^2 features obtained after the dimension raising of step 3.2. Specifically, traverse the dimension-raised Plen, Ptime, and Burst in turn, add their elements to the Feature matrix, and return Feature as the final feature set participating in classification.
2. a kind of encryption traffic characteristic extracting method based on Fusion Features according to claim 1, it is characterised in that: step Five-tuple in rapid 1 refers to source port, destination port, source IP, destination IP and transport protocol.
3. a kind of encryption traffic characteristic extracting method based on Fusion Features according to claim 1, it is characterised in that: step The statistical characteristics in each direction includes minimum value Lminimum, maximum value Lmaximum, average value Lmean, middle position in rapid 1.1 Number absolute deviation Lmedian_absolute_deviation, standard deviation Lstandard deviation, variance Lvar, slope Lskew, kurtosis Lkurtosis, percentile Lpercentiles10%, Lpercentiles20%, Lpercentiles30%, Lpercentiles40%, Lpercentiles50%, Lpercentiles60%, Lpercentiles70%, Lpercentiles80%, Lpercentiles90%, the data packet number Lnumbers in sequence With the sum of data packet length Lsum.
4. The encrypted traffic feature extraction method based on feature fusion according to claim 1, characterized in that: the statistical features for each direction in step 1.2 comprise the minimum Tminimum, maximum Tmaximum, mean Tmean, median absolute deviation Tmedian_absolute_deviation, standard deviation Tstandard_deviation, variance Tvar, skewness Tskew, kurtosis Tkurtosis, the percentiles Tpercentiles10%, Tpercentiles20%, Tpercentiles30%, Tpercentiles40%, Tpercentiles50%, Tpercentiles60%, Tpercentiles70%, Tpercentiles80% and Tpercentiles90%, and the number of elements in the sequence Tnumbers.
5. The encrypted traffic feature extraction method based on feature fusion according to claim 1, characterized in that: the Burst Size and Burst Length described in step 1.3 each consider the statistical features of the Ingress Burst direction and the Egress Burst direction, and the statistical features for each of the four directions comprise the minimum Bminimum, maximum Bmaximum, mean Bmean, median absolute deviation Bmedian_absolute_deviation, standard deviation Bstandard_deviation, variance Bvariance, skewness Bskew, kurtosis Bkurtosis, the percentiles Bpercentiles10%, Bpercentiles20%, Bpercentiles30%, Bpercentiles40%, Bpercentiles50%, Bpercentiles60%, Bpercentiles70%, Bpercentiles80% and Bpercentiles90%, and the number of elements in the sequence Bnumbers, 18 in total.
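The per-direction statistics listed in claims 3 to 5 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the claims do not specify the estimators, so the sketch uses population variance, standardised third and fourth moments for skewness and kurtosis (set to 0 for a constant sequence), and linearly interpolated percentiles. The names `percentile` and `sequence_stats` are hypothetical.

```python
import math


def percentile(sorted_vals, q):
    """q-th percentile (0..100) of a sorted list, linear interpolation between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def sequence_stats(values, include_sum=False):
    """Return the 18 statistics of a sequence (19 when include_sum is True)."""
    if not values:
        raise ValueError("sequence must not be empty")
    s = sorted(values)
    n = len(s)
    mean = sum(s) / n
    median = percentile(s, 50)
    mad = percentile(sorted(abs(v - median) for v in s), 50)
    var = sum((v - mean) ** 2 for v in s) / n  # population variance (assumption)
    std = math.sqrt(var)
    # Standardised moments; defined here as 0 when the sequence is constant.
    skew = sum((v - mean) ** 3 for v in s) / (n * std ** 3) if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in s) / (n * std ** 4) if std > 0 else 0.0
    stats = [s[0], s[-1], mean, mad, std, var, skew, kurt]
    stats += [percentile(s, q) for q in range(10, 100, 10)]  # 10% .. 90%
    stats.append(n)                                          # element count
    if include_sum:
        stats.append(sum(s))                                 # e.g. Lsum
    return stats


lengths = [1, 2, 3, 4, 5]
print(len(sequence_stats(lengths)), len(sequence_stats(lengths, include_sum=True)))
```

Under these assumptions, 19 statistics over three length sequences give the 57 packet-length features, 18 over three timing sequences give the 54 timing features, and 18 over four Burst sequences give the 72 Burst features referred to in claim 6.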
6. The encrypted traffic feature extraction method based on feature fusion according to claim 1, characterized in that: in step 2.1, each kind of feature refers to one of Plen = [[plen_1], …, [plen_57]], Ptime = [[ptime_1], …, [ptime_54]] and PBurst = [[burst_1], …, [burst_72]] computed in step 1.1, step 1.2 and step 1.3.
7. The encrypted traffic feature extraction method based on feature fusion according to claim 1, characterized in that: Plen = [[plen_1], …, [plen_i]] in step 3.1 updates and replaces the packet-length statistical features Plen = [[plen_1], …, [plen_57]] from step 1; Ptime = [[ptime_1], …, [ptime_j]] updates and replaces Ptime = [[ptime_1], …, [ptime_54]] from step 1; and Burst = [[burst_1], …, [burst_k]] updates and replaces the packet Burst behavior statistical features PBurst = [[burst_1], …, [burst_72]] from step 1.
CN201910379472.5A 2019-05-08 2019-05-08 Encrypted flow characteristic extraction method based on characteristic fusion Active CN110113338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910379472.5A CN110113338B (en) 2019-05-08 2019-05-08 Encrypted flow characteristic extraction method based on characteristic fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910379472.5A CN110113338B (en) 2019-05-08 2019-05-08 Encrypted flow characteristic extraction method based on characteristic fusion

Publications (2)

Publication Number Publication Date
CN110113338A true CN110113338A (en) 2019-08-09
CN110113338B CN110113338B (en) 2020-06-26

Family

ID=67488756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910379472.5A Active CN110113338B (en) 2019-05-08 2019-05-08 Encrypted flow characteristic extraction method based on characteristic fusion

Country Status (1)

Country Link
CN (1) CN110113338B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104135385A (en) * 2014-07-30 2014-11-05 南京市公安局 Method of application classification in Tor anonymous communication flow
US20180260705A1 (en) * 2017-03-05 2018-09-13 Verint Systems Ltd. System and method for applying transfer learning to identification of user actions
CN108650194A (en) * 2018-05-14 2018-10-12 南开大学 Net flow assorted method based on K_means and KNN blending algorithms
CN109194657A (en) * 2018-09-11 2019-01-11 北京理工大学 A kind of encrypting web traffic characteristic extracting method based on accumulation data packet length
CN109286576A (en) * 2018-10-10 2019-01-29 北京理工大学 A kind of network agent encryption traffic characteristic extracting method of data packet frequency analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KHALED AL-NAAMI et al.: "Adaptive encrypted traffic fingerprinting with bi-directional dependence", 《ACSAC'16: PROCEEDINGS OF THE 32ND ANNUAL CONFERENCE ON COMPUTER SECURITY》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751222A (en) * 2019-10-25 2020-02-04 中国科学技术大学 Online encrypted traffic classification method based on CNN and LSTM
CN110958233A (en) * 2019-11-22 2020-04-03 上海交通大学 Encryption type malicious flow detection system and method based on deep learning
CN110958233B (en) * 2019-11-22 2021-08-20 上海交通大学 Encryption type malicious flow detection system and method based on deep learning
CN111526100A (en) * 2020-04-16 2020-08-11 中南大学 Cross-network traffic identification method and device based on dynamic identification and path hiding
CN111526100B (en) * 2020-04-16 2021-08-24 中南大学 Cross-network traffic identification method and device based on dynamic identification and path hiding
CN112001452A (en) * 2020-08-27 2020-11-27 深圳前海微众银行股份有限公司 Feature selection method, device, equipment and readable storage medium
CN112001452B (en) * 2020-08-27 2021-08-27 深圳前海微众银行股份有限公司 Feature selection method, device, equipment and readable storage medium
CN114363061A (en) * 2021-12-31 2022-04-15 深信服科技股份有限公司 Abnormal flow detection method, system, storage medium and terminal
CN116016365A (en) * 2023-01-06 2023-04-25 哈尔滨工业大学 Webpage identification method based on data packet length information under encrypted flow
CN116016365B (en) * 2023-01-06 2023-09-19 哈尔滨工业大学 Webpage identification method based on data packet length information under encrypted flow

Also Published As

Publication number Publication date
CN110113338B (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN110113338A (en) A kind of encryption traffic characteristic extracting method based on Fusion Features
CN111340191B (en) Bot network malicious traffic classification method and system based on ensemble learning
CN112235264B (en) Network traffic identification method and device based on deep migration learning
CN108768986B (en) Encrypted traffic classification method, server and computer readable storage medium
Gogoi et al. MLH-IDS: a multi-level hybrid intrusion detection method
CN104244035B (en) Network video stream sorting technique based on multi-level clustering
CN104135385B (en) Method of application classification in Tor anonymous communication flow
Wang et al. A deep hierarchical network for packet-level malicious traffic detection
CN105871619B (en) A kind of flow load type detection method based on n-gram multiple features
CN113364787B (en) Botnet flow detection method based on parallel neural network
Ahn et al. Explaining deep learning-based traffic classification using a genetic algorithm
CN110958233B (en) Encryption type malicious flow detection system and method based on deep learning
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
Niu et al. A heuristic statistical testing based approach for encrypted network traffic identification
Liu et al. A distance-based method for building an encrypted malware traffic identification framework
CN109286576A (en) A kind of network agent encryption traffic characteristic extracting method of data packet frequency analysis
Dowoo et al. PcapGAN: Packet capture file generator by style-based generative adversarial networks
Lu et al. A heuristic-based co-clustering algorithm for the internet traffic classification
CN108123962A (en) A kind of method that BFS algorithms generation attack graph is realized using Spark
Zheng et al. Two-layer detection framework with a high accuracy and efficiency for a malware family over the TLS protocol
Chung et al. An effective similarity metric for application traffic classification
CN113254743B (en) Security semantic perception searching method for dynamic spatial data in Internet of vehicles
CN107124410A (en) Network safety situation feature clustering method based on machine deep learning
CN106557983A (en) A kind of microblogging junk user detection method based on fuzzy multiclass SVM
Lu et al. Cascaded classifier for improving traffic classification accuracy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant