CN110417729A - Service and application classification method and system for encrypted traffic - Google Patents
- Publication number
- CN110417729A (application CN201910504060.XA)
- Authority
- CN
- China
- Prior art keywords
- flow
- traffic
- data packet
- session
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
Abstract
The invention discloses a service and application classification method and system for encrypted traffic. The method comprises the following steps: 1) splitting the continuous traffic to be processed into multiple session traffic units at session granularity; 2) splitting each session traffic unit at packet granularity into multiple traffic groups, where the number of packets in each traffic group does not exceed a set maximum; 3) unifying the size of each traffic group, converting each traffic group into a traffic matrix, and packaging the traffic matrices and their labels into IDX traffic files; 4) training a CapsNet model on the IDX traffic files to obtain a recognition model with automatic feature selection capability; 5) for encrypted traffic to be identified, splitting it and converting it into traffic matrices, then feeding them to the recognition model to obtain the service type and application category to which the traffic belongs. The invention can classify encrypted traffic effectively.
Description
Technical field
The invention proposes a service and application classification method for encrypted traffic. It introduces a novel two-stage traffic splitting mechanism and combines it with a capsule neural network (CapsNet) to classify encrypted traffic effectively. The invention covers the conversion of raw traffic, CapsNet-based model training and the classification of encrypted traffic, and belongs to the technical field at the intersection of network security and computer science.
Background
In recent years, with the continuous development of Internet and information technology, network traffic has grown explosively. According to the Visual Networking Index forecast published by Cisco, the IP traffic carried on public and private networks (including managed IP traffic, consumer-generated mobile data traffic and Internet traffic) averaged 122 EB per month worldwide in 2017 (1 EB = 2^20 TB), and global IP traffic is forecast to roughly triple by 2022, reaching 396 EB per month. At the same time, users' changing demands on the network keep producing new services. While these services bring convenience to users, they also increase the heterogeneity and complexity of the network, which poses unprecedented challenges to network security.
Network security has become one of the core problems facing the Internet. In recent years, malicious network behavior such as information leakage, illegal intrusion and DDoS attacks has increasingly affected how users use the Internet, and as technology advances, the traffic characteristics of network attacks are becoming more complex and better concealed. According to the Identity Theft Resource Center, nearly 34.2 million records had been stolen in 2018 as of September. According to the 13th annual infrastructure security report from Arbor Networks, peak DDoS attack volume reached 1.7 Tbps in the first half of 2018, an increase of 179% over the first half of 2017, and by 2022 the total number of DDoS attacks worldwide is expected to double relative to 2017, reaching 14.5 million. Network administrators need to classify and identify network traffic quickly and accurately in order to locate abnormal behavior in the network, cut off the propagation paths of malicious intrusions in time, and minimize the harm and loss they cause to users. Traffic identification can also uncover unknown and disguised Webshells, reconstruct the whole attack process from the Kill Chain perspective, and support in-depth analysis and profiling of attackers, attack tools and attack methods.
Network traffic classification and identification runs through every module of security situation awareness and is an essential part of it. A large number of traffic classification and identification techniques have been proposed. They fall mainly into port-based, deep packet inspection (DPI)-based, statistics-based and behavior-based techniques.
These techniques work well for traditional network applications. However, since the exposure of the PRISM surveillance program, the volume of encrypted traffic worldwide has risen sharply. Sandvine's 2018 annual report shows that more than 50% of Internet traffic is encrypted, and the share will continue to grow. To evade detection by firewalls and antivirus software, most malware encrypts its traffic to hide its communications. Now that traffic encryption has become standard practice for almost all network applications, malware included, encrypted traffic identification will become an important means of detecting security threats when the content cannot be read. It analyzes key information such as network behavior and process behavior, reconstructs the attack process through Kill Chain analysis, and gives security administrators advice on handling threats.
Although traffic identification research has produced many results, most existing work targets unencrypted traffic. In practice, traditional techniques are no longer suitable for identifying encrypted traffic. With the emergence of P2P applications and the widespread use of dynamic port numbers, identifying traffic by port number is no longer effective, and the development of port obfuscation limits it further. Because encrypted traffic hides payload features, it cannot be identified by deep packet inspection, and encapsulation techniques such as tunneling restrict deep packet inspection further. In addition, deep packet inspection works by analyzing application-layer data, which raises the problem of invading user privacy. The lack of effective techniques for analyzing and managing encrypted traffic poses a great challenge to network management and security.
Many machine learning methods for identifying encrypted traffic now exist, but traditional machine learning requires manual feature extraction, and its classification accuracy depends heavily on feature selection. This limits the scalability of these methods and prevents real-time classification. Deep learning is an effective way to remove manual feature extraction: it extracts features from the input data automatically, without human intervention, by building models that simulate the human brain to interpret data. Applying it to identify encrypted traffic on the Internet is a new approach.
Deep learning algorithms used for encrypted traffic identification mainly include the multilayer perceptron (MLP), the stacked autoencoder (SAE) and the one-dimensional convolutional neural network (1dCNN). Comparisons by many researchers show that deep learning algorithms achieve higher identification accuracy than traditional machine learning algorithms, and that among the deep learning algorithms 1dCNN achieves the best encrypted traffic identification results.
However, 1dCNN treats features as position-independent: during identification it considers only whether a feature is present, not its position or other attributes. We believe that the position of specific strings within the traffic and the ordering of packets are also features worth considering. Moreover, in encrypted traffic identification the encoded traffic files are not equivalent to image files, so the pooling operation of a CNN no longer suits them. Whether maximum or minimum pooling is used, some information is discarded and the valid features behind the encoded strings are altered.
Summary of the invention
In view of the technical problems in the prior art, the invention proposes a service and application classification method and system for encrypted traffic. The invention is an encrypted traffic classification model that combines a capsule neural network (CapsNet) with a two-stage splitting mechanism, and is named SPCaps. The model classifies encrypted traffic effectively. The invention classifies encrypted traffic under two scenarios: service classification, which classifies encrypted traffic by service type, for example web browsing, streaming media or instant messaging; and application classification, which classifies encrypted traffic by the application it belongs to, for example Skype, BitTorrent or YouTube.
In data preprocessing, the invention proposes a novel two-stage traffic splitting mechanism and develops a preprocessing tool set that integrates the EditCap tool, the SplitCap tool, PowerShell scripts and Python scripts. Its purpose is to dilute the proportion of irrelevant traffic while increasing the weight of effective traffic. The invention then trains the model with the CapsNet algorithm. In encrypted traffic classification, CapsNet makes up for the drawbacks of 1dCNN in the following respects: 1) When CapsNet learns the spatial features of encrypted traffic, its inputs and outputs are vectors rather than the scalars of a traditional neural network. In the invention, the length of a vector represents the probability that the traffic belongs to a category, and the direction of the vector represents attributes of the category, including the fixed positions of specific strings in the traffic and the ordering of packets. 2) CapsNet does not use the pooling operation of convolutional neural networks. Pooling reduces connection parameters and refines features but also discards some necessary information, so dropping it makes CapsNet better suited to encoded files such as traffic. 3) At the same identification accuracy, CapsNet identifies faster than CNN and is therefore better suited to traffic identification in real-time environments.
To achieve the above purpose, the invention adopts the following technical solution:
A method for identifying encrypted traffic, comprising the following steps:
1) First split at session granularity: a traffic classification method based on deep learning first needs to split continuous traffic into multiple discrete units at some granularity. Network traffic can be split in five ways: by TCP connection, flow, session, service or host. Flows and sessions are the representations most used in current research. The invention therefore performs the first split of the raw traffic to be processed at session granularity. A session is the set of packets that make up a bidirectional flow, that is, packets sharing the same five-tuple (source IP, source port, destination IP, destination port, transport-layer protocol), where source and destination may be interchanged.
2) Clean the encrypted traffic: in traffic classification, the MAC addresses of the data link layer and the IP addresses of the network layer (source IP, destination IP) cannot serve as classification features. If the traffic was captured in a limited environment, MAC and IP addresses can influence training and cause the classifier to overfit, so we delete the MAC address and IP address fields from the packets.
3) Second split at packet granularity: traffic collected from a real network environment contains packets irrelevant to classification, which directly affects the training and testing of the model. We therefore continue to split the traffic produced by step 2) by setting a maximum number of packets per unit. Because the sessions being split mostly consist of normal communication, this step dilutes the proportion of irrelevant traffic in the raw traffic on the one hand and increases the weight of effective traffic on the other.
4) Normalize the input form of the encrypted traffic: training a neural network requires inputs of fixed size, so we unify the traffic files produced by the steps above to a fixed number of bytes. If a traffic file is larger than the fixed size, the bytes beyond it are deleted; if it is smaller, it is padded with 00 up to the fixed size. Finally, we convert the processed traffic into traffic matrices and package the traffic matrix samples and their labels into IDX files. IDX is the standard input file format used by many CapsNet and CNN models.
5) CapsNet-based model training: using the IDX traffic files produced by the steps above, a CapsNet-based model learns the spatial features of encrypted traffic through convolution and dynamic routing. This yields an efficient recognition model with automatic feature selection capability that can identify encrypted traffic and classify it effectively by service type and application type.
6) Encrypted traffic identification: the model trained by the steps above identifies and classifies encrypted traffic. The invention achieves effective encrypted traffic classification in the following scenarios: 1) service classification, which identifies the service type to which encrypted traffic belongs; 2) application classification, which identifies the specific application to which encrypted traffic belongs.
The invention also provides a service and application classification system for encrypted traffic, characterized in that it comprises a traffic preprocessing module, a model training module and an encrypted traffic identification module, wherein:
the traffic preprocessing module splits the continuous traffic to be processed into multiple session traffic units at session granularity; then splits each session traffic unit at packet granularity into multiple traffic groups, where the number of packets in each traffic group does not exceed a set maximum; then unifies the size of each traffic group, converts each traffic group into a traffic matrix, and packages the traffic matrices and their labels into IDX traffic files;
the model training module trains a CapsNet model on the IDX traffic files to obtain a recognition model with automatic feature selection capability;
the encrypted traffic identification module feeds the traffic matrices of the encrypted traffic to be identified to the recognition model and obtains the service type and application category to which the traffic belongs.
Compared with the prior art, the invention has the following positive effects:
1. The invention proposes a CapsNet-based encrypted traffic identification model that can learn the specific positions of fixed encodings in the traffic and the ordering between packets as features.
2. The invention proposes a two-stage traffic splitting mechanism that dilutes the proportion of irrelevant traffic and increases the weight of effective traffic, so that it fixes the traffic representation and denoises the traffic effectively at the same time.
3. The invention evaluates the SPCaps model on the publicly available ISCX VPN-nonVPN dataset. The experimental results show that SPCaps outperforms state-of-the-art identification methods on the encrypted traffic service and application identification tasks.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention.
Fig. 2 is a schematic diagram of the two-stage traffic splitting mechanism of the invention.
Fig. 3 is a schematic diagram of the CapsNet-based model framework of the invention.
Fig. 4 shows the size distribution of the raw traffic in the ISCX VPN-nonVPN dataset at session granularity.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the invention, and to make the objects, features and advantages of the invention clearer, the technical core of the invention is described in further detail below with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The invention provides a service and application classification method for encrypted traffic. The general idea is to split, clean and normalize encrypted traffic from a real environment with the preprocessing tool set, which dilutes the proportion of irrelevant traffic and increases the weight of effective traffic, then to build a CapsNet-based model that learns the spatial features of encrypted traffic, and finally to achieve effective identification of encrypted traffic and its classification by service and application.
The overall flow of the invention is shown in Fig. 1. The steps of the method are described in detail as follows:
(1) Conversion of raw traffic
In the data preprocessing stage, to reduce the noise in the raw traffic and normalize its input form, we convert the raw traffic in the following five steps: Pcap-Sessions splitting, deletion of MAC and IP addresses, Session-Packets splitting, unifying the input size, and conversion to IDX.
1) Pcap-Sessions splitting: a traffic identification method based on deep learning needs continuous traffic split into discrete units at a specific granularity. The raw traffic P is a set of packets, written $P = \{p_1, \ldots, p_{|P|}\}$, where a packet $p_i$ is defined as:
$$p_i = (x_i, b_i, t_i) \tag{1}$$
where $i = 1, 2, \ldots, |P|$, $b_i \in (0, \infty)$ and $t_i \in [0, \infty)$; $x_i$ is the five-tuple of the i-th packet (source IP, source port, destination IP, destination port, transport-layer protocol), $b_i$ is its length in bytes, and $t_i$ is its start time. The raw traffic is first split at session granularity. A session $S_i$ is the set of packets of a bidirectional flow sharing the same five-tuple, defined as:
$$S_i = \{p_1 = (x_1, b_1, t_1), \ldots, p_n = (x_n, b_n, t_n)\} \tag{2}$$
where $x_1 = \cdots = x_n$, $t_1 < \cdots < t_n$, and n is the number of packets in $S_i$. This step is implemented with the SplitCap tool.
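The session definition above can be illustrated with a minimal Python sketch. It is not the SplitCap implementation; the `Packet` type, `session_key` and `split_into_sessions` are hypothetical names introduced here. The sketch assumes packets are already parsed into five-tuple, length and timestamp, and treats the two directions of a flow as one session by ordering the endpoints.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet: five-tuple x_i, byte length b_i, start time t_i."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    length: int
    timestamp: float


def session_key(p: Packet):
    """Direction-independent key: both directions of a flow map to the same key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def split_into_sessions(packets):
    """Group packets into sessions and sort each session by start time."""
    sessions = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)
    for pkts in sessions.values():
        pkts.sort(key=lambda p: p.timestamp)
    return dict(sessions)
```

Sorting by timestamp reflects the condition $t_1 < \cdots < t_n$ in formula (2).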
2) Deleting MAC and IP addresses: MAC and IP addresses cannot serve as features during training. On the contrary, their presence easily causes the model to overfit. We therefore delete the MAC and IP addresses by discarding the bytes at the corresponding positions in each packet. This step is implemented with the EditCap tool.
3) Session-Packets splitting: network traffic captured in a real environment contains a large number of very small sessions. These are usually irrelevant to traffic identification, for example SNMP, DNS and ARP data, and they seriously hamper effective identification. The larger sessions carry the main activity of the communication and contain only a few irrelevant packets. We therefore propose Session-Packets splitting, which dilutes the proportion of irrelevant traffic and increases the weight of effective traffic. It continues to split each session (each discrete unit) by setting a maximum number of packets, producing multiple traffic groups for each session, each containing at most the set maximum number of packets. Let G denote the new traffic groups produced by Session-Packets splitting, defined as:
$$G_{ij} = \{p_1 = (x_1, b_1, t_1), \ldots, p_m = (x_m, b_m, t_m)\}, \quad m \le C \tag{3}$$
where $G_{ij}$ is the j-th new traffic group of the i-th session $S_i$, m is the number of packets in $G_{ij}$, and C is the maximum number of packets, defined as:
$$C = \left\lfloor \frac{L_{sample} - L_{header}}{L_{packet}} \right\rfloor \tag{4}$$
where $L_{sample}$ is the byte length of the file storing a traffic group, $L_{header}$ is the byte length of that file's header, and $L_{packet}$ is the byte length of a packet's data. Before conversion to IDX, every traffic group is stored as a .pcap file, which besides the traffic data contains a file header describing the file (as .txt, .jpg and other files do). The file header takes 112 bytes. C is the same for all discrete units and is set to 16 here: traffic groups are unified to 784 bytes, the minimum packet length after deleting the MAC and IP addresses is 40 bytes, and the file header is fixed at 112 bytes, so the theoretical maximum is (784 - 112) / 40 = 16.8 packets. Because the number of packets must be an integer, and splitting by an odd number tends to disturb the communication order between packets, C is set to 16. We define C this way because we want a traffic group G to predict the whole session as fully as possible. In our view, the more packets a traffic group contains, the more representative it is, so we make C as large as possible. The two-stage splitting mechanism (Pcap-Sessions splitting and Session-Packets splitting) is summarized in Fig. 2: the raw traffic first undergoes Pcap-Sessions splitting at session granularity, and each session then undergoes Session-Packets splitting with the maximum packet count C. This step is implemented with the EditCap tool.
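The arithmetic behind C and the resulting split can be checked with a short Python sketch. The function names are hypothetical; the default values 784, 112 and 40 come from the text, and rounding down to an even number follows the stated wish to avoid odd group sizes.

```python
def max_packets(sample_len: int = 784, header_len: int = 112, packet_len: int = 40) -> int:
    """Largest even packet count that fits into one fixed-size sample."""
    c = (sample_len - header_len) // packet_len   # floor of 16.8 is 16
    return c - (c % 2)                            # avoid odd group sizes


def split_session(packets: list, c: int) -> list:
    """Split one session into consecutive traffic groups of at most c packets."""
    return [packets[i:i + c] for i in range(0, len(packets), c)]


if __name__ == "__main__":
    c = max_packets()
    groups = split_session(list(range(40)), c)
    print(c, [len(g) for g in groups])
```

With the defaults, a 40-packet session yields groups of 16, 16 and 8 packets.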
4) Unifying the input size: a neural network requires inputs of fixed size, so we unify every traffic group G to 784 bytes. If a traffic group is larger than 784 bytes, only its first 784 bytes are kept; if it is smaller than 784 bytes, it is padded with a set byte value (such as 0x00) up to 784 bytes. This step is implemented with a PowerShell script.
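The truncate-or-pad rule can be written in a few lines. The source performs this step with a PowerShell script; the Python function below is only an equivalent sketch, and `fix_length` is a hypothetical name.

```python
def fix_length(data: bytes, size: int = 784, pad: int = 0x00) -> bytes:
    """Keep the first `size` bytes, or pad with `pad` up to `size` bytes."""
    truncated = data[:size]
    return truncated + bytes([pad]) * (size - len(truncated))
```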
5) Conversion to IDX: we convert each 784-byte traffic group into a 28*28 traffic matrix, that is, the one-dimensional 784-byte encoded sequence is reshaped into a 28*28 matrix. The traffic matrices and their labels are then packaged into IDX files, which are the standard input of many CapsNet and CNN models. This step is implemented with a Python script.
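A minimal sketch of this conversion follows. It assumes the IDX layout used by the MNIST files (a big-endian header with magic number 0x00000803 for unsigned-byte 3-D image data and 0x00000801 for 1-D label data); the function names are hypothetical and the patent's own script is not reproduced here.

```python
import struct


def to_matrix(sample: bytes, side: int = 28) -> list:
    """Reshape a side*side byte sequence into a list of rows."""
    if len(sample) != side * side:
        raise ValueError("sample must contain exactly side*side bytes")
    return [list(sample[r * side:(r + 1) * side]) for r in range(side)]


def encode_idx_images(samples: list, side: int = 28) -> bytes:
    """Serialize fixed-size samples as an IDX3 unsigned-byte array."""
    for s in samples:
        if len(s) != side * side:
            raise ValueError("every sample must contain side*side bytes")
    header = struct.pack(">IIII", 0x00000803, len(samples), side, side)
    return header + b"".join(samples)


def encode_idx_labels(labels: list) -> bytes:
    """Serialize integer labels (0..255) as an IDX1 unsigned-byte array."""
    return struct.pack(">II", 0x00000801, len(labels)) + bytes(labels)
```

Row-major order means byte k of a sample lands at row k // 28, column k % 28 of the traffic matrix.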
(2) CapsNet-based model training
Based on the CapsNet algorithm, the invention uses the IDX-packaged traffic matrices and labels as the dataset to build a service classification model and an application classification model for encrypted traffic. The algorithm mainly consists of convolution and dynamic routing. Its architecture is shown in Fig. 3.
1) Convolution
The model first reads the preprocessed 28*28 traffic matrices and normalizes them. In the ReLU convolutional layer, 256 convolution kernels of size 9*9 are applied to each traffic matrix with stride 1, producing 256 feature maps of size 20*20. The second convolutional layer, PrimaryCaps, then builds the vector structure that serves as the input layer of the capsules. PrimaryCaps performs 8 convolutions with different weights on the 256 feature maps, each using 32 kernels of size 9*9 with stride 2, and finally produces 6*6*32 8-dimensional vectors, called activity vectors. Each activity vector is a capsule unit made up of 8 ordinary convolution units.
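The layer sizes quoted above follow from the usual output-size formula for a convolution without padding, which the sketch below checks. The absence of padding is an assumption (the text does not state it, but the quoted sizes are consistent with it), and `conv_out` is a hypothetical helper.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output width of an unpadded convolution: floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1


conv1_side = conv_out(28, 9, 1)            # ReLU convolutional layer: 28 -> 20
primary_side = conv_out(conv1_side, 9, 2)  # PrimaryCaps: 20 -> 6
num_primary_capsules = primary_side * primary_side * 32  # 6*6*32 capsules, each 8-dimensional
```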
2) Dynamic routing
The third layer of the network, DigitCaps, propagates and updates the capsule inputs in two steps: affine transformation and dynamic routing. In the affine transformation, the activity vector $u_i$ output by the lower PrimaryCaps layer is multiplied by the weight matrix $W_{ij}$ to obtain the prediction vector $\hat{u}_{j|i}$. The input $s_j$ of a higher-level capsule is the weighted sum of the $\hat{u}_{j|i}$, defined as:
$$\hat{u}_{j|i} = W_{ij} u_i, \qquad s_j = \sum_i c_{ij} \hat{u}_{j|i} \tag{5}$$
where each activity vector $u_i$ has its own weight matrix $W_{ij}$, which is initialized with random numbers drawn from the standard normal distribution and updated through the loss function, and $c_{ij}$ is the coupling coefficient determined by iterative dynamic routing.
The purpose of dynamic routing is to find the best path from the output of a capsule to the inputs of the capsules in the next layer. One way to find this "best path" is to search iteratively for the input vectors that agree best with the output. Agreement is measured by the inner product of the output vector and the prediction vector obtained from the affine transformation, and it is added to the logit $b_{ij}$ from which $c_{ij}$ is computed. After repeated parameter tuning, the invention sets the number of iterations to 3. The coefficient $c_{ij}$ in formula (5) is updated as follows:
$$c_{ij} = \mathrm{softmax}(b_{ij}) \tag{6}$$
where $b_{ij}$ is the log prior probability that capsule i is coupled to capsule j.
The length of a capsule's output vector represents the probability of belonging to a category, so its value must lie in [0, 1]. This is achieved with a squashing function, defined as follows:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} \tag{7}$$
where $v_j$ is the output vector of capsule j and $s_j$ is the total input vector of capsule j.
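The routing loop and the squashing function can be sketched in plain Python as follows. This is a didactic version of routing-by-agreement as commonly described for CapsNet, not the patent's implementation: it works on nested lists, takes the prediction vectors (after the affine transformation) as input, and applies the softmax over the higher-level capsules j for each lower-level capsule i, which is an assumption since the text does not state the normalization axis. All function names are hypothetical.

```python
import math


def squash(s: list) -> list:
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: list) -> list:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: list, iterations: int = 3) -> list:
    """u_hat[i][j] is the prediction vector of lower capsule i for higher capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]          # routing logits b_ij
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]               # coupling coefficients c_ij
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))                     # output vector v_j
        for i in range(n_in):
            for j in range(n_out):
                # Agreement: inner product of the prediction vector and the output vector.
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v
```

Lower-level capsules whose predictions agree with a higher-level output receive larger coupling coefficients in the next iteration, so the output length of that higher-level capsule grows.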
$W_{ij}$ and the other convolution parameters of the whole network are updated through the loss function. We use the margin loss as the loss function, defined as:
$$L_c = T_c \max(0, m^+ - \|v_c\|)^2 + \lambda (1 - T_c) \max(0, \|v_c\| - m^-)^2 \tag{8}$$
where c is the predicted category and $T_c$ is an indicator function that equals 1 when category c is the correct category and 0 otherwise; $m^+$ is the upper bound on the vector length $\|v_c\|$ and $m^-$ is the lower bound. In addition, we scale the reconstruction loss down by a factor of 0.0005 so that it does not dominate the margin loss during training.
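The loss can be made concrete with a small sketch that uses the squared form of the margin loss from the CapsNet literature. The values m+ = 0.9, m- = 0.1 and lambda = 0.5 are assumptions taken from that literature; the text above does not state them. Only the 0.0005 reconstruction scale comes from the text, and the function names are hypothetical.

```python
def margin_loss(lengths: list, true_class: int,
                m_plus: float = 0.9, m_minus: float = 0.1, lam: float = 0.5) -> float:
    """Sum the per-category margin loss over all output capsule lengths."""
    total = 0.0
    for c, length in enumerate(lengths):
        t_c = 1.0 if c == true_class else 0.0          # indicator T_c
        present = max(0.0, m_plus - length) ** 2       # penalize a short vector for the true category
        absent = max(0.0, length - m_minus) ** 2       # penalize a long vector for a wrong category
        total += t_c * present + lam * (1.0 - t_c) * absent
    return total


def total_loss(margin: float, reconstruction: float, scale: float = 0.0005) -> float:
    """Margin loss plus the down-weighted reconstruction loss."""
    return margin + scale * reconstruction
```

For example, with output lengths [0.5, 0.5] and true category 0, the two terms are 0.4^2 and 0.5 * 0.4^2, giving a loss of 0.24.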
A traffic matrix to be identified passes through CapsNet, which outputs N 16-dimensional vectors, where N is the total number of traffic categories. The length of each vector represents the probability that the traffic belongs to a category, and the direction of the vector represents attributes of the traffic, including the positions of fixed strings and the ordering of packets. A softmax classifier then turns the N 16-dimensional vectors into the probability that the traffic matrix belongs to each category. The category with the highest probability is the predicted category of the traffic and the final output of the model.
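The final decision step can be sketched as follows: compute the length of each output capsule, normalize the lengths with a softmax, and take the most probable category. Applying the softmax to the vector lengths is an interpretation of the description above, and `predict` is a hypothetical name.

```python
import math


def softmax(xs: list) -> list:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict(capsule_outputs: list):
    """Return (predicted category index, per-category probabilities)."""
    lengths = [math.sqrt(sum(x * x for x in v)) for v in capsule_outputs]
    probs = softmax(lengths)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```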
(3) Identification and service and application classification of encrypted traffic
The model trained by the steps above identifies and classifies encrypted traffic. Traffic whose category is to be identified is first split and converted into traffic matrices, which are then fed to the trained model to obtain the category of the traffic, under either 1) service classification or 2) application classification.
(4) Comparison of experimental results
To verify the effectiveness of the invention, we use the ISCX VPN-nonVPN dataset as raw data. It contains 150 raw traffic files covering 6 kinds of regular encrypted traffic (Chat, Streaming, VoIP, etc.) and 6 kinds of VPN traffic (VPNChat, VPNStreaming, VPNVoIP, etc.). In addition, 9 raw traffic files contain traffic generated by 5 different applications and captured through the Tor software. Because Tor traffic only supports encrypted links and TCP streams on the Internet, it is difficult to trace and analyze. We therefore extract these files to perform application classification of Tor traffic. Finally, we use four metrics (accuracy, precision, recall, and F1 score) to evaluate the effectiveness of the invention against existing methods.
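For reference, the four metrics named above can be computed as in the following Python sketch. It uses the standard definitions. The function names and the sample labels in the usage note are illustrative only and are not experimental data from the patent.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 score for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With true labels `["Chat", "Chat", "VoIP", "VoIP"]` and predictions `["Chat", "VoIP", "VoIP", "VoIP"]`, the accuracy is 0.75, and the VoIP class has precision 2/3, recall 1.0 and F1 score 0.8.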
Specifically, the experiments are divided into three parts: 1) evaluating and comparing the effectiveness of removing MAC addresses and IP addresses and of the two-stage splitting mechanism during data preprocessing; 2) evaluating and comparing the effectiveness of SPCaps on the encrypted traffic service classification task; 3) evaluating and comparing the effectiveness of SPCaps on the encrypted traffic application classification task.
1) Preprocessing results
We preprocess the ISCX VPN-nonVPN dataset with the raw traffic conversion steps described above. After the Pcap-Sessions split, we computed the byte-size distribution of the session traffic used for service classification, shown in Figure 4.
The size distribution of the session traffic is highly uneven. Across the 12 traffic types, more than 50% of the sessions are smaller than 0.5 KB, and most of these are unrelated to the classification task. In particular, more than 80% of the sessions in Chat, Email, File and Voip are smaller than 0.2 KB. The size distribution therefore confirms that the Session-Packets split is necessary and reasonable in preprocessing. According to formula (4), we set the maximum number of packets in each session to 16 in the Session-Packets splitting step. Finally, the class names, applications and sample totals for the encrypted traffic service classification task are listed in Table 1.
Table 1: Sample composition for encrypted traffic service classification
Class | Applications | Total |
---|---|---|
Chat | AIM, Facebook, Hangouts, ICQ, Skype | 11365 |
Email | Gmail | 12822 |
File | Ftps, SCP, Sftp, Skype | 19553 |
P2P | Torrent | 60000 |
Streaming | Facebook, Hangouts, Netflix, Skype, Spotify, Vimeo, YouTube | 21273 |
Voip | Facebook, Hangouts, Skype, Voipbuster | 21000 |
VPNChat | AIM, Facebook, Hangouts, ICQ, Skype | 13710 |
VPNEmail | | 2890 |
VPNFile | Ftps, Sftp, Skype | 17528 |
VPNP2P | Bittorrent | 6000 |
VPNStreaming | Facebook, Netflix, Spotify, Vimeo, YouTube | 12000 |
VPNVoip | Hangouts, Skype, Voipbuster | 14805 |
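The Session-Packets splitting step with a maximum of 16 packets can be sketched in Python as below. This is an illustration under stated assumptions: the `Packet` type and `split_session` function are hypothetical names, the packet fields follow the (five-tuple, byte length, start time) description in the claims, and the input list is assumed to be in capture order already.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    """One data packet p = (x, b, t) as described in the claims."""
    five_tuple: Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol
    length: int                                 # byte length b
    start_time: float                           # start time t


MAX_PACKETS = 16  # maximum stated in the text


def split_session(session: List[Packet], max_packets: int = MAX_PACKETS) -> List[List[Packet]]:
    """Split one session into consecutive traffic groups of at most max_packets packets."""
    if max_packets <= 0:
        raise ValueError("max_packets must be positive")
    return [session[i:i + max_packets] for i in range(0, len(session), max_packets)]
```

A session of 35 packets, for example, yields three traffic groups holding 16, 16 and 3 packets.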
2) Preprocessing comparison
In converting the raw traffic, we propose removing MAC addresses and IP addresses to avoid overfitting, and we propose the Session-Packets split, which splits traditional session traffic a second time. In addition, to show that CapsNet is better suited to traffic classification than 1dCNN, each experiment is run with both neural networks. In total, we performed six different encrypted traffic service classification tasks on the ISCX VPN-nonVPN dataset. The experimental results are shown in Table 2.
Table 2: Preprocessing comparison results
The results show that, for both 1dCNN and CapsNet, the proposed removal of MAC addresses and IP addresses and the Session-Packets split yield better classification performance. In addition, the comparison between the two neural networks shows that CapsNet achieves higher classification accuracy and F1 score than 1dCNN.
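The address removal evaluated above can be sketched as follows. This is a simplified illustration, not the patent's procedure: it assumes untagged Ethernet frames carrying IPv4, and it overwrites the address fields with zero bytes rather than cutting them out, which is an assumption about what "removing" means here. VLAN tags and IPv6 are not handled, and `mask_addresses` is a hypothetical name.

```python
ETH_HEADER_LEN = 14            # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_MIN_HEADER_LEN = 20
ETHERTYPE_IPV4 = b"\x08\x00"


def mask_addresses(frame: bytes) -> bytes:
    """Zero the MAC addresses and, for IPv4 frames, the IP addresses."""
    data = bytearray(frame)
    if len(data) < ETH_HEADER_LEN:
        return bytes(data)  # too short to be an Ethernet frame; leave unchanged
    data[0:12] = bytes(12)  # destination and source MAC addresses
    is_ipv4 = bytes(data[12:14]) == ETHERTYPE_IPV4
    if is_ipv4 and len(data) >= ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        # IPv4 source and destination addresses sit at header offsets 12..19.
        start = ETH_HEADER_LEN + 12
        data[start:start + 8] = bytes(8)
    return bytes(data)
```

Masking in place keeps every other byte at its original offset, so the rest of the header and the payload are left unchanged for the later conversion to a traffic matrix.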
3) Comparison of encrypted traffic service classification
To evaluate and compare the effectiveness of SPCaps on encrypted traffic service classification, we run experiments on the 12 traffic types in ISCX VPN-nonVPN. As shown in Table 3, the accuracy reaches 99.1%, and the precision and recall of every class are above 97%.
Table 3: Experimental results for encrypted traffic service classification
Next, we compare SPCaps with existing baseline methods on encrypted traffic service classification. The comparison results are shown in Table 4. SPCaps shows better classification performance and reaches the standard required for practical application.
Table 4: Comparison of SPCaps and baseline methods on encrypted traffic service classification
Method | Input form | Recall (%) | Precision (%) | F1 score (%) |
---|---|---|---|---|
SPCaps | Session-Packets | 99.3 | 99.3 | 99.3 |
1dCNN | Session | 90.6 | 88.9 | 89.7 |
SAE | Deep Packets | 92 | 92 | 92 |
1dCNN | Deep Packets | 94 | 93 | 93 |
4) Comparison of encrypted traffic application classification
To evaluate the effectiveness of SPCaps on the application classification task for Tor traffic, we run experiments on the traffic of the 5 different applications captured through Tor in ISCX VPN-nonVPN. The experimental results are shown in Table 5. The accuracy of SPCaps on the Tor application classification task reaches 99.8%.
Table 5: Experimental results for encrypted traffic application classification
Next, we compare SPCaps with existing baseline methods on encrypted traffic application classification. The comparison results are shown in Table 6. SPCaps clearly outperforms the baseline methods on Tor application classification.
Table 6: Comparison of SPCaps and baseline methods on encrypted traffic application classification
Method | Recall (%) | Precision (%) | F1 score (%) |
---|---|---|---|
SPCaps | 99.4 | 99.5 | 99.5 |
SAE | 57 | 44 | 30 |
1dCNN | 35 | 40 | 36 |
The experiments above show that SPCaps can classify encrypted traffic effectively, and that the experimental results reach the standard required for practical application.
The embodiments described above express only some implementations of the invention. Although their description is relatively specific, they should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, all of which fall within its scope of protection. Therefore, the scope of protection of the invention shall be determined by the appended claims.
Claims (10)
1. A service and application classification method for encrypted traffic, comprising the steps of:
1) splitting the continuous traffic to be processed into multiple session flows at session granularity;
2) splitting each processed session flow at data packet granularity into multiple traffic groups, wherein the number of data packets in each traffic group does not exceed a set maximum;
3) unifying the size of each traffic group, then converting each traffic group into a traffic matrix, and packaging the traffic matrices and their labels into IDX traffic files;
4) training a CapsNet model with the IDX traffic files to obtain an identification model with automatic feature selection capability;
5) for encrypted traffic to be identified, splitting it and converting it into traffic matrices, then inputting them into the identification model to obtain the service class and application class to which the traffic to be identified belongs.
2. The method according to claim 1, characterized in that the j-th traffic group in the i-th session flow $S_i$ is $G_{ij}$, wherein $G_{ij} = \{p_1 = (x_1, b_1, t_1), \ldots, p_m = (x_m, b_m, t_m)\}$; m is the number of data packets in $G_{ij}$; C is the set maximum number of data packets; the i-th data packet in session flow $S_i$ is $p_i = (x_i, b_i, t_i)$, where $x_i$ is the five-tuple of the i-th data packet, $b_i$ is the byte length of the i-th data packet, and $t_i$ is the start time of the i-th data packet; and $|S_i|$ is the total number of data packets in session flow $S_i$.
3. The method according to claim 2, characterized in that, wherein $L_{sample}$ denotes the byte length of the file storing a traffic group, $L_{header}$ denotes the byte length of the file header of the file storing a traffic group, and $L_{packet}$ denotes the byte length of a data packet.
4. The method according to claim 1, characterized in that data cleaning is performed on each session flow to remove MAC addresses and IP addresses, and step 2) is then carried out.
5. The method according to claim 1, characterized in that a traffic group is converted into a traffic matrix by converting the one-dimensional encoded sequence of the traffic group into a two-dimensional traffic matrix, wherein the unified size of a traffic group is 784 bytes and the converted traffic matrix is a 28*28 traffic matrix.
6. The method according to claim 1, characterized in that the CapsNet model is trained with the IDX traffic files as follows: a first convolutional layer performs a convolution operation on each traffic matrix to generate multiple feature matrices; a convolution operation is then performed on the feature matrices to generate multiple activity vectors; each activity vector is then multiplied by its corresponding weight matrix to obtain a prediction vector, and the weighted sum of the lower-layer prediction vectors is used as the input of the higher-layer capsule.
7. A service and application classification system for encrypted traffic, characterized by comprising a traffic preprocessing module, a model training module and an encrypted traffic identification module; wherein
the traffic preprocessing module is configured to split the continuous traffic to be processed into multiple session flows at session granularity; then split each session flow at data packet granularity into multiple traffic groups, wherein the number of data packets in each traffic group does not exceed a set maximum; then unify the size of each traffic group, convert each traffic group into a traffic matrix, and package the traffic matrices and their labels into IDX traffic files;
the model training module is configured to train a CapsNet model with the IDX traffic files to obtain an identification model with automatic feature selection capability;
the encrypted traffic identification module is configured to input the traffic matrices of the encrypted traffic to be identified into the identification model to obtain the service class and application class to which the traffic to be identified belongs.
8. The system according to claim 7, characterized in that the j-th traffic group in the i-th session flow $S_i$ is $G_{ij}$, wherein $G_{ij} = \{p_1 = (x_1, b_1, t_1), \ldots, p_m = (x_m, b_m, t_m)\}$; m is the number of data packets in $G_{ij}$; C is the set maximum number of data packets; the i-th data packet in session flow $S_i$ is $p_i = (x_i, b_i, t_i)$, where $x_i$ is the five-tuple of the i-th data packet, $b_i$ is the byte length of the i-th data packet, and $t_i$ is the start time of the i-th data packet; and $|S_i|$ is the total number of data packets in session flow $S_i$.
9. The system according to claim 8, characterized in that, wherein $L_{sample}$ denotes the byte length of the file storing a traffic group, $L_{header}$ denotes the byte length of the file header of the file storing a traffic group, and $L_{packet}$ denotes the byte length of a data packet.
10. The system according to claim 7, characterized in that the traffic preprocessing module performs data cleaning on each session flow to remove MAC addresses and IP addresses, and then splits each session flow at data packet granularity.
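As an illustration of the capsule step recited in claim 6 (prediction vector = weight matrix times activity vector, then a weighted sum as the higher-layer input), a minimal Python sketch follows. It is not the claimed implementation: the coupling coefficients are passed in as fixed values rather than learned by iterative routing, the `squash` nonlinearity is the one commonly used with CapsNet and is an assumption here, and all function names are hypothetical.

```python
import math


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by an activity vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def high_capsule_input(activity_vectors, weight_matrices, coupling):
    """Weighted sum of the lower-layer prediction vectors."""
    predictions = [matvec(w, u) for w, u in zip(weight_matrices, activity_vectors)]
    dim = len(predictions[0])
    return [sum(c * p[d] for c, p in zip(coupling, predictions)) for d in range(dim)]


def squash(s):
    """Assumed nonlinearity: keeps direction, maps length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]
```

With two lower-layer capsules `[1, 0]` and `[0, 1]`, weight matrices `[[1, 0], [0, 1], [0, 0]]` and `[[0, 0], [0, 2], [1, 0]]`, and coupling coefficients `[0.5, 0.5]`, the higher-layer input is `[0.5, 1.0, 0.0]`.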
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910504060.XA CN110417729B (en) | 2019-06-12 | 2019-06-12 | Service and application classification method and system for encrypted traffic |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910504060.XA CN110417729B (en) | 2019-06-12 | 2019-06-12 | Service and application classification method and system for encrypted traffic |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110417729A (en) | 2019-11-05 |
CN110417729B (en) | 2020-10-27 |
Family
ID=68358996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910504060.XA Active CN110417729B (en) | 2019-06-12 | 2019-06-12 | Service and application classification method and system for encrypted traffic |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110417729B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102394827A (en) * | 2011-11-09 | 2012-03-28 | 浙江万里学院 | Hierarchical classification method for internet flow |
US20140185482A1 (en) * | 2010-08-12 | 2014-07-03 | Citrix Systems, Inc. | Systems and methods for quality of service of encrypted network traffic |
CN106452953A (en) * | 2016-09-30 | 2017-02-22 | 苏州迈科网络安全技术股份有限公司 | Synthetic data feature analysis method and system based on DPI (Deep Packet Inspection) technology |
CN106790019A (en) * | 2016-12-14 | 2017-05-31 | 北京天融信网络安全技术有限公司 | The encryption method for recognizing flux and device of feature based self study |
WO2017221152A1 (en) * | 2016-06-20 | 2017-12-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Method for classifying the payload of encrypted traffic flows |
CN107749859A (en) * | 2017-11-08 | 2018-03-02 | 南京邮电大学 | A kind of malice Mobile solution detection method of network-oriented encryption flow |
CN109660656A (en) * | 2018-11-20 | 2019-04-19 | 重庆邮电大学 | A kind of intelligent terminal method for identifying application program |
CN109831422A (en) * | 2019-01-17 | 2019-05-31 | 中国科学院信息工程研究所 | A kind of encryption traffic classification method based on end-to-end sequence network |
Non-Patent Citations (1)
Title |
---|
HAIPENG YAO: ""Capsule network assisted IoT Traffic Classification Mechanism for Smart Cities"", 《IEEE》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967798A (en) * | 2020-09-07 | 2020-11-20 | 上海优扬新媒信息技术有限公司 | Method and device for distributing experimental samples, equipment and computer readable storage medium |
CN111967798B (en) * | 2020-09-07 | 2023-10-03 | 度小满科技(北京)有限公司 | Method, device and equipment for distributing experimental samples and computer readable storage medium |
CN112468324A (en) * | 2020-11-11 | 2021-03-09 | 国网冀北电力有限公司信息通信分公司 | Graph convolution neural network-based encrypted traffic classification method and device |
CN112468324B (en) * | 2020-11-11 | 2023-04-07 | 国网冀北电力有限公司信息通信分公司 | Graph convolution neural network-based encrypted traffic classification method and device |
CN113037646A (en) * | 2021-03-04 | 2021-06-25 | 西南交通大学 | Train communication network flow identification method based on deep learning |
CN113162908A (en) * | 2021-03-04 | 2021-07-23 | 中国科学院信息工程研究所 | Encrypted flow detection method and system based on deep learning |
CN113472751A (en) * | 2021-06-04 | 2021-10-01 | 中国科学院信息工程研究所 | Encrypted flow identification method and device based on data packet header |
CN113472751B (en) * | 2021-06-04 | 2023-01-17 | 中国科学院信息工程研究所 | Encrypted flow identification method and device based on data packet header |
CN113794601A (en) * | 2021-08-17 | 2021-12-14 | 中移(杭州)信息技术有限公司 | Network traffic processing method, device and computer readable storage medium |
CN113794601B (en) * | 2021-08-17 | 2024-03-22 | 中移(杭州)信息技术有限公司 | Network traffic processing method, device and computer readable storage medium |
CN114386079A (en) * | 2022-03-23 | 2022-04-22 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
CN114386079B (en) * | 2022-03-23 | 2022-12-06 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
Also Published As
Publication number | Publication date |
---|---|
CN110417729B (en) | 2020-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||