CN110417729A - Service and application classification method and system for encrypted traffic - Google Patents

Service and application classification method and system for encrypted traffic

Info

Publication number
CN110417729A
Authority
CN
China
Legal status
Granted
Application number
CN201910504060.XA
Other languages
Chinese (zh)
Other versions
CN110417729B
Inventor
崔苏苏
卢志刚
姜波
徐健锋
刘松
崔泽林
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201910504060.XA
Publication of CN110417729A
Application granted
Publication of CN110417729B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Abstract

The invention discloses a service and application classification method and system for encrypted traffic. The method comprises the following steps: 1) segmenting the continuous traffic to be processed into multiple session flows at session granularity; 2) segmenting each session flow at packet granularity into multiple traffic groups, where the number of packets in each traffic group does not exceed a set maximum; 3) unifying the size of each traffic group, converting each traffic group into a traffic matrix, and packaging the traffic matrices and their labels into IDX traffic files; 4) training a CapsNet model with the IDX traffic files to obtain an identification model capable of automatic feature selection; 5) for encrypted traffic to be identified, segmenting it and converting it into traffic matrices, inputting them into the identification model, and obtaining the service type and application category of that traffic. The invention can classify encrypted traffic effectively.

Description

Service and application classification method and system for encrypted traffic
Technical field
The invention proposes a service and application classification method for encrypted traffic. It introduces a novel two-stage traffic segmentation mechanism and combines it with a capsule neural network (CapsNet) to classify encrypted traffic effectively. The invention covers raw traffic conversion, CapsNet-based model training, and encrypted traffic classification, and belongs to the interdisciplinary field of network security and computer science.
Background art
In recent years, with the continuous development of Internet and information technology, network traffic has grown explosively. According to the Visual Networking Index (VNI) forecast published by Cisco, global IP traffic carried over public and private networks, including managed IP traffic, consumer-generated mobile data traffic, and Internet traffic, averaged 122 EB per month in 2017 (1 EB = 2^20 TB) and will more than triple by 2022, reaching 396 EB per month. At the same time, as users' demands on the network keep changing, new services keep emerging. While these services bring convenience, they also increase the heterogeneity and complexity of the network, posing unprecedented challenges to network security.
Network security has become one of the key problems facing the Internet. In recent years, hostile network behaviors such as information leakage, illegal intrusion, and DDoS attacks have increasingly affected how people use the Internet, and as technology advances, the traffic characteristics of malicious network attacks have become increasingly complex and covert. According to 2018 data from the Identity Theft Resource Center, nearly 34.2 million records had been stolen by September 2018. According to the 13th annual Worldwide Infrastructure Security Report from Arbor Networks, the peak DDoS attack size in the first half of 2018 reached 1.7 Tbps, 179% higher than in the first half of 2017, and by 2022 the total number of DDoS attacks worldwide is expected to double from 2017, reaching 14.5 million. Network administrators need to classify and identify network traffic quickly and accurately in order to locate abnormal behavior in the network, cut off the propagation paths of malicious intrusions in time, and minimize the harm and losses they cause to users. Traffic identification can also uncover unknown or disguised Webshells, reconstruct the entire attack process from a Kill Chain perspective, and support in-depth analysis and profiling of attackers, attack tools, and attack methods.
Network traffic classification and identification run through every module of security situational awareness and are an indispensable part of it. Many traffic classification and identification techniques have been proposed; they fall mainly into port-based, deep packet inspection (DPI)-based, statistics-based, and behavior-based approaches.
These techniques identify traditional network applications well. However, since the exposure of the PRISM surveillance program, the volume of encrypted traffic worldwide has surged. Sandvine's 2018 report shows that more than 50% of Internet traffic is encrypted, and the share continues to grow. To evade firewalls and antivirus software, most malware also encrypts its traffic to hide its communications. Encryption has become standard practice for almost all network applications, including malware. Encrypted traffic identification is therefore becoming an important means of detecting security threats when payload content cannot be read: it can reveal key information such as network behavior and process behavior and, combined with Kill Chain analysis, reconstruct the attack process and give security staff recommendations for handling threats.
Although traffic identification research has produced many results, most of it targets unencrypted traffic, and traditional techniques are no longer suitable for identifying encrypted traffic. With the emergence of P2P applications and the widespread use of dynamic port numbers, port-based identification is no longer effective, and port obfuscation further limits it. Because encryption hides payload characteristics, the growing volume of encrypted traffic cannot be identified by deep packet inspection, and tunneling techniques further restrict DPI. Moreover, because DPI analyzes application-layer data, it raises user privacy concerns. The lack of effective techniques for analyzing and managing encrypted traffic poses a major challenge to network management and security.
Many machine-learning methods for identifying encrypted traffic exist, but traditional machine learning requires manual feature extraction, and its classification accuracy depends heavily on feature selection. This limits scalability and prevents real-time classification. Deep learning removes the need for manual feature extraction: it extracts features from the input data automatically, without human intervention, and builds models that interpret data by mimicking the human brain. Applying it to identify encrypted traffic on the Internet is a new approach.
Existing deep-learning-based encrypted traffic classifiers mainly include the multilayer perceptron (MLP), the stacked autoencoder (SAE), and the one-dimensional convolutional neural network (1D-CNN). Comparisons by many researchers show that deep-learning-based classifiers achieve higher accuracy than traditional machine-learning classifiers and that, among the deep learning algorithms, 1D-CNN achieves the best encrypted traffic identification results.
However, a 1D-CNN treats features as position-independent: during identification it considers only whether a feature is present, not its position or other attributes. We argue that the position of specific strings within the traffic and the order of packets are also features that must be considered. In addition, the encoded traffic files in encrypted traffic identification are not images, so the CNN pooling operation is unsuitable for them. Whether max pooling or min pooling is used, pooling discards some information and alters the valid features carried by the encoded strings.
Summary of the invention
To address these problems in the prior art, the invention proposes a service and application classification method and system for encrypted traffic. The invention is an encrypted traffic classification model that combines a capsule neural network (CapsNet) with a two-stage segmentation mechanism; it is named SPCaps and classifies encrypted traffic effectively. The invention addresses two classification scenarios: service classification, i.e., classifying encrypted traffic by service type, such as web browsing, streaming media, or instant messaging; and application classification, i.e., classifying encrypted traffic by the application that generated it, such as Skype, BitTorrent, or YouTube.
In data preprocessing, the invention proposes a novel two-stage traffic segmentation mechanism and develops a preprocessing toolkit that integrates the EditCap tool, the SplitCap tool, PowerShell scripts, and Python scripts. Its purpose is to dilute the proportion of irrelevant traffic while increasing the weight of effective traffic. In addition, the invention trains the model with the CapsNet algorithm. For encrypted traffic classification, CapsNet remedies the shortcomings of 1D-CNN in three respects: 1) When learning the spatial features of encrypted traffic, CapsNet replaces the scalar inputs and outputs of traditional neural networks with vectors. In the invention, the length of a vector represents the probability that the traffic belongs to a class, and its direction represents attributes of that class, including the fixed positions of specific strings within the traffic and the order of packets. 2) CapsNet does not use the pooling operation of convolutional neural networks. Pooling reduces connection parameters and condenses features but also discards necessary information, so dropping it makes CapsNet better suited to encoded files such as traffic. 3) At the same identification accuracy, CapsNet identifies traffic faster than CNN, making it better suited to real-time traffic identification.
To achieve the above objective, the invention adopts the following technical solution:
An identification method for encrypted traffic, comprising the following steps:
1) First segmentation at session granularity: deep-learning-based traffic classification must first segment continuous traffic into discrete units at some granularity. Network traffic can be segmented in five ways: by TCP connection, flow, session, service, or host. Of these, flow and session are the representations most widely used in current research. The invention therefore performs the first segmentation of the raw traffic at session granularity. A session is a group of packets consisting of bidirectional flows, i.e., packets with the same five-tuple (source IP, source port, destination IP, destination port, transport-layer protocol), where the source IP and destination IP may be interchanged.
2) Encrypted traffic cleaning: in traffic classification, the MAC addresses of the data link layer and the IP addresses of the network layer (source IP, destination IP) cannot serve as classification features. If the traffic capture environment is limited, MAC and IP addresses can affect model training to some extent and cause the classifier to overfit, so we delete the MAC and IP address fields from the packets.
3) Second segmentation at packet granularity: traffic collected from real networks contains packets unrelated to classification, which directly affects model training and testing. We therefore further segment the traffic produced by step 2) by setting a maximum number of packets per traffic unit. Because the sessions being segmented mostly correspond to normal communication, this step both dilutes the proportion of irrelevant traffic in the raw traffic and increases the weight of effective traffic.
4) Standardizing the input form of encrypted traffic: training a neural network requires fixed-size input, so we unify the traffic files produced by the above steps to a fixed number of bytes. If a file is longer than the fixed length, the excess bytes are deleted; if it is shorter, it is padded with 0x00 up to the fixed length. Finally, we convert the processed traffic into traffic matrices and package the traffic matrix samples and their labels into IDX files, a standard input format used by many CapsNet and CNN models.
5) CapsNet-based model training: using the IDX files produced above, CapsNet learns the spatial features of encrypted traffic through convolution and dynamic routing. This yields an efficient identification model with automatic feature selection that can identify encrypted traffic and classify it by service type and application type.
6) Encrypted traffic identification: the model trained in the above steps identifies and classifies encrypted traffic. The invention supports effective encrypted traffic classification in the following scenarios: 1) service classification, i.e., identifying the service type of the encrypted traffic; 2) application classification, i.e., identifying the specific application that generated the encrypted traffic.
The invention also provides a service and application classification system for encrypted traffic, characterized by comprising a traffic preprocessing module, a model training module, and an encrypted traffic identification module, wherein:
The traffic preprocessing module segments the continuous traffic to be processed into multiple session flows at session granularity; then segments each session flow at packet granularity into multiple traffic groups, with the number of packets in each traffic group not exceeding a set maximum; then unifies the size of each traffic group, converts each traffic group into a traffic matrix, and packages the traffic matrices and their labels into IDX traffic files;
The model training module trains a CapsNet model with the IDX traffic files to obtain an identification model capable of automatic feature selection;
The encrypted traffic identification module inputs the traffic matrix of the encrypted traffic to be identified into the identification model and obtains the service type and application category of that traffic.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention proposes a CapsNet-based encrypted traffic identification model that learns, as features, the specific positions of fixed encodings within the traffic and the order of packets.
2. The invention proposes a two-stage traffic segmentation mechanism that dilutes the proportion of irrelevant traffic and increases the weight of effective traffic, achieving effective noise reduction while fixing the traffic representation.
3. The invention evaluates the SPCaps model on the publicly available ISCX VPN-nonVPN dataset. Experimental results show that SPCaps outperforms state-of-the-art methods on encrypted traffic service and application identification tasks.
Description of drawings
Fig. 1 is the overall flowchart of the invention.
Fig. 2 is a schematic diagram of the two-stage traffic segmentation mechanism of the invention.
Fig. 3 is a schematic diagram of the CapsNet-based model framework of the invention.
Fig. 4 shows the size distribution of raw traffic at session granularity in the ISCX VPN-nonVPN dataset.
Detailed description of embodiments
To help those skilled in the art better understand the technical solutions in the embodiments, and to make the objectives, features, and advantages of the invention clearer, the technical core of the invention is described in further detail below with reference to the drawings. The specific examples described here only explain the invention and are not intended to limit it.
The invention designs a service and application classification method for encrypted traffic. The general idea is to segment, clean, and standardize encrypted traffic from real environments with a preprocessing toolkit, diluting the proportion of irrelevant traffic while increasing the weight of effective traffic; then to build a CapsNet-based model that learns the spatial features of encrypted traffic; and finally to achieve effective encrypted traffic identification together with service and application classification.
The overall flow of the invention is shown in Fig. 1. The specific steps of the method are described in detail below:
(1) Conversion of raw traffic
In the data preprocessing stage, to reduce the noise in the raw traffic and standardize its input form, we convert the raw traffic in the following five steps: Pcap-Sessions segmentation, MAC and IP address deletion, Session-Packets segmentation, input size unification, and conversion to IDX.
1) Pcap-Sessions segmentation: deep-learning-based traffic identification must segment continuous traffic into discrete units at a specific granularity. The raw traffic $P$ is a set of packets, written $P = \{p_1, \dots, p_{|P|}\}$, where a packet $p_i$ is defined as:

$$ p_i = (x_i, b_i, t_i) \tag{1} $$

where $i = 1, 2, \dots, |P|$, $b_i \in (0, \infty)$, $t_i \in [0, \infty)$, $x_i$ is the five-tuple (source IP, source port, destination IP, destination port, transport-layer protocol) of the $i$-th packet, $b_i$ is its length in bytes, and $t_i$ is its start time. The raw traffic is first segmented at session granularity. A session $S_i$ is the set of bidirectional flows sharing the same five-tuple, defined as:

$$ S_i = \{ p_1 = (x_1, b_1, t_1), \dots, p_n = (x_n, b_n, t_n) \} \tag{2} $$

where $x_1 = \dots = x_n$, $t_1 < \dots < t_n$, and $n$ is the number of packets in $S_i$. This step is implemented with the SplitCap tool.
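The grouping in Eq. (2) can be illustrated with a minimal sketch. It assumes packets have already been parsed into a hypothetical `Packet` record mirroring Eq. (1); the names `Packet`, `session_key`, and `split_sessions` are illustrative only and are not part of SplitCap, which is what the method actually uses. Source and destination endpoints are ordered canonically so that both directions of a conversation map to the same session.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet p_i = (x_i, b_i, t_i), with x_i spelled out."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    length: int        # b_i: packet length in bytes
    timestamp: float   # t_i: packet start time


def session_key(p: Packet) -> tuple:
    """Direction-independent five-tuple: swapping source and destination gives the same key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    lo, hi = sorted([a, b])
    return (lo, hi, p.proto)


def split_sessions(packets: list[Packet]) -> dict[tuple, list[Packet]]:
    """Group packets into sessions S_i, each sorted so that t_1 < ... < t_n."""
    sessions: dict[tuple, list[Packet]] = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)
    return {k: sorted(v, key=lambda q: q.timestamp) for k, v in sessions.items()}
```

For example, a client-to-server packet and the server's reply on the same ports land in one session, while a second connection from a different client port forms its own session.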
2) Deleting MAC and IP addresses: MAC and IP addresses cannot serve as features during training; on the contrary, their presence easily causes the model to overfit. We therefore delete them by discarding the bytes at the corresponding positions in each packet. This step is implemented with the EditCap tool.
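The text does not specify the exact byte offsets that are discarded. The following is a minimal sketch, assuming untagged Ethernet II frames carrying IPv4 (no VLAN tags, no IPv6): the hypothetical `strip_mac_ip` drops the 12 MAC bytes at offsets 0-11 and the 8 IPv4 address bytes at offsets 26-33, removing 20 bytes per packet. Under that assumption a minimum-size 60-byte Ethernet frame shrinks to 40 bytes, which is consistent with the 40-byte minimum packet length used later when choosing C.

```python
ETH_MAC_END = 12           # bytes 0..11: destination and source MAC addresses
ETHERTYPE_IPV4 = b"\x08\x00"
IPV4_ADDR_START = 14 + 12  # IPv4 source address offset within the frame
IPV4_ADDR_END = 14 + 20    # end of the IPv4 destination address


def strip_mac_ip(frame: bytes) -> bytes:
    """Remove MAC and IPv4 address bytes from an untagged Ethernet II frame.

    Assumption: non-IPv4 or truncated frames only lose their MAC addresses.
    """
    if len(frame) < IPV4_ADDR_END or frame[12:14] != ETHERTYPE_IPV4:
        return frame[ETH_MAC_END:]
    # Keep the EtherType and the first 12 IPv4 header bytes, skip both addresses.
    return frame[ETH_MAC_END:IPV4_ADDR_START] + frame[IPV4_ADDR_END:]
```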
3) Session-Packets segmentation: traffic captured in real environments contains many small sessions that are often unrelated to traffic identification, such as SNMP, DNS, and ARP segments, and they seriously hamper effective identification. Because larger sessions carry the main activity of the communication and contain only a few irrelevant packets, we propose a Session-Packets segmentation mode to dilute the proportion of irrelevant traffic and increase the weight of effective traffic. It further segments each session (i.e., each discrete unit) according to a set maximum number of packets per session flow, producing several traffic groups per session, each containing at most that maximum number of packets. Let $G$ denote the new traffic groups obtained by Session-Packets segmentation, defined as:

$$ G_{ij} = \{ p_1 = (x_1, b_1, t_1), \dots, p_m = (x_m, b_m, t_m) \}, \quad m \le C \tag{3} $$
where $G_{ij}$ is the $j$-th new traffic group of the $i$-th session flow $S_i$, $m$ is the number of packets in $G_{ij}$, and $C$ is the maximum number of packets, defined as:

$$ C = \frac{L_{sample} - L_{header}}{L_{packet}} \tag{4} $$
where $L_{sample}$ is the byte length of the file storing a traffic group, $L_{header}$ is the byte length of that file's header, and $L_{packet}$ is the byte length of a packet. Before being converted to IDX files, traffic groups are stored as .pcap files, which contain not only the traffic data but also a file header describing the file (as many file formats, such as .jpg, do); the header occupies 112 bytes. $C$ is the same for all discrete units. The traffic group length is unified to 784 bytes, the minimum packet length after deleting the MAC and IP addresses is 40 bytes, and the file header is fixed at 112 bytes, so the theoretical maximum number of packets is (784 - 112) / 40 = 16.8. Because the packet count must be an integer, and splitting by an odd count easily disturbs the communication order between packets, $C$ is set to 16. $C$ is defined this way so that a traffic group $G$ represents the entire session as fully as possible: in our view, the more packets $G$ contains, the more representative it is, so we make $C$ as large as possible. The two-stage segmentation mechanism (Pcap-Sessions segmentation followed by Session-Packets segmentation) is summarized in Fig. 2: the raw traffic is first split by Pcap-Sessions segmentation according to the session representation, and each session flow is then split by Session-Packets segmentation using the maximum packet count $C$. This step is implemented with the EditCap tool.
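The arithmetic behind C and the packet-granularity split can be sketched as follows. The function names `max_packets_per_group` and `split_session` are hypothetical; the sketch rounds the theoretical bound down with integer division, which happens to give 16 here, and does not model the text's additional reasoning about odd counts.

```python
L_SAMPLE = 784   # unified traffic-group size in bytes
L_HEADER = 112   # file-header size in bytes, as stated in the text
L_PACKET = 40    # minimum packet length after MAC/IP deletion


def max_packets_per_group(l_sample: int = L_SAMPLE,
                          l_header: int = L_HEADER,
                          l_packet: int = L_PACKET) -> int:
    """Round the bound (l_sample - l_header) / l_packet = 16.8 down to an integer."""
    return (l_sample - l_header) // l_packet


def split_session(packets: list, c: int) -> list[list]:
    """Cut one session S_i into consecutive groups G_i1, G_i2, ... of at most c packets."""
    return [packets[k:k + c] for k in range(0, len(packets), c)]
```

On the command line, `editcap -c 16 in.pcap out.pcap` similarly splits a capture into output files of at most 16 packets each; applying it per session file gives the same grouping.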
4) Unifying input size: neural networks require fixed-size input, so we unify each traffic group $G$ to 784 bytes. If a group is longer than 784 bytes, only its first 784 bytes are kept; if it is shorter, it is padded with a fixed value (e.g., 0x00) up to 784 bytes. This step is implemented with a PowerShell script.
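The truncate-or-pad rule is a one-liner in Python. This is an illustrative stand-in for the PowerShell script, not the authors' code; `unify_size` is a hypothetical name, and 0x00 is the padding byte given as the example in the text.

```python
SAMPLE_LEN = 784


def unify_size(data: bytes, size: int = SAMPLE_LEN, pad: bytes = b"\x00") -> bytes:
    """Keep the first `size` bytes, or right-pad with `pad` up to exactly `size` bytes."""
    return data[:size].ljust(size, pad)
```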
5) Conversion to IDX: we convert each 784-byte sample, a one-dimensional sequence of encoded traffic bytes, into a 28×28 traffic matrix. These traffic matrices and their labels are then packaged into IDX files, the standard input format of many CapsNet and CNN models. This step is implemented with a Python script.
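A minimal sketch of the reshaping and IDX packaging, using only the standard library. It follows the MNIST-style IDX layout (a big-endian magic number whose third byte 0x08 means unsigned byte and whose fourth byte is the number of dimensions, followed by one big-endian 32-bit size per dimension); the function names are hypothetical and the authors' script may differ.

```python
import struct


def to_matrix(sample: bytes, side: int = 28) -> list[list[int]]:
    """Reshape a 784-byte sample row-major into a 28x28 matrix of byte values."""
    if len(sample) != side * side:
        raise ValueError("sample must be unified to side*side bytes first")
    return [list(sample[r * side:(r + 1) * side]) for r in range(side)]


def idx_images(samples: list[bytes], side: int = 28) -> bytes:
    """Encode samples as an IDX3 unsigned-byte file: magic 0x00000803, count, rows, cols."""
    header = struct.pack(">IIII", 0x00000803, len(samples), side, side)
    return header + b"".join(samples)


def idx_labels(labels: list[int]) -> bytes:
    """Encode integer class labels (0..255) as an IDX1 unsigned-byte file: magic 0x00000801."""
    return struct.pack(">II", 0x00000801, len(labels)) + bytes(labels)
```

Writing `idx_images(...)` and `idx_labels(...)` to two files yields a pair that MNIST-style loaders can read without modification.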
(2) CapsNet-based model training
Based on the CapsNet algorithm, the invention uses the traffic matrices and labels packaged in IDX files as the dataset to build a service classification model and an application classification model for encrypted traffic. The algorithm consists mainly of convolution and dynamic routing; its architecture is shown in Fig. 3.
1) Convolution
The model first reads the preprocessed 28×28 traffic matrices and normalizes them. In the ReLU convolutional layer, 256 convolution kernels of size 9×9 are applied to each traffic matrix with a stride of 1, producing 256 feature maps of size 20×20. The second convolutional layer, PrimaryCaps, then builds the vector structure that serves as the input of the capsule layer. PrimaryCaps applies 8 convolutions with different weights to the 256 feature maps, each using 32 kernels of size 9×9 with a stride of 2. The result is 6×6×32 8-dimensional vectors, i.e., activity vectors, each of which is a capsule composed of 8 ordinary convolution units.
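The feature-map sizes quoted above follow from the output-size formula for an unpadded ("valid") convolution, floor((n - k) / s) + 1. The short check below, with a hypothetical helper `conv_out`, confirms the 20×20 and 6×6 figures and the resulting number of primary capsules; the 8 convolutions of 32 kernels each amount to 8 × 32 = 256 output channels.

```python
def conv_out(n: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (n - kernel) // stride + 1


relu_conv = conv_out(28, 9, 1)       # 28x28 traffic matrix -> 20x20 feature maps
primary = conv_out(relu_conv, 9, 2)  # 20x20 -> 6x6 capsule grid
num_primary_capsules = primary * primary * 32   # 32 grids of 8-D capsules
primary_channels = 8 * 32                       # 8 convolutions x 32 kernels
```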
2) Dynamic routing
The third layer of the network, DigitCaps, propagates and updates capsule inputs in two steps: affine transformation and dynamic routing. In the affine transformation, each activity vector $u_i$ output by the lower PrimaryCaps layer is multiplied by a weight matrix $W_{ij}$ to obtain the prediction vector $\hat{u}_{j|i}$. The input $s_j$ of a higher-level capsule is the weighted sum of the prediction vectors, defined as:

$$ \hat{u}_{j|i} = W_{ij} u_i, \qquad s_j = \sum_i c_{ij} \hat{u}_{j|i} \tag{5} $$

where each activity vector $u_i$ has its own weight matrices $W_{ij}$, which are initialized with random numbers drawn from a standard normal distribution and updated through the loss function, and $c_{ij}$ are the coupling coefficients determined by iterative dynamic routing.
Dynamic routing seeks the best path from the output of each capsule to the inputs of the next-layer capsules. One way to find this "best path" is to iteratively identify the input vectors that agree most with the output. Agreement is measured by the inner product of the output vector $v_j$ and the prediction vector $\hat{u}_{j|i}$, and it is added directly to the routing logits $b_{ij}$ from which $c_{ij}$ is computed. After repeated parameter tuning, the invention sets the number of routing iterations to 3. The coupling coefficients $c_{ij}$ in Eq. (5) are updated as follows:

$$ c_{ij} = \operatorname{softmax}(b_{ij}) = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{6} $$

where $b_{ij}$ is the log prior probability that capsule $i$ is coupled to capsule $j$.
The length of a capsule's output vector represents the probability that the input belongs to a class, so it must lie in [0, 1]. This is achieved with a squashing function, defined as:

$$ v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|} \tag{7} $$

where $v_j$ is the output vector of capsule $j$ and $s_j$ is its total input.
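Eqs. (5)-(7) can be put together in a short NumPy sketch of routing by agreement. It assumes the prediction vectors $\hat{u}_{j|i} = W_{ij} u_i$ have already been computed into an array `u_hat` of shape (inputs, outputs, dim), for this model (1152, N, 16); `squash` and `dynamic_routing` are illustrative names, not the authors' code, and the small `eps` guarding against division by zero is an added assumption.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Eq. (7): scale each vector's length into [0, 1) while keeping its direction."""
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)


def dynamic_routing(u_hat: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Routing by agreement; u_hat[i, j, :] is the prediction of input capsule i for output j.

    Returns the output capsules v[j, :]. Requires iterations >= 1.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                     # logits b_ij start at zero
    for _ in range(iterations):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)               # Eq. (6): softmax over outputs j
        s = np.einsum("ij,ijd->jd", c, u_hat)           # Eq. (5): s_j = sum_i c_ij u_hat_j|i
        v = squash(s)                                   # Eq. (7)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)       # add agreement u_hat_j|i . v_j
    return v
```

In training, gradients flow through `u_hat` into the $W_{ij}$; the routing logits themselves are recomputed from zero for every input sample.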
$W_{ij}$ and the other convolution parameters of the network are updated through the loss function, for which we use the margin loss, defined as:

$$ L_c = T_c \max(0, m^+ - \|v_c\|)^2 + \lambda (1 - T_c) \max(0, \|v_c\| - m^-)^2 \tag{8} $$

where $c$ is a class and $T_c$ is an indicator function: $T_c = 1$ if the sample belongs to class $c$, and $T_c = 0$ otherwise. $m^+$ is the upper bound and $m^-$ the lower bound on the vector length $\|v_c\|$, and $\lambda$ weights the term for absent classes. In addition, we scale the reconstruction loss by 0.0005 so that it does not dominate the margin loss during training.
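Eq. (8) translates directly into NumPy. The text does not state the values of $m^+$, $m^-$, or $\lambda$; the defaults below (0.9, 0.1, 0.5) are an assumption borrowed from the original CapsNet paper. `margin_loss` is a hypothetical name, and summing over classes and averaging over the batch is likewise an assumed reduction.

```python
import numpy as np


def margin_loss(lengths: np.ndarray, targets: np.ndarray,
                m_plus: float = 0.9, m_minus: float = 0.1, lam: float = 0.5) -> float:
    """Eq. (8), summed over the N classes and averaged over the batch.

    lengths: (batch, N) capsule lengths ||v_c||; targets: (batch, N) one-hot T_c.
    Margin and lambda defaults are assumed, not taken from the patent text.
    """
    present = targets * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return float(np.mean(np.sum(present + absent, axis=1)))
```

The total training loss would then add 0.0005 times the reconstruction loss, as described above.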
A traffic matrix to be identified is passed through CapsNet, which outputs N 16-dimensional vectors, where N is the total number of traffic classes. The length of each vector represents the probability that the traffic belongs to the corresponding class, and its direction represents attributes of the traffic, including the positions of fixed strings and the order of packets. A softmax classifier then converts the N vectors into the probability that the traffic matrix belongs to each class; the class with the highest probability is the predicted class of the traffic and is the final output of the model.
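The final decision step can be sketched as follows, assuming the softmax is taken over the capsule lengths (the text does not say exactly what the softmax classifier receives). `predict` and the class names in the usage test are illustrative.

```python
import numpy as np


def predict(v: np.ndarray, class_names: list[str]) -> tuple[str, np.ndarray]:
    """Map N output capsules v[N, 16] to per-class probabilities and the top class.

    Assumption: softmax is applied to the capsule lengths ||v_c||.
    """
    lengths = np.linalg.norm(v, axis=-1)
    probs = np.exp(lengths - lengths.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs
```

Because softmax is monotonic, the predicted class is always the capsule with the longest output vector; the softmax only rescales the lengths into a probability distribution.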
(3) Identification of encrypted traffic and service/application classification
The model trained in the above steps is used to identify and classify encrypted traffic. A flow to be identified is first segmented and converted into traffic matrices, which are then fed into the trained model to obtain the class of the flow, comprising: 1) its service class and 2) its application class.
(4) Comparison of experimental results
To verify the effectiveness of the invention, we use the ISCX VPN-nonVPN dataset as raw data. It contains 150 raw traffic files covering 6 kinds of regular encrypted traffic (Chat, Streaming, VoIP, etc.) and 6 kinds of VPN traffic (VPNChat, VPNStreaming, VPNVoIP, etc.). In addition, 9 raw traffic files contain traffic generated by 5 different applications and captured through the Tor software. Because Tor carries only encrypted connections and TCP streams over the Internet, its traffic is difficult to track and analyze; we therefore extract these files to perform application classification of Tor traffic. Finally, we evaluate the effectiveness of the invention against existing methods using four metrics: accuracy, precision, recall, and F1 score.
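For reference, the four evaluation metrics can be computed from true and predicted labels as below. Macro-averaging over classes is an assumption; the patent does not state how per-class scores are aggregated. `evaluate` is a hypothetical pure-Python helper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]):
    """Return accuracy, per-class (precision, recall, F1), and their macro averages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was not p
            fn[t] += 1  # true class t was missed
    accuracy = sum(tp.values()) / len(y_true)
    per_class: Dict[Hashable, Tuple[float, float, float]] = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)
    macro = tuple(sum(v[k] for v in per_class.values()) / len(labels) for k in range(3))
    return accuracy, per_class, macro
```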
Specifically, we divide the experiments into three parts: 1) evaluating and comparing the effectiveness of deleting MAC and IP addresses and of the two-stage segmentation mechanism in data preprocessing; 2) evaluating and comparing the effectiveness of SPCaps on the encrypted-traffic service classification task; 3) evaluating and comparing the effectiveness of SPCaps on the encrypted-traffic application classification task.
1) Preprocessing results
We preprocess the ISCX VPN-nonVPN dataset with the raw-traffic conversion steps described above. After Pcap-Sessions segmentation, the byte-size distribution of the sessions used for service classification is shown in Figure 4.
As can be seen that the size distributed pole of session traffic is uneven, the session traffic in this 12 kinds of flows, more than 50% It is the flow unrelated with classification task mostly among these less than 0.5KB.Especially, have in Chat, Email, File and Voip 80% or more session traffic is less than 0.2KB.Therefore, the size distribution of session traffic confirms Session-Packets cutting Necessity and rationality in preprocessing process.According to formula (4), we are set in Session-Packets dicing step Setting the maximum quantity wrapped in each session is 16.Finally, in the classification of service task of encryption flow, item name contained is answered It is amounted to program and flow as shown in table 1.
Table 1: Sample composition for encrypted-traffic service classification
Class | Applications | Total samples
Chat | AIM, Facebook, Hangouts, ICQ, Skype | 11365
Email | Email, Gmail | 12822
File | FTPS, SCP, SFTP, Skype | 19553
P2P | Torrent | 60000
Streaming | Facebook, Hangouts, Netflix, Skype, Spotify, Vimeo, YouTube | 21273
VoIP | Facebook, Hangouts, Skype, Voipbuster | 21000
VPNChat | AIM, Facebook, Hangouts, ICQ, Skype | 13710
VPNEmail | Email | 2890
VPNFile | FTPS, SFTP, Skype | 17528
VPNP2P | BitTorrent | 6000
VPNStreaming | Facebook, Netflix, Spotify, Vimeo, YouTube | 12000
VPNVoIP | Hangouts, Skype, Voipbuster | 14805
2) Preprocessing comparison
During raw-traffic conversion, we propose deleting MAC and IP addresses to avoid overfitting, and we propose Session-Packets segmentation, which cuts conventional session traffic a second time. In addition, to show that CapsNet is better suited than 1dCNN to traffic classification, we compare the two neural networks in every experiment. In total, we therefore run six different encrypted-traffic service classification tasks on the ISCX VPN-nonVPN dataset; the results are shown in Table 2.
Table 2: Preprocessing comparison results
The results show that, with either 1dCNN or CapsNet, the proposed deletion of MAC and IP addresses and the Session-Packets segmentation yield better classification performance. Moreover, the comparison between the two networks shows that CapsNet achieves higher classification accuracy and F1 score than 1dCNN.
3) Comparison of encrypted-traffic service classification
To evaluate the effectiveness of SPCaps in encrypted-traffic service classification, we run experiments on the 12 traffic classes of ISCX VPN-nonVPN. As shown in Table 3, the accuracy reaches 99.1%, and the precision and recall of every class exceed 97%.
Table 3: Encrypted-traffic service classification results
Next, we compare SPCaps with existing baseline methods on encrypted-traffic service classification; the results are shown in Table 4. SPCaps outperforms the baselines (an F1 score of 99.3% versus at most 93% for the baselines) and reaches a level suitable for practical application.
Table 4: Comparison of SPCaps and baseline methods for encrypted-traffic service classification
Method | Input form | Recall (%) | Precision (%) | F1 (%)
SPCaps | Session-Packets | 99.3 | 99.3 | 99.3
1dCNN | Session | 90.6 | 88.9 | 89.7
SAE | Deep Packets | 92 | 92 | 92
1dCNN | Deep Packets | 94 | 93 | 93
4) Comparison of encrypted-traffic application classification
To evaluate the effectiveness of SPCaps in application classification of Tor traffic, we test on the traffic of the 5 different applications captured through Tor in ISCX VPN-nonVPN; the results are shown in Table 5. On Tor application classification, SPCaps reaches an accuracy of 99.8%.
Table 5: Encrypted-traffic application classification results
Next, we compare SPCaps with existing baseline methods on encrypted-traffic application classification; the results are shown in Table 6. On Tor application classification, SPCaps far outperforms the baselines, with an F1 score of 99.5% against 30% for SAE and 36% for 1dCNN.
Table 6: Comparison of SPCaps and baseline methods for encrypted-traffic application classification
Method | Recall (%) | Precision (%) | F1 (%)
SPCaps | 99.4 | 99.5 | 99.5
SAE | 57 | 44 | 30
1dCNN | 35 | 40 | 36
The above experiments show that SPCaps classifies encrypted traffic effectively and that its results meet the standard required for practical application.
The embodiments described above express only some implementations of the present invention; although they are described in specific detail, they should not be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the concept of the invention, and all of these fall within its scope of protection. The scope of protection of the present invention is therefore defined by the appended claims.

Claims (10)

1. A service and application classification method for encrypted traffic, comprising the steps of:
1) segmenting the continuous traffic to be processed into multiple sessions at session granularity;
2) segmenting each session at packet granularity into multiple flow groups, where the number of packets in each flow group does not exceed a set maximum;
3) unifying the size of each flow group, then converting each flow group into a traffic matrix, and packaging the traffic matrices and their labels into IDX traffic files;
4) training a CapsNet model with the IDX traffic files to obtain a recognition model with automatic feature selection capability;
5) for encrypted traffic to be identified, segmenting it and converting it into traffic matrices, then inputting these into the recognition model to obtain the service type and application type of that traffic.
2. The method according to claim 1, wherein the j-th flow group in the i-th session $S_i$ is $G_{ij}$, where $G_{ij} = \{p_1 = (x_1, b_1, t_1), \ldots, p_m = (x_m, b_m, t_m)\}$; $m$ is the number of packets in $G_{ij}$ and $C$ is the set maximum number of packets; the i-th packet in session $S_i$ is $p_i = (x_i, b_i, t_i)$, where $x_i$ is the five-tuple of the i-th packet, $b_i$ is its byte length, and $t_i$ is its start time; and $|S_i|$ is the total number of packets in session $S_i$.
3. The method according to claim 2, characterized in that …, where $L_{sample}$ denotes the byte length of the file storing a flow group, $L_{header}$ denotes the byte length of that file's header, and $L_{packet}$ denotes the byte length of a packet.
4. The method according to claim 1, wherein data cleaning is performed on each session to delete MAC addresses and IP addresses, after which step 2) is performed.
5. The method according to claim 1, wherein a flow group is converted into a traffic matrix by converting the one-dimensional byte sequence of the flow group into a two-dimensional traffic matrix; the unified size of a flow group is 784 bytes, and the resulting traffic matrix is 28×28.
6. The method according to claim 1, wherein the CapsNet model is trained with the IDX traffic files by: first performing a convolution on each traffic matrix with a first convolutional layer to generate multiple feature matrices; then performing a convolution on the feature matrices to generate multiple activity vectors; then multiplying each activity vector by its corresponding weight matrix to obtain prediction vectors, and using the weighted sum of the lower-layer prediction vectors as the input to the higher-layer capsules.
7. A service and application classification system for encrypted traffic, comprising a traffic preprocessing module, a model training module, and an encrypted-traffic identification module, wherein:
the traffic preprocessing module is configured to segment continuous traffic to be processed into multiple sessions at session granularity; then segment each session at packet granularity into multiple flow groups, where the number of packets in each flow group does not exceed a set maximum; then unify the size of each flow group, convert each flow group into a traffic matrix, and package the traffic matrices and their labels into IDX traffic files;
the model training module is configured to train a CapsNet model with the IDX traffic files to obtain a recognition model with automatic feature selection capability;
the encrypted-traffic identification module is configured to input the traffic matrices of the encrypted traffic to be identified into the recognition model to obtain the service type and application type of that traffic.
8. The system according to claim 7, wherein the j-th flow group in the i-th session $S_i$ is $G_{ij}$, where $G_{ij} = \{p_1 = (x_1, b_1, t_1), \ldots, p_m = (x_m, b_m, t_m)\}$; $m$ is the number of packets in $G_{ij}$ and $C$ is the set maximum number of packets; the i-th packet in session $S_i$ is $p_i = (x_i, b_i, t_i)$, where $x_i$ is the five-tuple of the i-th packet, $b_i$ is its byte length, and $t_i$ is its start time; and $|S_i|$ is the total number of packets in session $S_i$.
9. The system according to claim 8, characterized in that …, where $L_{sample}$ denotes the byte length of the file storing a flow group, $L_{header}$ denotes the byte length of that file's header, and $L_{packet}$ denotes the byte length of a packet.
10. The system according to claim 7, wherein the traffic preprocessing module performs data cleaning on each session to delete MAC addresses and IP addresses, and then segments each session at packet granularity.
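As an illustration of the data cleaning, size unification, matrix conversion, and IDX packaging steps claimed above, here is a minimal sketch. Assumptions, none of which come from the patent: frames are Ethernet II carrying IPv4; "deleting" the MAC addresses is done by dropping the 14-byte Ethernet header, and the IPv4 source and destination addresses are zeroed in place rather than removed; a flow group's bytes are the concatenated cleaned frames (the patent's 784-byte sample may instead include a file header); and the IDX layout follows the MNIST-style unsigned-byte convention. All function names are hypothetical.

```python
import struct
from typing import List, Sequence, Tuple

ETH_HEADER_LEN = 14                   # Ethernet II header holding both MAC addresses
IMAGE_SIDE = 28
SAMPLE_LEN = IMAGE_SIDE * IMAGE_SIDE  # unified flow-group size: 784 bytes


def anonymize_frame(frame: bytes) -> bytes:
    """Drop the Ethernet header (MACs) and zero IPv4 src/dst addresses (an assumption)."""
    ip = bytearray(frame[ETH_HEADER_LEN:])
    if len(ip) >= 20 and ip[0] >> 4 == 4:  # IPv4 version nibble
        ip[12:20] = bytes(8)               # source and destination addresses
    return bytes(ip)


def flow_group_to_matrix(frames: Sequence[bytes]) -> List[List[int]]:
    """Concatenate cleaned frames, truncate or zero-pad to 784 bytes, reshape to 28x28."""
    raw = b"".join(anonymize_frame(f) for f in frames)
    raw = raw[:SAMPLE_LEN].ljust(SAMPLE_LEN, b"\x00")
    return [list(raw[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE]) for r in range(IMAGE_SIDE)]


def to_idx(matrices: Sequence[List[List[int]]], labels: Sequence[int]) -> Tuple[bytes, bytes]:
    """Serialize matrices and labels as IDX (unsigned byte, big-endian header) byte strings."""
    images = struct.pack(">IIII", 0x00000803, len(matrices), IMAGE_SIDE, IMAGE_SIDE)
    images += b"".join(bytes(v for row in m for v in row) for m in matrices)
    label_bytes = struct.pack(">II", 0x00000801, len(labels)) + bytes(labels)
    return images, label_bytes
```

The two returned byte strings can be written to separate image and label files and read back by any MNIST-style IDX loader.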
CN201910504060.XA 2019-06-12 2019-06-12 Service and application classification method and system for encrypted traffic Active CN110417729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910504060.XA CN110417729B (en) 2019-06-12 2019-06-12 Service and application classification method and system for encrypted traffic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910504060.XA CN110417729B (en) 2019-06-12 2019-06-12 Service and application classification method and system for encrypted traffic

Publications (2)

Publication Number Publication Date
CN110417729A true CN110417729A (en) 2019-11-05
CN110417729B CN110417729B (en) 2020-10-27

Family

ID=68358996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910504060.XA Active CN110417729B (en) 2019-06-12 2019-06-12 Service and application classification method and system for encrypted traffic

Country Status (1)

Country Link
CN (1) CN110417729B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394827A (en) * 2011-11-09 2012-03-28 浙江万里学院 Hierarchical classification method for internet flow
US20140185482A1 (en) * 2010-08-12 2014-07-03 Citrix Systems, Inc. Systems and methods for quality of service of encrypted network traffic
CN106452953A (en) * 2016-09-30 2017-02-22 苏州迈科网络安全技术股份有限公司 Synthetic data feature analysis method and system based on DPI (Deep Packet Inspection) technology
CN106790019A (en) * 2016-12-14 2017-05-31 北京天融信网络安全技术有限公司 The encryption method for recognizing flux and device of feature based self study
WO2017221152A1 (en) * 2016-06-20 2017-12-28 Telefonaktiebolaget Lm Ericsson (Publ) Method for classifying the payload of encrypted traffic flows
CN107749859A (en) * 2017-11-08 2018-03-02 南京邮电大学 A kind of malice Mobile solution detection method of network-oriented encryption flow
CN109660656A (en) * 2018-11-20 2019-04-19 重庆邮电大学 A kind of intelligent terminal method for identifying application program
CN109831422A (en) * 2019-01-17 2019-05-31 中国科学院信息工程研究所 A kind of encryption traffic classification method based on end-to-end sequence network

Non-Patent Citations (1)

Title
HAIPENG YAO: "Capsule network assisted IoT Traffic Classification Mechanism for Smart Cities", IEEE *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN111967798A (en) * 2020-09-07 2020-11-20 上海优扬新媒信息技术有限公司 Method and device for distributing experimental samples, equipment and computer readable storage medium
CN111967798B (en) * 2020-09-07 2023-10-03 度小满科技(北京)有限公司 Method, device and equipment for distributing experimental samples and computer readable storage medium
CN112468324A (en) * 2020-11-11 2021-03-09 国网冀北电力有限公司信息通信分公司 Graph convolution neural network-based encrypted traffic classification method and device
CN112468324B (en) * 2020-11-11 2023-04-07 国网冀北电力有限公司信息通信分公司 Graph convolution neural network-based encrypted traffic classification method and device
CN113037646A (en) * 2021-03-04 2021-06-25 西南交通大学 Train communication network flow identification method based on deep learning
CN113162908A (en) * 2021-03-04 2021-07-23 中国科学院信息工程研究所 Encrypted flow detection method and system based on deep learning
CN113472751A (en) * 2021-06-04 2021-10-01 中国科学院信息工程研究所 Encrypted flow identification method and device based on data packet header
CN113472751B (en) * 2021-06-04 2023-01-17 中国科学院信息工程研究所 Encrypted flow identification method and device based on data packet header
CN113794601A (en) * 2021-08-17 2021-12-14 中移(杭州)信息技术有限公司 Network traffic processing method, device and computer readable storage medium
CN113794601B (en) * 2021-08-17 2024-03-22 中移(杭州)信息技术有限公司 Network traffic processing method, device and computer readable storage medium
CN114386079A (en) * 2022-03-23 2022-04-22 清华大学 Encrypted traffic classification method and device based on contrast learning
CN114386079B (en) * 2022-03-23 2022-12-06 清华大学 Encrypted traffic classification method and device based on contrast learning

Also Published As

Publication number Publication date
CN110417729B (en) 2020-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant