CN109617904A - HTTPS application identification method in an IPv6 network - Google Patents
- Publication number
- CN109617904A (application CN201811637611.1A)
- Authority
- CN
- China
- Prior art keywords
- ssl
- tls
- classifier
- probability
- application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/16—Implementing security features at a particular protocol layer
- H04L63/166—Implementing security features at a particular protocol layer at the transport layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/60—Types of network addresses
- H04L2101/618—Details of network addresses
- H04L2101/659—Internet protocol version 6 [IPv6] addresses
Abstract
The present invention discloses an HTTPS application identification method for IPv6 networks. Step 1: capture SSL/TLS packets to form traffic samples, assemble the packets of each session into a flow, and extract flow features. Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples. Step 3: build an HMM from the SSL/TLS traffic samples. Step 4: construct a weighted ensemble classifier from the second-order Markov chain model and the HMM. Step 5: use the weighted ensemble classifier to identify and classify newly arriving SSL/TLS encrypted flows. The invention addresses the classification of SSL/TLS encrypted applications and improves the controllability and security of the network.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to an HTTPS application identification method in an IPv6 network.
Background art
At present, domestic telecom operators have upgraded their backbone networks to IPv6 and fully support IPv4/IPv6 dual stack, and the main services of large domestic Internet companies also fully support IPv4/IPv6 dual stack. The transition is expected to proceed in full from the beginning of 2019 and to be completed in 2020. Judging from the current state of the IPv6 transition, IPv6 traffic will surge in the short term. Although the IPsec encryption protocol is built into IPv6, it is not used in practical deployments: IPsec encrypts at the network layer, whereas HTTPS encrypts at the application layer. With HTTPS a flow is encrypted and decrypted only once, while with IPsec a flow of 10 IP packets needs 10 encryption and decryption operations. A large amount of HTTPS encrypted application traffic therefore remains in IPv6 networks, and classifying and managing this encrypted traffic is a top priority for IPv6 network security.
In recent years, with the rapid development of IPv6, 5G, the Internet of Things and the industrial Internet, new network applications have kept emerging. While these applications provide convenient services for Internet users, they also bring security risks to the network, such as user information in transit being illegally intercepted, hijacked, stolen or modified. The SSL/TLS protocol emerged against this background of securing networks. It uses encryption to establish a secure channel between client and server and is widely used in critical network services such as online payment and social networking. SSL/TLS traffic is expected to account for more than 80% of traffic by 2020.
SSL/TLS encrypted applications on the Internet are growing in number and complexity, and traditional port-based and payload-based methods cannot classify them at a fine granularity. While protecting network security, the SSL/TLS protocol also conceals abnormal traffic, which can easily evade DPI detection. To provide better quality of service while safeguarding network security, the various SSL/TLS encrypted applications on the network need to be supervised effectively.
Because little information is available from SSL/TLS encrypted applications, existing approaches to identifying them can rely only on the message type sequence features of the SSL/TLS handshake. Since the handshake message type sequences of different SSL/TLS applications are similar, these approaches cannot distinguish a larger number of SSL/TLS applications well.
Summary of the invention
An object of the invention is to solve at least the problems above and to provide at least the advantages described below.
The invention provides an HTTPS application identification method in an IPv6 network that addresses the SSL/TLS encrypted application classification problem by applying machine learning, which improves the controllability and security of the network.
Specifically, the invention proposes a weighted ensemble classification method, WENC, to overcome the shortcomings of existing SSL/TLS encrypted application identification. To make application models more distinguishable, the message type and the corresponding message size in the handshake are considered jointly as a two-dimensional feature, from which a second-order Markov chain model is built. In addition, an HMM is built from the application data packet size sequence observed during data transfer, and its emission probabilities are improved using the correlation between adjacent message sizes. Finally, a weighted ensemble classifier improves generalization performance.
To achieve these objects and other advantages, the invention provides an HTTPS application identification method in an IPv6 network, comprising the following steps:
Step 1: capture SSL/TLS packets to form traffic samples, assemble the packets of each session into a flow, and extract flow features;
Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples;
Step 3: build an HMM from the SSL/TLS traffic samples;
Step 4: construct a weighted ensemble classifier from the second-order Markov chain model and the HMM;
Step 5: use the weighted ensemble classifier to identify and classify newly arriving SSL/TLS encrypted flows.
Preferably, in Step 2, the two-dimensional feature sequence formed by the message type and message size of the SSL/TLS exchange is used to build the fingerprint model of an application, i.e. the second-order Markov chain model.
Preferably, $X_t$ represents the second-order Markov chain, and the current state is estimated from the two preceding states:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_1 = i_1) = P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) \tag{1}$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times $t-2$ and $t-1$ to time $t$ is time-invariant, we obtain:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) = P(X_t = k \mid X_{t-1} = j, X_{t-2} = i) = p_{i-j-k} \tag{2}$$
The transition matrix between intermediate states, formula (3), is formed from the probabilities $p_{i-j-k}$.
Here the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t = t_0, t_1, \ldots, t_n \in T$, and $i_t \in \{1, 2, \ldots, s\}$. Each $i_t$ is a joint feature or a joint feature sequence: the message type MT and the packet length PT form the joint feature <MT, PT>, and the joint feature represents a state of the Markov chain.
Preferably, the entry probability distribution (ENPD) over the first two states of the second-order Markov chain is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \tag{4}$$
where $q_{i-j} = P(X_{T+1} = j, X_T = i)$;
The exit probability distribution (EXPD) over the last two states of the second-order Markov chain is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \tag{5}$$
where $w_{i-j} = P(X_{T+1} = j, X_T = i)$ is the probability that the session ends at time $t_n$ when in state $i$;
The probability of an SSL/TLS session is given by formula (6).
The resulting probability indicates how closely the feature sequence of the SSL/TLS session matches an application model: the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
Preferably, in Step 3, the HMM is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where $S$ is the state set, $K$ is the observation set, $A$ is the transition matrix, $B$ is the observation probability matrix and $\pi$ is the initial state distribution;
The probability $P(O \mid \lambda)$ of the observation sequence $o_1, o_2, \ldots, o_t$ is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} P(q_t = s_i, O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i) \tag{7}$$
where $O = (O_1 = o_1, \ldots, O_M = o_M)$ is the observable output sequence and $M$ is the number of observations in the sequence; $A = \{a_{ij}\}$, $1 \le i, j \le N$, is the probability matrix of transitions between network operating states, $N$ is the number of states, and $a_{ij} = P(u_j \mid u_i)$ is the probability of moving from state $u_i$ to state $u_j$; $B = \{b_{im}\}$ gives the probability of the corresponding ADPT output value obtained from network operation at a given time, where ADPT is the application data packet length and $b_{im} = P(v_m \mid u_i)$ is the probability that state $u_i$ outputs the feature value $v_m$; $\pi = \{\pi_i, i = 1, \ldots, N\}$, generated at random, is the initial probability distribution of the network operating states; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t = s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t = s_i, \lambda)$.
Preferably, the HMM corresponding to a given SSL/TLS session is $H_{app}$. The set $F = \{F_1, F_2, \ldots, F_l\}$ denotes $l$ consecutive flows of an application, whose ADPT features are used to train the model $H_{app}$. For an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as the application.
Preferably, in Step 4, newly arriving encrypted traffic is divided in chronological order into data blocks $S_i$ of equal size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely related to its error.
Preferably, test samples are constructed, the identification error rate of the test samples on classifier $C_i$ is computed, and this error rate is used to set the weight of classifier $C_i$.
Preferably, the identification error of a test sample $(x, c)$ on classifier $C_i$ is determined by the probability that $C_i$ assigns to $x$ being an instance of application class $c$, and the mean square error $MSE_i$ of classifier $C_i$ is given by formula (8).
For random guessing over the whole class space C = {all classes}, where the probability distribution $P(Y)$ gives the probability that $x$ is classified as $Y$, the mean square error $MSE_r$ of a classifier that guesses at random is given by formula (9).
For a given data set, $MSE_r$ is a fixed value, and the weight $w_i$ of classifier $C_i$ is computed as:
$$w_i = MSE_r - MSE_i \tag{10}$$
If classifier $C_i$ performs worse than random guessing, its weight is set to 0 so that $C_i$ is not used in the ensemble classifier; this ensures that a larger error rate yields a smaller weight;
The result of combining the second-order Markov chain model and the HMM is given by formula (11), where $H_i \in \{P_1, P_2\}$.
Compared with the prior art, the invention has the following beneficial effects:
1. The identification method effectively identifies and classifies SSL/TLS encrypted applications, improving the controllability and security of the network.
2. The identification method achieves higher classification accuracy.
Further advantages, objects and features of the invention will in part be set out in the description below, and in part be understood by those skilled in the art through study and practice of the invention.
Brief description of the drawings
Fig. 1 is a schematic diagram of the SSL/TLS protocol message exchange;
Fig. 2 is a schematic diagram of the SSL/TLS encrypted application classification system architecture;
Fig. 3 is a histogram of classification accuracy;
Fig. 4 is a histogram of the overall evaluation.
Specific embodiment
The invention is described in further detail below with reference to the drawings, so that those skilled in the art can implement it by following the description.
Throughout a network protocol exchange, from start to end, the protocol performs different actions at different stages, which appear as different "states" of the interaction. The ordered sequence of these states accurately reflects the temporal characteristics of the network traffic. From start to end, the SSL/TLS protocol passes through different states and takes different actions accordingly, so its state sequence accurately reflects the temporal characteristics of the SSL/TLS protocol. Fig. 1 shows an example of the message exchange between client and server during an SSL/TLS session.
The initial ClientHello message between client and server carries the client-generated random number, the protocol version, the cipher suites and other information. The key exchange comprises four messages: server certificate, server key exchange, client certificate and client key exchange. The client then sends a Change Cipher Spec message and encrypts subsequent messages with the new algorithm and key. In response, the server sends its own Change Cipher Spec message under the new cipher specification, followed by its Finished message, which completes the SSL/TLS handshake; the two sides then begin exchanging application data. The server can use an Alert message to terminate the session. Most information in the SSL/TLS handshake is transmitted in plaintext. After the handshake, only the protocol type, record length and SSL/TLS version remain unencrypted; everything else is encrypted to secure the communication between the two sides.
Although SSL/TLS is an encryption protocol, the payload of SSL/TLS traffic contains some fixed formats, which provide information for identifying SSL/TLS encrypted applications. The SSL protocol is divided into two sublayers: the upper layer comprises the SSL handshake protocol, the SSL change cipher spec protocol and the SSL alert protocol; the lower layer is the SSL record protocol. The header of an SSL record consists of content type, version and length. For ease of later description, the SSL/TLS protocol exchange is encoded according to protocol message type; the encoding scheme is shown in Table 1.
Table 1 SSL/TLS protocol encoding scheme
"State" features are expressed using plaintext fields; for example, Application Data is 23: and the ClientHello message is 22:02. An application model of an SSL/TLS encrypted flow can therefore be built from the state feature sequence. The invention considers the message types of client and server in an SSL/TLS session separately, which helps with asymmetric routing, since only traffic in one direction needs to be observed. Client features vary slightly with configuration, whereas server-side features are representative within a network.
Column feature is to improve the characteristic polymorphic that SSL/TLS is applied.However, the Certificate message size of different application is at some
In the case of still can cluster in same class, in particular with the increase of application.
The prior art identifies SSL/TLS encrypted applications with a first-order Markov chain model built from the message type sequence of the SSL/TLS protocol exchange. Although these message type features help classify SSL/TLS encrypted applications, they can fail in some cases. First, such models consider only transitions between two discrete states, which cannot fully capture the differences between SSL/TLS applications. Second, although the message type sequence of an SSL/TLS application provides a visible fingerprint of state transitions, fingerprints of different SSL/TLS applications may overlap heavily, so a high fingerprint matching probability can still lead to misclassification. Third, a state based on a single variable, the message type of the SSL/TLS session, has limited expressive power, which easily leads to fingerprints with low discriminability. Because the Certificate message sizes of different applications may cluster together, misclassification is unavoidable under the maximum-likelihood criterion.
To address these problems: first, since the current state depends not only on the previous state but also on the two preceding states, a second-order Markov chain is introduced to remedy the insufficient expressive power of first-order Markov chain application fingerprints. Second, based on the observed correlation between packet length in the handshake and the corresponding application, the message type MT and the packet length PT form the joint feature <MT, PT>, which improves the expressive power of states in the application model. Third, an HMM is used to bring the application data packet length (ADPT) feature into the identification process. ADPT is clearly related to the behavior of the corresponding application. Although identifying an application from ADPT alone is uncertain, ADPT can be combined with the other features in the identification process. The method combines the second-order Markov chain model with the HMM and builds, through weighted ensemble learning, an SSL/TLS encrypted application classification model with high generalization ability.
As shown in Fig. 2, the SSL/TLS application classification system based on weighted ensemble learning consists of three functional modules: traffic preprocessing, learning and classification. The traffic preprocessing module captures SSL/TLS packets, assembles the packets of each session into a flow and extracts flow features in preparation for building the classification model. The learning module builds the ensemble classification model from SSL/TLS flow samples: it builds a second-order Markov chain from the handshake phase and an HMM from the data transfer phase of the SSL/TLS protocol exchange, and then constructs the weighted ensemble classifier. The classification module uses the resulting weighted ensemble classifier to identify newly arriving SSL/TLS encrypted flows.
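The preprocessing step of assembling packets of the same session into a flow can be sketched as follows. This is a minimal Python illustration, assuming that a session is identified by its pair of (address, port) endpoints regardless of direction; the `Packet` record and the function names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float      # capture timestamp in seconds
    src: str       # source IPv6 address
    sport: int     # source port
    dst: str       # destination IPv6 address
    dport: int     # destination port
    size: int      # packet length in bytes


def session_key(p: Packet):
    """Direction-independent key: both directions of a session map to one flow."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (a, b) if a <= b else (b, a)


def assemble_flows(packets):
    """Group packets by session and keep them in time order within each flow."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        flows[session_key(p)].append(p)
    return dict(flows)
```

A real capture pipeline would also need to split flows on connection teardown and timeouts; the sketch leaves that out to keep the grouping idea visible.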
Based on this system, the invention provides an HTTPS application identification method in an IPv6 network, comprising the following steps:
Step 1: capture SSL/TLS packets to form traffic samples, assemble the packets of each session into a flow, and extract flow features;
Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples. Identification of SSL/TLS encrypted traffic based on the second-order Markov chain only needs to observe the one-way traffic from server to client, and the method introduces some new features to make SSL/TLS applications more distinguishable. The two-dimensional feature sequence formed by the message type and message size of the SSL/TLS exchange is used to build the fingerprint model of an application, i.e. the second-order Markov chain model. When a network flow $s_i = \{f_1, f_2, \ldots, f_m\}$ is matched against the fingerprint models of the application set $P = \{p_1, p_2, \ldots, p_n\}$, the probability that $s_i$ is identified as each of the applications $p_1, p_2, \ldots, p_n$ is computed in turn, and the application with the highest probability is taken as the application the flow belongs to.
Specifically, the discrete random variable $X_t$ represents the second-order Markov chain, and the current state is estimated from the two preceding states:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_1 = i_1) = P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) \tag{1}$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times $t-2$ and $t-1$ to time $t$ is time-invariant, we obtain:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) = P(X_t = k \mid X_{t-1} = j, X_{t-2} = i) = p_{i-j-k} \tag{2}$$
The transition matrix between intermediate states, formula (3), is formed from the probabilities $p_{i-j-k}$.
Here the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t = t_0, t_1, \ldots, t_n \in T$, and $i_t \in \{1, 2, \ldots, s\}$. Each $i_t$ is a joint feature or a joint feature sequence: the message type MT and the packet length PT form the joint feature <MT, PT>, and the joint feature represents a state of the Markov chain.
The entry probability distribution (ENPD) over the first two states of the second-order Markov chain is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \tag{4}$$
where $q_{i-j} = P(X_{T+1} = j, X_T = i)$;
The exit probability distribution (EXPD) over the last two states of the second-order Markov chain is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \tag{5}$$
where $w_{i-j} = P(X_{T+1} = j, X_T = i)$ is the probability that the session ends at time $t_n$ when in state $i$;
The probability of an SSL/TLS session is given by formula (6).
The resulting probability indicates how closely the feature sequence of the SSL/TLS session matches an application model: the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
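Formula (6) itself is not reproduced in this text. One form that is consistent with formulas (2), (4) and (5) is given here purely as an assumption: it multiplies the entry probability of the first two states, the second-order transition probabilities along the sequence, and the exit probability of the last two states. The symbol $n$ for the number of states in the session's sequence is introduced only for this illustration.

```latex
P\bigl(\{X_1 = i_1, \ldots, X_n = i_n\}\bigr)
  = q_{i_1 - i_2}
    \times \prod_{t=3}^{n} p_{i_{t-2} - i_{t-1} - i_t}
    \times w_{i_{n-1} - i_n}
```

Under this reading, a session whose every pair and triple of states was frequent in the training data of an application receives a probability close to 1 for that application's model.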
In the invention, the second-order Markov chain model is built on the joint feature <MT, PT> and its temporal correlation. The two-dimensional variable <MT, PT> is used directly, without preprocessing such as vector quantization, so processing is simpler and more efficient.
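A second-order Markov chain fingerprint over <MT, PT> states can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the patent: probabilities are estimated by simple counting, unseen events are given a small floor probability (`floor`) instead of zero, scoring is done in log space, and the class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class SecondOrderChain:
    """Counts entry pairs, exit pairs and (i, j) -> k transitions for one application."""

    def __init__(self, sessions, floor=1e-6):
        self.floor = floor                 # assumed floor for unseen events
        self.entry = Counter()             # counts of the first two states
        self.exit = Counter()              # counts of the last two states
        self.trans = defaultdict(Counter)  # (i, j) -> Counter of next state k
        for seq in sessions:
            if len(seq) < 2:
                continue
            self.entry[(seq[0], seq[1])] += 1
            self.exit[(seq[-2], seq[-1])] += 1
            for a, b, c in zip(seq, seq[1:], seq[2:]):
                self.trans[(a, b)][c] += 1
        self.n = sum(self.entry.values())  # number of training sessions used

    def _p(self, count, total):
        return max(count / total if total else 0.0, self.floor)

    def log_prob(self, seq):
        """Log of entry * transitions * exit for a sequence of <MT, PT> states."""
        if len(seq) < 2:
            return float("-inf")
        lp = math.log(self._p(self.entry[(seq[0], seq[1])], self.n))
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            row = self.trans.get((a, b), Counter())
            lp += math.log(self._p(row[c], sum(row.values())))
        lp += math.log(self._p(self.exit[(seq[-2], seq[-1])], self.n))
        return lp


def classify(models, seq):
    """Return the application whose fingerprint gives the highest probability."""
    return max(models, key=lambda app: models[app].log_prob(seq))
```

States are plain tuples such as `("22:2", 90)`, so the two-dimensional variable is used directly as a dictionary key without any vector quantization step.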
Step 3: build an HMM from the SSL/TLS traffic samples. The HMM is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where $S$ is the state set, $K$ is the observation set, $A$ is the transition matrix, $B$ is the observation probability matrix and $\pi$ is the initial state distribution;
The probability $P(O \mid \lambda)$ of the observation sequence $o_1, o_2, \ldots, o_t$ is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} P(q_t = s_i, O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i) \tag{7}$$
where $O = (O_1 = o_1, \ldots, O_M = o_M)$ is the observable output sequence and $M$ is the number of observations in the sequence; $A = \{a_{ij}\}$, $1 \le i, j \le N$, is the probability matrix of transitions between network operating states, $N$ is the number of states, and $a_{ij} = P(u_j \mid u_i)$ is the probability of moving from state $u_i$ to state $u_j$; $B = \{b_{im}\}$ gives the probability of the corresponding ADPT output value obtained from network operation at a given time, where ADPT is the application data packet length, $b_{im} = P(v_m \mid u_i)$ is the probability that state $u_i$ outputs the feature value $v_m$, and $v_m$ is a discretized ADPT feature value; $\pi = \{\pi_i, i = 1, \ldots, N\}$, generated at random, is the initial probability distribution of the network operating states; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t = s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t = s_i, \lambda)$.
The HMM is built from network operating states and ADPT feature states. The HMM corresponding to a given SSL/TLS session is $H_{app}$. The set $F = \{F_1, F_2, \ldots, F_l\}$ denotes $l$ consecutive flows of an application, whose ADPT features are used to train the model $H_{app}$. For an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as the application.
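The probability of an observation sequence under an HMM is usually computed with the forward recursion over $\alpha_t(i)$. The Python sketch below is a minimal illustration, assuming that ADPT values are first mapped to discrete symbols by fixed bin edges; the bin edges, the function names and the plain-list representation of $\pi$, $A$ and $B$ are assumptions made for this example.

```python
import bisect


def discretize(length, edges):
    """Map a packet length to a symbol index using assumed bin edges."""
    return bisect.bisect_right(edges, length)


def forward_probability(obs, pi, A, B):
    """Return P(O | lambda) by the forward algorithm.

    obs: list of observation symbol indices
    pi:  initial state distribution, pi[i]
    A:   transition matrix, A[i][j] = P(state j | state i)
    B:   emission matrix, B[i][m] = P(symbol m | state i)
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over the final forward variables
    return sum(alpha)
```

For long flows the products underflow, so a practical version would scale `alpha` at each step or work in log space; the unscaled form is kept here because it mirrors the definition of $\alpha_t(i)$ most directly.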
Step 4: construct the weighted ensemble classifier from the second-order Markov chain model and the HMM. Specifically, newly arriving encrypted traffic is first divided in chronological order into data blocks $S_i$ of equal size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely related to its error.
Test samples are constructed, the identification error rate of the test samples on classifier $C_i$ is computed, and this error rate is used to set the weight of $C_i$. Tests show that classification results based on the most recent training samples are close to the distribution of the current test samples, so the weight of a classifier can be approximated by measuring the identification error rate of the test samples on $C_i$.
Specifically, the identification error of a test sample $(x, c)$ on classifier $C_i$ is determined by the probability that $C_i$ assigns to $x$ being an instance of application class $c$, and the mean square error $MSE_i$ of classifier $C_i$ is given by formula (8).
For random guessing over the whole class space C = {all classes}, where the probability distribution $P(Y)$ gives the probability that $x$ is classified as $Y$, the mean square error $MSE_r$ of a classifier that guesses at random is given by formula (9).
For a given data set, $MSE_r$ is a fixed value; for example, for a two-class task with uniformly distributed classes, the mean square error of random guessing is 0.25. The weight $w_i$ of classifier $C_i$ is computed as:
$$w_i = MSE_r - MSE_i \tag{10}$$
If classifier $C_i$ performs worse than random guessing, its weight is set to 0 so that $C_i$ is not used in the ensemble classifier; this setting ensures that a larger error rate yields a smaller weight;
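The weighting rule of formula (10) can be sketched in Python as follows. Formulas (8) and (9) are not reproduced in this text, so the forms used here are assumptions: the classifier error is taken as the mean of $(1 - f)^2$, where $f$ is the probability the classifier gives to the true class, and the random-guess error as $\sum_c p(c)(1 - p(c))^2$. The latter agrees with the 0.25 value stated for two uniformly distributed classes. All function names are hypothetical.

```python
def mse_classifier(true_class_probs):
    """Assumed form of formula (8): mean of (1 - f)^2 over the test samples,
    where f is the probability the classifier assigns to the true class."""
    return sum((1.0 - f) ** 2 for f in true_class_probs) / len(true_class_probs)


def mse_random(class_priors):
    """Assumed form of formula (9): error of guessing according to class priors."""
    return sum(p * (1.0 - p) ** 2 for p in class_priors)


def classifier_weight(mse_i, mse_r):
    """Formula (10), with the weight set to 0 when worse than random guessing."""
    return max(mse_r - mse_i, 0.0)
```

For example, a classifier that gives the true class probabilities 0.9 and 0.8 on two test samples has an error of 0.025 and, in the two-class uniform case, a weight of 0.225.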
The result of combining the second-order Markov chain model and the HMM is given by formula (11), where $H_i \in \{P_1, P_2\}$.
This yields the weighted ensemble classifier of the second-order Markov chain model and the HMM.
Step 5: use the weighted ensemble classifier to identify and classify newly arriving SSL/TLS encrypted flows.
The identification method of the invention is embodied in the WENC algorithm, whose pseudocode is shown in Table 2:
Table 2 WENC algorithm pseudocode
The algorithm has two parts. The first part builds the weighted ensemble classifier (lines 1-6): line 2 builds classifiers on the different feature sets; line 3 computes the classification error rate of $h_i$; line 4 computes the error rate of the random classifier; line 5 computes $w_i$ and determines the weighted classifier. The second part describes the classification process (lines 7-10), which outputs the final classification result according to the maximum confidence.
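The classification part of the algorithm can be sketched in Python as follows. Table 2 and formula (11) are not reproduced in this text, so the combination rule used here is an assumption: each base classifier returns a score per application, the scores are normalized to sum to 1, and the final confidence of an application is the weighted sum of its normalized scores. The names `normalize` and `wenc_classify` are hypothetical.

```python
def normalize(scores):
    """Scale non-negative per-application scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {app: 0.0 for app in scores}
    return {app: s / total for app, s in scores.items()}


def wenc_classify(scores_per_classifier, weights):
    """Combine base classifier outputs and return (best application, confidences).

    scores_per_classifier: list of dicts {application: normalized score}
    weights:               list of classifier weights, one per dict
    """
    combined = {}
    for scores, w in zip(scores_per_classifier, weights):
        for app, s in scores.items():
            combined[app] = combined.get(app, 0.0) + w * s
    best = max(combined, key=combined.get)
    return best, combined
```

In this sketch a classifier with weight 0 has no influence on the result, which matches the rule that classifiers worse than random guessing are excluded from the ensemble.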
Verification of the classification accuracy of the method:
Data set construction: traffic was captured at several selected border routers between the campus network and the Internet and then preprocessed, for example by removing non-SSL/TLS traffic. The Campus1 and Campus2 data sets come from two specific routers at different sites of the same campus network. Each data set covers one week of traffic, 13-19 June 2016. Campus1 contains 139035 flows and 2144631 packets; Campus2 contains 124128 flows and 2004996 packets.
Web applications running over SSL/TLS were captured for 15 common applications, covering video, live streaming, mail, search, online payment, social networking and network storage. Taobao includes Alipay and Taobao, Facebook includes Facebook and Instagram, and Google includes Google search and Gmail. Table 3 lists the number of flows and packets for each SSL/TLS application. At least several thousand flows were captured per application to ensure that the classification model covers the characteristics of the respective application.
Table 3 SSL/TLS encrypted application statistics
Standard data set construction: the WENC method is generally applicable because it needs only the handshake information that is indispensable in every SSL/TLS session and the ADPT feature. To verify the method, a labeled standard data set must be prepared in advance. Samples are labeled in two steps. The first step looks up domain names with Fiddler, an openly available Web plug-in used here to find the domain names corresponding to a given application. The second step extracts and analyzes specific character strings to determine which application a given flow belongs to. Only a limited set of applications is selected for the evaluation, because this construction method does not work for applications whose domain name signatures cannot be collected. Table 4 lists the specific character strings in the domain names of each application.
Table 4 Specific character strings in application domain names
For a comprehensive assessment, WENC is compared with the following three methods. (1) The first-order Markov chain method (FOM). (2) The second-order Markov chain method (SOM), which replaces the first-order Markov chain in FOM with a second-order Markov chain. The second-order Markov chain based on two-dimensional features (TSOM) replaces the message type in SOM with the <MT, PT> feature, and TFOM is the first-order Markov chain using two-dimensional features. (3) The second-order Markov chain method that considers certificate message size (SOCRT), which adds the certificate message size to diversify the features.
To verify the effectiveness of the application fingerprints in WENC, cross validation is performed on the two heterogeneous data sets Campus1 and Campus2: when Campus1 is used as the training set, Campus2 is used for validation, and when Campus2 is used as the training set, Campus1 is used for validation.
Performance evaluation
Accuracy: to verify the accuracy of WENC, it is compared with three algorithms (SOCRT, SOM and FOM), as shown in Fig. 3. The Campus1 and Campus2 data sets serve in turn as training and test sets. Because WENC integrates weighted classifiers, it has better applicability and discriminative power. Fig. 3 shows that WENC outperforms the other three methods in classification accuracy. The fingerprint methods SOM and FOM perform poorly because the selected fingerprints are not distinctive enough. Although SOCRT takes the certificate message size into account and outperforms SOM, its improved feature discrimination is still insufficient.
Classification accuracy only evaluates recognition over the data set as a whole, whereas precision and recall evaluate how well each individual class is classified. Precision and recall are listed in Table 5, and the combined F-Measure is shown in Fig. 4.
Table 5. Precision and recall
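For reference, the per-class metrics can be computed as in the sketch below. It assumes the balanced F-Measure (the harmonic mean of precision and recall); the text does not state which weighting of precision and recall was used.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and balanced F-Measure for one application."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly identified
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly assigned to label
    fn = sum(t == label and p != label for t, p in pairs)  # missed instances of label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Evaluating this function once per application gives a table of the same shape as Table 5, with the F-Measure as the single combined score per class.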
WENC clearly achieves a good F-Measure for every application, because the weighted ensemble classifier has stronger applicability. Although WENC classifies SSL/TLS applications accurately in most cases, a small probability of misclassification remains. The misclassifications have the following causes: (1) Irregular SSL/TLS protocol implementations: many implementations do not follow the RFC specifications and behave slightly differently from common SSL/TLS implementations. (2) Server configuration: some SSL/TLS protocol messages are optional, and the message parameters of an SSL/TLS server are configurable and may change over time. (3) Abuse of the SSL/TLS protocol: SSL/TLS tunnels are increasingly used to evade the restrictions of network configuration and security inspection rather than to secure the transport of an SSL/TLS application. However, the probability of misclassification in cases like these is low. In general, WENC is suitable for identifying most SSL/TLS applications.
To verify the effectiveness of the second-order Markov chain based on the two-dimensional feature <MT, PT> relative to the one-dimensional feature, TSOM is compared with SOM, and the first-order Markov chain with two-dimensional features (TFOM) is compared with the second-order Markov chain TSOM. Campus1 is used as the training set and Campus2 as the test set. The results are shown in Table 6.
Table 6. Influence of the two-dimensional feature
Because the two-dimensional feature is highly distinguishable, the TSOM and TFOM methods based on it are clearly more accurate than the one-dimensional methods SOM and FOM. Compared with SOM, TSOM performs better for every application, with accuracy improved by about 20%, because the two-dimensional feature improves the distinguishability of applications. Compared with TFOM, TSOM performs better for almost every application, with accuracy improved by about 4%, because a second-order Markov chain better describes the state transitions of an SSL/TLS session.
In summary, the recognition method of the present invention effectively identifies and classifies SSL/TLS encrypted applications, which enhances the controllability and security of the network, and its classification accuracy is relatively high.
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications listed in the description and the embodiments. It can be applied in any field to which it is suited, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details and illustrations shown and described herein.
Claims (9)
1. An HTTPS application identification method in an IPv6 network, characterized by comprising the following steps:
Step 1: capturing SSL/TLS data packets to form traffic samples, grouping the data packets of the same session into a flow, and obtaining the flow features;
Step 2: building a second-order Markov chain model based on the SSL/TLS traffic samples;
Step 3: building an HMM model based on the SSL/TLS traffic samples;
Step 4: constructing a weighted ensemble classifier from the second-order Markov chain model and the HMM model;
Step 5: identifying and classifying newly arriving SSL/TLS encrypted flows with the weighted ensemble classifier.
2. The HTTPS application identification method in an IPv6 network according to claim 1, characterized in that, in step 2, the application fingerprint model, i.e. the second-order Markov chain model, is built from the two-dimensional feature sequence formed by the message type and message size of the SSL/TLS interaction.
3. The HTTPS application identification method in an IPv6 network according to claim 2, characterized in that $X_t$ denotes the second-order Markov chain model, which estimates the current state from the two preceding states:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_1 = i_1) = P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) \qquad (1)$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times $t-2$ and $t-1$ to time $t$ is time-invariant, it follows that:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) = P(X_t = k \mid X_{t-1} = j, X_{t-2} = i) = p_{i-j-k} \qquad (2)$$
where the transition matrix between intermediate states is expressed as follows:
where the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t = t_0, t_1, \ldots, t_n \in T$, $i_t \in \{1, 2, \ldots, s\}$, and $i_t$ is a joint feature or a joint feature sequence; the message type MT and the packet length PT form the joint feature <MT, PT>, and the joint feature represents a state of the Markov chain.
4. The HTTPS application identification method in an IPv6 network according to claim 3, characterized in that the probability distribution of entering the first two states of the second-order Markov chain (ENPD) is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \qquad (4)$$
where $q_{i-j} = P(X_{t+1} = j, X_t = i)$;
the probability distribution of exiting from the last two states of the second-order Markov chain (EXPD) is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \qquad (5)$$
where $w_{i-j} = P(X_{t+1} = j, X_t = i)$ denotes the probability that, when in state $i$, the session ends at time $t_n$;
the probability of an SSL/TLS session is then expressed as follows:
The resulting probability indicates how close the feature sequence of the SSL/TLS session is to the application model; the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
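The session probability formula itself is not reproduced in the text above. The sketch below assumes the form that follows naturally from the ENPD, the transition probabilities and the EXPD: the probability of entering with the first two states, multiplied by every second-order transition probability, multiplied by the probability of exiting after the last two states. The function name, the dictionary layout and the `floor` value for unseen events are illustrative assumptions.

```python
import math

def session_log_probability(states, q, p, w, floor=1e-12):
    """Log-probability of one session under a second-order Markov fingerprint.

    states       : sequence of joint-feature states i_1, ..., i_T (T >= 2)
    q[(i, j)]    : ENPD, probability of entering with states i then j
    p[(i, j, k)] : transition probability p_{i-j-k}
    w[(i, j)]    : EXPD, probability of exiting after states i then j
    floor        : assumed stand-in for unseen events so that log() stays finite
    """
    if len(states) < 2:
        raise ValueError("a session needs at least two states")
    logp = math.log(q.get((states[0], states[1]), floor))
    for i, j, k in zip(states, states[1:], states[2:]):
        logp += math.log(p.get((i, j, k), floor))
    logp += math.log(w.get((states[-2], states[-1]), floor))
    return logp
```

Working in log space avoids numerical underflow on long sessions. A session would then be attributed to the application model that gives the largest value.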
5. The HTTPS application identification method in an IPv6 network according to claim 4, characterized in that, in step 3, the HMM model is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where $S$ is the state set, $K$ is the observation set, $A$ is the transition matrix, $B$ is the observation probability matrix, and $\pi$ is the initial state distribution;
the probability $P(O \mid \lambda)$ that the observation sequence $o_1, o_2, \ldots, o_t$ occurs is computed as:
where $O = (O_1 = o_1, \ldots, O_M = o_M)$ is the observable output state sequence, $M$ is the number of observations in the sequence, $A = \{a_{ij}\}$ is the probability matrix of network operating state transitions, $1 \le i, j \le N$, $N$ is the number of states, $a_{ij} = P(u_j \mid u_i)$ is the probability of moving from state $u_i$ to state $u_j$, $1 \le i \le N$; $B = \{b_{im}\}$ gives the probability of the ADPT output value obtained from network operation at a given time, where ADPT uses the data packet length; $b_{im} = P(v_m \mid u_i)$ is the probability that the given state $u_i$ outputs the feature value $v_m$; $\pi = \{\pi_i\}$, $i = 1, \ldots, N$, is generated randomly and is the initial probability distribution of the network operating state; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t = s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t = s_i, \lambda)$.
6. The HTTPS application identification method in an IPv6 network according to claim 5, characterized in that the HMM model corresponding to a given SSL/TLS session is $H_{app}$; the set $F = \{F_1, F_2, \ldots, F_l\}$ denotes $l$ consecutive flows of the application; the ADPT feature is selected to build the training model $H_{app}$; and for an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as the application.
7. The HTTPS application identification method in an IPv6 network according to claim 6, characterized in that, in step 4, newly arriving encrypted traffic is divided in chronological order into data blocks $S_i$, all data blocks $S_i$ are set to the same size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely proportional to its error.
8. The HTTPS application identification method in an IPv6 network according to claim 7, characterized in that test samples are constructed, the recognition error rate of the test samples on classifier $C_i$ is calculated, and that recognition error rate is used to set the weight of classifier $C_i$.
9. The HTTPS application identification method in an IPv6 network according to claim 8, characterized in that the recognition error rate of a test sample $(x, c)$ on classifier $C_i$ is defined in terms of the accuracy that classifier $C_i$ gives for $x$, where $x$ is an instance of an application of class $c$, and the mean square error of classifier $C_i$ is:
Assuming random guessing over the whole class space $C = \{\text{all classes}\}$, where the probability distribution $P(Y)$ gives the probability that $X$ is classified as $Y$, the mean square error of such a randomly guessing classifier is as follows:
For a given data set, $MSE_r$ is a fixed value, and the weight $w_i$ of classifier $C_i$ is calculated as follows:
$$w_i = MSE_r - MSE_i \qquad (10)$$
If classifier $C_i$ performs worse than random guessing, the weight of classifier $C_i$ is set to 0 so that $C_i$ is not used in the ensemble classifier, which ensures that a larger error rate yields a smaller weight;
the result of integrating the second-order Markov chain model and the HMM model can be described as:
where $H_i \in \{P_1, P_2\}$.
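The weighting scheme of claims 7 to 9 can be sketched as follows. Only w_i = MSE_r - MSE_i and the rule of setting the weight to 0 for classifiers worse than random guessing come from the text; the two mean-square-error formulas and the combination rule are not reproduced above, so the forms used here are assumptions: MSE_i is the mean of (1 - f)^2 over the test samples, where f is the probability the classifier assigns to the true class; MSE_r is the sum over classes of p(c)(1 - p(c))^2; and the ensemble output is the application with the largest weighted sum of model probabilities.

```python
from collections import Counter

def mse_classifier(prob_true_class):
    """Assumed MSE_i: mean of (1 - f)^2, f = probability given to the true class."""
    return sum((1.0 - f) ** 2 for f in prob_true_class) / len(prob_true_class)

def mse_random(labels):
    """Assumed MSE_r: error of guessing according to the class distribution."""
    n = len(labels)
    return sum((c / n) * (1.0 - c / n) ** 2 for c in Counter(labels).values())

def classifier_weight(prob_true_class, labels):
    """w_i = MSE_r - MSE_i, set to 0 when the classifier is worse than random."""
    return max(0.0, mse_random(labels) - mse_classifier(prob_true_class))

def ensemble_predict(weights, scores):
    """Assumed combination: pick the application with the largest weighted score.

    scores[i][app] is the probability that model i assigns to application app.
    """
    total = Counter()
    for weight, per_app in zip(weights, scores):
        for app, prob in per_app.items():
            total[app] += weight * prob
    return max(total, key=total.get)
```

With this rule a classifier whose weight is 0 contributes nothing to the vote, which matches the statement that such a classifier is not used in the ensemble.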
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811637611.1A CN109617904A (en) | 2018-12-29 | 2018-12-29 | A kind of HTTPS application and identification method in IPv6 network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109617904A true CN109617904A (en) | 2019-04-12 |
Family
ID=66015396
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811637611.1A Pending CN109617904A (en) | 2018-12-29 | 2018-12-29 | A kind of HTTPS application and identification method in IPv6 network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109617904A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310187A (en) * | 2020-04-01 | 2020-06-19 | 深信服科技股份有限公司 | Malicious software detection method and device, electronic equipment and storage medium |
CN111917694A (en) * | 2019-05-09 | 2020-11-10 | 中兴通讯股份有限公司 | TLS encrypted traffic identification method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105871832A (en) * | 2016-03-29 | 2016-08-17 | 北京理工大学 | Network application encrypted traffic recognition method and device based on protocol attributes |
CN107274011A (en) * | 2017-06-05 | 2017-10-20 | 上海电力学院 | The equipment state recognition methods of comprehensive Markov model and probability net |
CN108768986A (en) * | 2018-05-17 | 2018-11-06 | 中国科学院信息工程研究所 | A kind of encryption traffic classification method and server, computer readable storage medium |
CN108900432A (en) * | 2018-07-05 | 2018-11-27 | 中山大学 | A kind of perception of content method based on network Flow Behavior |
Non-Patent Citations (1)
Title |
---|
WUBIN PAN: "WENC:HTTPS Encrypted Traffic Classification Using Weighted Ensemble Learning and Markov Chain", 《IEEE TRUSTCOM/BIGDATASE/ICESS》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190412 |