CN109617904A - HTTPS application identification method in IPv6 networks - Google Patents

HTTPS application identification method in IPv6 networks

Info

Publication number
CN109617904A
Authority
CN
China
Prior art keywords
ssl
tls
classifier
probability
application
Legal status
Pending
Application number
CN201811637611.1A
Other languages
Chinese (zh)
Inventor
潘吴斌
任国强
薛丽峰
Current Assignee
Jiangsu Tianchuang Technology Co Ltd
Original Assignee
Jiangsu Tianchuang Technology Co Ltd
Application filed by Jiangsu Tianchuang Technology Co Ltd
Priority to CN201811637611.1A
Publication of CN109617904A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/16 Implementing security features at a particular protocol layer
    • H04L63/166 Implementing security features at a particular protocol layer at the transport layer
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/618 Details of network addresses
    • H04L2101/659 Internet protocol version 6 [IPv6] addresses

Abstract

The present invention discloses an HTTPS application identification method for IPv6 networks. Step 1: capture SSL/TLS packets to form traffic samples, group the packets of the same session into a flow, and extract flow features. Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples. Step 3: build a hidden Markov model (HMM) from the SSL/TLS traffic samples. Step 4: build a weighted ensemble classifier from the second-order Markov chain model and the HMM. Step 5: use the weighted ensemble classifier to identify and classify newly arriving SSL/TLS encrypted flows. The invention addresses the classification of SSL/TLS-encrypted applications and improves the controllability and security of the network.

Description

HTTPS application identification method in IPv6 networks
Technical field
The present invention relates to the field of communication technology, and in particular to an HTTPS application identification method for IPv6 networks.
Background art
Domestic telecommunication operators in China have upgraded their backbone networks to IPv6 and fully support IPv4/IPv6 dual stack, and the main services of large domestic Internet companies likewise fully support dual stack. Comprehensive migration is expected to begin in early 2019 and to be completed in 2020. Judging from the current state of the IPv6 transition, IPv6 traffic will surge within a short time. Although IPsec encryption is built into the IPv6 protocol suite, it is not used in practice: IPsec encrypts at the network layer, whereas HTTPS encrypts at the application layer. An HTTPS flow needs to be encrypted and decrypted only once, whereas IPsec processes every packet, so an HTTPS flow of 10 IP packets would require 10 encryption and decryption operations. IPv6 networks will therefore still carry large volumes of HTTPS-encrypted application traffic, and classifying and managing this encrypted traffic is the top priority of IPv6 network security.
In recent years, with the rapid development of IPv6, 5G, the Internet of Things and the industrial Internet, new network applications keep emerging. While these applications provide convenient services to users, they also bring security risks to the network; for example, user identity information transmitted over the network risks being eavesdropped on, hijacked, stolen or modified. The SSL/TLS protocol emerged against this background of securing the network: it establishes a secure channel between client and server by means of encryption and is widely used in critical network services such as online payment and social networking. SSL/TLS traffic is expected to account for more than 80% of all traffic by 2020.
SSL/TLS-encrypted applications are increasingly common on the Internet and increasingly complex, and traditional port-based and payload-based methods cannot classify them at a fine granularity. While the SSL/TLS protocol protects network security, it also conceals anomalous traffic, which can easily evade deep packet inspection (DPI). To provide better quality of service while keeping the network secure, the various SSL/TLS-encrypted applications on the network must be supervised effectively.
Because little information is available from SSL/TLS-encrypted applications, existing methods can identify them only from message-type sequence features in the SSL/TLS handshake. Since the handshake message-type sequences of different SSL/TLS applications are similar, such methods cannot distinguish many SSL/TLS applications well.
Summary of the invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described below.
The present invention provides an HTTPS application identification method for IPv6 networks that addresses the classification of SSL/TLS-encrypted applications by applying machine learning to them, thereby improving the controllability and security of the network.
Specifically, the present invention proposes a weighted ensemble classification method, WENC, to overcome the shortcomings of existing SSL/TLS-encrypted application identification. To increase the distinguishability of application models, a second-order Markov chain model is built on a two-dimensional feature that jointly considers each handshake message type and the corresponding message size. In addition, an HMM is built on the sequence of application-data message sizes in the data-transfer phase, and its emission probabilities are improved by exploiting the correlation between adjacent message sizes. Finally, a weighted ensemble classifier improves generalization performance.
To achieve these objects and other advantages according to the present invention, an HTTPS application identification method for IPv6 networks is provided, comprising the following steps:
Step 1: capture SSL/TLS packets to form traffic samples, group the packets of the same session into a flow, and obtain the flow features;
Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples;
Step 3: build an HMM from the SSL/TLS traffic samples;
Step 4: build a weighted ensemble classifier from the second-order Markov chain model and the HMM;
Step 5: identify and classify newly arriving SSL/TLS encrypted flows with the weighted ensemble classifier.
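Step 1 (grouping the packets of one session into a flow) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes packets have already been decoded into the hypothetical `Packet` record below, and that a session is identified by its two TCP endpoints regardless of direction. A production system would also split flows on FIN/RST or idle timeouts.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical decoded-packet record (field names are illustrative)."""
    ts: float      # capture timestamp, seconds
    src: str       # source IPv6 address
    sport: int     # source TCP port
    dst: str       # destination IPv6 address
    dport: int     # destination TCP port
    length: int    # TCP payload length, bytes


Endpoint = Tuple[str, int]
FlowKey = Tuple[Endpoint, Endpoint]


def flow_key(p: Packet) -> FlowKey:
    """Direction-independent key: both directions of a TCP session map to it."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (a, b) if a <= b else (b, a)


def group_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group the packets of each session into one time-ordered flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        flows[flow_key(p)].append(p)
    return dict(flows)


# Usage: two packets of one HTTPS session plus one packet of another session.
pkts = [
    Packet(0.00, "2001:db8::1", 50000, "2001:db8::2", 443, 517),
    Packet(0.05, "2001:db8::2", 443, "2001:db8::1", 50000, 1400),
    Packet(0.10, "2001:db8::3", 50001, "2001:db8::2", 443, 300),
]
flows = group_flows(pkts)
print(len(flows))  # 2
```

Sorting the endpoints makes the key symmetric, so a server-to-client packet lands in the same flow as the client's request.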
Preferably, in Step 2, the application fingerprint model, i.e. the second-order Markov chain model, is built from the sequence of two-dimensional features formed by the message type and message size of the SSL/TLS exchange.
Preferably, the second-order Markov chain is denoted by $X_t$, and the current state is estimated from the two preceding states:
$$P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}, \ldots, X_1=i_1) = P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}) \qquad (1)$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times t-2 and t-1 to time t is time-invariant, we obtain:
$$P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}) = P(X_t=k \mid X_{t-1}=j, X_{t-2}=i) = p_{i-j-k} \qquad (2)$$
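The transition matrix between the intermediate states is
$$P = \begin{bmatrix} p_{1-1-1} & p_{1-1-2} & \cdots & p_{1-1-s} \\ p_{1-2-1} & p_{1-2-2} & \cdots & p_{1-2-s} \\ \vdots & \vdots & & \vdots \\ p_{s-s-1} & p_{s-s-2} & \cdots & p_{s-s-s} \end{bmatrix} \qquad (3)$$
with one row for each pair of preceding states (i, j) and one column for each next state k. Here the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t \in T = \{t_0, t_1, \ldots, t_n\}$, and $i_t \in \{1, 2, \ldots, s\}$. Each $i_t$ is a joint feature (or joint-feature sequence) $\langle MT, PT \rangle$ formed by the message type MT and the packet length PT, and each joint feature is a state of the Markov chain.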
Preferably, the probability distribution ENPD of entering the first two states of the second-order Markov chain is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \qquad (4)$$
where $q_{i-j} = P(X_{T+1}=j, X_T=i)$;
The probability distribution EXPD of exiting from the last two states of the second-order Markov chain is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \qquad (5)$$
where $w_{i-j} = P(X_{T+1}=j, X_T=i)$ is the probability that the session terminates at time $t_n$ when in state i;
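The transition probabilities $p_{i-j-k}$ and the ENPD/EXPD distributions above can be estimated by counting over an application's training sessions. The sketch below makes several assumptions not stated in the text: plain maximum-likelihood counts with no smoothing, the entry distribution taken from each session's first two states and the exit distribution from its last two, and states given as any hashable value such as a hypothetical `("22:02", 90)` $\langle MT, PT \rangle$ tuple. The function name `fit_second_order` is illustrative.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Dict, Sequence, Tuple

State = Hashable            # e.g. a joint feature <MT, PT> such as ("22:02", 90)
Pair = Tuple[State, State]


def fit_second_order(sessions: Sequence[Sequence[State]]):
    """Maximum-likelihood estimates (no smoothing) for one application:
    q[(i, j)]    -> ENPD, share of sessions whose first two states are (i, j)
    p[(i, j)][k] -> p_{i-j-k}, probability of state k after the pair (i, j)
    w[(i, j)]    -> EXPD, share of sessions whose last two states are (i, j)"""
    entry: Counter = Counter()
    exit_: Counter = Counter()
    trans: Dict[Pair, Counter] = defaultdict(Counter)
    for seq in sessions:
        if len(seq) < 2:
            continue  # too short to contain a pair of states
        entry[(seq[0], seq[1])] += 1
        exit_[(seq[-2], seq[-1])] += 1
        for i, j, k in zip(seq, seq[1:], seq[2:]):
            trans[(i, j)][k] += 1
    n = sum(entry.values())
    if n == 0:
        raise ValueError("no session has at least two states")
    q = {pair: c / n for pair, c in entry.items()}
    w = {pair: c / n for pair, c in exit_.items()}
    p = {pair: {k: c / sum(cnt.values()) for k, c in cnt.items()}
         for pair, cnt in trans.items()}
    return q, p, w


q, p, w = fit_second_order([["A", "B", "C"], ["A", "B", "D"]])
print(p[("A", "B")])  # {'C': 0.5, 'D': 0.5}
```

Each row of the estimated transition table sums to 1, matching one row (i, j) of a second-order transition matrix.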
The probability of an SSL/TLS session is expressed as follows:
The resulting probability measures how closely the feature sequence of the SSL/TLS session matches the application model: the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
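A sketch of scoring a session against each application's fingerprint and picking the best match follows. It assumes the session probability has the usual product form for this kind of fingerprint, $q_{i_0-i_1}\prod_t p_{i_{t-2}-i_{t-1}-i_t}\, w_{i_{n-1}-i_n}$ (entry distribution, then second-order transitions, then exit distribution); that form is an assumption, and the helper names `session_log_prob` and `classify` are hypothetical. Scores are kept in log space to avoid underflow on long sessions.

```python
import math
from collections.abc import Hashable
from typing import Dict, Mapping, Optional, Sequence, Tuple

State = Hashable
Pair = Tuple[State, State]
# (ENPD q, transitions p, EXPD w) for one application
Model = Tuple[Mapping[Pair, float], Mapping[Pair, Mapping[State, float]], Mapping[Pair, float]]


def session_log_prob(seq: Sequence[State], model: Model) -> float:
    """Assumed form: log(q_{i0-i1} * prod_t p_{i(t-2)-i(t-1)-i(t)} * w_{i(n-1)-i(n)}).
    Returns -inf if any factor is zero, i.e. the model cannot produce seq."""
    q, p, w = model
    if len(seq) < 2:
        return -math.inf
    factors = [q.get((seq[0], seq[1]), 0.0), w.get((seq[-2], seq[-1]), 0.0)]
    factors += [p.get((i, j), {}).get(k, 0.0)
                for i, j, k in zip(seq, seq[1:], seq[2:])]
    if min(factors) <= 0.0:
        return -math.inf
    return sum(math.log(f) for f in factors)  # log space avoids underflow


def classify(seq: Sequence[State], models: Dict[str, Model]) -> Optional[str]:
    """Return the application whose fingerprint model scores seq highest."""
    scores = {app: session_log_prob(seq, m) for app, m in models.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == -math.inf:
        return None  # no model can explain the sequence
    return best


models = {
    "app1": ({("A", "B"): 1.0}, {("A", "B"): {"C": 0.5, "D": 0.5}},
             {("B", "C"): 0.5, ("B", "D"): 0.5}),
    "app2": ({("A", "X"): 1.0}, {}, {("A", "X"): 1.0}),
}
print(classify(["A", "B", "C"], models))  # app1
```

Returning `None` when every model gives probability zero is a design choice for the sketch; an implementation could instead apply smoothing so that unseen transitions get a small nonzero probability.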
Preferably, in Step 3, the HMM is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where S is the state set, K is the observation set, A is the transition matrix, B is the observation probability matrix, and π is the initial state distribution;
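The probability $P(O \mid \lambda)$ of the observation sequence $o_1, o_2, \ldots, o_T$ is:
$$P(O\mid\lambda) = \sum_{i=1}^{N} P(q_t=s_i, O\mid\lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i) \qquad (7)$$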
where $O = (O_1=o_1, \ldots, O_M=o_M)$ is the observable output and M is the number of observations in the sequence; $A = \{a_{ij}\}$ is the transition probability matrix of the network operating states, $1 \le i, j \le N$, N is the number of states, and $a_{ij} = P(u_j \mid u_i)$ is the probability of moving from state $u_i$ to state $u_j$, with $\sum_{j=1}^{N} a_{ij} = 1$ for $1 \le i \le N$; $B = \{b_{im}\}$ gives the probability of each ADPT output value obtained from the network operation at a given time, where ADPT is the application data packet length, and $b_{im} = P(v_m \mid u_i)$ is the probability of emitting feature value $v_m$ in state $u_i$; $\pi = \{\pi_i\}$, $i = 1, \ldots, N$, generated at random with $\sum_{i=1}^{N} \pi_i = 1$, is the initial distribution of the network operating states; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t=s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t=s_i, \lambda)$.
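The identity $P(O\mid\lambda) = \sum_i \alpha_t(i)\beta_t(i)$ can be checked numerically with the standard forward-backward recursions. This is a textbook sketch rather than the patent's code: matrices are plain nested lists, observations are assumed to be already discretized symbol indices m, the toy parameters are made up, and no scaling is applied, so long flows would need scaled or log-space variables.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def forward_backward(A: Matrix, B: Matrix, pi: Sequence[float],
                     obs: Sequence[int]) -> Tuple[List[List[float]], List[List[float]]]:
    """alpha[t][i] = P(o_1..o_t, q_t = s_i | lambda)
    beta[t][i]  = P(o_{t+1}..o_T | q_t = s_i, lambda)
    A[i][j] = a_ij, B[i][m] = b_im, pi[i] = pi_i, obs = observation symbol indices."""
    N, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return alpha, beta


def sequence_prob(A: Matrix, B: Matrix, pi: Sequence[float], obs: Sequence[int]) -> float:
    """P(O | lambda) = sum_i alpha_t(i) * beta_t(i), evaluated at the first step."""
    alpha, beta = forward_backward(A, B, pi, obs)
    return sum(a * b for a, b in zip(alpha[0], beta[0]))


# Toy 2-state model with 2 observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 1, 0]
alpha, beta = forward_backward(A, B, pi, obs)
per_t = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
# Brute force: sum the joint probability of O with every hidden-state path.
brute = sum(pi[s[0]] * B[s[0]][obs[0]]
            * math.prod(A[s[t - 1]][s[t]] * B[s[t]][obs[t]] for t in range(1, len(obs)))
            for s in itertools.product(range(len(pi)), repeat=len(obs)))
assert all(abs(v - brute) < 1e-12 for v in per_t)
```

The final assertion shows that the sum over i of $\alpha_t(i)\beta_t(i)$ gives the same value for every t and agrees with exhaustive enumeration.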
Preferably, the HMM corresponding to a given SSL/TLS application is $H_{app}$. The set $F = \{F_1, F_2, \ldots, F_l\}$ denotes l consecutive flows of the application, whose ADPT features are used to train the model $H_{app}$. For an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as that application.
Preferably, in Step 4, newly arriving encrypted traffic is divided in chronological order into data blocks $S_i$ of equal size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely related to its error.
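The chronological blocking can be sketched as below. The text requires equal-sized blocks but does not say what happens to a final partial block; this sketch simply leaves it out, which is an assumption. The function name `split_into_blocks` is hypothetical, and each returned block would be used to train its own classifier $C_i$.

```python
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def split_into_blocks(stream: Sequence[Item], block_size: int) -> List[List[Item]]:
    """Cut a time-ordered stream of labelled flows into consecutive, equal-sized
    blocks S_1, S_2, ...; a trailing partial block is left out (assumption)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(stream) // block_size
    return [list(stream[k * block_size:(k + 1) * block_size]) for k in range(n_blocks)]


print(split_into_blocks(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5]]
```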
Preferably, a test sample set is constructed, the identification error rate of the test samples on classifier $C_i$ is computed, and the weight of classifier $C_i$ is set from this error rate.
Preferably, the identification error of a test sample (x, c) on classifier $C_i$ is $1 - f_c^i(x)$, where $f_c^i(x)$ is the probability, given by classifier $C_i$, that x belongs to class c, and x is an instance of application class c. The mean squared error of classifier $C_i$ over the test set S is:
$$MSE_i = \frac{1}{|S|}\sum_{(x,c)\in S}\left(1 - f_c^i(x)\right)^2 \qquad (8)$$
A classifier that guesses at random over the whole class space C = {all classes} assigns x to class c with probability p(c), the class distribution; its mean squared error is:
$$MSE_r = \sum_{c\in C} p(c)\left(1 - p(c)\right)^2 \qquad (9)$$
For a given data set, $MSE_r$ is a fixed value, and the weight $w_i$ of classifier $C_i$ is computed as follows:
$$w_i = MSE_r - MSE_i \qquad (10)$$
If classifier $C_i$ performs worse than random guessing, its weight is set to 0 so that $C_i$ is not used in the ensemble; this ensures that a classifier with a larger error rate receives a smaller weight;
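The weighting rule can be sketched in a few lines. The sketch assumes the forms $MSE_i = \frac{1}{|S|}\sum_{(x,c)}(1 - f_c^i(x))^2$ and $MSE_r = \sum_c p(c)(1 - p(c))^2$, and represents each classifier output as a hypothetical dictionary of per-class probabilities. The function names are illustrative, not from the source.

```python
from collections.abc import Hashable
from typing import Mapping, Sequence


def mse_classifier(probs: Sequence[Mapping[Hashable, float]],
                   labels: Sequence[Hashable]) -> float:
    """MSE_i: mean of (1 - f_c^i(x))^2 over the test samples (x, c), where
    probs[n][c] is the probability classifier C_i gives the true class c of sample n."""
    if not labels:
        raise ValueError("empty test set")
    return sum((1.0 - pr.get(c, 0.0)) ** 2 for pr, c in zip(probs, labels)) / len(labels)


def mse_random(class_prior: Mapping[Hashable, float]) -> float:
    """MSE_r: error of a classifier that guesses class c with probability p(c)."""
    return sum(pc * (1.0 - pc) ** 2 for pc in class_prior.values())


def classifier_weight(mse_i: float, mse_r: float) -> float:
    """w_i = MSE_r - MSE_i, clamped to 0 when C_i is worse than random guessing."""
    return max(0.0, mse_r - mse_i)


mse_r = mse_random({"a": 0.5, "b": 0.5})  # uniform two-class task
print(mse_r)  # 0.25
mse_i = mse_classifier([{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}], ["a", "b"])
print(round(classifier_weight(mse_i, mse_r), 3))  # 0.225
```

For a uniform two-class task, $MSE_r = 2 \times 0.5 \times 0.5^2 = 0.25$, the value quoted for random guessing in the detailed description.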
The combination of the second-order Markov chain model and the HMM can be described as:
where $H_i \in \{P_1, P_2\}$.
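One plausible combination rule is sketched below: a weighted average of the members' per-class probabilities followed by choosing the class with maximum confidence. The exact combination formula is an assumption here. So is the requirement that each member (the Markov-chain classifier $P_1$ and the HMM classifier $P_2$) first normalize its per-application likelihoods into probabilities. `ensemble_predict` is a hypothetical name.

```python
from collections.abc import Hashable
from typing import Dict, Mapping, Sequence


def ensemble_predict(member_probs: Sequence[Mapping[Hashable, float]],
                     weights: Sequence[float]) -> Hashable:
    """Weighted vote: combined[c] = sum_i w_i * f_c^i(x) / sum_i w_i;
    returns the class with the highest combined confidence."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("every member has weight 0")
    combined: Dict[Hashable, float] = {}
    for probs, w in zip(member_probs, weights):
        for c, pr in probs.items():
            combined[c] = combined.get(c, 0.0) + w * pr / total
    return max(combined, key=combined.get)


# Member 1 stands in for the Markov-chain classifier (P1), member 2 for the HMM (P2).
markov_out = {"Google": 0.6, "Facebook": 0.4}
hmm_out = {"Google": 0.1, "Facebook": 0.9}
print(ensemble_predict([markov_out, hmm_out], [0.2, 0.1]))  # Facebook
```

A member whose weight was clamped to 0 contributes nothing, which matches excluding it from the ensemble.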
Compared with the prior art, the present invention has the following beneficial effects:
1. The identification method of the present invention effectively identifies and classifies SSL/TLS-encrypted applications and improves the controllability and security of the network.
2. The identification method of the present invention achieves higher classification accuracy.
Other advantages, objects and features of the present invention will be set forth partly in the description below and will partly be understood by those skilled in the art through study and practice of the invention.
Description of the drawings
Fig. 1 is a schematic diagram of the message exchange of the SSL/TLS protocol;
Fig. 2 is a schematic diagram of the architecture of the SSL/TLS-encrypted application classification system;
Fig. 3 is a histogram of classification accuracy;
Fig. 4 is a histogram of the overall evaluation (F-Measure).
Detailed description of embodiments
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the description.
For a network protocol, the protocol performs different actions at different stages of the exchange from start to end, which appear as different "states" of the protocol interaction; this ordered sequence of states accurately reflects the temporal characteristics of the network flow. From start to end, the SSL/TLS protocol passes through different "states" and takes a different action in each, so the state sequence accurately reflects the temporal characteristics of the SSL/TLS protocol. Fig. 1 shows an example of the message exchange between client and server during an SSL/TLS session.
The initial ClientHello message from the client to the server contains a client-generated random number, the protocol version, the cipher suites and other information. The key exchange involves four messages: the server certificate, the server key exchange, the client certificate and the client key exchange. The client then sends a Change Cipher Spec message, and the next message is encrypted with the newly negotiated algorithm and keys. In response, the server sends its own Change Cipher Spec message and a Finished message under the new cipher specification, completing the SSL/TLS handshake, after which both sides begin exchanging application data. The server may use an Alert message to terminate the session. Most information in the SSL/TLS handshake is transmitted in plaintext. After the handshake, only the content type, record length and SSL/TLS version remain unencrypted; everything else is encrypted to secure the communication.
Although SSL/TLS is an encryption protocol, the payload of SSL/TLS traffic contains some fixed formats, which provide information for identifying SSL/TLS-encrypted applications. The SSL protocol can be divided into two sublayers: the upper layer comprises the SSL Handshake protocol, the SSL Change Cipher Spec protocol and the SSL Alert protocol; the lower layer is the SSL Record protocol. The header of an SSL record consists of the content type, the version and the length. For convenience in the following description, SSL/TLS protocol interactions are encoded according to the protocol message type, with the encoding scheme shown in Table 1.
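To make the record-level encoding concrete, the sketch below walks the 5-byte record header described above (content type, version, length) and, for unencrypted handshake records, the 4-byte handshake header, producing $\langle MT, PT \rangle$ pairs. It is an illustration under stated assumptions, not the patent's code. The input is assumed to be one direction of an already reassembled TCP stream, and handshake messages fragmented across records are not handled. The label format "content type:handshake type" (e.g. "22:02" for ServerHello) is assumed because the codes of Table 1 are not reproduced here. The numeric content types (20-23) and handshake types follow the TLS specification; `record_states` and `record` are hypothetical helpers.

```python
import struct
from typing import List, Tuple

CHANGE_CIPHER_SPEC, HANDSHAKE, APPLICATION_DATA = 20, 22, 23


def record_states(stream: bytes) -> List[Tuple[str, int]]:
    """Turn one direction of a reassembled TLS byte stream into <MT, PT> states.

    MT is "22:HH" for each plaintext handshake message (HH = handshake type) and
    "CT:" for any other record (CT = content type); PT is the handshake-message
    or record length in bytes. Handshake records that follow ChangeCipherSpec
    are encrypted, so only their record length is used ("22:")."""
    states: List[Tuple[str, int]] = []
    encrypted = False
    pos = 0
    while pos + 5 <= len(stream):
        # 5-byte record header: content type (1), version (2), length (2)
        ctype, _version, length = struct.unpack("!BHH", stream[pos:pos + 5])
        body = stream[pos + 5:pos + 5 + length]
        if len(body) < length:
            break  # truncated record at the end of the capture
        if ctype == HANDSHAKE and not encrypted:
            off = 0
            while off + 4 <= len(body):
                # handshake header: message type (1), length (3)
                htype = body[off]
                hlen = int.from_bytes(body[off + 1:off + 4], "big")
                states.append((f"22:{htype:02d}", hlen))
                off += 4 + hlen
        else:
            states.append((f"{ctype}:", length))
            if ctype == CHANGE_CIPHER_SPEC:
                encrypted = True
        pos += 5 + length
    return states


def record(ctype: int, body: bytes) -> bytes:
    """Build a TLS 1.2 record (version 0x0303) for the example below."""
    return struct.pack("!BHH", ctype, 0x0303, len(body)) + body


hello = bytes([2, 0, 0, 3]) + b"abc" + bytes([14, 0, 0, 0])  # ServerHello + ServerHelloDone
stream = (record(HANDSHAKE, hello) + record(CHANGE_CIPHER_SPEC, b"\x01")
          + record(HANDSHAKE, bytes(4)) + record(APPLICATION_DATA, bytes(6)))
print(record_states(stream))
# [('22:02', 3), ('22:14', 0), ('20:', 1), ('22:', 4), ('23:', 6)]
```

Because only server-to-client records are needed for the server-side features, the same function can be applied to that direction alone.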
Table 1 SSL/TLS protocol encoding scheme
"State" features are expressed with plaintext fields: for example, Application Data is encoded as 23:, and handshake messages as 22:<handshake type>, e.g. 22:01 for ClientHello and 22:02 for ServerHello. An application model of SSL/TLS encrypted flows can therefore be built from the sequence of "state" features. The present invention considers the message types of the client and the server in an SSL/TLS session separately, so that only the traffic in one direction needs to be observed at either the client or the server side, which solves the asymmetric routing problem. Client-side features vary slightly with configuration, whereas server-side features are representative within a network.
To improve the distinguishability of application recognition models, the Certificate packet length feature has been combined with the message-type sequence feature to increase the feature diversity of SSL/TLS applications. However, the Certificate message sizes of different applications can still cluster into the same class in some cases, especially as the number of applications grows.
The prior art includes a first-order Markov chain model built from the message-type sequence of the SSL/TLS protocol interaction to identify SSL/TLS-encrypted applications. Although these message-type features help classify SSL/TLS-encrypted applications, they can fail in some cases. First, such models consider only transitions between two discrete states, which cannot fully capture the differences between SSL/TLS applications. Second, although the message-type sequence of an SSL/TLS application provides a visible fingerprint of its state transitions, the fingerprints of different SSL/TLS applications can overlap heavily, so even a high fingerprint-matching probability can still lead to misclassification. Third, states based on a single variable, namely the message type of the SSL/TLS session, have limited expressive power, which easily leads to low fingerprint distinguishability. Because Certificate message sizes from different applications can cluster together, misclassification under the maximum-likelihood criterion is unavoidable.
To address these problems: first, since the current state depends not only on the previous state but on the two previous states, a second-order Markov chain is introduced to remedy the insufficient expressiveness of first-order Markov chain fingerprints. Second, observing the correlation between handshake packet lengths and the corresponding application, the message type MT and the packet length PT are combined into the joint feature <MT, PT>, which improves the expressive power of the states in the application model. Third, an HMM brings the application data packet length (ADPT) feature into the identification process. ADPT clearly depends on the behavior of the respective application; although ADPT alone identifies applications with some uncertainty, it can be combined with the other features in the identification process. The method combines the second-order Markov chain model with the HMM and, through weighted ensemble learning, builds an SSL/TLS-encrypted application classification model with high generalization ability.
As shown in Fig. 2, the SSL/TLS application classification system based on weighted ensemble learning consists of three functional modules: traffic preprocessing, learning and classification. The traffic preprocessing module captures SSL/TLS packets, groups the packets of the same session into flows and extracts flow features in preparation for building the classification models. The learning module builds the ensemble classification model from SSL/TLS flow samples: it builds a second-order Markov chain from the handshake phase and an HMM from the data-transfer phase of the SSL/TLS protocol interaction, and then builds the weighted ensemble classifier. The classification module identifies newly arriving SSL/TLS encrypted flows with the constructed weighted ensemble classifier.
Based on the above system, the present invention provides an HTTPS application identification method for IPv6 networks, comprising the following steps:
Step 1: capture SSL/TLS packets to form traffic samples, group the packets of the same session into a flow, and obtain the flow features;
Step 2: build a second-order Markov chain model from the SSL/TLS traffic samples. Identification of SSL/TLS encrypted traffic based on the second-order Markov chain only needs to observe the one-way traffic from server to client, and it introduces new features that increase the distinguishability between SSL/TLS applications. The application fingerprint model, i.e. the second-order Markov chain model, is built from the sequence of two-dimensional features formed by the message type and message size of the SSL/TLS exchange. When a network flow $s_i = \{f_1, f_2, \ldots, f_m\}$ is passed through the fingerprint models of the application set $P = \{p_1, p_2, \ldots, p_n\}$, the probability that $s_i$ is identified as each application $p_1, p_2, \ldots, p_n$ is computed in turn, and the application with the highest probability is taken as the application to which the flow belongs.
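Specifically, the second-order Markov chain is represented by the discrete random variable $X_t$, and the current state is estimated from the two preceding states:
$$P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}, \ldots, X_1=i_1) = P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}) \qquad (1)$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times t-2 and t-1 to time t is time-invariant, we obtain:
$$P(X_t=i_t \mid X_{t-1}=i_{t-1}, X_{t-2}=i_{t-2}) = P(X_t=k \mid X_{t-1}=j, X_{t-2}=i) = p_{i-j-k} \qquad (2)$$
The transition matrix between the intermediate states is
$$P = \begin{bmatrix} p_{1-1-1} & p_{1-1-2} & \cdots & p_{1-1-s} \\ p_{1-2-1} & p_{1-2-2} & \cdots & p_{1-2-s} \\ \vdots & \vdots & & \vdots \\ p_{s-s-1} & p_{s-s-2} & \cdots & p_{s-s-s} \end{bmatrix} \qquad (3)$$
with one row for each pair of preceding states (i, j) and one column for each next state k. Here the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t \in T = \{t_0, t_1, \ldots, t_n\}$, and $i_t \in \{1, 2, \ldots, s\}$. Each $i_t$ is a joint feature (or joint-feature sequence) $\langle MT, PT \rangle$ formed by the message type MT and the packet length PT, and each joint feature is a state of the Markov chain.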
The probability distribution ENPD of entering the first two states of the second-order Markov chain is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \qquad (4)$$
where $q_{i-j} = P(X_{T+1}=j, X_T=i)$;
The probability distribution EXPD of exiting from the last two states of the second-order Markov chain is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \qquad (5)$$
where $w_{i-j} = P(X_{T+1}=j, X_T=i)$ is the probability that the session terminates at time $t_n$ when in state i;
The probability of an SSL/TLS session is expressed as follows:
The resulting probability measures how closely the feature sequence of the SSL/TLS session matches the application model: the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
In the present invention, the second-order Markov chain model is built on the joint feature <MT, PT> and its temporal correlation, using the two-dimensional variable <MT, PT> directly without any preprocessing such as vector quantization, which makes processing simpler and more efficient.
Step 3: build an HMM from the SSL/TLS traffic samples. The HMM is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where S is the state set, K is the observation set, A is the transition matrix, B is the observation probability matrix, and π is the initial state distribution;
The probability $P(O \mid \lambda)$ of the observation sequence $o_1, o_2, \ldots, o_T$ is:
$$P(O\mid\lambda) = \sum_{i=1}^{N} P(q_t=s_i, O\mid\lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i) \qquad (7)$$
where $O = (O_1=o_1, \ldots, O_M=o_M)$ is the observable output and M is the number of observations in the sequence; $A = \{a_{ij}\}$ is the transition probability matrix of the network operating states, $1 \le i, j \le N$, N is the number of states, and $a_{ij} = P(u_j \mid u_i)$ is the probability of moving from state $u_i$ to state $u_j$, with $\sum_{j=1}^{N} a_{ij} = 1$ for $1 \le i \le N$; $B = \{b_{im}\}$ gives the probability of each ADPT output value obtained from the network operation at a given time, where ADPT is the application data packet length, and $b_{im} = P(v_m \mid u_i)$ is the probability of emitting feature value $v_m$ in state $u_i$, where $v_m$ denotes a discretized ADPT feature value; $\pi = \{\pi_i\}$, $i = 1, \ldots, N$, generated at random with $\sum_{i=1}^{N} \pi_i = 1$, is the initial distribution of the network operating states; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t=s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t=s_i, \lambda)$.
The HMM is built from the network operating states and the ADPT feature states. The HMM corresponding to a given SSL/TLS application is $H_{app}$; the set $F = \{F_1, F_2, \ldots, F_l\}$ denotes l consecutive flows of the application, whose ADPT features are used to train $H_{app}$. For an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as that application.
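Because the HMM emits discrete symbols $v_m$, raw ADPT values have to be discretized first. The text does not specify how, so the sketch below assumes equal-frequency bins learned from training lengths. The helper names `quantile_edges` and `discretize_adpt` are hypothetical.

```python
import bisect
from typing import List, Sequence


def quantile_edges(training_lengths: Sequence[int], n_symbols: int) -> List[int]:
    """Choose up to n_symbols - 1 bin edges so each symbol covers roughly the same
    share of the training ADPT values (duplicate edges are merged)."""
    data = sorted(training_lengths)
    if not data or n_symbols < 2:
        raise ValueError("need training data and at least two symbols")
    return sorted({data[(k * len(data)) // n_symbols] for k in range(1, n_symbols)})


def discretize_adpt(lengths: Sequence[int], edges: Sequence[int]) -> List[int]:
    """Map ADPT values to observation symbols v_m: symbol m covers
    edges[m-1] <= length < edges[m]; symbol 0 is everything below edges[0]."""
    return [bisect.bisect_right(edges, n) for n in lengths]


edges = quantile_edges(list(range(10, 101, 10)), 4)
print(edges)  # [30, 60, 80]
print(discretize_adpt([5, 30, 65, 100], edges))  # [0, 1, 2, 3]
```

The resulting symbol sequence of a flow is what a per-application model $H_{app}$ would score when computing $P_2(F_i \mid H_{app})$.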
Step 4: build a weighted ensemble classifier from the second-order Markov chain model and the HMM. Specifically, newly arriving encrypted traffic is first divided in chronological order into data blocks $S_i$ of equal size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely related to its error.
A test sample set is constructed, the identification error rate of the test samples on classifier $C_i$ is computed, and the weight of classifier $C_i$ is set from it. Tests show that the classification results based on the most recent training samples approximate the distribution of the current test samples, so the classifier weights can be obtained approximately by measuring the identification error rate of the test samples on each classifier $C_i$.
Specifically, the identification error of a test sample (x, c) on classifier $C_i$ is $1 - f_c^i(x)$, where $f_c^i(x)$ is the probability, given by classifier $C_i$, that x belongs to class c, and x is an instance of application class c. The mean squared error of classifier $C_i$ over the test set S is:
$$MSE_i = \frac{1}{|S|}\sum_{(x,c)\in S}\left(1 - f_c^i(x)\right)^2 \qquad (8)$$
A classifier that guesses at random over the whole class space C = {all classes} assigns x to class c with probability p(c), the class distribution; its mean squared error is:
$$MSE_r = \sum_{c\in C} p(c)\left(1 - p(c)\right)^2 \qquad (9)$$
For a given data set, $MSE_r$ is a fixed value; for example, for a two-class task with a uniform class distribution, the mean squared error of random guessing is 0.25. The weight $w_i$ of classifier $C_i$ is computed as follows:
$$w_i = MSE_r - MSE_i \qquad (10)$$
If classifier $C_i$ performs worse than random guessing, its weight is set to 0 so that $C_i$ is not used in the ensemble; this ensures that a classifier with a larger error rate receives a smaller weight;
The combination of the second-order Markov chain model and the HMM can be described as:
where $H_i \in \{P_1, P_2\}$.
This yields the weighted ensemble classifier of the second-order Markov chain model and the HMM.
Step 5: identify and classify newly arriving SSL/TLS encrypted flows with the weighted ensemble classifier.
The identification method of the present invention is implemented as the WENC algorithm, whose pseudocode is given in Table 2:
Table 2 WENC algorithm pseudocode
The algorithm consists of two parts. The first part builds the weighted ensemble classifier (lines 1-6): line 2 builds classifiers on different feature sets; line 3 computes the classification error rate of $h_i$; line 4 computes the error rate of the random (probabilistic) classifier; line 5 computes $w_i$ and determines the weighted classifier. The second part describes the classification process (lines 7-10) and outputs the final classification result according to the maximum confidence.
Verification of the classification accuracy of the method:
Building the data sets: traffic was captured at several selected border routers between the campus network and the Internet and then preprocessed, e.g. by removing non-SSL/TLS traffic. The Campus1 and Campus2 data sets come from two particular routers at different sites of the same campus network. Each data set covers one week of traffic, 13-19 June 2016. Campus1 contains 139,035 flows and 2,144,631 packets; Campus2 contains 124,128 flows and 2,004,996 packets.
Fifteen common Web applications running over SSL/TLS were captured, covering video, live streaming, mail, search, online payment, social, network storage and social networking. Taobao includes Alipay and Taobao, Facebook includes Facebook and Instagram, and Google includes Google Search and Gmail. Table 3 lists the number of flows and packets for each SSL/TLS application. At least several thousand flows were captured for each application to ensure that the classification model covers all characteristics of the application.
Table 3 SSL/TLS-encrypted application statistics
Building the ground-truth data set: the WENC method is generally applicable because it only needs the handshake information that is indispensable in SSL/TLS and the ADPT feature. To validate the method, a ground-truth data set must be prepared in advance. Samples are labeled in two steps. First, domain names are looked up with the open Web plug-in Fiddler, which finds the domain names corresponding to a given application. Second, specific character strings are extracted and analyzed to determine which application a given flow belongs to. The choice of applications in the evaluation is restricted, because this ground-truth construction method cannot handle applications whose domain-name signatures could not be collected. Table 4 lists the specific character strings in the domain names of specific applications.
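The second labeling step (matching characteristic strings in a flow's domain name) can be approximated as below. The substrings in the hypothetical `APP_SUBSTRINGS` table are illustrative only, since Table 4 is not reproduced on this page. Obtaining the domain from Fiddler logs or from the TLS Server Name Indication extension is also an assumption about where the domain comes from. Flows matching no application, or more than one, are left unlabeled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical substrings; not the patent's Table 4.
APP_SUBSTRINGS: Dict[str, Tuple[str, ...]] = {
    "Google": ("google", "gmail"),
    "Facebook": ("facebook", "fbcdn", "instagram"),
    "Taobao": ("taobao", "alipay"),
}


def label_flow(domain: str,
               table: Dict[str, Tuple[str, ...]] = APP_SUBSTRINGS) -> Optional[str]:
    """Ground-truth label for a flow from its domain name; None when no
    application matches or when several do (ambiguous flows are dropped)."""
    d = domain.lower()
    hits = [app for app, subs in table.items() if any(s in d for s in subs)]
    return hits[0] if len(hits) == 1 else None


print(label_flow("mail.google.com"))  # Google
```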
Specific character string in the application domain name of table 4
For comprehensive assessment WENC method, compared with following three kinds of methods.(1) single order Markov chain method (FOM). (2) second-order Markov chain (SOM) replaces the single order Markov chain in FOM with second-order Markov chain.Based on two dimensional character Second order Markov chain (TSOM), using<MT, PT>feature replaces the type of message in SOM.And TFOM is using two dimensional character Single order Markov chain.(3) the second order Markov chain method (SOCRT) for considering certificate message size introduces certificate message size and increases Add characteristic polymorphic.
To verify the validity of the fingerprints used by WENC, cross validation is performed on two different datasets, Campus1 and Campus2. When Campus1 is used as the training set, Campus2 is used for validation; likewise, when Campus2 is used as the training set, Campus1 is used for validation.
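The two-way validation protocol can be written as a small helper. This is a sketch only: `train` and `evaluate` are hypothetical callables standing in for whatever model fitting and scoring WENC actually uses.

```python
from typing import Callable, Dict, Sequence, TypeVar

Model = TypeVar("Model")

def cross_validate(datasets: Dict[str, Sequence],
                   train: Callable[[Sequence], Model],
                   evaluate: Callable[[Model, Sequence], float]) -> Dict[str, float]:
    """Train on each of exactly two datasets and evaluate on the other one."""
    # Unpacking raises ValueError unless exactly two datasets are supplied.
    (name_a, data_a), (name_b, data_b) = datasets.items()
    return {
        f"train={name_a}, test={name_b}": evaluate(train(data_a), data_b),
        f"train={name_b}, test={name_a}": evaluate(train(data_b), data_a),
    }
```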
Performance Evaluation
Accuracy: to verify the accuracy of WENC, it is compared with three algorithms (SOCRT, SOM and FOM), as shown in Fig. 3. The Campus1 and Campus2 datasets are used alternately as the training and test sets. Because WENC integrates weighted classifiers, it has better applicability and discriminative ability. Fig. 3 shows that WENC outperforms the other three methods in classification accuracy. In addition, the fingerprint methods SOM and FOM perform poorly because the selected fingerprints are not sufficiently discriminative. Although SOCRT considers the certificate message size and therefore outperforms SOM, its improvement in feature discrimination is still insufficient.
Classification accuracy only evaluates identification accuracy over the entire dataset as a whole, whereas precision and recall evaluate the classification of each class individually. Precision and recall are shown in Table 5, and the overall F-Measure is shown in Fig. 4.
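Per-class precision and recall, and the F-Measure derived from them, can be computed from true and predicted labels as shown below. The sketch assumes the balanced F-Measure (F1, the harmonic mean of precision and recall); the text does not state which variant was used, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def per_class_scores(y_true: Sequence[str],
                     y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return {class: (precision, recall, F-Measure)} from parallel label lists."""
    tp, predicted, actual = Counter(), Counter(y_pred), Counter(y_true)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correctly identified flow of class t
    scores = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        recall = tp[c] / actual[c] if actual[c] else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f)
    return scores
```

For example, with true labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, class A has precision 1.0 and recall 0.5, while class B has precision 2/3 and recall 1.0.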
Table 5 Precision and recall
Clearly, WENC achieves a good F-Measure for every application, because the weighted ensemble classifier has stronger applicability. Although WENC classifies SSL/TLS applications accurately in most cases, a small probability of misclassification remains. The misclassifications of WENC have the following causes: (1) Irregular SSL/TLS protocol implementations: many SSL/TLS implementations do not follow the RFC specifications and behave slightly differently from common SSL/TLS implementations. (2) Server configuration: some SSL/TLS protocol messages are optional, and the message parameters of an SSL/TLS server are configurable and may change over time. (3) Abuse of the SSL/TLS protocol: SSL/TLS tunnels are increasingly used to evade restrictions imposed by network configuration and security inspection, rather than to run SSL/TLS applications that protect data in transit. However, misclassifications of these kinds occur with low probability. Overall, WENC is suitable for identifying most SSL/TLS applications.
To verify the effectiveness of the second-order Markov chain based on the two-dimensional feature <MT, PT> relative to one-dimensional features, TSOM is compared with SOM, and the first-order Markov chain with two-dimensional features (TFOM) is compared with the second-order Markov chain TSOM. Campus1 is used as the training set and Campus2 as the test set; the results are shown in Table 6.
Table 6 Influence of two-dimensional features
Because the two-dimensional feature is highly distinguishable, the precision of the TSOM and TFOM methods based on it is substantially better than that of the one-dimensional feature methods SOM and FOM. Compared with SOM, TSOM performs better on every application, improving precision by about 20%, because the two-dimensional feature makes applications easier to distinguish. Compared with TFOM, TSOM performs better on almost every application, improving accuracy by about 4%, because a second-order Markov chain better describes the state transitions of an SSL/TLS session.
In summary, the recognition method of the present invention effectively identifies and classifies SSL/TLS encrypted applications with high classification accuracy, thereby enhancing the controllability and security of the network.
Although embodiments of the present invention have been disclosed above, they are not limited to the uses listed in the description and the embodiments. The invention can be applied to any field for which it is suitable, and further modifications can readily be made by those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the present invention is not limited to the specific details and illustrations shown and described herein.

Claims (9)

1. An HTTPS application identification method in an IPv6 network, characterized by comprising the following steps:
Step 1: capture SSL/TLS data packets to form flow samples, group the data packets of the same session into a flow, and obtain the flow features;
Step 2: build a second-order Markov chain model from the SSL/TLS flow samples;
Step 3: build an HMM from the SSL/TLS flow samples;
Step 4: build a weighted ensemble classifier from the second-order Markov chain model and the HMM;
Step 5: identify and classify newly arriving SSL/TLS encrypted flows with the weighted ensemble classifier.
2. The HTTPS application identification method in an IPv6 network according to claim 1, characterized in that in step 2, the fingerprint model of an application, i.e. the second-order Markov chain model, is built from the two-dimensional feature sequence formed by the message types and message sizes of the SSL/TLS interaction.
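Step 1 groups the packets of the same session into a flow. A minimal sketch, assuming a session is identified by the unordered pair of (address, port) endpoints of a TCP connection; the `Packet` record and the helper names are hypothetical, and a real pipeline would parse packets from a capture and also handle port reuse and timeouts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Packet:
    # Hypothetical minimal packet record (IPv6 addresses as strings, TCP assumed).
    src: str
    sport: int
    dst: str
    dport: int
    length: int

FlowKey = Tuple[Tuple[str, int], Tuple[str, int]]

def flow_key(p: Packet) -> FlowKey:
    """Direction-independent key, so both directions of a session share one flow."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (a, b) if a <= b else (b, a)

def group_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets (assumed to be in capture order) into per-session flows."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)
    return dict(flows)
```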
3. The HTTPS application identification method in an IPv6 network according to claim 2, characterized in that $X_t$ denotes the state of the second-order Markov chain model at time $t$, and the current state is estimated from the two preceding states:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_1 = i_1) = P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) \tag{1}$$
Assuming that the second-order Markov chain is homogeneous, i.e. the transition from times $t-2$ and $t-1$ to time $t$ does not depend on $t$, we obtain:
$$P(X_t = i_t \mid X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}) = P(X_t = k \mid X_{t-1} = j, X_{t-2} = i) = p_{i-j-k} \tag{2}$$
where the transition matrix between the intermediate states is expressed as follows:
where the set $P' = \{p_1, p_2, \ldots, p_n\}$ denotes the identification probabilities, $t \in T = \{t_0, t_1, \ldots, t_n\}$, and $i_t \in \{1, 2, \ldots, s\}$; $i_t$ is a joint feature or a joint feature sequence. The message type MT and the message length PT form the joint feature <MT, PT>, and the joint feature represents a state of the Markov chain.
4. The HTTPS application identification method in an IPv6 network according to claim 3, characterized in that the probability distribution ENPD of the first two states on entering the second-order Markov chain is:
$$Q = [q_{1-1}, \ldots, q_{1-s}, q_{2-1}, \ldots, q_{2-s}, \ldots, q_{s-s}] \tag{4}$$
where $q_{i-j} = P(X_{t+1} = j, X_t = i)$;
the probability distribution EXPD of the last two states on exiting the second-order Markov chain is:
$$W = [w_{1-1}, \ldots, w_{1-s}, w_{2-1}, \ldots, w_{2-s}, \ldots, w_{s-s}] \tag{5}$$
where $w_{i-j} = P(X_{t+1} = j, X_t = i)$ denotes the probability that the session terminates at time $t_n$ in state $i$;
the probability of an SSL/TLS session is then expressed as follows:
The resulting probability indicates how closely the feature sequence of an SSL/TLS session matches an application model: the larger the value, the closer the current SSL/TLS session is to the corresponding application model.
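The session probability equation itself is not reproduced above. One natural reading, assumed here and not confirmed by the text, is the product of the entry probability $q_{i_1-i_2}$, the second-order transitions $p_{i_{t-2}-i_{t-1}-i_t}$, and the exit probability $w_{i_{n-1}-i_n}$. The sketch works in log space to avoid underflow on long sessions; `session_log_prob` and its parameter names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple

State = Hashable
Pair = Tuple[State, State]

def session_log_prob(seq: List[State],
                     enter: Dict[Pair, float],                # ENPD entries q_{i-j}
                     trans: Dict[Pair, Dict[State, float]],   # p_{i-j-k}
                     exit_: Dict[Pair, float]) -> float:      # EXPD entries w_{i-j}
    """log(q * prod(p) * w) under the assumed product form; -inf if any factor is 0."""
    if len(seq) < 2:
        return -math.inf
    factors = [enter.get((seq[0], seq[1]), 0.0)]
    for i, j, k in zip(seq, seq[1:], seq[2:]):
        factors.append(trans.get((i, j), {}).get(k, 0.0))
    factors.append(exit_.get((seq[-2], seq[-1]), 0.0))
    if min(factors) <= 0.0:
        return -math.inf
    return sum(math.log(f) for f in factors)
```

A session would then be attributed to the application model that gives it the largest value, consistent with the statement that a larger probability means a closer match.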
5. The HTTPS application identification method in an IPv6 network according to claim 4, characterized in that in step 3, the HMM is represented by a five-tuple $\lambda = (S, K, A, B, \pi)$, where $S$ is the state set, $K$ is the observation set, $A$ is the transition matrix, $B$ is the observation probability matrix, and $\pi$ is the initial state distribution;
the probability $P(O \mid \lambda)$ that the observation sequence $o_1, o_2, \ldots, o_t$ occurs is computed as:
where $O = (O_1 = o_1, \ldots, O_M = o_M)$ is the output observable state sequence and $M$ is the number of observations in the sequence; $A = \{a_{ij}\}$ is the probability matrix of network operating state transitions, $1 \le i, j \le N$, where $N$ is the number of states and $a_{ij} = p(u_j \mid u_i)$ is the probability of transitioning from state $u_i$ to state $u_j$; $B = \{b_{im}\}$ gives the probability of each ADPT output value obtained from network operation at a given time, where ADPT uses the data packet length and $b_{im} = P(v_m \mid u_i)$ is the probability that state $u_i$ outputs characteristic value $v_m$; $\pi = \{\pi_i\}$, $i = 1, \ldots, N$, generated randomly, is the initial probability distribution of network operating states; $\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t = s_i \mid \lambda)$ and $\beta_t(i) = P(O_{t+1} \cdots O_T \mid q_t = s_i, \lambda)$.
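The probability $P(O \mid \lambda)$ is conventionally computed with the forward recursion over $\alpha_t(i)$ defined above. A plain-Python sketch, assuming observations are already encoded as integer symbol indices into the columns of $B$; the function name is hypothetical.

```python
from typing import List, Sequence

def forward_probability(obs: Sequence[int],
                        pi: Sequence[float],
                        A: Sequence[Sequence[float]],
                        B: Sequence[Sequence[float]]) -> float:
    """P(O | lambda) via alpha_t(i) = P(o_1..o_t, q_t = s_i | lambda).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability a_ij from state i to state j
    B[i][m] : probability b_im that state i emits observation symbol m
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha: List[float] = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    # Termination: sum over all final states
    return sum(alpha)
```

For a two-state toy model with `pi = [0.6, 0.4]`, `A = [[0.7, 0.3], [0.4, 0.6]]`, `B = [[0.5, 0.5], [0.1, 0.9]]` and observations `[0, 1]`, the result is 0.2156, which matches summing the four state paths by hand.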
6. The HTTPS application identification method in an IPv6 network according to claim 5, characterized in that the HMM corresponding to a given SSL/TLS session is $H_{app}$; the set $F = \{F_1, F_2, \ldots, F_l\}$ denotes $l$ consecutive flows of the application, and the ADPT feature is selected to build the training model $H_{app}$; for an unknown flow $F_i = \{g_1, g_2, \ldots, g_r, \ldots\}$, $\lambda_{app} = P_2(F_i \mid H_{app})$ denotes the probability that $F_i$ is identified as the application.
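Claim 6 scores an unknown flow against each application's HMM. On long flows the unscaled forward probabilities underflow, so this sketch uses the scaled forward algorithm and returns a log-likelihood. It assumes each $H_{app}$ is given as a `(pi, A, B)` tuple and that ADPT values have already been quantized into integer symbols; both are assumptions not stated in the claims, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

HMM = Tuple[List[float], List[List[float]], List[List[float]]]  # (pi, A, B)

def log_likelihood(obs: Sequence[int], model: HMM) -> float:
    """log P(F | H_app) via the scaled forward algorithm (avoids underflow)."""
    pi, A, B = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        scale = sum(alpha)  # equals P(o_t | o_1..o_{t-1}) after normalization
        if scale == 0.0:
            return -math.inf
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalize so values stay in [0, 1]
    return log_p

def identify(obs: Sequence[int], models: Dict[str, HMM]) -> str:
    """Label an unknown flow with the application whose HMM scores it highest."""
    return max(models, key=lambda app: log_likelihood(obs, models[app]))
```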
7. The HTTPS application identification method in an IPv6 network according to claim 6, characterized in that in step 4, newly arriving encrypted traffic is divided in chronological order into data blocks $S_i$ of equal size, a classifier $C_i$ is built from each data block $S_i$, and the weight of each classifier $C_i$ is inversely proportional to its error.
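The chronological partitioning of claim 7 can be sketched as follows. How a trailing partial block is handled is not specified; this sketch assumes it is dropped, whereas a streaming system might instead hold it until it fills. The name `split_into_blocks` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def split_into_blocks(flows: Sequence[T], block_size: int) -> List[List[T]]:
    """Split chronologically ordered flows into consecutive equal-size blocks S_i.

    A trailing partial block is dropped so that every S_i has the same size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full = len(flows) // block_size
    return [list(flows[k * block_size:(k + 1) * block_size]) for k in range(full)]
```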
8. The HTTPS application identification method in an IPv6 network according to claim 7, characterized in that test samples are constructed, the identification error rate of the test samples on classifier $C_i$ is computed, and the weight of classifier $C_i$ is set according to this identification error rate.
9. The HTTPS application identification method in an IPv6 network according to claim 8, characterized in that the identification error rate of a test sample $(x, c)$ on classifier $C_i$ is determined by the probability (accuracy) that classifier $C_i$ assigns to the sample, where $x$ is an instance of a class-$c$ application; the mean square error of classifier $C_i$ is:
Assuming random guessing over the whole class space $C = \{\text{all classes}\}$, where the probability distribution $P(Y)$ gives the probability that $X$ is classified as $Y$, the mean square error of a probabilistic classifier that classifies instances in this way is:
For a given dataset, $\mathrm{MSE}_r$ is a fixed value, and the weight $w_i$ of classifier $C_i$ is computed as:
$$w_i = \mathrm{MSE}_r - \mathrm{MSE}_i \tag{10}$$
If the identification performance of classifier $C_i$ is worse than random guessing, its weight is set to 0 so that $C_i$ is not used in the ensemble classifier; this ensures that a classifier with a larger error rate receives a smaller weight.
The result of integrating the second-order Markov chain model and the HMM can be described as:
where $H_i \in \{P_1, P_2\}$.
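Equations (8), (9) and (11) are not reproduced above. The sketch below assumes the common accuracy-weighted ensemble form: $\mathrm{MSE}_i$ is the mean of $(1 - f_c^i(x))^2$ over the test samples, where $f_c^i(x)$ is the probability that $C_i$ gives to the true class $c$; $\mathrm{MSE}_r = \sum_c p(c)(1 - p(c))^2$ for random guessing with class priors $p(c)$; and the ensemble output is the application with the largest weighted sum of classifier probabilities. These forms, the classifier interface (any function returning normalized per-application probabilities, e.g. from the Markov chain or the HMM), and all function names are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a classifier maps an instance x to {application: probability}.
Classifier = Callable[[object], Dict[str, float]]
Sample = Tuple[object, str]  # test sample (x, c)

def random_mse(labels: Sequence[str]) -> float:
    """Assumed MSE_r: error of random guessing with class priors p(c)."""
    n = len(labels)
    return sum((k / n) * (1 - k / n) ** 2 for k in Counter(labels).values())

def classifier_mse(clf: Classifier, test: Sequence[Sample]) -> float:
    """Assumed MSE_i: mean of (1 - f_c(x))^2, f_c(x) = probability C_i gives true class c."""
    return sum((1 - clf(x).get(c, 0.0)) ** 2 for x, c in test) / len(test)

def classifier_weights(clfs: List[Classifier], test: Sequence[Sample]) -> List[float]:
    """Equation (10), w_i = MSE_r - MSE_i, with worse-than-random classifiers set to 0."""
    mse_r = random_mse([c for _, c in test])
    return [max(0.0, mse_r - classifier_mse(clf, test)) for clf in clfs]

def ensemble_predict(clfs: List[Classifier], weights: Sequence[float], x: object) -> str:
    """Weighted vote: the application with the largest sum of w_i * f_app^i(x) wins."""
    combined: Dict[str, float] = Counter()
    for clf, w in zip(clfs, weights):
        for app, p in clf(x).items():
            combined[app] += w * p
    if not combined:
        raise ValueError("no classifier produced a score")
    return max(combined, key=combined.get)
```

Clipping at zero reproduces the rule that a classifier worse than random guessing is excluded, since a zero weight contributes nothing to the vote.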

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811637611.1A CN109617904A (en) 2018-12-29 2018-12-29 A kind of HTTPS application and identification method in IPv6 network

Publications (1)

Publication Number Publication Date
CN109617904A true CN109617904A (en) 2019-04-12

Family

ID=66015396

Country Status (1)

Country Link
CN (1) CN109617904A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310187A (en) * 2020-04-01 2020-06-19 深信服科技股份有限公司 Malicious software detection method and device, electronic equipment and storage medium
CN111917694A (en) * 2019-05-09 2020-11-10 中兴通讯股份有限公司 TLS encrypted traffic identification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871832A (en) * 2016-03-29 2016-08-17 北京理工大学 Network application encrypted traffic recognition method and device based on protocol attributes
CN107274011A (en) * 2017-06-05 2017-10-20 上海电力学院 The equipment state recognition methods of comprehensive Markov model and probability net
CN108768986A (en) * 2018-05-17 2018-11-06 中国科学院信息工程研究所 A kind of encryption traffic classification method and server, computer readable storage medium
CN108900432A (en) * 2018-07-05 2018-11-27 中山大学 A kind of perception of content method based on network Flow Behavior

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WUBIN PAN: "WENC:HTTPS Encrypted Traffic Classification Using Weighted Ensemble Learning and Markov Chain", 《IEEE TRUSTCOM/BIGDATASE/ICESS》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190412