CN103840983A - WEB tunnel detection method based on protocol behavior analysis - Google Patents

Publication number
CN103840983A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410008778.7A
Other languages
Chinese (zh)
Inventor
黄刘生
王飞
杨威
陈志立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute for Advanced Study USTC
Original Assignee
Suzhou Institute for Advanced Study USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a WEB tunnel detection method based on protocol behavior analysis. The WEB tunnel detection method comprises the following steps that (1) statistical features in HTTP and HTTPS normal communication are obtained; (2) a few tunnel sessions are generated, the tunnel sessions and a few collected normal sessions are used as training session samples, and a two-category SVM classifier is trained by using statistical features of the session samples; (3) WEB tunnel detection equipment is used for capturing suspicious HTTP and HTTPS communication flows, protocol session partition is conducted on the flows, features of suspicious protocol sessions to be detected are classified through the trained SVM classifier, and whether the sessions are in tunnel communication is determined. Compared with the network tunnel detection technique in the prior art, under the same data constraints, the detection method has higher detection efficiency.

Description

WEB tunnel detection method based on protocol behavior analysis
Technical field
The invention belongs to the field of ubiquitous network security technology, and specifically relates to a WEB tunnel detection method based on protocol behavior analysis.
Background
Network tunneling has long been a major vehicle for network attacks. Using a network tunnel, an attacker can implant viruses on a target machine, leak users' private information, and steal confidential files. With the rapid development of modern network communication and the exponential growth of network traffic, tunnel attacks have become increasingly stealthy. Because they are easy to carry out and hard to detect, they remain a thorny problem in network protection and network management. A network tunnel embeds illegitimate communication data in the application-layer data of a legitimate network protocol, thereby concealing the communication and bypassing network defenses. A tunnel built on a plaintext application-layer protocol is a plaintext tunnel; a tunnel built on an encrypted application-layer protocol is a ciphertext tunnel.
Existing firewalls and application-level gateways that protect the network perimeter can detect simple network attacks, such as directly loading dangerous code from a malicious website or unauthorized access to an internal host initiated from an external host. In current implementations, most firewalls and application-level gateways block untrusted communication by restricting the IP addresses, ports, or protocols available for external access, thereby achieving packet filtering. Although these crude passive methods can maintain network security to some extent by filtering suspicious packets, they cannot defeat advanced attack techniques such as network tunnels and covert channels, and they also block legitimate connections initiated by unfamiliar sources, which gives users a very poor experience. Intelligent packet filtering based on deep packet inspection has therefore become the main development direction for firewalls and application-level gateways in recent years. This technique adopts an active packet inspection policy, so it does not filter packets indiscriminately when the communication source is unfamiliar, and users' normal legitimate communication is not disturbed. Besides providing the basic malicious-content detection of conventional packet filtering, it can also capture some covert channels and plaintext tunnel traffic. For covert channels, deep packet inspection can directly examine unused or redundant fields in network packets to find traces of protocol storage-type covert communication, for example by inspecting and analyzing the header fields of TCP (Transmission Control Protocol) and IP (Internet Protocol) packets. For plaintext tunnels, deep packet inspection can analyze the semantic essence of a communication by pattern matching on characteristic message fields, thereby finding network tunnels under plaintext protocols. For example, if an intranet host initiates access to port 80 of an external host, the technique can analyze the packet contents of that communication and determine whether it is genuine HTTP (Hypertext Transfer Protocol) communication.
Although existing deep packet inspection meets many network protection needs, it faces unprecedented challenges as encrypted network protocols develop rapidly. As awareness of communication security and privacy grows, encrypted application-layer protocols such as SSH (Secure Shell) and HTTPS (Hypertext Transfer Protocol Secure) have spread quickly in recent years. Under an encrypted protocol, deep packet inspection cannot read application-layer data, so ciphertext tunnels become a blind spot of network security detection. Researchers in many countries have studied both kinds of tunnel from the perspective of traffic analysis and have proposed several fairly effective general offline analysis methods. The most effective is the communication fingerprint identification method based on traffic model training proposed by Maurizio Dusi et al. [Maurizio Dusi, Manuel Crotti, Francesco Gringoli, Luca Salgarelli, "Tunnel Hunter: Detecting application-layer tunnels with statistical fingerprinting", Elsevier, 2009]. Although this method achieves a detection accuracy of nearly 99% for both plaintext and ciphertext tunnels, it does not systematically analyze the communication nature and communication behavior of the application-layer protocol, so its traffic model training is very inefficient and consumes a large amount of training data and training time to reach its detection goal. With limited data, its results are also unsatisfactory. A relatively efficient offline detection method for both kinds of network tunnel therefore has good application prospects.
Summary of the invention
The object of the invention is to provide a WEB tunnel detection method based on protocol behavior analysis, which solves the problems of existing offline detection methods, such as the need for large amounts of training data and training time and low detection efficiency.
To solve these problems in the prior art, the technical solution provided by the invention is as follows:
A WEB tunnel detection method based on protocol behavior analysis, characterized in that the method comprises the following steps:
(1) Collect normal HTTP and HTTPS communication traffic from the local area network intranet, divide the traffic into protocol sessions and classify the TCP packets, and obtain the statistical features of normal HTTP and HTTPS communication;
(2) Use HTTP and HTTPS tunnel communication tools to generate a small number of tunnel sessions, and use them together with a small number of the collected normal sessions as training session samples; using the interaction information of the TCP connections in the session samples and the statistical features of normal communication already obtained, extract for each session the request average length, request length variance, response average length, response length variance, TCP packet distribution deviation, TCP packet pair entropy, and TCP packet mutual information distance, and train a two-class SVM classifier with these features;
(3) Use a WEB tunnel detection device to capture suspicious HTTP and HTTPS communication traffic and divide it into protocol sessions; using the interaction information of the TCP connections in each suspicious protocol session under test and the statistical features of normal communication already obtained, extract for each session the request average length, request length variance, response average length, response length variance, TCP packet distribution deviation, TCP packet pair entropy, and TCP packet mutual information distance, classify these features with the trained SVM classifier, and determine whether the session is tunnel communication;
wherein, in the HTTP or HTTPS communication between a pair of hosts, the client resource requests whose time intervals are less than or equal to 10 seconds, together with the corresponding server responses, are regarded as one protocol session.
A preferred technical solution is: in step (1), each TCP packet in the collected traffic is defined as a triple of length, interval, and direction, where the length is the payload length of the TCP packet, the interval is the time between this packet and the previous packet in the session, and the direction indicates whether the packet is sent from the client to the server or from the server to the client; packet classification proceeds as follows:
1) if the payload length of a TCP packet is 0, the packet is judged to be a TCP control message unrelated to the application-layer protocol and is discarded;
2) if the payload length of a TCP packet is not 0, the packet is classified according to its length, interval, and direction.
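The triple and the zero-payload rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` type and the `drop_control_packets` helper are hypothetical names, not part of the patent, and the packets are assumed to have been parsed from a capture already.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical container for the (length, interval, direction) triple."""
    length: int        # TCP payload length in bytes
    interval_ms: int   # time since the previous packet of the session, in ms
    direction: int     # 0 = client -> server, 1 = server -> client


def drop_control_packets(packets: List[Packet]) -> List[Packet]:
    """Discard TCP control messages, i.e. packets with an empty payload."""
    return [p for p in packets if p.length > 0]


packets = [
    Packet(0, 0, 0),      # e.g. SYN or pure ACK: no payload
    Packet(420, 3, 0),    # request carrying application data
    Packet(0, 1, 1),      # pure ACK from the server
    Packet(1460, 12, 1),  # response carrying application data
]
data_packets = drop_control_packets(packets)
```

Only the two packets that carry application data remain and go on to classification by length, interval, and direction.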
A preferred technical solution is: in step 2), the length and interval of packets are classified by the cumulative-probability equal-division method under the probability density obtained by kernel density estimation, specifically as follows:
A) For each length value or interval value of a TCP packet that appears in the normal HTTP and HTTPS traffic, assume that lengths and intervals follow a normal distribution with that value as its mean; accumulate these distributions, weighted by occurrence count, to obtain the probability density functions of length and interval, that is:
$$\hat{f}_l = \frac{1}{\sqrt{2\pi}\,\hat{h}_l \sum_{j=1}^{1460} CL_j} \sum_{i=1}^{1460} CL_i \times e^{-\frac{(l-i)^2}{2\hat{h}_l^2}};$$
$$\hat{f}_t = \frac{1}{\sqrt{2\pi}\,\hat{h}_t \sum_{j=0}^{10000} CT_j} \sum_{i=0}^{10000} CT_i \times e^{-\frac{(t-i)^2}{2\hat{h}_t^2}};$$
where $CL_i$ is the number of times length $i$ occurs in the collected HTTP and HTTPS protocol sessions, $1 \le i \le 1460$; $CT_j$ is the number of times interval $j$ occurs in the collected HTTP and HTTPS protocol sessions, $0 \le j \le 10000$; $\hat{h}_l$ and $\hat{h}_t$ are smoothing factors, $\hat{h} = \left(\frac{4\sigma^5}{3n}\right)^{1/5} \approx 1.06\,\sigma n^{-1/5}$, where $\sigma$ is the standard deviation of all length values or interval values and $n$ is the sample size of length values or interval values;
B) From the probability density functions obtained by kernel density estimation, compute the cumulative probability $CP$ of length and interval over their respective value ranges:
$$CP_l = \int_{1}^{1460} \hat{f}_l \, dl;$$
$$CP_t = \int_{0}^{10000} \hat{f}_t \, dt;$$
Suppose the value range of length or interval is to be divided into $b$ classes; the interval $[L_i(T_i), L_{i+1}(T_{i+1})]$ $(0 \le i \le b-1)$ represented by each class is obtained from
$$\int_{L_i(T_i)}^{L_{i+1}(T_{i+1})} \hat{f}_l(\hat{f}_t)\, dl(dt) = \frac{CP_l(CP_t)}{b} \quad (0 \le i \le b-1),$$
where $L_0 = 1$, $L_b = 1460$, $T_0 = 0$, $T_b = 10000$; length and interval are divided into $BL$ and $BT$ classes respectively, so all TCP packets are divided into $2 \times BL \times BT$ classes.
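As a rough illustration of steps A) and B), the sketch below builds the weighted Gaussian density from a table of occurrence counts and then cuts the value range into `b` classes of equal cumulative probability. The function names, the integer evaluation grid, the trapezoidal integration, and the example counts are all assumptions made for this sketch; the patent does not prescribe a numerical procedure.

```python
import math
from typing import Dict, List


def bandwidth(counts: Dict[int, int]) -> float:
    """Smoothing factor h ~ 1.06 * sigma * n^(-1/5), using the population std."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    return 1.06 * math.sqrt(var) * n ** (-1 / 5)


def density(counts: Dict[int, int], x: float, h: float) -> float:
    """Count-weighted sum of Gaussians centred on every observed value."""
    n = sum(counts.values())
    weighted = sum(c * math.exp(-((x - v) ** 2) / (2 * h * h))
                   for v, c in counts.items())
    return weighted / (math.sqrt(2 * math.pi) * h * n)


def equal_probability_edges(counts: Dict[int, int], lo: int, hi: int, b: int) -> List[int]:
    """Return b+1 boundaries so each class holds about CP/b of the probability."""
    h = bandwidth(counts)
    dens = [density(counts, x, h) for x in range(lo, hi + 1)]
    cum = [0.0]  # trapezoidal cumulative integral on the integer grid
    for k in range(1, len(dens)):
        cum.append(cum[-1] + 0.5 * (dens[k - 1] + dens[k]))
    edges, k = [lo], 0
    for i in range(1, b):
        target = cum[-1] * i / b
        while cum[k] < target:
            k += 1
        edges.append(lo + k)
    edges.append(hi)
    return edges


# Made-up length counts: many short requests, some mid-size and full-size packets.
length_counts = {40: 50, 500: 30, 1460: 20}
length_edges = equal_probability_edges(length_counts, 1, 1460, 4)
```

The same function would be called with the range 0 to 10000 for intervals. The sketch assumes at least two distinct observed values, since a zero standard deviation gives a zero bandwidth.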
A preferred technical solution is: the features extracted in step (2) or (3) form a 7-dimensional feature vector $\langle req_{avg}, req_{var}, res_{avg}, res_{var}, D_{KL}, E_{RPP}, D_{RMI} \rangle$, where $req_{avg}$ is the request average length, that is, the average length of all resource request messages sent by the client in one protocol session; $req_{var}$ is the request length variance, that is, the variance of the lengths of all resource request messages sent by the client in one protocol session; $res_{avg}$ is the response average length, that is, the average length of all response messages sent by the server in one protocol session; $res_{var}$ is the response length variance, that is, the variance of the lengths of all response messages sent by the server in one protocol session;
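The first four components are plain per-direction statistics. A minimal sketch, assuming that each message is given as a `(direction, length)` pair with direction 0 for requests and 1 for responses, and assuming the population variance (the text does not say whether population or sample variance is meant):

```python
from statistics import mean, pvariance
from typing import List, Tuple


def length_features(messages: List[Tuple[int, int]]) -> Tuple[float, float, float, float]:
    """Return (req_avg, req_var, res_avg, res_var) for one protocol session."""
    req = [length for direction, length in messages if direction == 0]
    res = [length for direction, length in messages if direction == 1]

    def stats(values: List[int]) -> Tuple[float, float]:
        # An empty direction yields zeros instead of raising (an assumption).
        return (mean(values), pvariance(values)) if values else (0.0, 0.0)

    return stats(req) + stats(res)


session = [(0, 300), (1, 1200), (0, 500), (1, 800)]
features = length_features(session)  # (400, 10000, 1000, 40000)
```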
where $D_{KL}$ is the distribution deviation of TCP packets,
$$D_{KL} = \sum_{i=1}^{2 \times BL \times BT} P(i) \ln \frac{P(i)}{Q(i)},$$
$P(i)$ and $Q(i)$ are the probabilities that TCP packets of class $i$ occur in the suspicious session and in the collected normal sessions respectively, and $2 \times BL \times BT$ is the total number of TCP packet classes;
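A small sketch of this divergence over packet class labels follows. The additive smoothing term `eps` is an assumption introduced here so that classes unseen in either set do not produce a division by zero or a logarithm of zero; the patent does not say how empty classes are handled.

```python
import math
from collections import Counter
from typing import Sequence


def kl_divergence(suspect: Sequence[int], normal: Sequence[int],
                  num_classes: int, eps: float = 1e-6) -> float:
    """D_KL(P || Q) over class ids 0..num_classes-1, with additive smoothing."""
    p_counts, q_counts = Counter(suspect), Counter(normal)
    p_total = len(suspect) + eps * num_classes
    q_total = len(normal) + eps * num_classes
    total = 0.0
    for i in range(num_classes):
        p = (p_counts[i] + eps) / p_total
        q = (q_counts[i] + eps) / q_total
        total += p * math.log(p / q)
    return total


same = kl_divergence([0, 1, 2], [0, 1, 2], num_classes=3)     # identical distributions
skewed = kl_divergence([0, 0, 0], [0, 1, 2], num_classes=3)   # suspect uses one class only
```

Identical class distributions give a divergence of zero, and the value grows as the suspicious session concentrates on classes that are rare in normal traffic.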
where $E_{RPP}$ is the pair entropy of TCP packets,
$$E_{RPP} = -\sum_{i=1}^{(2 \times BL \times BT)^2} F(i) \log_2 F(i),$$
$(2 \times BL \times BT)^2$ is the total number of kinds of TCP packet pairs, and $F(i)$ is the probability that the $i$-th kind of TCP packet pair occurs;
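The description defines a packet pair as an ordered pair of two TCP packets that are at most 2 positions apart in the session. Under that definition, the pair entropy can be sketched as follows; the function names are illustrative, and pair kinds that never occur are simply skipped because they contribute nothing to the sum.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def packet_pairs(seq: Sequence[Hashable], max_distance: int = 2) -> List[Tuple]:
    """Ordered pairs (seq[i], seq[i+d]) for every distance d in 1..max_distance."""
    pairs = []
    for i in range(len(seq)):
        for d in range(1, max_distance + 1):
            if i + d < len(seq):
                pairs.append((seq[i], seq[i + d]))
    return pairs


def pair_entropy(seq: Sequence[Hashable]) -> float:
    """E_RPP = -sum F(i) * log2 F(i) over the pair kinds observed in the session."""
    pairs = packet_pairs(seq)
    if not pairs:
        return 0.0
    total = len(pairs)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(pairs).values())


pairs = packet_pairs([1, 2, 5, 4])
# [(1, 2), (1, 5), (2, 5), (2, 4), (5, 4)]
entropy = pair_entropy([1, 2, 5, 4])  # five equally likely pairs: log2(5)
```

For the class sequence <1,2,5,4> this reproduces the five pairs listed in the description.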
where $D_{RMI}$ is the mutual information distance of TCP packets,
$$D_{RMI} = \sum_{i=1}^{2 \times BL \times BT} \sum_{j=1}^{2 \times BL \times BT} \left| RMI_{\langle i,j \rangle} - RMI'_{\langle i,j \rangle} \right| \times \alpha(i,j), \qquad RMI_{\langle x,y \rangle} = \log_2 \frac{p(x,y)}{p(x)\,p(y)};$$
$p(x,y)$ is the probability that the packet pair $\langle x,y \rangle$ occurs in the suspicious session; $p(x)$ and $p(y)$ are the probabilities that the packet pairs $\langle x,? \rangle$ and $\langle ?,y \rangle$ occur in the suspicious session, where "?" denotes a TCP packet of any class; $RMI_{\langle i,j \rangle}$ and $RMI'_{\langle i,j \rangle}$ are the mutual information of the packet pair $\langle i,j \rangle$ in the suspicious session and in the collected normal sessions respectively; only some representative packet pairs are selected for the mutual information calculation: if the selected pairs include $\langle i,j \rangle$, then $\alpha(i,j) = 1$; otherwise $\alpha(i,j) = 0$.
A preferred technical solution is: in step (3), the WEB tunnel detection device is placed inside the local area network and connected to the firewall or application-level gateway; the device obtains HTTP and HTTPS communication traffic by network mirroring on the intranet interface of the local area network gateway.
A preferred technical solution is: in step (2), the HTTP and HTTPS tunnel sessions are generated with the HTTPTunnel and Barracuda HTTPS Tunnel tools, and the normal HTTP and HTTPS sessions are extracted from the collected normal communication traffic of the local area network intranet.
The technical solution of the invention provides a WEB (World Wide Web) tunnel detection technique based on protocol behavior analysis. Because the vast majority of existing plaintext and ciphertext tunnels are based on the two WEB protocols HTTP and HTTPS, the technical solution is mainly used to detect HTTP plaintext tunnels and HTTPS ciphertext tunnels. The tunnel detection methods and theoretical ideas of the invention are not, however, limited to WEB tunnel detection. Given sufficient protocol behavior analysis, the invention can also be adapted to detect network tunnels based on other protocols.
With the technical method of the invention, HTTP plaintext tunnels and HTTPS ciphertext tunnels can be detected offline. By understanding and analyzing protocol behavior systematically, the technical solution extracts network traffic features efficiently and thereby finds tunnel communication within normal HTTP and HTTPS traffic. Compared with the most successful existing method, communication fingerprint identification based on traffic model training, the technical solution has higher detection efficiency. Under the same data conditions, the detection accuracy of the invention for HTTP plaintext tunnels and HTTPS ciphertext tunnels is far higher than that of the communication fingerprint identification method.
Compared with the solutions of the prior art, the advantages of the invention are:
The WEB tunnel detection technique based on protocol behavior analysis designed by the invention can detect not only plaintext tunnels but also ciphertext tunnels. By analyzing the TCP packets in network communication, the invention analyzes and extracts the protocol behavior features of HTTP and HTTPS systematically. Therefore, compared with existing network tunnel detection techniques, the detection method designed by the invention is more efficient under the same data constraints.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments:
Fig. 1 is the working environment configuration diagram for implementing the WEB tunnel detection method based on protocol behavior analysis.
Fig. 2 is the workflow diagram of the WEB tunnel detection method based on protocol behavior analysis.
Detailed description
The above solution is further described below with reference to a specific embodiment. It should be understood that the embodiment serves to illustrate the invention and does not limit its scope. The implementation conditions used in the embodiment can be further adjusted according to the conditions of the specific manufacturer, and implementation conditions that are not specified are generally those of routine experiments.
Embodiment
The working environment configuration for implementing the WEB tunnel detection method based on protocol behavior analysis in this embodiment is shown in Fig. 1. The tunnel detection device performs network mirroring on the internal network interface of the application-level gateway at the exit of the local area network, thereby copying and recording HTTP and HTTPS traffic data. The internal interface is chosen to remove the interference with IP addresses and port addresses caused by NAT (network address translation) and NAPT (network address port translation), which may be present, so that the source of tunnel communication can be determined accurately.
The workflow of the WEB tunnel detection method based on protocol behavior analysis is shown in Fig. 2. First, a large amount of normal HTTP and HTTPS communication traffic collected in the local area network over a period of time is used to compute the protocol statistical features. Next, the computed statistical features are used to extract the behavior features of suspicious HTTP and HTTPS communication, giving the behavior feature vector of the suspicious communication. Finally, the feature vector is classified by a trained SVM (support vector machine) classifier, which determines whether the suspicious communication is normal communication or tunnel communication.
The method specifically comprises the following steps:
Step 1: Place the WEB tunnel detection device inside the local area network and connect it to the firewall or application-level gateway. Perform network mirroring on the intranet gateway interface and capture all HTTP and HTTPS communication traffic in the local area network over a period of time.
Step 2: Divide all collected traffic into protocol sessions; in the HTTP or HTTPS communication between a pair of hosts, the client resource requests whose time intervals are less than or equal to 10 seconds, together with the corresponding server responses, all belong to one protocol session. Classify the TCP packets in the sessions using formulas (1), (2), (3), (4), (5) and (6):
$$\hat{f}_l = \frac{1}{\sqrt{2\pi}\,\hat{h}_l \sum_{j=1}^{1460} CL_j} \sum_{i=1}^{1460} CL_i \times e^{-\frac{(l-i)^2}{2\hat{h}_l^2}} \qquad (1)$$
$$\hat{f}_t = \frac{1}{\sqrt{2\pi}\,\hat{h}_t \sum_{j=0}^{10000} CT_j} \sum_{i=0}^{10000} CT_i \times e^{-\frac{(t-i)^2}{2\hat{h}_t^2}} \qquad (2)$$
$$\hat{h} = \left(\frac{4\sigma^5}{3n}\right)^{1/5} \approx 1.06\,\sigma n^{-1/5} \qquad (3)$$
$$CP_l = \int_{1}^{1460} \hat{f}_l \, dl \qquad (4)$$
$$CP_t = \int_{0}^{10000} \hat{f}_t \, dt \qquad (5)$$
$$\int_{L_i(T_i)}^{L_{i+1}(T_{i+1})} \hat{f}_l(\hat{f}_t)\, dl(dt) = \frac{CP_l(CP_t)}{b} \quad (0 \le i \le b-1) \qquad (6)$$
Computing the statistical features of normal HTTP and HTTPS protocol communication involves two parts: dividing protocol sessions and classifying TCP packets. When a user accesses the network over HTTP or HTTPS, each network resource has an access peak and an access trough. The access peak is the process in which the user downloads the target resource and all its associated objects through the client; during this process the client sends a large number of resource request messages to the server in succession. The access trough is the process in which the user reads, thinks about, or stores the resource after downloading it, so the trough is also called "think time". When a user accesses resources continuously, the think time usually does not exceed 10 seconds, so the invention treats resource requests whose intervals are less than or equal to 10 seconds as one run of continuous requests. Based on this analysis, the invention defines one run of continuous client requests and the corresponding run of continuous server responses as one protocol session, and divides the data stream into protocol sessions accordingly.
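The 10-second rule amounts to cutting the request timeline of one host pair wherever the gap between consecutive requests exceeds the think-time limit. A minimal sketch, assuming the request timestamps (in seconds) of one client and server pair have already been extracted; attaching the corresponding responses to each run is left out, and `split_sessions` is a hypothetical helper name:

```python
from typing import List


def split_sessions(request_times: List[float], max_gap: float = 10.0) -> List[List[float]]:
    """Group request timestamps into runs whose consecutive gaps are <= max_gap."""
    sessions: List[List[float]] = []
    for t in sorted(request_times):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)   # still within the same continuous run
        else:
            sessions.append([t])     # gap exceeded think time: new session
    return sessions


sessions = split_sessions([0.0, 1.5, 9.0, 25.0, 26.2, 40.0])
# [[0.0, 1.5, 9.0], [25.0, 26.2], [40.0]]
```

The gaps of 16 s and 13.8 s exceed the limit, so the six requests fall into three protocol sessions.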
Because the length of application-layer packets is not standardized and varies widely in practice, it is not amenable to standardized analysis. The invention therefore classifies the TCP packets that carry the HTTP and HTTPS protocols, which have a standardized format, as the basis for further protocol behavior analysis. During classification, TCP control messages with a payload of 0, which are unrelated to the application-layer protocol, such as ACK, FIN, SYN and RST messages, are all discarded. A TCP packet can be defined as a triple of length, interval, and direction. The length is the payload length of the TCP packet in bytes, ranging from 1 to 1460; the interval is the time in milliseconds between this packet and the previous packet in the session, ranging from 0 to 10000 according to the "think time"; the direction indicates whether the packet is sent from the client to the server or from the server to the client, where "0" denotes client to server and "1" denotes server to client. Length and interval are not measured by their absolute values; each is represented by the label of the class it falls into. For example, if a TCP message is sent from the client to the server, its length falls in the interval of class 2, and its interval falls in the interval of class 5, the message is represented as {2, 5, 0}. Length and time are divided by the cumulative-probability equal-division method under the probability density obtained by kernel density estimation. For each length value or interval value that occurs, lengths and intervals are assumed to follow a normal distribution with that value as its mean. These normal distributions are then accumulated with weights to obtain the probability density functions of length and interval, as shown in formulas (1) and (2), where $CL_i$ $(1 \le i \le 1460)$ and $CT_j$ $(0 \le j \le 10000)$ are the numbers of times that length $i$ and interval $j$ occur in the collected HTTP and HTTPS protocol sessions, $\hat{h}_l$ and $\hat{h}_t$ are called smoothing factors and can usually be computed by formula (3), and $\sigma$ is the standard deviation of all length values or interval values. From the probability density functions obtained by kernel density estimation, the cumulative probability $CP$ of length and interval over their respective value ranges can be computed by formulas (4) and (5). If the value range of length or interval is to be divided into $b$ classes, formula (6) gives the interval $[L_i(T_i), L_{i+1}(T_{i+1})]$ $(0 \le i \le b-1)$ represented by each of the $b$ classes, where $L_0 = 1$, $L_b = 1460$, $T_0 = 0$, $T_b = 10000$. In this way length and interval are divided into $BL$ and $BT$ classes respectively, so all TCP packets are divided into $2 \times BL \times BT$ classes.
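Once the class boundaries are known, mapping a packet to its class triple is a pair of binary searches. The sketch below is illustrative only: the boundary lists are made-up values rather than boundaries derived from real traffic, class labels are assumed to start at 0, the last class is treated as closed on the right, and the flattening of a triple into one class id is just one possible numbering.

```python
from bisect import bisect_right
from typing import List, Tuple


def bin_index(value: float, edges: List[float]) -> int:
    """Index i of the class [edges[i], edges[i+1]) containing value (last class closed)."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def classify(length: int, interval_ms: int, direction: int,
             length_edges: List[float], interval_edges: List[float]) -> Tuple[int, int, int]:
    """Return the {length class, interval class, direction} triple of one packet."""
    return (bin_index(length, length_edges),
            bin_index(interval_ms, interval_edges),
            direction)


def class_id(triple: Tuple[int, int, int], bl: int, bt: int) -> int:
    """Flatten a triple into a single id in the range 0 .. 2*bl*bt - 1."""
    length_cls, interval_cls, direction = triple
    return (direction * bl + length_cls) * bt + interval_cls


length_edges = [1, 100, 400, 900, 1460]                # BL = 4 (hypothetical)
interval_edges = [0, 2, 10, 50, 200, 1000, 10000]      # BT = 6 (hypothetical)
triple = classify(500, 3000, 0, length_edges, interval_edges)  # (2, 5, 0)
flat = class_id(triple, bl=4, bt=6)                    # 17 of 48 classes
```

With these boundaries, a 500-byte request arriving 3000 ms after the previous packet is labelled {2, 5, 0}, matching the form of the example in the text.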
Step 3: Generate a small number of HTTP and HTTPS tunnel sessions with the HTTPTunnel and Barracuda HTTPS Tunnel software, and extract a small number of normal HTTP and HTTPS protocol sessions from the collected intranet traffic. Using the interaction information of the TCP connections in the sessions and formulas (7), (8), (9) and (10), extract the protocol behavior features of these two groups of sessions:
$$D_{KL} = \sum_{i=1}^{2 \times BL \times BT} P(i) \ln \frac{P(i)}{Q(i)} \qquad (7)$$
$$E_{RPP} = -\sum_{i=1}^{(2 \times BL \times BT)^2} F(i) \log_2 F(i) \qquad (8)$$
$$RMI_{\langle x,y \rangle} = \log_2 \frac{p(x,y)}{p(x)\,p(y)} \qquad (9)$$
$$D_{RMI} = \sum_{i=1}^{2 \times BL \times BT} \sum_{j=1}^{2 \times BL \times BT} \left| RMI_{\langle i,j \rangle} - RMI'_{\langle i,j \rangle} \right| \times \alpha(i,j) \qquad (10)$$
Each session yields a 7-dimensional feature vector $\langle req_{avg}, req_{var}, res_{avg}, res_{var}, D_{KL}, E_{RPP}, D_{RMI} \rangle$, which is used to train the SVM classifier.
During feature extraction, seven protocol behavior features of the suspicious session traffic are extracted to form the feature vector: request average length ($req_{avg}$), request length variance ($req_{var}$), response average length ($res_{avg}$), response length variance ($res_{var}$), the distribution deviation of TCP packets, the pair entropy of TCP packets, and the mutual information distance of TCP packets. The request average length is the average length of all resource request messages sent by the client in one protocol session; the request length variance is the variance of the lengths of those request messages; the response average length is the average length of all response messages sent by the server in one protocol session; the response length variance is the variance of the lengths of those response messages. Within a single TCP connection the protocol interaction of HTTP and HTTPS is linear, that is, the client sends the next request message only after the server has returned the complete response to the previous request, so these four statistics are easily obtained from the data exchange of the TCP connection. The distribution deviation of TCP packets is computed as the Kullback-Leibler divergence ($D_{KL}$), as shown in formula (7), where $P(i)$ and $Q(i)$ are the probabilities that TCP packets of class $i$ occur in the suspicious session and in the collected normal sessions respectively. A packet pair is an ordered pair formed by two TCP packets whose distance in the protocol session is less than or equal to 2. For example, the TCP packet sequence <1,2,5,4> yields 5 packet pairs: <1,2>, <1,5>, <2,5>, <2,4> and <5,4>. Because there are $2 \times BL \times BT$ classes of TCP packets, there are $(2 \times BL \times BT)^2$ kinds of TCP packet pairs in total. The pair entropy of TCP packets ($E_{RPP}$) is computed by formula (8), where $F(i)$ is the probability that the $i$-th kind of TCP packet pair occurs. The mutual information of TCP packets is the pointwise mutual information ($RMI_{\langle x,y \rangle}$) of the two packets $x$ and $y$ in a kind of packet pair, computed by formula (9), where $p(x,y)$ is the probability that the packet pair $\langle x,y \rangle$ occurs in the suspicious session, and $p(x)$ and $p(y)$ are the probabilities that the packet pairs $\langle x,? \rangle$ and $\langle ?,y \rangle$ occur in the suspicious session, with "?" denoting a TCP packet of any class. The mutual information distance of TCP packets ($D_{RMI}$) is the sum of the absolute differences between the mutual information values of the 25 packet pairs with the largest mutual information in the suspicious protocol session and the mutual information values of the same 25 pairs in normal protocol sessions, computed by formula (10), where $RMI_{\langle i,j \rangle}$ and $RMI'_{\langle i,j \rangle}$ are the mutual information of the packet pair $\langle i,j \rangle$ in the suspicious session and in the collected normal sessions respectively. If the 25 selected packet pairs include $\langle i,j \rangle$, then $\alpha(i,j) = 1$; otherwise $\alpha(i,j) = 0$.
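The mutual information distance can be sketched as below. The sketch works on sequences of packet class ids and uses the distance-2 pair definition from the text. One point is an assumption of this sketch rather than something the patent specifies: a pair selected from the suspicious session that never occurs in the normal sessions has no defined mutual information there, and it is given the placeholder value `missing` (0.0 by default).

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def packet_pairs(seq: Sequence[Hashable], max_distance: int = 2) -> List[Tuple]:
    """Ordered pairs of packets that are at most max_distance positions apart."""
    return [(seq[i], seq[i + d])
            for i in range(len(seq))
            for d in range(1, max_distance + 1)
            if i + d < len(seq)]


def rmi_table(seq: Sequence[Hashable]) -> Dict[Tuple, float]:
    """RMI<x,y> = log2(p(x,y) / (p(x) p(y))) for every pair kind observed in seq."""
    pairs = packet_pairs(seq)
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(x for x, _ in pairs)    # occurrences of pairs <x, ?>
    second = Counter(y for _, y in pairs)   # occurrences of pairs <?, y>
    return {(x, y): math.log2((c / n) / ((first[x] / n) * (second[y] / n)))
            for (x, y), c in joint.items()}


def rmi_distance(suspect: Sequence[Hashable], normal: Sequence[Hashable],
                 top_k: int = 25, missing: float = 0.0) -> float:
    """Sum of |RMI - RMI'| over the top_k pairs with the largest RMI in suspect."""
    suspect_rmi = rmi_table(suspect)
    normal_rmi = rmi_table(normal)
    selected = sorted(suspect_rmi, key=suspect_rmi.get, reverse=True)[:top_k]
    return sum(abs(suspect_rmi[p] - normal_rmi.get(p, missing)) for p in selected)


zero = rmi_distance([1, 2, 1, 2], [1, 2, 1, 2])     # same behavior: distance 0
apart = rmi_distance([1, 2, 1, 2], [1, 1, 2, 2])    # different ordering: distance > 0
```

Selecting the 25 pairs here plays the role of $\alpha(i,j)$ in formula (10).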
The SVM classifier can achieve good results with a small amount of training data. Training the SVM classifier requires the feature vectors of a small number of normal HTTP and HTTPS protocol sessions and the feature vectors of HTTP and HTTPS tunnel sessions. Normal HTTP and HTTPS protocol sessions can be obtained from the normal communication traffic collected in the intranet, and HTTP and HTTPS tunnel sessions can be generated with the two open-source tools HTTPTunnel and Barracuda HTTPS Tunnel. For HTTPTunnel, see http://sourceforge.net/projects/http-tunnel/files/http-tunnel/H TTPTunnel v1.2.1; for Barracuda HTTPS Tunnel, see http://barracudadrive.com/HttpsTunnel.lsp. After training on the two classes of feature vectors, the SVM classifier can judge from the feature vector of a suspicious session whether it is tunnel communication. As the SVM classifier, one can use the SVM module that ships with MATLAB or the widely used LIBSVM classifier. For LIBSVM, see the documentation at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
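If LIBSVM is used, the feature vectors have to be written in its sparse text format, one sample per line as `<label> <index>:<value> ...` with indices starting at 1. The helper below is a small sketch of that conversion; the function name, the choice of +1 for tunnel sessions and -1 for normal sessions, and the sample feature values are assumptions for illustration.

```python
from typing import List, Sequence


def to_libsvm_line(label: int, features: Sequence[float]) -> str:
    """Format one 7-dimensional feature vector as a LIBSVM training line."""
    parts = [f"{index}:{value:.6g}" for index, value in enumerate(features, start=1)]
    return f"{label:+d} " + " ".join(parts)


def to_libsvm_file(samples: List[tuple]) -> str:
    """Join (label, features) samples into the text of a training file."""
    return "\n".join(to_libsvm_line(label, feats) for label, feats in samples) + "\n"


# <req_avg, req_var, res_avg, res_var, D_KL, E_RPP, D_RMI>, made-up values
tunnel = (+1, [400, 10000, 1000, 40000, 0.12, 2.32, 1.5])
normal = (-1, [350, 2500, 1300, 9000, 0.01, 3.1, 0.2])
line = to_libsvm_line(*tunnel)
text = to_libsvm_file([tunnel, normal])
```

Because the seven features have very different magnitudes, scaling them to a common range before training is usually advisable.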
Step 4: Capture the suspicious protocol sessions to be detected in the network, and use the interaction information of the TCP connections in each session and formulas (7), (8), (9) and (10) to extract the 7-dimensional feature vector $\langle req_{avg}, req_{var}, res_{avg}, res_{var}, D_{KL}, E_{RPP}, D_{RMI} \rangle$ of the session under test. Classify the feature vector with the trained SVM classifier to determine whether the session is tunnel communication.
By analyzing TCP packets, the WEB tunnel detection technique extracts the protocol behavior features of HTTP and HTTPS and thereby distinguishes tunnel communication from normal communication efficiently. Because the object of detection is not the specific content or format of protocol messages, the method differs from the widely used deep packet inspection techniques: the tunnel detection method designed in the present invention applies to both plaintext tunnels and ciphertext tunnels. Specifically, the WEB tunnel detection technique is divided into two stages. The first stage is data preparation, comprising steps 1, 2 and 3; the second stage is detection, i.e. step 4.
The working process of this detection technique is explained below with a specific embodiment. In this example, 14462 HTTP sessions and 9154 HTTPS sessions were collected from one month of network traffic of nearly 300 hosts in a local area network. In addition, HTTPTunnel and Barracuda HTTPS Tunnel were used to generate 200 FTP sessions, 200 SMTP sessions and 200 POP3 sessions over HTTP tunnels and over HTTPS tunnels, respectively. Of these, 14062 HTTP sessions and 8754 HTTPS sessions are used to compute the statistical features of normal HTTP and HTTPS communication; 300 HTTP sessions, 300 HTTPS sessions, 100 FTP sessions, 100 SMTP sessions and 100 POP3 sessions are used to train the SVM classifier; and 100 HTTP sessions, 100 HTTPS sessions, 100 FTP sessions, 100 SMTP sessions and 100 POP3 sessions are treated as suspicious traffic for evaluating the detection performance.
First stage: first, formulas (1), (2), (3), (4), (5) and (6) are used to extract the statistical features of the 14062 collected HTTP sessions and 8754 collected HTTPS sessions. Next, the interaction information of the TCP connections and formulas (7), (8), (9) and (10) are used to extract the feature vectors of 300 HTTP sessions, 300 HTTPS sessions, 100 FTP sessions, 100 SMTP sessions and 100 POP3 sessions. Finally, the SVM classifier is trained with the resulting feature vectors.
Second stage: the interaction information of the TCP connections and formulas (7), (8), (9) and (10) are used to extract the feature vectors of 100 HTTP sessions, 100 HTTPS sessions, 100 FTP sessions, 100 SMTP sessions and 100 POP3 sessions. The feature vectors are fed to the SVM classifier, which judges whether each suspicious session is tunnel communication. The detection results are compared with the ground truth to obtain the detection accuracy.
Parameter settings:
In the specific implementation, the egress bandwidth of the local area network is 50 Mbps, BL is set to 20 and BT is set to 15. In actual computation, if these two parameters are set too large, the similarity between TCP packets cannot be recognized; if they are set too small, the differences between TCP packets cannot be reflected. Therefore, at an egress bandwidth of 50 Mbps and with network jitter taken into account, setting the two parameters to 20 and 15 respectively is appropriate.
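As a quick check of what these settings imply, the arithmetic below computes how many packet classes and packet-pair classes result from BL = 20 and BT = 15, using the counts 2 × BL × BT and (2 × BL × BT)² given in the claims. The variable names are illustrative only.

```python
BL, BT = 20, 15                    # number of length classes and interval classes

# Two directions x BL length classes x BT interval classes.
packet_classes = 2 * BL * BT

# Every ordered pair of packet classes is a distinct pair class.
pair_classes = packet_classes ** 2

print(packet_classes, pair_classes)
```

With several hundred packet classes, a session of a few hundred packets can populate only a small fraction of the pair classes, which is one way to see why larger BL and BT make similar packets harder to recognize as similar.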
Detection results:
To show that the detection method designed in the present invention is more effective, the collected data were also tested with the communication fingerprint identification method. The detection results for HTTP tunnels and HTTPS tunnels are shown in Table 1 and Table 2, respectively. As can be seen, when the traffic of a session reaches 500 packets or more, the detection accuracy of the WEB tunnel detection technique based on protocol behavior analysis reaches 82.5% for HTTP tunnels and 91.8% for HTTPS tunnels. Under the same data conditions, the detection accuracy of the present invention is also far higher than that of the communication fingerprint identification method. The present invention therefore performs well and has good application prospects.
Table 1: Detection results for HTTP tunnels
Figure BDA0000454817480000121
Figure BDA0000454817480000131
Table 2: Detection results for HTTPS tunnels
Figure BDA0000454817480000132
The above examples only illustrate the technical concept and features of the present invention. Their purpose is to enable those skilled in the art to understand the invention and implement it accordingly; they do not limit its scope of protection. All equivalent transformations or modifications made according to the spirit and essence of the present invention shall fall within its scope of protection.

Claims (6)

1. A WEB tunnel detection method based on protocol behavior analysis, characterized in that the method comprises the following steps:
(1) collecting normal HTTP and HTTPS communication traffic from the intranet of a local area network, performing protocol session partitioning and TCP packet classification on the traffic, and obtaining the statistical features of normal HTTP and HTTPS communication;
(2) generating tunnel sessions with HTTP and HTTPS tunnel communication tools and using them, together with the collected normal sessions, as training session samples; using the interaction information of the TCP connections in the session samples and the obtained statistical features of normal communication to extract, for each session, the request average length, request length variance, response average length, response length variance, TCP packet distribution deviation, TCP packet pairing entropy and TCP packet mutual information distance; and training a two-class SVM classifier with these features;
(3) capturing suspicious HTTP and HTTPS communication traffic with a WEB tunnel detection device and partitioning the traffic into protocol sessions; using the interaction information of the TCP connections in each suspicious protocol session to be detected and the obtained statistical features of normal communication to extract, for each session under test, the request average length, request length variance, response average length, response length variance, TCP packet distribution deviation, TCP packet pairing entropy and TCP packet mutual information distance; and classifying these features with the trained SVM classifier to determine whether the session is tunnel communication;
wherein, in the HTTP or HTTPS communication between a pair of hosts, the client resource requests and their corresponding server responses separated by a time interval of less than or equal to 10 seconds are recorded as one protocol session.
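The session rule at the end of claim 1 can be sketched as below. This is an illustrative reading, not the patent's code: it assumes the 10-second limit applies to the gap between consecutive packets exchanged by the same host pair, and that each packet is given as a `(timestamp_seconds, client, server)` tuple. The name `split_sessions` is hypothetical.

```python
from collections import defaultdict

def split_sessions(packets, max_gap=10.0):
    """Group packets into protocol sessions per (client, server) pair.

    A new session starts whenever the gap to the previous packet of the
    same host pair exceeds `max_gap` seconds.
    """
    by_pair = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p[0]):
        by_pair[(pkt[1], pkt[2])].append(pkt)

    sessions = []
    for pkts in by_pair.values():
        current = [pkts[0]]
        for prev, pkt in zip(pkts, pkts[1:]):
            if pkt[0] - prev[0] > max_gap:
                # Gap too long: close the current session, start a new one.
                sessions.append(current)
                current = []
            current.append(pkt)
        sessions.append(current)
    return sessions
```

A gap of exactly 10 seconds keeps the packets in the same session, matching the "less than or equal to" wording.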
2. The WEB tunnel detection method according to claim 1, characterized in that, in step (1) of the method, each TCP packet in the collected communication traffic is defined as a triple of length, interval and direction; wherein the length is the actual payload length of the TCP packet, the interval is the time interval between this packet and the previous packet in the same session, and the direction indicates whether the packet is sent from the client to the server or from the server to the client; and packet classification is carried out according to the following steps:
1) if the actual payload length of the TCP packet is 0, the packet is judged to be a TCP control message unrelated to the application-layer protocol, and is discarded;
2) if the actual payload length of the TCP packet is not 0, the packet is classified according to its length, interval and direction.
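The triple and the control-packet filter can be sketched as follows. The interval unit is assumed to be milliseconds (the interval range 0 to 10000 in claim 3 suggests this but does not state it), and the interval is assumed to be measured against the previous captured packet even when that packet is a control packet. `Packet` and `to_triples` are hypothetical names.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    length: int       # TCP payload length in bytes
    interval: float   # time since the previous packet in the session (assumed ms)
    direction: int    # 0 = client to server, 1 = server to client

def to_triples(raw):
    """Convert one session's raw packets into (length, interval, direction) triples.

    `raw` is a time-ordered list of (timestamp_ms, payload_length, direction).
    Packets with an empty payload are TCP control messages and are dropped.
    """
    triples = []
    prev_ts = None
    for ts, length, direction in raw:
        interval = 0.0 if prev_ts is None else ts - prev_ts
        prev_ts = ts
        if length == 0:
            continue  # pure ACK / SYN / FIN: unrelated to the application layer
        triples.append(Packet(length, interval, direction))
    return triples
```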
3. The WEB tunnel detection method according to claim 2, characterized in that step 2) of the method classifies the length and interval of packets by equally dividing the cumulative probability of the probability density obtained by kernel density estimation, specifically according to the following steps:
A) for each length value or interval value of a TCP packet that appears in the normal HTTP and HTTPS communication traffic, assume that the lengths and intervals of other TCP packets follow a normal distribution with this value as its mean; accumulating with weights under this rule yields the probability density functions of length and interval, namely:
$$\hat{f}_l = \frac{1}{\sqrt{2\pi}\,\hat{h}_l \sum_{j=1}^{1460} CL_j} \sum_{i=1}^{1460} CL_i \times e^{-\frac{(l-i)^2}{2\hat{h}_l^2}};$$
$$\hat{f}_t = \frac{1}{\sqrt{2\pi}\,\hat{h}_t \sum_{j=0}^{10000} CT_j} \sum_{i=0}^{10000} CT_i \times e^{-\frac{(t-i)^2}{2\hat{h}_t^2}};$$
where $CL_i$ is the number of times length i occurs in the collected HTTP and HTTPS protocol sessions, 1 ≤ i ≤ 1460; $CT_j$ is the number of times interval j occurs in the collected HTTP and HTTPS protocol sessions, 0 ≤ j ≤ 10000; $\hat{h}_l$ and $\hat{h}_t$ are the smoothing factors, given by
Figure FDA0000454817470000025
where σ is the standard deviation of all length values or interval values, and n is the sample size of the length values or interval values;
B) from the probability density functions obtained by kernel density estimation, obtain the cumulative probability CP of the length and of the interval over their respective value ranges:
$$CP_l = \int_{1}^{1460} \hat{f}_l \, dl;$$
$$CP_t = \int_{0}^{10000} \hat{f}_t \, dt;$$
Suppose the value range of the length or the interval is divided into b classes. The range represented by each class, $[L_i\,(T_i),\ L_{i+1}\,(T_{i+1})]$, is obtained from
Figure FDA0000454817470000028
for 0 ≤ i ≤ b−1, where $L_0 = 1$, $L_b = 1460$, $T_0 = 0$ and $T_b = 10000$. The lengths and intervals are divided into BL and BT classes respectively, so that all TCP packets are divided into 2 × BL × BT classes.
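Steps A) and B) can be sketched in code as follows. Two points are assumptions rather than facts from the text: the smoothing factor and the class-boundary formula appear only as images in the source, so the bandwidth `h` is passed in as a plain parameter and the boundaries are chosen so that each class holds roughly 1/b of the cumulative probability. The integral is approximated by a sum over integer grid points. The function names are hypothetical.

```python
import math

def kde_density(counts, lo, hi, h):
    """Gaussian kernel density evaluated on the integer grid lo..hi.

    `counts[v]` is how often value v was observed; `h` is the bandwidth.
    """
    total = sum(counts.values())
    norm = 1.0 / (math.sqrt(2 * math.pi) * h * total)
    return [
        norm * sum(c * math.exp(-((x - v) ** 2) / (2 * h * h))
                   for v, c in counts.items())
        for x in range(lo, hi + 1)
    ]

def equal_probability_edges(density, lo, b):
    """Return b + 1 class boundaries so that each of the b classes holds
    about 1/b of the cumulative probability (assumed reading of the
    boundary formula)."""
    total = sum(density)          # discrete stand-in for CP
    edges = [lo]
    running = 0.0
    for offset, d in enumerate(density):
        running += d
        # Close a class each time the running mass crosses the next k/b mark.
        while len(edges) < b and running >= total * len(edges) / b:
            edges.append(lo + offset)
    edges.append(lo + len(density) - 1)
    return edges
```

For lengths, `lo` and `hi` would be 1 and 1460 with b = BL; for intervals, 0 and 10000 with b = BT. The double loop in `kde_density` is quadratic in the grid size, so the interval range would be slow in pure Python without vectorization.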
4. The WEB tunnel detection method according to claim 1, characterized in that the features extracted in step (2) or step (3) of the method form a 7-dimensional feature vector $\langle req_{avg}, req_{var}, res_{avg}, res_{var}, D_{KL}, E_{RPP}, D_{RMI} \rangle$, where $req_{avg}$ is the request average length, i.e. the average length of all resource request messages sent by the client in one protocol session; $req_{var}$ is the request length variance, i.e. the variance of the lengths of all resource request messages sent by the client in one protocol session; $res_{avg}$ is the response average length, i.e. the average length of all response messages sent by the server in one protocol session; $res_{var}$ is the response length variance, i.e. the variance of the lengths of all response messages sent by the server in one protocol session;
where $D_{KL}$ is the distribution deviation of TCP packets,
Figure FDA0000454817470000031
P(i) and Q(i) are the probabilities that the i-th class of TCP packet occurs in the suspicious session and in the collected normal sessions, respectively, and 2 × BL × BT is the total number of TCP packet classes;
where $E_{RPP}$ is the pairing entropy of TCP packets, $E_{RPP} = -\sum_{i=1}^{(2 \times BL \times BT)^2} F(i) \log_2 F(i)$, $(2 \times BL \times BT)^2$ is the total number of TCP packet pair classes, and F(i) is the probability that the i-th class of TCP packet pair occurs;
where $D_{RMI}$ is the mutual information distance of TCP packets, $D_{RMI} = \sum_{i=1}^{2 \times BL \times BT} \sum_{j=1}^{2 \times BL \times BT} \left| RMI_{\langle i, j \rangle} - RMI'_{\langle i, j \rangle} \right| \times \alpha(i, j)$, with $RMI_{\langle x, y \rangle} = \log_2 \frac{p(x, y)}{p(x)\,p(y)}$; p(x, y) is the probability that the packet pair $\langle x, y \rangle$ occurs in the suspicious session, and p(x) and p(y) are the probabilities that the packet pairs $\langle x, ? \rangle$ and $\langle ?, y \rangle$ occur in the suspicious session, where "?" stands for a TCP packet of any type; $RMI_{\langle i, j \rangle}$ and $RMI'_{\langle i, j \rangle}$ denote the mutual information of the packet pair $\langle i, j \rangle$ in the suspicious session and in the collected normal sessions, respectively; only some representative packet pairs are chosen for the mutual information calculation, and if the selected packet pairs include $\langle i, j \rangle$, then α(i, j) = 1; otherwise α(i, j) = 0.
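The distribution deviation and the pairing entropy from claim 4 can be sketched as below. The $D_{KL}$ formula is only available as an image in the source, so the code assumes the standard Kullback-Leibler divergence of P from Q with base-2 logarithms, and floors Q(i) at a small `eps` so that classes unseen in normal traffic do not cause a division by zero. Packet pairs are assumed to be consecutive packets. All function names are hypothetical.

```python
import math
from collections import Counter

def distribution(items):
    """Empirical probability of each distinct item."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """Assumed form of D_KL: sum over i of P(i) * log2(P(i) / Q(i)).

    `p` is the class distribution of the suspicious session, `q` that of
    the collected normal sessions. Q(i) is floored at `eps` (assumption).
    """
    return sum(
        pi * math.log2(pi / max(q.get(i, 0.0), eps))
        for i, pi in p.items() if pi > 0
    )

def pairing_entropy(classes):
    """E_RPP: Shannon entropy (base 2) of the packet-pair distribution."""
    f = distribution(zip(classes, classes[1:]))
    return -sum(v * math.log2(v) for v in f.values())
```

Pair classes that never occur contribute nothing to the entropy sum, so iterating only over observed pairs gives the same value as summing over all $(2 \times BL \times BT)^2$ classes.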
5. The WEB tunnel detection method according to claim 1, characterized in that, in step (3) of the method, the WEB tunnel detection device is placed inside the local area network and connected to the firewall or application-layer gateway; on the intranet gateway interface of the local area network, the WEB tunnel detection device obtains the HTTP and HTTPS communication traffic through a network mirroring operation.
6. The WEB tunnel detection method according to claim 1, characterized in that, in step (2) of the method, the HTTP and HTTPS tunnel sessions are generated by the HTTPTunnel and Barracuda HTTPS Tunnel software tools, and the normal HTTP and HTTPS sessions are extracted from the collected normal communication traffic of the local area network intranet.
CN201410008778.7A 2014-01-09 2014-01-09 WEB tunnel detection method based on protocol behavior analysis Pending CN103840983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410008778.7A CN103840983A (en) 2014-01-09 2014-01-09 WEB tunnel detection method based on protocol behavior analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410008778.7A CN103840983A (en) 2014-01-09 2014-01-09 WEB tunnel detection method based on protocol behavior analysis

Publications (1)

Publication Number Publication Date
CN103840983A true CN103840983A (en) 2014-06-04

Family

ID=50804145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410008778.7A Pending CN103840983A (en) 2014-01-09 2014-01-09 WEB tunnel detection method based on protocol behavior analysis

Country Status (1)

Country Link
CN (1) CN103840983A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140604

RJ01 Rejection of invention patent application after publication