CN106953854A

CN106953854A - Method for building a darknet traffic identification model based on SVM machine learning

Info

Publication number: CN106953854A
Application number: CN201710156258.4A
Authority: CN
Inventors: 苏宏; 陈周国; 丁建伟; 赵越; 郭宇斌
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2016-12-15
Filing date: 2017-03-16
Publication date: 2017-07-14
Anticipated expiration: 2037-03-16
Also published as: CN106953854B

Abstract

The invention discloses a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps: building a traffic detection model based on SVM machine learning; performing machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic; and substituting the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model. Compared with the prior art, the positive effects of the invention are as follows: the method yields a mathematical model that describes anonymous-network traffic identification very accurately; when applied to anonymous-network traffic detection, it gives high detection accuracy with simple and efficient computation; and because the method is based on machine learning, after an anonymous network is upgraded it suffices to relearn on the upgraded network in order to detect the new anonymous-network traffic.

Description

A method for building a darknet traffic identification model based on SVM machine learning

Technical field

The present invention relates to a method for building a darknet traffic identification model based on SVM machine learning.

Background art

The analysis and control of anonymous-network (darknet) traffic, and traffic detection in particular, are still at the exploratory research stage. At present no single method can effectively detect all anonymous-network traffic; some methods are effective only for a particular anonymous network, or even only for a particular version of it. Detecting anonymous-network traffic is therefore a perennial research topic that requires continuous follow-up work to cope with the ongoing upgrades of anonymous networks. The key to improving the accuracy of anonymous-network traffic detection lies in how accurately the traffic identification model is built. The present method uses machine learning to build a mathematical model of anonymous-network traffic identification that is as accurate as possible, with the aim of minimising the impact that upgrades of anonymous networks have on detection, so that anonymous-network traffic can be detected accurately.

Summary of the invention

To overcome the above shortcomings of the prior art, the invention provides a method for building a darknet traffic identification model based on SVM machine learning, with the aim of building a dynamically changing and accurate mathematical model for identifying anonymous-network traffic.

The technical solution adopted by the invention to solve this technical problem is a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps:

Step 1: build a traffic detection model based on SVM machine learning;

Step 2: perform machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic;

Step 3: substitute the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model.

Compared with the prior art, the positive effects of the invention are as follows:

The method yields a mathematical model that describes anonymous-network traffic identification very accurately. When applied to anonymous-network traffic detection, it gives high detection accuracy with simple and efficient computation. Because the method is based on machine learning, after an anonymous network is upgraded it suffices to relearn on the upgraded network in order to detect the new anonymous-network traffic.

Brief description of the drawings

Embodiments of the invention are described by way of example with reference to the accompanying drawing, in which:

Fig. 1 is a schematic diagram of the SVM-based traffic detection model.

Detailed description of the embodiments

A method for building a darknet traffic identification model based on SVM machine learning comprises the following steps:

Step 1: Building the model

Anonymous-network traffic detection is carried out on the basis of a mathematical model. However, most current detection models are effective only for a particular anonymous network, or even only for a particular version of it. To solve this problem, cope with the continual upgrades of anonymous networks, and improve the accuracy of anonymous-network traffic detection, a new anonymous-network traffic detection model needs to be built.

In this method, the detection model is a traffic detection model based on SVM machine learning, as shown in Fig. 1. In the figure, x is the input feature vector and d is the number of features; x_n is a collected sample, which is a d-dimensional vector; and y_n is the desired output, taking the value 1 or -1 according to whether or not the sample is the corresponding anonymous traffic. The model can be expressed equivalently as:

y = k·x + b

where k and b are the parameters of the anonymous-network traffic identification model: k is a d-dimensional weight vector and b is the bias. In the machine learning stage, the values of k and b are computed from a large number of inputs x and y. Once the anonymous-network traffic identification model has been built, it can be used to test traffic: when y > 0, the traffic under test is judged to be the corresponding anonymous traffic; when y < 0, it is judged not to be anonymous traffic.
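The decision rule above can be illustrated with a minimal Python sketch. The function names, the parameter values and the sample feature vector are hypothetical and chosen only for illustration; the text does not say how y = 0 is handled, so the sketch assumes it is treated as non-anonymous.

```python
from typing import Sequence


def decision_value(k: Sequence[float], x: Sequence[float], b: float) -> float:
    """Return y = k·x + b for one d-dimensional feature vector x."""
    if len(k) != len(x):
        raise ValueError("k and x must have the same dimension d")
    return sum(ki * xi for ki, xi in zip(k, x)) + b


def is_anonymous(k: Sequence[float], x: Sequence[float], b: float) -> bool:
    """y > 0 is read as anonymous traffic, y < 0 as non-anonymous.

    Assumption: y == 0 is not covered by the text and is treated
    here as non-anonymous.
    """
    return decision_value(k, x, b) > 0


# Hypothetical parameters and feature vector (d = 4), for illustration only.
k = [0.5, 1.0, 0.25, 0.75]
b = -2.0
x = [2.0, 1.0, 4.0, 0.0]

print(decision_value(k, x, b))  # 1.0
print(is_anonymous(k, x, b))    # True
```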

Step 2: Determining the parameters

After the traffic detection model has been selected, machine learning must be performed on the parameters of the model to determine their values. Throughout the learning process, four features are learned separately for the pure anonymous-network traffic of the target anonymous network and for pure non-anonymous network traffic (background traffic). All collected traffic is regrouped into Hostprofiles, one pcap file per host, and the parameters of the mathematical model for anonymous-network traffic identification are learned from the following four feature values: the UDP connection count, the circumvention weight, the UDP flow information entropy, and the frequency of Ping-pong-like similar messages in the traffic. Their definitions and computation methods are as follows:

(1) UDP connection count: the number of distinct UDP connections per unit time in each pcap file.

Count the total number K of distinct IP addresses in each Hostprofile (pcap) file, then divide K by the Hostprofile duration T to obtain this feature value.

(2) Circumvention weight: the number of resolutions of sensitive domain names (such as Amazon servers and Dynamic Networks) multiplied by their weights.

A list of sensitive DNS queries is maintained, with different weights assigned to different domain names. If a Hostprofile contains a query for a sensitive domain name, the corresponding weight is added to the circumvention weight.

(3) UDP flow information entropy: the average information entropy of the UDP flows in each Hostprofile.

The information entropy of each UDP flow in the Hostprofile is computed and summed, and the sum is divided by the total number of UDP flows. The information entropy is defined as: [formula missing from the source text]

(4) Similar-message frequency: the number of occurrences of Ping-pong-like similar messages.

The number of similar consecutive packets in the Hostprofile is counted; the count is incremented by 1 each time consecutive packets are similar.
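The four feature computations can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: packets are assumed to be already parsed from the pcap file into a hypothetical `UdpPacket` record; the distinct addresses counted for K are assumed to be remote peer addresses; the entropy is assumed to be the Shannon entropy of the payload byte distribution, since the formula is not reproduced in the text; and two consecutive packets are assumed to be "similar" when their payload lengths are equal. The domain name and weight in the example are invented.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UdpPacket:              # hypothetical record for one parsed UDP packet
    timestamp: float          # capture time in seconds
    remote_ip: str            # peer address as seen from the profiled host
    flow_id: str              # identifier of the UDP flow the packet belongs to
    payload: bytes


def udp_connection_rate(packets: List[UdpPacket]) -> float:
    """Feature 1: K distinct IP addresses divided by the Hostprofile duration T."""
    if not packets:
        return 0.0
    k = len({p.remote_ip for p in packets})
    times = [p.timestamp for p in packets]
    t = max(times) - min(times)
    return k / t if t > 0 else 0.0


def circumvention_weight(dns_queries: List[str], weights: Dict[str, float]) -> float:
    """Feature 2: every query for a listed sensitive domain adds that domain's weight."""
    return sum(weights.get(name, 0.0) for name in dns_queries)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (assumed definition, see lead-in)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


def mean_udp_flow_entropy(packets: List[UdpPacket]) -> float:
    """Feature 3: sum of per-flow entropies divided by the number of UDP flows."""
    flows: Dict[str, bytes] = defaultdict(bytes)
    for p in packets:
        flows[p.flow_id] += p.payload
    if not flows:
        return 0.0
    return sum(shannon_entropy(d) for d in flows.values()) / len(flows)


def similar_message_count(packets: List[UdpPacket]) -> int:
    """Feature 4: count consecutive packet pairs judged similar.

    Assumption: 'similar' means equal payload length.
    """
    return sum(1 for a, b in zip(packets, packets[1:])
               if len(a.payload) == len(b.payload))


def feature_vector(packets: List[UdpPacket], dns_queries: List[str],
                   weights: Dict[str, float]) -> List[float]:
    """Assemble the 4-dimensional feature vector x for one Hostprofile."""
    return [udp_connection_rate(packets),
            circumvention_weight(dns_queries, weights),
            mean_udp_flow_entropy(packets),
            float(similar_message_count(packets))]


# Invented example data for one Hostprofile.
packets = [
    UdpPacket(0.0, "10.0.0.1", "f1", b"\x00\x01"),
    UdpPacket(1.0, "10.0.0.2", "f2", b"\x00\x00"),
    UdpPacket(2.0, "10.0.0.1", "f1", b"\x02\x03"),
]
queries = ["sensitive.example", "ordinary.example", "sensitive.example"]
weights = {"sensitive.example": 2.0}

print(feature_vector(packets, queries, weights))  # [1.0, 4.0, 1.0, 2.0]
```

In this example K = 2 addresses over T = 2 s gives 1.0; two sensitive queries of weight 2.0 give 4.0; the two flows have entropies 2.0 and 0.0 bits, averaging 1.0; and both consecutive packet pairs have equal payload lengths, giving 2.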

Once machine learning is complete, the four feature values repeatedly learned from the pure anonymous traffic and the pure non-anonymous traffic are substituted into the anonymous-network traffic identification model for computation, which finally yields the parameters k and b of the model. This completes the building of the model.
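The text does not state which solver is used to obtain k and b. As one possible illustration, the sketch below fits a linear SVM by full-batch subgradient descent on the L2-regularised hinge loss. The solver choice, the hyperparameters `lam`, `lr` and `epochs`, and the toy training data are all assumptions; the feature vectors are invented and assumed to be already centred around zero.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     lam: float = 0.01,
                     lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Fit k and b of y = k·x + b by minimising
    (lam / 2) * ||k||^2 + mean(max(0, 1 - y_n * (k·x_n + b)))
    with full-batch subgradient descent.
    """
    n = len(samples)
    d = len(samples[0])
    k = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_k = [lam * kj for kj in k]      # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(kj * xj for kj, xj in zip(k, x)) + b)
            if margin < 1.0:                 # sample violates the margin
                for j in range(d):
                    grad_k[j] -= y * x[j] / n
                grad_b -= y / n
        k = [kj - lr * gj for kj, gj in zip(k, grad_k)]
        b -= lr * grad_b
    return k, b


# Invented, centred 4-feature samples: +1 = anonymous, -1 = background.
X = [[1.0, 2.0, 1.0, 2.0], [2.0, 1.0, 2.0, 1.0],
     [-1.0, -2.0, -1.0, -2.0], [-2.0, -1.0, -2.0, -1.0]]
Y = [1, 1, -1, -1]

k, b = train_linear_svm(X, Y)
predictions = [1 if sum(kj * xj for kj, xj in zip(k, x)) + b > 0 else -1
               for x in X]
print(predictions)  # [1, 1, -1, -1]
```

Real feature values such as connection counts and entropies have different scales, so standardising them before training would normally be advisable; that step is omitted here for brevity.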

Step 3: Verifying the model

A Freegate anonymous network is set up, and sufficient Freegate anonymous traffic and non-Freegate background traffic are captured separately in that environment. For a given host, the four features of each kind of traffic are computed: the UDP connection count, the circumvention weight, the UDP flow information entropy, and the frequency of Ping-pong-like similar messages in the traffic. These are then substituted into the mathematical model for traffic detection to compute the parameters k and b, which completes the traffic detection model for that anonymous-network environment.

Using the traffic detection model so built, anonymous-network traffic can be detected in real time in the Freegate anonymous-network environment. In the machine learning process, the longer the learning time and the more traffic data collected, the more accurate the resulting traffic detection model and the subsequent traffic detection.
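One simple way to check a built model is to measure its detection accuracy on labelled traffic that was not used for learning. The text does not describe such a procedure, so the following Python sketch is only an assumed illustration; the function name, the parameters k and b, and the sample data are hypothetical.

```python
from typing import Sequence


def detection_accuracy(k: Sequence[float], b: float,
                       samples: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label.

    A sample is predicted anonymous (+1) when k·x + b > 0, else -1.
    """
    correct = 0
    for x, y in zip(samples, labels):
        score = sum(kj * xj for kj, xj in zip(k, x)) + b
        predicted = 1 if score > 0 else -1
        if predicted == y:
            correct += 1
    return correct / len(samples)


# Hypothetical model parameters and held-out labelled feature vectors.
k = [1.0, 1.0, 1.0, 1.0]
b = -4.0
held_out = [[2.0, 2.0, 2.0, 2.0],   # score  4.0 -> +1
            [0.0, 1.0, 0.0, 1.0],   # score -2.0 -> -1
            [1.0, 1.0, 1.0, 0.0],   # score -1.0 -> -1 (true label is +1)
            [0.0, 0.0, 0.0, 0.0]]   # score -4.0 -> -1
truth = [1, -1, 1, -1]

print(detection_accuracy(k, b, held_out, truth))  # 0.75
```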

Claims

1. A method for building a darknet traffic identification model based on SVM machine learning, characterised by comprising the following steps:

Step 1: building a traffic detection model based on SVM machine learning;

Step 2: performing machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic;

Step 3: substituting the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model.

2. The method for building a darknet traffic identification model based on SVM machine learning according to claim 1, characterised in that the equivalent mathematical expression of the traffic detection model is y = k·x + b, where k and b are the parameters of the traffic detection model, k is the weight vector, and b is the bias.

3. The method for building a darknet traffic identification model based on SVM machine learning according to claim 2, characterised in that the four feature values of the pure anonymous traffic and the pure non-anonymous traffic in Step 2 are the UDP connection count, the circumvention weight, the UDP flow information entropy, and the similar-message frequency.

4. The method for building a darknet traffic identification model based on SVM machine learning according to claim 3, characterised in that the four feature values are computed as follows:

UDP connection count: obtained by dividing the total number of distinct IP addresses in each Hostprofile file by the Hostprofile duration;

Circumvention weight: obtained by multiplying the number of resolutions of a sensitive domain name by the weight assigned to that domain name;

UDP flow information entropy: obtained by computing the information entropy of each UDP flow in the Hostprofile, summing the results, and dividing the sum by the total number of UDP flows;

Similar-message frequency: the count of similar consecutive packets in the Hostprofile.

5. The method for building a darknet traffic identification model based on SVM machine learning according to claim 4, characterised in that, when the traffic detection model is used to test traffic, the traffic under test is judged to be the corresponding anonymous traffic if y > 0 and is judged not to be anonymous traffic if y < 0.