CN106953854B - Method for building a darknet traffic identification model based on SVM machine learning - Google Patents

Method for building a darknet traffic identification model based on SVM machine learning

Info

Publication number
CN106953854B
CN106953854B CN201710156258.4A CN201710156258A CN106953854B CN 106953854 B CN106953854 B CN 106953854B CN 201710156258 A CN201710156258 A CN 201710156258A CN 106953854 B CN106953854 B CN 106953854B
Authority
CN
China
Prior art keywords
flow
machine learning
detection model
anonymous
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710156258.4A
Other languages
Chinese (zh)
Other versions
CN106953854A (en)
Inventor
苏宏
陈周国
丁建伟
赵越
郭宇斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-12-15
Filing date
2017-03-16
Publication date
2019-10-18
Application filed by CETC 30 Research Institute
Publication of CN106953854A
Application granted
Publication of CN106953854B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps: constructing an SVM-based machine-learning traffic detection model; performing machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic; and substituting the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model to compute the parameters of the model. Compared with the prior art, the invention has the following positive effects: the method yields a highly accurate mathematical model for identifying anonymous-network data traffic; when the model is applied to detecting such traffic, detection accuracy is high and operation is simple and efficient; and because the method uses a machine-learning algorithm, after an anonymous network is upgraded the model only needs to be retrained on the upgraded network in order to detect its new data traffic.

Description

Method for building a darknet traffic identification model based on SVM machine learning
Technical field
The present invention relates to a method for building a darknet traffic identification model based on SVM machine learning.
Background
The analysis and control of anonymous-network (darknet) traffic, and traffic detection in particular, is still at the exploratory research stage. No existing method can effectively detect all anonymous-network traffic; some methods work only for certain anonymous networks, or even only for a particular version. Detecting anonymous-network traffic is therefore a long-term research topic that requires continued follow-up work to cope with the constant upgrading of anonymous networks and to improve detection accuracy. The key lies in how accurately the traffic identification model is built. This method uses machine learning to build a mathematical model for identifying anonymous-network traffic that is as accurate as possible, so as to minimise the effect of anonymous-network upgrades on detection and to detect anonymous-network traffic accurately.
Summary of the invention
To overcome the above shortcomings of the prior art, the present invention provides a method for building a darknet traffic identification model based on SVM machine learning, with the aim of establishing a dynamically updatable and accurate mathematical model for identifying anonymous-network traffic.
The technical solution adopted by the invention to solve this technical problem is a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps:
Step 1: construct an SVM-based machine-learning traffic detection model;
Step 2: perform machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic;
Step 3: substitute the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model to compute the parameters of the traffic detection model.
Compared with the prior art, the invention has the following positive effects:
The method yields a highly accurate mathematical model for identifying anonymous-network data traffic. When the model is applied to detecting such traffic, detection accuracy is high and operation is simple and efficient. Because the method uses a machine-learning algorithm, after an anonymous network is upgraded the model only needs to be retrained on the upgraded network in order to detect its new data traffic.
Brief description of the drawings
The invention is described by way of example with reference to the accompanying drawing, in which:
Fig. 1 is a schematic diagram of the SVM-based traffic detection model.
Detailed description
A method for building a darknet traffic identification model based on SVM machine learning comprises the following steps:
Step 1: Model building
Anonymous-network traffic detection is carried out on the basis of a mathematical model, but most current detection models work only for certain anonymous networks, or even only for a particular version. To solve this problem, cope with the constant upgrading of anonymous networks, and improve detection accuracy, a new anonymous-network traffic detection model is needed.
In this method, the detection model is an SVM-based machine-learning traffic detection model. The anonymous-network traffic detection model is shown in Fig. 1, where x is the input feature vector and d is the number of features; x_n is a collected sample, a d-dimensional vector; and y_n is the desired output, taking the value 1 or -1 according to whether or not the sample is anonymous traffic. The model can be expressed equivalently as:
y = kx + b
where k and b are the parameters of the anonymous-network traffic identification model: k is a d-dimensional weight vector and b is the bias. In the machine-learning stage, the values of k and b are computed from a large number of input pairs (x, y). Once the model has been built, it can be used to test traffic: if y > 0, the traffic under test is judged to be the corresponding anonymous traffic; if y < 0, it is judged not to be anonymous traffic.
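The decision rule above can be illustrated with a minimal Python sketch. It is not part of the patent text: the function names and the numeric values of k, b and x are hypothetical, chosen only so the arithmetic is easy to follow. The text does not say how y = 0 is handled; the sketch treats it as non-anonymous, which is an assumption.

```python
from typing import Sequence


def decision_value(k: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return y = k.x + b for one d-dimensional feature vector x."""
    if len(k) != len(x):
        raise ValueError("k and x must have the same dimension d")
    return sum(ki * xi for ki, xi in zip(k, x)) + b


def is_anonymous(k: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Apply the rule from the text: y > 0 means anonymous traffic.

    y == 0 is not covered by the text; returning False here is an assumption.
    """
    return decision_value(k, b, x) > 0


# Hypothetical parameters and samples with d = 4, for illustration only.
k = [0.5, 1.0, 0.25, 0.5]
b = -2.0
print(is_anonymous(k, b, [2.0, 1.0, 4.0, 2.0]))  # y = 1 + 1 + 1 + 1 - 2 = 2.0 -> True
print(is_anonymous(k, b, [0.0, 0.0, 4.0, 1.0]))  # y = 0 + 0 + 1 + 0.5 - 2 = -0.5 -> False
```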
Step 2: Parameter determination
Once the traffic detection model has been selected, its parameters must be determined by machine learning. During learning, four features are learned separately for the pure anonymous-network traffic of the target anonymous network and for pure non-anonymous network traffic (background traffic). All collected traffic is first regrouped into host profiles, one pcap file per host, and the parameters of the anonymous-network traffic identification model are then learned from the following four feature values: the UDP connection count, the circumvention ("wall-climbing") weight, the UDP flow information entropy, and the frequency of ping-pong-like similar messages in the traffic. They are defined and computed as follows:
(1) UDP connection count: the number of distinct UDP connections per unit time in each pcap file:
Count the total number K of distinct IP addresses in each host profile (pcap) file, then divide K by the host profile duration T to obtain the feature value;
(2) Circumvention weight: the number of resolutions of sensitive domain names (such as Amazon servers and Dynamic Networks), multiplied by their weights:
Maintain a list of sensitive DNS queries in which different domain names are assigned different weights; whenever the host profile contains a query for a sensitive domain name, increase the circumvention weight accordingly;
(3) UDP flow information entropy: the average information entropy of the UDP flows in each host profile:
Compute the information entropy of each UDP flow in the host profile, sum the results, and divide by the number of UDP flows. The information entropy is defined as [formula not reproduced in this text];
(4) Similar-message frequency: the number of occurrences of ping-pong-like similar messages:
Compare consecutive data packets in the host profile and add 1 to the count each time two consecutive packets are similar.
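The four feature definitions above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Packet` record and its field names are hypothetical stand-ins for data parsed from a host's pcap file; the entropy is taken to be Shannon entropy over the payload bytes of a flow, because the formula is missing from this text; "similar" packets are taken to mean equal payload length; and `sensitive.example` with weight 2.5 is an invented list entry.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    """Simplified, hypothetical UDP packet record from one host profile."""
    remote_ip: str
    flow_id: int         # identifier of the UDP flow the packet belongs to
    payload: bytes
    dns_query: str = ""  # queried domain name, empty if not a DNS query


def udp_connection_rate(packets: List[Packet], duration_s: float) -> float:
    """Feature 1: distinct IP addresses K divided by the profile time T."""
    return len({p.remote_ip for p in packets}) / duration_s


def circumvention_weight(packets: List[Packet], sensitive: Dict[str, float]) -> float:
    """Feature 2: each query for a listed domain adds that domain's weight."""
    return sum(sensitive.get(p.dns_query, 0.0) for p in packets if p.dns_query)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (assumed definition, see lead-in)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def mean_udp_flow_entropy(packets: List[Packet]) -> float:
    """Feature 3: per-flow entropy, summed, divided by the number of flows."""
    flows: Dict[int, bytes] = {}
    for p in packets:
        flows[p.flow_id] = flows.get(p.flow_id, b"") + p.payload
    if not flows:
        return 0.0
    return sum(shannon_entropy(d) for d in flows.values()) / len(flows)


def similar_message_count(packets: List[Packet]) -> int:
    """Feature 4: count consecutive packet pairs judged similar.

    Similarity is equal payload length here, which is an assumption.
    """
    return sum(1 for a, b in zip(packets, packets[1:])
               if len(a.payload) == len(b.payload))


# Hypothetical host profile covering 4 seconds of traffic.
packets = [
    Packet("192.0.2.1", 1, bytes([0, 1, 2, 3])),
    Packet("192.0.2.1", 1, bytes([0, 1, 2, 3])),
    Packet("192.0.2.2", 2, b"aa", dns_query="sensitive.example"),
    Packet("192.0.2.2", 2, b"aa"),
]
features = [
    udp_connection_rate(packets, 4.0),
    circumvention_weight(packets, {"sensitive.example": 2.5}),
    mean_udp_flow_entropy(packets),
    similar_message_count(packets),
]
print(features)
```

The resulting list is one d = 4 feature vector x for the host profile, in the order the features are listed in the text.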
When learning is complete, the four feature values learned for pure anonymous traffic and pure non-anonymous traffic are repeatedly substituted into the anonymous-network traffic identification model to compute the parameters k and b, which completes the model.
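The text does not specify how k and b are computed from the feature values. One common way to fit a linear SVM, shown here purely as an assumption, is stochastic subgradient descent on the regularised hinge loss. The function names, hyperparameters (`lam`, `eta`, `epochs`) and the toy data are hypothetical; the toy feature vectors are assumed to be already centred so that the two classes lie on opposite sides of the origin.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, eta: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Fit k and b by subgradient descent on lam/2*|k|^2 + hinge loss."""
    rng = random.Random(seed)
    k = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = ys[i] * (sum(kj * xj for kj, xj in zip(k, xs[i])) + b)
            # Gradient step on the regulariser shrinks k slightly.
            k = [(1.0 - eta * lam) * kj for kj in k]
            if margin < 1.0:
                # Hinge loss is active: move k and b towards the sample's class.
                k = [kj + eta * ys[i] * xj for kj, xj in zip(k, xs[i])]
                b += eta * ys[i]
    return k, b


def predict(k: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (anonymous) if k.x + b > 0, otherwise -1."""
    return 1 if sum(kj * xj for kj, xj in zip(k, x)) + b > 0 else -1


# Hypothetical, already-centred feature vectors (d = 4).
xs = [
    [1.0, 0.8, 1.2, 0.9], [0.7, 1.1, 0.9, 1.3], [1.2, 0.9, 1.0, 0.8],         # anonymous
    [-0.9, -1.0, -0.8, -1.1], [-1.1, -0.7, -1.2, -0.9], [-0.8, -1.2, -1.0, -1.0],  # background
]
ys = [1, 1, 1, -1, -1, -1]

k, b = train_linear_svm(xs, ys)
print([predict(k, b, x) for x in xs])
```

In practice the raw feature values would need to be scaled before training, and an established SVM library would normally be preferred over a hand-written optimiser; the sketch only makes the role of k and b concrete.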
Step 3: Model validation
An anonymous network running Freegate is set up. In this environment, sufficient Freegate anonymous traffic and non-Freegate background traffic are captured separately. For a given host, the four features of each kind of traffic are computed: the UDP connection count, the circumvention weight, the UDP flow information entropy, and the frequency of ping-pong-like similar messages in the traffic. These are then substituted into the mathematical model for traffic detection to compute the model parameters k and b, which completes the traffic detection model for this anonymous-network environment.
With the resulting model, anonymous-network traffic can be detected in real time in the Freegate environment. In the machine-learning process, the longer the learning period and the more traffic data collected, the more accurate the resulting traffic detection model and the more accurate subsequent detection.
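The text reports no accuracy figures. As a hedged sketch of how the accuracy of a built model could be measured on labelled held-out traffic, the following Python function counts correct decisions; the function name, the parameters k and b, and the four held-out samples are all hypothetical.

```python
from typing import Sequence


def accuracy(k: Sequence[float], b: float,
             samples: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted label (sign of k.x + b) is correct."""
    correct = 0
    for x, y in zip(samples, labels):
        score = sum(ki * xi for ki, xi in zip(k, x)) + b
        predicted = 1 if score > 0 else -1
        if predicted == y:
            correct += 1
    return correct / len(labels)


# Hypothetical model parameters and held-out samples (d = 4).
k_demo = [1.0, 1.0, 1.0, 1.0]
b_demo = -4.0
held_out = [
    [2.0, 2.0, 2.0, 2.0],  # score  4.0 -> predicted  1
    [0.0, 0.0, 1.0, 1.0],  # score -2.0 -> predicted -1
    [1.0, 1.0, 1.0, 0.5],  # score -0.5 -> predicted -1 (labelled 1, so a miss)
    [0.0, 1.0, 0.0, 1.0],  # score -2.0 -> predicted -1
]
held_out_labels = [1, -1, 1, -1]
print(accuracy(k_demo, b_demo, held_out, held_out_labels))  # 3 of 4 correct -> 0.75
```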

Claims (3)

1. A method for building a darknet traffic identification model based on SVM machine learning, characterised in that it comprises the following steps:
Step 1: constructing an SVM-based machine-learning traffic detection model;
Step 2: performing machine learning on the parameters of the traffic detection model to obtain four feature values for pure anonymous traffic and for pure non-anonymous traffic:
(1) UDP connection count: obtained by dividing the total number of distinct IP addresses in each host profile file by the host profile time;
(2) circumvention weight: obtained by multiplying the number of resolutions of a sensitive domain name by the weight assigned to that domain name;
(3) UDP flow information entropy: obtained by computing the information entropy of each UDP flow in the host profile, summing the results, and dividing by the number of UDP flows;
(4) similar-message frequency: the count of similar consecutive data packets in the host profile;
Step 3: substituting the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model to compute the parameters of the traffic detection model.
2. The method for building a darknet traffic identification model based on SVM machine learning according to claim 1, characterised in that the traffic detection model is equivalently expressed as y = kx + b, where k and b are the parameters of the traffic detection model, k being the weight vector and b the bias.
3. The method for building a darknet traffic identification model based on SVM machine learning according to claim 2, characterised in that, when the traffic detection model is used to test traffic, the traffic under test is judged to be the corresponding anonymous traffic if y > 0 and judged not to be anonymous traffic if y < 0.
CN201710156258.4A 2016-12-15 2017-03-16 Method for building a darknet traffic identification model based on SVM machine learning Active CN106953854B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611157218 2016-12-15
CN2016111572183 2016-12-15

Publications (2)

Publication Number Publication Date
CN106953854A (en) 2017-07-14
CN106953854B (en) 2019-10-18

Family

ID=59473479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710156258.4A Active CN106953854B (en) 2016-12-15 2017-03-16 Method for building a darknet traffic identification model based on SVM machine learning

Country Status (1)

Country Link
CN (1) CN106953854B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108933846B (en) * 2018-06-21 2021-08-27 北京谷安天下科技有限公司 Method and device for identifying domain name by pan-resolution and electronic equipment
KR102129375B1 (en) * 2019-11-01 2020-07-02 (주)에이아이딥 Deep running model based tor site active fingerprinting system and method thereof
CN111224940B (en) * 2019-11-15 2021-03-09 中国科学院信息工程研究所 Anonymous service traffic correlation identification method and system nested in encrypted tunnel
CN112887291A (en) * 2021-01-20 2021-06-01 中国科学院计算技术研究所 I2P traffic identification method and system based on deep learning
CN113938290B (en) * 2021-09-03 2022-11-11 华中科技大学 Website de-anonymization method and system for user side flow data analysis
CN115001861B (en) * 2022-07-20 2022-12-09 中国电子科技集团公司第三十研究所 Method and system for detecting abnormal services of hidden network based on mixed fingerprint characteristics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510841A (en) * 2008-12-31 2009-08-19 成都市华为赛门铁克科技有限公司 Method and system for recognizing end-to-end flux
CN101695035A (en) * 2009-10-21 2010-04-14 成都市华为赛门铁克科技有限公司 Flow rate identification method and device thereof
CN102984131A (en) * 2012-11-09 2013-03-20 华为技术有限公司 Information recognition method and device
CN104052639A (en) * 2014-07-02 2014-09-17 山东大学 Real-time multi-application network flow identification method based on support vector machine
CN105471883A (en) * 2015-12-10 2016-04-06 中国电子科技集团公司第三十研究所 Tor network tracing system and tracing method based on web injection
CN105721242A (en) * 2016-01-26 2016-06-29 国家信息技术安全研究中心 Information entropy-based encrypted traffic identification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8448242B2 (en) * 2006-02-28 2013-05-21 The Trustees Of Columbia University In The City Of New York Systems, methods, and media for outputting data based upon anomaly detection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
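"A multi-granularity heuristic-combining approach for censorship circumvention activity identification"; Zhongliu Zhou et al.; 《Security and Communication Networks》; 20160704; pp. 3178-3189 *
"Botnet analysis and defense"; Chen Zhouguo et al.; 《Information Security and Communications Privacy》; 20110610; full text *
"A survey of anonymous network traceback"; Chen Zhouguo et al.; 《Journal of Computer Research and Development》; 20121015; full text *
"Survey and prospects of research on encrypted network traffic identification"; Pan Wubin et al.; 《Journal on Communications》; 20160925; full text *
"Hierarchical analysis of network attack traceback"; Chen Zhouguo et al.; 《Computer Systems & Applications》; 20140115; full text *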

Also Published As

Publication number Publication date
CN106953854A (en) 2017-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant