CN106953854A - A method for building a darknet traffic identification model based on SVM machine learning

Info

Publication number
CN106953854A
Authority
CN
China
Prior art keywords
flow
machine learning
anonymous
detection model
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710156258.4A
Other languages
Chinese (zh)
Other versions
CN106953854B (en)
Inventor
苏宏
陈周国
丁建伟
赵越
郭宇斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Publication of CN106953854A
Application granted
Publication of CN106953854B
Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps: building a traffic detection model based on SVM machine learning; performing machine learning on the parameters of the traffic detection model to obtain four feature values of pure anonymous traffic and of pure non-anonymous traffic; and substituting the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model. Compared with the prior art, the positive effects of the invention are as follows. The method yields an accurate mathematical model for identifying anonymous network data traffic. When applied to the detection of anonymous network data traffic, detection accuracy is high and the computation is simple and efficient. Because the method uses a machine learning algorithm, when an anonymous network is upgraded it is only necessary to repeat the learning on the upgraded network in order to detect the new anonymous network data traffic.

Description

A method for building a darknet traffic identification model based on SVM machine learning
Technical field
The present invention relates to a method for building a darknet traffic identification model based on SVM machine learning.
Background art
The analysis and control of anonymous network (darknet) traffic, and traffic detection in particular, are still at the exploratory research stage. At present no single method can effectively detect all anonymous network traffic; some methods are effective only for a particular anonymous network, or even only for a particular version of it. Detecting anonymous network traffic is therefore a long-term research topic that requires continuous follow-up work, both to cope with the ongoing upgrades and changes of anonymous networks and to improve detection accuracy. The key lies in the accuracy of the traffic identification model. This method uses machine learning to build a mathematical model of anonymous network traffic identification that is as accurate as possible, with the aim of minimizing the impact of anonymous network upgrades on detection so that anonymous network traffic can be detected accurately.
Summary of the invention
To overcome the above shortcomings of the prior art, the invention provides a method for building a darknet traffic identification model based on SVM machine learning, with the aim of establishing a dynamically updatable and accurate mathematical model for identifying anonymous network traffic.
The technical solution adopted by the invention to solve the technical problem is a method for building a darknet traffic identification model based on SVM machine learning, comprising the following steps:
Step 1: build a traffic detection model based on SVM machine learning;
Step 2: perform machine learning on the parameters of the traffic detection model to obtain four feature values of pure anonymous traffic and of pure non-anonymous traffic;
Step 3: substitute the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model.
Compared with the prior art, the positive effects of the invention are as follows:
The method yields an accurate mathematical model for identifying anonymous network data traffic. When applied to the detection of anonymous network data traffic, detection accuracy is high and the computation is simple and efficient. Because the method uses a machine learning algorithm, when an anonymous network is upgraded it is only necessary to repeat the learning on the upgraded network in order to detect the new anonymous network data traffic.
Brief description of the drawings
The invention is described by way of example with reference to the accompanying drawing, in which:
Fig. 1 is a schematic diagram of the SVM-based traffic detection model.
Detailed description of the embodiments
A method for building a darknet traffic identification model based on SVM machine learning comprises the following steps:
Step 1: model building
Anonymous network traffic detection is carried out on the basis of a mathematical model, but most current detection models are effective only for a particular anonymous network, or even only for a particular version of it. To solve this problem, cope with the continuous upgrades and changes of anonymous networks, and improve the accuracy of anonymous network traffic detection, a new anonymous network traffic detection model must be built.
In this method, the detection model is a traffic detection model based on SVM machine learning. The anonymous network traffic detection model is shown in Fig. 1, where x is the input feature vector and the number of features is d; x_n is a collected sample, which is a d-dimensional vector; and y_n is the desired output, taking the value 1 or -1 according to whether or not the sample is the corresponding anonymous traffic. The model can be expressed equivalently as:
y = k·x + b
Here k and b are the parameters of the anonymous network traffic identification model: k is a d-dimensional weight vector and b is the bias. In the machine learning stage, the values of k and b are computed from a large number of input pairs x and y. Once the anonymous network traffic identification model has been built, it can be used to detect the traffic under test: when y > 0 the traffic under test is judged to be the corresponding anonymous traffic, and when y < 0 it is judged not to be anonymous traffic.
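The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numeric values of k, b and x are hypothetical, and the source does not say how y = 0 is handled, so the sketch reports it separately as 0.

```python
def decision_value(k, x, b):
    """Return y = k.x + b for a weight vector k, feature vector x and bias b."""
    return sum(ki * xi for ki, xi in zip(k, x)) + b


def classify(k, x, b):
    """Return 1 (anonymous traffic) if y > 0, -1 (not anonymous) if y < 0.

    The source text does not define the y == 0 case; returning 0 here is an
    assumption made only for this sketch.
    """
    y = decision_value(k, x, b)
    if y > 0:
        return 1
    if y < 0:
        return -1
    return 0


# Hypothetical parameters and one hypothetical 4-feature sample (d = 4).
k = [0.5, -1.0, 2.0, 0.1]
b = -1.5
x = [2.0, 1.0, 0.5, 10.0]
print(classify(k, x, b))
```

With d = 4, each x holds the four feature values described in Step 2.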
Step 2: parameter determination
After the traffic detection model has been selected, machine learning must be performed on the model parameters to determine their values. Throughout the learning process, the machine learns four features of the pure anonymous network traffic of the target anonymous network and of pure non-anonymous network traffic (background traffic) respectively. All collected traffic is regrouped by Host profile, so that each host corresponds to one pcap file, and the parameters of the mathematical model for anonymous network traffic identification are learned from the following four feature values. The four features are: UDP connection count, circumvention weight, UDP flow information entropy, and the frequency of Ping-pong-like similar messages in the traffic. Their definitions and calculation methods are as follows:
(1) UDP connection count: the number of distinct UDP connections per unit time in each pcap file:
Count the total number K of distinct IP addresses in each Hostprofile (pcap) file, then divide K by the Hostprofile duration T to obtain this feature value;
(2) Circumvention weight: the number of resolutions of sensitive domain names (such as Amazon servers and Dynamic Networks) multiplied by their weights:
A list of sensitive DNS queries is maintained, with a different weight assigned to each domain name. If the Hostprofile contains a query for a sensitive domain name, the corresponding circumvention weight is added;
(3) UDP flow information entropy: the average information entropy of the UDP flows in each Host profile:
Compute the information entropy of each UDP flow in the Hostprofile, sum the results, and divide by the total number of UDP flows. The information entropy is defined as [formula not reproduced in the source text];
(4) Similar-message frequency: the number of occurrences of Ping-pong-like similar messages:
Count the similar consecutive packets in the Hostprofile; each time consecutive packets are similar, the count is incremented by 1.
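The four feature computations can be sketched as follows. This is a hedged illustration, not the patented implementation: the `Packet` record, its field names and the example domain are hypothetical; the entropy is assumed to be Shannon entropy over payload bytes, because the defining formula is missing from the text; and two consecutive packets are assumed to be "similar" when their payload lengths match, because the source does not define similarity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical parsed packet from one host's pcap file."""
    proto: str           # e.g. "UDP" or "DNS" (labels are assumptions)
    dst_ip: str
    flow_id: int
    payload: bytes
    dns_query: str = ""  # queried name for DNS packets, else empty


def udp_connection_rate(packets, duration_s):
    """Feature 1: K distinct UDP peer IPs divided by the profile duration T."""
    k = len({p.dst_ip for p in packets if p.proto == "UDP"})
    return k / duration_s


def circumvention_weight(packets, weights):
    """Feature 2: sum of the weights of every sensitive DNS query observed."""
    return sum(weights.get(p.dns_query, 0.0) for p in packets if p.dns_query)


def shannon_entropy(data):
    """Shannon entropy in bits per byte (assumed definition)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def mean_udp_flow_entropy(packets):
    """Feature 3: sum of per-flow entropies divided by the number of UDP flows."""
    flows = {}
    for p in packets:
        if p.proto == "UDP":
            flows.setdefault(p.flow_id, bytearray()).extend(p.payload)
    if not flows:
        return 0.0
    return sum(shannon_entropy(bytes(f)) for f in flows.values()) / len(flows)


def similar_message_count(packets):
    """Feature 4: count consecutive packet pairs judged similar.

    Similarity is assumed here to mean equal payload length.
    """
    return sum(1 for a, b in zip(packets, packets[1:])
               if len(a.payload) == len(b.payload))


def feature_vector(packets, duration_s, weights):
    """Assemble the 4-dimensional feature vector x for one Host profile."""
    return [udp_connection_rate(packets, duration_s),
            circumvention_weight(packets, weights),
            mean_udp_flow_entropy(packets),
            similar_message_count(packets)]
```

In practice the packets would be read from each host's pcap file with a capture-parsing library; that step is omitted here so that the sketch depends only on the standard library.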
When machine learning is finished, the four feature values of the pure anonymous traffic and the pure non-anonymous traffic obtained through repeated learning are substituted into the anonymous network traffic identification model for computation, which finally yields the parameters k and b of the model. The model is then complete.
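One way to obtain k and b from labelled feature vectors is sketched below. The source does not state which solver, kernel or regularization is used, so this sketch assumes a linear soft-margin SVM trained by subgradient descent on the hinge loss, which is consistent with the form y = k·x + b. The function name, the hyperparameters and the toy feature values (assumed to be scaled to roughly [0, 1]) are all hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.05, epochs=500):
    """Fit k and b of y = k.x + b by subgradient descent on the hinge loss.

    samples: list of d-dimensional feature vectors
    labels:  list of +1 (anonymous) or -1 (background) values
    lam, lr, epochs: regularization strength, step size and number of passes
                     (illustrative values, not taken from the source)
    """
    d = len(samples[0])
    k = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(ki * xi for ki, xi in zip(k, x)) + b)
            # Shrink the weights (gradient of the L2 regularization term).
            k = [ki - lr * lam * ki for ki in k]
            # Samples inside the margin contribute a hinge-loss subgradient.
            if margin < 1:
                k = [ki + lr * y * xi for ki, xi in zip(k, x)]
                b += lr * y
    return k, b


# Toy, linearly separable data: two anonymous and two background profiles.
samples = [[1.0, 1.0, 0.9, 0.8], [0.8, 0.9, 1.0, 0.9],
           [0.1, 0.0, 0.3, 0.1], [0.2, 0.0, 0.4, 0.2]]
labels = [1, 1, -1, -1]
k, b = train_linear_svm(samples, labels)
print(k, b)
```

A library SVM implementation could be substituted for this loop; the hand-written version is used only to keep the example self-contained.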
Step 3: model verification
A Freegate anonymous network is set up, and sufficient Freegate anonymous traffic and non-Freegate background traffic are captured in that environment. For each host, the four features of each type of traffic are calculated: UDP connection count, circumvention weight, UDP flow information entropy, and the frequency of Ping-pong-like similar messages in the traffic. These values are then substituted into the mathematical model for traffic detection to compute the parameters k and b, which completes the traffic detection model for that anonymous network environment.
Using the resulting anonymous network traffic detection model, anonymous network traffic can be detected in real time in the Freegate anonymous network environment. During machine learning, the longer the learning time and the more traffic data collected, the more accurate the resulting traffic detection model and the subsequent traffic detection.
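A simple way to check a trained model against labelled traffic is sketched below. The source does not name any evaluation metrics, so accuracy, detection rate and false-alarm rate are assumptions chosen for illustration, as are the function name and the sample values. A decision value of exactly 0 is counted as "not anonymous" in this sketch.

```python
def evaluate(k, b, samples, labels):
    """Compare model decisions with known labels (+1 anonymous, -1 background)."""
    tp = fp = tn = fn = 0
    for x, y in zip(samples, labels):
        score = sum(ki * xi for ki, xi in zip(k, x)) + b
        predicted = 1 if score > 0 else -1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == -1:
            fp += 1
        elif predicted == -1 and y == -1:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical parameters and held-out Host profiles.
k = [1.0, 1.0, 1.0, 1.0]
b = -2.0
held_out = [[1.0, 1.0, 0.9, 0.8], [0.1, 0.0, 0.3, 0.1],
            [0.9, 0.8, 0.2, 0.3], [0.2, 0.3, 0.4, 0.5]]
truth = [1, -1, -1, 1]
print(evaluate(k, b, held_out, truth))
```

Evaluating on profiles that were not used for training gives a more honest estimate than reusing the training data.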

Claims (5)

1. A method for building a darknet traffic identification model based on SVM machine learning, characterized by comprising the following steps:
Step 1: build a traffic detection model based on SVM machine learning;
Step 2: perform machine learning on the parameters of the traffic detection model to obtain four feature values of pure anonymous traffic and of pure non-anonymous traffic;
Step 3: substitute the four feature values of the pure anonymous traffic and the pure non-anonymous traffic into the traffic detection model for computation to obtain the parameters of the traffic detection model.
2. The method for building a darknet traffic identification model based on SVM machine learning according to claim 1, characterized in that the equivalent mathematical expression of the traffic detection model is y = k·x + b, where k and b are the parameters of the traffic detection model, k is the weight vector, and b is the bias.
3. The method for building a darknet traffic identification model based on SVM machine learning according to claim 2, characterized in that the four feature values of the pure anonymous traffic and the pure non-anonymous traffic in Step 2 are the UDP connection count, the circumvention weight, the UDP flow information entropy, and the similar-message frequency.
4. The method for building a darknet traffic identification model based on SVM machine learning according to claim 3, characterized in that the four feature values are calculated as follows:
UDP connection count: the total number of distinct IP addresses in each Hostprofile file divided by the Hostprofile duration;
Circumvention weight: the number of resolutions of each sensitive domain name multiplied by the weight assigned to that domain name;
UDP flow information entropy: the information entropy of each UDP flow in the Hostprofile is computed and summed, and the sum is divided by the total number of UDP flows;
Similar-message frequency: the count of similar consecutive packets in the Hostprofile.
5. The method for building a darknet traffic identification model based on SVM machine learning according to claim 4, characterized in that, when the traffic detection model is used to detect the traffic under test, the traffic under test is judged to be the corresponding anonymous traffic if y > 0 and is judged not to be anonymous traffic if y < 0.
CN201710156258.4A 2016-12-15 2017-03-16 A method for building a darknet traffic identification model based on SVM machine learning Active CN106953854B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611157218 2016-12-15
CN2016111572183 2016-12-15

Publications (2)

Publication Number Publication Date
CN106953854A true CN106953854A (en) 2017-07-14
CN106953854B CN106953854B (en) 2019-10-18

Family

ID=59473479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710156258.4A Active CN106953854B (en) 2016-12-15 2017-03-16 A method for building a darknet traffic identification model based on SVM machine learning

Country Status (1)

Country Link
CN (1) CN106953854B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510841A (en) * 2008-12-31 2009-08-19 成都市华为赛门铁克科技有限公司 Method and system for recognizing end-to-end flux
CN101695035A (en) * 2009-10-21 2010-04-14 成都市华为赛门铁克科技有限公司 Flow rate identification method and device thereof
CN102984131A (en) * 2012-11-09 2013-03-20 华为技术有限公司 Information recognition method and device
US20140082725A1 (en) * 2006-02-28 2014-03-20 The Trustees Of Columbia University In The City Of New York Systems, Methods, and Media for Outputting a Dataset Based Upon Anomaly Detection
CN104052639A (en) * 2014-07-02 2014-09-17 山东大学 Real-time multi-application network flow identification method based on support vector machine
CN105471883A (en) * 2015-12-10 2016-04-06 中国电子科技集团公司第三十研究所 Tor network tracing system and tracing method based on web injection
CN105721242A (en) * 2016-01-26 2016-06-29 国家信息技术安全研究中心 Information entropy-based encrypted traffic identification method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ZHONGLIU ZHOU等: "("A multi-granularity heuristic-combining approach for censorship circumvention activity identification"", 《SECURITY AND COMMUNICATION NETWORKS》 *
潘吴斌等: ""网络加密流量识别研究综述及展望"", 《通信学报》 *
陈周国等: ""僵尸网络分析及其防御"", 《信息安全与通信保密》 *
陈周国等: ""匿名网络追踪溯源综述"", 《计算机研究与发展》 *
陈周国等: ""网络攻击追踪溯源层次分析"", 《计算机系统应用》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108933846A (en) * 2018-06-21 2018-12-04 北京谷安天下科技有限公司 A kind of recognition methods, device and the electronic equipment of general parsing domain name
CN108933846B (en) * 2018-06-21 2021-08-27 北京谷安天下科技有限公司 Method and device for identifying domain name by pan-resolution and electronic equipment
KR102129375B1 (en) * 2019-11-01 2020-07-02 (주)에이아이딥 Deep running model based tor site active fingerprinting system and method thereof
CN111224940A (en) * 2019-11-15 2020-06-02 中国科学院信息工程研究所 Anonymous service traffic correlation identification method and system nested in encrypted tunnel
CN111224940B (en) * 2019-11-15 2021-03-09 中国科学院信息工程研究所 Anonymous service traffic correlation identification method and system nested in encrypted tunnel
CN112887291A (en) * 2021-01-20 2021-06-01 中国科学院计算技术研究所 I2P traffic identification method and system based on deep learning
CN113938290A (en) * 2021-09-03 2022-01-14 华中科技大学 Website de-anonymization method and system for user side traffic data analysis
CN115001861A (en) * 2022-07-20 2022-09-02 中国电子科技集团公司第三十研究所 Method and system for detecting abnormal services of hidden network based on mixed fingerprint characteristics

Also Published As

Publication number Publication date
CN106953854B (en) 2019-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant