CN104052639B - Real-time multi-application network flow identification method based on support vector machine

Info

Publication number
CN104052639B
Authority
CN
China
Prior art keywords
support vector
vector machine
network
time
network flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410313090.XA
Other languages
Chinese (zh)
Other versions
CN104052639A (en)
Inventor
刘琚
马衍庆
乔美华
于智源
郭志鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201410313090.XA
Publication of CN104052639A
Application granted
Publication of CN104052639B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a real-time network traffic identification method based on a support vector machine. The method has low complexity and high identification accuracy and aims to solve the problems of existing network traffic identification methods. Using a time window method, it needs only simple and effective features obtained from the packet headers of network flows, and it adopts a support vector machine algorithm with low algorithmic complexity and a small amount of computation. A classifier can therefore be modeled and generated rapidly, high identification accuracy can be achieved with small samples, and multiple applications in the network flow can be measured and identified at any point in time, which meets the real-time multi-application requirement.

Description

Real-time multi-application network flow identification method based on support vector machine
Technical field
The present invention relates to a network traffic identification method and belongs to the field of network measurement technology.
Background art
With the rapid development of computer network technology and the arrival of the information age, the continued spread of the Internet has also brought problems such as network congestion, unrestrained bandwidth consumption by P2P applications, and network security. Network operators and Internet service providers need a suitable network measurement method to manage their networks. In recent years, research on network traffic identification methods has received growing attention in both academia and industry, with increasing focus on the feasibility and effectiveness of traffic identification, that is, how to process massive amounts of data quickly and how to correctly identify the various applications in a network. A traffic identification method should therefore be simple and effective, as well as flexible and widely applicable.
Existing network traffic identification methods fall into four main categories: methods based on port mapping, methods based on deep packet inspection, methods based on behavioral features, and methods based on machine learning. With the continuous development of network technology and the constant emergence of new network applications, methods based on port mapping, deep packet inspection, and behavioral features face growing limitations and shortcomings. Academia has now turned its attention to machine-learning-based traffic identification, which uses the data mining capability of machine learning to extract implicit and potentially useful feature information from large volumes of complex network traffic data. The key to such methods is choosing reasonable traffic features and a suitable machine learning algorithm. However, research has concentrated mainly on non-real-time traffic identification, in which network traffic data is first collected over a long period and then classified; this cannot identify how users are currently using the network. Among current real-time network traffic identification methods, some schemes use the first few packets after a network flow is established as features. Although such methods are simple and fast, they must capture the moment at which the flow is established; if that moment is missed, it is difficult to obtain an identification result. Other schemes use groups of consecutive packets (for example, 25 packets per group) selected at different points in the flow's life cycle as features. Such methods must take the life cycle of the flow into account: if the life cycle is long, the time needed for identification also increases. These schemes all depend too heavily on properties of the network flow itself, lack flexibility, and have certain limitations.
Summary of the invention
To address the shortcomings of existing network traffic identification methods, the present invention provides a method based on a support vector machine (SVM) that can identify multiple application types in a network environment in real time. The method uses a "time window method" to obtain simple and effective features from the packet headers of network flows only, and selects a support vector machine algorithm with low algorithmic complexity and a small amount of computation. As a result, a classifier can be modeled and generated quickly, a high identification accuracy can be reached even with small samples, and multiple applications in the network flow can be measured and identified at any point in time, which meets the real-time multi-application requirement. The "time window method" means collecting statistics on the network flow continuously over a period of time, dividing the traffic in that period into a "peak region" and a "stable region" according to how far it deviates from the average, and generating the features required for identification from the data in the time window.
The network traffic identification method based on a support vector machine proposed by the present invention comprises an off-line training step of the support vector machine and an on-line real-time classification step of the support vector machine:
The off-line training step of the support vector machine comprises:
(1) capturing packets from the network line using a packet capture tool;
(2) collecting statistics on the packets to obtain the packet count, packet length, source address, destination address, transport layer protocol, and uplink or downlink direction of the network flow;
(3) sampling from the obtained data, selecting samples taken while the network applications are running normally, and labeling the application category of each sample;
(4) according to the "time window method", starting from an arbitrary point in time, setting a period of time and, according to how far the network traffic continuously collected in this period deviates from the average, designating traffic above 1.6 times the average as the "peak region" and traffic within 0.6 to 1.4 times the average as the "stable region", and thereby generating multiple feature values from the network traffic in the period;
(5) training on the sample features using the support vector machine method to generate classification rules and build a classifier model.
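The region division in step (4) can be illustrated with a minimal Python sketch. The 1.6 and 0.6 to 1.4 multipliers come from the text; the function name, the "other" label for traffic that falls in neither region, and the handling of an all-zero window are assumptions made only for this illustration.

```python
from statistics import mean

PEAK_FACTOR = 1.6           # traffic above 1.6 x average -> peak region (from the text)
STABLE_RANGE = (0.6, 1.4)   # traffic within 0.6-1.4 x average -> stable region (from the text)


def partition_window(traffic_per_interval):
    """Label each sampling interval of one time window.

    traffic_per_interval: list of traffic volumes, e.g. bytes per second.
    Returns one label per interval: "peak", "stable" or "other".
    The "other" label and the all-zero case are assumptions of this sketch.
    """
    avg = mean(traffic_per_interval)
    low, high = STABLE_RANGE[0] * avg, STABLE_RANGE[1] * avg
    labels = []
    for volume in traffic_per_interval:
        if avg > 0 and volume > PEAK_FACTOR * avg:
            labels.append("peak")
        elif avg > 0 and low <= volume <= high:
            labels.append("stable")
        else:
            labels.append("other")
    return labels
```

For a window of nine intervals at 10 units and one at 30 units, the average is 12, so the burst is labeled as peak and the remaining intervals as stable.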
The on-line real-time classification step of the support vector machine comprises:
(1) capturing packets from the network line using a packet capture tool;
(2) collecting statistics on the packets to obtain the packet count, packet length, source address, destination address, transport layer protocol, and uplink or downlink direction of the network flow;
(3) generating multiple feature values using the same method as step (4) of the off-line training step of the support vector machine;
(4) using the classification rules and classifier model generated in step (5) of the off-line training step of the support vector machine to classify the feature values of the network flow and obtain the identification result.
The multiple feature values in step (4) of the off-line training step and in step (3) of the on-line real-time classification step include: downlink packet count, uplink packet count, downlink data volume, uplink data volume, uplink/downlink packet count ratio, uplink/downlink data volume ratio, uplink/downlink packet count variance ratio, uplink/downlink data volume variance ratio, number of IPs with a large downlink data volume, proportion of data volume in the peak region, and proportion of samples in the stable region.
In step (5) of the off-line training step of the support vector machine, the kernel function parameter and the penalty parameter of the support vector machine are obtained by cross-validation.
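A minimal sketch of how cross-validation could drive the choice of kernel parameter and penalty parameter is given below. It is not the patent's implementation: the interleaved fold assignment, the candidate grid, and the `make_scorer` callback (a hypothetical function that trains an SVM with the given parameters and returns validation accuracy) are all assumptions introduced for illustration.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Samples are assigned to folds in an interleaved way (an assumption).
    """
    folds = [list(range(i, num_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, validation


def cross_validate(samples, labels, k, train_and_score):
    """Average the validation score over k folds.

    train_and_score(train_x, train_y, val_x, val_y) is a hypothetical
    callback that trains a classifier and returns its accuracy.
    """
    scores = []
    for train, val in k_fold_indices(len(samples), k):
        scores.append(train_and_score(
            [samples[j] for j in train], [labels[j] for j in train],
            [samples[j] for j in val], [labels[j] for j in val]))
    return sum(scores) / len(scores)


def grid_search(samples, labels, k, candidates, make_scorer):
    """Return the (penalty, kernel_parameter) pair with the best mean score.

    candidates: iterable of (penalty, kernel_parameter) pairs.
    make_scorer(penalty, kernel_parameter) returns a train_and_score callback.
    """
    return max(candidates,
               key=lambda p: cross_validate(samples, labels, k, make_scorer(*p)))
```

Each candidate pair is scored by its average validation accuracy, and the best-scoring pair is kept for the final training run.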
The present invention first uses the "time window method" to obtain multiple features from the packet headers of network flows, and then uses a support vector machine algorithm to train on and identify the feature values of multiple network application types. With the "time window method", obtaining network flow features is simple, and features can be extracted from the network flow at any point in time. The support vector machine is a machine learning method suited to small samples. It realizes nonlinear classification through an inner-product kernel function, and the optimal decision function it yields is a separating hyperplane determined by a small number of support vectors. The algorithm is simple, requires little computation, and also offers good generalization ability and robustness. The present invention meets the requirement of real-time multi-application network traffic identification.
Description of the drawings
Fig. 1 is a schematic block diagram of the real-time network traffic identification system.
Fig. 2(a) is a schematic diagram of the time window; Fig. 2(b) is a schematic diagram of the division of traffic regions within the window.
Fig. 3 is a flow chart of calling the libpcap library functions.
Fig. 4 is a schematic diagram of the network traffic identification method based on a support vector machine.
Fig. 5 shows the accuracy of the network traffic identification method.
Fig. 6 shows the time the network traffic identification method needs to generate the classifier model.
Detailed description of the embodiments
To address the problems of existing network traffic identification methods, a low-complexity, real-time network traffic identification method based on a support vector machine is provided. The method requires few training samples and has relatively low computational complexity, which makes it well suited to network traffic identification, a nonlinear multi-class classification problem with large and diverse data.
Fig. 1 shows the principle steps of off-line training and on-line real-time classification in the network traffic identification system of the present invention. Fig. 4 shows the principle of the network traffic identification method based on a support vector machine. The present invention is further described below with reference to the drawings and an embodiment, but is not limited to this example.
Consider the case where the real-time network traffic identification system resides in a home LAN, with network traffic identification provided as a function of the home gateway. Whether a packet of a network flow is uplink or downlink is determined from its source address. Assume that the home's internal LAN is the local side and the external Internet is the remote side. If the source address is a local IP, the data flow is considered uplink, that is, an upload; if the source address is a remote IP, the data flow is considered downlink, that is, a download.
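The source-address rule above can be sketched in a few lines of Python using the standard `ipaddress` module. The LAN prefix `192.168.1.0/24` and the function name are hypothetical; a real gateway would use its own configured local subnet.

```python
import ipaddress

# Hypothetical home LAN prefix; the patent does not name a specific subnet.
LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")


def flow_direction(src_ip, local_net=LOCAL_NET):
    """Return "up" if the source address is local, otherwise "down"."""
    return "up" if ipaddress.ip_address(src_ip) in local_net else "down"
```

With this assumed prefix, a packet from 192.168.1.20 is counted as uplink and a packet from 8.8.8.8 as downlink.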
Consider m = 6 application types frequently used in a home LAN: P2P multimedia or download, non-P2P multimedia or download, WWW (web browsing), online gaming (client-based games), video call/conferencing, and file sharing (within the LAN). Starting from an arbitrary point in time and using 1 second as the unit of time, statistics are collected on the network flow captured in each second to obtain the packet count, packet length, source address, destination address, transport layer protocol, and direction (uplink or downlink) of the network flow. After τ = n seconds of continuous statistics (here n = 15), the variation of the network flow within one time window is obtained, as shown in Fig. 2(a). In Fig. 2(b), the traffic in the period τ is divided into a stable region and a peak region according to the average traffic in the time window. In this way, basic data such as the packet count and packet length of the traffic in each second of the window are available, and the burstiness and stationarity of the traffic within the period τ can also be analyzed. The network flow data collected in the period τ is used to generate d = 11 features: downlink packet count, uplink packet count, downlink data volume, uplink data volume, uplink/downlink packet count ratio, uplink/downlink data volume ratio, uplink/downlink packet count variance ratio, uplink/downlink data volume variance ratio, number of IPs with a large downlink data volume, proportion of data volume in the peak region, and proportion of samples in the stable region.
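The following Python sketch shows one way the d = 11 features might be computed from per-second statistics. The text names the features but does not give their exact formulas, so several choices here are assumptions: ratios are taken as uplink divided by downlink (0.0 when the denominator is zero), variances are population variances of the per-second series, the "large downlink data volume" threshold `BIG_DOWNLINK_BYTES` is hypothetical, and the 1.6 and 0.6 to 1.4 multipliers are those stated for the peak and stable regions in the summary of the invention.

```python
from collections import defaultdict
from statistics import mean, pvariance

WINDOW_SECONDS = 15            # n = 15 in the text
BIG_DOWNLINK_BYTES = 100_000   # hypothetical threshold for a "large" downlink sender


def safe_ratio(a, b):
    """Return a / b, or 0.0 when b is zero (an assumed convention)."""
    return a / b if b else 0.0


def window_features(packets, start, n=WINDOW_SECONDS):
    """Build an 11-element feature vector for one time window.

    packets: iterable of (timestamp, size_bytes, direction, remote_ip),
             with direction equal to "up" or "down".
    start:   timestamp at which the window begins.
    """
    up_pkts, down_pkts = [0] * n, [0] * n
    up_bytes, down_bytes = [0] * n, [0] * n
    down_by_ip = defaultdict(int)
    for ts, size, direction, remote_ip in packets:
        if not start <= ts < start + n:
            continue  # packet falls outside this window
        sec = int(ts - start)
        if direction == "up":
            up_pkts[sec] += 1
            up_bytes[sec] += size
        else:
            down_pkts[sec] += 1
            down_bytes[sec] += size
            down_by_ip[remote_ip] += size

    # Per-second total traffic, used for the peak/stable division.
    total = [u + d for u, d in zip(up_bytes, down_bytes)]
    avg = mean(total)
    peak_bytes = sum(t for t in total if avg > 0 and t > 1.6 * avg)
    stable_secs = sum(1 for t in total if avg > 0 and 0.6 * avg <= t <= 1.4 * avg)

    return [
        sum(down_pkts),                                         # downlink packet count
        sum(up_pkts),                                           # uplink packet count
        sum(down_bytes),                                        # downlink data volume
        sum(up_bytes),                                          # uplink data volume
        safe_ratio(sum(up_pkts), sum(down_pkts)),               # packet count ratio
        safe_ratio(sum(up_bytes), sum(down_bytes)),             # data volume ratio
        safe_ratio(pvariance(up_pkts), pvariance(down_pkts)),   # packet count variance ratio
        safe_ratio(pvariance(up_bytes), pvariance(down_bytes)), # data volume variance ratio
        sum(1 for b in down_by_ip.values() if b >= BIG_DOWNLINK_BYTES),
        safe_ratio(peak_bytes, sum(total)),                     # share of bytes in peak region
        stable_secs / n,                                        # share of seconds in stable region
    ]
```

Because only packet sizes, directions, and addresses are used, the vector can be built from header information alone, which matches the header-only design described in the text.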
The off-line training step of the support vector machine is as follows:
(1) Packets are captured from the network line using the libpcap library on a Linux system; the flow of calls to the libpcap functions is shown in Fig. 3. By parsing the packet headers of each layer, information for each layer of the Open Systems Interconnection reference model (OSI/RM) is obtained, such as the MAC addresses at the data link layer, the source and destination IP at the IP layer, and the port numbers and protocol at the transport layer.
(2) Simple statistics are collected on the packets to obtain the five-tuple of each packet: source address, destination address, source port, destination port, and transport layer protocol (such as TCP/UDP), together with the packet length and packet direction (uplink or downlink).
(3) Samples from a stable network environment are manually selected from the large volume of data obtained, and the application category of each sample is labeled; the m = 6 application types can be labeled with the numbers 1, 2, 3, 4, 5, 6.
(4) Using the "time window method", the d = 11 feature values within the time window are generated from the simple packet statistics.
(5) The sample features are trained using the support vector machine method. The support vector machine constructs the optimal separating hyperplane and yields the decision function $f(x) = \operatorname{sgn}\left(\sum_{i} \alpha_i y_i K(x_i, x) + b\right)$, where $(x_i, y_i)$ are the samples chosen during training, $\alpha_i$ are the Lagrange multipliers, $K(x_i, x)$ is the inner-product kernel function, for which the radial basis function (RBF) is selected, $i = 1, \ldots$ indexes the training samples, and $b$ is the bias of the separating hyperplane. The decision function $f(x)$ constitutes the classification rules and classifier model generated by off-line training of the support vector machine. Reliable and stable kernel function parameters and penalty parameters for the support vector machine can be obtained by cross-validation: the training samples are divided into K subsamples, one subsample is held out as the data for validating the model, and the other K-1 subsamples are used for training; this is repeated K times so that each subsample is used for validation exactly once, and the K results are averaged at the end.
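As an illustration of step (5), the sketch below evaluates the standard SVM decision function with an RBF kernel in plain Python. It assumes the common RBF form K(x, z) = exp(-gamma * ||x - z||^2) and assigns a sum of exactly zero to the positive class; the parameter name `gamma`, the function names, and the toy support vectors in the usage note are hypothetical, and the training that produces the multipliers and bias is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel, assumed here as K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, support_vectors, labels, alphas, b, gamma):
    """Evaluate sgn(sum_i alpha_i * y_i * K(x_i, x) + b).

    support_vectors, labels (+1 or -1) and alphas come from training;
    b is the hyperplane bias. Returns +1 or -1 (a zero sum maps to +1,
    which is an assumption of this sketch).
    """
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1
```

With two toy support vectors `[0, 0]` (label +1) and `[2, 2]` (label -1), equal multipliers, zero bias and `gamma = 0.5`, a query at `[0, 0]` is assigned +1 and a query at `[2, 2]` is assigned -1. The decision function itself separates two classes; how it is extended to the six application types is not specified in the text.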
The on-line real-time classification step of the support vector machine is as follows:
(1) Packets are captured from the network line using the libpcap library on a Linux system; the flow of calls to the libpcap functions is shown in Fig. 3. By parsing the packet headers of each layer, information for each layer of the Open Systems Interconnection reference model (OSI/RM) is obtained, such as the MAC addresses at the data link layer, the source and destination IP at the IP layer, and the port numbers and protocol at the transport layer.
(2) Simple statistics are collected on the packets to obtain the five-tuple of each packet: source address, destination address, source port, destination port, and transport layer protocol (such as TCP/UDP), together with the packet length and packet direction (uplink or downlink).
(3) Using the "time window method", the d = 11 feature values within the time window are generated from the simple packet statistics.
(4) Using the classification rules and classification model generated in step (5) of the off-line training step of the support vector machine, namely the decision function $f(x) = \operatorname{sgn}\left(\sum_{i} \alpha_i y_i K(x_i, x) + b\right)$, the sample features are classified and the identification result is obtained. The specific process is shown in Fig. 4: the 11 features of the network flow are input, and the type number of the corresponding network application is output.
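Steps (1) and (2) rely on reading header fields only. The sketch below shows how a five-tuple and packet length could be extracted from the raw bytes of an IPv4 packet with Python's standard `struct` and `socket` modules. It does not use or represent the libpcap API; it assumes the link-layer header has already been removed, and it handles only IPv4 with TCP or UDP. The function name is hypothetical.

```python
import socket
import struct

PROTOCOLS = {6: "TCP", 17: "UDP"}  # IP protocol numbers handled by this sketch


def parse_ipv4_packet(data):
    """Extract (src, dst, sport, dport, proto, total_length) from raw bytes.

    data must start at the IPv4 header (link-layer header already stripped).
    Returns None for packets that are not IPv4 or not TCP/UDP.
    """
    if len(data) < 20 or data[0] >> 4 != 4:
        return None
    ihl = (data[0] & 0x0F) * 4            # IPv4 header length in bytes
    proto = PROTOCOLS.get(data[9])
    if ihl < 20 or proto is None or len(data) < ihl + 4:
        return None
    total_length = struct.unpack("!H", data[2:4])[0]
    src = socket.inet_ntoa(data[12:16])
    dst = socket.inet_ntoa(data[16:20])
    # TCP and UDP both start with source port and destination port.
    sport, dport = struct.unpack("!HH", data[ihl:ihl + 4])
    return src, dst, sport, dport, proto, total_length
```

Only the first bytes of each packet are inspected, so the payload is never examined, which is consistent with the header-only feature extraction the method describes.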
Fig. 5 gives the accuracy of the network traffic identification method. As Fig. 5 shows, for the 6 network applications, three machine learning algorithms, namely the support vector machine (SVM), the standard back-propagation neural network (standard BP), and the back-propagation neural network optimized by particle swarm optimization (BP-PSO), were each used to train on and identify network traffic. Comparative analysis shows that the identification accuracy of all three algorithms increases as the number of training samples increases; both SVM and BP-PSO clearly outperform standard BP, with identification accuracy that is both high and stable; in particular, the identification accuracy of the SVM algorithm remains above 98% even with small samples, which demonstrates good effectiveness.
Fig. 6 gives the time the network traffic identification method needs to generate the classifier model. Analysis of the model generation time of the three algorithms in Fig. 6 shows that the BP-PSO modeling time is far greater than that of SVM and standard BP and requires a very large amount of computation; the modeling time of SVM is the shortest of the three, completing within 0.1 s, which demonstrates good feasibility.

Claims (2)

1. A network traffic identification method based on a support vector machine, comprising an off-line training step of the support vector machine and an on-line real-time classification step of the support vector machine:
The off-line training step of the support vector machine comprises:
(1) capturing packets from the network line using a packet capture tool;
(2) collecting statistics on the packets to obtain the packet count, packet length, source address, destination address, transport layer protocol, and uplink or downlink direction of the network flow;
(3) sampling from the obtained data, selecting samples taken while the network applications are running normally, and labeling the application category of each sample;
(4) according to the "time window method", starting from an arbitrary point in time, setting a period of time and, according to how far the network traffic continuously collected in this period deviates from the average, designating traffic above 1.6 times the average as the "peak region" and traffic within 0.6 to 1.4 times the average as the "stable region", and thereby generating multiple feature values from the network traffic in the period;
(5) training on the sample features using the support vector machine method to generate classification rules and build a classifier model;
The on-line real-time classification step of the support vector machine comprises:
(1) capturing packets from the network line using a packet capture tool;
(2) collecting statistics on the packets to obtain the packet count, packet length, source address, destination address, transport layer protocol, and uplink or downlink direction of the network flow;
(3) generating multiple feature values using the same method as step (4) of the off-line training step of the support vector machine;
(4) using the classification rules and classifier model generated in step (5) of the off-line training step of the support vector machine to classify the feature values of the network flow and obtain the identification result.
2. The network traffic identification method based on a support vector machine according to claim 1, characterized in that the multiple feature values in step (4) of the off-line training step and in step (3) of the on-line real-time classification step include downlink packet count, uplink packet count, downlink data volume, uplink data volume, uplink/downlink packet count ratio, uplink/downlink data volume ratio, uplink/downlink packet count variance ratio, uplink/downlink data volume variance ratio, number of IPs with a large downlink data volume, proportion of data volume in the peak region, and proportion of samples in the stable region.
CN201410313090.XA 2014-07-02 2014-07-02 Real-time multi-application network flow identification method based on support vector machine Expired - Fee Related CN104052639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410313090.XA CN104052639B (en) 2014-07-02 2014-07-02 Real-time multi-application network flow identification method based on support vector machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410313090.XA CN104052639B (en) 2014-07-02 2014-07-02 Real-time multi-application network flow identification method based on support vector machine

Publications (2)

Publication Number Publication Date
CN104052639A CN104052639A (en) 2014-09-17
CN104052639B true CN104052639B (en) 2017-03-22

Family

ID=51505023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410313090.XA Expired - Fee Related CN104052639B (en) 2014-07-02 2014-07-02 Real-time multi-application network flow identification method based on support vector machine

Country Status (1)

Country Link
CN (1) CN104052639B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104144089B (en) * 2014-08-06 2017-06-16 山东大学 Traffic identification method based on BP neural network
CN104468567B (en) * 2014-12-05 2018-03-06 南京邮电大学 System and method for identifying and mapping network multimedia service flows
CN104657747A (en) * 2015-01-30 2015-05-27 南京邮电大学 Online game stream classifying method based on statistical characteristics
CN105049277B (en) * 2015-06-08 2018-11-13 国家计算机网络与信息安全管理中心 Network traffic generation method based on data flow characteristics
CN105100091B (en) * 2015-07-13 2018-12-14 北京奇安信科技有限公司 Protocol identification method and system
CN105915396A (en) * 2016-06-20 2016-08-31 中国联合网络通信集团有限公司 Home network traffic recognition system and method
CN106953854B (en) * 2016-12-15 2019-10-18 中国电子科技集团公司第三十研究所 Method for building a darknet traffic identification model based on SVM machine learning
CN110519177B (en) * 2018-05-22 2022-01-21 华为技术有限公司 Network traffic identification method and related equipment
CN109309630B (en) * 2018-09-25 2021-09-21 深圳先进技术研究院 Network traffic classification method and system and electronic equipment
CN109660656A (en) * 2018-11-20 2019-04-19 重庆邮电大学 Intelligent terminal application identification method
CN111371689B (en) * 2018-12-25 2022-03-18 上海大学 TCP congestion control version identification method and device based on deep learning
CN111131073A (en) * 2020-01-02 2020-05-08 深圳市高德信通信股份有限公司 Network traffic classification processing system
CN112134871A (en) * 2020-09-16 2020-12-25 天津大学 Abnormal flow detection device and method for energy internet information support network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101800744A (en) * 2010-02-03 2010-08-11 中国人民解放军国防科学技术大学 Extraction method for packet size distribution characteristics of P2P-TV platform and a P2P-TV platform identification method and an identification system based on same
US8040798B2 (en) * 2008-09-25 2011-10-18 Microsoft Corporation Discovering communication rules in a network trace
EP2584496A1 (en) * 2011-10-20 2013-04-24 Telefonaktiebolaget L M Ericsson AB (Publ) Creating and using multiple packet traffic profiling models to profile packet flows
CN103312565A (en) * 2013-06-28 2013-09-18 南京邮电大学 Independent learning based peer-to-peer (P2P) network flow identification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050060295A1 (en) * 2003-09-12 2005-03-17 Sensory Networks, Inc. Statistical classification of high-speed network data through content inspection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Internet流量识别技术研究";赵国锋,吉朝明,徐川;《小型微型计算机系统》;20110106;第31卷(第8期);1514 - 1520 *
"Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world IP networks";Thuy T.T. Nguyen,Grenville Armitage;《Proceedings.2006 31st IEEE Conference on Local Computer Networks》;20061116;369-376 *
"基于支持向量机的流量分类方法";林森,徐鹏,刘琼;《计算机应用研究》;20080924;第25卷(第8期);2488 - 2490 *

Also Published As

Publication number Publication date
CN104052639A (en) 2014-09-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170322
