CN107222343A - Dedicated network flow classification method based on support vector machines - Google Patents


Info

Publication number
CN107222343A
Authority
CN
China
Prior art keywords
stream
classification
network
svms
sample
Legal status
Pending
Application number
CN201710410330.1A
Other languages
Chinese (zh)
Inventor
于卫波
王海
米志超
董超
牛大伟
郭晓
李艾静
Current Assignee
PLA University of Science and Technology
Original Assignee
PLA University of Science and Technology
Priority date
2017-06-03
Filing date
2017-06-03
Publication date
2017-09-29
Application filed by PLA University of Science and Technology
Priority to CN201710410330.1A
Publication of CN107222343A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/164 Adaptation or special uses of UDP protocol

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a dedicated network flow classification method based on support vector machines (SVMs), comprising the following steps: capturing individual network packets from the network and grouping them into separate flows according to their features; extracting features from each flow and describing each flow as a vector; after the flows of each class have been labeled, generating a sample set and running a support vector machine algorithm on it to classify the samples, in which a radial basis function kernel maps the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and a one-against-N extension method extends binary classification to multi-class classification. The invention addresses problems in dedicated networks based on connectionless protocols, namely the poor adaptability of current traffic classification algorithms and the heavy computation and limited accuracy of multi-class extensions, and has good prospects for wide application.

Description

Dedicated network flow classification method based on support vector machines
Technical field
The invention belongs to the field of network measurement and analysis, and specifically relates to a dedicated network flow classification method based on support vector machines.
Background art
With the continuous development of network technology, network bandwidth has grown sharply and the number of applications on networks has increased rapidly, with new categories of applications joining the network every day, so the demands of network management and network analysis keep rising. As the basis of user behavior analysis, network traffic classification and identification technology is finding ever wider application, not only in network management and analysis but also in network security and quality-of-service assurance.
Network traffic identification and classification touches every aspect of service behavior, and each kind of network traffic has its own behavioral characteristics. With the continual emergence of new network applications and new application-layer protocols, the complexity of network traffic keeps increasing, and its variable, dynamic and heterogeneous nature becomes ever more apparent. Research on traffic classification is currently very active, and existing network traffic classification methods fall mainly into the following categories:
(1) Port-mapping-based traffic classification. This approach identifies the type of a network application mainly from the fixed ports assigned to applications, first by IANA and later by ICANN. It could recognize most applications on the early Internet, but with the emergence of new application types, some applications use dynamic or non-well-known ports, while a growing number of P2P applications and some illicit programs deliberately use well-known ports in order to evade firewalls and network supervisors. As a result, the ability of port-based methods to identify network traffic has become increasingly weak.
(2) Payload-based traffic classification. Flow payloads are analyzed to derive payload signatures, which are added to a signature database; during analysis, the application-layer content features of a flow are extracted and compared against the database to classify the traffic. This approach achieves high classification accuracy, but it can only process known, unencrypted traffic and may also raise security concerns.
(3) Behavior-based traffic classification. This approach analyzes the behavior of hosts at the transport layer, such as the order in which TCP and UDP are used, the number of IP addresses and rate changes, in order to distinguish traffic types. However, it cannot handle certain application types, such as P2P and user-defined protocols.
Most research has targeted the Internet, with the goal of better monitoring and managing it and keeping it running healthily. Internet traffic types are diverse, but TCP-based traffic accounts for the vast majority, so features can be extracted from the TCP workflow and traffic can be classified with behavior-based methods. In some dedicated networks, however, such as sensor networks, military networks and emergency disaster-relief networks, factors such as channel bandwidth, reliability and traffic characteristics mean that most traffic is carried over UDP. The application-layer protocols carried above UDP are also mostly specialized protocols, most of which lack open standards. Moreover, because users in these networks are independent of one another and lack unified coordination and planning, protocols are not exclusive: application-layer identification fields, port numbers and the like may be reused by different protocols. In addition, the special nature of these networks imposes confidentiality requirements on the transmitted content, which is usually protected with encryption algorithms. These characteristics prevent the three classification methods above from being used in such networks.
Summary of the invention
The object of the invention is to provide a dedicated network flow classification method based on support vector machines that applies the support vector machine classification method from machine learning to UDP-based services in dedicated networks, and that simplifies the extension of support vector machines from binary to multi-class classification.
The technical solution that achieves this object is a dedicated network flow classification method based on support vector machines, comprising the following steps:
Step 1: capture individual network packets from the network and group them into separate flows according to their features;
Step 2: extract features from each flow and describe each flow in vector form;
Step 3: after the flows of each class have been labeled, generate a sample set and run a support vector machine algorithm on it to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and using a one-against-N extension method to extend binary classification to multi-class classification.
Compared with the prior art, the notable advantages of the invention are as follows:
The invention provides a traffic classification method for dedicated networks whose transport-layer protocol is UDP. The classification algorithm is simple to implement, improves the efficiency of extending support vector machines from binary to multi-class classification, and reduces the dimension of the support vectors.
Brief description of the drawings
Fig. 1 is a schematic diagram of the one-against-N multi-class extension method of the invention.
Fig. 2 is a flowchart of the execution of the classification algorithm.
Detailed description
A dedicated network flow classification method based on support vector machines comprises the following steps:
Step 1: capture individual network packets from the network and group them into separate flows according to their features;
Step 2: extract features from each flow and describe each flow in vector form;
Step 3: after the flows of each class have been labeled, generate a sample set and run a support vector machine algorithm on it to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and using a one-against-N extension method to extend binary classification to multi-class classification.
Further, in step 1, packets with the same five-tuple that appear within a continuous time period are regarded as belonging to the same flow; if the interval between two adjacent packets with the same five-tuple exceeds a threshold relative to the other packet gaps in the flow, the previous flow is considered to have ended. Here the five-tuple is <source address, destination address, source port, destination port, protocol type>.
Further, in step 2, the first n bytes of the application-layer payload of each packet are taken as the objects of feature statistics, and the mean, variance, maximum and minimum of each byte value are computed.
Further, the one-against-N extension method is specifically as follows: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between each class and N other classes, where the N classes are randomly selected from the remaining M-1 classes and N < (K-1)/2, K being the total number of classes in the network; classification decisions are then made following the one-against-rest method.
The invention is further described below with reference to the drawings.
Before the traffic in a network can be classified, individual network packets must be captured from the network and grouped into separate flows according to certain features. In the traditional Internet, the transport-layer protocol of most traffic is TCP, so flow identification usually defines the boundaries of a flow by the TCP connection setup and teardown process, and uses the <source address, destination address, source port, destination port, protocol type> five-tuple to decide whether the packets within those boundaries belong to the same flow. For network traffic whose transport-layer protocol is UDP, there is no strict connection setup and teardown, so the boundaries of a flow usually cannot be defined by packet type. Most traffic does, however, have a duration, so traffic in this kind of dedicated network can be grouped by regarding packets with the same five-tuple that appear within a continuous time period as the same flow; if the interval between two adjacent packets with the same five-tuple is too large compared with the other packet gaps in the flow, the previous flow is considered to have ended.
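The flow-separation rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `Packet`, `Flow` and `assemble_flows` are hypothetical, and a single fixed idle-gap threshold (`idle_gap`) stands in for the comparison against the other packet gaps in the flow, whose exact form the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Five-tuple: (source address, destination address, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    ts: float        # capture timestamp in seconds
    key: FiveTuple   # five-tuple of the packet
    payload: bytes   # application-layer payload


@dataclass
class Flow:
    key: FiveTuple
    packets: List[Packet] = field(default_factory=list)


def assemble_flows(packets: List[Packet], idle_gap: float) -> List[Flow]:
    """Group packets into flows by five-tuple. A new flow starts when the gap
    since the previous packet with the same five-tuple exceeds idle_gap.
    Assumption: a fixed threshold replaces the patent's relative gap test."""
    active: Dict[FiveTuple, Flow] = {}
    finished: List[Flow] = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        flow = active.get(pkt.key)
        if flow is not None and pkt.ts - flow.packets[-1].ts > idle_gap:
            finished.append(flow)  # gap too large: the previous flow has ended
            flow = None
        if flow is None:
            flow = Flow(pkt.key)
            active[pkt.key] = flow
        flow.packets.append(pkt)
    finished.extend(active.values())  # flush flows still open at end of capture
    return finished
```

With `idle_gap=2.0`, two packets of one five-tuple at 0.0 s and 0.5 s followed by a third at 10.0 s yield two separate flows for that five-tuple.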
Once flow separation is complete, features are extracted from each flow and each flow is described as a vector. Regarding the dimension of the samples: in the support vector machine method, a higher sample dimension means more computation. In some cases more dimensions give more accurate classification results, but if the dimension is too large, many of the features are useless; they increase computational complexity without helping classification. Conventional methods usually reduce the feature dimension with techniques such as principal component analysis, but this requires processing and analyzing a large amount of sample data. In 2004 Andrew Moore proposed 248 traffic features for network flows that can describe the characteristics of the various flows in a network, but many of them target traditional Internet services, in particular traffic whose transport-layer protocol is TCP. In some non-traditional networks, such as certain dedicated networks, services are all based on UDP and either lack a connection setup process or perform connection setup in the application-layer protocol. A new feature construction method is therefore used; the following describes how features are selected for bursty applications whose transport-layer protocol is UDP:
The first n bytes of the application-layer payload of each packet are taken as the objects of feature statistics. Each byte can take 256 possible values (0 to 255), and for each byte position the mean, variance, maximum and minimum of its value are computed. This yields 4n feature values, so the support vector has 4n dimensions. Some of these dimensions are key dimensions closely related to discrimination, while other features have little influence on distinguishing classes; in practice, the dimension can be reduced with PCA.
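A minimal sketch of the 4n-dimensional payload feature vector follows. It assumes that the statistics for each of the first n byte positions are computed across all packets of one flow, that packets shorter than n bytes are zero-padded, and that population variance is used; none of these details is fixed by the text, and `payload_byte_features` is a hypothetical name.

```python
from statistics import fmean, pvariance
from typing import List


def payload_byte_features(payloads: List[bytes], n: int) -> List[float]:
    """Return a 4n-dimensional vector: for each of the first n byte positions,
    the mean, variance, maximum and minimum of that byte's value (0..255)
    across the flow's packets."""
    features: List[float] = []
    for i in range(n):
        # Assumption: a packet shorter than i+1 bytes contributes 0 at position i.
        values = [p[i] if i < len(p) else 0 for p in payloads]
        # Assumption: population variance, so single-packet flows give 0.
        features += [fmean(values), pvariance(values), max(values), min(values)]
    return features
```

For a flow with payloads `b"\x01\x02"` and `b"\x03"` and n = 2, this gives mean 2, variance 1, max 3, min 1 for the first byte position and mean 1, variance 1, max 2, min 0 for the second.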
The flows of each class are labeled manually to generate a sample set containing at least M flows per class; with K classes in total, the sample set therefore contains at least MK samples. The support vector machine algorithm is then run on the sample set to classify the samples, using the radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples. Traffic in a network, however, falls into far more than two classes, so binary classification must be extended to multi-class classification. The commonly used approaches to multi-class SVM classification are mainly one-against-one and one-against-rest.
The one-against-rest method performs binary classification between each single class and the flows of all other classes, and assigns a flow to the class with the largest decision value. Each flow therefore requires M binary decisions, where M is the number of traffic classes in the network. This method leaves some ambiguous regions, and the training data of each binary classifier are class-imbalanced, so it is not well suited to discriminating Internet traffic.
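The one-against-rest decision rule can be sketched as below. The function name and the `deciders` mapping are hypothetical; in practice each entry would be the decision function of a binary SVM trained on one class versus all others, but any callable returning a signed score works for illustration.

```python
from typing import Callable, Dict, Sequence

# A decision function returns the signed output of one trained binary classifier.
DecisionFn = Callable[[Sequence[float]], float]


def one_vs_rest_predict(x: Sequence[float], deciders: Dict[str, DecisionFn]) -> str:
    """One-against-rest rule: deciders[c] separates class c from all other
    classes; with M classes this takes M evaluations, and the class with the
    largest decision value wins."""
    return max(deciders, key=lambda c: deciders[c](x))
```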
The one-against-one method instead builds a binary classifier between every pair of classes and assigns a flow to the class that wins the most decisions. Each flow therefore requires M(M-1)/2 binary decisions, where M is the number of traffic classes in the network. This method overcomes the problems of one-against-rest, and most current SVM-based traffic classification uses it, but when the number of classes is large, the number of discriminant functions and the amount of computation become large.
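For comparison, here is a sketch of one-against-one max-wins voting, with a helper that counts the M(M-1)/2 pairwise decisions per flow. The names are hypothetical, and the pairwise decision functions are assumed to come from binary SVMs trained on each pair of classes.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

PairFn = Callable[[Sequence[float]], float]


def one_vs_one_predict(x: Sequence[float], classes: List[str],
                       pair_deciders: Dict[Tuple[str, str], PairFn]) -> str:
    """One-against-one rule with max-wins voting: pair_deciders[(a, b)](x) > 0
    is a vote for a, otherwise for b. Ties are not resolved specially here."""
    votes: Counter = Counter()
    for a, b in combinations(classes, 2):
        votes[a if pair_deciders[(a, b)](x) > 0 else b] += 1
    return votes.most_common(1)[0][0]


def pairwise_evaluations(m: int) -> int:
    """Binary decisions needed per flow by one-against-one with m classes."""
    return m * (m - 1) // 2
```

With 4 classes, one-against-one needs 6 decisions per flow against 4 for one-against-rest; the gap widens quadratically as the number of classes grows.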
The present invention proposes a one-against-N multi-class extension method:
Basic idea: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between each class and N other classes, where the N classes are randomly selected from the remaining M-1 and N < (K-1)/2; classification decisions are then made following the one-against-rest method. This avoids the class-imbalance problem of one-against-rest and reduces the heavy computation of one-against-one.
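The training side of the one-against-N scheme can be sketched as follows: for every class, N opponent classes are drawn at random and a binary training set (the class versus its opponents) is assembled for that class's hyperplane. This sketch assumes M equals K (all classes in the network are modeled), leaves the SVM training itself out, and uses hypothetical function names.

```python
import random
from typing import Dict, List, Sequence, Tuple


def one_vs_n_plan(classes: List[str], n: int, rng: random.Random) -> Dict[str, List[str]]:
    """For each class, randomly pick n opponent classes from the other M-1.
    Assumption: M equals K, the total number of classes in the network."""
    k = len(classes)
    if not 1 <= n < (k - 1) / 2:
        raise ValueError("one-against-N requires 1 <= N < (K-1)/2")
    return {c: rng.sample([o for o in classes if o != c], n) for c in classes}


def binary_training_set(samples: Dict[str, List[Sequence[float]]], positive: str,
                        opponents: List[str]) -> Tuple[List[Sequence[float]], List[int]]:
    """Training data for one class's hyperplane: +1 for the class's own
    samples, -1 for the samples of its n opponent classes."""
    X: List[Sequence[float]] = list(samples[positive])
    y: List[int] = [1] * len(X)
    for o in opponents:
        X.extend(samples[o])
        y.extend([-1] * len(samples[o]))
    return X, y
```

Because each binary problem involves only 1 + N classes instead of M, each classifier is trained on less data than in one-against-rest, and only M classifiers are needed instead of M(M-1)/2.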
After classification, the classified data are extracted to partially update or extend the sample library, so that the discriminant functions can be updated after a certain period of time.
Fig. 1 and Fig. 2 show the multi-class classification process for a network with 4 traffic classes. If every decision started from class 1, class 4 would always require the most computation, which would be unfair. The computation entry point is therefore rotated round-robin: the decision for the first flow starts from (1, N), the decision for the next flow starts from (2, N), and so on, so that the decisions for all flow types are statistically fair.
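The round-robin entry point can be sketched as below. The text does not say how a decision sequence terminates, so this sketch assumes that the per-class (i, N) decision functions are tried in rotated order and evaluation stops at the first positive result, falling back to the largest decision value if none is positive. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

DecisionFn = Callable[[Sequence[float]], float]


def evaluation_order(t: int, classes: List[str]) -> List[str]:
    """Rotated entry point: flow number t starts at class t mod M, then wraps."""
    m = len(classes)
    return [classes[(t + j) % m] for j in range(m)]


def round_robin_classify(flows: List[Sequence[float]], classes: List[str],
                         deciders: Dict[str, DecisionFn]) -> List[str]:
    """Classify flows in arrival order; deciders[c] is the (c, N) classifier.
    Assumptions: stop at the first positive decision; if none is positive,
    choose the class with the largest decision value."""
    labels: List[str] = []
    for t, x in enumerate(flows):
        scores: Dict[str, float] = {}
        label = None
        for c in evaluation_order(t, classes):
            scores[c] = deciders[c](x)
            if scores[c] > 0:
                label = c
                break
        labels.append(label if label is not None else max(scores, key=scores.__getitem__))
    return labels
```

With 4 classes, successive flows start their decisions at classes 1, 2, 3, 4, 1, and so on, so no class is always evaluated last.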
The invention is a traffic classification method suitable for dedicated networks whose transport-layer protocol is UDP. It is based on the support vector machine learning algorithm and uses a one-against-N extension when extending the binary SVM algorithm to multi-class classification, improving classification accuracy while reducing computation. It also adopts a new feature definition based on application-layer data: statistical analysis of the first n bytes of the application-layer data forms the features of each flow. This removes the dependence of earlier features on the connection setup process, so the method better serves traffic analysis for networks using connectionless protocols. The invention addresses problems in dedicated networks based on connectionless protocols, namely the poor adaptability of current traffic classification algorithms and the heavy computation and limited accuracy of multi-class extensions, and has good prospects for wide application.
The invention is described in detail below with reference to an embodiment.
Embodiment
A dedicated network flow classification method based on support vector machines comprises the following steps:
First, extract data flows of known types from the network, compute their traffic features and build a sample library containing M samples of each flow type.
Second, run the SVM (support vector machine) learning method on the sample library to produce a library of one-against-N classification functions. Because flows that cannot be separated in a two-dimensional plane must still be classified, each flow vector is mapped into a higher-dimensional space with a kernel function. The invention recommends the radial basis function (RBF) kernel, which is suitable for low-dimensional and high-dimensional data and for small and large samples alike, and is currently one of the best-performing kernel functions.
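A minimal sketch of the RBF kernel and of the decision value of a trained binary kernel SVM built on it. The support vectors, dual coefficients (alpha_i times y_i), bias and `gamma` would come from training, which is not shown; the text does not specify a value for `gamma`, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Sequence[float], support_vectors: List[Sequence[float]],
                 dual_coefs: List[float], bias: float, gamma: float) -> float:
    """Decision value of a trained binary kernel SVM:
    f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; its sign gives the class.
    The parameters are assumed to come from an SVM training step not shown."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias
```

The kernel equals 1 for identical vectors and decays toward 0 with distance, so `gamma` controls how local each support vector's influence is.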
Third, collect a certain amount of new data, forming a data set An.
Fourth, group the data in the data set by five-tuple (source address, destination address, source port, destination port, protocol type), and then assign the packets to different flows.
Fifth, first classify the flows by deterministic criteria. These criteria are agreed in advance by the system; for example, if a specific port number is reserved exclusively for one service and all other traffic is forbidden to use it, the type of a flow can be determined directly from that port number.
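The deterministic pre-classification of the fifth step might look like the following sketch. The port table is purely hypothetical (the text gives no concrete ports or service names), and flows for which no rule matches return `None` so they can be passed on to the SVM stage.

```python
from typing import Dict, Optional, Tuple

FiveTuple = Tuple[str, str, int, int, str]

# Hypothetical rule table: (protocol, port) pairs reserved system-wide for one service.
RESERVED_PORTS: Dict[Tuple[str, int], str] = {
    ("UDP", 9001): "voice",
    ("UDP", 9002): "telemetry",
}


def preclassify(key: FiveTuple,
                reserved: Dict[Tuple[str, int], str] = RESERVED_PORTS) -> Optional[str]:
    """Return a label if the flow uses a reserved port, else None so that the
    flow is handed to the SVM stage. The destination port is checked first."""
    _src, _dst, sport, dport, proto = key
    for port in (dport, sport):
        label = reserved.get((proto, port))
        if label is not None:
            return label
    return None
```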
Sixth, for the flows that could not be classified in the fifth step, compute the traffic features of each flow to form a feature vector.
Seventh, following the method shown in Fig. 1, feed the feature vector of each flow to be classified into the different classification functions to obtain the flow's class.
Eighth, sample the classified flows: replace some of the flows in the sample set with selected newly classified flows, or, to improve the accuracy of the classification functions, enlarge the sample set by extending the feature library with subsequently classified flows.
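The sample-library update of the eighth step can be sketched as follows, assuming a per-class library of feature vectors and a caller-chosen replacement fraction; the function name and the `replace_fraction` and `grow` parameters are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def refresh_samples(samples: Dict[str, List[Sequence[float]]],
                    new: Dict[str, List[Sequence[float]]],
                    replace_fraction: float, grow: bool,
                    rng: random.Random) -> None:
    """Update the per-class sample library in place with newly classified flows.
    grow=False: overwrite a random subset of existing samples (size unchanged).
    grow=True: append the new samples to enlarge the library."""
    for label, vectors in new.items():
        pool = samples.setdefault(label, [])
        if grow or not pool:
            pool.extend(vectors)
            continue
        # Assumption: at most replace_fraction of the existing samples are replaced.
        k = min(len(vectors), int(len(pool) * replace_fraction))
        for idx, vec in zip(rng.sample(range(len(pool)), k), vectors):
            pool[idx] = vec
```

Replacing keeps the library at a fixed size and lets it track drifting traffic; growing trades memory and training time for a larger sample set.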
Ninth, return to the third step and continue classifying subsequent traffic.

Claims (4)

1. A dedicated network flow classification method based on support vector machines, characterized by comprising the following steps:
Step 1: capture individual network packets from the network and group them into separate flows according to their features;
Step 2: extract features from each flow and describe each flow in vector form;
Step 3: after the flows of each class have been labeled, generate a sample set and run a support vector machine algorithm on it to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and using a one-against-N extension method to extend binary classification to multi-class classification.
2. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that, in step 1, packets with the same five-tuple that appear within a continuous time period are regarded as the same flow; if the interval between two adjacent packets with the same five-tuple exceeds a threshold relative to the other packet gaps in the flow, the previous flow is considered to have ended; wherein the five-tuple is <source address, destination address, source port, destination port, protocol type>.
3. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that, in step 2, the first n bytes of the application-layer payload of each packet are selected as the objects of feature statistics, and the mean, variance, maximum and minimum of each byte value are computed.
4. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that the one-against-N extension method is specifically: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between each class and N other classes, wherein the N classes are randomly selected from the remaining M-1 classes and N < (K-1)/2, K being the total number of classes in the network; classification decisions are then made following the one-against-rest method.
CN201710410330.1A 2017-06-03 2017-06-03 Dedicated network stream sorting technique based on SVMs Pending CN107222343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710410330.1A CN107222343A (en) 2017-06-03 2017-06-03 Dedicated network stream sorting technique based on SVMs

Publications (1)

Publication Number Publication Date
CN107222343A true CN107222343A (en) 2017-09-29

Family

ID=59947303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710410330.1A Pending CN107222343A (en) 2017-06-03 2017-06-03 Dedicated network stream sorting technique based on SVMs

Country Status (1)

Country Link
CN (1) CN107222343A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729952A (en) * 2017-11-29 2018-02-23 新华三信息安全技术有限公司 A kind of traffic flow classification method and device
CN110011931A (en) * 2019-01-25 2019-07-12 中国科学院信息工程研究所 A kind of encryption traffic classes detection method and system
WO2019179473A1 (en) * 2018-03-23 2019-09-26 Telefonaktiebolaget Lm Ericsson (Publ) Methods and devices for chunk based iot service inspection

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025862A (en) * 2007-02-12 2007-08-29 吉林大学 Video based mixed traffic flow parameter detecting method
CN102315974A (en) * 2011-10-17 2012-01-11 北京邮电大学 Stratification characteristic analysis-based method and apparatus thereof for on-line identification for TCP, UDP flows
CN102420833A (en) * 2011-12-27 2012-04-18 华为技术有限公司 Network protocol identification method, device and system
CN104156733A (en) * 2014-08-12 2014-11-19 中国人民解放军理工大学 Foundation cloud form identification method based on multiscale structure characteristics
US20160188876A1 (en) * 2014-12-30 2016-06-30 Battelle Memorial Institute Anomaly detection for vehicular networks for intrusion and malfunction detection
CN106529576A (en) * 2016-10-20 2017-03-22 天津大学 Piano score difficulty recognition algorithm based on improved measure learning support vector machine

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729952A (en) * 2017-11-29 2018-02-23 新华三信息安全技术有限公司 A kind of traffic flow classification method and device
CN107729952B (en) * 2017-11-29 2021-04-30 新华三信息安全技术有限公司 Service flow classification method and device
WO2019179473A1 (en) * 2018-03-23 2019-09-26 Telefonaktiebolaget Lm Ericsson (Publ) Methods and devices for chunk based iot service inspection
CN110011931A (en) * 2019-01-25 2019-07-12 中国科学院信息工程研究所 A kind of encryption traffic classes detection method and system
CN110011931B (en) * 2019-01-25 2020-10-16 中国科学院信息工程研究所 Encrypted flow type detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170929