CN107222343A - Dedicated network flow classification method based on support vector machines - Google Patents
- Publication number
- CN107222343A
- Authority
- CN
- China
- Prior art keywords
- stream
- classification
- network
- svms
- sample
- Legal status
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/164—Adaptation or special uses of UDP protocol
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention relates to a dedicated network flow classification method based on support vector machines (SVMs), comprising the following steps: capturing individual network packets from the network and separating them into individual flows according to their features; extracting features from each flow and describing each flow as a vector; after the flows of each class have been labeled, generating a sample set and running a support vector machine algorithm on it to classify the samples, where a radial basis function kernel maps the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and a one-to-N extension method extends binary classification to multi-class classification. The invention addresses the poor adaptability of current traffic classification algorithms in dedicated networks based on connectionless protocols, as well as the heavy computation and limited accuracy of multi-class extensions, and has good prospects for wider application.
Description
Technical field
The invention belongs to the field of network measurement and analysis, and in particular relates to a dedicated network flow classification method based on support vector machines.
Background
With the continuous development of network technology, network bandwidth has grown sharply and the number of applications on networks has increased rapidly, with new kinds of application joining the network all the time, so the demands of network management and network analysis keep rising. As the basis of user behavior analysis, network traffic classification and identification is increasingly widely used, not only in network management and analysis but also in network security and quality-of-service assurance.
Network traffic identification and classification touches every aspect of service behavior, and each kind of network traffic has its own behavioral characteristics. As personalized network applications and new application-layer protocols keep emerging, network traffic grows ever more complex, and its variable, dynamic and heterogeneous nature becomes increasingly evident. Research on traffic classification is currently very active, and existing network traffic classification methods fall mainly into the following categories:
(1) Port-mapping-based traffic classification. This method identifies the type of a network application from the fixed ports assigned to applications, first by IANA and later by ICANN. It could recognize most applications in the early days of the Internet, but as new types of network application appeared, some applications began to use dynamic or non-well-known ports. For example, more and more P2P applications, as well as some illicit programs trying to evade firewalls and network administrators, use ports that are well known for other services. As a result, port-mapping-based classification has become less and less capable of identifying network traffic.
(2) Payload-based traffic classification. By analyzing traffic payloads, signatures of valid payloads are formed and added to a signature database; during analysis, the application-layer content features of each flow are extracted and compared against the database, so that traffic is classified effectively. This method achieves high classification accuracy, but it can only process known, unencrypted traffic, and inspecting payloads may also raise security concerns.
(3) Behavior-feature-based traffic classification. This method analyzes the behavior of hosts at the transport layer, such as the order in which TCP and UDP are used, the number of IP addresses involved and changes in rate, in order to distinguish different traffic types. However, it cannot handle certain application types, such as P2P and user-defined protocols.
Most research targets the Internet, with the aim of better monitoring and managing it and keeping it running healthily. Traffic on the Internet is diverse, but TCP-based traffic accounts for the overwhelming majority, so features can be extracted from the TCP workflow and traffic can be classified with behavior-feature methods. In some dedicated networks, however, such as sensor networks, military networks and emergency disaster-relief networks, factors such as channel bandwidth, reliability and transmission performance mean that most traffic runs over UDP, and the application-layer protocols carried on top are mostly proprietary protocols without open standards. Moreover, because users in these networks are independent of one another and lack unified coordination and planning, protocols are not exclusive: application-layer identification fields, port numbers and so on may be reused by different protocols. In addition, the special nature of these networks imposes confidentiality requirements on the transmitted content, which is usually protected by encryption algorithms. Because of these characteristics, none of the three classification methods above can be used in such networks.
Summary of the invention
The object of the invention is to provide a dedicated network flow classification method based on support vector machines, which applies the support vector machine classification method from machine learning to UDP-based services in dedicated networks and simplifies the extension of the SVM from binary to multi-class classification.
The technical solution for achieving this object is a dedicated network flow classification method based on support vector machines, comprising the following steps:
Step 1, capture individual network packets from the network and separate them into individual flows according to their features;
Step 2, extract features from each flow and describe each flow in vector form;
Step 3, after the flows of each class have been labeled, generate a sample set, then run the support vector machine algorithm on the sample set to classify the samples: a radial basis function kernel maps the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and a one-to-N extension method extends binary classification to multi-class classification.
Compared with the prior art, the notable advantages of the invention are:
The invention provides a traffic classification method for dedicated networks whose transport-layer protocol is UDP. The classification algorithm is simple to implement, improves the efficiency of extending the SVM from binary to multi-class classification, and reduces the dimension of the support vectors.
Brief description of the drawings
Fig. 1 is a schematic diagram of the one-to-N multi-class extension method of the invention.
Fig. 2 is a flowchart of the execution of the classification algorithm.
Detailed description of the embodiments
A dedicated network flow classification method based on support vector machines comprises the following steps:
Step 1, capture individual network packets from the network and separate the packets into individual flows according to their features;
Step 2, extract features from each flow and describe each flow in vector form;
Step 3, after the flows of each class have been labeled, generate a sample set, then run the support vector machine algorithm on the sample set to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples, and using a one-to-N extension method to extend binary classification to multi-class classification.
Further, in step 1, packets with the same five-tuple that appear within a continuous time period are identified as belonging to the same flow; if the interval between two adjacent packets with that five-tuple, compared with the other packet gaps in the flow, exceeds a threshold, the previous flow is considered to have ended. Here the five-tuple is <source address, destination address, source port, destination port, protocol type>.
Further, in step 2, the first n bytes of the application-layer payload of each packet are selected as the objects of feature statistics, and the mean, variance, maximum and minimum of each byte value are taken.
Further, the one-to-N extension method is specifically as follows: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between any one class and N other classes, where the N classes are chosen at random from the remaining M-1 classes, with N < (K-1)/2 and K the total number of classes in the network; classification decisions are then made according to the one-to-many method.
The invention is further described below with reference to the accompanying drawings.
Before the traffic in a network can be classified, individual network packets must be captured from the network and separated into individual flows according to certain features. In the traditional Internet the transport-layer protocol is mostly TCP, so flow identification usually uses the TCP connection setup and teardown process to define the boundaries of a flow, and the five-tuple <source address, destination address, source port, destination port, protocol type> determines whether the packets within those boundaries belong to the same flow. For traffic whose transport-layer protocol is UDP, there is no strict connection setup or teardown, so the boundaries of a flow usually cannot be defined from packet types. Most traffic nevertheless has a duration, so traffic in this kind of dedicated network can be grouped by treating packets with the same five-tuple that appear within a continuous time period as one flow; if the interval between two adjacent packets with that five-tuple is much larger than the other packet gaps in the flow, the previous flow is considered to have ended.
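The flow-separation rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `Packet` and `split_flows` are hypothetical names, and a fixed idle threshold `gap_threshold` stands in for the source's comparison with the other packet gaps of the flow.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative only."""
    ts: float        # capture timestamp in seconds
    src: str         # source address
    dst: str         # destination address
    sport: int       # source port
    dport: int       # destination port
    proto: str       # protocol type, e.g. "UDP"
    payload: bytes   # application-layer payload


def split_flows(packets: List[Packet], gap_threshold: float) -> List[List[Packet]]:
    """Group packets into flows by five-tuple.

    A packet joins the current flow of its five-tuple unless the time since
    that flow's last packet exceeds gap_threshold; then the previous flow is
    treated as ended and a new flow starts. Assumption: a fixed idle
    threshold replaces the source's comparison with the flow's other gaps.
    """
    flows: List[List[Packet]] = []
    current = {}  # five-tuple -> index of its most recent flow in `flows`
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        idx = current.get(key)
        if idx is None or pkt.ts - flows[idx][-1].ts > gap_threshold:
            flows.append([pkt])          # start a new flow for this five-tuple
            current[key] = len(flows) - 1
        else:
            flows[idx].append(pkt)       # continue the existing flow
    return flows
```

With a 2-second threshold, three packets of one five-tuple at 0.0 s, 0.5 s and 10.0 s yield two flows, because the 9.5 s gap ends the first one.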
After flow separation is complete, features must be extracted from each flow and each flow described as a vector. In the support vector machine method the dimension of the samples determines the amount of computation: the higher the dimension, the greater the computation. In some cases more dimensions give more accurate classification results, but if the dimension is too high, many features are useless; they increase computational complexity without helping the classification decision. Conventional methods usually reduce the feature dimension with techniques such as principal component analysis, but these require large amounts of sample data to be processed and analyzed. In 2004, Andrew Moore proposed 248 flow features that can describe the characteristics of the various flows in a network, but many of them target traditional Internet services, in particular traffic whose transport-layer protocol is TCP. In some non-traditional networks, such as certain dedicated networks, services are all based on UDP and either lack a connection setup process or set up connections through the application-layer protocol. A new feature construction method is therefore used; the following describes the feature selection method for bursty applications whose transport-layer protocol is UDP:
The first n bytes of the application-layer payload of each packet are selected as the objects of feature statistics; each byte can take a value from 0 to 255. The mean, variance, maximum and minimum of each byte value are taken as statistical features. This produces 4n feature values, so the support vectors have dimension 4n. Some of these dimensions are key dimensions closely related to the classification decision, while other features have little influence on distinguishing classes; in practical use, PCA can be applied for dimensionality reduction.
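The 4n-dimensional feature vector can be sketched as below, assuming the statistics are taken per byte position across all packets of one flow (which is what yields 4n values) and that payloads shorter than n bytes are zero-padded. The padding rule and the name `byte_features` are assumptions, not from the source; variance here is the population variance.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def byte_features(payloads: Sequence[bytes], n: int) -> List[float]:
    """Return a 4n-dimensional feature vector for one flow.

    For each byte position i < n, collect that byte from every packet's
    application-layer payload and append its mean, population variance,
    maximum and minimum. Assumption: missing bytes in short payloads count as 0.
    """
    if not payloads:
        raise ValueError("a flow needs at least one packet")
    features: List[float] = []
    for i in range(n):
        # Indexing a bytes object yields an int in 0..255.
        values = [p[i] if i < len(p) else 0 for p in payloads]
        features.extend([fmean(values), pvariance(values), max(values), min(values)])
    return features
```

For a flow with payloads `b"\x01\x02"` and `b"\x03"` and n = 2, the first four entries are mean 2, variance 1, maximum 3 and minimum 1 for byte position 0.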
After the flows of each class have been labeled manually, a sample set is generated containing no fewer than M flows per class; if the total number of classes is K, the sample set therefore holds at least MK flows. The support vector machine algorithm is then run on the sample set to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the SVM can perform binary classification of the samples. Since the flows in a network fall into far more than two classes, binary classification must be extended to multi-class classification. The two most common approaches to multi-class support vector machines are one-against-one and one-against-rest.
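For the binary stage, the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2). The sketch below evaluates an already trained binary SVM's decision function with that kernel; training (solving the dual problem for the coefficients and bias) is omitted and would normally be done with a solver such as LIBSVM. The function names, support vectors, coefficients, bias and `gamma` used here are illustrative assumptions.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Binary SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    dual_coefs[i] holds alpha_i * y_i for support vector x_i; these and the
    bias b are assumed to come from an already trained model. A positive
    value assigns x to the positive class, a negative value to the other.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias
```

With made-up support vectors (0, 0) and (1, 1), coefficients +1 and -1, zero bias and gamma = 1, the point (0, 0) gets the decision value 1 - exp(-2), which is positive, and (1, 1) gets the negated value.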
One-against-rest performs binary classification between each single class and the flows of all other classes, and assigns a flow to the class with the largest decision value. With this method, classifying each flow requires M binary decisions, where M is the number of traffic classes in the network. The method leaves some ambiguous regions and suffers from class imbalance in the samples during classification, which makes it unsuitable for discriminating Internet traffic.
One-against-one instead builds a binary classifier between every pair of classes and assigns a flow to the class with the largest decision value. With this method, classifying each flow requires M(M-1)/2 binary decisions, where M is the number of traffic classes in the network. It solves the problems of one-against-rest, and most current SVM-based traffic classification uses it, but when the number of classes is large, its computation, particularly the evaluation of the decision functions, is heavy.
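The cost difference between the two standard schemes is simple arithmetic. The sketch below counts the binary decisions per flow, using M(M-1)/2 for one-against-one (one classifier per unordered pair of classes); `decisions_per_flow` is a hypothetical helper name.

```python
def decisions_per_flow(m: int) -> tuple:
    """Binary decisions needed to classify one flow among m classes."""
    one_against_rest = m                  # one classifier per class vs. all others
    one_against_one = m * (m - 1) // 2    # one classifier per unordered pair of classes
    return one_against_rest, one_against_one


for m in (4, 10, 20):
    print(m, decisions_per_flow(m))
# 4 (4, 6)
# 10 (10, 45)
# 20 (20, 190)
```

One-against-rest grows linearly with the number of classes while one-against-one grows quadratically, which is the computational burden the text attributes to one-against-one.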
The invention proposes a one-to-N multi-class extension method:
Basic idea: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between any one class and N other classes, where the N classes are chosen at random from the remaining M-1, with N < (K-1)/2; classification decisions are then made according to the one-to-many method. This avoids the sample-class imbalance of one-against-rest and reduces the heavy computation of one-against-one.
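The one-to-N training setup can be sketched as follows, under assumptions: K is taken to equal the number of classes passed in, the N negative classes are drawn with `random.Random.sample`, and the names `one_to_n_pairings` and `binary_training_set` are hypothetical. Each resulting labeled set would train one binary RBF-kernel SVM; the training itself is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple


def one_to_n_pairings(classes: Sequence[str], n: int,
                      rng: random.Random) -> Dict[str, List[str]]:
    """For each class, pick n of the remaining classes at random as its
    negative side. Assumption: K (total classes) == len(classes)."""
    k = len(classes)
    if not n < (k - 1) / 2:
        raise ValueError("one-to-N requires N < (K - 1) / 2")
    return {c: rng.sample([o for o in classes if o != c], n) for c in classes}


def binary_training_set(samples: Dict[str, List[List[float]]], positive: str,
                        negatives: List[str]) -> List[Tuple[List[float], int]]:
    """Label the positive class's samples +1 and the chosen negatives' -1."""
    data = [(x, +1) for x in samples[positive]]
    for neg in negatives:
        data.extend((x, -1) for x in samples[neg])
    return data
```

With K = 7 classes, N may be at most 2, so each binary SVM is trained on its own class plus two others rather than all six, which keeps the positive and negative sides closer in size than in one-against-rest.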
After classification, some of the classified data is extracted to partially update or extend the sample library, so that the decision functions can be updated after a certain period.
Fig. 1 and Fig. 2 show the multi-class classification process when the network contains 4 classes of flow. As illustrated, if every decision started from class 1, class 4 would always incur the most computation, which would be unfair. The entry point of the computation therefore rotates in round-robin fashion: the decision for the first flow starts from (1, N), the decision for the next flow starts from (2, N), and so on, which keeps the computation for each class of flow statistically fair.
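The round-robin entry point can be sketched as below. Because Fig. 1 is not reproduced here, the decision rule is an assumption: classifiers are evaluated in rotated order and the first positive decision value wins, with the largest decision value as a fallback when none is positive. `classify_round_robin` is a hypothetical name, and each decision function could be one of the one-to-N SVMs.

```python
from typing import Callable, List, Sequence, Tuple

DecisionFn = Callable[[Sequence[float]], float]


def classify_round_robin(x: Sequence[float],
                         classifiers: List[Tuple[str, DecisionFn]],
                         flow_index: int) -> Tuple[str, int]:
    """Return (class label, number of classifiers evaluated).

    Evaluation for the flow_index-th flow starts at classifier
    flow_index mod M and wraps around, so no class is always checked last.
    Assumption: stop at the first positive decision value; otherwise fall
    back to the class with the largest decision value.
    """
    m = len(classifiers)
    start = flow_index % m
    best_label, best_score = None, float("-inf")
    for step in range(m):
        label, decide = classifiers[(start + step) % m]
        score = decide(x)
        if score > 0:
            return label, step + 1
        if score > best_score:
            best_label, best_score = label, score
    return best_label, m
```

With four classes, flow 0 is checked in the order 1, 2, 3, 4, flow 1 in the order 2, 3, 4, 1, and so on, so over many flows each class is reached after the same average number of evaluations.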
The invention is a traffic classification method suitable for dedicated networks whose transport-layer protocol is UDP. It is based on the support vector machine learning algorithm and uses the one-to-N extension method when extending the SVM's binary classification to multi-class classification, which improves classification accuracy while reducing the algorithm's computation. It also adopts a new feature definition based on application-layer data: statistical analysis of the first n bytes of application-layer data forms the features of each flow. This removes the dependence of earlier features on the connection setup process, so the method better serves traffic analysis in networks that use connectionless protocols. The invention addresses the poor adaptability of current traffic classification algorithms in dedicated networks based on connectionless protocols, as well as the heavy computation and limited accuracy of multi-class extensions, and has good prospects for wider application.
The invention is described in detail below with reference to an embodiment.
Embodiment
A dedicated network flow classification method based on support vector machines comprises the following steps:
First, extract flows of known types from the network, compute their traffic features and build a sample library in which each flow type has M samples.
Second, run the SVM (support vector machine) learning method on the sample library to produce a library of one-to-N classification functions. Because in most cases the flows cannot be separated in a two-dimensional plane, each flow vector must be mapped into a higher-dimensional space using a kernel function. The invention recommends the radial basis function (RBF) kernel, which is applicable to low-dimensional and high-dimensional data and to small and large sample sets alike, and is currently one of the better-performing kernel functions for classification.
Third, collect a batch of new data to form data set An.
Fourth, group the data in the data set by tuple (source address, destination address, source port, destination port, protocol type), then divide the packets into separate flows.
Fifth, first classify the flows that can be identified by explicit criteria. Such criteria are arranged in advance by the system; for example, if a specific port number is reserved for one service and all other traffic is forbidden from using it, the type of some flows can be determined directly from that port number.
Sixth, for the flows that could not be classified in the fifth step, compute the traffic features of each flow to form its feature vector.
Seventh, following the method shown in Fig. 1, feed each traffic feature vector to be classified into the different classification functions and evaluate them to obtain the final class of the flow.
Eighth, sample the flows that have been classified: replace some flows in the sample set with selected new flows or, to improve the precision of the classification functions, enlarge the sample set and extend the feature library with the subsequently classified flows.
Ninth, return to the third step and continue classifying subsequent traffic.
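Steps five and eight leave two details open: which ports the system reserves, and how flows are chosen when the sample set is refreshed. The sketch below fills them with labeled assumptions: `RESERVED_PORTS` is a hypothetical policy table (the ports and class names are made up), and the refresh rule (append until a per-class capacity is reached, then overwrite a randomly chosen existing sample) is one possible choice, not the patent's.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical policy table: ports reserved by the system for a single service.
RESERVED_PORTS: Dict[int, str] = {6001: "sensor-telemetry", 6002: "voice"}

FiveTuple = Tuple[str, str, int, int, str]  # (src, dst, sport, dport, proto)


def preclassify(flow_key: FiveTuple,
                reserved: Dict[int, str] = RESERVED_PORTS) -> Optional[str]:
    """Step five: return the class fixed by a reserved port, or None if the
    flow must go on to feature extraction and the SVM stage."""
    _, _, sport, dport, _ = flow_key
    if dport in reserved:
        return reserved[dport]
    return reserved.get(sport)


def update_sample_library(library: Dict[str, List[List[float]]],
                          new_flows: Dict[str, List[List[float]]],
                          rng: random.Random,
                          capacity: Optional[int] = None) -> Dict[str, List[List[float]]]:
    """Step eight: fold newly classified feature vectors into the library.

    Append while a class holds fewer than `capacity` samples (None means grow
    without limit); once full, overwrite a randomly chosen existing sample.
    """
    for label, vectors in new_flows.items():
        pool = library.setdefault(label, [])
        for v in vectors:
            if capacity is None or len(pool) < capacity:
                pool.append(v)
            else:
                pool[rng.randrange(len(pool))] = v
    return library
```

Because the labels of refreshed samples come from the classifier itself, a capped library with random replacement limits how quickly any misclassified flows can accumulate, at the cost of adapting more slowly than an ever-growing one.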
Claims (4)
1. A dedicated network flow classification method based on support vector machines, characterized in that it comprises the following steps:
Step 1, capturing individual network packets from the network and separating the network packets into individual flows according to their features;
Step 2, extracting features from each flow and describing each flow in vector form;
Step 3, after the flows of each class have been labeled, generating a sample set, then running a support vector machine algorithm on the sample set to classify the samples, using a radial basis function kernel to map the vectors into a higher-dimensional space so that the support vector machine performs binary classification of the samples,
and extending binary classification to multi-class classification using a one-to-N extension method.
2. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that, in step 1, packets with the same five-tuple that appear within a continuous time period are identified as the same flow; if the interval between two adjacent packets with that five-tuple, compared with the other packet gaps in the flow, exceeds a threshold, the previous flow is considered to have ended; wherein the five-tuple is <source address, destination address, source port, destination port, protocol type>.
3. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that, in step 2, the first n bytes of the application-layer payload of each packet are selected as the objects of feature statistics, and the mean, variance, maximum and minimum of each byte value are taken.
4. The dedicated network flow classification method based on support vector machines according to claim 1, characterized in that the one-to-N extension method is specifically: given M classes, a sample set is built for each class, and an optimal hyperplane is constructed between any one class and N other classes, the N classes being chosen at random from the remaining M-1 classes, wherein N < (K-1)/2 and K is the total number of classes in the network; classification decisions are then made according to the one-to-many method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710410330.1A CN107222343A (en) | 2017-06-03 | 2017-06-03 | Dedicated network stream sorting technique based on SVMs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107222343A (en) | 2017-09-29 |
Family
ID=59947303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710410330.1A, CN107222343A (en), pending | Dedicated network stream sorting technique based on SVMs | 2017-06-03 | 2017-06-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107222343A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101025862A (en) * | 2007-02-12 | 2007-08-29 | 吉林大学 | Video based mixed traffic flow parameter detecting method |
CN102315974A (en) * | 2011-10-17 | 2012-01-11 | 北京邮电大学 | Stratification characteristic analysis-based method and apparatus thereof for on-line identification for TCP, UDP flows |
CN102420833A (en) * | 2011-12-27 | 2012-04-18 | 华为技术有限公司 | Network protocol identification method, device and system |
CN104156733A (en) * | 2014-08-12 | 2014-11-19 | 中国人民解放军理工大学 | Foundation cloud form identification method based on multiscale structure characteristics |
US20160188876A1 (en) * | 2014-12-30 | 2016-06-30 | Battelle Memorial Institute | Anomaly detection for vehicular networks for intrusion and malfunction detection |
CN106529576A (en) * | 2016-10-20 | 2017-03-22 | 天津大学 | Piano score difficulty recognition algorithm based on improved measure learning support vector machine |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729952A (en) * | 2017-11-29 | 2018-02-23 | 新华三信息安全技术有限公司 | Traffic flow classification method and device |
CN107729952B (en) * | 2017-11-29 | 2021-04-30 | 新华三信息安全技术有限公司 | Service flow classification method and device |
WO2019179473A1 (en) * | 2018-03-23 | 2019-09-26 | Telefonaktiebolaget Lm Ericsson (Publ) | Methods and devices for chunk based iot service inspection |
CN110011931A (en) * | 2019-01-25 | 2019-07-12 | 中国科学院信息工程研究所 | Encrypted traffic type detection method and system |
CN110011931B (en) * | 2019-01-25 | 2020-10-16 | 中国科学院信息工程研究所 | Encrypted flow type detection method and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170929 |