CN104767692A - Network traffic classification method - Google Patents
- Publication number
- CN104767692A
- Authority
- CN
- China
- Prior art keywords
- algorithm
- network traffic
- training
- classification
- classification method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a network traffic classification method. The method comprises the steps of: (1) extracting a behavioral feature set of the network traffic; (2) obtaining a classifier model by inputting the behavioral feature set into a classifier, performing classification training, and obtaining the corresponding parameters; and (3) evaluating and optimizing the performance of the classifier. The method combines an unsupervised algorithm and a supervised algorithm from machine learning to perform classification. Combining the two reduces the system's time and memory overhead while maintaining a high classification accuracy, and thereby improves classification efficiency. The clustering algorithm is also improved to raise clustering accuracy, which improves overall performance.
Description
Technical field
The present invention relates to a classification method, and in particular to a network traffic classification method.
Background art
With the growth of the Internet and the widespread use of high-bandwidth, diverse network services, the volume of network data has risen sharply, and intelligent management of network data flows has become increasingly important. Its prerequisite is the accurate classification and identification of data flow types.
Traffic classification must not only ensure accuracy; reducing time and memory overhead has also become a research focus. Among existing classification techniques, machine learning methods are a research focus, and they divide into unsupervised and supervised algorithms. Unsupervised algorithms partition samples into clusters according to sample similarity. Clustering is a widely applied kind of unsupervised algorithm and mainly includes partition-based, hierarchy-based, density-based, and grid-based clustering algorithms, of which k-means is the most widely used. Supervised algorithms adjust the classifier parameters by training on a sample set of known classes to obtain a classifier model; they mainly include neural networks, decision trees, Bayesian methods, and support vector machines (SVM). Clustering algorithms have relatively low accuracy, while supervised algorithms consume more system time and memory.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a network traffic classification method. By combining clustering with a supervised algorithm, the method reduces time complexity, raises classification accuracy, and improves classification efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A network traffic classification method, comprising the following steps:
(1) extracting a behavioral feature set of the network traffic;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier, performing classification training, and obtaining the corresponding parameters;
(3) evaluating the classifier performance and optimizing the classifier.
In a preferred technical solution provided by the present invention, step (1) comprises the following steps:
A. capturing the network traffic and screening out the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive and effective data that represent the behavioral features of the unknown traffic;
C. determining the dimension of the feature set and integrating the behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow.
In a second preferred technical solution provided by the present invention, the key information comprises the data flow type, the packet protocol, the time, and the type.
In a third preferred technical solution provided by the present invention, step (2) comprises the following steps:
A. selecting part of the extracted network traffic behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
B. running a clustering algorithm on the training set to obtain a new training set;
C. running a supervised algorithm on the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
E. judging whether the accuracy meets the requirement; if so, ending; otherwise, returning to step B until the accuracy meets the requirement.
In a fourth preferred technical solution provided by the present invention, step B comprises the following steps:
(a) initializing the clustering algorithm parameters and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c); otherwise, performing step (a);
(c) after cluster training ends, saving the cluster centers and convergence radii, adjusting the training set by removing the samples that have been clustered, and forming the new training set.
In a fifth preferred technical solution provided by the present invention, step C comprises the following steps:
(a) initializing the supervised algorithm parameters and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c); otherwise, performing step (a);
(c) fixing the supervised algorithm parameters, which determines the classification criterion.
In a sixth preferred technical solution provided by the present invention, the training set comprises a labeled part of the feature set and an unlabeled part of the feature set.
In a seventh preferred technical solution provided by the present invention, the test set comprises a labeled part of the feature set and an unlabeled part of the feature set.
In an eighth preferred technical solution provided by the present invention, the class labeling uses deep packet inspection.
In a ninth preferred technical solution provided by the present invention, the classifier performance evaluation comprises evaluating the classifier's accuracy, system time, and memory overhead.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention classifies traffic by combining unsupervised and supervised machine learning algorithms. Combining the two reduces system time and memory overhead while maintaining a relatively high classification accuracy, and improves classification efficiency.
The present invention improves the clustering algorithm by adding labeled sample points to the clustering process. These points make it possible to check whether the cluster centers are well chosen and to correct errors, which reduces the number of center points that do not reflect actual conditions, raises clustering accuracy, and thereby improves overall performance.
Brief description of the drawings
Fig. 1 is a flowchart of the network traffic classification method.
Fig. 2 is a flowchart of the extraction of the network traffic behavioral feature set.
Fig. 3 is a flowchart of obtaining and optimizing the classifier model.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, a network traffic classification method based on clustering and a supervised algorithm comprises the following steps:
Step S101: extract behavioral features from the network data flows;
Step S102: obtain the classifier model by inputting the behavioral feature set into the classifier, training the classifier, and obtaining the corresponding parameters;
Step S103: evaluate the classifier performance, including accuracy, system time, and memory overhead; input the test set into the classifier for classification, adjust the classifier algorithm and parameters according to the accuracy, and optimize the classifier performance.
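The evaluation in step S103 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the patent does not say how system time or memory overhead is measured, so the use of `time.perf_counter` and `tracemalloc`, the `evaluate` function name, and the callable classifier interface are all assumptions.

```python
import time
import tracemalloc


def evaluate(classifier, test_set):
    """Return accuracy, elapsed seconds and peak traced memory in bytes.

    classifier : callable mapping a feature vector to a label (assumed interface)
    test_set   : list of (features, true_label) pairs
    """
    tracemalloc.start()
    start = time.perf_counter()
    predictions = [classifier(x) for x, _ in test_set]
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes
    tracemalloc.stop()

    correct = sum(p == y for p, (_, y) in zip(predictions, test_set))
    return {"accuracy": correct / len(test_set),
            "seconds": elapsed,
            "peak_bytes": peak}
```

The three numbers correspond to the three quantities the step names; a caller would compare them against its own thresholds before deciding whether to adjust the classifier.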
As shown in Fig. 2, the method for extracting the network traffic behavioral feature set comprises the following steps:
Step S201: capture the network traffic and screen out the key information of the unknown traffic data, the key information comprising the data flow type, the packet protocol, the time, and the size;
Step S202: process the key information to obtain more intuitive and effective data that represent the behavioral features of the unknown traffic;
Step S203: determine the dimension of the feature set and integrate the behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow.
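The three extraction steps above can be sketched as follows. This is a minimal sketch, not the patent's implementation: the `Packet` record, its field names, and the chosen statistics (protocol flag, packet count, mean size, duration, mean inter-arrival time) are illustrative assumptions. The patent names only flow type, packet protocol, time, and size as key information.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical captured-packet record (field names are assumptions)."""
    flow_id: str      # identifier of the flow the packet belongs to
    protocol: str     # e.g. "TCP" or "UDP"
    timestamp: float  # capture time in seconds
    size: int         # packet length in bytes


def extract_features(packets):
    """Group packets by flow and return {flow_id: feature vector}."""
    flows = defaultdict(list)
    for p in packets:                     # S201: keep only the key information
        flows[p.flow_id].append(p)

    features = {}
    for flow_id, pkts in flows.items():   # S202: derive behavioral features
        pkts.sort(key=lambda p: p.timestamp)
        times = [p.timestamp for p in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[flow_id] = [             # S203: fixed dimension of five
            1.0 if pkts[0].protocol == "TCP" else 0.0,
            float(len(pkts)),
            mean(p.size for p in pkts),
            times[-1] - times[0],
            mean(gaps) if gaps else 0.0,
        ]
    return features
```

Every flow maps to a vector of the same length, which is what lets the later clustering and supervised stages treat flows as points in one feature space.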
As shown in Fig. 3, the method for obtaining and optimizing the classifier model comprises the following steps:
Step S301: select part of the extracted network traffic behavioral feature set and label it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training; labeling may be done manually or by DPI (deep packet inspection, an application-layer traffic detection and control technique);
Step S302: initialize the clustering algorithm parameters and input the training set for training;
Step S303: judge whether the cluster centers have converged; if so, perform S304; otherwise, perform step S302;
The specific algorithm is as follows. A randomly selected portion of labeled samples and a large number of unlabeled samples jointly train the clustering algorithm and select the cluster centers. The labeled sample points make it possible to check whether the center points are well chosen and to correct errors, which reduces the number of center points that do not reflect actual conditions, as can arise when results are obtained without supervision from the clustering criterion alone. The sample set is trained iteratively according to the clustering criterion, and the termination condition is convergence of the algorithm. At that point the cluster centers and suitable cluster ranges are determined; each cluster range is represented by a convergence radius;
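One possible reading of this clustering stage is sketched below. The patent does not specify the clustering criterion or how the labeled samples correct the centers, so everything here is an assumption: plain k-means (Lloyd's algorithm) stands in for the clustering algorithm, a cluster is trusted only if its labeled members mostly agree (the hypothetical `min_purity` threshold), and the convergence radius is taken as the largest member-to-center distance.

```python
import math
from collections import Counter


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Plain k-means; returns (centers, assignments)."""
    assign = []
    for _ in range(max_iter):
        # Assign every sample to its nearest cluster center.
        assign = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else list(old))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:          # convergence test, as in step S303
            break
    return centers, assign


def check_clusters(points, labels, centers, assign, min_purity=0.9):
    """Use labeled samples to vet each cluster.

    labels[i] is a class name, or None for an unlabeled sample.
    Returns {cluster index: (majority label, radius)} for trusted clusters.
    """
    trusted = {}
    for j, center in enumerate(centers):
        idx = [i for i, a in enumerate(assign) if a == j]
        known = Counter(labels[i] for i in idx if labels[i] is not None)
        if not known:
            continue              # no labeled evidence, so do not trust it
        label, votes = known.most_common(1)[0]
        if votes / sum(known.values()) >= min_purity:
            radius = max(math.dist(points[i], center) for i in idx)
            trusted[j] = (label, radius)
    return trusted
```

Clusters that fail the check are simply left out of the result, so their samples stay in the training set for the supervised stage.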
Step S304: save the cluster centers and convergence radii, adjust the training set, and remove and save the sample points that have been clustered;
After cluster training ends, the cluster centers and convergence radii are saved as the clustering basis for classifying unknown traffic data, and the sample points that have been clustered are removed, so the amount of data used to train the supervised algorithm decreases. For example, the complexity of the SVM algorithm lies between O(n²) and O(n³), where n is the number of samples, so the complexity of the supervised algorithm is greatly reduced and training efficiency improves;
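The effect of removing clustered samples can be sketched as follows. This is an illustrative sketch only: the function names, the representation of a saved cluster as a `(center, radius)` pair, and the rule that a sample counts as clustered when it lies within a radius of a center are assumptions, and `relative_cost` simply applies the quadratic-to-cubic cost range stated above.

```python
import math


def split_training_set(samples, clusters):
    """Separate samples already covered by a saved cluster from the rest.

    samples  : list of feature vectors
    clusters : list of (center, radius) pairs saved after clustering
    Returns (covered, remaining); only `remaining` goes to supervised training.
    """
    covered, remaining = [], []
    for x in samples:
        inside = any(math.dist(x, c) <= r for c, r in clusters)
        (covered if inside else remaining).append(x)
    return covered, remaining


def relative_cost(n_before, n_after):
    """Training-cost ratio range if cost grows between n**2 and n**3."""
    ratio = n_after / n_before
    return ratio ** 3, ratio ** 2
```

As a hypothetical example, if clustering absorbs 600 of 1,000 training samples, the remaining 400 give a cost ratio between 0.4³ = 0.064 and 0.4² = 0.16 under this cost model.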
Step S305: initialize the supervised algorithm parameters and input the new training set for training;
Step S306: judge whether the algorithm has converged; if so, perform S307; otherwise, perform S305;
Step S307: fix the supervised algorithm parameters, which determines the classification criterion;
Step S308: input the test set into the classifier for a classification test, and evaluate the accuracy;
Step S309: judge whether the accuracy meets the requirement. If it does not, return to S302 and repeat until the classification result meets the requirement, which chiefly means a sufficiently high classification accuracy; if it does, the classifier model is deemed feasible and the algorithm ends.
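The outer loop of steps S302 to S309 might be organized as sketched below. Several things here are assumptions rather than the patent's method: the two-stage decision rule (saved clusters first, supervised classifier as fallback) is inferred from the statement that cluster centers and radii serve as the basis for classifying unknown traffic; `train_round` is a hypothetical callback standing in for clustering plus supervised training; and `max_rounds` is added so the loop always terminates.

```python
import math


def classify(x, clusters, fallback):
    """Label one flow: saved clusters first, supervised classifier second."""
    for center, radius, label in clusters:
        if math.dist(x, center) <= radius:
            return label
    return fallback(x)


def accuracy(test_set, clusters, fallback):
    """Fraction of (features, label) pairs in test_set classified correctly."""
    hits = sum(classify(x, clusters, fallback) == y for x, y in test_set)
    return hits / len(test_set)


def train_until_accurate(train_round, test_set, target, max_rounds=10):
    """Repeat S302 to S308 until the accuracy meets the target (S309).

    train_round(k) is a hypothetical callback that runs clustering and
    supervised training for round k and returns (clusters, fallback).
    """
    for k in range(max_rounds):
        clusters, fallback = train_round(k)
        acc = accuracy(test_set, clusters, fallback)
        if acc >= target:
            break
    return clusters, fallback, acc
```

Passing the round number to the callback lets each retry use different initial parameters, which is one way to make a return to S302 produce a different result.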
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or replaced with equivalents, and that any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of its claims.
Claims (10)
1. A network traffic classification method, characterized in that the method comprises the following steps:
(1) extracting a behavioral feature set of the network traffic;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier, performing classification training, and obtaining the corresponding parameters;
(3) evaluating the classifier performance and optimizing the classifier.
2. The network traffic classification method according to claim 1, characterized in that step (1) comprises the following steps:
A. capturing the network traffic and screening out the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive and effective data that represent the behavioral features of the unknown traffic;
C. determining the dimension of the feature set and integrating the behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow.
3. The network traffic classification method according to claim 2, characterized in that the key information comprises the data flow type, the packet protocol, the time, and the type.
4. The network traffic classification method according to claim 1, characterized in that step (2) comprises the following steps:
A. selecting part of the extracted network traffic behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
B. running a clustering algorithm on the training set to obtain a new training set;
C. running a supervised algorithm on the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
E. judging whether the accuracy meets the requirement; if so, ending; otherwise, returning to step B until the accuracy meets the requirement.
5. The network traffic classification method according to claim 4, characterized in that step B comprises the following steps:
(a) initializing the clustering algorithm parameters and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c); otherwise, performing step (a);
(c) after cluster training ends, saving the cluster centers and convergence radii, adjusting the training set by removing the samples that have been clustered, and forming the new training set.
6. The network traffic classification method according to claim 4, characterized in that step C comprises the following steps:
(a) initializing the supervised algorithm parameters and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c); otherwise, performing step (a);
(c) fixing the supervised algorithm parameters, which determines the classification criterion.
7. The network traffic classification method according to claim 4, characterized in that the training set comprises a labeled part of the feature set and an unlabeled part of the feature set.
8. The network traffic classification method according to claim 4, characterized in that the test set comprises a labeled part of the feature set and an unlabeled part of the feature set.
9. The network traffic classification method according to claim 4, characterized in that the class labeling uses deep packet inspection.
10. The network traffic classification method according to claim 1, characterized in that the classifier performance evaluation comprises evaluating the classifier's accuracy, system time, and memory overhead.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510176138.1A CN104767692B (en) | 2015-04-15 | 2015-04-15 | Network traffic classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510176138.1A CN104767692B (en) | 2015-04-15 | 2015-04-15 | Network traffic classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104767692A (en) | 2015-07-08 |
CN104767692B (en) | 2018-05-29 |
Family
ID=53649314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510176138.1A Active CN104767692B (en) | Network traffic classification method | 2015-04-15 | 2015-04-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104767692B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101060443A (en) * | 2006-04-17 | 2007-10-24 | 中国科学院自动化研究所 | An improved adaptive boosting algorithm based Internet intrusion detection method |
CN103150454A (en) * | 2013-03-27 | 2013-06-12 | 山东大学 | Dynamic machine learning modeling method based on sample recommending and labeling |
CN103793510A (en) * | 2014-01-29 | 2014-05-14 | 苏州融希信息科技有限公司 | Classifier construction method based on active learning |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105022960B (en) * | 2015-08-10 | 2017-11-21 | 济南大学 | Multiple features mobile terminal from malicious software detecting method and system based on network traffics |
CN105022960A (en) * | 2015-08-10 | 2015-11-04 | 济南大学 | Multi-feature mobile terminal malicious software detecting method based on network flow and multi-feature mobile terminal malicious software detecting system based on network flow |
CN106959967A (en) * | 2016-01-12 | 2017-07-18 | 中国科学院声学研究所 | A kind of training of link prediction model and link prediction method |
CN106411775A (en) * | 2016-08-31 | 2017-02-15 | 国家计算机网络与信息安全管理中心 | Internet traffic classification sample labeling method |
CN106411775B (en) * | 2016-08-31 | 2019-06-14 | 国家计算机网络与信息安全管理中心 | A kind of internet traffic classification samples mask method |
CN106713324A (en) * | 2016-12-28 | 2017-05-24 | 北京奇艺世纪科技有限公司 | Flow detection method and device |
CN108197666A (en) * | 2018-01-30 | 2018-06-22 | 咪咕文化科技有限公司 | Image classification model processing method and device and storage medium |
US11586971B2 (en) | 2018-07-19 | 2023-02-21 | Hewlett Packard Enterprise Development Lp | Device identifier classification |
US12026597B2 (en) | 2018-07-19 | 2024-07-02 | Hewlett Packard Enterprise Development Lp | Device identifier classification |
WO2020062390A1 (en) * | 2018-09-25 | 2020-04-02 | 深圳先进技术研究院 | Network traffic classification method and system, and electronic device |
CN111126419B (en) * | 2018-10-30 | 2023-12-01 | 顺丰科技有限公司 | Dot clustering method and device |
CN111126419A (en) * | 2018-10-30 | 2020-05-08 | 顺丰科技有限公司 | Dot clustering method and device |
CN109376797A (en) * | 2018-11-20 | 2019-02-22 | 大连理工大学 | A kind of net flow assorted method based on binary coder and more Hash tables |
CN109450740A (en) * | 2018-12-21 | 2019-03-08 | 青岛理工大学 | SDN controller for carrying out traffic classification based on DPI and machine learning algorithm |
CN109922083A (en) * | 2019-04-10 | 2019-06-21 | 武汉金盛方圆网络科技发展有限公司 | A kind of network protocol flow control system |
CN110149280A (en) * | 2019-05-27 | 2019-08-20 | 中国科学技术大学 | Net flow assorted method and apparatus |
CN110149280B (en) * | 2019-05-27 | 2020-08-28 | 中国科学技术大学 | Network traffic classification method and device |
CN110445800B (en) * | 2019-08-15 | 2022-06-14 | 上海寰创通信科技股份有限公司 | Self-learning-based deep packet parsing system |
CN110445800A (en) * | 2019-08-15 | 2019-11-12 | 上海寰创通信科技股份有限公司 | A kind of deep message resolution system based on self study |
CN110753049A (en) * | 2019-10-21 | 2020-02-04 | 清华大学 | Safety situation sensing system based on industrial control network flow |
CN111983429A (en) * | 2020-08-19 | 2020-11-24 | Oppo广东移动通信有限公司 | Chip verification system, chip verification method, terminal and storage medium |
CN112637084A (en) * | 2020-12-10 | 2021-04-09 | 中山职业技术学院 | Distributed network flow novelty detection method and classifier |
CN112637084B (en) * | 2020-12-10 | 2022-09-23 | 中山职业技术学院 | Distributed network flow novelty detection method and classifier |
Also Published As
Publication number | Publication date |
---|---|
CN104767692B (en) | 2018-05-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||