CN104767692A - Network traffic classification method - Google Patents

Info

Publication number
CN104767692A
Authority
CN
China
Prior art keywords
algorithm
network traffic
training
classification
classification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510176138.1A
Other languages
Chinese (zh)
Other versions
CN104767692B (en)
Inventor
张庚
孙勇
孙振超
张然
周禹
钟卓健
李思珍
汪洋
刘世栋
郭经红
苏斓
丁慧霞
王智慧
王妙心
李哲
高强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
China Electric Power Research Institute Co Ltd CEPRI
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
China Electric Power Research Institute Co Ltd CEPRI
State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing University of Posts and Telecommunications, China Electric Power Research Institute Co Ltd CEPRI, State Grid Jiangsu Electric Power Co Ltd
Priority to CN201510176138.1A
Publication of CN104767692A
Application granted
Publication of CN104767692B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network traffic classification method. The method comprises the following steps: (1) a network traffic behavioral feature set is extracted; (2) a classifier model is obtained: the behavioral feature set is input into a classifier, classification training is carried out, and the corresponding parameters are obtained; (3) the performance of the classifier is evaluated and optimized. The method combines an unsupervised algorithm and a supervised algorithm from machine learning to perform classification. The combination reduces the system's time and memory overhead while maintaining high classification accuracy, which improves classification efficiency. The clustering algorithm is also improved to raise clustering accuracy, and thereby overall performance.

Description

A network traffic classification method
Technical field
The present invention relates to a network traffic classification method.
Background art
With the growth of the Internet and the wide use of high-bandwidth, diverse network services, the volume of network data is rising sharply, and intelligent management of network data flows is becoming increasingly important. Its prerequisite is the accurate classification and identification of data flow types.
Traffic classification must not only ensure accuracy; reducing time and memory overhead has also become a research focus. Among existing classification techniques, machine learning methods are a research focus and are divided into unsupervised and supervised algorithms. Unsupervised algorithms partition samples into clusters according to sample similarity. Clustering is the most widely applied unsupervised approach and mainly includes partition-based, hierarchy-based, density-based, and grid-based clustering algorithms, of which k-means is the most widely used. Supervised algorithms adjust classifier parameters by training on a sample set of known classes to obtain a classifier model; the main methods include neural networks, classification decision trees, Bayesian theory, and support vector machines (SVM). Clustering algorithms have lower accuracy, while supervised algorithms consume more system time and memory.
Summary of the invention
To overcome the above deficiencies of the prior art, the invention provides a network traffic classification method. By combining clustering with a supervised algorithm, the method reduces time complexity, raises classification accuracy, and improves classification efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A network traffic classification method, comprising the following steps:
(1) extracting a network traffic behavioral feature set;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier, carrying out classification training, and obtaining the corresponding parameters; evaluating the classifier's performance and optimizing it.
In a preferred technical solution provided by the invention, step (1) comprises the following steps:
A. capturing network traffic and screening the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive and effective data that represents the behavioral features of the unknown traffic;
C. determining the dimension of the feature set and integrating the behavioral features, so as to avoid information redundancy while retaining the information that effectively identifies data flows.
In a second preferred technical solution provided by the invention, the key information comprises the data traffic type, the packet protocol, the time, and the size.
In a third preferred technical solution provided by the invention, step (2) comprises the following steps:
A. selecting part of the extracted network traffic behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
B. running the clustering algorithm on the training set to obtain a new training set;
C. running the supervised algorithm on the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
E. judging whether the accuracy meets the requirements; if so, ending; otherwise returning to step B until the accuracy meets the requirements.
In a fourth preferred technical solution provided by the invention, step B comprises the following steps:
(a) initializing the parameters of the clustering algorithm and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c); otherwise performing step (a);
(c) after cluster training ends, saving the cluster centers and convergence radii, adjusting the training set by removing the samples that have been clustered, and forming the new training set.
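Steps (a) to (c) can be illustrated with a minimal sketch. The text names k-means only as the most common clustering algorithm and does not define how the convergence radius is computed, so several choices here are assumptions: plain k-means seeded with the first k samples, a radius equal to the mean distance of a cluster's members to its center, and the hypothetical helper names `kmeans` and `remove_clustered`.

```python
import math

def kmeans(samples, k, max_iter=100):
    """Plain k-means; seeding with the first k samples is an assumption."""
    centers = [list(s) for s in samples[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda i: math.dist(s, centers[i]))
            groups[nearest].append(s)
        # An empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, groups

def remove_clustered(samples, centers, groups):
    """Return (radii, remaining); radius = mean member distance (assumed)."""
    radii = [
        sum(math.dist(s, c) for s in g) / len(g) if g else 0.0
        for c, g in zip(centers, groups)
    ]
    # A sample inside any cluster's radius counts as clustered and is dropped.
    remaining = [
        s for s in samples
        if all(math.dist(s, c) > r for c, r in zip(centers, radii))
    ]
    return radii, remaining

# Two tight groups plus two in-between points (toy 2-D feature vectors).
samples = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0),
           (1.0, 0.0), (11.0, 10.0), (4.0, 4.0), (6.0, 6.0)]
centers, groups = kmeans(samples, k=2)
radii, new_training_set = remove_clustered(samples, centers, groups)
```

Under these assumptions only the two in-between points stay in `new_training_set`, which is the smaller set that the supervised algorithm would then be trained on.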
In a fifth preferred technical solution provided by the invention, step C comprises the following steps:
(a) initializing the parameters of the supervised algorithm and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c); otherwise performing step (a);
(c) fixing the parameters of the supervised algorithm, which determines the classification criterion.
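Once the cluster centers, convergence radii, and classification criterion are fixed, an unknown flow could be classified in two stages, as in the sketch below. The embodiment states that the saved centers and radii serve as the clustering basis for classifying unknown traffic; the per-cluster label, the function names, and the 1-nearest-neighbour stand-in for the supervised classifier (used here instead of an SVM to keep the example dependency-free) are all assumptions.

```python
import math

def classify_flow(features, clusters, fallback):
    """clusters: list of (center, radius, label); fallback: callable model."""
    for center, radius, label in clusters:
        if math.dist(features, center) <= radius:
            return label  # inside a saved cluster: no supervised call needed
    return fallback(features)

def nearest_neighbour(train):
    """Hypothetical stand-in for the trained supervised classifier."""
    def predict(features):
        return min(train, key=lambda pair: math.dist(features, pair[0]))[1]
    return predict

# Toy values; the labels 'web' and 'p2p' are illustrative only.
clusters = [((1.25, 1.25), 2.0, "web"), ((9.25, 9.25), 2.4, "p2p")]
supervised = nearest_neighbour([((4.0, 4.0), "web"), ((6.0, 6.0), "p2p")])

inside = classify_flow((0.5, 0.5), clusters, supervised)
outside = classify_flow((5.8, 6.1), clusters, supervised)
```

The first flow falls within a saved radius and is labeled directly; the second falls outside both radii and is passed to the supervised model.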
In a sixth preferred technical solution provided by the invention, the training set comprises a labeled part of the feature set and an unlabeled part of the feature set.
In a seventh preferred technical solution provided by the invention, the test set comprises a labeled part of the feature set and an unlabeled part of the feature set.
In an eighth preferred technical solution provided by the invention, the class labeling uses deep packet inspection (DPI).
In a ninth preferred technical solution provided by the invention, the classifier performance evaluation comprises evaluating classifier accuracy, system time, and memory overhead.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention combines unsupervised and supervised machine learning algorithms for classification. The combination reduces system time and memory overhead while maintaining relatively high classification accuracy, and improves classification efficiency.
The invention improves the clustering algorithm by adding labeled sample points to the clustering process. These points make it possible to check whether the cluster centers are suitably chosen and to correct errors, which reduces the number of cluster centers that do not match actual conditions, raises clustering accuracy, and thereby improves overall performance.
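The text does not spell out how labeled points are used to check a cluster center. One plausible reading, sketched below as an assumption, is a purity check: within each cluster, the labeled samples should mostly agree on one class, and a cluster whose labeled samples disagree is flagged for correction. The function name `check_clusters` and the threshold `min_purity` are hypothetical.

```python
from collections import Counter

def check_clusters(assignments, labels, min_purity=0.8):
    """assignments: cluster index per sample; labels: class label or None.

    Returns {cluster: (majority_label, purity, passes_check)} computed from
    the labeled samples only.
    """
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        if label is not None:
            by_cluster.setdefault(cluster, Counter())[label] += 1
    report = {}
    for cluster, counts in by_cluster.items():
        label, hits = counts.most_common(1)[0]
        purity = hits / sum(counts.values())
        report[cluster] = (label, purity, purity >= min_purity)
    return report

# Eight samples in two clusters; None marks an unlabeled sample.
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["web", None, "web", None, "p2p", "web", None, "p2p"]
report = check_clusters(assignments, labels)
```

Here cluster 0 passes because both of its labeled samples agree, while cluster 1 is flagged because one of its three labeled samples belongs to a different class.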
Description of the drawings
Fig. 1 is a flowchart of the network traffic classification method.
Fig. 2 is a flowchart of the extraction of the network traffic behavioral feature set.
Fig. 3 is a flowchart of classifier model acquisition and optimization.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, a network traffic classification method based on clustering and a supervised algorithm comprises the following steps:
Step S101: extract behavioral features from the network data flows;
Step S102: obtain the classifier model: input the behavioral feature set into the classifier, train the classifier, and obtain the corresponding parameters;
Step S103: evaluate classifier performance, including accuracy, system time, and memory overhead: input the test set into the classifier for classification, adjust the classifier algorithm and parameters according to the accuracy, and optimize classifier performance.
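The three quantities named in step S103 can be measured with the Python standard library, as in the sketch below. The function name `evaluate`, the toy threshold classifier, and the test data are assumptions for illustration; the text does not prescribe a measurement method.

```python
import time
import tracemalloc

def evaluate(classifier, test_set):
    """Measure accuracy, wall-clock time, and peak memory of classification.

    classifier: callable mapping a feature vector to a label.
    test_set: list of (features, true_label) pairs.
    """
    tracemalloc.start()
    started = time.perf_counter()
    predictions = [classifier(features) for features, _ in test_set]
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    correct = sum(p == truth for p, (_, truth) in zip(predictions, test_set))
    return {
        "accuracy": correct / len(test_set),
        "seconds": elapsed,
        "peak_bytes": peak,
    }

def threshold_classifier(features):
    """Hypothetical stand-in model: splits on the first feature."""
    return "p2p" if features[0] > 5 else "web"

test_set = [((1, 1), "web"), ((9, 9), "p2p"), ((6, 2), "web"), ((8, 7), "p2p")]
result = evaluate(threshold_classifier, test_set)
```

The stand-in model misclassifies one of the four flows, so the reported accuracy falls below 1 and would, under the method, trigger another round of adjustment.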
As shown in Fig. 2, the method for extracting the network traffic behavioral feature set comprises the following steps:
Step S201: capture network traffic and screen the key information of the unknown traffic data; the key information comprises the data traffic type, the packet protocol, the time, and the size;
Step S202: process the key information to obtain more intuitive and effective data that represents the behavioral features of the unknown traffic;
Step S203: determine the dimension of the feature set and integrate the behavioral features, so as to avoid information redundancy while retaining the information that effectively identifies data flows.
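Steps S201 to S203 can be sketched as a function that condenses the packets of one flow into a fixed-length feature vector. The packet dictionary keys, the protocol encoding, and the five chosen statistics are assumptions; the text names only protocol, time, and size as key information and leaves the concrete features open.

```python
from statistics import mean

# Hypothetical protocol codes; the text does not define an encoding.
PROTOCOL_CODES = {"TCP": 0.0, "UDP": 1.0}

def extract_flow_features(packets):
    """Turn one flow's packets into a fixed-length behavioral feature vector.

    Each packet is a dict with assumed keys 'protocol', 'timestamp', 'size'.
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    ordered = sorted(packets, key=lambda p: p["timestamp"])
    sizes = [p["size"] for p in ordered]
    times = [p["timestamp"] for p in ordered]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return [
        PROTOCOL_CODES.get(ordered[0]["protocol"], -1.0),  # protocol code
        float(len(ordered)),          # packet count
        mean(sizes),                  # mean packet size in bytes
        times[-1] - times[0],         # flow duration in seconds
        mean(gaps) if gaps else 0.0,  # mean inter-arrival time
    ]

flow = [
    {"protocol": "TCP", "timestamp": 0.0, "size": 60},
    {"protocol": "TCP", "timestamp": 0.5, "size": 1500},
    {"protocol": "TCP", "timestamp": 1.5, "size": 540},
]
features = extract_flow_features(flow)
```

Fixing the vector length up front corresponds to determining the feature set dimension in step S203; summary statistics such as the mean size replace the raw per-packet values and so avoid redundant information.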
As shown in Fig. 3, the method for obtaining and optimizing the classifier model comprises the following steps:
Step S301: select part of the extracted network traffic behavioral feature set and label it by class, the label being added as a new behavioral feature, to obtain the training set and test set for classifier training; labeling may be done manually or by DPI (deep packet inspection, an application-layer traffic detection and control technique);
Step S302: initialize the parameters of the clustering algorithm and input the training set for training;
Step S303: judge whether the cluster centers have converged; if so, perform S304; otherwise perform S302;
The specific algorithm is as follows: a randomly selected portion of labeled samples is used together with a large number of unlabeled samples to train the clustering algorithm and choose the cluster centers. The labeled sample points make it possible to check whether the cluster centers are suitably chosen and to correct errors, which reduces the number of cluster centers that do not match actual conditions because they were obtained without supervision, from the clustering criterion alone. The sample set is trained iteratively according to the clustering criterion, and the termination condition is convergence of the algorithm. At that point the cluster centers and a suitable cluster range are determined; the cluster range is represented by the convergence radius;
Step S304: save the cluster centers and convergence radii, adjust the training set, and remove and save the sample points that have been clustered;
After cluster training ends, the cluster centers and convergence radii are saved as the clustering basis for classifying unknown traffic data. Removing the sample points that have already been clustered reduces the amount of data on which the supervised algorithm is trained. For example, the complexity of the SVM algorithm lies between O(n^2) and O(n^3), where n is the number of samples, so the cost of the supervised algorithm drops substantially and training efficiency improves;
Step S305: initialize the parameters of the supervised algorithm and input the new training set for training;
Step S306: judge whether the algorithm has converged; if so, perform S307; otherwise perform S305;
Step S307: fix the parameters of the supervised algorithm, which determines the classification criterion;
Step S308: input the test set into the classifier for a classification test, and evaluate the accuracy;
Step S309: judge whether the accuracy meets the requirements; if not, return to S302 until the classification result meets the requirements, chiefly a relatively high classification accuracy; if so, the classifier model is deemed feasible and the algorithm ends.
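The cost argument made under step S304 can be made concrete with a short calculation. Suppose clustering leaves m = αn of the original n training samples for the supervised algorithm. The value α = 0.2 below is purely hypothetical, and the estimate ignores constant factors and the cost of the clustering pass itself.

```latex
\[
\frac{T_{\text{after}}}{T_{\text{before}}}
  \;\text{lies between}\;
  \frac{m^{3}}{n^{3}} = \alpha^{3}
  \;\text{and}\;
  \frac{m^{2}}{n^{2}} = \alpha^{2},
\qquad m = \alpha n,\quad 0 < \alpha \le 1 .
\]
\[
\alpha = 0.2 \;\Rightarrow\; \alpha^{2} = 0.04, \qquad \alpha^{3} = 0.008 .
\]
```

Under this assumption, removing 80% of the samples would cut SVM training cost to somewhere between 4% and 0.8% of the original, depending on where the algorithm falls in the O(n^2) to O(n^3) range.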
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A network traffic classification method, characterized in that the method comprises the following steps:
(1) extracting a network traffic behavioral feature set;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier, carrying out classification training, and obtaining the corresponding parameters; evaluating the classifier's performance and optimizing it.
2. The network traffic classification method according to claim 1, characterized in that step (1) comprises the following steps:
A. capturing network traffic and screening the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive and effective data that represents the behavioral features of the unknown traffic;
C. determining the dimension of the feature set and integrating the behavioral features, so as to avoid information redundancy while retaining the information that effectively identifies data flows.
3. The network traffic classification method according to claim 2, characterized in that the key information comprises the data traffic type, the packet protocol, the time, and the size.
4. The network traffic classification method according to claim 1, characterized in that step (2) comprises the following steps:
A. selecting part of the extracted network traffic behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
B. running the clustering algorithm on the training set to obtain a new training set;
C. running the supervised algorithm on the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
E. judging whether the accuracy meets the requirements; if so, ending; otherwise returning to step B until the accuracy meets the requirements.
5. The network traffic classification method according to claim 4, characterized in that step B comprises the following steps:
(a) initializing the parameters of the clustering algorithm and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c); otherwise performing step (a);
(c) after cluster training ends, saving the cluster centers and convergence radii, adjusting the training set by removing the samples that have been clustered, and forming the new training set.
6. The network traffic classification method according to claim 4, characterized in that step C comprises the following steps:
(a) initializing the parameters of the supervised algorithm and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c); otherwise performing step (a);
(c) fixing the parameters of the supervised algorithm, which determines the classification criterion.
7. The network traffic classification method according to claim 4, characterized in that the training set comprises a labeled part of the feature set and an unlabeled part of the feature set.
8. The network traffic classification method according to claim 4, characterized in that the test set comprises a labeled part of the feature set and an unlabeled part of the feature set.
9. The network traffic classification method according to claim 4, characterized in that the class labeling uses deep packet inspection (DPI).
10. The network traffic classification method according to claim 1, characterized in that the classifier performance evaluation comprises evaluating classifier accuracy, system time, and memory overhead.
CN201510176138.1A 2015-04-15 2015-04-15 Network traffic classification method Active CN104767692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510176138.1A CN104767692B (en) 2015-04-15 2015-04-15 Network traffic classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510176138.1A CN104767692B (en) 2015-04-15 2015-04-15 Network traffic classification method

Publications (2)

Publication Number Publication Date
CN104767692A 2015-07-08
CN104767692B 2018-05-29

Family

ID=53649314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510176138.1A Active CN104767692B (en) 2015-04-15 2015-04-15 Network traffic classification method

Country Status (1)

Country Link
CN (1) CN104767692B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101060443A (en) * 2006-04-17 2007-10-24 中国科学院自动化研究所 An improved adaptive boosting algorithm based Internet intrusion detection method
CN103150454A (en) * 2013-03-27 2013-06-12 山东大学 Dynamic machine learning modeling method based on sample recommending and labeling
CN103793510A (en) * 2014-01-29 2014-05-14 苏州融希信息科技有限公司 Classifier construction method based on active learning

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022960B (en) * 2015-08-10 2017-11-21 济南大学 Multiple features mobile terminal from malicious software detecting method and system based on network traffics
CN105022960A (en) * 2015-08-10 2015-11-04 济南大学 Multi-feature mobile terminal malicious software detecting method based on network flow and multi-feature mobile terminal malicious software detecting system based on network flow
CN106959967A (en) * 2016-01-12 2017-07-18 中国科学院声学研究所 A kind of training of link prediction model and link prediction method
CN106411775A (en) * 2016-08-31 2017-02-15 国家计算机网络与信息安全管理中心 Internet traffic classification sample labeling method
CN106411775B (en) * 2016-08-31 2019-06-14 国家计算机网络与信息安全管理中心 A kind of internet traffic classification samples mask method
CN106713324A (en) * 2016-12-28 2017-05-24 北京奇艺世纪科技有限公司 Flow detection method and device
CN108197666A (en) * 2018-01-30 2018-06-22 咪咕文化科技有限公司 Image classification model processing method and device and storage medium
US11586971B2 (en) 2018-07-19 2023-02-21 Hewlett Packard Enterprise Development Lp Device identifier classification
US12026597B2 (en) 2018-07-19 2024-07-02 Hewlett Packard Enterprise Development Lp Device identifier classification
WO2020062390A1 (en) * 2018-09-25 2020-04-02 深圳先进技术研究院 Network traffic classification method and system, and electronic device
CN111126419B (en) * 2018-10-30 2023-12-01 顺丰科技有限公司 Dot clustering method and device
CN111126419A (en) * 2018-10-30 2020-05-08 顺丰科技有限公司 Dot clustering method and device
CN109376797A (en) * 2018-11-20 2019-02-22 大连理工大学 A kind of net flow assorted method based on binary coder and more Hash tables
CN109450740A (en) * 2018-12-21 2019-03-08 青岛理工大学 SDN controller for carrying out traffic classification based on DPI and machine learning algorithm
CN109922083A (en) * 2019-04-10 2019-06-21 武汉金盛方圆网络科技发展有限公司 A kind of network protocol flow control system
CN110149280A (en) * 2019-05-27 2019-08-20 中国科学技术大学 Net flow assorted method and apparatus
CN110149280B (en) * 2019-05-27 2020-08-28 中国科学技术大学 Network traffic classification method and device
CN110445800B (en) * 2019-08-15 2022-06-14 上海寰创通信科技股份有限公司 Self-learning-based deep packet parsing system
CN110445800A (en) * 2019-08-15 2019-11-12 上海寰创通信科技股份有限公司 A kind of deep message resolution system based on self study
CN110753049A (en) * 2019-10-21 2020-02-04 清华大学 Safety situation sensing system based on industrial control network flow
CN111983429A (en) * 2020-08-19 2020-11-24 Oppo广东移动通信有限公司 Chip verification system, chip verification method, terminal and storage medium
CN112637084A (en) * 2020-12-10 2021-04-09 中山职业技术学院 Distributed network flow novelty detection method and classifier
CN112637084B (en) * 2020-12-10 2022-09-23 中山职业技术学院 Distributed network flow novelty detection method and classifier

Also Published As

Publication number Publication date
CN104767692B (en) 2018-05-29

Similar Documents

Publication Publication Date Title
CN104767692A (en) Network traffic classification method
Kayacik et al. A hierarchical SOM-based intrusion detection system
CN101841440B (en) Peer-to-peer network flow identification method based on support vector machine and deep packet inspection
CN104102700A (en) Categorizing method oriented to Internet unbalanced application flow
CN101996241A (en) Bayesian algorithm-based content filtering method
CN107846326A (en) A kind of adaptive semi-supervised net flow assorted method, system and equipment
CN105141455B (en) A kind of net flow assorted modeling method of making an uproar based on statistical nature
CN114269007A (en) Method, device and method storage medium for determining energy-saving strategy of base station
CN110266528B (en) Traffic prediction method for Internet of vehicles communication based on machine learning
CN104883278A (en) Method for classifying network equipment by utilizing machine learning
CN105873105A (en) Method for anomaly detection and positioning of mobile communication network based on network experience quality
CN109981474A (en) A kind of network flow fine grit classification system and method for application-oriented software
CN109167680A (en) A kind of traffic classification method based on deep learning
CN109462853B (en) Network capacity prediction method based on neural network model
CN108199863A (en) A kind of net flow assorted method and system based on the study of two benches sequence signature
CN109547251B (en) Service system fault and performance prediction method based on monitoring data
CN109995611B (en) Traffic classification model establishing and traffic classification method, device, equipment and server
CN111478904A (en) Method and device for detecting communication anomaly of Internet of things equipment based on concept drift
CN105791151A (en) Dynamic flow control method and device
Binglei et al. Fuzzy-logic-based traffic incident detection algorithm for freeway
CN110009005A (en) A kind of net flow assorted method based on feature strong correlation
CN103634829B (en) A kind of section screening technique based on drive test information and equipment
CN109450876B (en) DDos identification method and system based on multi-dimensional state transition matrix characteristics
CN105553574A (en) Support-vector-machine-based MAC protocol identification method in cognitive radio
CN101951330A (en) Bidirectional joint detection device and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant