CN104767692B - Network traffic classification method - Google Patents


Info

Publication number
CN104767692B
Authority
CN
China
Legal status
Active
Application number
CN201510176138.1A
Other languages
Chinese (zh)
Other versions
CN104767692A
Inventor
张庚
孙勇
孙振超
张然
周禹
钟卓健
李思珍
汪洋
刘世栋
郭经红
苏斓
丁慧霞
王智慧
王妙心
李哲
高强
Current Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
China Electric Power Research Institute Co Ltd CEPRI
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
China Electric Power Research Institute Co Ltd CEPRI
State Grid Jiangsu Electric Power Co Ltd
Priority date
2015-04-15
Filing date
2015-04-15
Publication date
2018-05-29
Application filed by State Grid Corp of China SGCC, Beijing University of Posts and Telecommunications, China Electric Power Research Institute Co Ltd CEPRI, and State Grid Jiangsu Electric Power Co Ltd
Priority to CN201510176138.1A
Publication of CN104767692A
Application granted
Publication of CN104767692B
Legal status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention provides a network traffic classification method comprising: (1) extracting a behavioral feature set of network flows; (2) obtaining a classifier model by inputting the behavioral feature set into a classifier and performing classification training to obtain the relevant parameters, then evaluating classifier performance and optimizing it. The invention combines two kinds of machine-learning algorithms, unsupervised and supervised, for classification. Their combination reduces system time and memory overhead while maintaining relatively high classification accuracy, improving classification efficiency. The clustering algorithm is also improved to raise clustering accuracy and thereby overall performance.

Description

Network traffic classification method
Technical field
The present invention relates to network traffic classification, and in particular to a network traffic classification method based on clustering and a supervised algorithm.
Background
With the growth of the Internet and the widespread use of high-bandwidth, diverse network services, the volume of network data has risen sharply, and intelligent management of network data flows has become increasingly important. Its prerequisite is identifying and classifying the types of data flows.
Traffic classification must not only be accurate; reducing its time and memory overhead has also become a research focus. Among existing classification techniques, machine learning is a research focus and divides into unsupervised and supervised algorithms. Unsupervised algorithms partition samples into clusters according to their similarity. Clustering is the most widely used unsupervised approach and mainly comprises partition-based, hierarchical, density-based and grid-based algorithms, of which k-means is the most widely used. Supervised algorithms adjust classifier parameters by training on a sample set of known classes to obtain a classifier model; the main methods include neural networks, decision trees, Bayesian methods and support vector machines (SVM). Clustering algorithms have relatively low accuracy, whereas supervised algorithms consume more system time and memory.
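For reference, a minimal pure-Python sketch of standard k-means (Lloyd's algorithm), the clustering algorithm named above. The function name, the random initialization from data points and the convergence tolerance are illustrative choices, not details taken from the patent.

```python
import math
import random

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means (Lloyd's algorithm) on sequences of floats."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # start from k distinct data points
    assign = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers stopped moving: converged
            break
    return centers, assign
```

On two well-separated groups of points, such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the first three and last three points end up in different clusters.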
Summary of the invention
To overcome the above shortcomings of the prior art, the present invention provides a network traffic classification method that, by combining clustering with a supervised algorithm, reduces time complexity, raises classification accuracy and improves classification efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A network traffic classification method, comprising the following steps:
(1) extracting a behavioral feature set of network flows;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier and performing classification training to obtain the corresponding parameters; evaluating classifier performance and optimizing it.
In a preferred technical solution provided by the invention, step (1) comprises the following steps:
A. capturing network traffic and screening the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive, useful data that represent the behavioral features of the unknown traffic;
C. determining the dimensionality of the feature set and integrating the behavioral features, avoiding information redundancy while retaining the information that effectively identifies data flows.
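Steps A and B can be illustrated by grouping captured packets into flows and summarizing each flow as a feature vector. In this sketch the packet tuple layout, the 5-tuple flow key and the particular features (packet count, bytes, mean size, duration, mean inter-arrival gap) are assumptions chosen to reflect the key information the description lists (protocol, time, size). They are not the patent's own feature definition.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet record: (src_ip, dst_ip, src_port, dst_port, protocol, timestamp_s, size_bytes)
def flow_features(packets):
    """Group packets by 5-tuple and summarize each flow as a dict of features."""
    flows = defaultdict(list)
    for src, dst, sport, dport, proto, ts, size in packets:
        flows[(src, dst, sport, dport, proto)].append((ts, size))
    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order the flow's packets by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
        features[key] = {
            "protocol": key[4],
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": mean(sizes),
            "duration": times[-1] - times[0],
            "mean_gap": mean(gaps) if gaps else 0.0,  # single-packet flows have no gaps
        }
    return features
```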
In a second preferred technical solution, the key information includes the data traffic type, the packet protocol, time and type.
In a third preferred technical solution, step (2) comprises the following steps:
A. selecting part of the extracted network-flow behavioral feature set for class labeling, the label serving as a newly added behavioral feature, to obtain the training set and test set for classifier training;
B. applying the clustering algorithm to the training set to obtain a new training set;
C. applying the supervised algorithm to the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion and evaluating the accuracy;
E. judging whether the accuracy meets the requirement; if so, terminating; otherwise returning to step B until the accuracy meets the requirement.
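Steps A to E form a retrain-until-accurate loop. A minimal sketch of that control flow follows. It assumes two hypothetical callables: `fit(params)`, which stands for steps B and C, and `score(model)`, which stands for step D. It also assumes the loop walks a finite list of candidate parameter settings, rather than running unbounded as the text allows.

```python
def optimize_classifier(fit, score, param_grid, target_accuracy):
    """Outer loop of steps B-E: retrain with new parameters until accuracy is acceptable.

    fit(params) -> model performs clustering plus supervised training (steps B and C);
    score(model) -> test-set accuracy (step D). Both are hypothetical callables.
    """
    best_model, best_acc = None, -1.0
    for params in param_grid:            # each pass corresponds to "return to step B"
        model = fit(params)
        acc = score(model)
        if acc > best_acc:
            best_model, best_acc = model, acc
        if acc >= target_accuracy:       # step E: requirement met, stop
            return model, acc, True
    return best_model, best_acc, False   # candidates exhausted without meeting the target
```

Bounding the search by a candidate list is a design choice: it guarantees termination and still returns the best model found if the target is never reached.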
In a fourth preferred technical solution, step B comprises the following steps:
(a) initializing the clustering algorithm parameters and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c), otherwise performing step (a);
(c) after clustering training, saving the cluster centers and convergence radii, and adjusting the training set by removing the samples that have been clustered, to form a new training set.
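Step (c) does not specify how the convergence radius is chosen or what counts as a "clustered" sample. The sketch below makes two assumptions. First, each radius is the distance that covers a chosen fraction of that cluster's members, set by a hypothetical parameter `core_fraction`. Second, samples inside the radius count as clustered and are removed, which leaves the boundary samples for the supervised stage.

```python
import math

def prune_training_set(points, labels, centers, assign, core_fraction=0.5):
    """Hypothetical step B(c): save a convergence radius per cluster and drop core samples."""
    radii = []
    for j, c in enumerate(centers):
        dists = sorted(math.dist(p, c) for p, a in zip(points, assign) if a == j)
        if not dists:
            radii.append(0.0)  # empty cluster: no radius
            continue
        # Assumption: radius = distance covering `core_fraction` of the cluster's members.
        idx = max(0, math.ceil(core_fraction * len(dists)) - 1)
        radii.append(dists[idx])
    kept_points, kept_labels = [], []
    for p, y, a in zip(points, labels, assign):
        if math.dist(p, centers[a]) > radii[a]:  # outside the core: keep for supervised training
            kept_points.append(p)
            kept_labels.append(y)
    return radii, kept_points, kept_labels
```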
In a fifth preferred technical solution, step C comprises the following steps:
(a) initializing the supervised algorithm parameters and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c), otherwise performing step (a);
(c) determining the supervised algorithm parameters, whereby the classification criterion is determined.
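The source names SVM only as an example of a supervised algorithm. As a stand-in, this sketch uses a perceptron, which is a simpler linear classifier and not an SVM. It was chosen because the convergence test in step (b) is easy to state for it: a full pass over the training set with no errors. Labels are assumed to be -1/+1. The `max_epochs` guard is needed because a perceptron never converges on data that are not linearly separable.

```python
def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Stand-in supervised learner: perceptron with labels in {-1, +1}."""
    w = [0.0] * len(X[0])  # step (a): parameter initialization
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:  # misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # step (b): converged, no training errors in a full pass
            return w, b, True
    return w, b, False     # not converged within max_epochs

def predict(w, b, x):
    """Step (c): the learned (w, b) defines the classification criterion."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```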
In a sixth preferred technical solution, the training set includes a labeled part of the feature set and an unlabeled part of the feature set.
In a seventh preferred technical solution, the test set includes a labeled part of the feature set and an unlabeled part of the feature set.
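A minimal sketch of producing such partially labeled sets. It assumes samples are feature tuples and that `dpi_label` is a hypothetical callable standing in for the manual or DPI labeling the text mentions. The 30% labeling fraction and 25% test fraction are arbitrary choices.

```python
import random

def label_and_split(samples, dpi_label, label_fraction=0.3, test_fraction=0.25, seed=0):
    """Label a random subset of feature vectors and split them into training and test sets.

    `dpi_label` is a hypothetical callable standing in for manual or DPI labeling.
    Each row pairs a feature vector with its label; unlabeled rows carry None.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    labeled = set(indices[:round(label_fraction * len(samples))])
    rows = [(s, dpi_label(s) if i in labeled else None) for i, s in enumerate(samples)]
    rng.shuffle(rows)  # mix labeled and unlabeled rows before splitting
    n_test = round(test_fraction * len(rows))
    return rows[n_test:], rows[:n_test]  # (training set, test set)
```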
In an eighth preferred technical solution, the class labeling uses deep packet inspection (DPI).
In a ninth preferred technical solution, classifier performance evaluation includes evaluating the classifier's accuracy, system time and memory overhead.
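A minimal sketch of the three measurements named here, using only the Python standard library: accuracy, wall-clock time via `time.perf_counter`, and peak memory via `tracemalloc`. `classify` is a hypothetical callable. `tracemalloc` only counts memory allocated by Python itself, so it approximates, rather than equals, the system memory overhead the text refers to.

```python
import time
import tracemalloc

def evaluate(classify, X_test, y_test):
    """Measure accuracy, wall time and peak Python memory of `classify` on a test set."""
    tracemalloc.start()
    start = time.perf_counter()
    predictions = [classify(x) for x in X_test]
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return {"accuracy": correct / len(y_test), "seconds": elapsed, "peak_bytes": peak}
```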
Compared with the prior art, the beneficial effects of the present invention are:
The present invention combines two kinds of machine-learning algorithms, unsupervised and supervised, for classification. Their combination reduces system time and memory overhead while maintaining relatively high classification accuracy, improving classification efficiency.
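The time saving can be made concrete with a short worked calculation. Assume, hypothetically, that clustering removes a fraction r of the n training samples. Assume also that supervised training cost grows as c·n^k, with k between 2 and 3, which is the SVM range quoted in the embodiment. Under these assumptions:

```latex
\begin{aligned}
n' &= (1-r)\,n, \qquad
\frac{T(n')}{T(n)} = \frac{c\,\bigl((1-r)\,n\bigr)^{k}}{c\,n^{k}} = (1-r)^{k}, \quad 2 \le k \le 3,\\
r = 0.5 &: \quad 0.5^{2} = 0.25, \quad 0.5^{3} = 0.125,\\
r = 0.8 &: \quad 0.2^{2} = 0.04, \quad 0.2^{3} = 0.008.
\end{aligned}
```

Halving the training set would therefore cut supervised training time to between one quarter and one eighth, under these idealized scaling assumptions.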
The present invention also improves the clustering algorithm by adding labeled sample points to the clustering process. The labeled points make it possible to check whether the cluster centers are chosen appropriately and to correct errors, reducing sample center points that do not match the actual situation. This raises clustering accuracy and thereby overall performance.
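One way to read this improvement is as seeded, constrained k-means. The labeled samples fix one initial center per class and stay anchored to their own class's cluster, while unlabeled samples are reassigned on each iteration. The sketch below follows that interpretation, which is an assumption rather than the patent's stated algorithm. Taking the convergence radius as the largest member distance is also an assumption.

```python
import math

def seeded_kmeans(unlabeled, labeled, max_iter=100, tol=1e-6):
    """Constrained k-means: one cluster per class, seeded and anchored by labeled samples.

    labeled: list of (point, class_name); unlabeled: list of points.
    Returns per-class centers and per-class convergence radii.
    """
    classes = sorted({c for _, c in labeled})

    def centroid(pts):
        return [sum(v) / len(pts) for v in zip(*pts)]

    # Seeding: each class's initial center is the mean of its labeled samples.
    centers = {c: centroid([p for p, k in labeled if k == c]) for c in classes}
    for _ in range(max_iter):
        members = {c: [p for p, k in labeled if k == c] for c in classes}  # labeled points stay put
        for p in unlabeled:
            nearest = min(classes, key=lambda c: math.dist(p, centers[c]))
            members[nearest].append(p)
        new_centers = {c: centroid(members[c]) for c in classes}
        shift = max(math.dist(centers[c], new_centers[c]) for c in classes)
        centers = new_centers
        if shift < tol:  # cluster centers have converged
            break
    # Assumption: convergence radius = distance to the farthest member of the cluster.
    radii = {c: max(math.dist(p, centers[c]) for p in members[c]) for c in classes}
    return centers, radii
```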
Brief description of the drawings
Fig. 1 is a flow chart of the network traffic classification method;
Fig. 2 is a flow chart of extracting the network-flow behavioral feature set;
Fig. 3 is a flow chart of obtaining and optimizing the classifier model.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the network traffic classification method based on clustering and a supervised algorithm comprises the following steps:
Step S101: extracting behavioral features from network data flows;
Step S102: obtaining a classifier model: inputting the above behavioral feature set into a classifier and training the classifier to obtain the relevant parameters;
Step S103: evaluating classifier performance, including accuracy, system time and memory overhead. The test set is input into the classifier for classification, and the classifier algorithm and its parameters are adjusted according to the accuracy to optimize classifier performance.
As shown in Fig. 2, the method for extracting the network-flow behavioral feature set comprises the following steps:
Step S201: capturing network traffic and screening the key information of the unknown traffic data; the key information includes the data service type, packet protocol, time and size;
Step S202: processing the above key information to obtain more intuitive, useful data that represent the behavioral features of the unknown traffic;
Step S203: determining the dimensionality of the feature set and integrating the above behavioral feature set, avoiding information redundancy while retaining the information that effectively identifies data flows.
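Step S203 does not say how redundancy is avoided. One common approach, shown here as an assumption, is a greedy filter. It drops constant features, and it drops any feature whose absolute Pearson correlation with an already-kept feature reaches a threshold. The 0.95 threshold is arbitrary.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def select_features(columns, threshold=0.95):
    """Greedy redundancy filter over named numeric feature columns (dict: name -> values)."""
    kept = []
    for name, values in columns.items():
        if max(values) == min(values):
            continue  # constant feature carries no information for identifying flows
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)  # not redundant with any feature kept so far
    return kept
```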
As shown in Fig. 3, the method for obtaining and optimizing the classifier model comprises the following steps:
Step S301: selecting part of the extracted network-flow behavioral feature set for class labeling, the label serving as a newly added behavioral feature, to obtain the training set and test set for classifier training. Labeling may be done manually or by DPI (deep packet inspection, i.e. application-layer traffic detection and control);
Step S302: initializing the clustering algorithm parameters and inputting the training set for training;
Step S303: judging whether the cluster centers have converged; if so, performing S304, otherwise performing S302;
The specific algorithm is as follows. A randomly selected portion of labeled samples is used together with a large number of unlabeled samples to train the clustering algorithm and choose the cluster centers. The labeled sample points make it possible to check whether the sample center points are chosen appropriately and to correct errors. This reduces center points that do not match the actual situation, which arise when results are obtained without supervision from the clustering criterion alone. The sample set is trained iteratively according to the clustering criterion, and training terminates when the algorithm converges. At that point the cluster centers and a suitable cluster range are determined; the cluster range is represented by a convergence radius;
Step S304: saving the cluster centers and convergence radii, and adjusting the training set by removing (and saving) the sample points that have been clustered;
After clustering training, the cluster centers and convergence radii are saved as the clustering basis for classifying unknown traffic data, and the clustered sample points are removed. This reduces the amount of data on which the supervised algorithm is trained. For example, the complexity of the SVM algorithm lies between O(n²) and O(n³), where n is the number of samples, so the complexity of the supervised algorithm can be greatly reduced and training efficiency improved;
Step S305: initializing the supervised algorithm parameters and inputting the new training set for training;
Step S306: judging whether the algorithm has converged; if so, performing S307, otherwise performing S305;
Step S307: determining the supervised algorithm parameters, whereby the classification criterion is determined;
Step S308: inputting the test set into the classifier for classification testing and evaluating the accuracy;
Step S309: judging whether the accuracy meets the requirement. If not, returning to S302 until the classification performance meets the requirement, chiefly a sufficiently high classification accuracy; if so, the classifier model is deemed feasible and the algorithm terminates.
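At classification time, the saved cluster centers and convergence radii act as a first-pass filter, and the supervised model handles the remaining samples. A minimal sketch of that decision follows. It assumes each saved cluster is a hypothetical `(center, radius, class_name)` tuple and that `supervised` is any callable classifier. Picking the nearest qualifying center when radii overlap is also an assumption.

```python
import math

def hybrid_classify(x, clusters, supervised):
    """Classify one feature vector with saved clusters first, then the supervised model.

    clusters: list of (center, radius, class_name) saved after clustering training.
    supervised: callable used for samples outside every cluster's convergence radius.
    """
    best = None
    for center, radius, name in clusters:
        d = math.dist(x, center)
        if d <= radius and (best is None or d < best[0]):  # inside radius, nearest so far
            best = (d, name)
    return best[1] if best is not None else supervised(x)
```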
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified or equivalently replaced. Any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the present invention.

Claims (8)

  1. A network traffic classification method, characterized in that the method comprises the following steps:
    (1) extracting a behavioral feature set of network flows;
    (2) obtaining a classifier model: inputting the behavioral feature set into a classifier and performing classification training to obtain the relevant parameters;
    evaluating classifier performance and optimizing it;
    step (1) comprises the following steps:
    A. capturing network traffic and screening the key information of the unknown traffic data;
    B. processing the key information to obtain more intuitive, useful data that represent the behavioral features of the unknown traffic;
    C. determining the dimensionality of the feature set and integrating the behavioral features, avoiding information redundancy while retaining the information that effectively identifies data flows;
    step (2) comprises the following steps:
    A. selecting part of the extracted network-flow behavioral feature set for class labeling, the label serving as a newly added behavioral feature, to obtain the training set and test set for classifier training;
    B. applying the clustering algorithm to the training set to obtain a new training set;
    C. applying the supervised algorithm to the new training set to determine the classification criterion;
    D. inputting the test set into the classifier to test the classification criterion and evaluating the accuracy;
    E. judging whether the accuracy meets the requirement; if so, terminating; otherwise returning to step B until the accuracy meets the requirement.
  2. The network traffic classification method according to claim 1, characterized in that the key information includes the data traffic type, the packet protocol, time and type.
  3. The network traffic classification method according to claim 1, characterized in that step B comprises the following steps:
    (a) initializing the clustering algorithm parameters and inputting the training set for training;
    (b) judging whether the cluster centers have converged; if so, performing step (c), otherwise performing step (a);
    (c) after clustering training, saving the cluster centers and convergence radii, and adjusting the training set by removing the clustered samples to form a new training set.
  4. The network traffic classification method according to claim 1, characterized in that step C comprises the following steps:
    (a) initializing the supervised algorithm parameters and inputting the new training set for training;
    (b) judging whether the algorithm has converged; if so, performing step (c), otherwise performing step (a);
    (c) determining the supervised algorithm parameters, whereby the classification criterion is determined.
  5. The network traffic classification method according to claim 1, characterized in that the training set includes a labeled part of the feature set and an unlabeled part of the feature set.
  6. The network traffic classification method according to claim 1, characterized in that the test set includes a labeled part of the feature set and an unlabeled part of the feature set.
  7. The network traffic classification method according to claim 1, characterized in that the class labeling uses deep packet inspection technology.
  8. The network traffic classification method according to claim 1, characterized in that the classifier performance evaluation includes evaluating the classifier's accuracy, system time and memory overhead.
CN201510176138.1A 2015-04-15 2015-04-15 Network traffic classification method Active CN104767692B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510176138.1A CN104767692B 2015-04-15 2015-04-15 Network traffic classification method

Publications (2)

Publication Number Publication Date
CN104767692A 2015-07-08
CN104767692B 2018-05-29

Family

ID=53649314

Country Status (1)

Country Link
CN (1) CN104767692B

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022960B (en) * 2015-08-10 2017-11-21 济南大学 Multiple features mobile terminal from malicious software detecting method and system based on network traffics
CN106959967B (en) * 2016-01-12 2019-11-19 中国科学院声学研究所 A kind of training and link prediction method of link prediction model
CN106411775B (en) * 2016-08-31 2019-06-14 国家计算机网络与信息安全管理中心 A kind of internet traffic classification samples mask method
CN106713324B (en) * 2016-12-28 2020-03-06 北京奇艺世纪科技有限公司 Flow detection method and device
CN108197666A (en) * 2018-01-30 2018-06-22 咪咕文化科技有限公司 A kind of processing method, device and the storage medium of image classification model
US11586971B2 (en) 2018-07-19 2023-02-21 Hewlett Packard Enterprise Development Lp Device identifier classification
CN109309630B (en) * 2018-09-25 2021-09-21 深圳先进技术研究院 Network traffic classification method and system and electronic equipment
CN111126419B (en) * 2018-10-30 2023-12-01 顺丰科技有限公司 Dot clustering method and device
CN109376797B (en) * 2018-11-20 2023-05-16 大连理工大学 Network traffic classification method based on binary encoder and multi-hash table
CN109450740A (en) * 2018-12-21 2019-03-08 青岛理工大学 A kind of SDN controller carrying out traffic classification based on DPI and machine learning algorithm
CN109922083B (en) * 2019-04-10 2021-01-05 武汉金盛方圆网络科技发展有限公司 Network protocol flow control system
CN110149280B (en) * 2019-05-27 2020-08-28 中国科学技术大学 Network traffic classification method and device
CN110445800B (en) * 2019-08-15 2022-06-14 上海寰创通信科技股份有限公司 Self-learning-based deep packet parsing system
CN110753049B (en) * 2019-10-21 2021-04-13 清华大学 Safety situation sensing system based on industrial control network flow
CN111983429B (en) * 2020-08-19 2023-07-18 Oppo广东移动通信有限公司 Chip verification system, chip verification method, terminal and storage medium
CN112637084B (en) * 2020-12-10 2022-09-23 中山职业技术学院 Distributed network flow novelty detection method and classifier

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101060443A (en) * 2006-04-17 2007-10-24 中国科学院自动化研究所 An improved adaptive boosting algorithm based Internet intrusion detection method
CN103150454A (en) * 2013-03-27 2013-06-12 山东大学 Dynamic machine learning modeling method based on sample recommending and labeling
CN103793510A (en) * 2014-01-29 2014-05-14 苏州融希信息科技有限公司 Classifier construction method based on active learning

Also Published As

Publication number Publication date
CN104767692A 2015-07-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant