CN104767692B - A network traffic classification method - Google Patents
A network traffic classification method
- Publication number: CN104767692B
- Application number: CN201510176138.1A
- Authority: CN (China)
- Prior art keywords: training, network traffic, feature set, algorithm, classification method
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a network traffic classification method comprising: (1) extracting a behavioral feature set of the network traffic; and (2) obtaining a classifier model by inputting the behavioral feature set into a classifier, performing classification training to obtain the corresponding parameters, evaluating classifier performance, and optimizing it. The invention combines an unsupervised algorithm and a supervised algorithm from machine learning to perform classification. Combining the two reduces system time and memory overhead while maintaining relatively high classification accuracy, which improves classification efficiency. The clustering algorithm is also improved to raise clustering accuracy and thereby overall performance.
Description
Technical field
The present invention relates to a network traffic classification method.
Background art
As the Internet grows and high-bandwidth network services of many types come into wide use, the volume of network service data has risen sharply, and intelligent management of network data flows has become increasingly important. Its prerequisite is identifying and classifying the type of each data flow.
Traffic classification must not only ensure accuracy; reducing time and memory overhead has also become a research focus. Among existing classification techniques, machine learning methods are a research focus and are divided into unsupervised and supervised algorithms. Unsupervised algorithms partition samples into clusters according to their similarity. Clustering is the more widely used kind of unsupervised algorithm and mainly comprises partition-based, hierarchy-based, density-based, and grid-based algorithms, of which k-means is the most widely used. Supervised algorithms adjust the classifier parameters by training on a sample set of known classes to obtain a classifier model; the main methods include neural networks, classification decision trees, Bayesian theory, and support vector machines (SVM). Clustering algorithms have relatively low accuracy, while supervised algorithms consume more system time and memory.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a network traffic classification method that combines clustering with a supervised algorithm, which reduces time complexity, raises classification accuracy, and improves classification efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A network traffic classification method, comprising the following steps:
(1) extracting a behavioral feature set of the network traffic;
(2) obtaining a classifier model: inputting the behavioral feature set into a classifier and performing classification training to obtain the corresponding parameters; evaluating classifier performance and optimizing it.
In a preferred technical solution provided by the present invention, step (1) comprises the following steps:
A. capturing network traffic and screening out the key information of the unknown traffic data;
B. processing the key information to obtain more intuitive and effective data information that represents the behavioral features of the unknown traffic;
C. determining the dimension of the feature set and integrating the behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow.
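Steps A to C can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Packet` record, the five-tuple flow key, and the four statistics chosen as features are all hypothetical, since the text specifies only that the key information covers the service type, packet protocol, and time.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """Hypothetical record holding the key information screened in step A."""
    src: str
    dst: str
    sport: int
    dport: int
    protocol: str
    timestamp: float  # seconds
    size: int         # bytes


def extract_flow_features(packets):
    """Group packets into flows and summarise each flow as a fixed-length vector."""
    flows = defaultdict(list)
    for p in packets:
        # Step A: a flow is identified here by its five-tuple (an assumption).
        flows[(p.src, p.dst, p.sport, p.dport, p.protocol)].append(p)

    features = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p.timestamp)
        times = [p.timestamp for p in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        # Steps B and C: derive a small feature vector of fixed dimension.
        features[key] = [
            float(len(pkts)),             # packet count
            mean(p.size for p in pkts),   # mean packet size
            times[-1] - times[0],         # flow duration
            mean(gaps) if gaps else 0.0,  # mean inter-arrival time
        ]
    return features
```

Fixing the vector length up front corresponds to determining the feature set dimension in step C; which statistics are kept would in practice depend on which ones actually separate the traffic classes.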
In a second preferred technical solution provided by the present invention, the key information includes the data service type, the packet protocol, the time, and the type.
In a third preferred technical solution provided by the present invention, step (2) comprises the following steps:
A. selecting part of the extracted behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
B. applying the clustering algorithm to the training set to obtain a new training set;
C. applying the supervised algorithm to the new training set to determine the classification criterion;
D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
E. judging whether the accuracy meets the requirement; if so, ending; otherwise, returning to step B until the accuracy meets the requirement.
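The control flow of steps B to E can be sketched as a retry loop. The sketch assumes hypothetical callables `cluster_step`, `train_supervised`, and `accuracy_of`, a target accuracy, and a round limit added as a safety guard; none of these names or values come from the patent, which leaves the concrete algorithms and the accuracy requirement unspecified.

```python
def build_classifier(train_set, test_set, cluster_step, train_supervised,
                     accuracy_of, required_accuracy=0.9, max_rounds=10):
    """Repeat steps B-D until the accuracy requirement of step E is met.

    All three callables are hypothetical placeholders:
      cluster_step(train_set, round_no) -> (cluster_model, reduced_train_set)
      train_supervised(reduced_train_set) -> supervised_model
      accuracy_of(cluster_model, supervised_model, test_set) -> float in [0, 1]
    """
    for round_no in range(max_rounds):
        # Step B: cluster the training set and obtain the new training set.
        cluster_model, reduced = cluster_step(train_set, round_no)
        # Step C: train the supervised algorithm on the new training set.
        supervised_model = train_supervised(reduced)
        # Step D: test on the held-out set and evaluate accuracy.
        accuracy = accuracy_of(cluster_model, supervised_model, test_set)
        # Step E: stop when the requirement is met, otherwise go back to B.
        if accuracy >= required_accuracy:
            return cluster_model, supervised_model, accuracy
    # Assumption: give up after max_rounds instead of looping forever.
    raise RuntimeError("accuracy requirement not met within max_rounds")
```

Passing the round number to `cluster_step` lets each retry start from different clustering parameters, which is one way to make returning to step B meaningful.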
In a fourth preferred technical solution provided by the present invention, step B comprises the following steps:
(a) initializing the clustering-algorithm parameters and inputting the training set for training;
(b) judging whether the cluster centers have converged; if so, performing step (c); otherwise, performing step (a);
(c) after cluster training ends, saving the cluster centers and convergence radii, and adjusting the training set by removing the samples that have been clustered, to form the new training set.
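Steps (a) to (c) can be illustrated with plain k-means, which the background section names as the most widely used clustering algorithm. The patent does not define how the convergence radius is computed or which samples count as clustered, so this sketch assumes that the radius is the mean distance from a cluster's members to its center and that a sample is removed when it lies within the radius of some center. The function names are hypothetical.

```python
import math


def kmeans(points, centers, tol=1e-9, max_iter=100):
    """Plain k-means: repeat assignment and update until the centers stop moving."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # step (b): the centers have converged
            break
    return centers, clusters


def cluster_and_reduce(points, initial_centers):
    """Step (c): keep centers and radii, drop the samples that fall inside a radius."""
    centers, clusters = kmeans(points, list(initial_centers))
    # Assumption: the "convergence radius" is taken as the mean member distance.
    radii = [
        sum(math.dist(p, c) for p in members) / len(members) if members else 0.0
        for c, members in zip(centers, clusters)
    ]
    # Assumption: a sample counts as clustered when it lies within some radius.
    remaining = [
        p for p in points
        if all(math.dist(p, c) > r for c, r in zip(centers, radii))
    ]
    return centers, radii, remaining
```

Under these assumptions only the samples near a center are treated as settled; the samples left in `remaining` form the new, smaller training set handed to the supervised algorithm.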
In a fifth preferred technical solution provided by the present invention, step C comprises the following steps:
(a) initializing the supervised-algorithm parameters and inputting the new training set for training;
(b) judging whether the algorithm has converged; if so, performing step (c); otherwise, performing step (a);
(c) fixing the supervised-algorithm parameters, which determines the classification criterion.
In a sixth preferred technical solution provided by the present invention, the training set includes a labeled part of the feature set and an unlabeled part of the feature set.
In a seventh preferred technical solution provided by the present invention, the test set includes a labeled part of the feature set and an unlabeled part of the feature set.
In an eighth preferred technical solution provided by the present invention, the class labeling uses deep packet inspection.
In a ninth preferred technical solution provided by the present invention, the classifier performance evaluation includes evaluating the classifier's accuracy, system time, and memory overhead.
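The three quantities named in the performance evaluation can be measured together. The following is a minimal sketch using only the Python standard library; the `evaluate` function and its report keys are hypothetical, and `tracemalloc` tracks only memory allocated by the Python interpreter, so it serves here as a stand-in for whatever memory measurement a real deployment would use.

```python
import time
import tracemalloc


def evaluate(classify, samples, labels):
    """Return accuracy, elapsed seconds, and peak traced memory in bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    predictions = [classify(s) for s in samples]
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "seconds": elapsed,
        "peak_bytes": peak,
    }
```

For example, `evaluate(model, test_samples, test_labels)` with any callable `model` yields one report per candidate classifier, so that accuracy can be weighed against time and memory cost.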
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention combines an unsupervised algorithm and a supervised algorithm from machine learning to perform classification. Combining the two reduces system time and memory overhead while maintaining relatively high classification accuracy, which improves classification efficiency.
The invention improves the clustering algorithm by adding labeled sample points to the clustering process. These points make it possible to check whether the cluster centers are chosen appropriately and to correct errors, reducing the number of center points that do not match the actual situation and raising clustering accuracy, thereby improving overall performance.
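The patent does not say how the labeled sample points are used to check a cluster center. One plausible reading, sketched below as an assumption, is to measure the label purity of each cluster: if the labeled samples assigned to a cluster mostly share one class, the center is accepted, otherwise it is flagged for correction. The function name and the `min_purity` threshold are hypothetical.

```python
from collections import Counter


def check_clusters(assignments, labels, min_purity=0.8):
    """Use the labeled samples in each cluster to judge whether its center is sound.

    assignments: cluster index for every sample
    labels: class label for labeled samples, None for unlabeled ones
    Returns {cluster: (majority_label, purity, ok)} for clusters that contain labels.
    """
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        if label is not None:
            by_cluster.setdefault(cluster, Counter())[label] += 1

    report = {}
    for cluster, counts in by_cluster.items():
        majority, hits = counts.most_common(1)[0]
        purity = hits / sum(counts.values())
        # Assumption: a center is acceptable when purity reaches min_purity.
        report[cluster] = (majority, purity, purity >= min_purity)
    return report
```

A flagged cluster could then be re-initialized or split before the next clustering round; that corrective action is likewise left open by the source.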
Brief description of the drawings
Fig. 1 is a flowchart of the network traffic classification method.
Fig. 2 is a flowchart of extracting the behavioral feature set of network traffic.
Fig. 3 is a flowchart of obtaining and optimizing the classifier model.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, a network traffic classification method based on clustering and a supervised algorithm comprises the following steps:
Step S101: extract behavioral features from the network data flow;
Step S102: obtain the classifier model: input the above behavioral feature set into the classifier and perform classifier training to obtain the corresponding parameters;
Step S103: evaluate classifier performance, including accuracy, system time, and memory overhead: input the test set into the classifier for classification, then adjust the classifier algorithm and parameters according to the accuracy to optimize classifier performance.
As shown in Fig. 2, the behavioral feature set of network traffic is extracted as follows:
Step S201: capture network traffic and screen out the key information of the unknown traffic data; the key information includes the data service type, packet protocol, time, and size;
Step S202: process the above key information to obtain more intuitive and effective data information that represents the behavioral features of the unknown traffic;
Step S203: determine the dimension of the feature set and integrate the above behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow.
As shown in Fig. 3, the classifier model is obtained and optimized as follows:
Step S301: select part of the extracted behavioral feature set and label it by class, the label being added as a new behavioral feature, to obtain the training set and test set for classifier training; labeling may be done manually or by DPI (deep packet inspection, an application-layer traffic detection and control technique);
Step S302: initialize the clustering-algorithm parameters and input the training set for training;
Step S303: judge whether the cluster centers have converged; if so, perform S304; otherwise, perform step S302;
The specific algorithm is as follows: randomly selected labeled samples and a large number of unlabeled samples jointly train the clustering algorithm to choose the cluster centers. The labeled sample points make it possible to check whether a center point is chosen appropriately and to correct errors, reducing the number of center points that do not match the actual situation, which arise because an unsupervised algorithm obtains its result from the clustering criterion alone. The sample set is trained iteratively according to the clustering criterion, and the termination condition is convergence of the algorithm. At that point the cluster centers and suitable cluster ranges are determined, each cluster range being represented by a convergence radius;
Step S304: save the cluster centers and convergence radii, and adjust the training set by removing the clustered sample points and saving the result;
After cluster training ends, the cluster centers and convergence radii are saved as the clustering criterion for classifying unknown traffic data. Because the clustered sample points are removed, the amount of data used to train the supervised algorithm decreases. For example, the complexity of the SVM algorithm lies between O(n^2) and O(n^3), where n is the number of samples, so the cost of the supervised algorithm drops substantially and training efficiency improves;
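The saving can be made concrete with a short calculation. Suppose, purely for illustration, that clustering removes a fraction $p$ of the $n$ training samples, so the supervised algorithm trains on $(1-p)n$ samples. Both $p$ and the cost model $T(n) = c\,n^k$ with $2 \le k \le 3$ are assumptions, chosen to match the quadratic-to-cubic range quoted for SVM training.

$$
\frac{T\big((1-p)\,n\big)}{T(n)} = \frac{c\,\big((1-p)\,n\big)^{k}}{c\,n^{k}} = (1-p)^{k}, \qquad 2 \le k \le 3 .
$$

With $p = 0.5$, for instance, the relative training cost falls to between $0.5^{3} = 0.125$ and $0.5^{2} = 0.25$ of the original, not counting the cost of the clustering stage itself.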
Step S305: initialize the supervised-algorithm parameters and input the new training set for training;
Step S306: judge whether the algorithm has converged; if so, perform S307; otherwise, perform S305;
Step S307: fix the supervised-algorithm parameters, which determines the classification criterion;
Step S308: input the test set into the classifier for a classification test, and evaluate the accuracy;
Step S309: judge whether the accuracy meets the requirement. If not, return to S302 until the classification result meets the requirement, chiefly a sufficiently high classification accuracy; if so, the classifier model is deemed feasible and the algorithm ends.
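Once training ends, the saved cluster centers and convergence radii can be combined with the supervised classifier to classify an unknown flow. The sketch below is one possible reading, not a procedure spelled out in the patent: it assumes that a sample inside a cluster's radius takes that cluster's class and that every other sample falls through to the supervised classifier. The `clusters` structure and the `fallback` callable are hypothetical.

```python
import math


def classify_flow(sample, clusters, fallback):
    """Two-stage classification: cluster lookup first, supervised model second.

    clusters: list of (center, radius, class_label) saved after cluster training
    fallback: callable standing in for the trained supervised classifier
    """
    best = None
    for center, radius, label in clusters:
        distance = math.dist(sample, center)
        # Accept the nearest cluster whose radius contains the sample.
        if distance <= radius and (best is None or distance < best[0]):
            best = (distance, label)
    if best is not None:
        return best[1]
    # Assumption: samples outside every radius go to the supervised classifier.
    return fallback(sample)
```

Because the cluster lookup is a handful of distance computations, flows that land inside a radius never reach the more expensive supervised model, which is consistent with the time and memory savings the method claims.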
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and that any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the present invention.
Claims (8)
- 1. A network traffic classification method, characterized in that the method comprises the following steps:
  (1) extracting a behavioral feature set of the network traffic;
  (2) obtaining a classifier model: inputting the behavioral feature set into a classifier and performing classification training to obtain the corresponding parameters; evaluating classifier performance and optimizing it;
  wherein step (1) comprises the following steps:
  A. capturing network traffic and screening out the key information of the unknown traffic data;
  B. processing the key information to obtain more intuitive and effective data information that represents the behavioral features of the unknown traffic;
  C. determining the dimension of the feature set and integrating the behavioral features so as to avoid information redundancy while retaining the information that effectively identifies the data flow;
  and step (2) comprises the following steps:
  A. selecting part of the extracted behavioral feature set and labeling it by class, the label being added as a new behavioral feature, to obtain the training set and the test set for classifier training;
  B. applying the clustering algorithm to the training set to obtain a new training set;
  C. applying the supervised algorithm to the new training set to determine the classification criterion;
  D. inputting the test set into the classifier to test the classification criterion, and evaluating the accuracy;
  E. judging whether the accuracy meets the requirement; if so, ending; otherwise, returning to step B until the accuracy meets the requirement.
- 2. The network traffic classification method according to claim 1, characterized in that the key information includes the data service type, the packet protocol, the time, and the type.
- 3. The network traffic classification method according to claim 1, characterized in that step B comprises the following steps:
  (a) initializing the clustering-algorithm parameters and inputting the training set for training;
  (b) judging whether the cluster centers have converged; if so, performing step (c); otherwise, performing step (a);
  (c) after cluster training ends, saving the cluster centers and convergence radii, and adjusting the training set by removing the samples that have been clustered, to form the new training set.
- 4. The network traffic classification method according to claim 1, characterized in that step C comprises the following steps:
  (a) initializing the supervised-algorithm parameters and inputting the new training set for training;
  (b) judging whether the algorithm has converged; if so, performing step (c); otherwise, performing step (a);
  (c) fixing the supervised-algorithm parameters, which determines the classification criterion.
- 5. The network traffic classification method according to claim 1, characterized in that the training set includes a labeled part of the feature set and an unlabeled part of the feature set.
- 6. The network traffic classification method according to claim 1, characterized in that the test set includes a labeled part of the feature set and an unlabeled part of the feature set.
- 7. The network traffic classification method according to claim 1, characterized in that the class labeling uses deep packet inspection.
- 8. The network traffic classification method according to claim 1, characterized in that the classifier performance evaluation includes evaluating the classifier's accuracy, system time, and memory overhead.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510176138.1A CN104767692B (en) | 2015-04-15 | 2015-04-15 | A network traffic classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510176138.1A CN104767692B (en) | 2015-04-15 | 2015-04-15 | A network traffic classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104767692A (en) | 2015-07-08 |
CN104767692B (en) | 2018-05-29 |
Family
ID=53649314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510176138.1A Active CN104767692B (en) | A network traffic classification method | 2015-04-15 | 2015-04-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104767692B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105022960B (en) * | 2015-08-10 | 2017-11-21 | 济南大学 | Multiple features mobile terminal from malicious software detecting method and system based on network traffics |
CN106959967B (en) * | 2016-01-12 | 2019-11-19 | 中国科学院声学研究所 | A kind of training and link prediction method of link prediction model |
CN106411775B (en) * | 2016-08-31 | 2019-06-14 | 国家计算机网络与信息安全管理中心 | A kind of internet traffic classification samples mask method |
CN106713324B (en) * | 2016-12-28 | 2020-03-06 | 北京奇艺世纪科技有限公司 | Flow detection method and device |
CN108197666A (en) * | 2018-01-30 | 2018-06-22 | 咪咕文化科技有限公司 | A kind of processing method, device and the storage medium of image classification model |
US11586971B2 (en) | 2018-07-19 | 2023-02-21 | Hewlett Packard Enterprise Development Lp | Device identifier classification |
CN109309630B (en) * | 2018-09-25 | 2021-09-21 | 深圳先进技术研究院 | Network traffic classification method and system and electronic equipment |
CN111126419B (en) * | 2018-10-30 | 2023-12-01 | 顺丰科技有限公司 | Dot clustering method and device |
CN109376797B (en) * | 2018-11-20 | 2023-05-16 | 大连理工大学 | Network traffic classification method based on binary encoder and multi-hash table |
CN109450740A (en) * | 2018-12-21 | 2019-03-08 | 青岛理工大学 | A kind of SDN controller carrying out traffic classification based on DPI and machine learning algorithm |
CN109922083B (en) * | 2019-04-10 | 2021-01-05 | 武汉金盛方圆网络科技发展有限公司 | Network protocol flow control system |
CN110149280B (en) * | 2019-05-27 | 2020-08-28 | 中国科学技术大学 | Network traffic classification method and device |
CN110445800B (en) * | 2019-08-15 | 2022-06-14 | 上海寰创通信科技股份有限公司 | Self-learning-based deep packet parsing system |
CN110753049B (en) * | 2019-10-21 | 2021-04-13 | 清华大学 | Safety situation sensing system based on industrial control network flow |
CN111983429B (en) * | 2020-08-19 | 2023-07-18 | Oppo广东移动通信有限公司 | Chip verification system, chip verification method, terminal and storage medium |
CN112637084B (en) * | 2020-12-10 | 2022-09-23 | 中山职业技术学院 | Distributed network flow novelty detection method and classifier |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101060443A (en) * | 2006-04-17 | 2007-10-24 | 中国科学院自动化研究所 | An improved adaptive boosting algorithm based Internet intrusion detection method |
CN103150454A (en) * | 2013-03-27 | 2013-06-12 | 山东大学 | Dynamic machine learning modeling method based on sample recommending and labeling |
CN103793510A (en) * | 2014-01-29 | 2014-05-14 | 苏州融希信息科技有限公司 | Classifier construction method based on active learning |
Also Published As
Publication number | Publication date |
---|---|
CN104767692A (en) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104767692B (en) | A kind of net flow assorted method | |
CN106817248B (en) | APT attack detection method | |
CN100536411C (en) | An improved adaptive boosting algorithm based Internet intrusion detection method | |
CN112381121A (en) | Unknown class network flow detection and identification method based on twin network | |
CN109818793A (en) | For the device type identification of Internet of Things and network inbreak detection method | |
WO2022037130A1 (en) | Network traffic anomaly detection method and apparatus, and electronic apparatus and storage medium | |
CN110311829A (en) | A kind of net flow assorted method accelerated based on machine learning | |
CN103903441B (en) | Road traffic state distinguishing method based on semi-supervised learning | |
CN102176698A (en) | Method for detecting abnormal behaviors of user based on transfer learning | |
CN105141455B (en) | A kind of net flow assorted modeling method of making an uproar based on statistical nature | |
CN107579846B (en) | Cloud computing fault data detection method and system | |
CN107360152A (en) | A kind of Web based on semantic analysis threatens sensory perceptual system | |
CN103780588A (en) | User abnormal behavior detection method in digital home network | |
WO2020001311A1 (en) | Method for detecting interference, apparatus, device, and storage medium | |
CN109143848A (en) | Industrial control system intrusion detection method based on FCM-GASVM | |
CN111107077B (en) | SVM-based attack flow classification method | |
CN109995611B (en) | Traffic classification model establishing and traffic classification method, device, equipment and server | |
CN107895171A (en) | A kind of intrusion detection method based on K averages Yu depth confidence network | |
CN109286576A (en) | A kind of network agent encryption traffic characteristic extracting method of data packet frequency analysis | |
CN110009005A (en) | A kind of net flow assorted method based on feature strong correlation | |
CN108241662A (en) | The optimization method and device of data mark | |
CN109660656A (en) | A kind of intelligent terminal method for identifying application program | |
CN109450876B (en) | DDos identification method and system based on multi-dimensional state transition matrix characteristics | |
CN102984131B (en) | A kind of information identifying method and device | |
CN103853720B (en) | User attention based network sensitive information monitoring system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||