CN105141455B - Noisy network traffic classification modeling method based on statistical features - Google Patents
- Publication number
- CN105141455B (application CN201510521906.2A)
- Authority
- CN
- China
- Prior art keywords
- network flow
- noise
- data
- network
- flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a statistical-feature-based modeling method for classifying noisy network traffic. The method comprises: network data acquisition and processing, in which network traffic data is extracted in real time from network traffic monitoring stations and preprocessed; establishing a network traffic noise judgment model and removing the noise from the network traffic data; establishing a network traffic noise tolerance model; establishing a robust classification model; and step 5, using the random forest classification method, taking online network traffic data as the test set and classifying it with the robust classification model. The method addresses a technical problem of the prior art in big-data network traffic classification: the content of a large number of samples does not match their labeled classes. Such noisy samples blur the class concepts in the training set and provide insufficient classification knowledge, so the decisions of the resulting classifier are ill-defined, test samples are misclassified, and final classification performance suffers.
Description
Technical field
The invention belongs to the field of network traffic classification, and in particular relates to a statistical-feature-based modeling method for classifying noisy network traffic.
Background
Network resource management and security control have a major impact on national economies and national security, and receive growing attention from national governments. China has likewise made information management and network security research priorities. Network traffic classification is a basic technology for solving a series of important problems in network resource management and security control. To apply correct management and control policies, network administrators usually rely on traffic classification to understand the current state of the network. To implement quality of service (QoS) control, different applications are given different priorities so that limited network bandwidth is allocated sensibly. In network security, an intrusion detection system can use traffic classification results to apply fine-grained detection schemes to different traffic classes and so identify suspicious flows more effectively.
Network traffic classification technology has evolved along with the network. The earliest technique used the port mapping table defined by the Internet Assigned Numbers Authority (IANA) to assign the traffic on a given port to the corresponding network application. However, more and more network applications use dynamic, random ports, so this technique is no longer reliable. Existing commercial systems mainly use payload-based traffic classification, which analyzes the application-layer payload of packets and detects the signature fields of different applications in order to separate the traffic. The problems with this kind of technique are that analyzing the full application-layer payload is computationally expensive, that it can give rise to disputes over user privacy, and that it cannot identify applications that encrypt their payload or keep their signature fields secret. In the current big-data era, network traffic data is huge in volume and complex, and some noisy data is unavoidable. In particular, labeling or collecting network traffic data inevitably introduces a large amount of noise, so that the content of many samples does not match their labeled classes. Such noisy samples blur the class concepts in the training set and provide insufficient classification knowledge, so the decisions of the resulting classifier are ill-defined, test samples are misclassified, and final classification performance suffers. Improving the accuracy of noisy network traffic classification is therefore imperative.
Summary of the invention:
Technical problem to be solved by the invention: to provide a statistical-feature-based modeling method for classifying noisy network traffic, in order to solve the problem in prior-art big-data network traffic classification that the content of a large number of samples does not match their labeled classes, that is, that network traffic contains a large amount of class noise. Such noisy samples blur the class concepts in the training set and provide insufficient classification knowledge, so the decisions of the resulting classifier are ill-defined, test samples are misclassified, and final classification performance suffers.
Technical solution of the invention:
A statistical-feature-based modeling method for classifying noisy network traffic, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data;
Step 2, establish a network traffic noise judgment model and remove the noise from the network traffic data, the network traffic noise judgment model being $R_j = \prod_{i=1}^{n} P_{ij}$, where $R_j$ is the noise judgment result for the j-th network flow, $P_{ij}$ is the result of the i-th classifier judging whether the j-th network flow is noise, and $n$ is the number of classifiers;
Step 3, establish a network traffic noise tolerance model, which comprises the noise level expression for suspected-noise data, $L_j = \sum_{i=1}^{n} P_{ij}$, and the weight expression for suspected-noise data, $W(t)$, where $L_j$ is the noise level of the j-th network flow, $W(t)$ is the weight score, and $NL_t$ is the value of the t-th noise level;
Step 4, according to the network traffic noise judgment model and the network traffic noise tolerance model of steps 2 and 3, establish the robust classification model, where $S_t$ is the data in the network traffic data whose noise level is t and $R^*$ is the robust training set;
Step 5, using the random forest classification method, take online network traffic data as the test set and classify it with the robust classification model.
The preprocessing of the network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows into a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
Beneficial effects of the invention:
The invention uses the statistical features of network flows together with machine learning to classify and identify different network flows. Because it avoids payload analysis, traffic classification based on statistical features has a series of advantages: (1) it does not depend on port matching and can handle applications that use dynamic ports; (2) it uses simple flow statistics, so the computational cost is low and classification is fast; (3) the statistical features are independent of the payload, so applications that encrypt their payload can be identified; (4) no private user data is involved, which avoids privacy disputes.
Based on the statistical features of network flows, the invention provides models for noise judgment, noise removal, and noise tolerance. Simple noise cleaning alone is very likely to discard some flows that are not noise, which also harms accuracy. Therefore, after the definite noise has been removed, a tolerance calculation is applied to the remaining suspected-noise data to improve classification accuracy. The main features of the invention are:
(1) The network traffic data is preprocessed by integrating IP packets, assembling them into network flows, extracting features, and selecting features.
(2) The characteristics of definite network traffic noise are analyzed and, based on them, a model for removing network traffic noise is proposed.
(3) Building on the removal of definite network traffic noise, a model for tolerating suspected network traffic noise is proposed.
Through noise discrimination and removal and noise tolerance modeling, the invention can establish more accurate classification decisions, provides a technical guarantee for network traffic classification, improves classification performance on noisy network traffic, and meets the urgent current need to classify big network traffic data. It solves the prior-art problem in big-data network traffic classification that the content of a large number of samples does not match their labeled classes: such noisy samples blur the class concepts in the training set and provide insufficient classification knowledge, so the decisions of the resulting classifier are ill-defined, test samples are misclassified, and final classification performance suffers.
Brief description of the drawings:
Fig. 1 is the overall framework diagram of the classification modeling method of the invention.
Detailed description
A statistical-feature-based modeling method for classifying noisy network traffic, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data.
The preprocessing of the network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows into a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
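The first three preprocessing steps can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Packet` record, its field names, and the use of the 5-tuple (addresses, ports, protocol) as the flow key are assumptions made for illustration, and the sketch filters incomplete records before grouping for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    """Hypothetical packet record; the field names are illustrative assumptions."""
    timestamp: float
    src_ip: Optional[str]
    dst_ip: Optional[str]
    src_port: Optional[int]
    dst_port: Optional[int]
    protocol: Optional[str]
    length: Optional[int]


def has_missing_header(pkt: Packet) -> bool:
    """A record with any absent field counts as having missing values."""
    return any(field is None for field in pkt)


def assemble_flows(packets):
    """Group complete packets into flows keyed by the 5-tuple, sorted by time."""
    flows = defaultdict(list)
    for pkt in packets:
        if has_missing_header(pkt):
            continue  # discard records with missing values
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    for pkts in flows.values():
        pkts.sort(key=lambda p: p.timestamp)
    return dict(flows)


packets = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 60),
    Packet(0.05, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 1500),
    Packet(0.02, "10.0.0.3", "10.0.0.2", 5001, 443, "TCP", 200),
    Packet(0.03, "10.0.0.3", None, 5001, 443, "TCP", 200),  # missing header field
]
flows = assemble_flows(packets)
```

Keying on the 5-tuple keeps packets of one connection direction together; a real deployment might merge both directions or add a timeout, which the source does not specify.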
That is, network traffic data is extracted in real time at multiple network traffic monitoring stations; the IP packet data collected from these multiple data sources is then integrated and converted into a unified network flow format containing the IP packets and their headers. Data with missing values and data with obvious noise are removed: data with missing values is data that lacks a packet header, and data with obvious noise is network flow data in which an IP packet's length or sending interval differs markedly from that of the other packets. After this processing, each network flow is identified and its features are extracted. The features include the number of packets, the maximum packet length, the minimum packet length, the mean packet length, the packet inter-arrival time, the source address, the destination address, the source port, and the destination port. Once the features have been extracted, a feature selection algorithm is applied to them to remove redundant and irrelevant features. To keep both model training and real-time online classification efficient, only a representative and effective part of the network flow data is used to build the classification model, which keeps construction of the traffic classification decision model efficient.
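As an illustration of the feature extraction and selection described above, the following Python sketch computes the listed length and timing statistics for one flow. The source does not name its feature selection algorithm, so `drop_constant_features` is only a hypothetical stand-in that removes features carrying no information; all function and key names are assumptions.

```python
from statistics import mean, pstdev


def flow_features(lengths, timestamps):
    """Compute per-flow statistics from parallel, time-sorted packet lists."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "packet_count": len(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": mean(lengths),
        "mean_interval": mean(gaps) if gaps else 0.0,
    }


def drop_constant_features(rows):
    """Remove features that take a single value across all flows.

    Stand-in for the unspecified feature selection algorithm: a constant
    feature cannot help tell traffic classes apart.
    """
    names = list(rows[0])
    keep = [n for n in names if pstdev([r[n] for r in rows]) > 0]
    return [{n: r[n] for n in keep} for r in rows]


rows = [
    flow_features([60, 1500, 1500], [0.0, 0.1, 0.3]),
    flow_features([60, 200, 100], [0.0, 0.5, 1.0]),
]
selected = drop_constant_features(rows)
```

In this toy input both flows have three packets and a 60-byte minimum, so those two features are dropped while the remaining three are kept.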
Step 2, establish a network traffic noise judgment model and remove the noise from the network traffic data. The network traffic noise judgment model is $R_j = \prod_{i=1}^{n} P_{ij}$, where: $R_j$ is the noise judgment result for the j-th network flow, a value of 1 meaning the flow is noise and a value of 0 meaning it is not; $P_{ij}$ is the result of the i-th classifier judging the j-th network flow, a value of 1 meaning the i-th classifier considers the flow to be noise and a value of 0 meaning it considers the flow not to be noise; and $n$ is the number of classifiers. Even if n-1 classifiers judge a flow to be noise, it is not treated as definite noise. Only this strict test ensures that as few non-noise flows as possible are removed.
Based on the characteristics of noisy network flows, the noise judgment model is established first. The performance of classical supervised classification methods on noisy network flows is observed and analyzed, and the n best-performing classifiers are selected (n is less than 10). The selected classifiers are then combined into a cooperating classifier ensemble $\{F_1, F_2, \ldots, F_n\}$ that identifies the definite noise in offline network traffic. Throughout the offline discrimination process the same data (that is, the processed network flow feature data) serves as both training set and test set. Definite noise is determined by the cooperating classifier ensemble together with a one-vote veto mechanism.
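The one-vote veto can be sketched in a few lines of Python. The sketch reads the rule as a product of the binary votes, which is 1 only when all n classifiers agree; it assumes the votes have already been produced by the classifier ensemble, and the function and variable names are illustrative.

```python
from math import prod


def is_definite_noise(votes):
    """Return 1 only if every classifier flags the flow as noise, else 0.

    `votes` holds the n binary judgments for one flow (1 = noise).
    """
    return prod(votes)


def remove_definite_noise(flows, vote_matrix):
    """Keep the flows that are not definite noise.

    vote_matrix[j] holds the n votes for flows[j].
    """
    return [f for f, votes in zip(flows, vote_matrix) if is_definite_noise(votes) == 0]


flows = ["flow_a", "flow_b", "flow_c"]
vote_matrix = [
    [1, 1, 1],  # all three classifiers agree: definite noise
    [1, 1, 0],  # a single dissenting vote vetoes the noise decision
    [0, 0, 0],  # no classifier flags the flow
]
kept = remove_definite_noise(flows, vote_matrix)
```

Here `flow_b` survives even though two of three classifiers flag it, which is the conservative behavior the veto is meant to give.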
Step 3, establish a network traffic noise tolerance model, which comprises the noise level expression for suspected-noise data, $L_j = \sum_{i=1}^{n} P_{ij}$, and the weight expression for suspected-noise data, $W(t)$, where $L_j$ is the noise level of the j-th network flow, $W(t)$ is the weight score, and $NL_t$ is the value of the t-th noise level.
Implementation: after the definite noise has been removed, the noise level of the remaining suspected-noise data must be determined. It is given by $L_j = \sum_{i=1}^{n} P_{ij}$, where $L_j$ is the noise level of the j-th network flow and $P_{ij}$ indicates whether the i-th classifier judges the j-th network flow to be noise: the value is 1 if it does and 0 otherwise. Network flows with the same noise level $L_j$ are placed in the set of network flows for that noise level.
After the noise level of a network flow has been determined, the weight to be assigned to flows of that level is calculated. The weight is given by the weight expression $W(t)$, in which $NL_t$ is the value of the t-th noise level, $\{NL_t \mid NL_t = t,\ 0 \le t \le n-1\}$. $W(t)$ is a fraction between 0 and 1, and its value is the weight assigned to traffic of the t-th noise level. Network flows with different noise levels are given different weights: flows that are more likely to be noise receive smaller weights, and flows that are less likely to be noise receive larger weights.
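The noise level and weighting can be sketched as follows. The noise level is read here as the count of classifiers that flag a flow. The weight formula itself is not recoverable from this text, so `assumed_weight` uses a hypothetical W(t) = 1 - t/n chosen only because it is a fraction in [0, 1] that decreases as the noise level grows; it should not be taken as the patent's formula.

```python
from collections import defaultdict


def noise_level(votes):
    """Count how many of the n classifiers flagged the flow as noise."""
    return sum(votes)


def assumed_weight(t, n):
    """Hypothetical W(t) = 1 - t/n (an assumption, not the source formula)."""
    return 1 - t / n


def group_by_level(flows, vote_matrix):
    """Collect the remaining flows into one set per noise level t."""
    groups = defaultdict(list)
    for flow, votes in zip(flows, vote_matrix):
        groups[noise_level(votes)].append(flow)
    return dict(groups)


n = 4  # number of classifiers in the ensemble
flows = ["f1", "f2", "f3", "f4"]
vote_matrix = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0]]
levels = group_by_level(flows, vote_matrix)
weights = {t: assumed_weight(t, n) for t in sorted(levels)}
```

Because definite noise (all n votes) has already been removed, the levels that occur lie between 0 and n-1, matching the stated range of the noise level values.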
Step 4, according to the network traffic noise judgment model and the network traffic noise tolerance model of steps 2 and 3, establish the robust classification model, where $S_t$ is the set of flows in the network traffic data with the t-th noise level and $R^*$ is the robust training set. The robust training set is formed by drawing $(W(t) \times 100)\%$ of the data at random from each $S_t$ and combining the draws; the robust classification model is built from it.
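A minimal Python sketch of assembling the robust training set is shown below. It assumes the per-level flow sets and a weight function are already available; the weight `1 - t / n` passed in the example is a hypothetical placeholder, and rounding the sample size to the nearest integer is also an assumption.

```python
import random


def build_robust_training_set(groups, weight, seed=0):
    """Draw a W(t) share of each noise-level group and pool the draws.

    `groups` maps noise level t to its list of flows; `weight` maps t to a
    fraction in [0, 1]. Both are assumed to be computed beforehand.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    robust = []
    for t, flows in sorted(groups.items()):
        k = round(weight(t) * len(flows))
        robust.extend(rng.sample(flows, k))  # sampling without replacement
    return robust


n = 4  # number of classifiers in the ensemble
groups = {
    0: [f"clean_{i}" for i in range(8)],
    2: [f"suspect_{i}" for i in range(8)],
}
robust = build_robust_training_set(groups, lambda t: 1 - t / n)
```

With these inputs every level-0 flow is kept and half of the level-2 flows are drawn, so less trusted data still contributes but with reduced influence.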
Step 5, using the random forest classification method, take online network traffic data as the test set and classify it with the robust classifier model. To make online classification efficient, the supervised random forest method is chosen for its classification speed and performance, and the online data is classified as the test set using the robust classifier.
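The final step might look like the following sketch, which assumes scikit-learn is available and uses its `RandomForestClassifier`. The two features, the toy values, the class names, and the hyperparameters are all made up for illustration; the source does not specify an implementation.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy robust training set: [packet_count, mean_packet_length] per flow.
# Values and class names are invented for illustration only.
X_train = [
    [5, 80], [6, 90], [4, 70], [7, 85],
    [200, 1400], [220, 1450], [180, 1380], [210, 1420],
]
y_train = ["web"] * 4 + ["video"] * 4

# Train the forest on the robust training set.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Online flows play the role of the test set.
X_online = [[5, 82], [205, 1410]]
predicted = list(clf.predict(X_online))
```

Training happens offline on the robust set, so only the comparatively cheap `predict` call runs on live traffic.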
Claims (2)
1. A statistical-feature-based modeling method for classifying noisy network traffic, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data;
Step 2, establish a network traffic noise judgment model and remove the noise from the network traffic data, the network traffic noise judgment model being $R_j = \prod_{i=1}^{n} P_{ij}$, where: $R_j$ is the noise judgment result for the j-th network flow, a value of 1 meaning the flow is noise and a value of 0 meaning it is not; $P_{ij}$ is the result of the i-th classifier judging the j-th network flow, a value of 1 meaning the i-th classifier considers the flow to be noise and a value of 0 meaning it considers the flow not to be noise;
Step 3, establish a network traffic noise tolerance model, which comprises the noise level expression for suspected-noise data, $L_j = \sum_{i=1}^{n} P_{ij}$, and the weight expression for suspected-noise data, $W(t)$, where $L_j$ is the noise level of the j-th network flow, $W(t)$ is the weight score, and $NL_t$ is the value of the t-th noise level, $\{NL_t \mid NL_t = t,\ 0 \le t \le n-1\}$, with n less than 10;
Step 4, according to the network traffic noise judgment model and the network traffic noise tolerance model of steps 2 and 3, establish the robust classification model, where $S_t$ is the data in the network traffic data whose noise level is t and $R^*$ is the robust training set;
Step 5, using the random forest classification method, take online network traffic data as the test set and classify it with the robust classification model.
2. The statistical-feature-based modeling method for classifying noisy network traffic according to claim 1, characterized in that the preprocessing of the network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows into a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510521906.2A CN105141455B (en) | 2015-08-24 | 2015-08-24 | Noisy network traffic classification modeling method based on statistical features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510521906.2A CN105141455B (en) | 2015-08-24 | 2015-08-24 | Noisy network traffic classification modeling method based on statistical features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105141455A (en) | 2015-12-09 |
CN105141455B (en) | 2018-08-17 |
Family
ID=54726673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510521906.2A Expired - Fee Related CN105141455B (en) | Noisy network traffic classification modeling method based on statistical features | 2015-08-24 | 2015-08-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105141455B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107404398A (en) * | 2017-05-31 | 2017-11-28 | 中山大学 | A kind of networks congestion control judgement system |
CN109344204B (en) * | 2018-09-10 | 2020-05-19 | 中国人民解放军陆军工程大学 | Network traffic classification method with optimal individual convergence rate |
CN109151880B (en) * | 2018-11-08 | 2021-06-22 | 中国人民解放军国防科技大学 | Mobile application flow identification method based on multilayer classifier |
CN109639481B (en) * | 2018-12-11 | 2020-10-27 | 深圳先进技术研究院 | Deep learning-based network traffic classification method and system and electronic equipment |
CN109698836B (en) * | 2019-02-01 | 2021-07-23 | 重庆邮电大学 | Wireless local area network intrusion detection method and system based on deep learning |
CN111314310B (en) * | 2020-01-19 | 2021-02-12 | 浙江大学 | Attack detection method for unresolvable network data feature selection based on machine learning |
CN112367226A (en) * | 2020-12-22 | 2021-02-12 | 长沙树根互联技术有限公司 | Equipment-based working data acquisition method and device and electronic equipment |
CN114615007B (en) * | 2022-01-13 | 2023-05-23 | 中国科学院信息工程研究所 | Tunnel mixed flow classification method and system based on random forest |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102271090A (en) * | 2011-09-06 | 2011-12-07 | 电子科技大学 | Transport-layer-characteristic-based traffic classification method and device |
CN104270392A (en) * | 2014-10-24 | 2015-01-07 | 中国科学院信息工程研究所 | Method and system for network protocol recognition based on tri-classifier cooperative training learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2504952A1 (en) * | 2009-11-27 | 2012-10-03 | Telefonaktiebolaget LM Ericsson (publ) | Packet classification method and apparatus |
-
2015
- 2015-08-24 CN CN201510521906.2A patent/CN105141455B/en not_active Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
A new measure of classifier diversity in multiple classifier system;Tie-Gang Fan;《Proceedings of the 7th International Conference on Machine Learning and Cybernetics》;20080715;第18-21页 * |
Assessing the predictive accuracy of diversity measures with domain-dependent, asymmetric misclassification costs;Mordechai Gal-Or;《Information Fusion》;ELSEVIER;20040721;第6卷;第37-48页 * |
Benoît Frénay.Classification in Presence of Label Noise: a Survey.《IEEE Transactions on Neural Networks and Learning Systems》.2014,第25卷(第5期),第845-869页. * |
Diversity creation methods: a survey and categorisation;Gavin Brown;《Information Fusion》;ELSEIER;20040519;第6卷;第5-20页 * |
Robust Network Traffic Classification;Jun Zhang;《IEEE/ ACM Transactions on Networking》;20150508;第23卷(第4期);第1257-1270页 * |
基于随机森林算法的网络流量分类方法;赵小欢;《中国电子科学研究院学报》;20130415;第8卷(第2期);第184-190页 * |
Also Published As
Publication number | Publication date |
---|---|
CN105141455A (en) | 2015-12-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105141455B (en) | Noisy network traffic classification modeling method based on statistical features | |
CN106341337B (en) | Flow detection and control mechanism and method capable of realizing application awareness under SDN | |
CN103714383B (en) | Rail transit fault diagnosis method and system based on rough set | |
CN102523241B (en) | Method and device for classifying network traffic on line based on decision tree high-speed parallel processing | |
CN104767692B (en) | A kind of net flow assorted method | |
CN107846326A (en) | A kind of adaptive semi-supervised net flow assorted method, system and equipment | |
CN102420723A (en) | Anomaly detection method for various kinds of intrusion | |
CN105871832A (en) | Network application encrypted traffic recognition method and device based on protocol attributes | |
CN101645806B (en) | Network flow classifying system and network flow classifying method combining DPI and DFI | |
CN109218223B (en) | Robust network traffic classification method and system based on active learning | |
CN107819698A (en) | A kind of net flow assorted method based on semi-supervised learning, computer equipment | |
CN109639734B (en) | Abnormal flow detection method with computing resource adaptivity | |
CN104468567B (en) | A kind of system and method for the identification of network multimedia Business Stream and mapping | |
CN101937445A (en) | Automatic file classification system | |
CN104702465A (en) | Parallel network flow classification method | |
CN112528277A (en) | Hybrid intrusion detection method based on recurrent neural network | |
CN109981474A (en) | A kind of network flow fine grit classification system and method for application-oriented software | |
CN104348741A (en) | Method and system for detecting P2P (peer-to-peer) traffic based on multi-dimensional analysis and decision tree | |
CN108494594A (en) | A kind of analysis method and system of EIGRP route networks failure | |
CN111709477A (en) | Method and tool for garbage classification based on improved MobileNet network | |
Kong et al. | Identification of abnormal network traffic using support vector machine | |
CN113408087A (en) | Substation inspection method based on cloud side system and video intelligent analysis | |
CN110034966A (en) | A kind of method for classifying data stream and system based on machine learning | |
CN103973589A (en) | Network traffic classification method and device | |
CN107404398A (en) | A kind of networks congestion control judgement system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180817; termination date: 20190824 |