CN105141455B - A noisy network traffic classification modeling method based on statistical features - Google Patents

A noisy network traffic classification modeling method based on statistical features

Info

Publication number
CN105141455B
CN105141455B CN201510521906.2A CN201510521906A CN105141455B CN 105141455 B CN105141455 B CN 105141455B CN 201510521906 A CN201510521906 A CN 201510521906A CN 105141455 B CN105141455 B CN 105141455B
Authority
CN
China
Prior art keywords
network flow
noise
data
network
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510521906.2A
Other languages
Chinese (zh)
Other versions
CN105141455A (en)
Inventor
王斌锋
张军
张自力
夏大文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to CN201510521906.2A priority Critical patent/CN105141455B/en
Publication of CN105141455A publication Critical patent/CN105141455A/en
Application granted granted Critical
Publication of CN105141455B publication Critical patent/CN105141455B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a noisy network traffic classification modeling method based on statistical features. The method comprises: network data acquisition and processing, in which network traffic data are extracted in real time from network traffic monitoring stations and preprocessed; establishing a network flow noise judgment model and removing the noise from the network traffic data; establishing a network flow noise tolerance model; establishing a robust classification model; and, in step 5, using the random forest classification method to classify online network traffic data, taken as the test set, with the robust classification model. The invention addresses a problem the prior art faces in big-data network traffic classification: the content of a large number of samples does not match their marked classes. Such noise samples blur the class concepts in the training samples and supply insufficient class knowledge, so the classification decisions built by the classifier are unclear, the classes of test samples are misjudged, and final classification performance suffers.

Description

A noisy network traffic classification modeling method based on statistical features
Technical field
The invention belongs to the field of network traffic classification, and in particular relates to a noisy network traffic classification modeling method based on statistical features.
Background
Network resource management and security control have a major influence on the national economy and national security, and they receive growing attention from national governments. China has also made information management and network security research priorities. Network traffic classification is a basic technology for solving a series of important problems in network resource management and security control. To implement correct management and control policies, network administrators usually need traffic classification to understand the current state of the network. To achieve quality of service (QoS) control, different applications are given different priorities so that limited network bandwidth is allocated reasonably. In network security, an intrusion detection system can apply fine-grained detection schemes to different traffic classes according to the classification result and thereby identify suspicious network flows more effectively.
Network traffic classification technology has evolved along with the network. The earliest technique assigned the network flow on a given port to the corresponding network application according to the port mapping table defined by the Internet Assigned Numbers Authority (IANA). However, more and more network applications use dynamic, random ports, so this technique is no longer reliable. Existing commercial systems mainly use payload-based traffic classification, which analyzes the application layer of the packet payload and detects the signature fields of different applications to partition network traffic. The problems with this kind of technique are that analyzing the complete application-layer payload is computationally expensive, that it may give rise to user privacy disputes, and that it cannot identify network applications that encrypt their payload or keep their signature fields secret. In the current big data era, network traffic data are huge in volume and complex, and some noisy data are inevitable. In particular, labeling or acquiring network traffic data inevitably introduces a large amount of noise, so that the content of many samples does not match their marked classes. These noise samples blur the class concepts in the training samples and supply insufficient class knowledge, so the classification decisions built by the classifier are unclear, the classes of test samples are misjudged, and final classification performance suffers. Improving the accuracy of noisy network traffic classification is therefore imperative.
Summary of the invention:
The technical problem to be solved by the invention: to provide a noisy network traffic classification modeling method based on statistical features, so as to solve the problem that, in big-data network traffic classification under the prior art, the content of a large number of samples does not match their marked classes, that is, a large amount of class noise exists in the network traffic. These noise samples blur the class concepts in the training samples and supply insufficient class knowledge, so the classification decisions built by the classifier are unclear, the classes of test samples are misjudged, and final classification performance suffers.
Technical solution of the invention:
A noisy network traffic classification modeling method based on statistical features, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data;
Step 2, establish a network flow noise judgment model and remove the noise from the network traffic data. The network flow noise judgment model is: [formula image not reproduced]. In the formula: R_j is the noise judgment result for the j-th network flow, and P_ij is the result of the i-th classifier judging whether the j-th network flow is noise;
Step 3, establish a network flow noise tolerance model, which comprises the noise level expression for suspected noise data: [formula image not reproduced]
and the weight expression for suspected noise data: [formula image not reproduced]
In the formulas: L_j is the noise level of the j-th network flow, W(t) is the weight score, and NL_t is the value of the t-th noise level;
Step 4, according to the network flow noise judgment model and the network flow noise tolerance model of steps 2 and 3, establish a robust classification model: [formula image not reproduced]. In the formula: S_t is the data in the network traffic data whose noise level is t, and R* is the robust training set;
Step 5, using the random forest classification method, take online network traffic data as the test set and classify them with the robust classification model.
The preprocessing of network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows to a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
Beneficial effects of the invention:
The invention uses the statistical features of network flows together with machine learning to classify and identify different network flows. Because it avoids payload analysis, traffic classification based on statistical features has a series of advantages: (1) it does not depend on port matching and can handle network applications that use dynamic ports; (2) it uses simple network flow statistical features, so computational overhead is small and classification is fast; (3) the statistical features used are independent of the payload, so network applications that encrypt their payload can be identified; (4) it does not touch users' private data, which avoids privacy disputes.
For network flows described by statistical features, the invention provides modeling for noise judgment, noise cleaning and noise tolerance. Simple noise cleaning alone is very likely to discard some non-noise network flows as well, which also reduces accuracy. Therefore, after the definite network flow noise has been removed, a tolerance calculation is performed on the remaining suspected noise data to improve classification accuracy. The main features of the invention are:
(1) The network traffic data are preprocessed by integrating IP packets, assembling network flows, extracting features and selecting features.
(2) The characteristics of definite network flow noise data are analyzed, and a model for removing network flow noise is proposed on that basis.
(3) On the basis of effectively removing network flow noise, a model for tolerating suspected network flow noise is proposed.
Through modeling of noise discrimination and removal and of noise tolerance, the invention can better establish accurate classification decisions, provides a technical guarantee for network traffic classification, improves classification performance on noisy network flows, and meets the urgent current demand for big-data network traffic classification. It solves the prior-art problem in big-data network traffic classification that the content of a large number of samples does not match their marked classes: these noise samples blur the class concepts in the training samples and supply insufficient class knowledge, so the classification decisions built by the classifier are unclear, the classes of test samples are misjudged, and final classification performance suffers.
Description of the drawings:
Fig. 1 is the overall framework diagram of the classification modeling method of the invention.
Detailed description
A noisy network traffic classification modeling method based on statistical features, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data;
The preprocessing of network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows to a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
In more detail, network traffic data are extracted in real time at multiple network traffic monitoring stations. The IP packet data collected from these multiple sources are then integrated and transformed into unified network flow data comprising IP packets and packet headers. Data with missing values or obvious noise are removed: data with missing values are data that lack a packet header, and data with obvious noise are network flow data whose IP packet length or packet sending interval differs markedly from that of other packets. On this basis, each network flow is identified and its features are extracted. The features include: the number of packets, the maximum packet length, the minimum packet length, the average packet length, the packet inter-arrival time, the source address, the destination address, the source port, the destination port, and so on. After feature extraction is complete, a feature selection algorithm is applied to remove redundant and irrelevant features. To keep both model training and real-time classification of online network flows efficient, only a representative and effective portion of the network flow data is used to build the classification model. This ensures that the network traffic classification decision model can be built efficiently.
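The preprocessing described above can be illustrated with a minimal Python sketch. It assumes packets arrive as dictionaries with the hypothetical keys `src`, `dst`, `sport`, `dport`, `length` and `time`; the patent does not specify a data format, and the function names `build_flows` and `flow_features` are illustrative only. Feature selection is not shown.

```python
from collections import defaultdict
from statistics import mean

REQUIRED = ("src", "dst", "sport", "dport", "length", "time")  # assumed header fields

def build_flows(packets):
    """Group packets into flows keyed by (src, dst, sport, dport)."""
    flows = defaultdict(list)
    for pkt in packets:
        # Drop packets with missing header fields (the "missing values" case).
        if any(pkt.get(k) is None for k in REQUIRED):
            continue
        flows[(pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])].append(pkt)
    return flows

def flow_features(key, pkts):
    """Compute the per-flow statistical features listed in the description."""
    pkts = sorted(pkts, key=lambda p: p["time"])
    lengths = [p["length"] for p in pkts]
    gaps = [b["time"] - a["time"] for a, b in zip(pkts, pkts[1:])]
    return {
        "src": key[0], "dst": key[1], "sport": key[2], "dport": key[3],
        "packet_count": len(pkts),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "mean_length": mean(lengths),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
    }

packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80, "length": 60, "time": 0.0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80, "length": 1500, "time": 0.5},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80, "length": 40, "time": 1.5},
    {"src": "10.0.0.3", "dst": None, "sport": 1234, "dport": 53, "length": 80, "time": 0.2},
]
features = [flow_features(k, v) for k, v in build_flows(packets).items()]
```

In this sample the fourth packet has no destination address, so it is discarded and a single three-packet flow remains.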
Step 2, establish a network flow noise judgment model and remove the noise from the network traffic data. The network flow noise judgment model is: [formula image not reproduced]. In the formula: R_j is the noise judgment result for the j-th network flow, indicating whether the j-th network flow is a noise flow; a value of 1 means it is a noise network flow and a value of 0 means it is not. P_ij is the result of the i-th classifier judging whether the j-th network flow is noise; a value of 1 means the i-th classifier considers the j-th network flow to be noise and a value of 0 means the i-th classifier considers it non-noise. n is the number of classifiers. Even if n-1 classifiers judge a network flow to be noise, it cannot be identified as definite noise; only this strict criterion ensures that as few non-noise network flows as possible are removed.
According to the characteristics of noisy network flows, the network flow noise judgment model is established first. Classical supervised classification methods are observed and analyzed for their performance in classifying noisy network flows, and the n best-performing classifiers are selected (n is less than 10). The selected classifiers are then combined into a cooperating classifier ensemble {F_1, F_2, ..., F_n} to identify definite noise in offline network traffic. Throughout the offline discrimination process, the same data (the preprocessed network flow feature data) serve as both the training set and the test set. Definite noise is identified by combining the cooperating classifier ensemble with a one-vote veto mechanism.
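The one-vote veto rule can be sketched in a few lines of Python. Two points here are assumptions rather than statements from the patent: a classifier is taken to "judge a flow as noise" when its predicted class disagrees with the marked label, and the function names `noise_votes` and `definite_noise` are hypothetical. The sketch only encodes the stated behaviour that a flow counts as definite noise when all n classifiers agree.

```python
def noise_votes(predictions, labels):
    """Build the vote matrix P, where P[i][j] = 1 if classifier i's prediction
    for flow j disagrees with the marked label (assumed noise criterion)."""
    return [[int(pred != lab) for pred, lab in zip(row, labels)]
            for row in predictions]

def definite_noise(P):
    """R[j] = 1 only when every classifier votes 'noise' for flow j.
    A single dissenting classifier vetoes the noise verdict."""
    n, m = len(P), len(P[0])
    return [int(all(P[i][j] == 1 for i in range(n))) for j in range(m)]

labels = ["web", "mail", "p2p", "web"]      # marked classes of four flows
predictions = [                             # outputs of three classifiers
    ["web", "p2p", "web", "web"],
    ["web", "p2p", "p2p", "mail"],
    ["web", "p2p", "web", "web"],
]
P = noise_votes(predictions, labels)
R = definite_noise(P)
```

Only the second flow is flagged, because it is the only one on which all three classifiers disagree with the marked label; the third flow has two noise votes and is kept as suspected noise.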
Step 3, establish a network flow noise tolerance model, which comprises:
the noise level expression for suspected noise data: [formula image not reproduced]
and the weight expression for suspected noise data: [formula image not reproduced]. In the formulas: L_j is the noise level of the j-th network flow, W(t) is the weight score, and NL_t is the value of the t-th noise level;
Implementation: after the definite noise has been removed, the noise level of the remaining suspected noise data must be determined. It is given by the noise level expression above, in which L_j is the noise level of the j-th network flow and P_ij indicates whether the i-th classifier judges the j-th network flow to be a noise network flow: the value is 1 if it does and 0 otherwise. At the same time, network flows with the same noise level L_j are placed into the network flow set of the corresponding noise level.
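A minimal sketch of the noise-level step follows. The level formula itself is not reproduced in this text, so the sketch assumes L_j is the number of classifiers that vote "noise" for flow j, which is consistent with the stated range of levels 0 to n-1 once definite noise (n votes) has been removed. The names `noise_levels` and `group_by_level` are hypothetical.

```python
from collections import defaultdict

def noise_levels(P):
    """Assumed form: L[j] = number of classifiers voting 'noise' for flow j."""
    n, m = len(P), len(P[0])
    return [sum(P[i][j] for i in range(n)) for j in range(m)]

def group_by_level(levels, n):
    """Collect flow indices into one set per noise level t (0 <= t <= n-1).
    Flows with n votes are definite noise and are left out."""
    groups = defaultdict(list)
    for j, level in enumerate(levels):
        if level < n:
            groups[level].append(j)
    return dict(groups)

# Vote matrix for three classifiers (rows) and four flows (columns).
P = [[0, 1, 1, 0],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
levels = noise_levels(P)
S = group_by_level(levels, n=3)
```

Here flow 1 receives three votes and is excluded as definite noise, while flows 0, 3 and 2 fall into levels 0, 1 and 2 respectively.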
After the noise level of each network flow has been determined, the weight to be assigned to the network flows of that level is calculated. The weight is given by the weight expression above, in which NL_t is the value of the t-th noise level, {NL_t | NL_t = t, 0≤t≤n-1}. W(t) is a score between 0 and 1, and its value is the weight assigned to flows of the t-th noise level. Network flows of different noise levels are given different weights: flows that are more strongly suspected of being noise receive smaller weights, and flows that are less strongly suspected receive larger weights.
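The weight expression is not reproduced in this text, so the following Python sketch uses a stand-in, not the patent's formula. It assumes a linear weight W(t) = 1 - t/n, chosen only because it satisfies the properties stated above: the score lies between 0 and 1 and decreases as the noise level rises. The function name `weight` is hypothetical.

```python
def weight(t, n):
    """Assumed linear weight for noise level t with n classifiers.
    Higher noise level -> smaller weight; W(0) = 1, W(n-1) = 1/n."""
    if not 0 <= t <= n - 1:
        raise ValueError("noise level must satisfy 0 <= t <= n-1")
    return 1.0 - t / n

n = 4  # number of classifiers in the ensemble (the text requires n < 10)
weights = [weight(t, n) for t in range(n)]
```

Any other monotonically decreasing map into the interval from 0 to 1 would fit the textual description equally well; the linear form is merely the simplest choice.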
Step 4, according to the network flow noise judgment model and the network flow noise tolerance model of steps 2 and 3, establish a robust classification model: [formula image not reproduced]. In the formula: S_t is the set of flows of the t-th noise level in the network traffic data, and R* is the robust training set. The formula states that (W(t) × 100)% of the data are drawn at random from each S_t and combined into the robust training set, from which the robust classification model is built.
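The sampling rule in step 4 can be sketched as follows. The sketch assumes the per-level sets and weights are already available as dictionaries keyed by noise level, rounds the sample size to the nearest integer (the text does not say how fractional counts are handled), and uses a fixed seed for repeatability. The name `robust_training_set` is hypothetical.

```python
import random

def robust_training_set(S, weights, seed=0):
    """Draw about W(t) * 100 percent of each level set S[t] at random
    and merge the samples into one training set."""
    rng = random.Random(seed)
    selected = []
    for t, flows in sorted(S.items()):
        k = round(weights[t] * len(flows))   # sample size for level t (assumed rounding)
        selected.extend(rng.sample(flows, k))
    return selected

# Flow indices grouped by noise level, with illustrative weights.
S = {0: list(range(0, 8)), 1: list(range(8, 16)), 2: list(range(16, 24))}
weights = {0: 1.0, 1: 0.5, 2: 0.25}
R_star = robust_training_set(S, weights)
```

With these inputs the training set keeps all eight level-0 flows, four of the level-1 flows and two of the level-2 flows, so cleaner flows dominate without the suspected ones being discarded entirely.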
Step 5, using the random forest classification method, take online network traffic data as the test set and classify them with the robust classifier model. To improve the efficiency of online classification, the supervised random forest classification method is chosen for its classification speed and performance, and the online data, taken as the test set, are classified with the robust classifier.

Claims (2)

1. A noisy network traffic classification modeling method based on statistical features, comprising:
Step 1, network data acquisition and processing: extract network traffic data in real time from network traffic monitoring stations and preprocess the network traffic data;
Step 2, establish a network flow noise judgment model and remove the noise from the network traffic data, the network flow noise judgment model being: [formula image not reproduced]. In the formula: R_j is the noise judgment result for the j-th network flow, indicating whether the j-th network flow is a noise flow; a value of 1 means it is a noise network flow and a value of 0 means it is not. P_ij is the result of the i-th classifier judging whether the j-th network flow is noise; a value of 1 means the i-th classifier considers the j-th network flow to be noise and a value of 0 means the i-th classifier considers it non-noise;
Step 3, establish a network flow noise tolerance model, the network flow noise tolerance model comprising:
the noise level expression for suspected noise data: [formula image not reproduced]
and the weight expression for suspected noise data: [formula image not reproduced]
In the formulas: L_j is the noise level of the j-th network flow, W(t) is the weight score, and NL_t is the value of the t-th noise level, {NL_t | NL_t = t, 0≤t≤n-1}, where n is less than 10;
Step 4, according to the network flow noise judgment model and the network flow noise tolerance model of steps 2 and 3, establish a robust classification model: [formula image not reproduced]. In the formula: S_t is the data in the network traffic data whose noise level is t, and R* is the robust training set;
Step 5, using the random forest classification method, take online network traffic data as the test set and classify them with the robust classification model.
2. The noisy network traffic classification modeling method based on statistical features according to claim 1, characterized in that the preprocessing of network traffic data comprises: step 1, integrate the IP packets collected from the network traffic data and assemble the IP packets into network flows; step 2, convert the network flows to a unified data format; step 3, remove data with missing values; step 4, extract the features of each network flow; step 5, use a feature selection algorithm to remove redundant and irrelevant features from the network flow features.
CN201510521906.2A 2015-08-24 2015-08-24 A noisy network traffic classification modeling method based on statistical features Expired - Fee Related CN105141455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510521906.2A CN105141455B (en) 2015-08-24 2015-08-24 A noisy network traffic classification modeling method based on statistical features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510521906.2A CN105141455B (en) 2015-08-24 2015-08-24 A noisy network traffic classification modeling method based on statistical features

Publications (2)

Publication Number Publication Date
CN105141455A CN105141455A (en) 2015-12-09
CN105141455B true CN105141455B (en) 2018-08-17

Family

ID=54726673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510521906.2A Expired - Fee Related CN105141455B (en) 2015-08-24 2015-08-24 A noisy network traffic classification modeling method based on statistical features

Country Status (1)

Country Link
CN (1) CN105141455B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107404398A (en) * 2017-05-31 2017-11-28 中山大学 A kind of networks congestion control judgement system
CN109344204B (en) * 2018-09-10 2020-05-19 中国人民解放军陆军工程大学 Network traffic classification method with optimal individual convergence rate
CN109151880B (en) * 2018-11-08 2021-06-22 中国人民解放军国防科技大学 Mobile application flow identification method based on multilayer classifier
CN109639481B (en) * 2018-12-11 2020-10-27 深圳先进技术研究院 Deep learning-based network traffic classification method and system and electronic equipment
CN109698836B (en) * 2019-02-01 2021-07-23 重庆邮电大学 Wireless local area network intrusion detection method and system based on deep learning
CN111314310B (en) * 2020-01-19 2021-02-12 浙江大学 Attack detection method for unresolvable network data feature selection based on machine learning
CN112367226A (en) * 2020-12-22 2021-02-12 长沙树根互联技术有限公司 Equipment-based working data acquisition method and device and electronic equipment
CN114615007B (en) * 2022-01-13 2023-05-23 中国科学院信息工程研究所 Tunnel mixed flow classification method and system based on random forest

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102271090A (en) * 2011-09-06 2011-12-07 电子科技大学 Transport-layer-characteristic-based traffic classification method and device
CN104270392A (en) * 2014-10-24 2015-01-07 中国科学院信息工程研究所 Method and system for network protocol recognition based on tri-classifier cooperative training learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2504952A1 (en) * 2009-11-27 2012-10-03 Telefonaktiebolaget LM Ericsson (publ) Packet classification method and apparatus

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A new measure of classifier diversity in multiple classifier system; Tie-Gang Fan; Proceedings of the 7th International Conference on Machine Learning and Cybernetics; 20080715; pp. 18-21 *
Assessing the predictive accuracy of diversity measures with domain-dependent, asymmetric misclassification costs; Mordechai Gal-Or; Information Fusion; ELSEVIER; 20040721; Vol. 6; pp. 37-48 *
Benoît Frénay. Classification in Presence of Label Noise: a Survey. IEEE Transactions on Neural Networks and Learning Systems. 2014, Vol. 25 (No. 5), pp. 845-869. *
Diversity creation methods: a survey and categorisation; Gavin Brown; Information Fusion; ELSEVIER; 20040519; Vol. 6; pp. 5-20 *
Robust Network Traffic Classification; Jun Zhang; IEEE/ACM Transactions on Networking; 20150508; Vol. 23 (No. 4); pp. 1257-1270 *
Network traffic classification method based on the random forest algorithm (基于随机森林算法的网络流量分类方法); 赵小欢; 《中国电子科学研究院学报》; 20130415; Vol. 8 (No. 2); pp. 184-190 *

Also Published As

Publication number Publication date
CN105141455A (en) 2015-12-09

Similar Documents

Publication Publication Date Title
CN105141455B (en) A noisy network traffic classification modeling method based on statistical features
CN106341337B (en) Flow detection and control mechanism and method capable of realizing application awareness under SDN
CN103714383B (en) Rail transit fault diagnosis method and system based on rough set
CN102523241B (en) Method and device for classifying network traffic on line based on decision tree high-speed parallel processing
CN104767692B (en) A kind of net flow assorted method
CN107846326A (en) A kind of adaptive semi-supervised net flow assorted method, system and equipment
CN102420723A (en) Anomaly detection method for various kinds of intrusion
CN105871832A (en) Network application encrypted traffic recognition method and device based on protocol attributes
CN101645806B (en) Network flow classifying system and network flow classifying method combining DPI and DFI
CN109218223B (en) Robust network traffic classification method and system based on active learning
CN107819698A (en) A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN109639734B (en) Abnormal flow detection method with computing resource adaptivity
CN104468567B (en) A kind of system and method for the identification of network multimedia Business Stream and mapping
CN101937445A (en) Automatic file classification system
CN104702465A (en) Parallel network flow classification method
CN112528277A (en) Hybrid intrusion detection method based on recurrent neural network
CN109981474A (en) A kind of network flow fine grit classification system and method for application-oriented software
CN104348741A (en) Method and system for detecting P2P (peer-to-peer) traffic based on multi-dimensional analysis and decision tree
CN108494594A (en) A kind of analysis method and system of EIGRP route networks failure
CN111709477A (en) Method and tool for garbage classification based on improved MobileNet network
Kong et al. Identification of abnormal network traffic using support vector machine
CN113408087A (en) Substation inspection method based on cloud side system and video intelligent analysis
CN110034966A (en) A kind of method for classifying data stream and system based on machine learning
CN103973589A (en) Network traffic classification method and device
CN107404398A (en) A kind of networks congestion control judgement system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180817

Termination date: 20190824