CN108141377A - Early classification of network flows - Google Patents

Early classification of network flows

Info

Publication number
CN108141377A
Authority
CN
China
Prior art keywords
training
flow
unclassified
network
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201580083836.5A
Other languages
Chinese (zh)
Other versions
CN108141377B (en)
Inventor
亚历山大·阿列克谢耶维奇·谢罗夫
瓦列里·尼古拉耶维奇·格卢霍夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN108141377A
Application granted
Publication of CN108141377B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Abstract

The present invention proposes a method for early classification of network flows. The method comprises a training phase comprising: capturing full-length training network flows; assigning classes to the captured full-length training network flows; training a prediction model on the full-length training network flows; obtaining truncated training network flows from the full-length training network flows; applying the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes; comparing the training classes predicted by the prediction model on the truncated training network flows with the assigned classes; and training a correction model on the truncated training network flows by taking the comparison of the training classes and the assigned classes into account. The prediction phase comprises: receiving the first several packets of an unclassified network flow, the unclassified network flow being the subject of early classification; obtaining truncated unclassified network flows from the unclassified network flow; applying the prediction model to the truncated unclassified network flows and outputting prediction model classification results; applying the correction model to the truncated unclassified network flows by taking the prediction model classification results into account, and outputting correction model classification results; and merging the correction model classification results to make a final prediction for the unclassified network flow.

Description

Early classification of network flows
Technical field
The present invention relates generally to the field of network flow classification. In particular, it relates to machine learning techniques for the early classification and identification of network traffic from various applications, such as service flows. This classification relates to traffic management that should be performed in real time.
Background
The prior art includes several known methods for classifying network traffic. The simplest method relies on port numbers. Because it does not inspect packet payloads, this method is comparatively fast. However, it works only when fixed ports are used, and it cannot detect disguised traffic.
Flow-based analysis methods have proven effective. A flow is a set of packets that share the same transport-layer protocol, the same source and destination IP addresses, and the same source and destination TCP or UDP port numbers.
Deep packet inspection (DPI) is another traffic classification method, based on pattern matching. DPI inspects the packet payload together with the header and is therefore computationally more demanding. Advantageously, the method operates in a port-agnostic manner by comparing application signatures with the network flow to be classified. However, DPI requires maintaining a huge database of application signatures.
Other methods include behavioral analysis and statistical analysis. The former builds profiles that characterize network hosts; the latter relies on machine learning classification algorithms trained on statistical features. Features can be built on packet-level statistics or on flow-level statistics. Fig. 6 shows a comparison of packet-level statistics and flow-level statistics according to the prior art. As shown in Fig. 6, for packet-level statistics, an average over the flows belonging to a specific application is taken for each of the first several packets of a flow. For flow-level statistics, an average is taken over all packets of a specific flow.
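The difference between the two kinds of statistics can be sketched in a few lines. This is a minimal illustration that assumes each flow is represented simply as a list of packet sizes; the function names `packet_level_means` and `flow_level_means` are hypothetical and do not come from the patent.

```python
from statistics import mean


def packet_level_means(flows, n_first):
    """For each of the first n_first packet positions, average the packet
    size across all flows of one application (packet-level statistics)."""
    means = []
    for i in range(n_first):
        sizes_at_i = [flow[i] for flow in flows if len(flow) > i]
        if sizes_at_i:
            means.append(mean(sizes_at_i))
    return means


def flow_level_means(flows):
    """For each flow, average the packet size over all packets of that
    flow (flow-level statistics)."""
    return [mean(flow) for flow in flows]


# Two flows of the same (hypothetical) application, as packet sizes in bytes.
flows = [[100, 200, 300], [300, 400, 500]]
print(packet_level_means(flows, 2))
print(flow_level_means(flows))
```

Packet-level statistics yield one value per packet position, whereas flow-level statistics yield one value per flow, which is why the latter need less memory per flow.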
The machine learning treatment of a classification problem involves a two-stage workflow. The first stage, called the training phase, consists of generalizing from labeled examples described by features. The second stage, called the prediction phase, consists of making predictions for unlabeled, i.e. unclassified, examples. The training phase usually ends with a generative or discriminative statistical model. In network traffic analysis, such a model can be built offline from packet captures. Before the training phase, packets should be assembled into flows and the flows mapped to a feature space. In addition, a label should be assigned to each flow according to the application that generated it. Once a model has been built on the labeled dataset, it can be applied in the prediction phase to unlabeled, i.e. unclassified, flows to infer the application behind each of them. The prediction phase does not require extensive computing resources and can therefore run in real time.
Because processing is online, the packets of an unclassified flow arrive one by one, so at any given moment the prediction phase does not need to handle the entire unclassified flow, only its first several packets, which can be called a truncated flow. The earlier the prediction phase can classify a flow using only its first few packets, the earlier a policy or rule can be applied to improve the overall performance of the network. The earliness of the prediction is therefore essential, and it is determined by the length of the truncated flow.
The following summarizes prior-art methods related to network traffic classification and its earliness.
Document US 8 095 635 B2 proposes managing network traffic to improve the availability of network services. It concerns traffic classification based on flow-level statistics that can be obtained from standard NetFlow records. NetFlow refers specifically to Cisco's implementation of packet-trace statistics; an equivalent is Huawei's NetStream. Some of the features built from the record data are unrelated to the application class and should therefore be removed to avoid unnecessary computation. The document proposes ranking features using the following symmetric uncertainty measure:
where H denotes the entropy function, (A1, ..., Am) denote the features, and C denotes the traffic class, which is treated as a feature.
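The formula of the cited document is not reproduced in this text. For orientation only, the standard textbook definition of the symmetric uncertainty between a feature A_i and the class C, written with the entropy function H, is given below; the notation of the cited document may differ.

```latex
SU(A_i, C) \;=\; \frac{2\,\bigl[\,H(A_i) - H(A_i \mid C)\,\bigr]}{H(A_i) + H(C)},
\qquad 0 \le SU(A_i, C) \le 1
```

A value of 0 means the feature carries no information about the class, and a value of 1 means that either one fully determines the other.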
The document further proposes determining the goodness of any given subset S of the features by the following value:
By adding features one at a time in descending order of symmetric uncertainty and monitoring the increase in the goodness measure, the required feature set can be selected. The proposed method can be implemented using the sampling feature of NetFlow. The document shows that packet sampling does not significantly affect the accuracy of the classifier, because a lower sampling rate yields a more uniform sample of large flows, which can apparently be classified more accurately.
A drawback, however, is that this method does not address the problem of early classification of network flows.
In "Early Classification of time series", Knowledge and Information Systems, 2012, 31(1), pp. 105-127, Zhengzheng Xing, Jian Pei and Philip S. Yu propose an early classification of time series (ECTS) method for time series of the following kind:
s[i], 1 ≤ i ≤ L (3)
where L denotes the full length of the time series.
Being based on the 1-nearest-neighbor (1NN) classification algorithm, the method proposed in that document relies on a measure of the distance between two time series according to the following equation:
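The equation itself is not reproduced in this text, but it is later referred to as the Euclidean metric (4). Assuming two time series s and t of equal length L in the notation of (3), the Euclidean distance takes the following form; this is a reconstruction from that later reference, not a quotation from the cited paper.

```latex
\operatorname{dist}(s, t) \;=\; \sqrt{\sum_{i=1}^{L} \bigl(s[i] - t[i]\bigr)^{2}}
```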
The document introduces the concept of the minimum prediction prefix (MPP). By construction, for a time series t, any prefix longer than MPP(t) is redundant for 1NN classification and can be dropped, which achieves early classification.
However, the ECTS method described in that document is based on the 1NN classification algorithm and relies on the following assumptions:
1. Every time series is a sequence of (timestamp, value) pairs;
2. All time series have length L;
3. A metric exists that defines the distance between two time series; and
4. The training dataset is a sufficiently large and uniform sample of time series.
These assumptions must necessarily be satisfied when the method is adapted to network flows. First, 1NN is unlikely to be the best classification algorithm. In fact, many other classifiers perform very well, such as naive Bayes classifiers, support vector machine (SVM) classifiers, or decision tree classifiers. Second, even if the Euclidean metric (4) is redesigned to operate on packet features, the "uniformity" of the training dataset cannot be guaranteed, because packet features may be nominal (e.g. header flags) or numeric (e.g. payload). Third, assumptions 1 and 3 above require storing packet-level features, which consume more memory than flow-level features. Finally, the prediction length differs for each time series that the ECTS method classifies reliably.
Summary of the invention
In view of the above drawbacks and problems, the present invention aims to improve the prior art. In particular, the object of the present invention is to provide improved early classification of network flows.
Unlike known classification techniques, which set maximizing classification accuracy as the optimization target, the invention intends in particular to improve early classification by optimizing earliness, provided that the classification accuracy is satisfactory.
The above object of the invention is achieved by the solution provided in the enclosed independent claims. Advantageous embodiments of the invention are further defined in the respective dependent claims.
According to a first aspect of the invention, a method for early classification of network flows is provided. The method comprises a training phase and a prediction phase.
The training phase comprises capturing full-length training network flows. The training phase comprises assigning classes to the captured full-length training network flows. The training phase comprises training a prediction model on the full-length training network flows. The training phase comprises obtaining truncated training network flows from the full-length training network flows. The training phase comprises applying the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes. The training phase comprises comparing the training classes predicted by the prediction model on the truncated training network flows with the assigned classes. The training phase comprises training a correction model on the truncated training network flows by taking the comparison of the training classes and the assigned classes into account.
The prediction phase comprises receiving the first several packets of an unclassified network flow, the unclassified network flow being the subject of early classification. The prediction phase comprises obtaining truncated unclassified network flows from the unclassified network flow. The prediction phase comprises applying the prediction model to the truncated unclassified network flows and outputting prediction model classification results. The prediction phase comprises applying the correction model to the truncated unclassified network flows by taking the prediction model classification results into account, and outputting correction model classification results. The prediction phase comprises merging the correction model classification results to make a final prediction for the unclassified network flow.
The proposed method thus addresses the early classification of network flows by means of machine learning classification techniques. Advantageously, the method balances the trade-off between the accuracy and the earliness of the prediction. Further, the method is based on flow-level features rather than packet-level features. Further, the method is general enough to avoid being restricted to any particular machine learning classification algorithm.
The proposed method thus supplements a network packet classification algorithm based on reference features, referred to as the prediction model, with a correction model that captures the sensitivity of that algorithm to flow truncation. This technique facilitates earlier flow classification with higher accuracy. Once trained offline, the proposed method can classify truncated unclassified flows in real time. First, the original algorithm, i.e. the prediction model, is applied to the features computed on the truncated flow; then the features used by the correction model are computed; finally, the correction model is applied to improve the prediction made by the original algorithm.
In a first possible implementation form of the method according to the first aspect, training the prediction model on the full-length training network flows comprises: extracting a first feature set from the captured full-length training network flows to obtain first training feature vectors; and training the prediction model on the first training feature vectors and the respective classes assigned to the captured full-length training network flows.
In a second possible implementation form of the method according to the first aspect as such or the first implementation form of the first aspect, applying the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes comprises: extracting the first feature set from the truncated training network flows to obtain second training feature vectors; extracting a second feature set from the truncated training network flows to obtain third training feature vectors; and applying the prediction model to the second training feature vectors of the truncated training network flows to obtain the training classes. Training the correction model on the truncated training network flows by taking the comparison of the training classes and the assigned classes into account comprises: training the correction model on the third training feature vectors, the assigned classes and the training classes.
In a third possible implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, applying the prediction model to the truncated unclassified network flows comprises: extracting the first feature set from the truncated unclassified network flows; computing first prediction feature vectors from the first feature set of the truncated unclassified network flows; and providing the computed first prediction feature vectors to the prediction model to obtain the corresponding prediction model classification results.
In a fourth possible implementation form of the method according to the third implementation form of the first aspect, applying the correction model to the truncated unclassified network flows comprises: extracting the second feature set from the truncated unclassified network flows; computing second prediction feature vectors from the second feature set of the truncated unclassified network flows; and providing the computed second prediction feature vectors and the obtained prediction model classification results to the correction model to obtain the class of the truncated unclassified network flows.
In a fifth possible implementation form of the method according to any of the first to fourth implementation forms of the first aspect, the first feature set extracted from a network flow comprises: the transport-layer protocol of the flow; the mean size of the packets in the flow; the standard deviation of the size of the packets in the flow; the inter-arrival time of the packets in the flow; the standard deviation of the inter-arrival time of the packets in the flow; the entropy of the application-layer payload of the packets in the flow; or any combination thereof.
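The first feature set can be illustrated with a short sketch. It assumes that a flow is given as a list of `(timestamp, size, payload)` tuples in arrival order, that the inter-arrival time feature is the mean gap between consecutive packets, and that the payload entropy is the Shannon entropy of the concatenated application-layer bytes; these choices, and the names `first_feature_set` and `shannon_entropy`, are assumptions made for illustration and are not specified in the text.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not payload:
        return 0.0
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in Counter(payload).values())


def first_feature_set(proto, packets):
    """Flow-level features of one flow.

    packets: list of (timestamp, size, payload) tuples in arrival order.
    """
    sizes = [size for _, size, _ in packets]
    # Gaps between consecutive packets; a single-packet flow has no gap.
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])] or [0.0]
    payload = b"".join(data for _, _, data in packets)
    return {
        "proto": proto,
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_iat": mean(gaps),
        "std_iat": pstdev(gaps),
        "payload_entropy": shannon_entropy(payload),
    }


features = first_feature_set("TCP", [(0.0, 100, b"aa"), (1.0, 300, b"bb")])
print(features)
```

Because every feature is an aggregate over the flow, the same function can be applied unchanged to a full-length flow and to a truncated one.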
In a sixth possible implementation form of the method according to any of the second to fifth implementation forms of the first aspect, the second feature set extracted from a network flow comprises: the transport-layer protocol of the flow; the conditional probability of the prediction model classification result given the second prediction feature vector computed on the first L packets of the flow; the conditional probability of the prediction model classification result given the second prediction feature vector computed on the first k packets of the flow, where k < L; the relative position k/L of the k-th packet in the truncated flow; the relative difference between the mean size of the first k packets of the truncated flow and the mean size of all packets in the truncated flow; the relative difference between the inter-arrival time of the first k packets of the truncated flow and the inter-arrival time of all packets in the truncated flow; the relative difference between the entropy of the application-layer payload of the first k packets of the truncated flow and the entropy of the application-layer payload of all packets in the truncated flow; or any combination thereof. The correction model can therefore be based on features that differ from those used by the original algorithm, i.e. the prediction model. In addition, derivatives of the predictions that the original algorithm makes on truncated flows can be used as features.
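Some of the truncation-related features of the second feature set can be sketched as follows. The sketch assumes that the relative difference of a quantity is computed as (value on the first k packets - value on all L packets) / value on all L packets, and that inter-arrival time means the mean gap between consecutive packets; neither definition is spelled out in the text, and the function names are hypothetical.

```python
from statistics import mean


def relative_difference(part, whole):
    """(part - whole) / whole, or 0.0 when the reference value is zero."""
    return (part - whole) / whole if whole else 0.0


def truncation_features(sizes, timestamps, k):
    """Features describing how the first k packets of a truncated flow
    differ from all L packets of that truncated flow (1 <= k < L)."""
    L = len(sizes)
    if not 1 <= k < L:
        raise ValueError("k must satisfy 1 <= k < L")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # The first k packets contain k - 1 gaps; none when k == 1.
    first_k_gap = mean(gaps[: k - 1]) if k > 1 else 0.0
    return {
        "relative_position": k / L,
        "size_rel_diff": relative_difference(mean(sizes[:k]), mean(sizes)),
        "iat_rel_diff": relative_difference(first_k_gap, mean(gaps)),
    }


print(truncation_features([100, 100, 400], [0.0, 1.0, 3.0], k=2))
```

Features of this kind tell the correction model how far a prefix is from the longer flow, which is the information it needs to judge whether the prediction model's output on the prefix can be trusted.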
In a seventh possible implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, obtaining the truncated unclassified network flows from the unclassified network flow comprises: obtaining the truncated unclassified network flows by discarding the last one or more of the received first packets.
In an eighth possible implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the truncated unclassified network flows are obtained by truncating the unclassified network flow to at least one truncation length.
In a ninth possible implementation form of the method according to the eighth implementation form of the first aspect, the truncation length is fixed.
In a tenth possible implementation form of the method according to the eighth or ninth implementation form of the first aspect, the unclassified network flow is additionally truncated to a length shorter than the truncation length to obtain a shorter unclassified network flow; and the prediction phase is additionally applied to the shorter unclassified network flow to improve the accuracy of the classification.
According to a second aspect of the invention, a computer program with program code is provided, the program code being for performing the method according to the first aspect of the invention when the computer program runs on a computing device.
According to a third aspect of the invention, a device for early classification of network flows is provided. The device is adapted to perform a training phase and a prediction phase. In the training phase, the device is adapted to: capture full-length training network flows and assign classes to the captured full-length training network flows; train a prediction model on the full-length training network flows; obtain truncated training network flows from the full-length training network flows; apply the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes; compare the training classes predicted by the prediction model on the truncated training network flows with the assigned classes; and train a correction model on the truncated training network flows by taking the comparison of the training classes and the assigned classes into account. In the prediction phase, the device is adapted to: receive the first several packets of an unclassified network flow, the unclassified network flow being the subject of early classification; obtain truncated unclassified network flows from the unclassified network flow by discarding the last one or more received packets; apply the prediction model to the truncated unclassified network flows and output prediction model classification results; apply the correction model to the truncated unclassified network flows by taking the prediction model classification results into account, and output correction model classification results; and merge the correction model classification results to make a final prediction for the unclassified network flow.
It should further be noted that all devices, elements, units and means described in the present application can be implemented in software or hardware elements or any kind of combination thereof. All steps performed by the various entities described in the present application, as well as the functionality described as being performed by the various entities, are intended to mean that the respective entity is adapted to perform the respective steps and functionality. Even if, in the following description of specific embodiments, a specific function or step performed by an external entity is not reflected in the description of a specific detailed element of the entity that performs that step or function, it should be clear to a person skilled in the art that these methods and functions can be implemented in respective software or hardware elements, or any kind of combination thereof.
Description of the drawings
The above aspects and implementation forms of the present invention are explained in the following description of specific embodiments with reference to the enclosed drawings, in which:
Fig. 1 is a schematic diagram of a method for early classification of network flows according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a method for early classification of network flows according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of a method for early classification of network flows according to another embodiment of the present invention;
Fig. 4 is a schematic diagram of a method for early classification of network flows according to another embodiment of the present invention;
Fig. 5 is a schematic diagram of a method for early classification of network flows according to another embodiment of the present invention;
Fig. 6 shows a comparison of packet-level statistics and flow-level statistics;
Fig. 7 shows a classification workflow according to the prior art;
Fig. 8 shows a classification workflow according to an embodiment of the present invention; and
Fig. 9 shows the improvement in earliness and accuracy according to an embodiment of the present invention.
Detailed description of embodiments
Fig. 1 is a schematic diagram of a method for early classification of network flows according to an embodiment of the present invention.
The method comprises a training phase 101 and a prediction phase 111. The training phase 101 can be executed offline, i.e. before classification, starting from network flows captured in the form of training network flows. It can be executed offline because this phase is not time-critical. The prediction phase 111, on the other hand, can be executed online, i.e. the classification can be completed in real time as the network flow to be classified is received. The prediction phase 111 can be executed online so as to achieve early classification by optimizing earliness and classification accuracy.
The method shown in the schematic diagram comprises the following steps during the training phase 101.
In a first step 102, full-length training network flows are captured. A training network flow is a network flow captured during the training phase. A network flow can be Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) network traffic that passes an observation point in the network during a certain time interval. A network flow can be characterized by a five-tuple consisting of:
Source IP address;
Source port;
Destination IP address;
Destination port; and
Transport-layer protocol.
The transport-layer protocol is a protocol at layer 4 of the Open Systems Interconnection (OSI) model. For example, the transport-layer protocol can be TCP or UDP. Each network flow can be transmitted over a pair of sockets created by the communicating applications.
A network flow can correspond, particularly in a packet network, to a sequence of packets from a source to a destination. The source can be a network entity defined by the source IP address and the source port, and the destination can be another network entity defined by the destination IP address and the destination port. Capturing a network flow can then comprise capturing the packet sequence and assembling the packet sequence into a network flow.
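Assembling captured packets into flows by their five-tuple can be sketched as below. The `Packet` record and its field names are hypothetical, and the sketch treats each direction as a separate flow, which is an assumption; the text does not say whether flows are unidirectional or bidirectional.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record: the five-tuple fields plus timestamp and size.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto ts size")


def group_into_flows(packets):
    """Group packets by five-tuple, keeping each flow in arrival order."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        flows[key].append(p)
    return dict(flows)


captured = [
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 0.2, 1500),
    Packet("10.0.0.1", 5000, "10.0.0.2", 80, "TCP", 0.1, 60),
    Packet("10.0.0.3", 6000, "10.0.0.2", 53, "UDP", 0.3, 80),
]
for key, flow in group_into_flows(captured).items():
    print(key, [p.size for p in flow])
```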
As described by Rajahalme et al. in IETF RFC 3697, "IPv6 Flow Label Specification", 2004, a network flow can be a sequence of packets sent from a particular source to a particular unicast, anycast or multicast destination that the source wishes to label as a flow, and such a network flow can consist of all packets in a specific transport connection or media stream. As described by Quittek et al. in IETF RFC 3917, "Requirements for IP Flow Information Export (IPFIX)", 2004, a network flow can be a set of IP packets passing an observation point in the network during a certain time interval, where all packets belonging to a particular flow have a set of common properties.
A full-length training network flow consists of all the packets transmitted over a pair of sockets during its lifetime. Alternatively, especially in practice, a full-length training network flow may consist not of all the packets transmitted over a pair of sockets during its lifetime, but of the packets captured over a sufficiently long period, i.e. a period longer than a threshold.
In a next step 103, classes c are assigned to the captured full-length training network flows. A class c, or label, can be assigned to each captured full-length training network flow according to the application that generated it. The applications that generate full-length training network flows can be grouped, and the corresponding class c assigned to the flow. Alternatively, the class c can be assigned according to, for example, the platform or service that generated the network flow.
In a next step 104, a prediction model is trained on the full-length training network flows.
In a next step 105, truncated training network flows are obtained from the full-length training network flows.
In a next step 106, the prediction model trained on the full-length training network flows is applied to the truncated training network flows to obtain a plurality of training classes.
In a next step 107, the training classes predicted by the prediction model on the truncated training network flows are compared with the assigned classes c.
In a next step 108, a correction model is trained on the truncated training network flows by taking the comparison of the training classes and the assigned classes c into account.
In the training phase 101, the prediction model is trained 104 on the full-length training network flows and applied 106 to the truncated training network flows to obtain the training classes. The correction model is trained 108 on these training classes, while also taking into account the result of comparing 107 the output of the prediction model with the class assignment 103.
In particular, during the training phase 101, the prediction model is first trained 104 on the full-length training network flows and then applied 106 to make predictions on the truncated training network flows, in order to observe whether the prediction model misclassifies the truncated training network flows. The invention thereby provides the input for the correction model to be trained 108. In this respect, the correction model is a meta-model or, in ensemble learning terms, a form of stacked generalization.
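The two-model structure of steps 102 to 108 can be sketched end to end on toy data. Everything concrete here is an assumption made for illustration: flows are lists of packet sizes, the only feature is the mean packet size, the prediction model is a nearest-centroid classifier, and the correction model is a lookup table keyed on the predicted class and the truncation length k. The patent does not prescribe any of these choices.

```python
from collections import Counter, defaultdict
from statistics import mean


class CentroidModel:
    """Stand-in prediction model: nearest class centroid on one feature."""

    def fit(self, xs, labels):
        grouped = defaultdict(list)
        for x, c in zip(xs, labels):
            grouped[c].append(x)
        self.centroids = {c: mean(v) for c, v in grouped.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: abs(self.centroids[c] - x))


class TableCorrection:
    """Stand-in correction model: the most frequent assigned class for each
    (predicted class, truncation length) pair seen during training."""

    def fit(self, meta_features, labels):
        votes = defaultdict(Counter)
        for m, c in zip(meta_features, labels):
            votes[m][c] += 1
        self.table = {m: v.most_common(1)[0][0] for m, v in votes.items()}
        return self

    def predict(self, meta_feature):
        # Unseen inputs fall back to the prediction model's own class.
        return self.table.get(meta_feature, meta_feature[0])


def train(full_flows, labels, k):
    # Step 104: train the prediction model on full-length flows.
    prediction = CentroidModel().fit([mean(f) for f in full_flows], labels)
    # Step 105: truncate; step 106: predict on the truncated flows.
    predicted = [prediction.predict(mean(f[:k])) for f in full_flows]
    # Steps 107-108: the correction model learns from predicted vs. assigned.
    correction = TableCorrection().fit([(p, k) for p in predicted], labels)
    return prediction, correction


def classify(prediction, correction, first_packets, k):
    predicted = prediction.predict(mean(first_packets[:k]))
    return correction.predict((predicted, k))


# Toy data: class "A" starts with large packets, class "B" with small ones.
FULL_FLOWS = [[800, 800, 100, 100], [820, 780, 120, 80],
              [200, 200, 1200, 1200], [220, 180, 1180, 1220]]
LABELS = ["A", "A", "B", "B"]
prediction, correction = train(FULL_FLOWS, LABELS, k=2)
print(prediction.predict(mean([790, 810])))
print(classify(prediction, correction, [790, 810], k=2))
```

In this toy setup the prediction model, trained on full-length averages, is systematically misled by two-packet prefixes, and the correction model learns to undo exactly that error. A practical correction model would use richer features, such as the second feature set described above.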
During the prediction phase 111, the method shown in the schematic diagram comprises the following steps.
In step 112, the first packets of an unclassified network flow are received; the unclassified network flow is the object of the early classification. In the schematic diagram of Fig. 1, step 112 is performed after step 108 of the training phase 101. In this respect, the training phase 101 is preferably performed offline and the prediction phase 111 online.
In the next step 113, a truncated unclassified network flow is obtained from the unclassified network flow. After a given number of packets has been received, the truncated unclassified network flow may be obtained online by merging those received packets. Preferably, the merged packets of the truncated unclassified network flow are only a part of the total number of packets belonging to the unclassified network flow. A truncated unclassified network flow may also be obtained by discarding the last one or more of the received packets.
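A minimal Python sketch of step 113 follows, assuming that the received packets are held in a list in arrival order. The function names `truncate` and `shorter_truncations` are hypothetical and serve only to illustrate the two ways of obtaining truncated flows mentioned above.

```python
def truncate(packets, length):
    """Return the truncated flow made of the first `length` received packets."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return list(packets[:length])


def shorter_truncations(packets, length):
    """Return every truncated flow of 1..length packets.

    Each shorter flow is the longer one with its trailing packets discarded.
    """
    upper = min(length, len(packets))
    return [truncate(packets, k) for k in range(1, upper + 1)]
```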
In the next step 114, the prediction model is applied to the truncated unclassified network flow and outputs a prediction model classification result.
In the next step 115, the correction model is applied to the truncated unclassified network flow, taking the prediction model classification result into account, and outputs a correction model classification result.
In the next step 116, the correction model classification results are merged to make the final prediction for the unclassified network flow.
In the prediction phase 111, the class of a flow is predicted by applying 114 the prediction model and applying 115 the correction model to the truncated unclassified network flow. Applying the prediction model to a truncated unclassified network flow rather than to the full-length unclassified network flow introduces noise. Applying the correction model can reduce this noise.
In fact, the prediction phase 111 can start without waiting for all packets of the network flow to be classified. The correction model can be applied to truncated unclassified network flows that are shorter than the unclassified network flow supplied to the prediction model. Therefore, during the prediction phase 111, once the prediction model has been fitted to the full-length training network flows during the training phase 101, the prediction model can be applied 114 to all the packets received. The correction model is then applied 115 several times, to the truncated unclassified network flows obtained from the received packets that were supplied to the prediction model. The multiple predictions obtained from the correction model are then merged 116.
In step 115, the correction model produces multiple predictions, i.e. multiple correction model classification results, one for each of the truncated unclassified network flows obtained 113 from the packets received 112 and supplied to the prediction model in step 114. The predictions can be merged 116, for example by scoring, to produce the final prediction of the class of the unclassified network flow.
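The merging step 116 can be illustrated with a short Python sketch. The text only says that predictions may be merged "for example by scoring"; the sketch assumes the simplest scoring rule, a majority count in which each correction model classification result scores one point, and it resolves ties in favour of the class that appears first. Both choices are assumptions, as is the function name `merge_predictions`.

```python
from collections import Counter


def merge_predictions(results):
    """Merge correction model classification results into one final class.

    Assumed scoring rule: one point per result, highest total wins, and a
    tie goes to the tied class that appears first in `results`.
    """
    if not results:
        raise ValueError("at least one classification result is required")
    scores = Counter(results)
    best = max(scores.values())
    for label in results:
        if scores[label] == best:
            return label
```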
Fig. 2 is a schematic diagram of a method for early classification of network flows according to another embodiment of the invention. According to this embodiment, step 104 of the embodiment of Fig. 1, training the prediction model on the full-length training network flows, comprises steps 201 and 202.
In step 201, a first feature set is extracted from the captured full-length training network flows to obtain first training feature vectors x.
In step 202, the prediction model is trained on the first training feature vectors x and the respective classes c assigned to the captured full-length training network flows.
Fig. 3 is a schematic diagram of a method for early classification of network flows according to another embodiment of the invention. According to this embodiment, step 106 of the embodiment of Fig. 1, applying the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes, comprises steps 301, 302 and 303. Further, step 108 of the embodiment of Fig. 1, training the correction model on the truncated training network flows by taking into account the comparison of the training classes with the assigned classes c, comprises step 304.
In step 301, the first feature set is extracted from the truncated training network flows to obtain second training feature vectors.
In step 302, a second feature set is extracted from the truncated training network flows to obtain third training feature vectors.
In step 303, the prediction model is applied to the second training feature vectors of the truncated training network flows to obtain the training classes.
In step 304, the correction model is trained on the third training feature vectors, the assigned classes c and the training classes.
Fig. 4 is a schematic diagram of a method for early classification of network flows according to another embodiment of the invention. According to this embodiment, step 114 of the embodiment of Fig. 1, applying the prediction model to the truncated unclassified network flow, comprises steps 401 to 403.
In step 401, the first feature set is extracted from the truncated unclassified network flow.
In step 402, a first prediction feature vector is calculated from the first feature set of the truncated unclassified network flow.
In step 403, the calculated first prediction feature vector is supplied to the prediction model to obtain the corresponding prediction model classification result.
Fig. 5 is a schematic diagram of a method for early classification of network flows according to another embodiment of the invention. According to this embodiment, step 115 of the embodiment of Fig. 1, applying the correction model to the truncated unclassified network flow, comprises steps 501 to 503.
In step 501, the second feature set is extracted from the truncated unclassified network flow.
In step 502, a second prediction feature vector is calculated from the second feature set of the truncated unclassified network flow.
In step 503, the calculated second prediction feature vector and the obtained prediction model classification result are supplied to the correction model to obtain the class of the truncated unclassified network flow.
According to another embodiment of the invention, a traffic capture may comprise training network flows, which may be grouped by, for example, the applications generating them, where c denotes the class or label of an application group. The invention proposes that a classification algorithm based on baseline features be learned on the training network flows, so that the features of an unclassified or unlabelled network flow can be used to predict its application.
In the proposed learning method, features are extracted 201 from the full-length training network flows in the training phase 101 and the prediction model is trained 202 accordingly. Once the prediction model has been fitted to the data, it can be used for prediction. For example, it is proposed to obtain the conditional posterior probability p(c | x) of the prediction model. If x is the first training feature vector calculated for a full-length training network flow, the value p(c | x) is the probability that the full-length training network flow belongs to class c. The application generating the flow can then be obtained from the following equation:
class(x) ← argmax_c p(c | x) (5)
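Equation (5) amounts to picking the class with the largest posterior probability. A minimal Python sketch, assuming the posterior p(c | x) for one feature vector is available as a dictionary from class label to probability (the function name `classify` and this representation are assumptions):

```python
def classify(posterior):
    """Return the class c that maximizes p(c | x), as in equation (5)."""
    if not posterior:
        raise ValueError("posterior must contain at least one class")
    return max(posterior, key=posterior.get)
```

For example, `classify({"skype": 0.7, "p2p": 0.3})` returns `"skype"`.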
In the prior art, it is assumed that the features are calculated on full-length training network flows, which means averaging over all the packets in each flow. Fig. 7 shows the corresponding prior-art classification workflow. In the offline training phase, a model, i.e. the prediction model, is trained using a training data set of full-length training network flows. In the online prediction phase, the trained prediction model is applied to the received test flows to make predictions in real time.
According to the invention, it is now proposed to calculate the same feature set on truncated versions of the full-length training network flows. If the averaging is done over only the first few packets of a full-length training network flow, a second training feature vector calculated on the truncated training network flow is obtained. Because the first training feature vector x and the second training feature vector are realizations of the same feature set, referred to as the first feature set, equation (5) above can be applied to the second training feature vector to perform early classification of the truncated flow. The result is equation (6), i.e. equation (5) evaluated on the second training feature vector instead of x.
The earliness of the prediction is determined either by the truncation length or by the time interval between the first and the last packet of each truncated network flow to be classified. Although both types of limit are eligible, the first may be used in particular. That is, the invention seeks the most accurate prediction achievable on truncated unclassified network flows for a fixed truncation length. At the same time, the invention makes use of the full-length training network flows during the training phase.
Because the second vector calculated on the truncated flow is noisy, the class predicted according to equation (6) may differ from the class c calculated on the corresponding full-length flow according to equation (5). The key concept of the invention is to build a supplementary model, referred to as the correction model, that captures the sensitivity of the prediction model to flow truncation. "Sensitivity" here denotes the difference between the prediction made by the prediction model on a truncated network flow and the prediction for the corresponding full-length training network flow. The correction model operates on pairs of these two classes, and its features may differ from those of the prediction model. In addition, as a second-level model, the correction model can use features derived from the predictions made by the prediction model.
Given that the first packets of a flow form a truncated flow with certain features, and that the prediction model classifies this truncated flow as belonging to a predicted class, the correction model can be regarded as providing the posterior probability that the flow belongs to class c. The class of the flow can then be calculated according to equation (7), as the class c that maximizes this posterior probability.
Fig. 8 shows the classification workflow for early classification of network flows according to an embodiment of the invention.
The prediction model and the correction model are trained in offline mode. First, it is proposed to extract 201 features x from the full-length training network flows, these features being regarded as the instances on which the prediction model is built, and to train 202 the prediction model on these features and the associated class labels.
Second, it is proposed to sample packets in the full-length training network flows and to form 105 truncated training network flows. By construction, any truncated training network flow consists of all packets of the sampled flow, starting from the first packet and ending at the sampled packet. To build the correction model, it is proposed to extract 301 and 302 features from the truncated training network flows, and to apply 303 the already built prediction model to the features extracted in step 301 to obtain the training classes. A training class represents the guess, calculated by the prediction model on a truncated training network flow, about the class of the corresponding full-length training network flow.
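The construction of truncated training network flows by packet sampling can be sketched as follows. This is a minimal Python illustration under stated assumptions: flows are lists of packets, the sampled packet is drawn uniformly at random, and the parameters `per_flow`, `max_len` and `seed` are hypothetical knobs that do not appear in the description.

```python
import random


def sample_truncated_flows(full_flows, labels, per_flow=2, max_len=10, seed=0):
    """Form truncated training flows by sampling a packet in each full flow.

    Each result is a tuple (truncated_flow, k, assigned_class), where the
    truncated flow runs from the first packet to the sampled k-th packet.
    """
    rng = random.Random(seed)
    truncated = []
    for flow, assigned in zip(full_flows, labels):
        if not flow:
            continue  # nothing to sample in an empty flow
        upper = min(max_len, len(flow))
        for _ in range(per_flow):
            k = rng.randint(1, upper)  # 1-based position of the sampled packet
            truncated.append((flow[:k], k, assigned))
    return truncated
```

Keeping the assigned class next to each truncated flow makes the later comparison with the training classes a simple lookup.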
Then, it is proposed to compare 107 the training classes with the classes c predicted by the prediction model on the corresponding full-length training network flows, and to train 304 the correction model on pairs consisting of the features extracted in step 302 and the training classes.
After training, the models are used for online classification during the prediction phase 111. For a network flow to be classified, the features of the first feature set are calculated 401 first and provided 402 to the prediction model to obtain the prediction model classification result according to equation (6). Then the features of the second feature set are calculated 502 and provided 503 to the correction model to obtain the class c according to equation (7).
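The online procedure above is a two-stage pipeline, which the following Python sketch makes explicit. It assumes that the two models and the two feature extractors are supplied as plain callables; the name `classify_early` and all parameter names are hypothetical.

```python
def classify_early(packets, length, first_features, second_features,
                   prediction_model, correction_model):
    """Classify a flow from its first `length` packets in two stages."""
    truncated = packets[:length]
    # First stage: prediction model on the first feature set.
    guess = prediction_model(first_features(truncated))
    # Second stage: correction model on the second feature set plus the guess.
    return correction_model(second_features(truncated), guess)
```

A usage example with toy stand-ins for the trained models:

```python
mean_size = lambda pkts: (sum(pkts) / len(pkts),)
count = lambda pkts: (len(pkts),)
predict = lambda v: "p2p" if v[0] > 500 else "skype"
correct = lambda v, guess: "skype" if guess == "p2p" and v[0] < 3 else guess
label = classify_early([1000, 100, 100], 2, mean_size, count, predict, correct)
```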
If the truncation length L for the flow to be classified is fixed and resources are available, the above classification procedure can be repeated several times for truncated flows of various lengths shorter than L. This yields several values of c that can be merged, for example by scoring, to further improve accuracy.
One embodiment of the method was tested on traffic captures containing flows generated by four consecutive Skype VoIP calls and by P2P clients running in the background. For the test, the flows were labelled using a deep packet inspection (DPI) utility that extracts Skype traffic. The following table gives statistics for these traffic captures:
Capture  Duration, s  Size, bytes  Packets  TCP packets  UDP packets  Flows  Skype flows
1        167.45       6116576      12414    6755         5590         480    35
2        108.45       130444       6734     1330         5331         444    21
3        189.14       5634602      11389    5752         5620         508    25
4        98.17        2840772      9122     3163         5937         755    24
A baseline support vector machine (SVM) algorithm was applied to these captures four times in round-robin fashion. Each experiment attempted to detect the Skype traffic in one capture while using the other three captures for learning. After each experiment, performance metrics were evaluated, such as precision, recall and the F1 score that combines the two.
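The evaluation protocol can be sketched in Python as follows. The sketch covers only the round-robin split and the metrics; the SVM itself is omitted. The function names, and the choice of returning 0.0 when a metric is undefined, are assumptions.

```python
def round_robin_splits(captures):
    """Yield (training_captures, test_capture), holding out each capture once."""
    for index, test in enumerate(captures):
        training = [c for j, c in enumerate(captures) if j != index]
        yield training, test


def precision_recall_f1(true_labels, predicted_labels, positive="skype"):
    """Precision, recall and F1 score for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

The F1 score is the harmonic mean of precision and recall, so it is high only when both are high.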
The following features were extracted from the instances supplied to the prediction model (the elements of x and of its truncated-flow counterpart):
the transport layer protocol of the flow (TCP or UDP);
the mean size of the packets in the flow;
the standard deviation of the packet sizes in the flow;
the inter-arrival time of the packets in the flow;
the standard deviation of the packet inter-arrival times in the flow; and
the entropy of the application layer payload of the packets in the flow.
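The feature list above can be turned into a feature vector along the following lines. This Python sketch rests on several assumptions not stated in the description: each packet is a tuple (timestamp in seconds, size in bytes, payload bytes), the inter-arrival time is summarized by its mean, standard deviations are population standard deviations, and the entropy is the Shannon entropy of the byte distribution of the concatenated payloads. The flow must contain at least one packet.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def byte_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def first_feature_set(protocol, packets):
    """Feature vector for a flow given as (timestamp, size, payload) tuples."""
    times = [t for t, _, _ in packets]
    sizes = [size for _, size, _ in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    payload = b"".join(data for _, _, data in packets)
    return [
        1.0 if protocol == "UDP" else 0.0,  # transport layer protocol flag
        mean(sizes),                         # mean packet size
        pstdev(sizes),                       # standard deviation of packet size
        mean(gaps) if gaps else 0.0,         # mean inter-arrival time
        pstdev(gaps) if gaps else 0.0,       # std. dev. of inter-arrival time
        byte_entropy(payload),               # application layer payload entropy
    ]
```

The same function applies unchanged to a full-length flow and to a truncated flow, which is what allows one feature set to serve both.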
The following features were extracted from the instances supplied to the correction model:
the transport layer protocol of the flow (TCP or UDP);
the application class calculated for the first L packets of the flow, together with the associated conditional probability;
the application class calculated for the first k packets of the flow (k < L), together with the associated conditional probability;
the relative position of the k-th packet in the truncated flow, k/L;
the relative difference between the mean size of the first k packets of the truncated flow and the mean size of all packets of the truncated flow;
the relative difference between the inter-arrival time of the first k packets of the truncated flow and the inter-arrival time of all packets of the truncated flow; and
the relative difference between the entropy of the application layer payload of the first k packets of the truncated flow and the entropy of the application layer payload of all packets of the truncated flow.
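The truncation-dependent features in the list above can be sketched in Python as follows. The description does not define "relative difference"; the sketch assumes (part - whole) / whole, and assumes the inter-arrival time is summarized by its mean. The function names are hypothetical, and the class-related features and the entropy difference are left out for brevity.

```python
from statistics import mean


def relative_difference(part, whole):
    """Assumed definition: (part - whole) / whole, or 0.0 if whole is zero."""
    return (part - whole) / whole if whole else 0.0


def mean_gap(times):
    """Mean inter-arrival time of a list of packet timestamps."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return mean(gaps) if gaps else 0.0


def truncation_features(times, sizes, k):
    """Features comparing the first k packets with all L packets of the flow."""
    length = len(sizes)
    if not 1 <= k < length:
        raise ValueError("k must satisfy 1 <= k < L")
    return {
        "relative_position": k / length,
        "size_diff": relative_difference(mean(sizes[:k]), mean(sizes)),
        "gap_diff": relative_difference(mean_gap(times[:k]), mean_gap(times)),
    }
```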
Fig. 9 shows the improvement in terms of earliness and accuracy according to an embodiment of the invention, and in particular the accuracy gain achievable for various truncation lengths of the classified network flows.
Fig. 9 shows the overall F1 score measured as the truncation length L varies from 1 to 10. The dark shading shows the accuracy of the baseline SVM algorithm and the light shading the improvement achieved by the invention, which ranges from 13% to 30%. As Fig. 9 shows, the highest improvement, 30%, is reached for the values 3, 4 and 5.
The horizontal dashed line corresponds to the accuracy of the baseline algorithm measured on full-length flows. The method of the invention reaches the same accuracy using only the first 5 packets of each flow, which means that classification can be completed much sooner.
The invention has been described in conjunction with various embodiments and implementations as examples. However, other variations can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the independent claims. In the claims as well as in the description, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfil the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims (13)

1. A method for early classification of network flows, characterized in that
the method comprises a training phase (101) and a prediction phase (111),
the training phase (101) comprising:
capturing (102) full-length training network flows;
assigning (103) classes (c) to the captured full-length training network flows;
training (104) a prediction model on the full-length training network flows;
obtaining (105) truncated training network flows from the full-length training network flows;
applying (106) the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes;
comparing (107) the training classes predicted by the prediction model on the truncated training network flows with the assigned classes (c); and
training (108) a correction model on the truncated training network flows by taking into account the comparison of the training classes with the assigned classes (c); and
the prediction phase (111) comprising:
receiving (112) the first packets of an unclassified network flow, the unclassified network flow being the object of the early classification;
obtaining (113) truncated unclassified network flows from the unclassified network flow;
applying (114) the prediction model to the truncated unclassified network flows and outputting prediction model classification results;
applying (115) the correction model to the truncated unclassified network flows by taking into account the prediction model classification results, and outputting correction model classification results; and
merging (116) the correction model classification results to make a final prediction for the unclassified network flow.
2. The method according to claim 1, characterized in that
training (104) the prediction model on the full-length training network flows comprises:
extracting (201) a first feature set from the captured full-length training network flows to obtain first training feature vectors (x); and
training (202) the prediction model on the first training feature vectors (x) and the respective classes (c) assigned to the captured full-length training network flows.
3. The method according to claim 2, characterized in that
applying (106) the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes comprises:
extracting (301) the first feature set from the truncated training network flows to obtain second training feature vectors;
extracting (302) a second feature set from the truncated training network flows to obtain third training feature vectors; and
applying (303) the prediction model to the second training feature vectors of the truncated training network flows to obtain the training classes; and
training (108) the correction model on the truncated training network flows by taking into account the comparison of the training classes with the assigned classes (c) comprises:
training (304) the correction model on the third training feature vectors, the assigned classes (c) and the training classes.
4. The method according to any one of the preceding claims, characterized in that
applying (114) the prediction model to the truncated unclassified network flows comprises:
extracting (401) a first feature set from the truncated unclassified network flows;
calculating (402) first prediction feature vectors from the first feature set of the truncated unclassified network flows; and
providing (403) the calculated first prediction feature vectors to the prediction model to obtain the corresponding prediction model classification results.
5. The method according to claim 4, characterized in that
applying (115) the correction model to the truncated unclassified network flows comprises:
extracting (501) a second feature set from the truncated unclassified network flows;
calculating (502) second prediction feature vectors from the second feature set of the truncated unclassified network flows; and
providing (503) the calculated second prediction feature vectors and the obtained prediction model classification results to the correction model to obtain the class of the truncated unclassified network flows.
6. The method according to any one of claims 2 to 5, characterized in that
the first feature set extracted from a network flow comprises:
the transport layer protocol of the flow;
the mean size of the packets in the flow;
the standard deviation of the packet sizes in the flow;
the inter-arrival time of the packets in the flow;
the standard deviation of the packet inter-arrival times in the flow;
the entropy of the application layer payload of the packets in the flow; or
any combination thereof.
7. The method according to any one of claims 3 to 6, characterized in that
the second feature set extracted from a network flow comprises:
the transport layer protocol of the flow;
the conditional probability of the second prediction model classification result, given that the second prediction feature vector is calculated over the first L packets in the flow;
the conditional probability of the second prediction model classification result, given that the second prediction feature vector is calculated over the first k packets in the flow, where k < L;
the relative position of the k-th packet in the truncated flow, k/L;
the relative difference between the mean size of the first k packets of the truncated flow and the mean size of all packets in the truncated flow;
the relative difference between the inter-arrival time of the first k packets of the truncated flow and the inter-arrival time of all packets in the truncated flow;
the relative difference between the entropy of the application layer payload of the first k packets of the truncated flow and the entropy of the application layer payload of all packets in the truncated flow; or
any combination thereof.
8. The method according to any one of the preceding claims, characterized in that
obtaining (113) the truncated unclassified network flows from the unclassified network flow comprises:
obtaining the truncated unclassified network flows by discarding the last one or more packets of the received first packets.
9. The method according to any one of the preceding claims, characterized in that
at least one of the truncated unclassified network flows is obtained by truncating the unclassified network flow to a truncation length (L).
10. The method according to claim 9, characterized in that
the truncation length (L) is fixed.
11. The method according to claim 9 or 10, characterized in that
the unclassified network flow is additionally truncated to a length shorter than the truncation length (L) to obtain a shorter unclassified network flow; and
the prediction phase is additionally applied to the shorter unclassified network flow to improve the accuracy of the classification.
12. A computer program having a program code, characterized in that the program code is for performing the method according to any one of the preceding claims when the computer program runs on a computing device.
13. A device for early classification of network flows, characterized in that
the device is adapted to perform a training phase and a prediction phase,
wherein, in the training phase, the device is adapted to:
capture full-length training network flows and assign classes (c) to the captured full-length training network flows;
train a prediction model on the full-length training network flows;
obtain truncated training network flows from the full-length training network flows;
apply the prediction model trained on the full-length training network flows to the truncated training network flows to obtain a plurality of training classes;
compare the training classes predicted by the prediction model on the truncated training network flows with the assigned classes (c); and
train a correction model on the truncated training network flows by taking into account the comparison of the training classes with the assigned classes (c); and
wherein, in the prediction phase, the device is adapted to:
receive the first packets of an unclassified network flow, the unclassified network flow being the object of the early classification;
obtain truncated unclassified network flows from the unclassified network flow;
apply the prediction model to the truncated unclassified network flows and output prediction model classification results;
apply the correction model to the truncated unclassified network flows by taking into account the prediction model classification results, and output correction model classification results; and
merge the correction model classification results to make a final prediction for the unclassified network flow.
CN201580083836.5A 2015-10-12 2015-10-12 Early classification of network flows Active CN108141377B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2015/000662 WO2017065627A1 (en) 2015-10-12 2015-10-12 Early classification of network flows

Publications (2)

Publication Number Publication Date
CN108141377A true CN108141377A (en) 2018-06-08
CN108141377B CN108141377B (en) 2020-08-07

Family

ID=55969445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580083836.5A Active CN108141377B (en) 2015-10-12 2015-10-12 Early classification of network flows

Country Status (2)

Country Link
CN (1) CN108141377B (en)
WO (1) WO2017065627A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10721134B2 (en) * 2017-08-30 2020-07-21 Citrix Systems, Inc. Inferring radio type from clustering algorithms
US10772016B2 (en) 2018-12-05 2020-09-08 At&T Intellectual Property I, L.P. Real-time user traffic classification in wireless networks
CN111211948B (en) * 2020-01-15 2022-05-27 太原理工大学 Shodan flow identification method based on load characteristics and statistical characteristics

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090171662A1 (en) * 2007-12-27 2009-07-02 Sehda, Inc. Robust Information Extraction from Utterances
CN101645806A (en) * 2009-09-04 2010-02-10 东南大学 Network flow classifying system and network flow classifying method combining DPI and DFI
CN101741744A (en) * 2009-12-17 2010-06-16 东南大学 Network flow identification method
CN102739522A (en) * 2012-06-04 2012-10-17 华为技术有限公司 Method and device for classifying Internet data streams
EP2521312A2 (en) * 2011-05-02 2012-11-07 Telefonaktiebolaget L M Ericsson (publ) Creating and using multiple packet traffic profiling models to profile packet flows
CN102957579A (en) * 2012-09-29 2013-03-06 北京邮电大学 Network anomaly traffic monitoring method and device
CN103593470A (en) * 2013-11-29 2014-02-19 河南大学 Double-degree integrated unbalanced data stream classification algorithm
US20150161024A1 (en) * 2013-12-06 2015-06-11 Qualcomm Incorporated Methods and Systems of Generating Application-Specific Models for the Targeted Protection of Vital Applications

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8095635B2 (en) 2008-07-21 2012-01-10 At&T Intellectual Property I, Lp Managing network traffic for improved availability of network services

Also Published As

Publication number Publication date
CN108141377B (en) 2020-08-07
WO2017065627A1 (en) 2017-04-20

Similar Documents

Publication Publication Date Title
Shafiq et al. A machine learning approach for feature selection traffic classification using security analysis
US8797901B2 (en) Method and its devices of network TCP traffic online identification using features in the head of the data flow
Zhang et al. Internet traffic classification by aggregating correlated naive bayes predictions
US20110040706A1 (en) Scalable traffic classifier and classifier training system
CN107786388B (en) Anomaly detection system based on large-scale network flow data
CN103796183B (en) A kind of refuse messages recognition methods and device
CN107819698A (en) A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN108629183A (en) Multi-model malicious code detecting method based on Credibility probability section
CN105141455B (en) A kind of net flow assorted modeling method of making an uproar based on statistical nature
CN106612511B (en) Wireless network throughput evaluation method and device based on support vector machine
Lazaris et al. An LSTM framework for modeling network traffic
CN109299742A (en) Method, apparatus, equipment and the storage medium of automatic discovery unknown network stream
CN110363427A (en) Model quality evaluation method and apparatus
CN111526101A (en) Machine learning-based dynamic traffic classification method for Internet of things
CN108141377A (en) Network flow early stage classifies
US11558769B2 (en) Estimating apparatus, system, method, and computer-readable medium, and learning apparatus, method, and computer-readable medium
Min et al. Online Internet traffic identification algorithm based on multistage classifier
Wehner et al. Improving web qoe monitoring for encrypted network traffic through time series modeling
CN101764754A (en) Sample acquiring method in business identifying system based on DPI and DFI
CN102098346B (en) Method for identifying flow of P2P (peer-to-peer) stream media in unknown flow
Bozkır et al. A new platform for machine-learning-based network traffic classification
Lu et al. Cascaded classifier for improving traffic classification accuracy
Ogino Evaluation of machine learning method for intrusion detection system on Jubatus
Drozdenko et al. Utilizing Deep Learning Techniques to Detect Zero Day Exploits in Network Traffic Flows
Shao et al. Emilie: Enhance the power of traffic identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant