CN102299863B - Method and equipment for clustering network flow - Google Patents
- Publication number: CN102299863B
- Application number: CN201110295431.1A
- Authority: CN (China)
- Prior art keywords: flow, sample data, cluster, network flow, traffic type
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method and equipment for clustering network flow. The method comprises the following steps of: acquiring global network flow; cutting the global network flow according to a single user to generate sample data; classifying network flow types of the flow according to the sample data; and selecting different characteristic combinations for clustering according to the flow types. The equipment comprises an acquiring unit, a sample data generating unit, a primary clustering unit and a secondary clustering unit. The method for clustering network flow has the advantages of high accuracy, high efficiency, wide flow identification range and capability of accurately mining application quantity in the network flow, and can be realized as network flow control equipment.
Description
Technical field
The present invention relates to data processing technology, and in particular to a method and equipment for identifying network traffic.
Background
Clustering is the process of dividing a set of physical or abstract objects into multiple classes made up of similar objects. A cluster produced by clustering is a set of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters. When monitoring network traffic, a network flow control device often clusters the traffic of the whole local area network (LAN) in order to discover how many applications are present in the traffic.
Prior-art clustering methods cluster the traffic of the whole LAN. Because the LAN contains many users and many applications, its traffic is complex and clustering accuracy is hard to guarantee. In addition, clustering over the whole intranet means a large sample space and therefore low efficiency.
Summary of the invention
The technical problem to be solved by the present invention is that existing network traffic clustering has low accuracy and low efficiency.
To solve this technical problem, a first aspect of the present invention provides a method for clustering network traffic, comprising the following steps: acquiring global network traffic; cutting the global network traffic by single user to generate sample data; classifying the traffic by traffic type according to the sample data; and selecting different feature combinations for clustering according to the traffic type.
A second aspect of the present invention provides equipment for clustering network traffic, comprising: an acquiring unit for acquiring global network traffic; a sample data generating unit for cutting the global network traffic by single user to generate sample data; a primary clustering unit for classifying the traffic by traffic type according to the sample data; and a secondary clustering unit for selecting different feature combinations for clustering according to the traffic type.
With the method and equipment of the present invention, network traffic clustering is accurate and efficient, a wide range of traffic can be identified, and the number of applications in the network traffic can be accurately discovered. The method can be implemented as a function of a network flow control device.
Brief description of the drawings
Fig. 1 shows an application scenario of the method and equipment for clustering network traffic of the present invention;
Fig. 2 is a flow chart of the method for clustering network traffic according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the equipment for clustering network traffic according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of frequent item set generation in one payload feature combination;
Fig. 5 is a schematic diagram of frequent item set generation in another payload feature combination.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows an application scenario of the method and equipment for clustering network traffic. As shown in Fig. 1, LAN users 11, 12 and 13 access the network through network flow control device 22, which can therefore obtain the network traffic of all users (user 11, user 12 and user 13) in the whole LAN.
Fig. 2 is a flow chart of the method for clustering network traffic according to an embodiment of the present invention. The method comprises steps 201 to 204.
In step 201, global network traffic is acquired, that is, the network traffic of all users of the whole intranet.
In step 202, the global network traffic is cut by single user to generate sample data. Specifically, the global network traffic acquired in step 201 is cut per intranet user and divided into the model traffic of multiple single users, from which the sample data is generated.
The sample data can be the information output for one connection, for example the connection's tuple information, packet length sequence, uplink and downlink traffic counts, timestamps and DPI classification label.
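Step 202 can be sketched as grouping per-connection records by the intranet user they belong to. The sketch below is only an illustration under stated assumptions: the `Connection` record, its field names, the `split_by_user` helper and the `192.168.0.0/16` intranet range are hypothetical and do not come from the patent, which does not specify how a user is identified.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Connection:
    """Hypothetical per-connection sample record (field names are assumptions)."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packet_lengths: list = field(default_factory=list)  # packet length sequence
    start_ts: float = 0.0
    end_ts: float = 0.0
    dpi_label: str = ""


def split_by_user(connections, intranet="192.168.0.0/16"):
    """Cut global traffic into per-user sample sets, keyed by intranet IP.

    Assumption: a "single user" is identified by an IP address inside the
    given intranet range.
    """
    net = ip_network(intranet)
    per_user = defaultdict(list)
    for conn in connections:
        if ip_address(conn.src_ip) in net:
            per_user[conn.src_ip].append(conn)
        elif ip_address(conn.dst_ip) in net:
            per_user[conn.dst_ip].append(conn)
        # Connections with no intranet endpoint are ignored in this sketch.
    return dict(per_user)
```

Each value of the returned dictionary is then a much smaller sample space than the whole LAN, which is the efficiency argument made in the background section.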
In step 203, the traffic is classified by traffic type according to the sample data. This step can be understood as the first-level clustering of the network traffic.
Specifically, the network flow control device classifies the traffic type according to the sample data generated in step 202; that is, according to the information output for each connection, the network traffic is divided into encrypted model traffic, non-encrypted model traffic, P2P model traffic, C/S model traffic and so on.
In one example, the traffic type is judged from statistics on the bytes output in a connection. For example, if the bytes in a connection occur with roughly equal probability, the connection is judged to be encrypted model traffic.
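One common way to test whether byte values occur with roughly equal probability is Shannon entropy over the byte histogram. The patent does not name a specific statistic, so the entropy measure, the `looks_encrypted` helper and the 7.0 bits-per-byte threshold below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(payload: bytes, threshold: float = 7.0) -> bool:
    """Heuristic: a near-uniform byte distribution suggests encrypted traffic.

    The threshold is an illustrative assumption, not a value from the patent.
    Compressed data also has high entropy, so this is only a first-pass filter.
    """
    return byte_entropy(payload) >= threshold
```

A plain-text protocol header uses only a small part of the byte range and scores well below the threshold, while a payload in which all 256 byte values are equally frequent scores 8.0.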
In step 204, different feature combinations are selected for clustering according to the traffic type. This step can be understood as the second-level clustering.
Specifically, different feature combinations are used to cluster the traffic depending on the traffic type output in step 203. For example, for encrypted model traffic, the tuple feature combination, the pattern feature combination or the timestamp feature combination can be selected for clustering. For unencrypted model traffic, the tuple feature combination, the payload feature combination or the timestamp feature combination can be selected for clustering.
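The selection in step 204 amounts to a lookup from traffic type to the feature combinations that apply to it. A minimal sketch follows; the dictionary, the string labels and the `select_features` helper are hypothetical names, and only the two traffic types spelled out in the text are covered.

```python
# Feature combinations per traffic type, as listed in the description.
# The string labels are illustrative assumptions.
FEATURES_BY_TRAFFIC_TYPE = {
    "encrypted": ("tuple", "pattern", "timestamp"),
    "unencrypted": ("tuple", "payload", "timestamp"),
}


def select_features(traffic_type: str):
    """Return the feature combinations usable for the given traffic type."""
    try:
        return FEATURES_BY_TRAFFIC_TYPE[traffic_type]
    except KeyError:
        raise ValueError(f"no feature combinations defined for {traffic_type!r}")
```

The only difference between the two rows is that payload-based clustering is replaced by packet-length pattern clustering when the payload is encrypted and therefore carries no readable structure.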
One example describes how the tuple feature combination is used to cluster encrypted or unencrypted model traffic. Clustering by the tuple feature combination splits the tuple information of the traffic and recombines it, for example into transport-layer protocol / source IP / source port, transport-layer protocol / target IP / target port, and transport-layer protocol / source IP / target IP. Connections are then counted under each combination; for instance, the number of times each transport-layer protocol / source IP / source port value and each transport-layer protocol / target IP / target port value appears across all connections is counted. Taking the time difference between connections into account, the frequent item sets (occurrence count greater than 2) are extracted.
The frequent item sets are then merged iteratively. The merge rule is: if two classes obtained from different tuple combinations contain the same connection, all connections in those two classes belong to the same class. Iteration ends when a further pass can no longer merge any sets.
For example, suppose the tuple information of the encrypted or non-encrypted model traffic is as shown in Table 1:
Table 1
ID | Source IP | Source port | Target IP | Target port |
---|---|---|---|---|
1 | 192.168.40.85 | 8000 | 202.38.64.2 | 4325 |
2 | 192.168.40.85 | 8000 | 202.38.64.3 | 5000 |
3 | 192.168.40.81 | 4000 | 202.38.64.3 | 5000 |
4 | 192.168.40.81 | 4000 | 202.38.64.4 | 4523 |
5 | 192.168.40.177 | 1452 | 202.106.46.14 | 12345 |
According to the tuple information in Table 1, grouping by source IP and source port and by target IP and target port gives the following classes: ID 1 and ID 2 share a source IP and source port and form class A; ID 3 and ID 4 share a source IP and source port and form class B; ID 2 and ID 3 share a target IP and target port and form class C.
After one iteration, each pair of clusters is checked for a common connection and merged if one exists. ID 1 in A and ID 2 in C have the same source IP and source port (A and C both contain ID 2), so A and C are merged into class A'.
After a further iteration, ID 3 in A' and ID 4 in B have the same source IP and source port (A' and B both contain ID 3), so A' and B are merged into class A''.
When a further iteration can no longer merge any sets, the iteration ends.
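The Table 1 walk-through can be reproduced with a short sketch. It is a simplified illustration, not the patented implementation: the transport-layer protocol and the connection time difference are left out because Table 1 does not list them, only the source and target groupings used in the example are built, and a group is kept when it contains at least `min_count` connections (an assumption set to 2 so that the example's two-member classes qualify).

```python
from collections import defaultdict


def tuple_clusters(connections, min_count=2):
    """Cluster connection IDs by shared (IP, port) pairs, then merge overlaps.

    connections: dict mapping ID -> (src_ip, src_port, dst_ip, dst_port).
    """
    groups = defaultdict(set)
    for cid, (sip, sport, dip, dport) in connections.items():
        groups[("src", sip, sport)].add(cid)
        groups[("dst", dip, dport)].add(cid)

    # Keep only groups that occur often enough to count as frequent.
    clusters = [ids for ids in groups.values() if len(ids) >= min_count]

    # Iteratively merge any two clusters that contain the same connection.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if clusters[i] & clusters[j]:
                    clusters[i] = clusters[i] | clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


TABLE_1 = {
    1: ("192.168.40.85", 8000, "202.38.64.2", 4325),
    2: ("192.168.40.85", 8000, "202.38.64.3", 5000),
    3: ("192.168.40.81", 4000, "202.38.64.3", 5000),
    4: ("192.168.40.81", 4000, "202.38.64.4", 4523),
    5: ("192.168.40.177", 1452, "202.106.46.14", 12345),
}
```

Calling `tuple_clusters(TABLE_1)` yields a single cluster containing IDs 1 to 4, matching class A'' in the text; ID 5 shares no tuple with any other connection and stays unclustered.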
Another example describes how the payload feature combination is used to cluster unencrypted model traffic. The payload feature combination clusters by sequential pattern mining, using the Apriori algorithm. The basic idea of the algorithm is to first find all frequent item sets, that is, item sets whose frequency of occurrence is at least the predefined minimum support, and then to generate strong association rules from the frequent item sets; these rules must satisfy both the minimum support and the minimum confidence.
First, the payload information of each connection is output from the non-encrypted traffic: the first 3 packets of the connection in each direction are taken, and 32 bytes are taken from the front and back of each packet. If a packet is shorter than 32 bytes, the remainder is padded with 0.
For example, suppose the data table entries are as follows:
T0: I11, I23, I33
T1: I11, I24, I33
T2: I11, I22, I33
T3: I12, I25, I34
T4: I12, I22, I34
The minimum support is set to 2.
Step 1: scan the data table entries and count the number of occurrences of each item in the table. As shown in Fig. 4, this generates the candidate set C1; the item sets in C1 whose support is less than the minimum support of 2 are deleted, which determines the frequent item set L1.
Step 2: as shown in Fig. 5, the candidate set C2 is generated from L1. The data table is then scanned to count the item sets in C2, and the item sets whose support is less than the minimum support of 2 are deleted, which determines the frequent item set L2.
Finally, because {I11, I33} and {I12, I34} in L2 cannot be merged any further, there are two final frequent item sets: {I11, I33} and {I12, I34}.
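The two steps above can be checked with a compact Apriori sketch. It is a simplified version for illustration: candidates are formed by joining pairs of frequent item sets and filtered directly by support, without the separate subset-pruning step of the textbook algorithm, and the function name `apriori` and the row labelling are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support=2):
    """Return the frequent item sets level by level: [L1, L2, ...]."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the item set.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]

    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Join step: unions of two frequent (k-1)-item sets that have k items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        current = [c for c in candidates if support(c) >= min_support]
    return levels


TRANSACTIONS = [
    ["I11", "I23", "I33"],
    ["I11", "I24", "I33"],
    ["I11", "I22", "I33"],
    ["I12", "I25", "I34"],
    ["I12", "I22", "I34"],
]
```

For `TRANSACTIONS` with a minimum support of 2, L1 contains I11, I12, I22, I33 and I34, and L2 contains only {I11, I33} and {I12, I34}. No three-item candidate can be formed from L2, so the loop stops there, in line with the result stated in the text.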
Another example describes how the pattern feature combination is used to cluster encrypted model traffic. The pattern feature combination judges whether two connections belong to the same application by the similarity of their packet length sequences. The packet length sequence of each connection is regarded as a sampled digital signal, with the direction of each packet giving the direction of oscillation of the signal. Finding the similarity of the two packet length sequences is thereby converted into finding the correlation coefficient of two signals, which is how the similarity of two sampled signals is judged in signal processing.
The correlation coefficient is calculated as follows:
In the above formula, X and Y are the packet length sequences of the two connections. The current unclassified connection is judged to belong to the cluster containing the connection with which its correlation coefficient is largest.
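Since the formula itself is not reproduced in this text, the sketch below substitutes the standard Pearson correlation coefficient as an assumption; the patent's own definition may differ. Packet lengths are written as signed values (positive for one direction, negative for the other) to encode the packet direction, sequences are truncated to their common length, and the helper names are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation of two signed packet length sequences.

    Assumption: sequences are compared over their common prefix only.
    """
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def assign_by_pattern(seq, clusters):
    """Pick the cluster holding the connection most correlated with seq.

    clusters: dict mapping cluster name -> list of packet length sequences.
    """
    best_name, best_score = None, -math.inf
    for name, members in clusters.items():
        for member in members:
            score = correlation(seq, member)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score
```

In practice a minimum score would probably be required before assigning a connection to an existing cluster; the text does not state such a cutoff, so none is applied here.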
Another example describes how the timestamp feature combination is used to cluster encrypted or unencrypted model traffic. The timestamp feature combination computes the Euclidean distance between the timestamp-related values of the current connection (for example the connection start time, end time and average packet interval) and those of the connections in each cluster, takes the minimum, and judges that the current connection belongs to the cluster with the minimum Euclidean distance.
Euclidean distance formula: ρ(A, B) = sqrt[∑ (a[i] − b[i])^2] (i = 1, 2, ..., n), where a[i] and b[i] are the i-th timestamp-related values of the two connections.
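The nearest-cluster rule can be sketched directly from the distance formula. The feature vector layout (start time, end time, average packet interval) follows the examples named in the text; the helper names and the sample values used to exercise them are assumptions.

```python
import math


def euclidean(a, b):
    """rho(A, B) = sqrt(sum((a[i] - b[i])^2)) over equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_by_timestamp(features, clusters):
    """Pick the cluster holding the connection closest to `features`.

    clusters: dict mapping cluster name -> list of feature vectors, each
    e.g. (start_time, end_time, average_packet_interval).
    """
    best_name, best_dist = None, math.inf
    for name, members in clusters.items():
        for member in members:
            dist = euclidean(features, member)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name, best_dist
```

Because start and end times are usually much larger numbers than packet intervals, an implementation would likely normalize each component first so that no single value dominates the distance; the text does not specify this, so the sketch leaves the values as given.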
In the embodiment of the present invention, global network traffic is acquired and cut by single user to generate sample data, and the network traffic is then clustered in two stages, primary clustering and secondary clustering, so that the number of applications in the network traffic can be accurately discovered.
Fig. 3 is a structural diagram of the equipment for clustering network traffic according to an embodiment of the present invention. As shown in the figure, the equipment comprises acquiring unit 31, sample data generating unit 32, primary clustering unit 33 and secondary clustering unit 34.
Acquiring unit 31 acquires global network traffic, that is, the network traffic of all users of the whole intranet.
Sample data generating unit 32 cuts the global network traffic acquired by acquiring unit 31 per intranet user to generate sample data. Specifically, sample data generating unit 32 cuts the global network traffic and divides it into the model traffic of multiple single users.
Primary clustering unit 33 classifies the traffic by traffic type according to the sample data. Specifically, primary clustering unit 33 classifies the traffic according to the sample data generated by sample data generating unit 32, for example dividing it into encrypted model traffic or unencrypted model traffic.
Secondary clustering unit 34 selects different feature combinations for clustering according to the traffic type output by primary clustering unit 33. Specifically, secondary clustering unit 34 selects different feature combinations to cluster the traffic according to the traffic type classification result. For example, for encrypted traffic, the tuple feature combination, the pattern feature combination or the timestamp feature combination can be selected for clustering. For unencrypted traffic, the tuple feature combination, the payload feature combination or the timestamp feature combination can be selected for clustering.
Acquiring unit 31, sample data generating unit 32, primary clustering unit 33 and secondary clustering unit 34 of the equipment implement the corresponding steps of the method in Fig. 2, which are not repeated here.
Although specific embodiments of the present invention have been shown and described, those skilled in the art can obviously make changes and modifications based on the teaching herein without departing from the exemplary embodiments of the present invention and their broader aspects. The appended claims are therefore intended to include within their scope all such changes and modifications that do not depart from the true spirit and scope of the exemplary embodiments of the present invention.
Claims (6)
1. A method for clustering network traffic, characterized by comprising the following steps:
acquiring global network traffic;
cutting the global network traffic by single user to generate sample data;
classifying the traffic by traffic type according to the sample data;
selecting different feature combinations for clustering according to the traffic type;
wherein the step of cutting the global network traffic by single user to generate sample data comprises cutting the global network traffic and dividing it into the model traffic of multiple single users;
and wherein classifying the traffic by traffic type according to the sample data comprises dividing the traffic, according to the sample data, into encrypted model traffic or non-encrypted model traffic.
2. The method according to claim 1, characterized in that selecting different feature combinations for clustering according to the traffic type comprises, when the traffic type is encrypted model traffic, selecting the tuple feature combination, the pattern feature combination or the timestamp feature combination for clustering.
3. The method according to claim 1, characterized in that selecting different feature combinations for clustering according to the traffic type comprises, when the traffic type is non-encrypted model traffic, selecting the tuple feature combination, the payload feature combination or the timestamp feature combination for clustering.
4. Equipment for clustering network traffic, characterized by comprising:
an acquiring unit for acquiring global network traffic;
a sample data generating unit for cutting the global network traffic by single user to generate sample data;
a primary clustering unit for classifying the traffic by traffic type according to the sample data;
a secondary clustering unit for selecting different feature combinations for clustering according to the traffic type;
wherein the sample data generating unit cuts the global network traffic and divides it into the model traffic of multiple single users;
and wherein the primary clustering unit divides the traffic, according to the sample data, into encrypted model traffic or non-encrypted model traffic.
5. The equipment according to claim 4, characterized in that the secondary clustering unit selects the tuple feature combination, the pattern feature combination or the timestamp feature combination to cluster encrypted model traffic.
6. The equipment according to claim 4, characterized in that the secondary clustering unit selects the tuple feature combination, the payload feature combination or the timestamp feature combination to cluster non-encrypted model traffic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110295431.1A CN102299863B (en) | 2011-09-27 | 2011-09-27 | Method and equipment for clustering network flow |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110295431.1A CN102299863B (en) | 2011-09-27 | 2011-09-27 | Method and equipment for clustering network flow |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102299863A (en) | 2011-12-28 |
CN102299863B (en) | 2015-02-11 |
Family
ID=45360050
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110295431.1A Active CN102299863B (en) | 2011-09-27 | 2011-09-27 | Method and equipment for clustering network flow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102299863B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104394021B (en) * | 2014-12-09 | 2017-08-25 | 中南大学 | Exception of network traffic analysis method based on visualization cluster |
CN104767739B (en) * | 2015-03-23 | 2018-01-30 | 电子科技大学 | The method that unknown multi-protocols blended data frame is separated into single protocol data frame |
CN104753934B (en) * | 2015-03-23 | 2018-01-19 | 电子科技大学 | By the method that the more communication party's data stream separations of unknown protocol are Point-to-Point Data stream |
CN105022960B (en) * | 2015-08-10 | 2017-11-21 | 济南大学 | Multiple features mobile terminal from malicious software detecting method and system based on network traffics |
CN106452948A (en) * | 2016-09-22 | 2017-02-22 | 恒安嘉新(北京)科技有限公司 | Automatic classification method and system of network flow |
CN106850333B (en) * | 2016-12-23 | 2019-11-29 | 中国科学院信息工程研究所 | A kind of network equipment recognition methods and system based on feedback cluster |
CN109525508B (en) * | 2018-12-15 | 2022-06-21 | 深圳先进技术研究院 | Encrypted stream identification method and device based on flow similarity comparison and storage medium |
CN112822121A (en) * | 2019-11-15 | 2021-05-18 | 中兴通讯股份有限公司 | Traffic identification method, traffic determination method and knowledge graph establishment method |
CN114221816B (en) * | 2021-12-17 | 2024-05-03 | 恒安嘉新(北京)科技股份公司 | Flow detection method, device, equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102045363B (en) * | 2010-12-31 | 2013-10-09 | 华为数字技术(成都)有限公司 | Establishment, identification control method and device for network flow characteristic identification rule |
Also Published As
Publication number | Publication date |
---|---|
CN102299863A (en) | 2011-12-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102299863B (en) | Method and equipment for clustering network flow | |
US20130265883A1 (en) | Method and system for storing packet flows | |
CN108768986A (en) | A kind of encryption traffic classification method and server, computer readable storage medium | |
EP2337266A2 (en) | Detecting and classifying anomalies in communication networks | |
CN103716137A (en) | Method and system for identifying reasons of ZigBee sensor network packet loss | |
CN104102700A (en) | Categorizing method oriented to Internet unbalanced application flow | |
CN114401516A (en) | 5G slice network anomaly detection method based on virtual network traffic analysis | |
CN110768856A (en) | Network flow measuring method, network measuring equipment and control plane equipment | |
Yao et al. | Network anomaly detection using random forests and entropy of traffic features | |
Ding et al. | Internet traffic classification based on expanding vector of flow | |
CN112633353B (en) | Internet of things equipment identification method based on packet length probability distribution and k nearest neighbor algorithm | |
Nayak et al. | MAC protocol based IoT network intrusion detection using improved efficient shuffle bidirectional COOT channel attention network | |
Cai et al. | Flow identification and characteristics mining from internet traffic with hadoop | |
CN103297296A (en) | FPGA-based logical operation search method and system | |
CN107124410A (en) | Network safety situation feature clustering method based on machine deep learning | |
Damman et al. | Regular expressions for PCTL counterexamples | |
CN108234202B (en) | Method for discovering network topology based on life span | |
CN115242716B (en) | IP address route reachability identification method based on BGP prefix tree | |
Sajeev et al. | LASER: A novel hybrid peer to peer network traffic classification technique | |
Tapaswi et al. | Flow-based p2p network traffic classification using machine learning | |
CN106211139B (en) | A kind of recognition methods encrypting MANET interior joint type | |
CN106376020B (en) | A kind of recognition methods encrypting user type in MANET | |
CN106533756A (en) | Communication characteristic extraction and traffic generation method and device | |
CN102413007B (en) | Deep packet inspection method and equipment | |
CN100488167C (en) | Grouped data transmitting method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |