CN102299863B - Method and equipment for clustering network flow - Google Patents

Info

Publication number
CN102299863B
Authority
CN
China
Prior art keywords
flow
sample data
cluster
network flow
traffic type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110295431.1A
Other languages
Chinese (zh)
Other versions
CN102299863A (en)
Inventor
陈振昌
梁志勇
崔渊博
齐晓璐
洪婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING NETENTSEC Inc
Original Assignee
BEIJING NETENTSEC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-09-27
Filing date
2011-09-27
Publication date
2015-02-11
Application filed by BEIJING NETENTSEC Inc
Priority to CN201110295431.1A
Publication of CN102299863A
Application granted
Publication of CN102299863B
Legal status: Active
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and equipment for clustering network flow. The method comprises the following steps of: acquiring global network flow; cutting the global network flow according to a single user to generate sample data; classifying network flow types of the flow according to the sample data; and selecting different characteristic combinations for clustering according to the flow types. The equipment comprises an acquiring unit, a sample data generating unit, a primary clustering unit and a secondary clustering unit. The method for clustering network flow has the advantages of high accuracy, high efficiency, wide flow identification range and capability of accurately mining application quantity in the network flow, and can be realized as network flow control equipment.

Description

Method and equipment for clustering network flow
Technical field
The present invention relates to data processing technology, and in particular to a method and equipment for identifying network traffic.
Background art
Clustering is the process of dividing a set of physical or abstract objects into multiple classes, each made up of similar objects. A cluster produced by clustering is a set of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters. When monitoring network traffic, a network flow control device often clusters the traffic of the entire local area network (LAN) in order to discover the number of applications in that traffic.
Prior-art clustering methods cluster the traffic of the entire LAN. Because the LAN contains many users and applications and its traffic is complex, clustering accuracy is difficult to guarantee. In addition, clustering over the whole intranet means a large sample space, so efficiency is low.
Summary of the invention
The technical problem to be solved by the present invention is that network traffic clustering has low accuracy and low efficiency.
To solve the above technical problem, a first aspect of the present invention provides a method for clustering network traffic. The method comprises the following steps: acquiring global network traffic; splitting the global network traffic by individual user to generate sample data; classifying the traffic by traffic type according to the sample data; and selecting different feature combinations for clustering according to the traffic type.
A second aspect of the present invention provides equipment for clustering network traffic. The equipment comprises: an acquiring unit for acquiring global network traffic; a sample data generating unit for splitting the global network traffic by individual user to generate sample data; a primary clustering unit for classifying the traffic by traffic type according to the sample data; and a secondary clustering unit for selecting different feature combinations for clustering according to the traffic type.
With the method and equipment of the present invention, network traffic clustering is accurate and efficient, a wide range of traffic can be identified, and the number of applications in the network traffic can be discovered accurately. The invention can be implemented as a function of a network flow control device.
Brief description of the drawings
Fig. 1 shows an application scenario of the method and equipment for clustering network traffic of the present invention;
Fig. 2 is a flowchart of the method for clustering network traffic according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the equipment for clustering network traffic according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the frequent itemset generation process for one payload feature combination;
Fig. 5 is a schematic diagram of the frequent itemset generation process for another payload feature combination.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows an application scenario of the method and equipment for clustering network traffic. As shown in Fig. 1, LAN users 11, 12 and 13 access the network through network flow control device 22. Network flow control device 22 can obtain the network traffic of all users (users 11, 12 and 13) in the entire LAN.
Fig. 2 is a flowchart of the method for clustering network traffic according to an embodiment of the present invention. The method comprises steps 201 to 204.
In step 201, global network traffic is acquired, that is, the network traffic of all users of the entire intranet.
In step 202, the global network traffic is split by individual user to generate sample data. Specifically, the global network traffic acquired in step 201 is split by individual intranet user into the model traffic of multiple individual users, and sample data are generated.
The sample data may be the information output for one connection, for example the connection's tuple information, packet-length sequence, upstream and downstream traffic counts, timestamp information, and DPI classification label.
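The per-user split and the sample record can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ConnectionSample` field names, the `split_by_user` helper, and the `192.168.0.0/16` intranet range are all assumptions made for the example, and the user is assumed to be identified by the intranet-side IP address of the connection.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class ConnectionSample:
    """Hypothetical sample record: the information output for one connection."""
    protocol: str                 # transport-layer protocol, e.g. "TCP"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packet_lengths: list = field(default_factory=list)  # packet-length sequence
    up_bytes: int = 0             # upstream traffic count
    down_bytes: int = 0           # downstream traffic count
    start_time: float = 0.0
    end_time: float = 0.0
    dpi_label: str = ""           # DPI classification label, empty if unknown


def split_by_user(samples, intranet="192.168.0.0/16"):
    """Group connection samples by the intranet IP address that owns them."""
    net = ip_network(intranet)
    per_user = defaultdict(list)
    for s in samples:
        # Assumption: the intranet endpoint of a connection identifies the user.
        if ip_address(s.src_ip) in net:
            per_user[s.src_ip].append(s)
        elif ip_address(s.dst_ip) in net:
            per_user[s.dst_ip].append(s)
    return dict(per_user)
```

Each value of the returned dictionary is the sample data of one user, which keeps the sample space for the later clustering steps small.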
In step 203, the traffic is classified by traffic type according to the sample data. This step can be understood as the first-level clustering of the network traffic.
Specifically, the network flow control device classifies the traffic type according to the sample data generated in step 202. That is, according to the information output for a connection, the network traffic is divided into encrypted-model traffic, non-encrypted-model traffic, P2P-model traffic, CS-model traffic, and so on.
In one example, the traffic type is determined from statistics of the bytes in a connection. For example, if the byte values in a connection occur with approximately equal probability, the connection is judged to be encrypted-model traffic.
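One common way to test whether byte values occur with roughly equal probability is to measure the Shannon entropy of the byte distribution. The sketch below uses that measure as a stand-in. The source does not name a specific statistic, so the entropy test, the function names, and the threshold of 7.0 bits per byte are all assumptions.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.0) -> bool:
    """Treat a near-uniform byte distribution as a sign of encrypted traffic.

    The threshold is an illustrative assumption and would need tuning.
    """
    return byte_entropy(data) >= threshold
```

A uniform distribution over all 256 byte values gives the maximum of 8 bits per byte, while plain-text protocols such as HTTP score much lower. Compressed payloads also score high, so a real classifier would need more than this single test.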
In step 204, different feature combinations are selected to cluster the traffic according to its traffic type. This step can be understood as the second-level clustering.
Specifically, different feature combinations are used to cluster the traffic according to the traffic type output in step 203. For example, for encrypted model traffic, the tuple feature combination, the pattern feature combination or the timestamp feature combination can be selected for clustering. For unencrypted model traffic, the tuple feature combination, the payload feature combination and the timestamp feature combination can be selected for clustering.
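The selection in step 204 amounts to a lookup from traffic type to the feature combinations listed above. A minimal sketch, in which the dictionary keys and the function name are hypothetical labels chosen for the example:

```python
# Feature combinations per traffic type, as listed in the description.
FEATURE_COMBINATIONS = {
    "encrypted": ("tuple", "pattern", "timestamp"),
    "unencrypted": ("tuple", "payload", "timestamp"),
}


def select_feature_combinations(traffic_type: str):
    """Return the feature combinations usable for the given traffic type."""
    try:
        return FEATURE_COMBINATIONS[traffic_type]
    except KeyError:
        # The description gives no combinations for other types (e.g. P2P, CS).
        raise ValueError(f"no feature combinations defined for {traffic_type!r}") from None
```

Payload features are offered only for unencrypted traffic, since the payload bytes of encrypted traffic carry no recognisable patterns.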
In one example, clustering encrypted or unencrypted model traffic with the tuple feature combination is described. Clustering with the tuple feature combination splits the tuple information of the traffic and recombines it, for example into transport-layer protocol/source IP/source port, transport-layer protocol/destination IP/destination port, and transport-layer protocol/source IP/destination IP. Connections are then counted for each combination of tuple information. For example, the number of occurrences across all connections is counted for transport-layer protocol/source IP/source port and for transport-layer protocol/destination IP/destination port. Taking the time difference between connections into account, the frequent itemsets are extracted (occurrence count greater than 2).
The frequent itemsets are merged iteratively. The merging rule is: for any two classes obtained from different tuple combinations, if they contain the same connection, all connections in the two classes belong to the same class. When a further iteration can no longer merge any sets, the iteration ends.
For example, the tuple information in encrypted or non-encrypted model traffic is shown in Table 1:
Table 1
ID   Source IP         Source port   Destination IP   Destination port
1    192.168.40.85     8000          202.38.64.2      4325
2    192.168.40.85     8000          202.38.64.3      5000
3    192.168.40.81     4000          202.38.64.3      5000
4    192.168.40.81     4000          202.38.64.4      4523
5    192.168.40.177    1452          202.106.46.14    12345
According to the tuple information in Table 1, the connections are grouped by source IP and source port and by destination IP and destination port. ID 1 and ID 2 are grouped into class A and ID 3 and ID 4 into class B (same source IP and source port), and ID 2 and ID 3 are grouped into class C (same destination IP and destination port).
After one iteration, the clusters are checked for shared connections and merged if any are found. Because ID 1 in A has the same source IP and source port as ID 2 in C (that is, A and C both contain ID 2), A and C are merged into class A'.
After another iteration, because ID 3 in A' has the same source IP and source port as ID 4 in B (that is, A' and B both contain ID 3), A' and B are merged into class A''.
When a further iteration can no longer merge any sets, the iteration ends.
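The grouping and iterative merging on Table 1 can be sketched as below. This is an illustration under stated assumptions: the transport-layer protocol and the connection time difference are left out because Table 1 does not list them, `min_count=2` is chosen so that the pairs in the example form classes, and the function names are hypothetical.

```python
from collections import defaultdict

# Rows of Table 1: (ID, source IP, source port, destination IP, destination port).
CONNECTIONS = [
    (1, "192.168.40.85", 8000, "202.38.64.2", 4325),
    (2, "192.168.40.85", 8000, "202.38.64.3", 5000),
    (3, "192.168.40.81", 4000, "202.38.64.3", 5000),
    (4, "192.168.40.81", 4000, "202.38.64.4", 4523),
    (5, "192.168.40.177", 1452, "202.106.46.14", 12345),
]


def frequent_groups(connections, min_count=2):
    """Group connection IDs by (source IP, port) and by (destination IP, port)."""
    buckets = defaultdict(set)
    for cid, sip, sport, dip, dport in connections:
        buckets[("src", sip, sport)].add(cid)
        buckets[("dst", dip, dport)].add(cid)
    # Keep only the groups that occur often enough to count as frequent.
    return [ids for ids in buckets.values() if len(ids) >= min_count]


def merge_until_stable(groups):
    """Repeatedly merge any two groups that share a connection."""
    clusters = [set(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if clusters[i] & clusters[j]:
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters
```

On Table 1, `frequent_groups` yields the classes {1, 2}, {2, 3} and {3, 4}, and `merge_until_stable` collapses them into the single class {1, 2, 3, 4}. Connection 5 shares no tuple combination with any other connection and stays unclustered.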
In another example, clustering unencrypted model traffic with the payload feature combination is described. The payload feature combination clusters by sequential pattern mining, using the Apriori algorithm. The basic idea of the algorithm is to first find all frequent itemsets, that is, itemsets whose frequency of occurrence is at least the predefined minimum support. Strong association rules are then generated from the frequent itemsets. These rules must satisfy both the minimum support and the minimum confidence.
First, the payload information of each connection is output from the non-encrypted traffic: the first 3 packets in each of the two directions of the connection are taken, and 32 bytes are taken from the front and back of each packet. If a packet is shorter than 32 bytes, the remainder is padded with zeros.
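The extraction step can be sketched as follows. The sketch assumes that only the leading 32 bytes of each packet are used, which is one reading of the description. The direction labels "up" and "down" and the function name are likewise hypothetical.

```python
def payload_features(packets, per_direction=3, width=32):
    """Collect fixed-width payload prefixes for the first packets in each direction.

    packets: iterable of (direction, payload) pairs in arrival order, where
    direction is "up" or "down" and payload is a bytes object.
    """
    taken = {"up": [], "down": []}
    for direction, payload in packets:
        if len(taken[direction]) < per_direction:
            # Truncate to `width` bytes, then pad short payloads with zero bytes.
            taken[direction].append(payload[:width].ljust(width, b"\x00"))
    return taken
```

Fixed-width, zero-padded prefixes make the payloads of different connections directly comparable position by position, which is what the itemset mining that follows relies on.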
For example, the data entries are as follows:
T0: I11, I23, I33
T1: I11, I24, I33
T2: I11, I22, I33
T3: I12, I25, I34
T4: I12, I22, I34
The minimum support is set to 2.
Step 1: scan the data entries and count the number of occurrences of each item in the table. As shown in Fig. 4, candidate set C1 is generated, and the itemsets in C1 whose support is less than the minimum support of 2 are deleted, which determines the frequent itemsets L1.
Step 2: as shown in Fig. 5, candidate set C2 is generated from L1. The data table is then scanned to count the itemsets in C2, and the itemsets whose support is less than the minimum support of 2 are deleted, which determines the frequent itemsets L2.
Finally, because {I11, I33} and {I12, I34} in the frequent itemsets L2 cannot be merged further, there are two final frequent itemsets: {I11, I33} and {I12, I34}.
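The two steps above can be reproduced with a small level-wise Apriori sketch over the five data entries. It covers only the frequent itemset stage, not rule generation, and the function names are hypothetical.

```python
from itertools import combinations

# The five data entries from the example, one set of items per entry.
TRANSACTIONS = [
    {"I11", "I23", "I33"},
    {"I11", "I24", "I33"},
    {"I11", "I22", "I33"},
    {"I12", "I25", "I34"},
    {"I12", "I22", "I34"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori_levels(transactions, min_support=2):
    """Return [L1, L2, ...], each a set of frozensets meeting the minimum support."""
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support}
    levels = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c, transactions) >= min_support}
    return levels
```

With a minimum support of 2, the second level contains exactly {I11, I33} and {I12, I34}. No 3-item candidate can be formed from these two, so the search stops there.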
In another example, clustering encrypted model traffic with the pattern feature combination is described. The pattern feature combination judges whether two connections belong to the same application by the similarity of their packet-length sequences. The packet-length sequences of the two connections are regarded as sampled digital signals, with the direction of a packet as the direction of oscillation of the signal. Finding the similarity of the two packet-length sequences then becomes finding the correlation coefficient of two signals. In signal processing, the correlation coefficient is used to judge the similarity of two sampled signals.
The correlation coefficient is computed as follows:
$$\rho = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2} \, \sum_{i=1}^{n} Y_i^{2}}}$$
In the above formula, X and Y are the packet-length sequences of the two connections. The correlation coefficient between the current unclassified connection and the connections in each cluster is computed, and the current connection is judged to belong to the cluster for which the coefficient is largest.
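A minimal sketch of this assignment is given below. It assumes the usual normalised form of the correlation coefficient, with the square root of the product of the two sums of squares in the denominator. It also assumes that packet direction is encoded as the sign of the length, and that sequences of different length are compared over their common prefix. The function names are hypothetical.

```python
import math


def correlation(x, y):
    """Normalised correlation of two signed packet-length sequences."""
    n = min(len(x), len(y))  # assumption: compare only the common prefix
    xs, ys = x[:n], y[:n]
    num = sum(a * b for a, b in zip(xs, ys))
    den = math.sqrt(sum(a * a for a in xs) * sum(b * b for b in ys))
    return num / den if den else 0.0


def assign_by_correlation(conn, clusters):
    """Return the name of the cluster whose member correlates best with conn.

    clusters: dict mapping a cluster name to a list of packet-length sequences.
    """
    best_name, best_score = None, -math.inf
    for name, members in clusters.items():
        score = max((correlation(conn, m) for m in members), default=-math.inf)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The coefficient lies between -1 and 1. Values close to 1 mean the two connections send packets of proportional size in the same directions, which suggests the same application.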
In another example, clustering encrypted or unencrypted model traffic with the timestamp feature combination is described. The timestamp feature combination computes the Euclidean distance between the timestamp-related values of the current connection (for example the connection start time, the end time, and the average packet interval) and those of the connections in each cluster. The minimum distance is taken, and the current connection is judged to belong to the cluster that gives this minimum Euclidean distance.
The Euclidean distance is computed as follows:
$$\rho(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^{2}}$$
where $a_i$ and $b_i$ ($i = 1, 2, \ldots, n$) are the timestamp-related values of the two connections.
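The nearest-cluster assignment can be sketched as follows. The feature vector layout (start time, end time, average packet interval) follows the example in the description, while the function names and the cluster representation are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_by_timestamp(conn, clusters):
    """Return the name of the cluster holding the member closest to conn.

    conn: tuple of timestamp-related values, e.g. (start, end, mean_interval).
    clusters: dict mapping a cluster name to a list of such tuples.
    """
    best = min(
        ((euclidean(conn, member), name)
         for name, members in clusters.items()
         for member in members),
        default=None,
    )
    return best[1] if best is not None else None
```

Because the values are on different scales (absolute times versus packet intervals), a practical implementation would probably normalise them first. The description does not say whether this is done.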
The embodiment of the present invention acquires global network traffic, splits it by individual user to generate sample data, and clusters the network traffic by first-level and second-level clustering, so that the number of applications in the network traffic can be discovered accurately.
Fig. 3 is a structural diagram of the equipment for clustering network traffic according to an embodiment of the present invention. As shown in the figure, the equipment comprises acquiring unit 31, sample data generating unit 32, primary clustering unit 33 and secondary clustering unit 34.
Acquiring unit 31 acquires global network traffic, that is, the network traffic of all users of the entire intranet.
Sample data generating unit 32 splits the global network traffic acquired by acquiring unit 31 by individual intranet user to generate sample data. Specifically, sample data generating unit 32 splits the global network traffic into the model traffic of multiple individual users.
Primary clustering unit 33 classifies the traffic by traffic type according to the sample data. Specifically, primary clustering unit 33 classifies the traffic according to the sample data generated by sample data generating unit 32, for example dividing it into encrypted model traffic or unencrypted model traffic.
Secondary clustering unit 34 selects different feature combinations for clustering according to the traffic type output by primary clustering unit 33. Specifically, secondary clustering unit 34 selects different feature combinations to cluster the traffic according to the traffic type classification result. For example, for encrypted traffic, the tuple feature combination, the pattern feature combination and the timestamp feature combination can be selected for clustering. For unencrypted traffic, the tuple feature combination, the payload feature combination and the timestamp feature combination can be selected for clustering.
Acquiring unit 31, sample data generating unit 32, primary clustering unit 33 and secondary clustering unit 34 of the equipment respectively implement the corresponding steps of the method in Fig. 2, which are not repeated here.
Although specific embodiments of the present invention have been shown and described, those skilled in the art can obviously make changes and modifications based on the teaching herein without departing from the exemplary embodiments of the present invention and their broader aspects. The appended claims are therefore intended to cover within their scope all such changes and modifications that do not depart from the true spirit and scope of the exemplary embodiments of the present invention.

Claims (6)

1. A method for clustering network traffic, characterized by comprising the following steps:
acquiring global network traffic;
splitting said global network traffic by individual user to generate sample data;
classifying the traffic by traffic type according to said sample data;
selecting different feature combinations for clustering according to said traffic type;
wherein the step of splitting said global network traffic by individual user to generate sample data comprises splitting said global network traffic into the model traffic of multiple individual users;
and wherein classifying the traffic by traffic type according to said sample data comprises dividing the traffic, according to said sample data, into encrypted model traffic or non-encrypted model traffic.
2. The method according to claim 1, characterized in that selecting different feature combinations for clustering according to said traffic type comprises, when the traffic type is encrypted model traffic, selecting the tuple feature combination, the pattern feature combination or the timestamp feature combination for clustering.
3. The method according to claim 1, characterized in that selecting different feature combinations for clustering according to said traffic type comprises, when the traffic type is non-encrypted model traffic, selecting the tuple feature combination, the payload feature combination or the timestamp feature combination for clustering.
4. Equipment for clustering network traffic, characterized by comprising:
an acquiring unit for acquiring global network traffic;
a sample data generating unit for splitting said global network traffic by individual user to generate sample data;
a primary clustering unit for classifying the traffic by traffic type according to said sample data;
a secondary clustering unit for selecting different feature combinations for clustering according to said traffic type;
wherein said sample data generating unit splits said global network traffic into the model traffic of multiple individual users;
and wherein said primary clustering unit divides the traffic, according to said sample data, into encrypted model traffic or non-encrypted model traffic.
5. The equipment according to claim 4, characterized in that said secondary clustering unit selects the tuple feature combination, the pattern feature combination or the timestamp feature combination to cluster encrypted model traffic.
6. The equipment according to claim 4, characterized in that said clustering unit selects the tuple feature combination, the payload feature combination or the timestamp feature combination to cluster non-encrypted model traffic.
CN201110295431.1A 2011-09-27 2011-09-27 Method and equipment for clustering network flow Active CN102299863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110295431.1A CN102299863B (en) 2011-09-27 2011-09-27 Method and equipment for clustering network flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110295431.1A CN102299863B (en) 2011-09-27 2011-09-27 Method and equipment for clustering network flow

Publications (2)

Publication Number Publication Date
CN102299863A CN102299863A (en) 2011-12-28
CN102299863B true CN102299863B (en) 2015-02-11

Family

ID=45360050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110295431.1A Active CN102299863B (en) 2011-09-27 2011-09-27 Method and equipment for clustering network flow

Country Status (1)

Country Link
CN (1) CN102299863B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394021B (en) * 2014-12-09 2017-08-25 中南大学 Exception of network traffic analysis method based on visualization cluster
CN104767739B (en) * 2015-03-23 2018-01-30 电子科技大学 The method that unknown multi-protocols blended data frame is separated into single protocol data frame
CN104753934B (en) * 2015-03-23 2018-01-19 电子科技大学 By the method that the more communication party's data stream separations of unknown protocol are Point-to-Point Data stream
CN105022960B (en) * 2015-08-10 2017-11-21 济南大学 Multiple features mobile terminal from malicious software detecting method and system based on network traffics
CN106452948A (en) * 2016-09-22 2017-02-22 恒安嘉新(北京)科技有限公司 Automatic classification method and system of network flow
CN106850333B (en) * 2016-12-23 2019-11-29 中国科学院信息工程研究所 A kind of network equipment recognition methods and system based on feedback cluster
CN109525508B (en) * 2018-12-15 2022-06-21 深圳先进技术研究院 Encrypted stream identification method and device based on flow similarity comparison and storage medium
CN112822121A (en) * 2019-11-15 2021-05-18 中兴通讯股份有限公司 Traffic identification method, traffic determination method and knowledge graph establishment method
CN114221816B (en) * 2021-12-17 2024-05-03 恒安嘉新(北京)科技股份公司 Flow detection method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102045363B (en) * 2010-12-31 2013-10-09 华为数字技术(成都)有限公司 Establishment, identification control method and device for network flow characteristic identification rule

Also Published As

Publication number Publication date
CN102299863A (en) 2011-12-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant