CN107948007B - Long flow identification method based on sampling and two-stage CBF - Google Patents

Long flow identification method based on sampling and two-stage CBF

Info

Publication number
CN107948007B
Authority
CN
China
Prior art keywords
message
bloom filter
stage
counting bloom
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710934979.3A
Other languages
Chinese (zh)
Other versions
CN107948007A (en)
Inventor
秦文虎
翟金凤
孙立博
鲁凯
林学勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Nanjing Institute of Measurement and Testing Technology
Original Assignee
Southeast University
Nanjing Institute of Measurement and Testing Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University, Nanjing Institute of Measurement and Testing Technology
Priority to CN201710934979.3A
Publication of CN107948007A
Application granted
Publication of CN107948007B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/022 Capturing of monitoring data by sampling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0894 Packet rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Abstract

The invention provides a long flow identification algorithm based on sampling and a two-stage Counting Bloom Filter (CBF), comprising the following steps: periodically sampling the messages; setting a long flow threshold and configuring the structural parameters of the two-stage CBF; for each sampled message, determining through the second-stage CBF whether the message belongs to an already identified long flow and, if so, inserting the message into the second-stage CBF; if not, determining through the first-stage CBF whether the flow to which the message belongs is a long flow and, if so, recording the flow identifier of the message and updating the records of the flow in the two-stage CBF; if not, inserting the message into the first-stage CBF; repeating this process until all sampled messages have been processed; and then querying every non-sampled message against the second-stage CBF, inserting it if it belongs to an identified long flow and otherwise leaving it unprocessed. The invention achieves accurate identification of long flows and high-precision measurement of flow length while effectively saving space and time resources.

Description

Long flow identification method based on sampling and two-stage CBF
Technical Field
The invention belongs to the technical field of network flow measurement, relates to a long flow identification method, and particularly relates to a long flow identification method based on sampling and two-stage Counting Bloom Filter.
Background
As high-speed networks run ever faster and traffic volumes grow rapidly, accurately measuring network traffic becomes increasingly difficult. Many studies show that network traffic statistics follow a strongly heavy-tailed distribution: a small number of long flows account for most of the traffic, so in most cases knowing the long flow information is enough to meet practical application requirements. Identifying long flows is therefore particularly important.
Existing long flow identification methods mainly rely on sampling, hashing, and Bloom Filter techniques. When sampling is used alone, flow identification information must be maintained throughout the identification process, which incurs a large computational overhead and slows down the system. When hashing or a Bloom Filter is used alone to process every message passing through a link, hash collisions increase and the accuracy of the measurement result suffers. Combining sampling with hashing or with a Bloom Filter effectively overcomes the drawbacks of using a single technique. Compared with plain hashing, a Bloom Filter maintains several independent hash functions, which markedly reduces hash collisions and greatly reduces the storage overhead of maintaining a flow identifier for every flow. One of its improved variants, the Counting Bloom Filter, counts the messages hashed into each storage location and allows the flow identifier of a long flow to be recorded once the number of messages exceeds the threshold. Long flow identification can therefore be performed more efficiently by combining sampling with a Counting Bloom Filter.
Existing long flow identification methods based on sampling and the Counting Bloom Filter (CBF) generally estimate the number of messages in the original long flow by simple linear estimation, which introduces flow length measurement errors and cannot meet higher precision requirements.
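To make the counting behavior described above concrete, the following is a minimal Python sketch of a Counting Bloom Filter. The class name `CountingBloomFilter`, the salted SHA-1 construction of the k hash functions, and the example flow keys are illustrative assumptions and are not taken from the patent text.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: an array of counters indexed by k hashes."""

    def __init__(self, size, k=2):
        self.size = size
        self.k = k
        self.counters = [0] * size

    def _indexes(self, key):
        # Assumption: k hash functions are derived by salting SHA-1 with the hash number.
        return [
            int.from_bytes(hashlib.sha1(f"{i}:{key}".encode()).digest()[:8], "big") % self.size
            for i in range(self.k)
        ]

    def insert(self, key):
        # Inserting a message adds 1 to each of its k counters.
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def count(self, key):
        # The minimum of the k counters never underestimates the true count.
        return min(self.counters[idx] for idx in self._indexes(key))


cbf = CountingBloomFilter(size=1024, k=2)
for _ in range(5):
    cbf.insert("10.0.0.1->10.0.0.2")
cbf.insert("10.0.0.3->10.0.0.4")
print(cbf.count("10.0.0.1->10.0.0.2"))  # at least 5; larger only if hashes collide
```

Because only counters are stored, no per-flow identifier has to be kept until a flow crosses the threshold, which is the storage saving the paragraph above refers to.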
Disclosure of Invention
To solve the above problems, the invention provides a long flow identification method based on sampling and a two-stage Counting Bloom Filter which, on the basis of message sampling, identifies the messages belonging to long flows through the two-stage Counting Bloom Filter.
In order to achieve the purpose, the invention provides the following technical scheme:
The long flow identification method based on sampling and two-stage CBF comprises the following steps:
step 1, periodically sampling the messages passing through a link in the observation time according to a sampling frequency;
step 2, setting a long flow threshold T and configuring the structural parameters of the two-stage Counting Bloom Filter;
step 3, for each sampled message, determining through the second-stage Counting Bloom Filter whether the message belongs to an identified long flow; if so, inserting the message into the second-stage Counting Bloom Filter and continuing with the next message; if not, executing step 4;
step 4, determining through the first-stage Counting Bloom Filter whether the flow to which the message belongs is a long flow; if so, recording the flow identifier of the message, updating the records of the flow in the two-stage Counting Bloom Filter, and continuing with the next message; if not, executing step 5;
step 5, inserting the message into the first-stage Counting Bloom Filter and continuing with the next message;
step 6, after steps 3 to 5 have been repeated until all sampled messages are processed, querying every non-sampled message against the second-stage Counting Bloom Filter; if the message belongs to an identified long flow, inserting it into the second-stage Counting Bloom Filter; otherwise performing no processing.
Further, the sampling frequency in step 1 is one message extracted out of every n messages.
Furthermore, the sampling frequency is reduced when the total number of messages is large and increased when the total number of messages is small.
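As an illustration of the periodic sampling described above, the following Python sketch splits a message sequence into sampled and non-sampled parts. The function name `split_by_periodic_sampling` and the choice of extracting the first message of each group of n are assumptions made for the example; the text only states that one message is extracted out of every n.

```python
def split_by_periodic_sampling(messages, n):
    """Return (sampled, non_sampled), taking one message out of every n."""
    sampled, non_sampled = [], []
    for position, message in enumerate(messages):
        # Assumption: the first message of each group of n is the one extracted.
        if position % n == 0:
            sampled.append(message)
        else:
            non_sampled.append(message)
    return sampled, non_sampled


messages = [f"msg{i}" for i in range(1000)]
sampled, non_sampled = split_by_periodic_sampling(messages, n=100)
print(len(sampled), len(non_sampled))  # 10 990
```

A larger n leaves fewer messages for the per-message identification steps, which is why a lower sampling frequency is suggested for large traffic volumes.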
Further, step 2 specifically comprises the following process:
setting the long flow threshold to T = N × m%, wherein N is the total number of messages passing through the link in the observation time and m is the percentage of the total number of messages occupied by a long flow; setting the threshold for identifying long flows from the sampled messages to T1 = T/n; selecting, for both stages of the Counting Bloom Filter, the same k low-collision hash functions h(1), h(2), …, h(k); setting the length m1 of the counter array in the first-stage Counting Bloom Filter to a power of 2 greater than the total number of sampled messages N/n, the number of bits b1 allocated to each counter satisfying the condition:
[Formula image BDA0001429586410000021: condition on b1, not reproduced in this text]
setting the length m2 of the counter array in the second-stage Counting Bloom Filter to a power of 2 greater than the total number of messages N, the number of bits b2 allocated to each counter satisfying the condition:
[Formula image BDA0001429586410000022: condition on b2, not reproduced in this text]
Further, each counter is allocated more bits than the minimum number required by the above conditions.
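The thresholds and array lengths of step 2 can be worked through with a short Python sketch. The function names, the use of integer division, the choice of the smallest qualifying power of 2, and the example values (N = 1,000,000, m = 1, n = 100) are assumptions for illustration. The per-counter bit widths b1 and b2 are left out because their conditions appear only as formula images in the source.

```python
def next_power_of_two_above(x):
    """Smallest power of 2 strictly greater than x (for x >= 0)."""
    return 1 << x.bit_length()


def configure(total_messages, long_flow_percent, n):
    # T = N * m%: long flow threshold on the original traffic.
    threshold = total_messages * long_flow_percent // 100
    # T1 = T / n: threshold applied to the sampled messages.
    sampled_threshold = threshold // n
    return {
        "T": threshold,
        "T1": sampled_threshold,
        # m1: a power of 2 greater than the number of sampled messages N/n.
        "m1": next_power_of_two_above(total_messages // n),
        # m2: a power of 2 greater than the total number of messages N.
        "m2": next_power_of_two_above(total_messages),
    }


params = configure(total_messages=1_000_000, long_flow_percent=1, n=100)
print(params)  # {'T': 10000, 'T1': 100, 'm1': 16384, 'm2': 1048576}
```

Choosing power-of-2 lengths lets a hash value be reduced to an array position with a bit mask instead of a division, although the source does not state this motivation.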
Further, the step 3 specifically includes the following steps:
for each sampled message, mapping the message to the corresponding positions of the second-stage Counting Bloom Filter through the k hash functions; if none of the k counter values at the corresponding positions is 0, judging that the message belongs to an identified long flow, inserting the message into the second-stage Counting Bloom Filter, and continuing with the next message; if any one of the k counter values at the corresponding positions is 0, judging that the message does not belong to an identified long flow, and executing step 4.
Further, the step 4 specifically includes the following steps:
mapping the sampled message to the first-stage Counting Bloom Filter through the k hash functions, and finding the minimum of the k counters at the corresponding positions; if the minimum of the k counters equals the threshold T1, judging that the flow to which the message belongs is a long flow, recording the flow identifier of the message, subtracting the threshold T1 from each of the k counter values, mapping the message to the second-stage Counting Bloom Filter, setting the k counter values at the corresponding positions to T1 + 1, and continuing with the next message; if the minimum of the k counters is not equal to the threshold T1, judging that the flow is not a long flow, and executing step 5.
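The counter updates of step 4 can be sketched in Python as follows. The function name `promote_if_long`, the plain-list counter arrays, and the precomputed position lists are hypothetical. The comment explaining T1 + 1 as "T1 earlier sampled messages plus the current one" is one reading of the text, not a statement from the patent.

```python
def promote_if_long(stage1, stage2, idx1, idx2, t1, long_flows, flow_id):
    """Apply step 4 to one sampled message; return True if the flow was promoted.

    idx1 and idx2 are the k counter positions of the flow in each stage.
    """
    if min(stage1[i] for i in idx1) != t1:
        return False                 # not a long flow yet; the caller goes on to step 5
    long_flows.append(flow_id)       # record the flow identifier
    for i in set(idx1):
        stage1[i] -= t1              # remove the flow's T1 sampled messages from stage 1
    for i in idx2:
        stage2[i] = t1 + 1           # T1 earlier sampled messages plus the current one
    return True


stage1 = [0] * 8
stage2 = [0] * 8
stage1[2] = stage1[5] = 3            # the flow has already contributed T1 = 3 sampled messages
long_flows = []
promoted = promote_if_long(stage1, stage2, [2, 5], [1, 6], 3, long_flows, "flowA")
print(promoted, stage1[2], stage2[1], long_flows)  # True 0 4 ['flowA']
```

Subtracting T1 from the first-stage counters frees that capacity for other flows that hash to the same positions.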
Further, inserting the message into the first-stage Counting Bloom Filter in step 5 comprises: adding 1 to each of the k counter values in the first-stage Counting Bloom Filter.
Further, inserting the message into the second-stage Counting Bloom Filter in steps 3 and 6 comprises: adding 1 to each of the k counter values in the second-stage Counting Bloom Filter.
Further, the method comprises a step 7: after all non-sampled messages have been processed, mapping the recorded flow identifiers to the second-stage Counting Bloom Filter.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention achieves both accurate identification of long flows and high-precision measurement of the original flow length while effectively saving space and time resources. It has good real-time performance, adapts well to current high-speed network link environments, and is of great significance for network management applications such as network billing, bandwidth planning, and security detection.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention, in which the non-sampled messages in part (ii) are processed after all sampled messages in part (i) have been processed.
Fig. 2 shows specific long stream information in the implementation data.
Fig. 3 shows simulation results based on implementation data.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
The whole flow of the long flow identification method based on sampling and two-stage Counting Bloom Filter provided by the invention is shown in figure 1, and the method comprises the following steps:
Step 1, periodically sampling the messages passing through the link in the observation time at a frequency of one message extracted out of every n messages. When the total number of messages is large, a relatively low sampling frequency can be chosen to increase the processing speed of the method, for example one message out of every 100; when the total number of messages is small, a relatively high sampling frequency can be chosen to ensure the accuracy of long flow identification, for example one message out of every 10.
Step 2, setting a threshold T for long flows, and suitably configuring the structural parameters of the two-stage Counting Bloom Filter:
Let N be the total number of messages passing through the link in the observation time. If a flow accounting for more than m% of the total number of messages is defined as a long flow, the threshold is set to T = N × m%, and the threshold for identifying long flows from the sampled messages is set to T1 = T/n. Both stages of the Counting Bloom Filter use the same k low-collision hash functions h(1), h(2), …, h(k), where k is 1 to 3 (the concept of, and the criteria for, a low-collision hash function are known to those skilled in the art). The length m1 of the counter array in the first-stage Counting Bloom Filter is set to a power of 2 greater than the total number of sampled messages N/n, and the number of bits b1 allocated to each counter must satisfy the condition:
[Formula image BDA0001429586410000041: condition on b1, not reproduced in this text]
A few extra bits should be allocated as appropriate to avoid counter overflow. The length m2 of the counter array in the second-stage Counting Bloom Filter is set to a power of 2 greater than the total number of messages N, and the number of bits b2 allocated to each counter must satisfy the condition:
[Formula image BDA0001429586410000042: condition on b2, not reproduced in this text]
A few extra bits should likewise be allocated as appropriate to avoid counter overflow.
Step 3, mapping each sampled message to the corresponding positions of the second-stage Counting Bloom Filter through the k hash functions. If none of the k counter values at the corresponding positions is 0, the message is judged to belong to an identified long flow and is inserted into the second-stage Counting Bloom Filter, that is, 1 is added to each of the k counter values, and processing continues with the next message. If any one of the k counter values at the corresponding positions is 0, the message is judged not to belong to an identified long flow, and step 4 is executed.
Step 4, mapping the sampled message to the first-stage Counting Bloom Filter through the k hash functions, and finding the minimum of the k counters at the corresponding positions. If the minimum of the k counters equals the threshold T1, the flow to which the message belongs is judged to be a long flow: the flow identifier of the message is recorded, the threshold T1 is subtracted from each of the k counter values, the message is mapped to the second-stage Counting Bloom Filter, the k counter values at the corresponding positions are set to T1 + 1, and processing continues with the next message. If the minimum of the k counters is not equal to the threshold T1, the flow is judged not to be a long flow, and step 5 is executed.
Step 5, inserting the message into the first-stage Counting Bloom Filter, that is, adding 1 to each of the k counter values, and continuing with the next message.
Step 6, after steps 3 to 5 have been repeated until all sampled messages are processed, querying every non-sampled message against the second-stage Counting Bloom Filter. If none of the k counter values at the positions to which the message maps in the second-stage Counting Bloom Filter is 0, the message is judged to belong to an identified long flow and is inserted into the second-stage Counting Bloom Filter, that is, 1 is added to each of the k counter values at the corresponding positions; otherwise no processing is performed.
As an improvement, the method further comprises a step 7 of mapping the recorded flow identifiers into the second-stage Counting Bloom Filter after all non-sampled messages have been processed. The flow identifiers recorded in step 4 are the identifiers of the long flows identified by the method of the present invention, and the minimum value of the counters at the corresponding positions in the second-stage Counting Bloom Filter is the flow length of the long flow as measured by the method.
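The steps above can be put together in a compact two-pass Python sketch. Everything named here (`identify_long_flows`, the `indexes` callback, the plain-list counters, and the collision-free slot table used in the demonstration) is an illustrative assumption rather than the patented implementation. The slot table stands in for the k hash functions only so that the result is easy to check by hand.

```python
def identify_long_flows(messages, n, t1, size1, size2, indexes):
    """Two-pass sketch of steps 1 and 3 to 7.

    messages: flow identifiers, one per message, in arrival order.
    indexes(flow_id, size): returns the k counter positions of a flow.
    Returns {flow_id: measured length} for the identified long flows.
    """
    stage1 = [0] * size1
    stage2 = [0] * size2
    long_flows = []

    def in_stage2(flow_id):
        # A flow counts as identified only if none of its k counters is 0.
        return all(stage2[i] != 0 for i in indexes(flow_id, size2))

    def add(stage, positions):
        for i in positions:
            stage[i] += 1

    # Step 1: one message out of every n is sampled.
    sampled = messages[::n]
    non_sampled = [m for pos, m in enumerate(messages) if pos % n != 0]

    for flow_id in sampled:
        pos1 = indexes(flow_id, size1)
        pos2 = indexes(flow_id, size2)
        if in_stage2(flow_id):                       # step 3
            add(stage2, pos2)
        elif min(stage1[i] for i in pos1) == t1:     # step 4
            long_flows.append(flow_id)
            for i in set(pos1):
                stage1[i] -= t1
            for i in pos2:
                stage2[i] = t1 + 1
        else:                                        # step 5
            add(stage1, pos1)

    for flow_id in non_sampled:                      # step 6
        if in_stage2(flow_id):
            add(stage2, indexes(flow_id, size2))

    # Step 7: the minimum counter of each recorded flow is its measured length.
    return {f: min(stage2[i] for i in indexes(f, size2)) for f in long_flows}


# Demonstration with k = 1 and a collision-free stand-in for the hash function.
slots = {"A": 0, "B": 1}
messages = (["A"] * 5 + ["B"]) * 6                   # flow A: 30 messages, flow B: 6
result = identify_long_flows(messages, n=3, t1=4, size1=8, size2=8,
                             indexes=lambda flow, size: [slots[flow] % size])
print(result)  # {'A': 30}
```

In this collision-free example the measured length equals the true length of flow A, because every sampled message of the flow is counted either through the T1 + 1 initialization or through a later insertion, and every non-sampled message is counted in the second pass. With real hash functions, collisions can inflate the counters.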
The invention is evaluated by simulation on real trace data collected in Chicago on March 17, 2016 and made publicly available by the Cooperative Association for Internet Data Analysis (CAIDA); the simulation is implemented in Visual Studio. The first 5,000,000 messages of the trace are used for the experiment and the threshold T is set to 21000; the number of real long flows whose message count exceeds the threshold is 3, and FIG. 2 shows the details of these long flows. A flow in the experiment is the set of messages with the same source and destination IP addresses; the definition of the flow identifier can be chosen according to the actual application requirements of the network. With the sampling frequency set to 1/100, the SHA1 algorithm used as the hash function of the two-stage Counting Bloom Filter, and the number of hash functions set to 1, the simulation result of the invention is shown in FIG. 3. Comparing FIG. 2 and FIG. 3 shows that the long flow information identified by the invention is identical to the real long flow information, so that accurate identification of long flows and high-precision measurement of the original flow length are achieved.
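The experimental configuration (a flow keyed by source and destination IP address, a single SHA1-based hash function, T = 21000, sampling frequency 1/100) might be set up along the following lines in Python. The function names, the string encoding of the flow key, the reduction of the digest modulo the array length, the array length itself, and the example addresses are assumptions; the source does not describe these details.

```python
import hashlib


def flow_id(src_ip, dst_ip):
    # A flow in the experiment is the set of messages sharing source and destination IP.
    return f"{src_ip}>{dst_ip}"


def sha1_index(flow, size):
    """Map a flow identifier to one counter position (k = 1) using SHA-1."""
    digest = hashlib.sha1(flow.encode("utf-8")).digest()
    # Assumption: the 160-bit digest is reduced modulo the array length.
    return int.from_bytes(digest, "big") % size


T = 21000
n = 100
T1 = T // n                      # sampled threshold T1 = T / n
size = 1 << 16                   # hypothetical counter array length (a power of 2)
first = sha1_index(flow_id("192.0.2.1", "198.51.100.7"), size)
second = sha1_index(flow_id("192.0.2.1", "198.51.100.7"), size)
print(T1, first == second, 0 <= first < size)  # 210 True True
```

With these settings a flow is promoted to the second stage once 210 of its sampled messages have been counted in the first stage.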
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (7)

1. The long stream identification method based on sampling and two-stage CBF is characterized by comprising the following steps:
step 1, periodically sampling messages passing through a link in observation time according to sampling frequency;
step 2, setting a threshold value T of the long flow, and configuring two-stage Counting Bloom Filter structure parameters;
the step 2 specifically comprises the following processes:
setting the long flow threshold to T = N × m%, wherein N is the total number of messages passing through the link in the observation time and m is the percentage of the total number of messages occupied by a long flow; setting the threshold for identifying long flows from the sampled messages to T1 = T/n; selecting, for both stages of the Counting Bloom Filter, the same k low-collision hash functions h(1), h(2), …, h(k); setting the length m1 of the counter array in the first-stage Counting Bloom Filter to a power of 2 greater than the total number of sampled messages N/n, the number of bits b1 allocated to each counter satisfying the condition:
[Formula image FDA0003118542680000011: condition on b1, not reproduced in this text]
setting the length m2 of the counter array in the second-stage Counting Bloom Filter to a power of 2 greater than the total number of messages N, the number of bits b2 allocated to each counter satisfying the condition:
[Formula image FDA0003118542680000012: condition on b2, not reproduced in this text]
step 3, judging whether each sampled message belongs to the identified long stream or not through the second-stage Counting Bloom Filter, if so, inserting the message into the second-stage Counting Bloom Filter, and continuing to process the next message; if the long stream does not belong to the identified long stream, executing the step 4;
the step 3 specifically comprises the following steps:
for each sampled message, mapping the message to the corresponding positions of the second-stage Counting Bloom Filter through the k hash functions; if none of the k counter values at the corresponding positions is 0, judging that the message belongs to an identified long flow, inserting the message into the second-stage Counting Bloom Filter, and continuing with the next message; if any one of the k counter values at the corresponding positions is 0, judging that the message does not belong to an identified long flow, and executing step 4;
step 4, judging whether the flow to which the message belongs is a long flow or not through the first-stage Counting Bloom Filter, if so, recording the flow identification of the message, updating the record of the message in the two-stage Counting Bloom Filter, and continuously processing the next message; if not, executing step 5;
the step 4 specifically comprises the following steps:
mapping the sampled message to the first-stage Counting Bloom Filter through the k hash functions, and finding the minimum of the k counters at the corresponding positions; if the minimum of the k counters equals the threshold T1, judging that the flow to which the message belongs is a long flow, recording the flow identifier of the message, subtracting the threshold T1 from each of the k counter values, mapping the message to the second-stage Counting Bloom Filter, setting the k counter values at the corresponding positions to T1 + 1, and continuing with the next message; if the minimum of the k counters is not equal to the threshold T1, judging that the flow is not a long flow, and executing step 5;
step 5, inserting the message into the first-stage Counting Bloom Filter, and continuing with the next message;
step 6, after steps 3 to 5 have been repeated until all sampled messages are processed, querying every non-sampled message against the second-stage Counting Bloom Filter; if the message belongs to an identified long flow, inserting it into the second-stage Counting Bloom Filter; otherwise performing no processing.
2. The long flow identification method based on sampling and two-stage CBF according to claim 1, wherein the sampling frequency in step 1 is one message extracted out of every n messages.
3. The long flow identification method based on sampling and two-stage CBF according to claim 1, wherein the sampling frequency is reduced when the total number of messages is large and increased when the total number of messages is small.
4. The long flow identification method based on sampling and two-stage CBF according to claim 1, wherein each counter is allocated more bits than the minimum number required by the conditions.
5. The long flow identification method based on sampling and two-stage CBF according to claim 1, wherein inserting the message into the first-stage Counting Bloom Filter in step 5 comprises: adding 1 to each of the k counter values in the first-stage Counting Bloom Filter.
6. The long flow identification method based on sampling and two-stage CBF according to claim 1, wherein inserting the message into the second-stage Counting Bloom Filter in steps 3 and 6 comprises: adding 1 to each of the k counter values in the second-stage Counting Bloom Filter.
7. The long flow identification method based on sampling and two-stage CBF according to claim 1, further comprising a step 7 of mapping the recorded flow identifiers to the second-stage Counting Bloom Filter after all non-sampled messages have been processed.
CN201710934979.3A 2017-10-10 2017-10-10 Long flow identification method based on sampling and two-stage CBF Active CN107948007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710934979.3A CN107948007B (en) 2017-10-10 2017-10-10 Long flow identification method based on sampling and two-stage CBF

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710934979.3A CN107948007B (en) 2017-10-10 2017-10-10 Long flow identification method based on sampling and two-stage CBF

Publications (2)

Publication Number Publication Date
CN107948007A CN107948007A (en) 2018-04-20
CN107948007B true CN107948007B (en) 2021-09-10

Family

ID=61936120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710934979.3A Active CN107948007B (en) 2017-10-10 2017-10-10 Long flow identification method based on sampling and two-stage CBF

Country Status (1)

Country Link
CN (1) CN107948007B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101459560A (en) * 2009-01-09 2009-06-17 中国人民解放军信息工程大学 Long stream recognition method, data flow measuring method and device thereof
US8134934B2 (en) * 2009-09-21 2012-03-13 Alcatel Lucent Tracking network-data flows
CN103368952A (en) * 2013-06-28 2013-10-23 百度在线网络技术(北京)有限公司 Method and equipment for carrying out sampling on data packet to be subjected to intrusion detection processing
WO2015116221A1 (en) * 2014-01-31 2015-08-06 Hewlett-Packard Development Company, L.P. Managing database with counting bloom filters
CN107196826A (en) * 2017-07-12 2017-09-22 东南大学 A kind of network flow programming method algorithm based on sampling

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
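Title
"A long flow identification algorithm based on double Counting Bloom Filter" (一种基于双重Counting Bloom Filter的长流识别算法); 吴桦, 龚俭, 杨望; Journal of Software (《软件学报》); 2010-03-31; page 1118, lines 17-27, Figure 2 *
"Implementation of long flow identification based on Sample-CBF technology" (基于Sample-CBF技术的长流识别实现); 刘卫江, 白磊, 景泉; Computer Engineering (《计算机工程》); 2007-10-31; Vol. 33, No. 20; page 117, right column, lines 17-37 *
"Long flow identification based on multi-level CBF" (基于多级CBF的长流识别); 刘元珍; Microcomputer Applications (《微型电脑应用》); 2014-09-30; Vol. 30, No. 9; full text *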

Also Published As

Publication number Publication date
CN107948007A (en) 2018-04-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant