CN113746700A - Elephant flow rapid detection method and system based on probability sampling

Info

Publication number
CN113746700A
Authority
CN
China
Prior art keywords
data packet
flow
information
hash table
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111028109.2A
Other languages
Chinese (zh)
Other versions
CN113746700B (en)
Inventor
彭伟
段晨
王宝生
赵宝康
郦苏丹
唐竹
原玉磊
陶静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111028109.2A
Publication of CN113746700A
Application granted
Publication of CN113746700B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a method and a system for rapidly detecting elephant flows based on probability sampling. The method comprises the following steps: 1) a data packet forwarding module sends the header five-tuple information of each passing data packet together with the interface queue occupation ratio; 2) the header five-tuple information and the interface queue occupation ratio sent by the data packet forwarding module are received, data packets are counted using a probability sampling method, and elephant flows are detected based on the packet counting result; 3) the detected elephant flow information is stored in an elephant flow storage queue. The invention achieves low time overhead, fast local decision-making on the switch and real-time elephant flow detection without incurring large memory overhead, and can be deployed on programmable switches, smart network cards, commercial switch chips and any forwarding hardware.

Description

Elephant flow rapid detection method and system based on probability sampling
Technical Field
The invention relates to a computer network communication technology, in particular to a method and a system for rapidly detecting elephant flow based on probability sampling.
Background
With the rapid development of cloud computing and big data, operators place ever higher performance requirements on data center networks. Traffic scheduling has long been a difficult problem in data center networks. Existing research has shown that, in a data center network, flows accounting for 1% of the total number of flows produce 90% of the total traffic; these flows are called elephant flows. Scheduling of elephant flows is therefore an important factor affecting data center network performance. To allocate reasonable transmission paths to elephant flows, and thereby reduce network congestion and improve load balancing, a data center operator needs an accurate and fast elephant flow detection method.
Elephant flow detection means detecting flows that occupy a large bandwidth and have a long transmission time when the start time, end time and transmission rate of the flows in the network are unknown. Current elephant flow detection methods fall mainly into two categories. The first category identifies elephant flows from periodic per-flow statistics such as the number of bytes and the number of data packets. This method is often applied in software-defined networking (SDN) scenarios: the SDN switch counts the number of bytes and the number of data packets transmitted by each flow, and the SDN controller periodically queries the SDN switch for these counts and screens out the elephant flows in the network. The main problem of this method is the insufficient timeliness of periodic statistics: communication between the SDN controller and the SDN switch incurs a delay of at least one RTT, and the statistical period is typically on the order of seconds. The second category counts the number of data packets that each flow occupies in the switch interface queue, based on a real-time snapshot or several consecutive snapshots of the queue. This method is applied on programmable switches and has strong timeliness. However, the real-time snapshot method is unstable, because the result of a single queue snapshot cannot objectively reflect the size of a flow, while the method of multiple consecutive snapshots introduces large memory overhead for storing the snapshot results and places high requirements on the programmable switch hardware.
Currently, programmable switches make packet processing programmable. A schematic diagram of packet processing in a programmable switch is shown in Fig. 1. The data packet forwarding module extracts the header five-tuple information (source IP address, destination IP address, source port number, destination port number and protocol number) of a data packet from the in-queue of a switch interface. The data packet forwarding module then determines the forwarding out-interface of the data packet according to the forwarding rules, and copies the data packet from the in-queue of the receiving interface to the out-queue of the forwarding out-interface.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: the invention provides a method and a system for rapidly detecting the elephant flow based on probability sampling, aiming at the problems of insufficient timeliness and large memory overhead caused by a snapshot method when a centralized controller detects the elephant flow in an SDN scene.
In order to solve the technical problems, the invention adopts the technical scheme that:
a method for rapidly detecting elephant flows based on probability sampling comprises the following steps:
1) the data packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupation ratio;
2) receiving the header five-tuple information and the interface queue occupation ratio sent by the data packet forwarding module, counting data packets using a probability sampling method, and detecting elephant flows based on the packet counting result;
3) storing the detected elephant flow information in the elephant flow storage queue.
Optionally, the processing step of the packet forwarding module in step 1) for any passing packet pkt includes:
1.1) reading the header five-tuple information Quintuple of a passing data packet pkt from the in-queue corresponding to the input interface InInt, wherein the header five-tuple information Quintuple is a binary string formed by concatenating the source IP address srcIP, the destination IP address dstIP, the source port number srcPort, the destination port number dstPort and the protocol number protocol;
1.2) querying the hardware forwarding table according to the header five-tuple information Quintuple of the data packet pkt to obtain the forwarding out-interface OutInt of the data packet pkt;
1.3) copying the data packet pkt from the in-queue of the input interface InInt to the out-queue OutQueue of the forwarding out-interface OutInt, and acquiring the number CurrentNum of data packets currently in the out-queue OutQueue of the forwarding out-interface OutInt;
1.4) dividing the number CurrentNum of data packets currently in the out-queue OutQueue of the forwarding out-interface OutInt by the total length Length of the out-queue OutQueue to obtain the interface queue occupation ratio ORatio;
1.5) outputting the header five-tuple information Quintuple of the data packet pkt and the interface queue occupation ratio ORatio.
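Steps 1.1) to 1.5) can be illustrated with a minimal Python sketch. It is not the forwarding-hardware implementation: the helper names `pack_quintuple` and `queue_occupancy_ratio`, the use of IPv4 addresses, the 13-byte field layout and the queue length of 8 are all assumptions made for illustration, and the out-queue is modelled as an in-memory `deque`.

```python
import ipaddress
import struct
from collections import deque


def pack_quintuple(src_ip, dst_ip, src_port, dst_port, protocol):
    """Concatenate the five header fields into one binary string.

    Assumed layout (IPv4 only): 4 + 4 bytes of addresses, 2 + 2 bytes of
    ports, 1 byte of protocol number, i.e. 13 bytes in total.
    """
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HHB", src_port, dst_port, protocol)
    )


def queue_occupancy_ratio(out_queue, length):
    """ORatio = CurrentNum / Length, as in step 1.4)."""
    return len(out_queue) / length


LENGTH = 8                      # assumed total length of OutQueue
out_queue = deque()             # stand-in for the out-queue of OutInt
for i in range(6):              # six packets currently queued
    out_queue.append(f"pkt{i}")

quintuple = pack_quintuple("10.0.0.1", "10.0.0.2", 5000, 80, 6)
oratio = queue_occupancy_ratio(out_queue, LENGTH)
```

With six packets in a queue of length 8, `oratio` evaluates to 0.75; the pair (`quintuple`, `oratio`) is what step 1.5) would hand to the flow counting logic.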
Optionally, step 2) comprises:
2.1) judging whether the header five-tuple information Quintuple of a data packet pkt and the interface queue occupation ratio ORatio have been received; if so, jumping to step 2.2); otherwise, repeating step 2.1);
2.2) if the interface queue occupation ratio ORatio is less than or equal to the preset minimum threshold Min_th, not counting the data packet; if ORatio is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, counting the data packet pkt with probability p; if ORatio is greater than the preset maximum threshold Max_th, counting the data packet directly; and detecting the elephant flow based on the packet counting result.
Optionally, step 2.1) is preceded by the following initialization steps: initializing the number count of uncounted data packets to 0, where count takes values in [0, Length]; and setting the preset minimum threshold Min_th and maximum threshold Max_th of the interface queue occupation ratio ORatio, the timeout Tmax of flow count information in the hash table, and the maximum value Max_q of the real-time counting probability q. Step 2.2) comprises:
2.2.1) initializing the position Index in the hash table of the flow to which the data packet pkt belongs to 0, and initializing the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0;
2.2.2) judging the interface queue occupation ratio ORatio: if ORatio is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, jumping to step 2.2.3); if ORatio is greater than the preset maximum threshold Max_th, jumping to step 2.2.4); if ORatio is less than or equal to the preset minimum threshold Min_th, jumping to step 2.2.5);
2.2.3) calculating the real-time counting probability q according to q = Max_q × (ORatio - Min_th) / (Max_th - Min_th), where Max_q is the maximum value of the real-time counting probability q, and q takes values in [0, Max_q]; calculating the counting probability p according to p = q / (1 - count × q), where count is the number of uncounted data packets, and p takes values in [0, 1]; multiplying the counting probability p by a preset upper bound of the random number range to obtain a random number rand; if the random number rand is smaller than the preset threshold m, setting the number count of uncounted data packets to 0, counting the data packet pkt, and storing the information of the flow to which the data packet pkt belongs in the hash table; otherwise, incrementing count by 1, and if the incremented count is greater than the total length Length of the out-queue, setting count to 0, counting the data packet pkt, storing the information of the flow to which the data packet pkt belongs in the hash table, and jumping to step 2.2.6); if the incremented count does not exceed Length, jumping to step 2.2.5);
2.2.4) setting the number count of uncounted data packets to 0, counting the data packet pkt, and storing the information of the flow to which the data packet pkt belongs in the hash table; jumping to step 2.2.6);
2.2.5) not counting the data packet pkt, setting the position Index in the hash table of the flow to which the data packet pkt belongs to 0, and setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0; jumping to step 2.2.6);
2.2.6) detecting the elephant flow based on the counting result of the data packet pkt.
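The three-way decision of step 2.2) can be sketched in Python as follows. This is a simplified illustration rather than the claimed procedure: it assumes that "counting with probability p" is realized by comparing p with a uniform random draw in [0, 1), instead of the random number rand and threshold m described in step 2.2.3). The function name `should_count`, the `state` dictionary holding the uncounted-packet counter, and the parameter values are hypothetical.

```python
import random


def should_count(oratio, state, min_th, max_th, max_q, length, rng=random.random):
    """Decide whether the current packet is counted.

    state["count"] holds the number of consecutive uncounted packets.
    rng is any callable returning a float in [0, 1); it is injectable so
    that the behaviour can be checked deterministically.
    """
    if oratio <= min_th:
        return False                      # light load: do not count (2.2.5)
    if oratio > max_th:
        state["count"] = 0                # heavy load: always count (2.2.4)
        return True
    # Normal load: probabilistic counting (2.2.3).
    q = max_q * (oratio - min_th) / (max_th - min_th)
    p = q / (1 - state["count"] * q)
    if rng() < p:                         # assumed realization of "with probability p"
        state["count"] = 0
        return True
    state["count"] += 1
    if state["count"] > length:           # forced count after too many misses
        state["count"] = 0
        return True
    return False


LENGTH = 8                                # assumed out-queue length
params = dict(min_th=0.2, max_th=0.8, max_q=1 / (1 + LENGTH), length=LENGTH)
state = {"count": 0}
light = should_count(0.1, state, **params)   # below Min_th
heavy = should_count(0.9, state, **params)   # above Max_th
```

Here `light` is `False` and `heavy` is `True`. Passing the random source as a parameter is only a convenience for testing; a hardware data plane would use its own random number primitive.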
Optionally, the step of storing the flow information to which the data packet pkt belongs in the hash table includes:
S1) acquiring the current time currenttime; calculating the position index Index1 in the hash table of the flow to which the data packet pkt belongs by using a preset first hash function; if the position Index1 in the hash table is empty, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index1, setting the position Index in the hash table of the flow to which the data packet pkt belongs to Index1, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); if the position Index1 in the hash table is not empty, proceeding to the next step;
S2) extracting the flow five-tuple information QuintupleA, the flow count information CountA and the filling time timeA stored at the position Index1; if QuintupleA is equal to the header five-tuple information Quintuple of the data packet pkt, incrementing the flow count information CountA at the position Index1 by 1, setting the position Index to Index1, setting the flow count information CountUpdate to the new value of CountA, and jumping to step S3); otherwise, jumping to step S4);
S3) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clearing the data at the position Index1 in the hash table, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index1, setting the position Index to Index1, setting the flow count information CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6), because the entry has not timed out, so the increment of CountA in step S2) is valid and the counting of the data packet pkt is already complete;
S4) calculating the position index Index2 in the hash table of the flow to which the data packet pkt belongs by using a preset second hash function; if the position Index2 in the hash table is empty, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index2, setting the position Index to Index2, setting the flow count information CountUpdate to 1, and jumping to step 2.2.6); if the position Index2 in the hash table is not empty, proceeding to the next step;
S5) extracting the flow five-tuple information QuintupleB, the flow count information CountB and the filling time timeB stored at the position Index2; if QuintupleB is equal to the header five-tuple information Quintuple of the data packet pkt, incrementing the flow count information CountB at the position Index2 by 1, setting the position Index to Index2, setting the flow count information CountUpdate to the new value of CountB, and jumping to step S6); otherwise, jumping to step S7);
S6) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clearing the data at the position Index2 in the hash table, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index2, setting the position Index to Index2, setting the flow count information CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S7) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, clearing the data at the position Index1 in the hash table, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index1, setting the position Index to Index1, setting the flow count information CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping to step S8);
S8) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, clearing the data at the position Index2 in the hash table, inserting the header five-tuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at the position Index2, setting the position Index to Index2, setting the flow count information CountUpdate to 1, and jumping to step 2.2.6); otherwise, jumping to step S9);
S9) calculating the minimum value minCount = Min{CountA, CountB} of the flow count information CountA and the flow count information CountB, determining the corresponding position index minIndex in the hash table according to the minimum value minCount, and obtaining the five-tuple information minQuintuple, the flow count information minCount and the filling time minTime stored at the position index minIndex;
S10) calculating the position index minIndex1 in the hash table of the five-tuple information minQuintuple by using the preset first hash function; if the position minIndex1 in the hash table is empty, inserting the five-tuple information minQuintuple, the flow count information minCount and the filling time minTime at the position minIndex1, and jumping to step S12); otherwise, jumping to step S11);
S11) calculating the position index minIndex2 in the hash table of the five-tuple information minQuintuple by using the preset second hash function; if the position minIndex2 in the hash table is empty, inserting the five-tuple information minQuintuple, the flow count information minCount and the filling time minTime at the position minIndex2, and jumping to step S12); otherwise, jumping directly to step S12);
S12) clearing the data at the position index minIndex in the hash table, and jumping to step S13);
S13) inserting the five-tuple information Quintuple of the flow to which the data packet pkt belongs, the flow count information 1 and the filling time currenttime at the position index minIndex in the hash table, setting the position Index to minIndex, and setting the flow count information CountUpdate to 1; jumping to step 2.2.6).
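Steps S1) to S13) amount to a hash table in which every flow has two candidate positions. The following Python sketch condenses them under stated assumptions: the table size, the timeout value, the choice of `zlib.crc32` and `zlib.adler32` as the first and second hash functions, and the function name `store` are all hypothetical, and each entry is modelled as a list `[quintuple, count, filling_time]`.

```python
import zlib

TABLE_SIZE = 8     # hypothetical number of hash table positions
T_MAX = 1.0        # hypothetical timeout Tmax, in seconds


def h1(key):
    """Assumed first hash function."""
    return zlib.crc32(key) % TABLE_SIZE


def h2(key):
    """Assumed second hash function."""
    return zlib.adler32(key) % TABLE_SIZE


def store(table, key, now):
    """Count one packet of flow `key`; return (Index, CountUpdate)."""
    candidates = (h1(key), h2(key))
    # S1-S6: try Index1, then Index2: insert if empty, update if same flow.
    for idx in candidates:
        entry = table[idx]
        if entry is None:
            table[idx] = [key, 1, now]
            return idx, 1
        if entry[0] == key:
            if now - entry[2] > T_MAX:        # entry timed out: restart at 1
                table[idx] = [key, 1, now]
                return idx, 1
            entry[1] += 1
            return idx, entry[1]
    # S7-S8: both positions hold other flows; reuse one that has timed out.
    for idx in candidates:
        if now - table[idx][2] > T_MAX:
            table[idx] = [key, 1, now]
            return idx, 1
    # S9-S13: evict the entry with the smaller count, relocating it to its
    # own alternative position when that position is empty.
    min_idx = min(candidates, key=lambda i: table[i][1])
    victim = table[min_idx]
    for alt in (h1(victim[0]), h2(victim[0])):
        if table[alt] is None:
            table[alt] = victim
            break
    table[min_idx] = [key, 1, now]
    return min_idx, 1


table = [None] * TABLE_SIZE
flow = b"flow-A"
first = store(table, flow, now=0.0)    # new flow: count starts at 1
second = store(table, flow, now=0.1)   # same flow within Tmax: count becomes 2
```

The returned pair plays the role of Index and CountUpdate; comparing CountUpdate with the elephant flow counting threshold is then the check performed in step 2.2.6).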
Optionally, step 2.2.6) comprises: if the counting result of the data packet pkt is greater than the preset elephant flow counting threshold, determining that the flow to which the data packet pkt belongs is an elephant flow, and clearing the data at the position Index in the hash table; otherwise, determining that the flow to which the data packet pkt belongs is not an elephant flow.
Optionally, the elephant flow storage queue in step 3) is a circular queue of length LenQueue, consisting of a plurality of queue units connected end to end. The elephant flow storage queue has a head pointer HeadP and a tail pointer TailP; the head pointer HeadP points to the position of the element that joined the queue earliest, and the tail pointer TailP points to the position of the element that joined the circular queue latest. The elephant flow storage queue is initialized to be empty, with the head pointer HeadP and the tail pointer TailP initially equal.
Optionally, the step of performing storage based on the elephant flow storage queue in step 3) includes:
3.1) judging whether the five-tuple information EleQuintuple of an elephant flow has been received; if so, executing the next step; otherwise, repeating step 3.1);
3.2) judging whether the queue unit pointed to by the tail pointer TailP is empty; if it is not empty, executing the next step; otherwise, jumping to step 3.5);
3.3) clearing the queue unit pointed to by the tail pointer TailP;
3.4) updating the head pointer HeadP according to HeadP = (HeadP + 1) % LenQueue, where % is the modulo operation;
3.5) storing the five-tuple information EleQuintuple of the elephant flow in the queue unit pointed to by the tail pointer TailP;
3.6) updating the tail pointer TailP according to TailP = (TailP + 1) % LenQueue, and ending.
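Steps 3.2) to 3.6) describe a fixed-size circular queue that overwrites its oldest entry once full. A minimal Python sketch is given here for illustration; the class name `ElephantQueue` and the use of `None` to mark an empty queue unit are assumptions, and the sketch follows the steps as written, storing at TailP and then advancing it.

```python
class ElephantQueue:
    """Circular queue of LenQueue units that drops the oldest entry when full."""

    def __init__(self, len_queue):
        self.len_queue = len_queue
        self.units = [None] * len_queue   # None marks an empty queue unit
        self.head = 0                     # HeadP
        self.tail = 0                     # TailP

    def push(self, ele_quintuple):
        if self.units[self.tail] is not None:                # 3.2: unit occupied
            self.units[self.tail] = None                     # 3.3: clear it
            self.head = (self.head + 1) % self.len_queue     # 3.4
        self.units[self.tail] = ele_quintuple                # 3.5
        self.tail = (self.tail + 1) % self.len_queue         # 3.6


queue = ElephantQueue(3)
for name in ("a", "b", "c", "d"):
    queue.push(name)
```

After the fourth insertion the oldest entry "a" has been overwritten by "d", so `queue.units` is `["d", "b", "c"]` and both pointers equal 1. Memory use stays bounded by LenQueue regardless of how many elephant flows are reported.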
In addition, the invention also provides a system for rapidly detecting the elephant flow based on the probability sampling, which comprises an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller, wherein the data forwarding controller is respectively connected with the input module and the output module, and is programmed or configured to execute the steps of the method for rapidly detecting the elephant flow based on the probability sampling.
In addition, the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program of the above elephant flow rapid detection method based on probability sampling.
Compared with the prior art, the invention has the following advantages:
1. In the method, the data packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupation ratio; this information is received, data packets are counted using a probability sampling method, and elephant flows are detected based on the packet counting result; the detected elephant flow information is stored in the elephant flow storage queue. Elephant flow detection is thereby achieved while overcoming both the low timeliness and high communication cost of elephant flow detection in SDN scenarios and the high memory cost of detection methods based on queue snapshots, so that the detection speed is greatly improved and the resource overhead of the data plane is reduced.
2. The method can be deployed on programmable switches, intelligent network cards, commercial switch chips and various network data forwarding hardware, and has the advantage of good universality.
Drawings
FIG. 1 is a schematic diagram of a prior art programmable switch architecture.
FIG. 2 is a schematic diagram of a basic process flow of a method according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a programmable switch according to an embodiment of the present invention.
Detailed Description
The first embodiment is as follows:
To illustrate the details of the method and system for rapidly detecting elephant flows based on probability sampling more clearly, the method and system are described further below, taking deployment on the data plane of a programmable switch as an example.
As shown in fig. 2, the method for rapidly detecting an elephant flow based on probability sampling in this embodiment includes:
1) the data packet forwarding module sends the header five-tuple information of each passing data packet and the interface queue occupation ratio;
2) receiving the header five-tuple information and the interface queue occupation ratio sent by the data packet forwarding module, counting data packets using a probability sampling method, and detecting elephant flows based on the packet counting result;
3) storing the detected elephant flow information in the elephant flow storage queue.
In this embodiment, the processing step of the packet forwarding module for any passing packet pkt in step 1) includes:
1.1) reading the header five-tuple information Quintuple of a passing data packet pkt from the in-queue corresponding to the input interface InInt, wherein the header five-tuple information Quintuple is a binary string formed by concatenating the source IP address srcIP, the destination IP address dstIP, the source port number srcPort, the destination port number dstPort and the protocol number protocol;
1.2) querying the hardware forwarding table according to the header five-tuple information Quintuple of the data packet pkt to obtain the forwarding out-interface OutInt of the data packet pkt;
1.3) copying the data packet pkt from the in-queue of the input interface InInt to the out-queue OutQueue of the forwarding out-interface OutInt, and acquiring the number CurrentNum of data packets currently in the out-queue OutQueue of the forwarding out-interface OutInt;
1.4) dividing the number CurrentNum of data packets currently in the out-queue OutQueue of the forwarding out-interface OutInt by the total length Length of the out-queue OutQueue to obtain the interface queue occupation ratio ORatio;
1.5) outputting the header five-tuple information Quintuple of the data packet pkt and the interface queue occupation ratio ORatio.
By these means, the interface queue occupation ratio ORatio can be extracted quickly, providing the basic data for counting data packets with the probability sampling method in step 2) and for detecting elephant flows based on the packet counting result.
It should be noted that the data packet forwarding module is a core module of programmable switches, smart network cards, commercial switch chips and various other network data forwarding hardware. In this embodiment, step 1) is a functional extension of the existing data packet forwarding module: on top of its existing functions, it adds the extraction and sending of the header five-tuple information of the data packet and of the interface queue occupation ratio.
In this embodiment, step 2) includes:
2.1) judging whether head Quintuple information Quinuple and an interface queue occupation ratio ORatio of the data packet pkt are received or not, and if so, skipping to execute the step 2.2); otherwise, continuing to return to execute the step 2.1);
2.2) if the occupation ratio ORatio of the interface queue is less than or equal to the preset minimum threshold MinthIf yes, the data packet is not counted; if the occupation ratio ORatio of the interface queue is larger than the preset minimum threshold MinthAnd less than or equal to a preset maximum threshold value MaxthThen logarithmic with probability pCounting the pkt of the packet; if the occupation ratio ORatio of the interface queue is larger than the preset maximum threshold value MaxthCounting the data packet directly; the elephant flow is detected based on the packet count result. Through the step 2.2), the probability sampling-based data packet counting method capable of self-adapting to the flow size is realized. And the flow counting module counts the data packets one by one under the condition that the occupation proportion of the interface queue exceeds a maximum threshold value, namely the link load is heavier. And the flow counting module counts the data packets according to the probability p under the condition that the occupation proportion of the interface queue is between the minimum threshold and the maximum threshold, namely when the link load is normal. The magnitude of the probability p is related to the number of consecutive uncounted times when the interface queue occupancy ratio is between the minimum threshold and the maximum threshold. When the number of times of continuous non-counting is more, the probability p is larger, which shows that even if the occupation ratio of the interface queue is still between the minimum threshold and the maximum threshold, the probability of counting the data packet by the flow counting module is increased along with the increase of the number of times of continuous non-counting. And the flow counting module does not count the data packets under the condition that the occupation proportion of the interface queue is lower than the minimum threshold value, namely the link load is light.
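The dependence of p on the number of consecutive uncounted packets can be made concrete with a short derivation. It takes the counting probability as p = q / (1 - count × q), the form used in step 2.2.3), and assumes that the maximum real-time counting probability is chosen as Max_q = 1/(1 + Length); under that assumption p never exceeds 1:

```latex
\begin{aligned}
0 \le q \le Max_q = \frac{1}{1+Length},\quad 0 \le count \le Length
&\;\Rightarrow\; count \cdot q \le \frac{Length}{1+Length} \\
&\;\Rightarrow\; 1 - count \cdot q \ge \frac{1}{1+Length} \\
&\;\Rightarrow\; p = \frac{q}{1 - count \cdot q}
   \le \frac{1/(1+Length)}{1/(1+Length)} = 1
\end{aligned}
```

As an illustrative case with assumed values, take Length = 19 and q = Max_q = 0.05. Then p = 0.05 when count = 0, p = 0.05 / 0.5 = 0.1 when count = 10, and p = 0.05 / 0.05 = 1 when count = 19, so a packet is certain to be counted once Length consecutive packets have gone uncounted.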
In this embodiment, step 2.1) further includes the following initialization steps: initially setting the countless data packet number as 0, where the countless data packet number is in a value range of [0, Length]Minimum threshold Min preset by interface queue occupation ratio ORatiothAnd Max of the highest thresholdthMaximum value Max of real-time counting probability qq(ii) a In this embodiment, to ensure that the final counting probability p is in the value range of [0,1]Maximum value Max of real-time counting probability qqThe value is 1/(1+ Length). Step 2.2) comprises:
2.2.1) initializing the position Index of the flow to which the data packet pkt belongs in the hash table to 0, and initializing the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0; the hash table is used for counting the number of data packets of the flow, each element in the hash table comprises three fields, namely a flow quintuple information field, a counting field and a filling time field, and the quintuple information field of the flow to which the data packet belongs comprises a source IP address, a destination IP address, a source port number, a destination port number and a protocol number of the flow.
2.2.2) if the interface queue occupation ratio ORatio is greater than the preset minimum threshold Min_th and less than or equal to the preset maximum threshold Max_th, jumping to step 2.2.3); if ORatio is greater than the preset maximum threshold Max_th, jumping to step 2.2.4); if ORatio is less than or equal to the preset minimum threshold Min_th, jumping to step 2.2.5);
2.2.3) calculating the real-time counting probability q according to q = Max_q × (ORatio - Min_th)/(Max_th - Min_th), where Max_q is the maximum value of the real-time counting probability q, so that q lies in the range [0, Max_q]; calculating the counting probability p according to p = q/(1 - count × q), where count is the number of uncounted data packets, so that p lies in the range [0, 1]; multiplying the counting probability p by the preset upper bound of the random number range to obtain the threshold m, and generating a random number rand; if the random number rand is smaller than the threshold m, setting the number of uncounted data packets count to 0, counting the data packet pkt, storing the flow information to which the data packet pkt belongs in the hash table, and jumping to step 2.2.6); otherwise, adding 1 to the number of uncounted data packets count; if count after adding 1 is greater than the total length Length of the output queue, setting count to 0, counting the data packet pkt, storing the flow information to which the data packet pkt belongs in the hash table, and jumping to step 2.2.6); otherwise, that is, if count after adding 1 is less than or equal to Length, not counting the data packet and jumping to step 2.2.5);
2.2.4) setting the number of uncounted data packets count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in the hash table; jumping to step 2.2.6);
2.2.5) not counting the data packet pkt, setting a position Index of the flow to which the data packet pkt belongs in the hash table to 0, and setting flow count information CountUpdate of the flow to which the data packet pkt belongs to 0; jump execution step 2.2.6);
2.2.6) detecting the elephant flow based on the counting result of the data packets pkt.
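The sampling decision of steps 2.2.2) to 2.2.5) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the class name AdaptiveSampler, the method should_count and the injectable rng parameter are hypothetical, the random number range of 1 to 100 and the comparison rand <= m are taken from steps 3.2.4.1.3 and 3.2.4.1.4 of the detailed procedure, and storing the flow in the hash table is left to the caller.

```python
import random


class AdaptiveSampler:
    """Per-packet counting decision (hypothetical helper, not from the source)."""

    def __init__(self, min_th, max_th, length, rng=None):
        self.min_th = min_th              # Min_th: lower occupancy threshold
        self.max_th = max_th              # Max_th: upper occupancy threshold
        self.length = length              # Length: total length of the output queue
        self.max_q = 1.0 / (1 + length)   # Max_q, chosen so that p stays in [0, 1]
        self.count = 0                    # consecutive uncounted packets
        self.rng = rng or random.Random()

    def should_count(self, oratio):
        """Return True if the packet seen at occupancy `oratio` is to be counted."""
        if oratio <= self.min_th:         # light load: do not count
            return False
        if oratio > self.max_th:          # heavy load: count directly
            self.count = 0
            return True
        # Normal load: count with probability p, which grows with self.count.
        q = self.max_q * (oratio - self.min_th) / (self.max_th - self.min_th)
        p = q / (1 - self.count * q)
        m = p * 100
        rand = self.rng.randint(1, 100)   # assumed random number range 1..100
        if rand <= m:
            self.count = 0
            return True
        self.count += 1
        if self.count > self.length:      # forced count after too many misses
            self.count = 0
            return True
        return False
```

Passing a seeded random.Random as rng makes a run reproducible, which is convenient when comparing threshold settings.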
In this embodiment, the step of storing the flow information to which the data packet pkt belongs in the hash table includes:
S1) acquiring the current time currenttime; calculating the position Index1 of the flow to which the data packet pkt belongs in the hash table by using a preset first hash function, which can be expressed as:
Index1 = Hash(Quintuple, hashA),
where Hash is the hash function, Quintuple is the header quintuple information of the data packet pkt, and hashA is the parameter of the first hash function;
if the position corresponding to Index1 in the hash table is empty, inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime into that position; setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); if the position corresponding to Index1 in the hash table is not empty, proceeding to the next step;
S2) extracting the flow quintuple information QuintupleA, the flow count information CountA and the filling time timeA stored at position Index1; if QuintupleA is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountA stored at position Index1, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of that flow to the new value of CountA, and jumping to step S3); otherwise, jumping to step S4);
S3) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, emptying the data at position Index1 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at position Index1, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of that flow to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S4) calculating the position Index2 of the flow to which the data packet pkt belongs in the hash table by using a preset second hash function, which can be expressed as:
Index2 = Hash(Quintuple, hashB),
where Hash is the hash function, Quintuple is the header quintuple information of the data packet pkt, and hashB is the parameter of the second hash function;
if the position corresponding to Index2 in the hash table is empty, inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime into that position; setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6); if the position corresponding to Index2 in the hash table is not empty, proceeding to the next step;
S5) extracting the flow quintuple information QuintupleB, the flow count information CountB and the filling time timeB stored at position Index2; if QuintupleB is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountB stored at position Index2, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of that flow to the new value of CountB, and jumping to step S6); otherwise, jumping to step S7);
S6) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, emptying the data at position Index2 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at position Index2, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of that flow to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S7) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, emptying the data at position Index1 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at position Index1, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of that flow to 1, and jumping to step 2.2.6); otherwise, jumping to step S8);
S8) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, emptying the data at position Index2 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1 and the current time currenttime at position Index2, setting the position Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of that flow to 1, and jumping to step 2.2.6); otherwise, jumping to step S9);
S9) calculating the minimum minCount = Min{CountA, CountB} of the flow count information CountA and CountB, determining the corresponding position index minIndex in the hash table according to minCount, and obtaining the quintuple information minQuintuple, the flow count information minCount and the filling time minTime stored at position index minIndex;
S10) calculating the position index minIndex1 of the quintuple information minQuintuple in the hash table by using the preset first hash function, which can be expressed as:
minIndex1 = Hash(minQuintuple, hashA),
where Hash is the hash function, minQuintuple is the quintuple information stored at position index minIndex, and hashA is the parameter of the first hash function;
if the position corresponding to minIndex1 in the hash table is empty, inserting the quintuple information minQuintuple, the flow count information minCount and the filling time minTime into that position, and jumping to step S12); otherwise, jumping to step S11);
S11) calculating the position index minIndex2 of the quintuple information minQuintuple in the hash table by using the preset second hash function; if the position corresponding to minIndex2 in the hash table is empty, inserting the quintuple information minQuintuple, the flow count information minCount and the filling time minTime into that position, and jumping to step S12); otherwise, jumping to step S12) as well;
Of the two index positions computed for minQuintuple with the first and second hash functions, one necessarily equals its current storage position minIndex. Steps S9) to S11) therefore amount to checking whether the index position given by the other hash function is empty. If it is, the minQuintuple information is moved to that empty position. Otherwise, if neither index position computed for minQuintuple is empty, the minQuintuple information stored in the hash table is deleted.
S12) emptying the data at position index minIndex in the hash table (at this point minQuintuple has either been inserted at a new position or cannot be moved because no empty position exists; in both cases its original position minIndex must be emptied so that the count information of the data packet pkt can be inserted), and jumping to step S13);
S13) inserting the quintuple information Quintuple of the flow to which the data packet pkt belongs, the flow count information 1 and the filling time currenttime at position index minIndex in the hash table, setting the position Index of the flow to which the data packet pkt belongs in the hash table to minIndex, and setting the flow count information CountUpdate of that flow to 1; jumping to step 2.2.6).
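Steps S1) to S13) can be condensed into the following Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class FlowTable and its method names are hypothetical, SHA-256 with a seed prefix stands in for the unspecified hash functions Hash(·, hashA) and Hash(·, hashB), positions are 0-based rather than in [1, LenHash], and time is a plain number supplied by the caller.

```python
import hashlib


class FlowTable:
    """Two-hash flow counting table (hypothetical sketch of steps S1-S13)."""

    SEEDS = (b"hashA", b"hashB")          # stand-ins for the two hash parameters

    def __init__(self, len_hash, tmax):
        self.tmax = tmax                  # Tmax: entry timeout
        self.slots = [None] * len_hash    # slot = [quintuple, count, fill_time]

    def _hash(self, quintuple, seed):
        digest = hashlib.sha256(seed + repr(quintuple).encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def _put(self, idx, quintuple, now):
        self.slots[idx] = [quintuple, 1, now]
        return idx, 1

    def store(self, quintuple, now):
        """Count one packet; return (Index, CountUpdate)."""
        cands = [self._hash(quintuple, s) for s in self.SEEDS]
        # S1-S6: empty slot -> insert; same flow -> increment, reset if stale.
        for idx in cands:
            slot = self.slots[idx]
            if slot is None:
                return self._put(idx, quintuple, now)
            if slot[0] == quintuple:
                slot[1] += 1
                if now - slot[2] > self.tmax:
                    return self._put(idx, quintuple, now)
                return idx, slot[1]
        # S7-S8: both slots hold other flows; reuse the first stale one.
        for idx in cands:
            if now - self.slots[idx][2] > self.tmax:
                return self._put(idx, quintuple, now)
        # S9: pick the candidate slot with the smaller count as the victim.
        victim = min(cands, key=lambda i: self.slots[i][1])
        entry = self.slots[victim]
        # S10-S11: move the victim to its other position if that is empty.
        for seed in self.SEEDS:
            alt = self._hash(entry[0], seed)
            if self.slots[alt] is None:
                self.slots[alt] = entry
                break
        # S12-S13: the victim's old slot now holds the new flow.
        return self._put(victim, quintuple, now)
```

As in the steps above, the filling time is not refreshed when a count is incremented, so a flow's entry is reset once it is older than Tmax regardless of how recently it was updated.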
In this embodiment, step 2.2.6) includes: if the counting result of the data packet pkt is greater than the preset elephant flow counting threshold, judging that the flow to which the data packet pkt belongs is an elephant flow, whereupon the flow counting module empties the data at position Index in the hash table; otherwise, judging that the flow to which the data packet pkt belongs is not an elephant flow.
In this embodiment, the elephant flow storage queue in step 3) is a circular queue of length LenQueue, made up of queue units connected end to end, each of which stores the quintuple information of one flow. The queue has a head pointer HeadP, which points to the element that joined the queue earliest, and a tail pointer TailP, which points to the element that joined the queue most recently. The queue is initialized to empty, with HeadP and TailP equal. Limiting the queue to LenQueue units also limits how long an elephant flow detection result remains valid: an elephant flow detected earlier may already have ended, in which case it no longer has reference value for network traffic scheduling. Limiting the length of the circular queue also saves memory.
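The threshold check of step 2.2.6) is small enough to show in a few lines of Python. The function name detect_elephant, the list-based table and the report callback are hypothetical and serve only to illustrate the step; ele_max corresponds to the elephant flow counting threshold.

```python
def detect_elephant(slots, index, count_update, quintuple, ele_max, report):
    """Step 2.2.6): report the flow and clear its slot once its count exceeds ele_max."""
    if count_update > ele_max:
        report(quintuple)      # hand the quintuple to the storage stage
        slots[index] = None    # empty position Index so counting restarts
        return True
    return False               # also covers uncounted packets (CountUpdate = 0)
```

Clearing the slot after a report means a long-lived elephant flow is reported again each time its count crosses the threshold anew.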
In this embodiment, the step of storing based on the elephant flow storage queue in step 3) includes:
3.1) judging whether quintuple information EleQuintuple of an elephant flow has been received; if so, executing the next step; otherwise, returning to step 3.1) to keep waiting;
3.2) judging whether the queue unit pointed to by the tail pointer TailP is empty; if it is not empty, executing the next step; otherwise, jumping to step 3.5);
3.3) clearing the queue unit pointed to by the tail pointer TailP;
3.4) updating the head pointer HeadP according to HeadP = (HeadP + 1) % LenQueue, where % is the modulo operation;
3.5) storing the quintuple information EleQuintuple of the elephant flow in the queue unit pointed to by the tail pointer TailP;
3.6) updating the tail pointer TailP according to TailP = (TailP + 1) % LenQueue, and ending.
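Steps 3.2) to 3.6) describe a fixed-size circular buffer that overwrites its oldest entry when full. A minimal Python sketch follows; the class ElephantQueue and the helper items are hypothetical, and the tail pointer is treated as the next unit to write, which is how steps 3.2) to 3.6) use it.

```python
class ElephantQueue:
    """Circular elephant flow storage queue (illustrative sketch)."""

    def __init__(self, len_queue):
        self.cells = [None] * len_queue   # LenQueue queue units, initially empty
        self.head = 0                     # HeadP: oldest stored flow
        self.tail = 0                     # TailP: unit written by the next push

    def push(self, ele_quintuple):
        n = len(self.cells)
        if self.cells[self.tail] is not None:   # 3.2) unit occupied: queue is full
            self.cells[self.tail] = None        # 3.3) drop the oldest entry
            self.head = (self.head + 1) % n     # 3.4) advance HeadP
        self.cells[self.tail] = ele_quintuple   # 3.5) store the new flow
        self.tail = (self.tail + 1) % n         # 3.6) advance TailP

    def items(self):
        """Stored flows from oldest to newest."""
        n = len(self.cells)
        ordered = (self.cells[(self.head + i) % n] for i in range(n))
        return [cell for cell in ordered if cell is not None]
```

With a queue of length 3, pushing four flows leaves the last three in the queue, so older detection results age out without any explicit timer.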
The embodiment also provides a system for rapidly detecting an elephant flow based on probability sampling, which comprises an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller, wherein the data forwarding controller is respectively connected with the input module and the output module, and is programmed or configured to execute the steps of the method for rapidly detecting the elephant flow based on probability sampling.
As shown in fig. 3, the data forwarding controller in this embodiment includes: the data packet forwarding module is used for sending the five-tuple information of the data packet head of the passing data packet and the queue occupation ratio; the flow counting module is used for receiving the quintuple information of the head part of the data packet and the occupation proportion of the interface queue sent by the data packet forwarding module, counting the data packet by adopting a probability sampling method and detecting the elephant flow based on the counting result of the data packet; the elephant flow storage module is used for storing the detected elephant flow information based on the elephant flow storage queue; the input end of the flow counting module is connected with the data packet forwarding module, the output end of the flow counting module is connected with the elephant flow storage module, and the elephant flow storage module is connected with the elephant flow storage queue.
The implementation method of the elephant flow rapid detection system based on probability sampling in the embodiment comprises the following steps:
In the first step, compared with the data forwarding controller in a traditional switch data plane, a flow counting module, an elephant flow storage module, a hash table and an elephant flow storage queue are added to the data forwarding controller, and the data packet forwarding module is modified. The hash table is used to count the number of data packets of each flow, and each element in the hash table comprises three fields: a flow quintuple information field, a count field and a filling time field. The flow quintuple information field is derived from the source IP address, destination IP address, source port number, destination port number and protocol number of the flow. The elephant flow storage queue is a circular queue that stores elephant flow information; each element of the queue stores the quintuple information of one flow. The flow counting module is connected to the hash table (the flow count table), the data packet forwarding module and the elephant flow storage module. The input of the flow counting module is the packet header quintuple information and the queue occupation ratio sent by the data packet forwarding module. The flow counting module can read and write the hash table. The output of the flow counting module is the quintuple information of each detected elephant flow. The elephant flow storage module is connected to the flow counting module and the elephant flow storage queue. The elephant flow storage module receives the elephant flow quintuple information sent by the flow counting module and can read and write the elephant flow storage queue. The data packet forwarding module is connected to the switch interface queues and is responsible for forwarding data packets.
When the data packet forwarding module copies a data packet to the output queue of a switch interface, it records the occupancy of that output queue and sends the packet header quintuple information and the queue occupancy to the flow counting module.
In the second step, the programmable switch begins operation and the flow count table is initialized to empty.
In the third step, the data packet forwarding module, the flow counting module and the elephant flow storage module work in parallel and cooperate to complete elephant flow detection and storage.
The data packet forwarding module forwards data packets according to the following procedure and sends the header quintuple information of each data packet to the flow counting module: the data packet forwarding module reads the header quintuple information of the data packet from the input queue of an interface and queries the hardware forwarding table to obtain the forwarding output interface of the data packet. The data packet forwarding module copies the data packet to the output queue of the forwarding output interface and obtains the number of data packets in that output queue. The data packet forwarding module calculates the occupation ratio of the output queue and sends the packet header quintuple information and the output queue occupation ratio to the flow counting module. The specific method comprises the following steps:
3.1.1 The data packet forwarding module reads the header quintuple information of the data packet pkt from the input queue of an interface InInt and records it as a binary string Quintuple, where Quintuple is the concatenation of srcIP, dstIP, srcPort, dstPort and protocol, that is, the source IP address, destination IP address, source port number, destination port number and protocol number. Execute 3.1.2;
3.1.2 The data packet forwarding module queries the hardware forwarding table according to the header quintuple information Quintuple of the data packet pkt to obtain the forwarding output interface OutInt of the data packet pkt. Execute 3.1.3;
3.1.3 The data packet forwarding module copies the data packet pkt from the input queue of the interface InInt to the output queue of the interface OutInt and obtains the number of data packets present in that output queue; the output queue of the interface OutInt is denoted OutQueue and the number of data packets present is denoted CurrentNum. Execute 3.1.4;
3.1.4 The data packet forwarding module calculates the output queue occupation ratio of the interface OutInt, denoted ORatio. ORatio equals the number of data packets CurrentNum present in the output queue OutQueue of the interface OutInt divided by the total length Length of OutQueue. Execute 3.1.5;
3.1.5 The data packet forwarding module sends the header quintuple information Quintuple of the data packet pkt and the occupation ratio ORatio of the queue OutQueue to the flow counting module. Go to 3.1.1.
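The forwarding procedure 3.1.1 to 3.1.5 can be sketched as a single Python function. This is only an illustration: the function forward_packet, the dictionary-based packet, the lookup callable standing in for the hardware forwarding table and the notify callback standing in for the link to the flow counting module are all hypothetical, and CurrentNum is assumed to include the packet that was just copied into the output queue.

```python
def forward_packet(pkt, lookup, out_queues, length, notify):
    """Forward one packet and report (Quintuple, ORatio) to the flow counting stage."""
    # 3.1.1: build the header quintuple of the packet.
    quintuple = (pkt["srcIP"], pkt["dstIP"], pkt["srcPort"],
                 pkt["dstPort"], pkt["protocol"])
    # 3.1.2: find the output interface OutInt (stand-in for the hardware table).
    out_int = lookup(quintuple)
    # 3.1.3: copy the packet to OutQueue and read its current occupancy.
    out_queue = out_queues[out_int]
    out_queue.append(pkt)
    current_num = len(out_queue)
    # 3.1.4: ORatio = CurrentNum / Length.
    oratio = current_num / length
    # 3.1.5: send the quintuple and the ratio to the flow counting module.
    notify(quintuple, oratio)
    return oratio
```

In a real data plane the queue would drain concurrently; the sketch only shows how ORatio is derived at the moment a packet is enqueued.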
The flow counting module counts the data packets by adopting a probability sampling method in the following processes, writes counting results into a hash table, and sends elephant flow detection results to the elephant flow storage module in real time:
3.2.1, the flow counting module receives the packet header quintuple information and the queue occupation ratio sent by the data packet forwarding module, and decides from the queue occupation ratio whether to count the data packet. The specific method comprises the following steps:
3.2.1.1, if the queue occupation ratio is less than or equal to a preset lowest threshold, not counting the data packets;
3.2.1.2 if the queue occupation ratio is greater than a preset lowest threshold and less than or equal to a preset highest threshold, counting the data packets by a probability p;
3.2.1.3 if the queue occupation ratio is larger than the preset highest threshold, directly counting the data packet.
Further, the process of counting the data packets by the flow counting module is as follows:
the flow counting module initializes the hash table to null. Each element in the hash table contains three fields, namely a flow five tuple information field, a count field and a fill-in time field. The flow counting module first enters a first round of hashing.
The first round of hashing:
First, the flow counting module uses hash function 1 (the first hash function) to calculate a hash value for the packet header quintuple information, namely the index index1 in the hash table. If position index1 in the hash table is empty, the flow quintuple information, the count information "1" and the filling time are written to position index1, and the stored position index and count value are recorded. If position index1 is not empty and the packet header quintuple information equals the flow quintuple information stored there, the flow counting module updates the flow count information at position index1 and records the stored position index and count value. As in step S3) above, when the flow count information at position index1 is updated, the module also checks whether position index1 has timed out, and if it has, the flow count information is set to 1. If the first round of hashing fails, the module enters the second round of hashing.
The second round of hashing:
The flow counting module uses hash function 2 (the second hash function) to calculate a hash value for the packet header quintuple information, namely the index index2 in the hash table. If position index2 in the hash table is empty, the flow quintuple information, the count information "1" and the filling time are written to position index2, and the stored position index and count value are recorded. If position index2 is not empty and the packet header quintuple information equals the flow quintuple information stored there, the flow count information at position index2 is updated and the stored position index and count value are recorded. As in the first round of hashing, the module also checks whether the position has timed out. If the quintuple information stored at both index1 and index2 differs from the packet header quintuple information, the module then checks whether either of the two positions has timed out.
If the second round of hashing fails, the flow counting module checks the information stored at the two positions one by one for timeout, that is, whether the difference between the current time and the filling time exceeds the preset time threshold. If the information stored at one position has timed out, it is deleted and replaced with the new flow quintuple information, count and filling time, and the stored position index and count value are recorded. If neither position has timed out, the flow counting module proceeds to the third round of hashing.
The third round of hashing:
In the third round of hashing, the flow counting module first uses the two different hash functions to calculate hash values for the information stored at position index1. If one of the two resulting index positions is empty, the information stored at position index1 is copied to the empty position, the new flow quintuple information, the count information "1" and the filling time are written to position index1, and the stored position index and count value are recorded. Otherwise, the flow counting module uses the two different hash functions to calculate hash values for the information stored at position index2. If one of the two resulting index positions is empty, the information stored at position index2 is likewise copied to the empty position, the new flow quintuple information, the count information "1" and the filling time are written to position index2, and the stored position index and count value are recorded. If none of the positions obtained by applying the two hash functions to the information stored at index1 and index2 is empty in the hash table, the flow counting module enters a preemption process. In the preemption process, the flow counting module first compares the count information stored at positions index1 and index2 in the hash table, selects the position with the smaller count and clears the information stored there. The flow counting module then inserts the new flow quintuple information, the flow count information "1" and the filling time at the cleared position, and records the stored position index and count value. After the flow counting module has finished counting the data packet, it judges whether the flow to which the data packet belongs is an elephant flow according to the position index and count value recorded during counting.
Specifically, if the count value is greater than the elephant flow counting threshold, the flow counting module judges that the flow to which the data packet belongs is an elephant flow, sends the header quintuple information of the data packet to the elephant flow storage module, and clears the entry at the recorded position index in the hash table. Otherwise, the flow counting module judges that the flow to which the data packet belongs is not an elephant flow. The specific method comprises the following steps:
3.2.1, the flow counting module initializes the uncounted-packet variable count to 0; the value range of count is [0, Length]. The flow counting module initializes the minimum threshold Min_th and the maximum threshold Max_th of the interface queue occupation ratio. The flow counting module initializes the timeout Tmax of the flow count information in the hash table. The flow counting module initializes the maximum value of the real-time counting probability q to Max_q = 1/(1 + Length). The flow counting module initializes the hash table to empty; the length of the hash table is LenHash, and each element contains flow quintuple information, flow count information and a filling time. The flow counting module initializes the first hash function as Hash(data, hashA) and the second hash function as Hash(data, hashB), where data represents the packet header quintuple information, hashA and hashB are the parameters of the first and second hash functions respectively, and the results of Hash(data, hashA) and Hash(data, hashB) are positive integers in the range [1, LenHash]. Execute 3.2.2.
3.2.2, the flow counting module judges whether the header quintuple information Quintuple of the data packet pkt and the interface queue occupation ratio ORatio sent by the data packet forwarding module have been received. If so, execute 3.2.3; otherwise, continue to execute 3.2.2.
3.2.3, the flow counting module lets Index denote the position index in the hash table of the flow to which the data packet pkt belongs and initializes Index to 0, and lets CountUpdate denote the flow count information of that flow and initializes CountUpdate to 0. Execute 3.2.4.
3.2.4, if Min_th < ORatio < Max_th, execute 3.2.4.1; otherwise go to 3.2.5.
3.2.4.1, counting the data packet pkt by the flow counting module according to the probability p, wherein the specific method comprises the following steps:
3.2.4.1.1, the flow counting module calculates the real-time counting probability q, and the q is made to be Maxq(Oratio–Minth)/(Maxth–Minth) Then the value range of q is [0, Max ]q]And 3.2.4.1.2 is executed.
3.2.4.1.2, the flow counting module calculates the counting probability p, let p be q/(1-count q), at this time, the larger the value of the un-counted data packet quantity variable count, the higher the counting probabilityp is large, and the probability of counting pkt can be increased under the condition that the packet is not counted for a plurality of consecutive times. Meanwhile, the value range of q is [0, Max ]q]Ensures that the value range of p is [0,1 ]]. 3.2.4.1.3 is executed.
3.2.4.1.3, the flow counting module calculates p 100 and the result is denoted as m. The flow counting module generates a random number rand between 1 and 100. 3.2.4.1.4 is executed.
3.2.4.1.4, if rand ≤ m, the flow counting module sets the uncounted packet number variable count to 0 and counts the data packet pkt; execute 3.2.4.1.5. Otherwise, go to 3.2.4.1.6.
3.2.4.1.5, the flow counting module stores the flow information to which the data packet pkt belongs in the hash table.
3.2.4.1.6, at this point rand > m, and the flow counting module sets count = count + 1. Go to 3.2.4.1.7.
3.2.4.1.7, if the uncounted packet number count > Length, the flow counting module has left Length consecutive data packets uncounted, so it must count the data packet pkt; execute 3.2.4.1.5.
3.2.5, if ORatio ≥ $Max_{th}$, the flow counting module sets the uncounted packet number variable count to 0 and counts the data packet pkt directly; execute 3.2.4.1.5. Otherwise, go to 3.2.6.
3.2.6, at this point ORatio ≤ $Min_{th}$, and the flow counting module does not count the data packet pkt. The flow counting module sets the uncounted packet number variable count to 0, sets CountUpdate = 0 and Index = 0, and goes to 3.2.7.
3.2.7, the flow counting module checks whether the count information of the flow to which the data packet pkt belongs satisfies CountUpdate > Elemax. If yes, execute 3.2.7.1; otherwise, go to 3.2.8.
3.2.7.1, the flow counting module determines that the flow to which the data packet pkt belongs is an elephant flow. The flow counting module sends the quintuple information Quintuple of the data packet pkt to the elephant flow storage module. Execute 3.2.7.2.
3.2.7.2, the flow counting module clears the Index position of the hash table. Execute 3.2.8.
3.2.8, the flow counting module has completed the processing of the data packet pkt and the elephant flow detection process. Through 3.2.4.1.1 to 3.2.4.1.7, the flow counting module counts data packets by a method based on probability sampling: when the interface queue occupation ratio stays within the interval ($Min_{th}$, $Max_{th}$) for several consecutive data packets, the probability that the flow counting module counts a data packet keeps increasing with the number of uncounted packets, which avoids the extreme case in which many consecutive data packets go uncounted. Through 3.2.7.1 to 3.2.7.2, the flow counting module detects elephant flows. Go to 3.2.2.
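The sampling decision in steps 3.2.4 to 3.2.6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `SamplingCounter`, its method names, and the injectable `rng` argument are hypothetical, and resetting count to 0 after a forced count follows claim 4 rather than step 3.2.4.1.7, which does not mention it.

```python
import random


class SamplingCounter:
    """Decide, per packet, whether the packet should be counted."""

    def __init__(self, min_th, max_th, length, rng=None):
        self.min_th = min_th              # Min_th: lower occupancy threshold
        self.max_th = max_th              # Max_th: upper occupancy threshold
        self.length = length              # Length: total length of the out-queue
        self.max_q = 1.0 / (1 + length)   # Max_q = 1 / (1 + Length)
        self.count = 0                    # consecutive packets left uncounted
        self.rng = rng or random.Random() # assumption: any object with randint()

    def probability(self, oratio):
        """Return p = q / (1 - count * q); valid for Min_th < ORatio < Max_th."""
        q = self.max_q * (oratio - self.min_th) / (self.max_th - self.min_th)
        return q / (1 - self.count * q)

    def should_count(self, oratio):
        if oratio >= self.max_th:         # 3.2.5: always count
            self.count = 0
            return True
        if oratio <= self.min_th:         # 3.2.6: never count
            self.count = 0
            return False
        m = self.probability(oratio) * 100    # 3.2.4.1.3
        rand = self.rng.randint(1, 100)
        if rand <= m:                     # 3.2.4.1.4: packet is sampled
            self.count = 0
            return True
        self.count += 1                   # 3.2.4.1.6
        if self.count > self.length:      # 3.2.4.1.7: forced count
            self.count = 0                # reset taken from claim 4, step 2.2.3
            return True
        return False
```

Since count never exceeds Length when p is evaluated, count × q stays at or below Length/(1 + Length), so the denominator is at least 1/(1 + Length) and p cannot exceed 1.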
Wherein, the step 3.2.4.1.5 of the flow counting module storing the flow information to which the data packet pkt belongs in the hash table includes:
3.2.4.1.5.1 the flow counting module takes the current time as currenttime. 3.2.4.1.5.2 is executed.
3.2.4.1.5.2, the flow counting module uses the first hash function to calculate the position index in the hash table of the flow to which the data packet pkt belongs: Index1 = Hash(Quintuple, HashA). Execute 3.2.4.1.5.3.
3.2.4.1.5.3, if the Index1 position of the hash table is empty, the flow counting module inserts the header quintuple information Quintuple of pkt, the flow count information 1, and the current time currenttime into the Index1 position of the hash table. The flow counting module sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.4.
3.2.4.1.5.4, at this point the Index1 position of the hash table is not empty. The flow counting module extracts the information at the Index1 position of the hash table, recording the flow quintuple information as QuintupleA, the flow count information as CountA, and the filling time as timeA. Execute 3.2.4.1.5.5.
3.2.4.1.5.5, if QuintupleA = Quintuple, the flow counting module adds one to the flow count information at the Index1 position of the hash table, setting CountA = CountA + 1, CountUpdate = CountA, and Index = Index1, and goes to 3.2.4.1.5.6. Otherwise, execute 3.2.4.1.5.7.
3.2.4.1.5.6, if currenttime - timeA > Tmax, the flow counting module clears the Index1 position of the hash table, inserts Quintuple, 1, and currenttime at the Index1 position, sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.7.
3.2.4.1.5.7, the flow counting module uses the second hash function to calculate the position index in the hash table of the flow to which the data packet pkt belongs: Index2 = Hash(Quintuple, HashB). Execute 3.2.4.1.5.8.
3.2.4.1.5.8, if the Index2 position of the hash table is empty, the flow counting module inserts the header quintuple information Quintuple of pkt, the flow count information 1, and the current time currenttime into the Index2 position of the hash table. The flow counting module sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.9.
3.2.4.1.5.9 at this time, the Index2 position of the hash table is not empty, the flow counting module extracts the information of the Index2 position of the hash table, the flow quintuple information is recorded as QuintupleB, the flow counting information is recorded as CountB, the filling time is recorded as timeB, and 3.2.4.1.5.10 is executed.
3.2.4.1.5.10, if QuintupleB = Quintuple, the flow counting module adds one to the flow count information at the Index2 position of the hash table, setting CountB = CountB + 1, CountUpdate = CountB, and Index = Index2, and goes to 3.2.4.1.5.11. Otherwise, execute 3.2.4.1.5.12.
3.2.4.1.5.11, if currenttime - timeB > Tmax, the flow counting module clears the Index2 position of the hash table, inserts Quintuple, 1, and currenttime at the Index2 position, sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.7.
3.2.4.1.5.12, at this point both positions in the hash table calculated for Quintuple by the first and second hash functions hold count information of other flows. The flow counting module checks whether those other flows have timed out, as follows. Execute 3.2.4.1.5.12.1.
3.2.4.1.5.12.1, if currenttime - timeA > Tmax, the flow counting module clears the Index1 position of the hash table, inserts Quintuple, 1, and currenttime at the Index1 position, sets Index = Index1 and CountUpdate = 1, and goes to 3.2.7. Otherwise, go to 3.2.4.1.5.12.2.
3.2.4.1.5.12.2, if currenttime - timeB > Tmax, the flow counting module clears the Index2 position of the hash table, inserts Quintuple, 1, and currenttime at the Index2 position, sets Index = Index2 and CountUpdate = 1, and goes to 3.2.7. Otherwise, execute 3.2.4.1.5.13.
3.2.4.1.5.13, at this point both positions in the hash table calculated for Quintuple by the first and second hash functions hold count information of other flows, and neither of those flows has been stored in the hash table for longer than Tmax. The flow counting module calculates the minimum of CountA and CountB, minCount = Min{CountA, CountB}, and denotes the hash table position index corresponding to minCount as minIndex. The quintuple information at the minIndex position of the hash table is denoted minQuintuple, and its filling time is denoted minTime. The flow counting module then reselects a storage location for the element at the minIndex position of the hash table, as follows. Execute 3.2.4.1.5.13.1.
3.2.4.1.5.13.1, the flow counting module uses the first hash function to calculate the position of minQuintuple in the hash table: minIndex1 = Hash(minQuintuple, HashA). If the minIndex1 position of the hash table is empty, the flow counting module inserts minQuintuple, minCount, and minTime at the minIndex1 position, clears the minIndex position of the hash table, and goes to 3.2.4.1.5.14. Otherwise, execute 3.2.4.1.5.13.2.
3.2.4.1.5.13.2, the flow counting module uses the second hash function to calculate the position of minQuintuple in the hash table: minIndex2 = Hash(minQuintuple, HashB). If the minIndex2 position of the hash table is empty, the flow counting module inserts minQuintuple, minCount, and minTime at the minIndex2 position, clears the minIndex position of the hash table, and goes to 3.2.4.1.5.14. Otherwise, execute 3.2.4.1.5.13.3.
3.2.4.1.5.13.3, at this point both storage positions calculated for minQuintuple by the two hash functions are occupied by other flows. The flow counting module clears the minIndex position of the hash table, that is, it deletes the hash table's count for the flow minQuintuple. Go to 3.2.4.1.5.14.
3.2.4.1.5.14, the flow counting module inserts Quintuple, 1, and currenttime at the minIndex position of the hash table. The flow counting module sets CountUpdate = 1 and Index = minIndex, and goes to 3.2.7.
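The hash table update in steps 3.2.4.1.5.1 to 3.2.4.1.5.14 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `FlowTable`, the use of a seeded CRC32 as a stand-in for Hash(data, HashA) and Hash(data, HashB), and the 0-based slot indices (the text uses [1, LenHash]) are all hypothetical choices, and ties between CountA and CountB are resolved in favor of the first position, which the text does not specify.

```python
import zlib


class FlowTable:
    """Two-choice hash table holding [quintuple, count, fill_time] per slot."""

    def __init__(self, len_hash, tmax, seed_a=b"A", seed_b=b"B"):
        self.slots = [None] * len_hash
        self.len_hash = len_hash
        self.tmax = tmax
        self.seeds = (seed_a, seed_b)     # stand-ins for HashA and HashB

    def _hash(self, quintuple, which):
        # Hypothetical hash: CRC32 over seed + quintuple, 0-based index.
        return zlib.crc32(self.seeds[which] + quintuple) % self.len_hash

    def update(self, quintuple, now):
        """Count one packet of a flow; return (Index, CountUpdate)."""
        candidates = (self._hash(quintuple, 0), self._hash(quintuple, 1))
        for idx in candidates:            # try Index1, then Index2
            slot = self.slots[idx]
            if slot is None:              # empty slot: insert with count 1
                self.slots[idx] = [quintuple, 1, now]
                return idx, 1
            if slot[0] == quintuple:      # same flow already stored here
                if now - slot[2] > self.tmax:     # entry timed out: restart
                    self.slots[idx] = [quintuple, 1, now]
                    return idx, 1
                slot[1] += 1
                return idx, slot[1]
        for idx in candidates:            # 3.2.4.1.5.12: reuse a timed-out slot
            if now - self.slots[idx][2] > self.tmax:
                self.slots[idx] = [quintuple, 1, now]
                return idx, 1
        # 3.2.4.1.5.13: pick the entry with the smaller count as the victim
        min_idx = min(candidates, key=lambda i: self.slots[i][1])
        victim = self.slots[min_idx]
        for which in (0, 1):              # try to relocate the victim
            alt = self._hash(victim[0], which)
            if self.slots[alt] is None:
                self.slots[alt] = victim
                break                     # if no slot is free, the victim is dropped
        self.slots[min_idx] = [quintuple, 1, now]     # 3.2.4.1.5.14
        return min_idx, 1
```

A caller would compare the returned CountUpdate with the elephant flow threshold Elemax and, when it is exceeded, clear the returned slot as in step 3.2.7.2.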
The elephant flow storage module stores the elephant flow information sent by the flow counting module according to the following flow: the elephant flow storage module receives flow quintuple information sent by the flow counting module and stores the flow quintuple information into an elephant flow storage queue, and the method specifically comprises the following steps:
3.3.1, the elephant flow storage module initializes the elephant flow storage queue to empty. The elephant flow store queue is a circular queue of length LenQueue. The elephant stream storage module initializes a head pointer, HeadP, and a tail pointer, TailP, and HeadP is TailP. The head pointer, HeadP, points to the element position that was first enqueued. The tail pointer TailP points to the position of the element that the circular queue joined the queue the latest.
3.3.2, judging whether quintuple information EleQuintuple of the elephant flow sent by the flow counting module is received by the elephant flow storage module. If yes, executing 3.3.3, otherwise, continuing to execute 3.3.2.
3.3.3, the elephant flow storage module judges whether the queue unit pointed by the TailP is empty, and if not, the step 3.3.4 is executed. Otherwise, 3.3.6 is executed.
3.3.4, the elephant flow storage module empties the queue element pointed to by TailP. Execution 3.3.5.
3.3.5, the elephant flow storage module sets HeadP = (HeadP + 1) % LenQueue. Execute 3.3.6.
3.3.6, the elephant flow storage module stores the quintuple information EleQuintuple of the elephant flow into the queue unit pointed to by TailP. Execute 3.3.7.
3.3.7, the elephant flow storage module sets TailP = (TailP + 1) % LenQueue. Execute 3.3.8.
3.3.8, finally, through 3.3.2-3.3.7, the elephant flow storage module finishes the process of storing the five-tuple information of the elephant flow into the circular queue, and then the elephant flow storage module switches to 3.3.2 to continue to receive the five-tuple information EleQuintuple of the elephant flow sent by the flow counting module.
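Steps 3.3.1 to 3.3.7 describe a circular queue that overwrites its oldest entry once it is full. A minimal Python sketch follows; the class name `ElephantFlowQueue`, the `items` helper, and the choice of 0 as the initial value of both pointers are assumptions made for illustration (the text only requires HeadP = TailP at initialization).

```python
class ElephantFlowQueue:
    """Fixed-size circular queue of elephant flow quintuples."""

    def __init__(self, len_queue):
        self.slots = [None] * len_queue   # 3.3.1: queue starts empty
        self.len_queue = len_queue
        self.head = 0                     # HeadP (assumed to start at 0)
        self.tail = 0                     # TailP (assumed to start at 0)

    def store(self, ele_quintuple):
        if self.slots[self.tail] is not None:                # 3.3.3: slot occupied
            self.slots[self.tail] = None                     # 3.3.4: drop oldest
            self.head = (self.head + 1) % self.len_queue     # 3.3.5
        self.slots[self.tail] = ele_quintuple                # 3.3.6
        self.tail = (self.tail + 1) % self.len_queue         # 3.3.7

    def items(self):
        """Return the stored quintuples from oldest to newest."""
        ordered = []
        for i in range(self.len_queue):
            value = self.slots[(self.head + i) % self.len_queue]
            if value is not None:
                ordered.append(value)
        return ordered
```

With a queue of length 3, storing four flows in a row leaves the last three in the queue, and the head pointer has advanced past the overwritten entry.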
In addition, the present embodiment also provides a computer-readable storage medium, in which a computer program of the above-mentioned elephant flow quick detection method based on probability sampling is stored.
Example two:
This embodiment is substantially the same as embodiment one; the main difference lies in how the first hash function and the second hash function are implemented. Embodiment one uses the same hash function with different parameters to calculate the hash values. This embodiment directly uses different hash functions to calculate the hash values.
Calculating the position index Index1 of the flow to which the data packet pkt belongs in the hash table by using a preset first hash function can be expressed as: Index1 = HashA(Quintuple), where HashA is the first hash function and Quintuple is the header quintuple information of the data packet pkt.
Calculating the position index Index2 of the flow to which the data packet pkt belongs in the hash table by using a preset second hash function can be expressed as: Index2 = HashB(Quintuple), where HashB is the second hash function and Quintuple is the header quintuple information of the data packet pkt.
Calculating the position index minIndex1 of the quintuple information minQuintuple in the hash table by using the preset first hash function can be expressed as: minIndex1 = HashA(minQuintuple), where HashA is the first hash function and minQuintuple is the quintuple information at the position index minIndex.
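The two ways of obtaining a pair of hash functions can be contrasted in a short Python sketch. The concrete functions are assumptions chosen only for illustration, since the text does not name any: keyed BLAKE2b stands in for one hash function with different parameters (embodiment one), while CRC32 and Adler-32 stand in for two different hash functions (this embodiment). The helper `make_quintuple`, its IPv4-only packing, and the field widths are likewise hypothetical.

```python
import hashlib
import ipaddress
import struct
import zlib


def make_quintuple(src_ip, dst_ip, src_port, dst_port, protocol):
    """Concatenate the five header fields into one binary string (IPv4 assumed)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HHB", src_port, dst_port, protocol))


def seeded_hash(data, seed, len_hash):
    """Embodiment-one style: one hash function, parameterized by a seed."""
    digest = hashlib.blake2b(data, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "big") % len_hash + 1   # range [1, LenHash]


def hash_a(data, len_hash):
    """Embodiment-two style first hash function (CRC32 as a stand-in)."""
    return zlib.crc32(data) % len_hash + 1


def hash_b(data, len_hash):
    """Embodiment-two style second hash function (Adler-32 as a stand-in)."""
    return zlib.adler32(data) % len_hash + 1
```

For example, `seeded_hash(q, b"HashA", 1024)` and `seeded_hash(q, b"HashB", 1024)` play the roles of Index1 and Index2 in embodiment one, while `hash_a(q, 1024)` and `hash_b(q, 1024)` play them in this embodiment.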
Example three:
This embodiment is basically the same as embodiment one, and the main difference is as follows: the elephant flow rapid detection system based on probability sampling in embodiment one is a programmable switch, whereas the system provided in this embodiment is an intelligent network card. The system likewise includes an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller. The data forwarding controller is connected to the input module and the output module respectively, and is programmed or configured to execute the steps of the elephant flow rapid detection method based on probability sampling. The data forwarding controller includes: a data packet forwarding module, used for sending the packet header quintuple information of each passing data packet and the interface queue occupation ratio; a flow counting module, used for receiving the packet header quintuple information and the interface queue occupation ratio sent by the data packet forwarding module, counting data packets by a probability sampling method, and detecting elephant flows based on the packet counting result; and an elephant flow storage module, used for storing the detected elephant flow information in the elephant flow storage queue. The input end of the flow counting module is connected to the data packet forwarding module, the output end of the flow counting module is connected to the elephant flow storage module, and the elephant flow storage module is connected to the elephant flow storage queue.
Example four:
This embodiment is basically the same as embodiment one, and the main difference is as follows: the elephant flow rapid detection system based on probability sampling in embodiment one is a programmable switch, whereas the system provided in this embodiment is a commercial switch chip. The system likewise includes an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller. The data forwarding controller is connected to the input module and the output module respectively, and is programmed or configured to execute the steps of the elephant flow rapid detection method based on probability sampling. The data forwarding controller includes: a data packet forwarding module, used for sending the packet header quintuple information of each passing data packet and the interface queue occupation ratio; a flow counting module, used for receiving the packet header quintuple information and the interface queue occupation ratio sent by the data packet forwarding module, counting data packets by a probability sampling method, and detecting elephant flows based on the packet counting result; and an elephant flow storage module, used for storing the detected elephant flow information in the elephant flow storage queue. The input end of the flow counting module is connected to the data packet forwarding module, the output end of the flow counting module is connected to the elephant flow storage module, and the elephant flow storage module is connected to the elephant flow storage queue.
It should be noted that the programmable switch, the intelligent network card, and the commercial switch chip in the foregoing embodiments are merely examples of physical forms of the elephant flow rapid detection system based on probability sampling and are not exhaustive. The elephant flow rapid detection method based on probability sampling of the present invention can also be applied to other types of network data forwarding hardware, and the elephant flow rapid detection system based on probability sampling can likewise be another type of network data forwarding hardware, or an integrated component product or complete machine product that includes such hardware; these are not described further here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application, wherein the instructions executed by a processor of a computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A method for rapidly detecting elephant flow based on probability sampling is characterized by comprising the following steps:
1) the data packet forwarding module sends the five-tuple information of the data packet head of the passing data packet and the occupation proportion of the interface queue;
2) receiving quintuple information of the head of the data packet and the occupation proportion of an interface queue sent by a data packet forwarding module, counting the data packet by adopting a probability sampling method, and detecting the elephant flow based on the counting result of the data packet;
3) and storing the detected elephant flow information based on the elephant flow storage queue.
2. The method for rapidly detecting elephant flow based on probability sampling as claimed in claim 1, wherein the step of processing the data packet forwarding module for any passing data packet pkt in step 1) comprises:
1.1) reading the header quintuple information Quintuple of a passing data packet pkt from the in-queue corresponding to the input interface InInt, wherein the header quintuple information Quintuple is a binary string formed by concatenating the source IP address srcIP, the destination IP address dstIP, the source port number srcPort, the destination port number dstPort, and the protocol number protocol;
1.2) querying a hardware forwarding table according to the header quintuple information Quintuple of the data packet pkt to obtain the forwarding out-interface OutInt of the data packet pkt;
1.3) copying the data packet pkt from the in-queue of the input interface InInt to the out-queue OutQueue of the forwarding out-interface OutInt, and acquiring the number CurrentNum of data packets present in the out-queue OutQueue of the forwarding out-interface OutInt;
1.4) dividing the number CurrentNum of data packets present in the out-queue OutQueue of the forwarding out-interface OutInt by the total length Length of the out-queue OutQueue to obtain the interface queue occupation ratio ORatio;
1.5) outputting the header quintuple information Quintuple and the interface queue occupation ratio ORatio of the data packet pkt.
3. The method for rapidly detecting elephant flow based on probability sampling according to claim 2, wherein the step 2) comprises:
2.1) judging whether the header quintuple information Quintuple and the interface queue occupation ratio ORatio of the data packet pkt have been received; if so, jumping to step 2.2); otherwise, returning to step 2.1);
2.2) if the interface queue occupation ratio ORatio is less than or equal to the preset minimum threshold $Min_{th}$, not counting the data packet; if the interface queue occupation ratio ORatio is greater than the preset minimum threshold $Min_{th}$ and less than or equal to the preset maximum threshold $Max_{th}$, counting the data packet pkt with probability p; if the interface queue occupation ratio ORatio is greater than the preset maximum threshold $Max_{th}$, counting the data packet directly; and detecting the elephant flow based on the packet counting result.
4. The method for rapidly detecting elephant flow based on probability sampling according to claim 3, characterized in that step 2.1) is preceded by the following initialization steps: setting the uncounted data packet number count to 0, where count has the value range [0, Length]; and setting the preset minimum threshold $Min_{th}$ and maximum threshold $Max_{th}$ of the interface queue occupation ratio ORatio, the timeout Tmax of flow count information in the hash table, and the maximum value $Max_q$ of the real-time counting probability q; step 2.2) comprises:
2.2.1) initializing the position Index of the flow to which the data packet pkt belongs in the hash table to 0, and initializing the flow count information CountUpdate of the flow to which the data packet pkt belongs to 0;
2.2.2) if the interface queue occupation ratio ORatio is greater than the preset minimum threshold $Min_{th}$ and less than or equal to the preset maximum threshold $Max_{th}$, jumping to step 2.2.3); if the interface queue occupation ratio ORatio is greater than the preset maximum threshold $Max_{th}$, jumping to step 2.2.4); if the interface queue occupation ratio ORatio is less than or equal to the preset minimum threshold $Min_{th}$, jumping to step 2.2.5);
2.2.3) calculating the real-time counting probability q according to $q = Max_q \cdot (ORatio - Min_{th}) / (Max_{th} - Min_{th})$, wherein $Max_q$ is the maximum value of the real-time counting probability q and the value range of q is $[0, Max_q]$; calculating the counting probability p according to $p = q / (1 - count \cdot q)$, wherein count is the number of uncounted data packets and the value range of the counting probability p is [0, 1]; multiplying the counting probability p by a preset upper bound of the random number range to obtain a random number rand; if the random number rand is smaller than the preset threshold m, setting the uncounted data packet number count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in a hash table; otherwise, adding 1 to the uncounted data packet number count, and if count after adding 1 is greater than the total length Length of the out-queue, setting count to 0, counting the data packet pkt, and storing the flow information to which the data packet pkt belongs in the hash table; jumping to step 2.2.6); otherwise, jumping to step 2.2.5);
2.2.4) setting the countless data packet number as 0, counting the data packet pkt, and storing the flow information of the data packet pkt in a hash table; jump execution step 2.2.6);
2.2.5) not counting the data packet pkt, setting a position Index of the flow to which the data packet pkt belongs in the hash table to 0, and setting flow count information CountUpdate of the flow to which the data packet pkt belongs to 0; jump execution step 2.2.6);
2.2.6) detecting the elephant flow based on the counting result of the data packets pkt.
5. The method for rapidly detecting elephant flow based on probability sampling as claimed in claim 4, wherein the step of storing the flow information to which the data packet pkt belongs in the hash table comprises:
S1) acquiring the current time currenttime; calculating the position index Index1 of the flow to which the data packet pkt belongs in the hash table by using a preset first hash function; if the position corresponding to the position index Index1 in the hash table is empty, inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime into that position, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); if the position corresponding to the position index Index1 in the hash table is not empty, jumping to the next step;
S2) extracting the flow quintuple information QuintupleA, the flow count information CountA, and the filling time timeA at the position index Index1; if the extracted flow quintuple information QuintupleA is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountA at the position index Index1, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to the new flow count information CountA, and jumping to step S3); otherwise, jumping to step S4);
S3) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, emptying the data at the position index Index1 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime at the position index Index1, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S4) calculating the position index Index2 of the flow to which the data packet pkt belongs in the hash table by using a preset second hash function; if the position corresponding to the position index Index2 in the hash table is empty, inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime into that position, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); if the position corresponding to the position index Index2 in the hash table is not empty, jumping to the next step;
S5) extracting the flow quintuple information QuintupleB, the flow count information CountB, and the filling time timeB at the position index Index2; if the extracted flow quintuple information QuintupleB is equal to the header quintuple information Quintuple of the data packet pkt, adding 1 to the flow count information CountB at the position index Index2, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to the new flow count information CountB, and jumping to step S6); otherwise, jumping to step S7);
S6) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, emptying the data at the position index Index2 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime at the position index Index2, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); otherwise, jumping directly to step 2.2.6);
S7) judging whether the difference between the current time currenttime and the filling time timeA is greater than the preset time Tmax; if so, emptying the data at the position index Index1 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime at the position index Index1, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index1, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); otherwise, jumping to step S8);
S8) judging whether the difference between the current time currenttime and the filling time timeB is greater than the preset time Tmax; if so, emptying the data at the position index Index2 in the hash table, then inserting the header quintuple information Quintuple of the data packet pkt, the flow count information 1, and the current time currenttime at the position index Index2, setting the position index Index of the flow to which the data packet pkt belongs in the hash table to Index2, setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1, and jumping to step 2.2.6); otherwise, jumping to step S9);
s9) calculating the minimum value minCount = Min{CountA, CountB} of the flow count information CountA and the flow count information CountB, determining the corresponding position index minIndex in the hash table according to the minimum value minCount, and obtaining the quintuple information minQuintuple, the flow count information minCount, and the filling time minTime at position index minIndex;
s10) calculating the position index minIndex1 of the quintuple information minQuintuple in the hash table by using a preset first hash function; if the position corresponding to position index minIndex1 in the hash table is empty, inserting the quintuple information minQuintuple, the flow count information minCount, and the filling time minTime at the position corresponding to position index minIndex1 in the hash table, and jumping to execute step S12); otherwise, jumping to execute step S11);
s11) calculating the position index minIndex2 of the quintuple information minQuintuple in the hash table by using a preset second hash function; if the position corresponding to position index minIndex2 in the hash table is empty, inserting the quintuple information minQuintuple, the flow count information minCount, and the filling time minTime at the position corresponding to position index minIndex2 in the hash table, and jumping to execute step S12); otherwise, jumping to execute step S12);
s12) emptying the data of the position index minIndex in the hash table, and jumping to execute the step S13);
s13) inserting the quintuple information Quintuple of the flow to which the data packet pkt belongs, the flow count information 1, and the filling time currenttime into the hash table at position index minIndex, setting the position Index of the flow to which the data packet pkt belongs in the hash table to position index minIndex, and setting the flow count information CountUpdate of the flow to which the data packet pkt belongs to 1; then jumping to execute step 2.2.6).
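As an illustration only, steps S5) to S13) can be sketched in Python as follows. The sketch assumes that the earlier steps (not reproduced here) have already computed Index1 and Index2 and found both positions occupied without a match at Index1. The function name update_from_s5, the tuple layout of a hash table entry, the tie-breaking rule when CountA equals CountB, and the hash1/hash2 callables are hypothetical choices made for this example and are not taken from the claims.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical layout: (src IP, dst IP, src port, dst port, protocol).
Quintuple = Tuple[str, str, int, int, int]
# A hash table entry is (quintuple, count, filling time), or None when empty.
Entry = Optional[Tuple[Quintuple, int, float]]


def update_from_s5(table: List[Entry], quintuple: Quintuple,
                   index1: int, index2: int, now: float, t_max: float,
                   hash1: Callable[[Quintuple], int],
                   hash2: Callable[[Quintuple], int]) -> Tuple[int, int]:
    """Return (Index, CountUpdate) for the packet's flow, following S5) to S13)."""
    _, count_a, time_a = table[index1]
    quint_b, count_b, time_b = table[index2]

    # S5) the flow is already stored at Index2: increment its count.
    if quint_b == quintuple:
        count_b += 1
        table[index2] = (quint_b, count_b, time_b)
        # S6) an entry older than Tmax is reset to a fresh entry with count 1.
        if now - time_b > t_max:
            table[index2] = (quintuple, 1, now)
            return index2, 1
        return index2, count_b

    # S7) reuse Index1 if its entry has expired.
    if now - time_a > t_max:
        table[index1] = (quintuple, 1, now)
        return index1, 1
    # S8) otherwise reuse Index2 if its entry has expired.
    if now - time_b > t_max:
        table[index2] = (quintuple, 1, now)
        return index2, 1

    # S9) pick the entry with the smaller count (ties go to Index1 here,
    # which is an assumption of this sketch).
    min_index = index1 if count_a <= count_b else index2
    min_entry = table[min_index]
    # S10), S11) try to move that entry to one of its own candidate positions.
    for h in (hash1, hash2):
        alt = h(min_entry[0])
        if table[alt] is None:
            table[alt] = min_entry
            break
    # S12), S13) overwrite the minimum position with the new flow.
    table[min_index] = (quintuple, 1, now)
    return min_index, 1
```

If neither alternative position is free, the entry with the smaller count is simply dropped, which matches the "otherwise, jumping to execute step S12)" branch of step S11).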
6. The method for rapidly detecting elephant flow based on probability sampling according to claim 4, wherein the step 2.2.6) comprises: if the counting result of the data packet pkt is greater than the preset elephant flow counting threshold, judging that the flow to which the data packet pkt belongs is an elephant flow, and emptying the data at position Index in the hash table; otherwise, judging that the flow to which the data packet pkt belongs is not an elephant flow.
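A minimal sketch of the threshold test in step 2.2.6), assuming the counting result is the CountUpdate value produced by the preceding steps and that an empty hash table position is represented by None. The function name is_elephant_flow and its parameters are hypothetical.

```python
from typing import List, Optional


def is_elephant_flow(table: List[Optional[tuple]], index: int,
                     count_update: int, threshold: int) -> bool:
    """Step 2.2.6): report an elephant flow once its count exceeds the threshold."""
    if count_update > threshold:
        # The flow has been detected, so its position is freed for other flows.
        table[index] = None
        return True
    return False
```

Note that the comparison is strict: a flow whose count equals the threshold is not yet reported.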
7. The method for rapidly detecting elephant flow based on probability sampling as claimed in claim 1, wherein the elephant flow storage queue in step 3) is a circular queue of length LenQueue, the circular queue comprises a plurality of queue units connected end to end, the elephant flow storage queue comprises a head pointer HeadP and a tail pointer TailP, the head pointer HeadP points to the position of the element that joined the circular queue earliest, the tail pointer TailP points to the position of the element that joined the circular queue latest, the elephant flow storage queue is initialized to be empty, and the initial head pointer HeadP and tail pointer TailP are equal.
8. The method for rapidly detecting elephant flow based on probability sampling as claimed in claim 7, wherein the step of storing based on elephant flow storage queue in step 3) comprises:
3.1) judging whether quintuple information EleQuintuple of an elephant flow has been received, and executing the next step if it has; otherwise, returning to step 3.1) to continue detection;
3.2) judging whether the queue unit pointed by the tail pointer TailP is empty, and if not, executing the next step; otherwise, skipping to execute the step 3.5);
3.3) clearing the queue unit pointed by the tail pointer TailP;
3.4) updating the head pointer HeadP according to HeadP = (HeadP + 1) % LenQueue, where % is the modulo operation;
3.5) storing quintuple information EleQuintuple of the elephant flow into a queue unit pointed by a tail pointer TailP;
3.6) updating the tail pointer TailP according to TailP = (TailP + 1) % LenQueue, and ending.
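The circular queue of claim 7 and the storage steps 3.2) to 3.6) can be sketched as below. This is an illustrative sketch, not the patented implementation: the class name ElephantFlowQueue, the use of None for an empty queue unit, and the method name store are assumptions made for the example. Step 3.1) (waiting for EleQuintuple to arrive) is represented by the caller invoking store.

```python
from typing import List, Optional, Tuple

Quintuple = Tuple[str, str, int, int, int]  # hypothetical five-tuple layout


class ElephantFlowQueue:
    """Fixed-length circular queue that overwrites its oldest entry when full."""

    def __init__(self, len_queue: int) -> None:
        self.len_queue = len_queue
        self.units: List[Optional[Quintuple]] = [None] * len_queue
        self.head_p = 0  # HeadP: position of the oldest stored element
        self.tail_p = 0  # TailP: position the next element is written to

    def store(self, ele_quintuple: Quintuple) -> None:
        # 3.2) a non-empty unit at TailP means the queue has wrapped around.
        if self.units[self.tail_p] is not None:
            # 3.3) clear the unit that is about to be overwritten.
            self.units[self.tail_p] = None
            # 3.4) the oldest element is gone, so HeadP moves forward.
            self.head_p = (self.head_p + 1) % self.len_queue
        # 3.5) store the elephant flow quintuple at TailP.
        self.units[self.tail_p] = ele_quintuple
        # 3.6) advance TailP.
        self.tail_p = (self.tail_p + 1) % self.len_queue
```

With this design the queue never rejects a newly detected elephant flow; once LenQueue entries are stored, each new entry replaces the oldest one.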
9. A system for rapidly detecting elephant flow based on probability sampling, comprising an input module with at least one input port and a corresponding input queue, an output module with at least one output port and a corresponding output queue, and a data forwarding controller connected to the input module and the output module respectively, characterized in that the data forwarding controller is programmed or configured to execute the steps of the method for rapidly detecting elephant flow based on probability sampling according to any one of claims 1 to 8, and the data forwarding controller comprises: a data packet forwarding module for sending the header quintuple information of each passing data packet and the interface queue occupation ratio; a flow counting module for receiving the header quintuple information and the interface queue occupation ratio sent by the data packet forwarding module, counting data packets by a probability sampling method, and detecting elephant flows based on the data packet counting result; and an elephant flow storage module for storing the detected elephant flow information in the elephant flow storage queue; wherein the input end of the flow counting module is connected to the data packet forwarding module, the output end of the flow counting module is connected to the elephant flow storage module, and the elephant flow storage module is connected to the elephant flow storage queue.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing the method for rapidly detecting elephant flow based on probability sampling according to any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111028109.2A CN113746700B (en) 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling

Publications (2)

Publication Number Publication Date
CN113746700A true CN113746700A (en) 2021-12-03
CN113746700B CN113746700B (en) 2023-04-07

Family

ID=78735146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111028109.2A Active CN113746700B (en) 2021-09-02 2021-09-02 Elephant flow rapid detection method and system based on probability sampling

Country Status (1)

Country Link
CN (1) CN113746700B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131222A1 (en) * 2010-11-22 2012-05-24 Andrew Robert Curtis Elephant flow detection in a computing device
CN106453129A (en) * 2016-09-30 2017-02-22 杭州电子科技大学 Elephant flow two-level identification system and method
CN106453130A (en) * 2016-09-30 2017-02-22 杭州电子科技大学 Flow scheduling system and method based on accurate elephant flow identification
US10924418B1 (en) * 2018-02-07 2021-02-16 Reservoir Labs, Inc. Systems and methods for fast detection of elephant flows in network traffic
CN109861881A (en) * 2019-01-24 2019-06-07 大连理工大学 A kind of elephant stream detection method based on three layers of Sketch framework
CN110677324A (en) * 2019-09-30 2020-01-10 华南理工大学 Elephant flow two-stage detection method based on sFlow sampling and controller active update list
CN111262756A (en) * 2020-01-20 2020-06-09 长沙理工大学 High-speed network elephant flow accurate measurement method and structure
CN112788038A (en) * 2021-01-15 2021-05-11 昆明理工大学 Method for distinguishing DDoS attack and elephant flow based on PCA and random forest
CN112416950A (en) * 2021-01-25 2021-02-26 中国人民解放军国防科技大学 Design method and device of three-dimensional sketch structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
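汤倩: "Research on Elephant Flow Detection and Scheduling Strategies for SDN-Based Data Center Networks", China Master's Theses Full-text Database, Information Science and Technology Series *
田雨: "Research on Elephant Flow Detection and Scheduling Based on OpenFlow Networks", China Master's Theses Full-text Database, Information Science and Technology Series *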

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114240730A (en) * 2021-12-20 2022-03-25 苏州凌云视界智能设备有限责任公司 Processing method for detection data in AOI detection equipment
CN114240730B (en) * 2021-12-20 2024-01-02 苏州凌云光工业智能技术有限公司 Processing method of detection data in AOI detection equipment
CN115396373A (en) * 2022-10-27 2022-11-25 阿里云计算有限公司 Information processing method and system based on cloud server and electronic equipment

Also Published As

Publication number Publication date
CN113746700B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant