WO2021229361A1 - Heavy hitter flow classification based on inter-packet gap analysis - Google Patents

Heavy hitter flow classification based on inter-packet gap analysis

Info

Publication number
WO2021229361A1
WO2021229361A1 (PCT/IB2021/053738)
Authority
WO
WIPO (PCT)
Prior art keywords
flow
data packet
network node
ipgw
ipg
Prior art date
Application number
PCT/IB2021/053738
Other languages
French (fr)
Inventor
Christian Rodolfo ESTEVE ROTHENBERG
Suneet KUMAR SINGH
Pedro Henrique GOMES DA SILVA
Gergely PONGRÁCZ
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2021229361A1 publication Critical patent/WO2021229361A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/02: Capturing of monitoring data
    • H04L 43/026: Capturing of monitoring data using flow identification
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0876: Network utilisation, e.g. volume of load or congestion level
    • H04L 43/0888: Throughput
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/24: Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L 47/2483: Traffic characterised by specific attributes, e.g. priority or QoS, involving identification of individual flows
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/28: Flow control; Congestion control in relation to timing considerations
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/10: Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L 43/106: Active monitoring, e.g. heartbeat, ping or trace-route, using time related information in packets, e.g. by adding timestamps
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441: Countermeasures against malicious traffic
    • H04L 63/1458: Denial of Service

Definitions

  • the present disclosure relates generally to monitoring traffic in computer networks, and more particularly, to detecting Heavy Hitters (HH) in network streams transmitted over computer networks.
  • Network traffic monitoring is crucial in order to effectively identify and resolve network issues before they get worse. For example, network traffic monitoring generally helps to prevent outages that could cause network bottlenecks, as well as to correct network outages soon after they occur. Additionally, network traffic monitoring allows network operators to identify security threats and satisfy service level agreements (SLAs) in place with subscribers, and facilitates the ability of network devices to make re-routing decisions, and the like.
  • Heavy Hitter commonly refers to an entity that is the most powerful and influential among other entities.
  • a Heavy Hitter refers to a packet flow carrying a significant amount of traffic in terms of the number of packets and/or bytes (i.e. throughput in bits per second - bps) over a network link of a given bandwidth capacity (in bps).
  • the ability to detect these heavy flows, also commonly referred to as “elephant flows,” is fundamental to many network management and security applications such as Denial of Service (DoS) attack prevention, flow-size aware routing, and Quality of Service (QoS) management.
  • HH detection techniques focus on the timely and accurate detection of a set of heavy flows. That is, HH detection should occur within a very short time and with a high degree of accuracy (i.e. detecting as many true positive HH flows as possible and avoiding false positives). Additionally, HH techniques should use as few memory resources as possible (e.g., maintain a limited amount of state information), have a low computation complexity (e.g., require few and simple calculations), and minimize communication overhead (e.g., send and receive collected data or network activity reports over out-of-band (OOB) networks).
  • IOS NetFlow by Cisco and sFlow by sFlow.org are two traditional sampling-based network traffic measurement approaches commonly used to detect HH flows.
  • One such method, referred to as a “sample-and-hold” method, is described in an article authored by C. Estan and G. Varghese entitled “New directions in traffic measurement and accounting,” ACM Trans. Computer Systems, 21(3), 2003. These “sample-and-hold” methods count every packet in a sampled flow, in contrast to the approaches used by NetFlow, which count only those packets that are sampled.
  • Streaming algorithms are alternate methods to the packet sampling approaches discussed above. Streaming algorithms employ data structures with a bounded memory size and process every packet in a flow rather than only sampled packets.
  • Sketching algorithms, such as those employed by Count-min sketch and Count sketch methods, provide yet another approach to network traffic monitoring.
  • These particular approaches, which are respectively described in papers authored by G. Cormode and S. Muthukrishnan (“An improved data stream summary: The count-min sketch and its applications,” Journal of Algorithms, 55(1):58-75, 2005), and M. Charikar, K. Chen and M. Farach-Colton (“Finding frequent items in data streams,” In Springer ICALP, 2002), are efficient algorithms that employ sublinear space for counting the streams of packets. More particularly, for every incoming packet, a number of hash functions are applied on the packet headers for indexing.
  • the counters at every hashed location are incremented, and the minimum value among all the hashed locations is determined for a given bounded time.
  • One drawback of these algorithms is hash collisions, which can cause the over-counting of packets and thus negatively impact accuracy.
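For illustration, a minimal count-min sketch along these lines is shown below in Python. The width, depth, and hashing scheme are our assumptions for this sketch, not parameters taken from the cited papers:

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: depth rows of width counters, one hash per row."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key: bytes, row: int) -> int:
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.sha256(bytes([row]) + key).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: bytes, count: int = 1) -> None:
        # Increment the counter at every hashed location.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: bytes) -> int:
        # Collisions can only inflate counters, so the minimum over all
        # hashed locations is the tightest available estimate.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))
```

Because collisions only ever over-count, taking the minimum across rows bounds, but does not eliminate, the over-counting noted above.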
  • a space saving algorithm, such as the one described in the paper authored by A. Metwally, D. Agrawal, and A. El Abbadi and entitled “Efficient computation of frequent and top-k elements in data streams,” In International Conference on Database Theory, Springer, 2005, maintains both a key and a count for each incoming packet. Particularly, when a packet arrives, these algorithms check to determine whether its corresponding flow entry is stored in a table in memory. If so, the algorithm increments the corresponding count by 1. When the table is full, the algorithm replaces the table entry for the flow having the minimum count with a new flow entry with a count value equal to the minimum count + 1. However, searching for the minimum count in the table for each incoming packet increases the overall processing overhead.
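A minimal sketch of that space-saving bookkeeping, under the assumption that flows are keyed by their flow ID (the table capacity and types are ours):

```python
def space_saving_update(table: dict, flow_id, capacity: int) -> None:
    """Apply one space-saving update for a single incoming packet of flow_id."""
    if flow_id in table:
        table[flow_id] += 1            # known flow: increment its count
    elif len(table) < capacity:
        table[flow_id] = 1             # free slot: insert the new flow
    else:
        # Table full: scan for the minimum count (the costly step noted
        # above), evict that flow, and credit the newcomer with min + 1.
        victim = min(table, key=table.get)
        table[flow_id] = table.pop(victim) + 1
```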
  • HashPipe, described in the paper authored by Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., and Rexford, J. entitled “Heavy-hitter detection entirely in the data plane,” 2017, is a modified space saving algorithm (tailored to programmable switches based on P4/PISA) that maintains multiple stages of hash tables to reduce the number of memory reads. Outdated information is difficult to remove in such HashPipe approaches, however, which affects accuracy.
  • Counter Overflow: With conventional methods, the counters can overflow frequently. Such overflow can occur especially when a network device processes hundreds of millions of packets every second;
  • Counter Size: The size of the counters should be large enough to prevent the frequent intervention of a controller to flush all counters to 0. However, large counter sizes occupy excessive memory;
  • High Complexity: There is a high level of complexity in the data structures used by conventional methods. Therefore, in order to determine a current HH flow status/classification, a more complex method is needed to update and remove outdated and/or irrelevant information;
  • In most HH detection algorithms, counter overflow is an especially common problem. This issue, however, can generally be avoided by resetting the counters at regular intervals. Not only do regular counter resets avoid overflow, but they also remove outdated or irrelevant information. Such information is not required by a device to determine a current status of the HH flows, and can lead to falsely detecting an HH flow.
  • a solution to the counter overflow issue is presented in a paper authored by B. Turkovic, J. Oostenbrink, and F. Kuipers entitled “Detecting Heavy Hitters in the Data-plane,” arXiv preprint arXiv:1902.06993 (2019).
  • To reduce the frequency of counter resets, or the intervention of a controller, the counters can be set to a larger size, such as 32 bits, for example. As stated above, however, larger counter sizes occupy more memory space. This can be a particularly cumbersome problem, especially in switch-based Application Specific Integrated Circuits (ASICs) where the memory available to store stateful information is very limited.
  • conventional HH detection techniques have limitations. For example, conventional techniques typically consume large amounts of memory in order to maintain a traffic volume state. Conventional techniques also require the frequent intervention of a controller component, which undesirably increases the overall communication and processing overhead. Further, with some conventional techniques, HH flows detected in a previous time window can be lost, and thus need to be detected again. The need for such “re-detection” of the same HH flow can increase detection time and decrease accuracy.
  • the embodiments of the present disclosure employ an Inter-Packet Gap (IPG)-based analysis to detect and classify a given data packet flow received at a network node as being an HH flow. More specifically, the embodiments herein improve HH flow classification using low complexity methods and limited memory consumption, and reduce the number of times the network node needs to detect an HH flow, while greatly increasing the accuracy of HH flow detection.
  • the present disclosure provides a method, performed by a network node, for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow.
  • the method comprises receiving a flow of data packets at a network node, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determining one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determining the flow to be an HH flow based on an analysis of the IPG values.
  • a network node comprises communications circuitry configured to communicate with one or more network nodes, and processing circuitry operatively connected to the communications circuitry.
  • the processing circuitry is configured to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a non-transitory computer-readable medium storing a computer program thereon.
  • the computer program comprises instructions that, when executed by processing circuitry of a network node, cause the network node to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a computer program comprising executable instructions that, when executed by processing circuitry of a network node, cause the network node to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a carrier containing the computer program according to the fourth aspect.
  • the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
  • the present disclosure provides a method, performed by a network node, for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow.
  • the method comprises receiving a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determining an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determining the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a network node comprising communications circuitry configured to communicate with one or more network nodes, and processing circuitry operatively connected to the communications circuitry.
  • the processing circuitry is configured to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a non-transitory computer-readable medium storing a computer program.
  • the computer program comprises instructions that, when executed by processing circuitry of a network node, cause the network node to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a computer program comprising executable instructions that, when executed by processing circuitry of a network node, cause the network node to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
  • the present disclosure provides a carrier containing the computer program according to the ninth aspect.
  • the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
  • Figures 1A-1B illustrate respective mouse and elephant flows suitable for use with embodiments of the present disclosure.
  • Figures 2A-2B illustrate exemplary data structures suitable for implementation with embodiments of the present disclosure.
  • Figures 3A-3C illustrate the IPG-based HH classification method applied to a sample trace according to one embodiment.
  • Figures 4A-4C are graphs illustrating that IPGw values calculated according to the present embodiments indicate an HH flow.
  • Figure 5 is a flow diagram illustrating a method for determining whether a given data flow received at a network node is an HH flow according to embodiments of the present disclosure.
  • Figure 6 is a flow diagram illustrating a more detailed method for determining whether a given data flow received at a network node is an HH flow according to embodiments of the present disclosure.
  • Figure 7 graphically illustrates the method of the embodiment described in Figure 6.
  • Figures 8A-8C are graphs illustrating how the false negative rate and the false positive rate of identifying HH flows relate to the size of a hash table for different numbers of HH flows according to embodiments of the present disclosure.
  • Figure 9 is a functional block diagram of a network node configured according to one embodiment of the present disclosure.
  • Figure 10 illustrates a computer program product executing on the processing circuitry of a network node according to one embodiment of the present disclosure.
  • exemplary embodiments of the present disclosure provide a technique for using Inter-Packet Gap (IPG) analysis to identify Heavy Hitter (HH) flows rather than using packet count approaches, as in conventional methods.
  • a Heavy Hitter flow can be characterized by small IPG values (i.e., the elapsed time intervals between two consecutively received data packets).
  • the throughput (i.e., heaviness) of a packet flow can be approximated by dividing the average packet size by the average IPG value (e.g., a 1 KB packet every 1 ms equals 8 Mbps).
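To make the arithmetic concrete, a one-line check of the example just given (the variable names are ours):

```python
# 1 KB packets arriving every 1 ms approximate an 8 Mbps flow.
avg_packet_bytes = 1000     # average packet size (1 KB)
avg_ipg_seconds = 0.001     # average inter-packet gap (1 ms)

throughput_bps = avg_packet_bytes * 8 / avg_ipg_seconds
print(throughput_bps)       # 8000000.0, i.e. 8 Mbps
```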
  • one way to calculate an IPG value is to determine the time differences between the ingress timestamps of consecutively received packets in a flow.
  • An IPG metric for the flow is then determined as a function (e.g., an average) of those determined IPG values.
  • one embodiment of the present disclosure utilizes an exponential weighted moving average (EWMA) of the IPG values in a flow to determine the IPG metric for the flow. This metric is then analyzed over a period of time in order to determine whether the flow should or should not be classified as an HH flow.
  • the computed IPG metric can also be used beyond such a “binary” classification of an HH flow (i.e., whether a given flow is or is not an HH flow).
  • the present disclosure also provides an approximate ranking of multiple flows with respect to their throughput or “heaviness.”
  • HH flows can be classified into ranges of IPG metrics, with each range corresponding to a “bucket” identified by an HH Bucket ID (e.g., an 8-bit identifier).
  • Each HH Bucket ID is unique and mapped to a corresponding IPGw value (e.g., a mean value within the IPG range of the HH Bucket ID) used for IPGw calculations.
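One way such a bucket mapping might be realized is sketched below. The concrete ranges and IDs are invented for illustration, since the disclosure only specifies that each 8-bit HH Bucket ID maps to a representative IPGw value within its range:

```python
# Hypothetical IPGw ranges in microseconds, each mapped to an 8-bit
# HH Bucket ID and a representative IPGw (here, the range midpoint).
BUCKETS = [
    (0, 100, 0x01),         # heaviest flows: IPGw in [0, 100) us
    (100, 1_000, 0x02),     # heavy flows:    IPGw in [100, 1000) us
    (1_000, 10_000, 0x03),  # moderate flows: IPGw in [1000, 10000) us
]

def bucket_for(ipg_w_us: float):
    """Return (bucket_id, representative_ipg_w) for a given IPGw value."""
    for low, high, bucket_id in BUCKETS:
        if low <= ipg_w_us < high:
            return bucket_id, (low + high) / 2
    return 0x00, None  # outside all ranges: not an HH bucket
```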
  • Embodiments of the present disclosure provide advantages and benefits that conventional methods of HH flow detection do not provide.
  • the IPG-based implementation is a very lightweight approach when compared to the traditional packet count approaches.
  • conventional counter-based HH flow detection generally requires 32-bit counters in order to maintain the requisite information in a hash table.
  • the IPG-based embodiments of the present disclosure are able to detect HH flows by maintaining 16 bits or fewer for the TS parameters and 8 bits or fewer for each HH Bucket ID.
  • the present embodiments do not introduce any additional limitations in terms of time windows, nor require a controller to reset counters.
  • different, more efficient data structures and techniques are possible with the present embodiments, thereby making IPG-based HH detection more flexible in terms of memory and accuracy.
  • the IPG-based approaches of the present disclosure are suitable for operation with most of the existing algorithms and data structures used to detect HH flows, with or without minor modifications. Further, the present IPG-based methods improve accuracy and detection time, and reduce the overhead associated with computation, memory, and communication.
  • the embodiments provided by the present disclosure are also suitable for implementation in emerging programmable networking devices, such as Protocol Independent Switch Architecture (PISA) switches supporting the P4 language, and Smart Network Interface Cards (SmartNICs) that may exist in end systems.
  • This is in addition to being implemented in traditional computing systems based on general purpose processors (x86).
  • embodiments of the present disclosure utilize IPG analysis, sometimes referred to as “inter-packet time,” as an effective technique to detect HH flows.
  • Figures 1A-1B illustrate two types of packet flows being received at a network device: a so-called “mouse” flow 10 (Figure 1A) and a so-called “elephant” flow 20 (Figure 1B).
  • Each flow comprises a plurality of received data packets with corresponding IPGs between consecutively received packets.
  • mouse flow 10 shows data packets 12-18 having corresponding IPGs IPG1...IPG3.
  • Elephant flow 20 shows data packets 22-32 having corresponding IPGs IPG1...IPG5.
  • Figures 1A-1B illustrate how the number of packets in a given flow relates to the IPG-based method of the present embodiments.
  • a flow such as mouse flow 10, having a small number of packets with relatively large IPGs over a given time window, may not be considered to be an HH flow.
  • a flow such as elephant flow 20, having a large number of packets with small IPG values over a given time window, can be considered to be an HH flow.
  • both the large number of data packets 22-32 in the elephant flow 20 and the relatively small IPG values IPG1...IPG5 are indicators of a high throughput flow, i.e., an HH flow.
  • Figures 2A-2B illustrate hash tables for maintaining a per-flow state for packet data flows, such as mouse flow 10 and elephant flow 20, according to the present embodiments.
  • hash table 50 of Figure 2A maintains two variables.
  • Variable k, which is optional, represents an identifier (ID) or an IP source address of a 5-tuple flow.
  • Variable c, which is mandatory, indicates the total number of packets in a flow received over a given period of time t (e.g., a pre-defined time window).
  • a network device counting the data packets received in a given flow k would increment c with each counted packet.
  • Figure 2B illustrates a hash table 60 configured according to at least one embodiment of the present disclosure.
  • the network device monitoring the flows maintains hash table 60 with three variables: k, which as described above is an optional variable identifying either the flow ID or the source address of the given flow; IPGw, indicating the weighted IPG metric; and TSL, which is the value of the ingress TS of the last received data packet.
  • the IPGw metric is a value that is calculated as the EWMA of the individual IPGs of a flow (e.g., IPG1...IPG5 of elephant flow 20 seen in Figure 1B), using the ingress TS of the last received data packet (TSL).
  • the calculations are based on several variables. The ingress TS of the last received data packet (TSL) is first set to the value of the ingress TS (TSN) of the next consecutively received data packet:

    TSL = TSN (1)

  • IPGN is a time difference value equal to the elapsed time between the ingress TSs of two consecutively received data packets, i.e., between the ingress TS of the last received data packet (TSL) and the ingress TS of the next consecutively received data packet (TSN):

    IPGN = TSN - TSL (2)

  • the weighted metric of the IPG (IPGw) is then calculated using the following exponential weighted moving average (EWMA) formula:

    IPGw = a * IPGW_LAST + (1 - a) * IPGN (3)

    where a is a degree of weighting decrease, and IPGW_LAST is the last determined IPGw metric for the flow.
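A compact Python rendering of equations (2) and (3) as a per-packet update follows. The field names mirror hash table 60; the default a = 0.9 is a placeholder, since the disclosure tunes a per deployment:

```python
def update_flow(entry: dict, ts_n: float, a: float = 0.9) -> None:
    """Update one flow's state {"IPGw": ..., "TSL": ...} for a new packet.

    ts_n is the ingress timestamp (TSN) of the newly received packet,
    and a is the degree of weighting decrease in the EWMA.
    """
    ipg_n = ts_n - entry["TSL"]                          # equation (2)
    entry["IPGw"] = a * entry["IPGw"] + (1 - a) * ipg_n  # equation (3)
    entry["TSL"] = ts_n                                  # TSL <- TSN for next packet
```

For a heavy flow, a run of small IPGN values quickly pulls IPGw down; for a sparse flow, occasional large gaps keep IPGw high.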
  • the value of ‘a’ plays a significant role for tuning accuracy in the present embodiments. This is evidenced, as seen in more detail below, by the results of using the IPG-based method of the present disclosure on actual traces to detect HH flows. In particular, the results indicate that the value of a can be optimized not only for different time window sizes, but also for different numbers of top-ranked HH flows and/or different HH throughput thresholds. According to the present disclosure, the value of a can be easily adapted at run-time based on one or more dynamic considerations, such as the amount of incoming traffic and measurement parameters.
  • the method of the present disclosure was tested on actual ISP backbone traffic traces to determine whether a given flow was or was not an HH flow.
  • the traces that were used for testing are described in a dataset provided by the Center for Applied Internet Data Analysis (CAIDA) entitled “The CAIDA UCSD Anonymized Internet Traces 2016 - March,” available at http://www.caida.org/data/passive/passive_2016_dataset.xml.
  • the results of those tests are illustrated in the graphs seen in Figures 3A-3C and 4A-4C.
  • the patterns of IPGN and IPGw were analyzed using equations (1)-(3) above with the actual ISP backbone traffic traces.
  • Figures 3A-3B illustrate one sample trace of 48 seconds over a 10G interface. There were a total of 20,295,101 data packets in the trace and 693,931 different 5-tuple flows.
  • the IPG-based method of the present disclosure was implemented in a Python script to process the Comma Separated Value (CSV) file from a target Packet Capture (PCAP) trace.
  • an IPGN value is calculated based on equation (2) for every incoming data packet of each flow.
  • three to four random flows were selected for each of three different ranges of data throughput.
  • Figure 3A illustrates the selected flows with throughput > 10 Mbps, while Figures 3B and 3C illustrate the selected flows with throughput between 1-2 Mbps and between 0.1-0.5 Mbps, respectively.
  • the graphs in Figure 3A illustrate the IPGN values for different time intervals of a flow, where each interval contains 1000 packets. Since most of the packets come in bursts, spikes of up to 12k microseconds at substantially regular intervals can be observed, which indicate the Inter-Burst Gap (IBG). Generally, the IPGN values for all three selected flows in Figure 3A are below 1k microseconds. Figures 3B and 3C, in contrast, illustrate larger IPGN values with relatively large spikes as throughput decreases.
  • Figure 5 is a flow diagram illustrating a method 70 for classifying a given flow as an HH flow according to one embodiment of the present disclosure.
  • Method 70, as described herein, is implemented by a network node, for example a node in a Wi-Fi network.
  • the embodiments of the present disclosure are suitable for implementation in a variety of different network nodes and in various data plane mechanisms, pipelines and data structures supported by the network nodes. This includes both existing/traditional nodes and architectures and emerging programmable architectures.
  • method 70 begins with the network node receiving a flow of data packets (box 72).
  • the network node will receive multiple flows, each having a plurality of corresponding data packets; however, for ease of discussion, method 70 is explained in the context of a single flow.
  • the flow of Figure 5 is received in a unidirectional stream of 5-tuple flows, and each data packet in the flow has an ingress TS indicating a time at which it was received at the network node.
  • the network node next determines the IPGN value for each IPG between consecutively received data packets (box 74). As previously described, the IPGN value is the elapsed time between two consecutively received data packets. Then, for each determined IPGN value, the network node uses equation (3) to calculate the IPGw value for the flow (box 76), and updates that value in hash table 60. Additionally, as previously described, the network node also updates other variables in hash table 60, such as the ingress TS of the last received data packet (i.e., TSL) and, optionally, the variable k identifying the flow. Once the values are calculated, the network node determines whether the given flow is an HH flow based on the IPG values (box 78).
  • an HH flow can be detected based on various strategies.
  • In one such strategy, the network node compares the calculated IPGw value to a predefined IPGw threshold each time the IPGw value is updated. If the calculated IPGw value falls below the IPGw threshold, the flow is classified as an HH flow.
  • the network node ranks the flow against other flows received in a network stream.
  • data packet flows having a high throughput for a fixed time interval exhibit lower IPGw values than do data packet flows having a low throughput. Therefore, “high throughput” flows having lower IPGw values would typically be at the “bottom” of the ranked IPGw values.
  • the network node may determine that the flows associated with the r-most “bottom-ranked” IPGw values (i.e., where ‘r’ is a predetermined integer) are HH flows.
  • In other embodiments, the network node determines whether a given flow is an HH flow based on IPGw value metrics. For example, the network node may rank the IPGw values, as previously described. If the ranked IPGw value for a given flow falls within a predetermined percentile of all ranked flows (e.g., if the ranked IPGw value for the given flow is in the “bottom” 5% of the ranked IPGw values), the network node may classify the given flow as an HH flow.
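All three strategies reduce to simple selections over the stored IPGw values. A sketch, where the threshold, r, and percentile are placeholder parameters of our choosing:

```python
def hh_by_threshold(flows: dict, ipg_w_threshold: float) -> set:
    """Classify as HH every flow whose IPGw is below the threshold."""
    return {fid for fid, entry in flows.items()
            if entry["IPGw"] < ipg_w_threshold}

def hh_by_rank(flows: dict, r: int) -> set:
    """Classify as HH the r flows with the lowest (bottom-ranked) IPGw."""
    ranked = sorted(flows, key=lambda fid: flows[fid]["IPGw"])
    return set(ranked[:r])

def hh_by_percentile(flows: dict, pct: float = 0.05) -> set:
    """Classify as HH the flows in the bottom pct fraction of ranked IPGw."""
    ranked = sorted(flows, key=lambda fid: flows[fid]["IPGw"])
    return set(ranked[:max(1, int(len(ranked) * pct))])
```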
  • Figure 6 is a flow diagram of a method 80, implemented at a network node, illustrating the present embodiments in more detail.
  • Figure 7 illustrates, graphically, the steps performed by network node in Figure 6.
  • the IPG-based method of the present disclosure is applied using a simple, memory-efficient data structure influenced by a space saving algorithm, such as the one described in the paper by A. Metwally, D. Agrawal, and A. El Abbadi, and entitled “Efficient computation of frequent and top-k elements in data streams,” In International Conference on Database Theory, Springer, 2005.
  • this embodiment assigns a finite number of memory slots called a flow table (e.g., hash table 60), in order to maintain a per-flow state.
  • the network node first receives the data packets of one or more incoming flows (box 82), and determines the 5-tuple flow ID for each flow and the ingress TS of each incoming data packet for each flow (box 84).
  • the flow IDs may be determined from information associated with the received flows.
  • the network node searches hash table 60 for the flow ID of the given flow (box 86). If the flow ID is already in hash table 60, the network node calculates the IPGw value using equations (2) and (3) above (box 88), and updates the memory slot in hash table 60 corresponding to the flow ID accordingly.
  • the network node updates the IPGw value in hash table 60 to be the newly calculated IPGw value, and sets the ingress TS of the last received data packet TSL to the ingress TS of the received data packet TSN (box 90).
  • If the flow ID is not in hash table 60, the network node will determine whether hash table 60 has a “free” or “open” slot (box 92). If there are no free slots in hash table 60, the network node locates the slot having the highest IPGw value and replaces or “overwrites” the information in that slot. As described above, flows that are not considered to be HH flow candidates will have IPGw values that are higher than those of flows that are considered to be HH flow candidates.
  • the network node replaces the IPGw and TSL values of the entry having the highest IPGw value in hash table 60 with the newly calculated IPGw and TSN values of the received data packet (box 94). If a free slot is available, however, the network node simply inserts the flow ID and IPGw values associated with the received data packet into the free slot, and sets the TSL value of that slot to the TSN of the received data packet (box 96). The network node then follows these same steps for each incoming data packet in each flow, and determines, based on the information stored in hash table 60 (e.g., the respective IPGw values), which of the flows should be classified as HH flows.
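Putting boxes 82-96 together as one per-packet routine gives the sketch below. A plain dict stands in for hash table 60, and the initial IPGw of a brand-new entry is our assumption, since no previous timestamp exists for a flow's first packet:

```python
def process_packet(table: dict, flow_id, ts_n: float,
                   capacity: int, a: float = 0.9) -> None:
    """One pass of method 80 for a single incoming packet.

    table maps flow_id -> {"IPGw": float, "TSL": float}.
    """
    if flow_id in table:
        # Boxes 86-90: known flow, recompute IPGw and refresh TSL.
        entry = table[flow_id]
        ipg_n = ts_n - entry["TSL"]
        entry["IPGw"] = a * entry["IPGw"] + (1 - a) * ipg_n
        entry["TSL"] = ts_n
    elif len(table) < capacity:
        # Box 96: a free slot exists, so insert a fresh entry.
        table[flow_id] = {"IPGw": 0.0, "TSL": ts_n}
    else:
        # Boxes 92-94: table full, so overwrite the slot with the highest
        # IPGw, i.e. the entry least likely to be an HH flow candidate.
        victim = max(table, key=lambda fid: table[fid]["IPGw"])
        del table[victim]
        table[flow_id] = {"IPGw": 0.0, "TSL": ts_n}
```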
  • the IPG-based method of the present disclosure is suitable for implementation in most of the existing HH detection mechanisms and network device data path pipelines.
  • Figures 8A-8C graphically illustrate the results of one such implementation.
  • For these tests, the CAIDA trace data described above was split into small chunks based on time window sizes, as detailed in Table 1 below. For each time window, the tests used about 50 chunks. Each chunk is considered to be one “trial,” and each data point in the graphs illustrated in Figures 8A-8C represents the average of 50 trials. Additionally, three different sizes of time windows are used so as to better evaluate how the IPG-based HH detection method of the present disclosure performs for different time windows.
  • the IPG-based HH detection method was implemented in a Python-based simulator that reads each incoming packet of multiple, different flows.
  • the test utilized a data structure (e.g., hash table 60) implemented as an array of n memory slots, with each memory slot storing a tuple (Flow ID, IPGw, TSL) for a corresponding one of the incoming flows.
  • If a free memory slot is available, the network node inserts the tuple (Flow ID, IPGw, TSL) as a new entry. If hash table 60 is full, and the flow ID does not exist in the table, the present embodiments replace the Flow ID in the slot having the maximum IPGw value with the Flow ID of the incoming packet flow, but keep the same IPGw. After processing about 50 chunks for each time window, the following accuracy results, in terms of false positives and false negatives, were obtained.
  • Figures 8A-8C illustrate the False Negative rate (i.e., the number of true HH flows not reported as top-k HH divided by all analyzed flows) and False Positive rate (i.e., the number of non-HH flows falsely reported as top-k HH divided by all analyzed flows) as a function of an increasing number of memory slots for different numbers of top-k reported HH flows to be identified.
  • the false negatives and false positives decrease as more memory slots are allocated.
  • Figures 8A(iii), 8B(vi), and 8C(ix) illustrate the False Negative rate for 50, 100, and 200 top flows for selected trace flows #70, #120 and #220, respectively. Based on these figures, it is seen that the false negative rate is reduced to as low as 0.04. This indicates that the IPG-based approach of the present embodiments can miss detecting flows at the tail end of the top k flows, where the difference in throughput between the different flows is very small. Hence, when flows are over-reported to detect the top k flows (e.g., the top k+20 flows are reported), the false negative rate decreases. Fine-tuning the value of the a parameter can help to detect HHs at the tail end of the top k flows, and over-reporting a small number of flows when detecting the top k flows can drastically increase accuracy.
  • the false negative rate for different sizes of time window also confirms that the IPG-based HH detection and classification method of the present disclosure can detect HH flows with high accuracy among varying numbers of total flows, such as 15,000, 45,000, and 70,000 flows.
  • the IPG-based method for HH flow detection according to the present disclosure provides at least the following advantages when compared to conventional methods for detecting HH flows:
  • the IPG-based approach of the present disclosure is easy to implement and is suitable for use with most existing algorithms and data plane technologies for detecting HH flows. Additionally, the present embodiments reduce the complexity needed for detecting HH flows, which translates into the use of fewer computational and communication resources. For example, by using compact timestamps and low IPG metrics encoding as a means and indicator for detecting and identifying HH flows, the present embodiments utilize fewer bits than conventional packet counter approaches that use 32 bits to prevent counter overflow. Additionally, while most existing memory-efficient HH detection solutions focus only on a low memory footprint, they ignore the number of memory accesses (typically just one read/modify/write per data structure).
  • the present embodiments do not require resetting the IPGw metric at regular time intervals, as do conventional solutions based on counting packets.
  • the present embodiments mitigate the frequent intervention of a controller, which contributes to improved accuracy and lower communication costs.
  • the ability to have a standalone networking device detect an HH flow merely by inspecting a current IPGw metric associated with the flow allows for the timely execution of data plane-only actions to handle the HH flow (e.g., apply QoS policy such as rate-limit or queue selection based on IPGw metric).
  • the present embodiments move beyond the simple binary classification of an HH flow (i.e., is/is not a given flow an HH flow) based on a preconfigured packet count threshold over a given time window.
  • the IPG-based method of the present disclosure provides a native way to rank flows based on their IPGw metrics. While some conventional counter-based approaches also rank flows, none use a memory efficient approach based on probabilistic data structures (e.g., hashing-based, Bloom filter like). Moreover, conventional approaches, such as those that are based on the so-called “sketching algorithms,” are not capable of reporting arbitrary rankings of top HH flows.
  • An apparatus can perform any of the methods herein described by implementing any functional means, modules, units, or circuitry.
  • the apparatus comprises respective circuits or circuitry configured to perform the steps shown in the method figures.
  • the circuits or circuitry in this regard may comprise circuits dedicated to performing certain functional processing and/or one or more microprocessors in conjunction with memory.
  • the circuitry may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include Digital Signal Processors (DSPs), special-purpose digital logic, and the like.
  • the processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, etc.
  • Program code stored in memory may include program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the methods described herein, in several embodiments.
  • the memory stores program code that, when executed by the one or more processors, carries out the methods described herein.
  • Figure 9 is a block diagram of some functional components of a network node 100 configured according to one embodiment of the present disclosure.
  • the network node 100 can be configured to implement the procedures and methods for HH flow classification as herein described and comprises processing circuitry 102, memory 104, and communications circuitry 108.
  • the communication circuitry 108 comprises interface circuitry for communicating with other network nodes in a computer network.
  • the incoming data packets of a plurality of flows are received by communication circuitry 108.
  • Processing circuitry 102 controls the overall operation of the network node 100 and is configured to implement the procedures shown in Figures 5-7.
  • the processing circuitry 102 may comprise one or more microprocessors, hardware, firmware, or a combination thereof configured to perform methods shown in Figures 5-7.
  • Memory circuitry 104 comprises both volatile and non-volatile memory for storing computer program code and data needed by the processing circuitry 102 for operation.
  • Memory circuitry 104 may comprise any tangible, non-transitory computer-readable storage medium for storing data including electronic, magnetic, optical, electromagnetic, or semiconductor data storage.
  • Memory circuitry 104 stores a computer program 106 comprising executable instructions that configure the processing circuitry 102 to implement the methods illustrated and discussed with respect to Figures 5-7.
  • a computer program in this regard may comprise one or more code modules corresponding to the means or units described above.
  • computer program instructions and configuration information are stored in a non-volatile memory, such as a ROM, erasable programmable read only memory (EPROM) or flash memory.
  • Temporary data generated during operation may be stored in a volatile memory, such as a random access memory (RAM).
  • computer program 106 for configuring the processing circuitry 102 as herein described may be stored in a removable memory, such as a portable compact disc, portable digital video disc, or other removable media.
  • the computer program 106 may also be embodied in a carrier such as an electronic signal, optical signal, radio signal, or computer readable storage medium.
  • Figure 10 illustrates a computer program product, such as computer program 106, executing on the processing circuitry 102 of network node 100 according to one embodiment of the present disclosure.
  • computer program 106 comprises a communications module/unit 110, an IPG determination module/unit 112, an IPGw determination module/unit 114, a Heavy Hitter (HH) determination module/unit 116, and a flow table update module/unit 118.
  • the communications module/unit 110 is configured to send and receive messages to other nodes in a computer network, and more specifically, to receive flows of data packets from those other nodes, as previously described. Each data packet in each flow is timestamped with a corresponding ingress TS indicating a time of receipt at network node 100.
  • the IPG determination module/unit 112 is configured to calculate the IPGN value between two consecutively received data packets of a flow. Particularly, in one embodiment, the IPG determination module/unit 112 determines the IPGN values using equation (2), as previously described.
  • the IPGw determination module/unit 114 is configured to calculate the IPGw values as an exponential weighted moving average (EWMA) of the IPGN values in a flow, as previously described. More specifically, one embodiment of the present disclosure determines the IPGw values using equation (3), as previously described.
  • the HH determination module/unit 116 is configured to analyze the calculated IPGw values for the incoming flows and, based on that analysis, classify a given flow as being an HH flow, as previously described.
  • the flow table update module/unit 118 is configured to determine whether the flow ID for a given incoming flow of data packets already exists in a flow table (e.g., hash table 60), and to update the flow table based on that determination.
  • the flow table update module/unit 118 is configured to update the IPGw and TSL values for Flow IDs that already exist in hash table 60, as previously described. If the Flow ID for the given flow does not exist in the flow table, however, the flow table update module/unit 118 is configured to either insert a tuple (Flow ID, IPGw, TSL) for the given flow into the table (if a free slot in the flow table exists), or to overwrite the information associated with a flow that is not considered to be an HH flow with a new tuple (Flow ID, IPGw, TSL) associated with the received data packet (if a free slot does not exist in the flow table).
  • the present embodiments may, of course, be carried out in other ways than those specifically set forth herein without departing from essential characteristics of the embodiments.
  • the embodiments described herein discuss the present disclosure in terms of a “flow.”
  • the concept of a “flow” is not restricted to a particular definition of flow in a packet network stream.
  • the methods described herein are suitable for use with any of the commonly used granularities of packet flows including, but not limited to, all packets between a source IP and destination IP, between a source IP and destination IP and specific transport protocol ports, all packets from a specific source IP, and the like.
  • the present embodiments are therefore to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended embodiments are intended to be embraced therein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A network node (100) configured to monitor the traffic in a computer network receives data streams from other nodes in the network. Each data stream can have multiple data flows with each flow having a plurality of data packets. The network node uses an Inter-Packet Gap (IPG)-based analysis to detect and classify which data flows in those data streams are Heavy Hitter (HH) flows. Additionally, the network node maintains the flow state for each incoming flow while reducing complexity and resource usage, and increasing the accuracy with which HH flows are detected.

Description

HEAVY HITTER FLOW CLASSIFICATION BASED ON INTER-PACKET GAP ANALYSIS
TECHNICAL FIELD
The present disclosure relates generally to monitoring traffic in computer networks, and more particularly, to detecting Heavy Hitters (HH) in network streams transmitted over computer networks.
BACKGROUND
Network traffic monitoring is crucial in order to effectively identify and resolve network issues before they get worse. For example, network traffic monitoring generally helps to prevent outages that could cause network bottlenecks, as well as to correct network outages soon after they occur. Additionally, network traffic monitoring allows network operators to identify security threats and satisfy service level agreements (SLAs) in place with subscribers, and facilitates the ability of network devices to make re-routing decisions, and the like.
One common way to detect traffic outliers or unusual traffic is Heavy Hitter (HH) detection. A “Heavy Hitter” commonly refers to an entity that is the most powerful and influential among other entities. In terms of computer networking, and as used herein, a Heavy Hitter (HH) refers to a packet flow carrying a significant amount of traffic in terms of the number of packets and/or bytes (i.e. throughput in bits per second - bps) over a network link of a given bandwidth capacity (in bps). The ability to detect these heavy flows, also commonly referred to as “elephant flows,” is fundamental to many network management and security applications such as Denial of Service (DoS) attack prevention, flow-size aware routing, and Quality of Service (QoS) management.
HH detection techniques focus on the timely and accurate detection of a set of heavy flows. That is, HH detection should occur within a very short time and with a high degree of accuracy (i.e. detecting as many true positive HH flows as possible and avoiding false positives). Additionally, HH techniques should use as few memory resources as possible (e.g., maintain a limited amount of state information), have a low computation complexity (e.g., require few and simple calculations), and minimize communication overhead (e.g., send and receive collected data or network activity reports over out-of-band (OOB) networks).
Currently, there are many commercially available software products for detecting HH flows. For example, IOS NetFlow by Cisco and sFlow by sFlow.org are two traditional sampling-based network traffic measurement approaches commonly used to detect HH. In operation, both approaches randomly sample network packets, in which a network device (e.g., a router or a switch) processes only one randomly selected packet among n sequential packets in a flow. However, while the use of low sampling rates reduces packet processing overhead, data collection bandwidth, and the amount of data that needs to be collected, it sacrifices estimation accuracy. Other known approaches to HH detection aim to improve accuracy while, at the same time, reducing memory consumption. One such method, referred to as a “sample-and-hold” method, is described in an article authored by C. Estan and G. Varghese entitled “New directions in traffic measurement and accounting,” ACM Trans. Computer Systems, 21(3), 2003. These “sample-and-hold” methods count every packet in a sampled flow, in contrast to the approaches used by NetFlow, which count only those packets that are sampled.
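As a rough sketch of the sample-and-hold idea (the sampling probability and table handling are our assumptions, not details taken from the Estan and Varghese article):

```python
import random

def sample_and_hold(packets, p: float = 0.01) -> dict:
    """packets: iterable of flow IDs, one per packet, in arrival order.

    A flow enters the table with probability p per packet; once held,
    every later packet of that flow is counted, unlike NetFlow, which
    counts only the sampled packets themselves.
    """
    held = {}
    for flow_id in packets:
        if flow_id in held:
            held[flow_id] += 1        # held flow: count every packet
        elif random.random() < p:
            held[flow_id] = 1         # newly sampled flow: start holding
    return held
```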
Streaming algorithms are alternate methods to the packet sampling approaches discussed above. Streaming algorithms employ data structures with a bounded memory size and process every packet in a flow rather than only sampled packets.
Sketching algorithms, such as those employed by Count-min sketch and Count sketch methods, provide yet another approach to network traffic monitoring. These particular approaches, which are respectively described in papers authored by G. Cormode and S. Muthukrishnan (“An improved data stream summary: The count-min sketch and its applications,” Journal of Algorithms, 55(1):58-75, 2005), and M. Charikar, K. Chen and M. Farach-Colton (“Finding frequent items in data streams,” In Springer ICALP, 2002), are efficient algorithms that employ sublinear space for counting the streams of packets. More particularly, for every incoming packet, a number of hash functions are applied on the packet headers for indexing. The counters at every hashed location are incremented, and the minimum value among all the hashed locations is determined for a given bounded time. One drawback of these algorithms is hash collisions, which can cause the over-counting of packets and thus negatively impact accuracy.
Unlike sketching approaches, a space saving algorithm, such as the one described in the paper authored by A. Metwally, D. Agrawal, and A. El Abbadi and entitled “Efficient computation of frequent and top-k elements in data streams,” In International Conference on Database Theory, Springer, 2005, maintains both a key and a count for each incoming packet. Particularly, when a packet arrives, these algorithms check to determine whether its corresponding flow entry is stored in a table in memory. If so, the algorithm increments the corresponding count by 1. When the table is full, the algorithm replaces the table entry for the flow having the minimum count with a new flow entry with a count value equal to the minimum count + 1. However, searching for the minimum count in the table for each incoming packet increases the overall processing overhead.
HashPipe, described in the paper authored by Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., and Rexford, J. entitled “Heavy-hitter detection entirely in the data plane,” 2017, is a modified space saving algorithm (tailored to programmable switches based on P4/PISA) that maintains multiple stages of hash tables to reduce the number of memory reads. Outdated information is difficult to remove in such HashPipe approaches, however, which affects accuracy.
The Elastic Sketch data structure, described in a paper authored by T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, and S. Uhlig and entitled “Elastic sketch: Adaptive and fast network-wide measurements,” In Proceedings of the 2018 ACM SIGCOMM Conference. ACM, 2018, is more accurate than the HashPipe approach and easily removes the outdated information associated with that approach. This approach is based on counting buckets, with each bucket having multiple layers, and storing three pieces of information: flow ID, positive votes, and negative votes.
Therefore, conventional methods have used various efficient data structures and algorithms to detect HH flows with high accuracy, while at the same time, reducing memory consumption and overhead. However, most, if not all, of these conventional methods rely fundamentally on counting packets to detect HH flows. Maintaining such packet/traffic counters, in a switch hardware or a network node, for example, can be inherently problematic. Particularly, such methods can negatively impact memory consumption, communication and processing overhead, accuracy, and timeliness.
Some examples of the issues associated with conventional approaches include:
1 . Counter Overflow: With conventional methods, the counters can overflow more frequently. Such overflow can occur especially when a network device processes hundreds of millions of packets every second;
2. Counter Size: The size of the counters should be large enough to prevent the frequent intervention of a controller to flush all counters to 0. However, large counter sizes oc cupy excessive memory.
3. High Complexity: There is a high level of complexity in the data structures used by conventional methods. Therefore, in order to determine a current HH flow sta tus/classification, a more complex method is needed to update and remove outdated and/or ir relevant information;
4. Memory: With conventional methods, it is more difficult to simultaneously flush all memory of the data structures used for counting. This is especially true for time-windows span ning small time periods where the status of the counters typically change very quickly;
5. Detection and Resource Usage: Both the time needed for detection and the communication overhead are large when management nodes (i.e., controllers) run the HH detection algorithm;
6. Accuracy: The accuracy of sliding window approaches in detecting HH flows (i.e., those that focus on an amount of data transmitted/received over a pre-determined time window Tw) is low.
In most HH detection algorithms, counter overflow is an especially common problem. This issue, however, can generally be avoided by resetting the counters at regular intervals. Not only do regular counter resets avoid overflow, but they also remove outdated or irrelevant information. Such information is not required by a device to determine a current status of the HH flows, and can lead to falsely detecting an HH flow.

A solution to the counter overflow issue is presented in a paper authored by B. Turkovic, J. Oostenbrink, and F. Kuipers entitled "Detecting Heavy Hitters in the Data-plane," arXiv preprint arXiv:1902.06993 (2019). In this paper, the authors present a sliding window technique used to detect HH flows in which the sliding window contains information only for the last N packets. Using this technique, there is no need to reset counters at regular time intervals. However, there is a trade-off: the technique presented in this paper requires large amounts of memory to maintain state for the last N packets.
To reduce the frequency of counter resets, or the intervention of a controller, the counters can be set to a larger size, such as 32 bits, for example. As stated above, however, larger counter sizes occupy more memory space. This can be a particularly cumbersome problem - especially in switch-based Application Specific Integrated Circuits (ASICs), where the memory available to store stateful information is very limited.
Outdated information occupies memory space and leaves less space for HH flows. However, even with counter resets, removing the outdated and/or irrelevant information from data structures, such as the information associated with inactive flows and counters, is difficult. Such issues are detailed in the paper authored by Sivaraman, V., Narayana, S., Rottenstreich, O., Muthukrishnan, S., and Rexford, J. entitled "Heavy-hitter detection entirely in the data plane," 2017. Particularly, it must either be possible to flush the counters to 0, or techniques must use complex data structures. See T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li, and S. Uhlig, "Elastic sketch: Adaptive and fast network-wide measurements," In Proceedings of the 2018 ACM SIGCOMM Conference, ACM, 2018.
Similarly, duplicate entries in most packet-count-based streaming algorithms, such as HashPipe and Elastic Sketch, are difficult to remove or merge quickly. This, too, can negatively impact memory consumption and the overall accuracy of detecting HH flows.
Thus, conventional HH detection techniques have limitations. For example, conventional techniques typically consume large amounts of memory in order to maintain a traffic volume state. Conventional techniques also require the frequent intervention of a controller component, which undesirably increases the overall communication and processing overhead. Further, with some conventional techniques, HH flows detected in a previous time window can be lost and thus need to be detected again. The need for such "re-detection" of the same HH flow can increase detection time and decrease accuracy.
SUMMARY
The embodiments of the present disclosure employ an Inter-Packet Gap (IPG)-based analysis to detect and classify a given data packet flow received at a network node as being an HH flow. More specifically, the embodiments herein improve HH flow classification using low complexity methods and limited memory consumption, and reduce the time the network node needs to detect an HH flow, while greatly increasing the accuracy of HH flow detection.

In a first aspect, the present disclosure provides a method, performed by a network node, for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow. In this aspect, the method comprises receiving a flow of data packets at a network node, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determining one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determining the flow to be an HH flow based on an analysis of the IPG values.
In a second aspect, the present disclosure provides a network node comprising communications circuitry configured to communicate with one or more network nodes, and processing circuitry operatively connected to the communications circuitry. The processing circuitry is configured to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
In a third aspect, the present disclosure provides a non-transitory computer-readable medium storing a computer program thereon. The computer program comprises instructions that, when executed by processing circuitry of a network node, cause the network node to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
In a fourth aspect, the present disclosure provides a computer program comprising executable instructions that, when executed by processing circuitry of a network node, cause the network node to receive a flow of data packets, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received, determine one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets, and determine the flow to be an HH flow based on an analysis of the IPG values.
In a fifth aspect, the present disclosure provides a carrier containing the computer program according to the fourth aspect. The carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
In a sixth aspect, the present disclosure provides a method, performed by a network node, for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow. In this aspect, the method comprises receiving a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determining an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determining the flow to be an HH flow based on an analysis of the IPG values.
In a seventh aspect, the present disclosure provides a network node comprising communications circuitry configured to communicate with one or more network nodes, and processing circuitry operatively connected to the communications circuitry. In this aspect, the processing circuitry is configured to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
In an eighth aspect, the present disclosure provides a non-transitory computer-readable medium storing a computer program. The computer program comprises instructions that, when executed by processing circuitry of a network node, cause the network node to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
In a ninth aspect, the present disclosure provides a computer program comprising executable instructions that, when executed by processing circuitry of a network node, cause the network node to receive a plurality of data packet flows at a network node, wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node, determine an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet, and determine the flow to be an HH flow based on an analysis of the IPG values.
In a tenth aspect, the present disclosure provides a carrier containing the computer program according to the ninth aspect. The carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1B illustrate respective mouse and elephant flows suitable for use with embodiments of the present disclosure.

Figures 2A-2B illustrate exemplary data structures suitable for implementation with embodiments of the present disclosure.
Figures 3A-3C illustrate the IPG-based HH classification method applied to a sample trace according to one embodiment.
Figures 4A-4C are graphs illustrating that IPGw values calculated according to the present embodiments indicate an HH flow.
Figure 5 is a flow diagram illustrating a method for determining whether a given data flow received at a network node is an HH flow according to embodiments of the present disclosure.
Figure 6 is a flow diagram illustrating a more detailed method for determining whether a given data flow received at a network node is an HH flow according to embodiments of the present disclosure.
Figure 7 graphically illustrates the method of the embodiment described in Figure 6.
Figures 8A-8C are graphs illustrating how the false negative rate and the false positive rate of identifying HH flows relate to the size of a hash table for different numbers of HH flows according to embodiments of the present disclosure.
Figure 9 is a functional block diagram of a network node configured according to one embodiment of the present disclosure.
Figure 10 illustrates a computer program product executing on the processing circuitry of a network node according to one embodiment of the present disclosure.
DETAILED DESCRIPTION
Referring now to the drawings, exemplary embodiments of the present disclosure provide a technique that uses Inter-Packet Gap (IPG) analysis to identify Heavy Hitter (HH) flows, rather than the packet count approaches used by conventional methods. According to the present embodiments, a Heavy Hitter flow can be characterized by small IPG values (i.e., the elapsed time intervals between two consecutively received data packets). By definition, the throughput (i.e., heaviness) of a packet flow can be approximated by dividing the average packet size by the average IPG value (e.g., a 1 KB packet every 1 ms equals 8 Mbps).
According to the present disclosure, one way to calculate an IPG value is to determine the time differences between the ingress timestamps of consecutively received packets in a flow. An IPG metric for the flow is then determined as a function (e.g., an average) of those determined IPG values. For example, one embodiment of the present disclosure utilizes an exponential weighted moving average (EWMA) of the IPG values in a flow to determine the IPG metric for the flow. This metric is then analyzed over a period of time in order to determine whether the flow should or should not be classified as an HH flow. The computed IPG metric can also be used beyond such a "binary" classification of an HH flow (i.e., is or is not a given flow an HH flow). For example, in one embodiment, the present disclosure also provides an approximate ranking of multiple flows with respect to their throughput or "heaviness."
According to the present disclosure, two pieces of information are maintained per target flow: (1) the EWMA (i.e., the "IPGw") calculated based on the IPG values; and (2) the ingress timestamp (TS) of the last received packet in the flow. There are various ways to represent these parameters in terms of memory size; one approach is to allocate 32 bits to each of the IPGw and TS values. Another approach would be to use a compact number representation (e.g., a 16-bit integer conversion format adapted from IEEE 754, or bfloat16). This latter approach would be more space-efficient than allocating 32 bits, albeit with some degradation in precision. However, it would still provide the requisite range and overall effectiveness of the method.
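A bfloat16-style compaction of this kind can be illustrated as follows; the helper names and the microsecond IPGw value are hypothetical, and the scheme simply keeps the top 16 bits of an IEEE 754 float32 encoding (sign, exponent, and 7 mantissa bits), preserving range while giving up some precision.

import struct

def to_compact16(value):
    # Keep the top 16 bits of the float32 encoding (bfloat16-style).
    bits32, = struct.unpack("<I", struct.pack("<f", value))
    return (bits32 >> 16) & 0xFFFF

def from_compact16(bits16):
    value, = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value

ipgw = 1234.567  # hypothetical IPGw value in microseconds
print(from_compact16(to_compact16(ipgw)))  # ~1232.0: range kept, precision reduced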
Additionally, according to the present disclosure, HH flows can be classified into ranges of IPG metrics, with each range corresponding to a "bucket" identified by an HH Bucket ID (e.g., an 8-bit identifier). Each HH Bucket ID is unique and is mapped to a corresponding IPGw value (e.g., a mean value within the IPG range of the HH Bucket ID) used for IPGw calculations.
Embodiments of the present disclosure provide advantages and benefits that conventional methods of HH flow detection do not provide. One such advantage is that an IPG-based implementation is a very lightweight approach when compared to traditional packet count approaches. Particularly, conventional counter-based HH flow detection generally requires 32-bit counters in order to maintain the requisite information in a hash table. The IPG-based embodiments of the present disclosure, in contrast, are able to detect HH flows by maintaining 16 bits or less for the TS parameters and 8 bits or less for each HH Bucket ID. Further, the present embodiments do not introduce any additional limitations in terms of time windows, or require a controller to reset counters. Moreover, different, more efficient data structures and techniques are possible with the present embodiments, thereby making IPG-based HH detection more flexible in terms of memory and accuracy.
Additionally, the IPG-based approaches of the present disclosure are suitable for operation with most of the existing algorithms and data structures used to detect HH flows, with or without minor modifications. Further, the present IPG-based methods improve accuracy and detection time, and reduce the overhead associated with computation, memory, and communication.
The embodiments provided by the present disclosure are also suitable for implementation in emerging programmable networking devices, such as Protocol Independent Switch Architecture (PISA) switches supporting the P4 language, and Smart Network Interface Cards (SmartNICs) that may exist in end systems. This is in addition to being implemented in traditional computing systems based on general purpose processors (x86).
Further, as stated previously, conventional methods that utilize a packet count approach are required to periodically and regularly reset certain count values. With other conventional methods, the removal of duplicate entries from memory is a challenging task. The present embodiments, however, do not require the reset of IPGw values at regular intervals to prevent overflow or remove outdated information. Therefore, the present embodiments decrease detection time and complexity, and increase accuracy.
Turning now to the drawings, embodiments of the present disclosure utilize IPG analysis, sometimes referred to as "inter-packet time" analysis, as an effective technique to detect HH flows.
In particular, Figures 1A-1B illustrate two types of packet flows being received at a network device - a so-called "mouse" flow 10 (Figure 1A) and a so-called "elephant" flow 20 (Figure 1B). Each flow comprises a plurality of received data packets with corresponding IPGs between consecutively received packets. For example, mouse flow 10 shows data packets 12-18 having corresponding IPGs IPG1...IPG3. Elephant flow 20, however, shows data packets 22-32 having corresponding IPGs IPG1...IPG5.
Figures 1A-1B illustrate how the number of packets in a given flow relates to the IPG-based method of the present embodiments. A flow such as mouse flow 10, having a small number of packets with relatively large IPGs over a given time window, may not be considered to be an HH flow. However, a flow such as elephant flow 20, having a large number of packets with small IPG values over a given time window, can be considered to be an HH flow. Particularly, both the large number of data packets 22-32 in the elephant flow 20 and the relatively small IPG values IPG1...IPG5 are indicators of a high throughput flow, i.e., an HH flow.
Figures 2A-2B illustrate hash tables for maintaining a per-flow state for packet data flows, such as mouse flow 10 and elephant flow 20, according to the present embodiments. In particular, hash table 50 of Figure 2A maintains two variables. Variable k, which is optional, represents an identifier (ID) or an IP source address of a 5-tuple flow. Variable c, which is mandatory, indicates a total number of packets in a flow received over a given period of time t (e.g., a pre-defined time window). With hash table 50, a network device counting the data packets received in a given flow k would increment c with each counted packet.
Figure 2B illustrates a hash table 60 configured according to at least one embodiment of the present disclosure. In this embodiment, the network device monitoring the flows maintains hash table 60 with three variables: k, which as described above is an optional variable identifying either the flow ID or the source address of the given flow; IPGw, indicating the weighted IPG metric; and TSL, which is the value of the ingress TS of the last received data packet. As stated previously, the IPGw metric is a value that is calculated as the EWMA of the individual IPGs of a flow (e.g., IPG1...IPG5 of elephant flow 20 seen in Figure 1B) along with the ingress TS of the last received data packet (TSL).
There are different ways to determine the IPGw metric for a given flow according to the present disclosure. However, in one embodiment, the calculations are based on several variables:
• the ingress TS of a last received data packet (TSL);
• the ingress TSN of a next consecutively received data packet;
• the weighted metric (IPGw) for the flow; and
• the degree of weighting decrease a.
More particularly, in one embodiment, the ingress TS of the last received data packet (TSL) is first set to the value of the ingress TSN of a next consecutively received data packet.
(1) TSL = TSN
Thereafter, each time the next consecutive data packet in the same flow is received at the network device, the value of TSL is subtracted from the ingress TS of that received data packet (i.e., TSN) to calculate IPGN. In other words, IPGN is a time difference value equal to the elapsed time between the ingress TSs of two consecutively received data packets - the ingress TS of the last received data packet (TSL) and the ingress TS of the next consecutively received data packet (TSN).
(2) IPGN = TSN - TSL
The weighted metric of the IPG (IPGw) is then calculated using the following exponential weighted moving average (EWMA) formula:
(3) IPGw = a · IPGW_LAST + (1 - a) · IPGN

where: a is a degree of weighting decrease; and
IPGW_LAST is the last determined IPGw metric for the flow.
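Equations (1)-(3) amount to a small per-packet update, sketched below in Python under stated assumptions: timestamps are in microseconds, alpha = 0.9 is an arbitrary tuning value, and seeding IPGw with the first measured gap is an assumption, since the initial value is left open here.

def update_flow_state(state, flow_id, ts_now, alpha=0.9):
    # state maps flow_id -> (IPGw, TS_L)
    if flow_id not in state:
        state[flow_id] = (None, ts_now)      # Eq. (1): record TS_L on the first packet
        return
    ipgw_last, ts_last = state[flow_id]
    ipg_n = ts_now - ts_last                 # Eq. (2): IPG_N = TS_N - TS_L
    # Eq. (3): EWMA; the first gap seeds the metric (an assumed initialization)
    ipgw = ipg_n if ipgw_last is None else alpha * ipgw_last + (1 - alpha) * ipg_n
    state[flow_id] = (ipgw, ts_now)

state = {}
for ts in (0, 100, 210, 295, 410):           # hypothetical ingress timestamps
    update_flow_state(state, "flowA", ts)
print(state["flowA"])                        # a small, stable IPGw suggests a heavy flow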
The value of 'a' plays a significant role in tuning accuracy in the present embodiments. This is evidenced, as seen in more detail below, by the results of using the IPG-based method of the present disclosure on actual traces to detect HH flows. In particular, the results indicate that the value of a can be optimized not only for different time window sizes, but also for different numbers of top-ranked HH flows and/or different HH throughput thresholds. According to the present disclosure, the value of a can be easily adapted at run-time based on one or more dynamic considerations, such as the amount of incoming traffic and measurement parameters.
It should be noted that although this embodiment explains how to calculate IPGw according to Equation (3), the present disclosure is not so limited. Any number of alternative algorithms are suitable for use in calculating the IPGw metric for a given flow, and for effectively classifying a data packet flow as being an HH flow based on those calculations.
The method of the present disclosure was tested on actual ISP backbone traffic traces to determine whether a given flow was or was not an HH flow. The traces used for testing are described by the Center for Applied Internet Data Analysis (CAIDA) in "The CAIDA UCSD Anonymized Internet Traces 2016 - March," http://www.caida.org/data/passive/passive_2016_dataset.xml. The results of those tests are illustrated in the graphs seen in Figures 3A-3C and 4A-4C. In particular, the patterns of IPGN and IPGw were analyzed using equations (1)-(3) above with the actual ISP backbone traffic traces. Among the several recorded traces of 2016, Figures 3A-3B illustrate one sample trace of 48 secs over a 10G interface. There were a total of 20,295,101 data packets in the trace and 693,931 different 5-tuple flows.
In Figures 3A-3C, the IPG-based method of the present disclosure was implemented in a Python script to process the Comma Separated Value (CSV) file from a target Packet Capture (PCAP) trace. In a first step, an IPGN value is calculated based on equation (2) for every incoming data packet of each flow. For simplicity, 3-4 random flows were selected for each of three different ranges of data throughput. Figure 3A illustrates the selected flows with throughput > 10 Mbps, while Figures 3B and 3C illustrate the selected flows with throughput between 1-2 Mbps and between 0.1-0.5 Mbps, respectively.
The subplots in Figure 3A illustrate the IPGN values for different time intervals of a flow, where each interval contains 1000 packets. Since most of the packets come in bursts, spikes of up to 12k microseconds at substantially regular intervals can be observed, which indicate the Inter-Burst Gap (IBG). Generally, the IPGN values for all three selected flows in Figure 3A are below 1k microseconds. Figures 3B and 3C, in contrast, illustrate larger IPGN values with relatively large spikes and decreasing throughput.
Similar patterns are observed for the calculated IPGw values illustrated in Figures 4A-4C. Particularly, the graphs in these figures confirm that long data packet flows having a high throughput for a fixed time interval exhibit lower IPGw values when compared to the IPGw values calculated for low throughput flows.
Figure 5 is a flow diagram illustrating a method 70 for classifying a given flow as an HH flow according to one embodiment of the present disclosure. Method 70, as described herein, is implemented by a network node. However, those of ordinary skill in the art will readily appreciate that this is for illustrative purposes only. The embodiments of the present disclosure are suitable for implementation in a variety of different network nodes and in various data plane mechanisms, pipelines, and data structures supported by the network nodes. This includes both existing/traditional nodes and architectures and emerging programmable architectures.
As seen in Figure 5, method 70 begins with the network node receiving a flow of data packets (box 72). Typically, the network node will receive multiple flows, each having a plurality of corresponding data packets; however, for ease of discussion, method 70 is explained in the context of a single flow. Regardless, the flow of Figure 5 is received in a unidirectional stream of 5-tuple flows, and each data packet in the flow has an ingress TS indicating a time at which it was received at the network node.
Using Equation (2) above, the network node next determines the IPGN value for each IPG between consecutively received data packets (box 74). As previously described, the IPGN value is the elapsed time between two consecutively received data packets. Then, for each determined IPGN value, the network node uses Equation (3) to calculate the IPGw value for the flow (box 76), and updates that value in hash table 60. Additionally, as previously described, the network node also updates other variables in hash table 60, such as the ingress TS of the last received data packet (i.e., TSL), and optionally, the variable k identifying the flow. Once the values are calculated, the network node determines whether the given flow is an HH flow based on the IPG values (box 78).
As previously described, an HH flow can be detected based on various strategies. In one embodiment, for example, the network node compares the calculated IPGw value to a pre-defined IPGw threshold each time the IPGw value is updated. If the calculated IPGw value exceeds the IPGw threshold, the flow would be classified as an HH flow.
In another embodiment, the network node ranks the flow against other flows received in a network stream. Particularly, as described above, data packet flows having a high throughput for a fixed time interval exhibit lower IPGw values than do packet data flows having a low throughput. Therefore, "high throughput" flows having lower IPGw values would typically be at the "bottom" of the ranked IPGw values. With this embodiment, the network node may determine that the flows associated with the r-most "bottom-ranked" IPGw values (i.e., where 'r' is a predetermined integer) are HH flows.
In yet another embodiment, the network node determines whether a given flow is an HH flow based on IPGw value metrics. In more detail, the network node may rank the IPGw values, as previously described. If the ranked IPGw value for a given flow falls within a predetermined percentile of all ranked flows (e.g., if the ranked IPGw value for the given flow is in the "bottom" 5% of the ranked IPGw values), the network node may classify the given flow as an HH flow.
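These strategies can be sketched over a snapshot of per-flow IPGw values as follows; the threshold, k, and percentile cut-offs are assumed, deployment-specific parameters, and the threshold test is written here as an upper bound on IPGw, one reading that reflects how heavier flows exhibit smaller gaps (Figures 4A-4C).

def classify_hh(flows, ipgw_threshold=None, bottom_k=None, percentile=None):
    # flows maps flow_id -> current IPGw; smallest IPGw = heaviest flow.
    ranked = sorted(flows, key=flows.get)
    if ipgw_threshold is not None:
        return {f for f, w in flows.items() if w <= ipgw_threshold}
    if bottom_k is not None:
        return set(ranked[:bottom_k])        # the r-most bottom-ranked flows
    if percentile is not None:
        cutoff = max(1, int(len(ranked) * percentile / 100))
        return set(ranked[:cutoff])          # e.g., the bottom 5% of IPGw values
    return set()

flows = {"A": 120.0, "B": 4500.0, "C": 95.0, "D": 20000.0}  # IPGw in microseconds
print(classify_hh(flows, bottom_k=2))     # {'A', 'C'}
print(classify_hh(flows, percentile=25))  # {'C'}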
Figure 6 is a flow diagram of a method 80, implemented at a network node, illustrating the present embodiments in more detail. Figure 7 graphically illustrates the steps performed by the network node in Figure 6. In this embodiment, the IPG-based method of the present disclosure is applied using a simple memory-efficient data structure influenced by a space saving algorithm, such as the one described in the paper by A. Metwally, D. Agrawal, and A. El Abbadi, entitled "Efficient computation of frequent and top-k elements in data streams," In International Conference on Database Theory, Springer, 2005. In particular, this embodiment assigns a finite number of memory slots, called a flow table (e.g., hash table 60), in order to maintain a per-flow state.
As seen in Figure 6, the network node first receives the data packets of one or more incoming flows (box 82), and determines the 5-tuple flow ID for each flow and the ingress TS of each incoming data packet for each flow (box 84). The flow IDs, as stated above, may be determined from information associated with the received flows. Then, for a given flow, the network node searches hash table 60 for the flow ID of the given flow (box 86). If the flow ID is already in hash table 60, the network node calculates the IPGw value using equations (2) and (3) above (box 88), and updates the memory slot in hash table 60 corresponding to the flow ID accordingly. More particularly, the network node updates the IPGw value in hash table 60 to be the newly calculated IPGw value, and sets the ingress TS of the last received data packet TSL to the ingress TS of the received data packet TSN (box 90).

However, if the search for the flow ID reveals that it is not already in hash table 60 (box 86), the network node will determine whether hash table 60 has a "free" or "open" slot (box 92). If there are no free slots in hash table 60, the network node locates the slot having the highest IPGw value and replaces or "overwrites" the information in that slot. As described above, flows that are not considered to be HH flow candidates will have IPGw values that are higher than the IPGw values of flows that are considered to be HH flow candidates. Therefore, because the entry having the highest IPGw value is not considered to be an HH candidate, the network node replaces its IPGw and TSL values in hash table 60 with the newly calculated IPGw and TSN values of the received data packet (box 94). If a free slot is available, however, the network node simply inserts the flow ID and IPGw values associated with the received data packet into the free slot, and sets the TSL value of that slot to the TSN of the received data packet (box 96). The network node then follows these same steps for each incoming data packet in each flow, and determines, based on the information stored in hash table 60 (e.g., the respective IPGw values), which of the flows should be classified as HH flows. A sketch of this per-packet maintenance loop is given below.
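The following is a minimal Python sketch of the loop, under stated assumptions: the seed IPGw for a brand-new entry is not fixed by the text (0 is used here, which biases young flows toward HH candidacy), and the eviction step keeps the evicted entry's IPGw for the newcomer, which mirrors the validation setup described further below rather than the replace-with-new-values variant of Figure 6.

def process_packet(flow_table, flow_id, ts_now, capacity, alpha=0.9):
    # flow_table maps flow_id -> (IPGw, TS_L); capacity is the fixed slot count.
    if flow_id in flow_table:                        # box 86: table hit
        ipgw_last, ts_last = flow_table[flow_id]
        ipg_n = ts_now - ts_last                     # box 88: Eq. (2)
        ipgw = alpha * ipgw_last + (1 - alpha) * ipg_n   # Eq. (3)
        flow_table[flow_id] = (ipgw, ts_now)         # box 90: TS_L = TS_N
    elif len(flow_table) < capacity:                 # box 92: free slot?
        flow_table[flow_id] = (0.0, ts_now)          # box 96: seed IPGw is an assumption
    else:
        # box 94: evict the slot with the maximum IPGw, i.e. the weakest HH
        # candidate; the newcomer inherits that IPGw (one assumed variant).
        victim = max(flow_table, key=lambda f: flow_table[f][0])
        inherited_ipgw, _ = flow_table.pop(victim)
        flow_table[flow_id] = (inherited_ipgw, ts_now)

table = {}
for flow_id, ts in [("A", 0), ("B", 5), ("C", 10), ("A", 20), ("A", 40), ("B", 400), ("D", 500)]:
    process_packet(table, flow_id, ts, capacity=3)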
As previously stated, the IPG-based method of the present disclosure is suitable for implementation in most of the existing HH detection mechanisms and network device data path pipelines. Figures 8A-8C graphically illustrate the results of one such implementation.
Particularly, validation testing of the IPG-based method described herein followed the approach described in the paper by A. Metwally, D. Agrawal, and A. El Abbadi, entitled "Efficient computation of frequent and top-k elements in data streams," In International Conference on Database Theory, Springer, 2005. This approach very accurately ranks and reports both the "top" k flows and the frequent elements with a small margin of error. Then, similar to the approaches described in existing networking literature, a long CAIDA trace, such as the one described by the Center for Applied Internet Data Analysis (CAIDA) in "The CAIDA UCSD Anonymized Internet Traces 2016 - March," http://www.caida.org/data/passive/passive_2016_dataset.xml, was used as test data. The data was split into small chunks based on time window sizes, as detailed in Table 1 below. For each time window, the tests used about 50 chunks. Each chunk is considered to be one "trial," and each data point in the graphs illustrated in Figures 8A-8C represents the average of 50 trials. Additionally, three different sizes of time windows are used so as to better evaluate how the IPG-based HH detection method of the present disclosure performs for different time windows.
Further, in Figures 8A-8C, the IPG-based HH detection method was implemented in a Python-based simulator that reads each incoming packet of multiple, different flows. For a data structure (e.g., hash table 60), the test utilized an array of n memory slots, with each memory slot storing a tuple (Flow ID, IPGw, TSL) for a corresponding one of the incoming flows. As was seen in Figures 5-7, upon receiving a data packet for a given flow, the network node checks hash table 60 to determine whether an entry already exists for that flow. If so, the IPG-based HH detection method of the present disclosure updates the corresponding memory slot with new ingress timestamp (i.e., TSL = TSN) and IPGw values. If the flow does not exist and space is available in the flow table, the network node inserts the tuple (Flow ID, IPGw, TSL) as a new entry. If hash table 60 is full, and the flow ID does not exist in hash table 60, the present embodiments replace the Flow ID in the slot having the maximum IPGw value with the Flow ID of the incoming packet flow, but keep the same IPGw. After processing about 50 chunks for each time window, the following accuracy results in terms of false positives and false negatives were obtained.
Figures 8A-8C illustrate the False Negative rate (i.e., the number of true HH flows not reported as top-k HH, divided by all analyzed flows) and the False Positive rate (i.e., the number of non-HH flows falsely reported as top-k HH, divided by all analyzed flows) as a function of an increasing number of memory slots, for different numbers of top-k reported HH flows to be identified. As seen in these figures, the false negatives and false positives decrease as more memory slots are allocated. Thus, there is a relationship between the size of hash table 60 and the rate at which HH flows are correctly or incorrectly classified.
More particularly, Figures 8A(iii), 8B(vi), and 8C(ix) illustrate the False Negative rate for 50, 100, and 200 top flows for selected trace flows #70, #120, and #220, respectively. Based on these figures, it is seen that the false negative rate is reduced to as low as 0.04. This indicates that the IPG-based approach of the present embodiments can miss detecting flows at the tail end of the top k flows, where the difference in throughput between the different flows is very small. Hence, when flows are over-reported to detect the top k flows (e.g., the top k+20 flows are reported), the false negative rate decreases. Fine tuning of the a parameter value can help to detect HHs at the tail end of the top 'k' flows, or a small number of over-reported flows to detect k flows can drastically increase accuracy.
The false negative rate for different sizes of time window also confirms that the IPG-based HH detection and classification method of the present disclosure can detect HH flows with high accuracy among varying numbers of total flows, such as 15,000, 45,000, and 70,000 flows.
Size of Time Window | Total No. of 5-Tuple Flows | Total No. of Packets
100 msec | 15000 | 50000
500 msec | 45000 | 270000
1000 msec | 70000 | 530000

Table 1. Split of a CAIDA long traffic trace into small chunks based on time windows of different sizes
Accordingly, the IPG-based method for HH flow detection according to the present disclosure provides at least the following advantages when compared to conventional methods for detecting HH flows:
• Compatibility and Simplicity: The IPG-based approach of the present disclosure is easy to implement and is suitable for use with most existing algorithms and data plane technologies for detecting HH flows. Additionally, the present embodiments reduce the complexity needed for detecting HH flows, which translates into the use of fewer computational and communication resources. For example, by using compact timestamps and low IPG metrics encoding as a means and indicator for detecting and identifying HH flows, the present embodiments utilize fewer bits than conventional packet counter approaches that use 32 bits to prevent counter overflow. Additionally, while most existing memory-efficient HH detection solutions focus only on a low memory footprint, they ignore the number of memory accesses (typically just one read/modify/write per data structure).
• Data plane only, without controller intervention: As previously stated, the present embodiments do not require resetting the IPGw metric at regular time intervals, as do conventional solutions based on counting packets. Thus, the present embodiments mitigate the frequent intervention of a controller, which contributes to improved accuracy and lower communication costs. Further, the ability of a standalone networking device to detect an HH flow merely by inspecting a current IPGw metric associated with the flow allows for the timely execution of data plane-only actions to handle the HH flow (e.g., applying a QoS policy such as rate-limiting or queue selection based on the IPGw metric).
• Accurate HH flow ranking: Moreover, the present embodiments move beyond the simple binary classification of an HH flow (i.e., is/is not a given flow an HH flow) based on a preconfigured packet count threshold over a given time window. Particularly, the IPG-based method of the present disclosure provides a native way to rank flows based on their IPGw metrics. While some conventional counter-based approaches also rank flows, none use a memory-efficient approach based on probabilistic data structures (e.g., hashing-based, Bloom filter-like). Moreover, conventional approaches, such as those that are based on the so-called "sketching algorithms," are not capable of reporting arbitrary rankings of top HH flows.
An apparatus can perform any of the methods herein described by implementing any functional means, modules, units, or circuitry. In one embodiment, for example, the apparatus comprises respective circuits or circuitry configured to perform the steps shown in the method figures. The circuits or circuitry in this regard may comprise circuits dedicated to performing certain functional processing and/or one or more microprocessors in conjunction with memory. For instance, the circuitry may include one or more microprocessors or microcontrollers, as well as other digital hardware, which may include Digital Signal Processors (DSPs), special-purpose digital logic, and the like. The processing circuitry may be configured to execute program code stored in memory, which may include one or several types of memory such as read-only memory (ROM), random-access memory, cache memory, flash memory devices, optical storage devices, etc. Program code stored in memory may include program instructions for executing one or more telecommunications and/or data communications protocols as well as instructions for carrying out one or more of the methods described herein, in several embodiments. In embodiments that employ memory, the memory stores program code that, when executed by the one or more processors, carries out the methods described herein.
Figure 9, for example, is a block diagram of some functional components of a network node 100 configured according to one embodiment of the present disclosure. The network node 100 can be configured to implement the procedures and methods for HH flow classification as herein described, and comprises processing circuitry 102, memory 104, and communications circuitry 108.
The communications circuitry 108 comprises interface circuitry for communicating with other network nodes in a computer network. In particular, the incoming data packets of a plurality of flows are received by the communications circuitry 108. Processing circuitry 102 controls the overall operation of the network node 100 and is configured to implement the procedures shown in Figures 5-7. The processing circuitry 102 may comprise one or more microprocessors, hardware, firmware, or a combination thereof configured to perform the methods shown in Figures 5-7.
Memory circuitry 104 comprises both volatile and non-volatile memory for storing computer program code and data needed by the processing circuitry 102 for operation. Memory circuitry 104 may comprise any tangible, non-transitory computer-readable storage medium for storing data, including electronic, magnetic, optical, electromagnetic, or semiconductor data storage. Memory circuitry 104 stores a computer program 106 comprising executable instructions that configure the processing circuitry 102 to implement the methods illustrated and discussed with respect to Figures 5-7. A computer program in this regard may comprise one or more code modules corresponding to the means or units described above. In general, computer program instructions and configuration information are stored in a non-volatile memory, such as a ROM, erasable programmable read-only memory (EPROM), or flash memory. Temporary data generated during operation may be stored in a volatile memory, such as a random access memory (RAM). In some embodiments, computer program 106 for configuring the processing circuitry 102 as herein described may be stored in a removable memory, such as a portable compact disc, portable digital video disc, or other removable media. The computer program 106 may also be embodied in a carrier such as an electronic signal, optical signal, radio signal, or computer readable storage medium.
Figure 10 illustrates a computer program product, such as computer program 106, executing on the processing circuitry 102 of network node 100 according to one embodiment of the present disclosure. As seen in Figure 10, computer program 106 comprises a communications module/unit 110, an IPG determination module/unit 112, an IPGw determination module/unit 114, a Heavy Hitter (HH) determination module/unit 116, and a flow table update module/unit 118.
The communications module/unit 110 is configured to send and receive messages to and from other nodes in a computer network, and more specifically, to receive flows of data packets from those other nodes, as previously described. Each data packet in each flow is timestamped with a corresponding ingress TS indicating a time of receipt at network node 100. The IPG determination module/unit 112 is configured to calculate the IPGN value between two consecutively received data packets of a flow. Particularly, in one embodiment, the IPG determination module/unit 112 determines the IPGN values using Equation (2), as previously described. The IPGw determination module/unit 114 is configured to calculate the IPGw values as an exponential weighted moving average (EWMA) of the IPGN values in a flow, as previously described. More specifically, one embodiment of the present disclosure determines the IPGw values using Equation (3), as previously described. The HH determination module/unit 116 is configured to analyze the calculated IPGw values for the incoming flows and, based on that analysis, classify a given flow as being an HH flow, as previously described. The flow table update module/unit 118 is configured to determine whether the flow ID for a given incoming flow of data packets already exists in a flow table (e.g., hash table 60), and to update the flow table based on that determination. Particularly, the flow table update module/unit 118 is configured to update the IPGw and TSL values for Flow IDs that already exist in hash table 60, as previously described. If the Flow ID for the given flow does not exist in the flow table, however, the flow table update module/unit 118 is configured to either insert a tuple (Flow ID, IPGw, TSL) for the given flow into the table (if a free slot in the flow table exists), or to overwrite the information associated with a flow that is not considered to be an HH flow with a new tuple (Flow ID, IPGw, TSL) associated with the received data packet (if a free slot does not exist in the flow table).
The present embodiments may, of course, be carried out in other ways than those specifically set forth herein without departing from essential characteristics of the embodiments. For example, the embodiments described herein discuss the present disclosure in terms of a "flow." However, those of ordinary skill in the art should appreciate that the concept of a "flow" is not restricted to a particular definition of flow in a packet network stream. Rather, the methods described herein are suitable for use with any of the commonly used granularities of packet flows including, but not limited to, all packets between a source IP and destination IP, between a source IP and destination IP and specific transport protocol ports, all packets from a specific source IP, and the like. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended embodiments are intended to be embraced therein.


CLAIMS

What is claimed is:
1. A method (70), performed by a network node (100), for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow, the method comprising: receiving (72) a flow of data packets at a network node, wherein each data packet has an ingress timestamp (TS) indicating a time at which the data packet was received; determining (74) one or more Inter-Packet Gap (IPG) values for the flow, wherein each IPG value is a time difference between the ingress TSs of two consecutive data packets; and determining (78) the flow to be an HH flow based on an analysis of the IPG values.
2. The method of claim 1 wherein each of the one or more IPG values is determined by:

IPGN = TSN - TSN-1

where: N is a count of the total number of data packets of the flow received at the network node;

IPGN is the IPG between the Nth data packet received at the network node and the (N-1)th data packet received at the network node;

TSN-1 is the ingress TS of a first data packet received at the network node; and

TSN is the ingress TS of a second data packet received at the network node consecutively after the first data packet.
3. The method of claims 1 and 2 further comprising determining (76) a weighted IPG (IPGw) value for the flow as a function of an exponential weighted moving average (EWMA) equation:

IPGw = a · IPGW_LAST + (1 - a) · IPGN

where: a is a degree of weighting decrease;

IPGN is the IPG between the Nth data packet received at the network node and the (N-1)th data packet received at the network node; and

IPGW_LAST is the last determined IPGw value.
4. The method of claim 3 wherein a is variable at runtime based on one or more parameter values associated with received data packet traffic and measurements on the received data packet traffic.
5. The method of any of claims 1-4 wherein determining the flow to be an HH flow is based on one or more of the IPGw values.
6. The method of any of claims 1-5 wherein determining the flow to be an HH flow is based on a comparison of the IPGw value to a predefined IPGw threshold value.
7. The method of any of claims 1-5 wherein determining the flow to be an HH flow is based on a ranking of the IPGw value of the flow in comparison to the ranking of the IPGw values of one or more other flows being monitored.
8. The method of any of claims 1-5 wherein determining the flow to be an HH flow is based on whether the IPGw value of the flow falls within a pre-defined percentage of HH flows.
9. A network node (100) comprising: communications circuitry (108) configured to communicate with one or more network nodes; and processing circuitry (102) operatively connected to the communications circuitry and configured to perform any of the claims 1-8.
10. A non-transitory computer-readable medium (104) storing a computer program (106) thereon, the computer program comprising instructions that, when executed by processing circuitry of a network node, causes the network node to perform any of the claims 1-8.
11. A computer program (106) comprising executable instructions that, when executed by processing circuitry (102) of a network node (100), causes the network node to perform any one of the methods of claims 1-8.
12. A carrier containing a computer program of claim 11, wherein the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
13. A method (80), performed by a network node (100), for determining whether a flow of data packets in a network stream is a Heavy Hitter (HH) flow, the method comprising: receiving (82) a plurality of data packet flows at a network node (100), wherein each flow comprises a plurality of incoming data packets, and wherein each incoming data packet has an ingress timestamp (TS) indicating a time at which it was received at the network node; determining (88) an Inter-Packet Gap (IPG) value for each incoming data packet in each flow, wherein the IPG value is a time difference between the ingress TS of the incoming data packet and the ingress TS of a last received data packet; and determining (78) the flow to be an HH flow based on an analysis of the IPG values.
14. The method of claim 13 further comprising determining (84) a flow ID for each flow, and the ingress TS for each incoming data packet in the flow.
15. The method of claims 13-14 further comprising determining (86) whether the flow ID is in a flow table having a predetermined number of memory slots.
16. The method of claim 15 wherein, for each flow, the IPG value of each incoming data packet is determined responsive to determining that the flow ID of the flow is in the flow table.
17. The method of any of claims 13-16 further comprising, responsive to determining that the flow ID is in the flow table, determining (88) a weighted IPG (IPGw) for each flow as a function of: the IPG values of the incoming data packets in the flow; and a variable a indicating a degree of weighting decrease.
18. The method of any of claims 13-17 further comprising updating (90) a slot in the flow table with the flow ID, the IPGw, and the ingress TS of the incoming data packet responsive to determining that the flow ID is in the flow table.
19. The method of any of claims 13-15 further comprising determining (92) whether the flow table has an open slot responsive to determining that the flow ID is not in the flow table.
20. The method of claim 19 wherein if the flow table has an open slot, the method further comprises writing (96) the flow ID, the IPGw, and the ingress TS of the incoming data packet to the open slot.
21. The method of claim 19 wherein if the flow table does not have an open slot, the method further comprises: locating a candidate slot in the flow table, wherein the candidate slot has an IPGw value that is higher than the other IPGw values in the other slots of the flow table; and overwriting (94) information in the candidate slot with the flow ID, the IPGw, and the ingress TS.
22. The method of any of claims 13-21 wherein, for each incoming data packet in each flow, the IPG value is determined as a function of:

IPGN = TSN - TSN-1

where: N is a count of the total number of data packets received at the network node;

IPGN is the IPG between the Nth data packet received at the network node and the (N-1)th data packet received at the network node;

TSN-1 is the ingress TS of a first data packet received at the network node; and

TSN is the ingress TS of a second data packet received at the network node consecutively after the first data packet.
23. The method of any of claims 13-22 wherein the IPGw value for each flow is determined as a function of:

IPGw = a · IPGW_LAST + (1 - a) · IPGN

where: a is a degree of weighting decrease and varies based on one or more parameter values associated with data packet traffic received at the network node over a specified time window, and measurements performed on the data packet traffic received at the network node over the specified time window;

IPGN is the IPG between the Nth data packet received at the network node and the (N-1)th data packet received at the network node; and

IPGW_LAST is the last determined IPGw value.
24. The method of any of claims 13-23 wherein, for each flow, the IPGN and IPGw values are analyzed over a pre-determined time window.
25. The method of any of claims 13-24 wherein a flow is an HH flow when the IPGw value for the flow exceeds a pre-determined IPGw threshold value over the pre-determined time window.
26. The method of any of claims 13-24 further comprising ranking each of the IPGw values and determining that the flow is an HH flow based on the ranking.
27. The method of claim 26 wherein a flow is an HH flow when the IPGw value for the flow is one of a predetermined number of ranked IPGw values.
28. The method of claim 26 wherein the flow is an HH flow when the IPGw value for the flow falls within a predefined percentage of flows determined to be HH flows.
29. The method of any of the preceding claims wherein each data packet flow is a unidirectional 5-tuple flow.
30. The method of claim 29 wherein the unidirectional 5-tuple flow comprises: a source IP; a destination IP; a source port; a destination port; and a protocol type.
31. The method of any of the preceding claims wherein each data packet flow is a unidirectional n-tuple flow based on a number of data packet header field values.
32. The method of claim 31 wherein the data packet header field values comprise a source IP and/or a destination IP.
33. The method of any of the preceding claims wherein a minimum time duration of a data packet flow is defined as a criterion for HH classification.
34. A network node (100) comprising: communications circuitry (108) configured to communicate with one or more network nodes; and processing circuitry (102) operatively connected to the communications circuitry and configured to perform any of the claims 13-33.
35. A non-transitory computer-readable medium (104) storing a computer program (106) thereon, the computer program comprising instructions that, when executed by processing circuitry (102) of a network node (100), causes the network node to perform any of the claims 13-33.
36. A computer program (106) comprising executable instructions that, when executed by processing circuitry (102) of a network node (100), causes the network node to perform any one of the methods of claims 13-33.
37. A carrier containing a computer program of claim 36, wherein the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium.
PCT/IB2021/053738 2020-05-14 2021-05-04 Heavy hitter flow classification based on inter-packet gap analysis WO2021229361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063024576P 2020-05-14 2020-05-14
US63/024,576 2020-05-14

Publications (1)

Publication Number Publication Date
WO2021229361A1 true WO2021229361A1 (en) 2021-11-18

Family

ID=75888107

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2021/053738 WO2021229361A1 (en) 2020-05-14 2021-05-04 Heavy hitter flow classification based on inter-packet gap analysis

Country Status (1)

Country Link
WO (1) WO2021229361A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598590A (en) * 2022-05-10 2022-06-07 鹏城实验室 Detection method for stability element and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100110922A1 (en) * 2008-10-31 2010-05-06 Kanapathipillai Ketheesan Method and Apparatus for Estimating Channel Bandwidth
US20160205026A1 (en) * 2013-08-19 2016-07-14 Instart Logic, Inc. Method & implementation of zero overhead rate controlled (zorc) information transmission via digital communication link

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100110922A1 (en) * 2008-10-31 2010-05-06 Kanapathipillai Ketheesan Method and Apparatus for Estimating Channel Bandwidth
US20160205026A1 (en) * 2013-08-19 2016-07-14 Instart Logic, Inc. Method & implementation of zero overhead rate controlled (zorc) information transmission via digital communication link

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
"The CAIDA UCSD Anonymized Internet Traces 2016 - March", March 2016, CENTER FOR APPLIED INTERNET DATA ANALYSIS (CAIDA
A. METWALLY, D. AGRAWAL, A. EL ABBADI: "International Conference on Database Theory", 2005, SPRINGER, article "Efficient computation of frequent and top-k elements in data streams"
B. TURKOVIC, J. OOSTENBRINK, F. KUIPERS: "Detecting Heavy Hitters in the Data-plane", ARXIV PREPRINT ARXIV:1902.06993, 2019
BELMA TURKOVIC ET AL: "Detecting Heavy Hitters in the Data-plane", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 19 February 2019 (2019-02-19), XP081030740 *
C. ESTAN AND G. VARGHESE: "New directions in traffic measurement and accounting", ACM TRANS. COMPUTER SYSTEMS, vol. 21, no. 3, 2003
G. CORMODE, S. MUTHUKRISHNAN: "An improved data stream summary: The count-min sketch and its applications", JOURNAL OF ALGORITHMS, vol. 55, no. 1, 2005, pages 58 - 75
M. CHARIKARK.CHENM. FARACH-COLTON: "Finding frequent items in data streams", SPRINGER ICALP, 2002
ROMAN KRZANOWSKI: "Burst (of packets) and Burstiness", 7 October 2006 (2006-10-07), 66th IETF - Montreal, Quebec, Canada, XP055446565, Retrieved from the Internet <URL:https://www.ietf.org/proceedings/66/slides/ippm-10.pdf> [retrieved on 20180131] *
SIVARAMAN, V.NARAYANA, S.ROTTENSTREICH, O.MUTHUKRISHNAN, S.REXFORD, J., HEAVY-HITTER DETECTION ENTIRELY IN THE DATA PLANE, 2017
T. YANGJ. JIANGP. LIUQ. HUANGJ. GONGY. ZHOUR. MIAOX. LIS. UHLIG: "Proceedings of the 2018 ACM SIGCOMM Conference", 2018, ACM, article "Elastic sketch: Adap tive and fast network-wide measurements"

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598590A (en) * 2022-05-10 2022-06-07 鹏城实验室 Detection method for stability element and related equipment
CN114598590B (en) * 2022-05-10 2022-07-12 鹏城实验室 Detection method for stability element and related equipment

Similar Documents

Publication Publication Date Title
US8339951B2 (en) Method for configuration of a load balancing algorithm in a network device
Harrison et al. Network-wide heavy hitter detection with commodity switches
Li et al. LossRadar: Fast detection of lost packets in data center networks
EP3223486B1 (en) Distributed anomaly detection management
US10097464B1 (en) Sampling based on large flow detection for network visibility monitoring
US9979624B1 (en) Large flow detection for network visibility monitoring
CN105493450B Method and system for dynamically detecting service anomalies in a network
Jose et al. Online measurement of large traffic aggregates on commodity switches
US10536360B1 (en) Counters for large flow detection
CN108028778 Methods, systems and devices for generating information transmission performance warnings
WO2019120187A1 (en) Non-intrusive mechanism to measure network function packet processing delay
WO2017163352A1 (en) Anomaly detection apparatus, anomaly detection system, and anomaly detection method
US20150052243A1 (en) Transparent software-defined network management
US10003515B1 (en) Network visibility monitoring
US20190182266A1 (en) System and method for out of path ddos attack detection
US9992081B2 (en) Scalable generation of inter-autonomous system traffic relations
WO2018120915A1 (en) Ddos attack detection method and device
EP3644563B1 (en) Sampling traffic telemetry for device classification with distributed probabilistic data structures
CN107294743B (en) Network path detection method, controller and network equipment
CN116055362A (en) Two-stage Hash-Sketch network flow measurement method based on time window
Reis et al. An unsupervised approach to infer quality of service for large-scale wireless networking
CN109952743B (en) System and method for low memory and low flow overhead high flow object detection
CN110351166B (en) Network-level fine-grained flow measurement method based on flow statistical characteristics
WO2021229361A1 (en) Heavy hitter flow classification based on inter-packet gap analysis
Turkovic et al. Detecting heavy hitters in the data-plane

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21724779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21724779

Country of ref document: EP

Kind code of ref document: A1