WO2023104292A1

WO2023104292A1 - System and method for accurate traffic monitoring on multi-pipeline switches

Info

Publication number: WO2023104292A1
Application number: PCT/EP2021/084572
Authority: WO
Inventors: Amir ROOZBEH; Marco Chiesa; Fabio Luciano LUCIANO; Alireza FARSHIN
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2023-06-15

Abstract

A method and system monitors network packet traffic in a networking device. The method includes receiving a data packet at an ingress port of an ingress pipeline, determining an egress pipeline for the data packet, and adding tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.

Description

SPECIFICATION

SYSTEM AND METHOD FOR ACCURATE TRAFFIC MONITORING ON MULTI-PIPELINE SWITCHES

TECHNICAL FIELD

[0001] Embodiments of the invention relate to the field of network traffic monitoring; and more specifically, to the monitoring of traffic within switches.

BACKGROUND ART

[0002] Network monitoring is the process of tracking various statistics related to the movement of data across a set of network components. Network monitoring is a key part of efficiently administering network infrastructure and of sustaining the ever-growing needs of the modern-day applications that use it. Network monitoring relies on a collection of coarse-grained traffic statistics reported by the devices in the network infrastructure. In this context, coarse-grained traffic statistics can refer to statistics gathered by relatively infrequent sampling (e.g., sampling 1 out of every 4k packets), which is used for the sake of scalability.

[0003] Coarse-grained traffic monitoring across network devices provides a view of the operation of the network as a whole, but only limited information about the operations of individual network devices and how they affect the traffic across the network. Network operators can use fine-grained network monitoring tools to identify possible causes of network problems that affect interactive or high-speed applications. Some examples of fine-grained monitoring tasks include detection of ‘short-lived heavy-hitters,’ ‘micro-bursts,’ and congestion, as well as detailed accounting of specific customers’ network usage.

[0004] Network monitoring entails collecting statistics about the traffic traversing a network with the goal of supporting decisions by automated control-planes and/or human-based control. Network statistics can include link utilization, queue occupancies, packet latencies, and per-flow throughput. These network statistics are collected over time and analyzed by a collector (e.g., a controller when the data plane and control plane are separated) to identify load imbalances, security attacks, and network misconfigurations for debugging purposes. A control plane of a network is a set of functions, commands, and/or components that determine how a data packet is to be forwarded toward the destination of the data packet (e.g., by configuring forwarding tables). A data plane of a network is a set of functions, commands, and/or components that implement the forwarding of the data packets in the network (e.g., that apply the forwarding tables configured by the control plane).

SUMMARY

[0005] In one embodiment, a method monitors network packet traffic in a networking device. The method includes receiving a data packet at an ingress port of an ingress pipeline, determining an egress pipeline for the data packet, and adding tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.
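The three steps of this method can be sketched in Python as a minimal, non-authoritative model. The class and function names, the dictionary-based cache keyed by a flow identifier, and the `monitored_in_egress` predicate are all hypothetical; this paragraph does not specify how the device decides that a packet is not monitored in the egress pipeline, so that decision is left to a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Packet:
    ingress_port: int
    egress_port: int
    flow_id: Tuple  # hypothetical flow-class key, e.g. a 5-tuple


@dataclass
class IngressPipeline:
    # Hypothetical monitoring cache: flow identifier -> packet count.
    monitoring_cache: Dict[Tuple, int] = field(default_factory=dict)

    def handle(self, pkt: Packet,
               egress_pipeline_of: Callable[[int], int],
               monitored_in_egress: Callable[[Packet, int], bool]) -> int:
        """Receive a packet, determine its egress pipeline, and cache
        tracking information if the egress side will not record it."""
        egress = egress_pipeline_of(pkt.egress_port)
        if not monitored_in_egress(pkt, egress):
            count = self.monitoring_cache.get(pkt.flow_id, 0)
            self.monitoring_cache[pkt.flow_id] = count + 1
        return egress
```

For example, with a hypothetical mapping of 16 ports per pipeline, a packet leaving on port 20 is assigned to egress pipeline 1, and its flow is counted in the ingress cache only when the predicate returns `False`.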

[0006] In another embodiment, an electronic device executes a method of traffic monitoring. The electronic device includes a non-transitory computer-readable storage medium having stored therein a traffic monitor, and a set of processing devices coupled to the non-transitory computer-readable storage medium. The set of processing devices executes the traffic monitor. The traffic monitor performs the operations of receiving a data packet at an ingress port of an ingress pipeline, determining an egress pipeline for the data packet, and adding tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.

[0007] In one embodiment, an electronic device executes a method of traffic monitoring in a network. The electronic device executes a plurality of virtual machines that implement network function virtualization (NFV). The electronic device includes a non-transitory computer-readable storage medium having stored therein a traffic monitor, and a processor coupled to the non-transitory computer-readable storage medium. The processor executes one of the plurality of virtual machines, and that virtual machine executes the traffic monitor. The traffic monitor performs the operations of receiving a data packet at an ingress port of an ingress pipeline, determining an egress pipeline for the data packet, and adding tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.

[0008] In a further embodiment, a control plane device executes a method of traffic monitoring in a software defined networking (SDN) network. The control plane device includes a non-transitory computer-readable storage medium having stored therein a traffic monitor, and a processor coupled to the non-transitory computer-readable storage medium. The processor executes the traffic monitor. The traffic monitor is to receive a data packet at an ingress port of an ingress pipeline, determine an egress pipeline for the data packet, and add tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The invention may be best understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:

[0010] Figure 1 is a diagram of one embodiment of a network device.

[0011] Figure 2 is a diagram of one embodiment of a network device with components for the improved traffic monitoring system.

[0012] Figure 3 is a diagram of one example embodiment of a set of metadata added to a packet to be forwarded to an egress pipeline.

[0013] Figure 4 is a flowchart of one embodiment of a process for handling of a data packet at the ingress pipeline to support the fine-grain traffic monitoring.

[0014] Figure 5 is a flowchart of one embodiment of the process of the egress pipeline in supporting the fine-grained traffic monitoring.

[0015] Figure 6 is a diagram of a network device based on Open Tofino with data structures to support fine-grained traffic monitoring.

[0016] Figure 7 is a diagram of one embodiment of a monitoring cache implemented as a stack.

[0017] Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.

[0018] Figure 8B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.

[0019] Figure 8C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.

[0020] Figure 8D illustrates a network with a single network element (NE) on each of the NDs, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.

[0021] Figure 8E illustrates the simple case of where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.

[0022] Figure 8F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.

[0023] Figure 9 illustrates a general-purpose control plane device with centralized control plane (CCP) software 950, according to some embodiments of the invention.

DETAILED DESCRIPTION

[0024] The following description describes methods and apparatus for fine-grain network statistics collection and monitoring. The methods and apparatus provide accurate traffic monitoring in multi-pipeline network devices. In the following description, numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits, and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.

[0025] References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0026] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot-dash, and dots) may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.

[0027] In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.

[0028] An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals). Thus, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower nonvolatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device. Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices. 
For example, the set of physical NIs (or the set of physical NI(s) in combination with the set of processors executing code) may perform any formatting, coding, or translating to allow the electronic device to send and receive data whether over a wired and/or a wireless connection. In some embodiments, a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection. This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication. The radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s). In some embodiments, the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter. The NIC(s) may facilitate connecting the electronic device to other electronic devices, allowing them to communicate via wire by plugging a cable into a physical port connected to a NIC. One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.

[0029] A network device (ND) is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices). Some network devices are “multiple services network devices” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).

[0030] The embodiments provide a data plane monitoring system and process that improves monitoring detection accuracy and reduces the memory overhead of collecting traffic statistics about flows or sets of flows at a finer level of granularity, specifically, within a given network device. Fine-grained network monitoring can quickly identify and locate performance anomalies in a network for performance-sensitive applications, such as interactive applications (e.g., cloud gaming, virtual reality (VR)/augmented reality (AR), robotics, and similar applications) and high-speed applications (e.g., remote direct memory access (RDMA) transfers in a datacenter). Fine-grained network monitoring also provides advantages for time sensitive networks (TSN), where applications require strict networking behavior, e.g., zero congestion loss. At the same time, the detection of ‘heavy-hitters’ inside the 5th generation core (5GC) of the 3rd generation partnership project (3GPP) enables the triggering of improved control policies to reroute traffic through different paths. A ‘heavy hitter’ is a traffic flow that consumes significantly more resources at the network devices than other traffic flows. In the context of a 5GC architecture, such detection of network issues can be done at the user plane function (UPF) or before it, so that specific chaining of the heavy-hitters can be done without affecting time sensitive applications. The finer-grained network monitoring can be done at the edge or in the core cloud, depending on where the actuation will be implemented. The embodiments are capable of running on any programmable hardware regardless of location, such as in the edge, access, or core networks.

[0031] In many cases, coarse-grained monitoring is incapable of identifying where and why an application is experiencing performance degradation. One example of such a case is network performance anomalies that are related to events that happen during a short timeframe (e.g., microburst congestion) rather than to component failures, which are easy to detect with existing tools. Performance anomalies are difficult to detect because they can last for short amounts of time and cannot be easily reproduced in a testing environment, since the exact conditions that trigger them are unknown to operators. Over 80% of network problems may be related to performance anomalies rather than faults. Fine-grained monitoring can be deployed alongside other monitoring tools to supplement the information they provide. However, existing fine-grained monitoring solutions lack high accuracy and have slow reaction times.

[0032] Figure 1 is a diagram of one embodiment of a network device. Network devices include general purpose or specialized processors that process received data packets by identifying an egress port for each packet received at an ingress port. The operations of the network devices can be logically divided into a data-plane 101 and a control-plane 103. The data-plane 101 is responsible for forwarding packets between the ports of the network device based on the packet processing logic. The control-plane 103 is responsible for configuring the operation of the data-plane packet processing logic. In some embodiments, the configuration can include populating some of the data-plane match/action data structures (e.g., a Layer 2 forwarding table) that manage the forwarding of packets.
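The division of labor between the two planes can be illustrated with a minimal Python sketch, assuming a Layer 2 forwarding table modeled as a dictionary from destination MAC address to output port. The class and method names are hypothetical; the point is only that the control plane writes table entries out of band while the data plane performs a read-only lookup per packet.

```python
from typing import Dict, Optional


class ControlPlane:
    def __init__(self, l2_table: Dict[str, int]) -> None:
        self.l2_table = l2_table  # table shared with the data plane

    def install_route(self, dst_mac: str, egress_port: int) -> None:
        # Configuration happens out of band, not on the per-packet path.
        self.l2_table[dst_mac] = egress_port


class DataPlane:
    def __init__(self, l2_table: Dict[str, int]) -> None:
        self.l2_table = l2_table

    def forward(self, dst_mac: str) -> Optional[int]:
        # Per-packet lookup; None models a table miss (e.g., flood or drop).
        return self.l2_table.get(dst_mac)
```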

[0033] In the data plane 101, the processing components and related circuitry are sub-divided into a set of pipes 109, a set of pipelines 107, and a traffic manager 105. A ‘set,’ as used herein, refers to any positive whole number of items, including one item. A pipe 109 is a physical entity including a set of resources (e.g., memory, arithmetic logic units, and similar components). An ingress pipeline 107A or egress pipeline 107B is a representation of a subset of the pipe 109 components used for programming. Both the ingress pipelines 107A and egress pipelines 107B of a given pipe 109 can be compiled onto the same pipe resources. Different pipelines do not share transactional memory at data-plane speed.

[0034] A packet processing pipeline 107 takes as input a sequence of packets from the input ports 111 and transforms them according to the packet processing logic 113 implemented by the pipeline 107. The transformation includes identifying the outgoing port interface of the network device for the incoming traffic. More specifically, each pipeline 107 can include a parser 115 that extracts the headers from each received packet, and logic 113 that performs a sequence of packet modification operations. The logic 113 can be implemented using match-action tables or similar structures and circuitry. Each pipeline 107 can further include a deparser 117 that emits the packet as an output from the pipeline 107.
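The parser, logic, and deparser stages can be modeled in a few lines of Python. This is an illustrative sketch only: the 4-byte toy header (a 2-byte destination identifier followed by a 2-byte TTL), the route dictionary standing in for a match-action table, and the use of `-1` to mark a drop are all assumptions, not a real packet format or switch API.

```python
import struct
from typing import Dict, List, Tuple


def parse(raw: bytes) -> Tuple[Dict[str, int], bytes]:
    """Parser: extract a toy header (2-byte dst id, 2-byte TTL) and the payload."""
    dst, ttl = struct.unpack("!HH", raw[:4])
    return {"dst": dst, "ttl": ttl}, raw[4:]


def logic(hdr: Dict[str, int], routes: Dict[int, int]) -> Dict[str, int]:
    """Logic: pick an output port by matching on dst, then decrement the TTL."""
    out = dict(hdr)
    out["port"] = routes.get(hdr["dst"], -1)  # -1 marks a drop in this sketch
    out["ttl"] = max(hdr["ttl"] - 1, 0)
    return out


def deparse(hdr: Dict[str, int], payload: bytes) -> bytes:
    """Deparser: re-emit the modified header in front of the payload."""
    return struct.pack("!HH", hdr["dst"], hdr["ttl"]) + payload


def pipeline(packets: List[bytes], routes: Dict[int, int]) -> List[Tuple[int, bytes]]:
    results = []
    for raw in packets:
        hdr, payload = parse(raw)
        hdr = logic(hdr, routes)
        results.append((hdr["port"], deparse(hdr, payload)))
    return results
```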

[0035] In some embodiments, each physical port interface 111 on the network device can be connected to a single one of the packet processing pipelines. For example, on a 25.6-Tbps switch with 64 ports and eight pipelines, there would be eight different port interfaces of the switch connected to each pipeline. In some embodiments, there are two types of pipelines: ingress pipelines 107A and egress pipelines 107B. The ingress pipeline 107A processes packets received from the physical port interfaces 111, whereas the egress pipeline 107B processes packets leaving the network device through any ports connected to that pipeline. Egress pipelines 107B can include the same components as the ingress pipelines and function in a similar manner, but with the inverted purpose of outputting packets received from the ingress pipelines 107A.

[0036] All of the ingress pipelines 107A and egress pipelines 107B are interconnected via the traffic manager 105. The role of the traffic manager 105 is to forward packets from an ingress pipeline 107A to the correct egress pipeline 107B. The traffic manager 105 also provides other functionality, such as scheduling policies. It is also possible to create loops that send packets back to an ingress pipeline 107A, and to multicast packets to multiple egress pipelines 107B.
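The forwarding role of the traffic manager can be sketched in Python under a simplifying assumption: ports are mapped to pipelines in contiguous, equal-sized blocks. That mapping, the `pipeline_of_port` and `TrafficManager` names, and the list-per-pipeline queues are hypothetical; scheduling policies and recirculation loops are not modeled.

```python
from collections import defaultdict
from typing import Dict, List


def pipeline_of_port(port: int, num_ports: int, num_pipelines: int) -> int:
    """Hypothetical contiguous mapping of ports onto pipelines."""
    if num_ports % num_pipelines:
        raise ValueError("sketch assumes ports divide evenly among pipelines")
    return port // (num_ports // num_pipelines)


class TrafficManager:
    def __init__(self, num_ports: int, num_pipelines: int) -> None:
        self.num_ports = num_ports
        self.num_pipelines = num_pipelines
        self.egress_queues: Dict[int, List[str]] = defaultdict(list)

    def enqueue(self, pkt: str, egress_ports: List[int]) -> None:
        # One port means unicast; several ports means multicast (one copy each).
        for port in egress_ports:
            pipe = pipeline_of_port(port, self.num_ports, self.num_pipelines)
            self.egress_queues[pipe].append(pkt)
```

With 32 ports and 4 pipelines, for instance, ports 8 through 15 would all land on egress pipeline 1 under this assumed mapping.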

[0037] The components, functions, and structures of the network device in Figure 1 are provided by way of example and not limitation. One skilled in the art would appreciate that other additional components, intermediate components, interconnecting technologies, and similar components and features can be utilized in the network device consistent with the operation of the example network devices and fine-grained monitoring functions and structures described herein.

[0038] In some embodiments, a pipe 109 can include different entities such as TCAM (ternary content-addressable memory) and/or SRAM (Static random-access memory), Arithmetic Logic Units (ALUs), Stateful ALUs (SALUs), and similar components. In some embodiments, the packet processing pipelines 107 can be mapped onto these resources of a pipe 109. These resources are used to realize the packet processing logic 113. The packet processing logic 113 typically consists of match/action tables where actions can be operations such as packet header rewriting, arithmetic operations, hash computation, and similar operations. The specific entries contained in a match/action table can be modified at run-time from the control-plane of the network device (e.g., adding a new rule matching traffic destined for a specific IP subnet).
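A match/action table with run-time rule installation can be approximated in Python as follows. This sketch assumes longest-prefix matching on the IPv4 destination address and represents packets as dictionaries; `MatchActionTable`, `set_port`, and `rewrite_dst` are hypothetical names, and a real device would implement the lookup in TCAM/SRAM rather than with a linear scan.

```python
import ipaddress
from typing import Callable, Dict, List, Tuple

Action = Callable[[Dict], Dict]


def set_port(port: int) -> Action:
    # Action: write the output port into the packet metadata.
    return lambda pkt: {**pkt, "port": port}


def rewrite_dst(new_dst: str) -> Action:
    # Action: packet header rewriting.
    return lambda pkt: {**pkt, "dst": new_dst}


class MatchActionTable:
    def __init__(self) -> None:
        self.entries: List[Tuple[ipaddress.IPv4Network, Action]] = []

    def add_entry(self, prefix: str, action: Action) -> None:
        """Control-plane operation: install a rule at run time."""
        self.entries.append((ipaddress.IPv4Network(prefix), action))

    def apply(self, pkt: Dict) -> Dict:
        """Data-plane operation: longest-prefix match on dst, then run the action."""
        dst = ipaddress.IPv4Address(pkt["dst"])
        hits = [(net, act) for net, act in self.entries if dst in net]
        if not hits:
            return pkt  # default action: leave the packet unchanged
        _, act = max(hits, key=lambda e: e[0].prefixlen)
        return act(pkt)
```

Adding a rule for a new subnet, as in the example above, is a single `add_entry` call that takes effect for the next packet processed.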

[0039] In some embodiments, the processes and structures set forth herein are utilized in network devices where the packet processing pipelines can only access and modify mutually exclusive regions of memory. For example, in such network devices, the processing logic of a pipeline cannot modify the value of a register on another pipeline at data-plane speed. This constraint is typical of network devices, given the difficulty of synchronizing memories at the high speeds at which these devices operate. These embodiments are designed to overcome the limitations of network devices that cannot access or modify the memory resources of other pipelines at data-plane speed.

[0040] Network devices can include different types of hardware components, including custom-built integrated circuits and programmable silicon chips. Custom-built integrated circuits support limited reconfiguration of the packet processing pipeline (e.g., defining the size of the different Layer 2 (L2) and longest prefix match (LPM) tables), since most of the logic is hardcoded in the silicon and cannot be easily modified. Programmable network devices, in contrast, support expressive reconfigurations of the packet processing pipeline that can be used to completely modify the operation of the packet processing parsers and logic. Different languages can be used to express packet processing logic, including P4 and network programming language (NPL). Each vendor of programmable networking hardware provides a compiler to transform the packet processing program into an application specific integrated circuit (ASIC)-specific configuration.

[0041] The network device of Figure 1 is an example architecture of a network device that includes four ingress pipelines 107A and four egress pipelines 107B interconnected by a traffic manager 105. The example network device can support two types of data structures that are used to implement packet processing programs: match/action tables and registers. Match/action tables are defined by a set of match fields (e.g., matching the IPv4 destination address of the packet) that are associated with actions (e.g., forward towards a specific output port). The match/action tables compare the matching criteria in the match fields of the tables with designated fields of received packets and perform the associated action when a match is found. The content of the match/action tables can only be modified from the control plane and not by the faster data plane. This makes such data structures unsuitable for collecting traffic statistics at per-flow-class granularity.

[0042] In the example, the registers are similar to array data structures and can be accessed using an index. Registers can be modified directly at data-plane speeds and are, therefore, the primary data structures used to collect and store fine-grained traffic information. There are some limitations on the type of programs that can be compiled on any network device architecture. The embodiments are designed to overcome at least two of the limitations relevant to the example network device architecture. First, each register can only be accessed/modified once when a packet goes through the pipeline. To clarify, multiple registers can be accessed for the same packet, but each register can only be accessed once, and each network device has a predefined, limited number of registers. Second, only one entry in a register can be accessed (and modified) for each processed packet. This means that it is not possible to extract two entries from the same register for a single packet. The embodiments provide a process that enables fine-grained monitoring even in network devices with these limitations on the use of registers in the pipelines.
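The two register limitations can be made concrete with a toy Python model. The `Register` class is hypothetical and assumes packets are processed one at a time, so remembering the last packet identifier is enough to detect a second access by the same packet; each access is a single read-modify-write of exactly one entry.

```python
from typing import Callable, List, Optional


class Register:
    """Toy data-plane register array enforcing the per-packet access limits."""

    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self._last_packet: Optional[int] = None  # packet that last touched it

    def execute(self, packet_id: int, index: int, op: Callable[[int], int]) -> int:
        # Limitation 1: a register can be accessed only once per packet.
        if packet_id == self._last_packet:
            raise RuntimeError("register already accessed by this packet")
        self._last_packet = packet_id
        # Limitation 2: only one entry is read-modified-written per access.
        self.cells[index] = op(self.cells[index])
        return self.cells[index]
```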

[0043] The traffic statistics that are collected can be maintained in the memory of the network device (e.g., the registers of the processing pipelines) in some form of data structure. One example of a data structure that can be used to collect traffic statistics is an array, which is accessed with an index. For example, to keep per-port statistics, a monitoring system can allocate an entry in an array to each port and use the identifier of the port as the index where the relevant counter is stored (e.g., a count of the number of packets received/sent on that port).

[0044] The embodiments provide a process that is applicable to traffic monitoring in general: any identifier can be used to perform fine-grained monitoring at the packet level. In the example embodiments provided herein for the sake of illustration, the monitoring process focuses on storing network statistics at the granularity level of a “flow-class”, where a flow-class can be a single flow (e.g., identified by a transmission control protocol (TCP) 5-tuple) or a set of flows (e.g., identified by the IPv4 source address).
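Both granularities can be sketched in Python: a per-port counter array indexed by port identifier, and a flow-class key that is either the full 5-tuple or only the IPv4 source address. The port count, names, and the use of a dictionary (where a device would use a hash-indexed register array) are illustrative assumptions.

```python
from typing import Dict, Hashable, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


NUM_PORTS = 8  # hypothetical port count
port_rx_packets = [0] * NUM_PORTS  # one array entry per port, indexed by port id


def count_port(port_id: int) -> None:
    port_rx_packets[port_id] += 1


def flow_class_key(pkt: FiveTuple, per_flow: bool) -> Hashable:
    # A flow-class is a single flow (its 5-tuple) or a set of flows
    # sharing the same IPv4 source address.
    return pkt if per_flow else pkt.src_ip


flow_class_counts: Dict[Hashable, int] = {}


def count_flow_class(pkt: FiveTuple, per_flow: bool) -> None:
    key = flow_class_key(pkt, per_flow)
    flow_class_counts[key] = flow_class_counts.get(key, 0) + 1
```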

[0045] In some embodiments, network devices can support probabilistic data structures in the data-plane (e.g., count-min sketches implemented with registers) and update these structures at the speed of the data-plane. The advantage of probabilistic data structures is that anomaly detection can be performed directly in the fast data-plane, relieving the slower processing capabilities of the network device or external servers of the burden of processing enormous amounts of data. The main problem with these options for fine-grained monitoring in a network device is that the information collected for a single flow-class may be spread across multiple pipelines, resulting in significant overhead that consumes the scarce memory available on the network device.
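A count-min sketch laid out as several register rows can be written in Python as below. The depth, width, and the use of salted BLAKE2b as the per-row hash are assumptions for illustration; switch hardware typically uses its own hash units. The usage note shows the memory issue described above: if one flow-class arrives through two ingress pipelines, each pipeline holds its own full-size sketch with only part of the count.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Count-min sketch stored as `depth` register arrays of `width` counters."""

    def __init__(self, depth: int = 3, width: int = 64) -> None:
        self.depth, self.width = depth, width
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # One hash per row, made independent by salting with the row number.
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=bytes([row])).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: str) -> int:
        # Never underestimates: collisions can only inflate the minimum.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))
```

For example, if a flow sends 3 packets through one ingress pipeline and 4 through another, two separate sketches each report a partial count, and both sketches occupy memory for the same flow-class.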

[0046] The prior art techniques have several problems. Some prior art techniques are based on sampling and cannot identify events at fine time scales. Some are based on mirroring traffic, which provides limited visibility and/or incurs high processor overheads because only a small fraction of the traffic can be analyzed. Some are based on data-plane probabilistic data structures, implemented on programmable network devices, that maintain monitoring information for the same flow-classes in multiple pipelines, which unnecessarily wastes memory resources through redundancy. The embodiments overcome these limitations of the prior art by providing a cache-based mechanism for fine-grained packet monitoring that can keep the statistics of a flow-class in a single pipeline, thus minimizing memory usage.

[0047] The embodiments provide systems and methods for performing accurate traffic monitoring on multi-pipeline switches. The embodiments introduce caching of traffic monitoring data at ingress pipelines and a supporting metadata format. The embodiments enable applications and network administrators to accurately monitor network traffic at packet-level granularity within each network device, which enables the network administrators and the applications that operate in the network to react promptly to network events (e.g., congestion at a particular network device caused by a particular flow). The embodiments improve the accuracy of existing data-plane monitoring systems while requiring a negligible amount of additional memory resources (i.e., the cache). These improvements can be achieved for monitoring queries that require identifying traffic classes whose packets are spread among multiple ingress pipelines.

[0048] The embodiments provide a method for monitoring packets at egress pipelines, as opposed to ingress-based traffic monitoring. A cache in each ingress pipeline is configured to store temporary state information that tracks the monitoring status and information until it is collected at the egress pipeline. A metadata format is defined to transfer the information stored in the caches of the ingress pipelines to the monitoring data structures stored in the egress pipelines. The embodiments further provide a method to perform this transfer that is compliant with the limitations of high-speed programmable switches.

[0049] In addition, the embodiments include example implementations of caches at the ingress pipelines in a specific network device architecture. The principles and structures of the embodiments can be incorporated into any type or variety of data structures that are deployed in the ingress pipelines. For example, the caching process of the embodiments can be based on stack and queue data structures implemented entirely in the data plane of the programmable network device.
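A stack built only from register-like storage can be sketched in Python as a bounded array plus a top-of-stack pointer. This is a hypothetical model, not the implementation of Figure 7: in hardware, the pointer update would conceptually be one stateful read-modify-write, and each push or pop touches a single array slot, which fits the one-access-per-register limit described in [0042]. What happens when the stack is full is left to the caller.

```python
from typing import List, Optional


class RegisterStack:
    """Bounded stack from two toy registers: a pointer and a slot array."""

    def __init__(self, capacity: int) -> None:
        self.top = 0                             # pointer register: items stored
        self.slots: List[int] = [0] * capacity   # array register: flow identifiers

    def push(self, flow_id: int) -> bool:
        if self.top == len(self.slots):
            return False  # full; a caller would need a fallback policy
        # Conceptually one fetch-and-increment on the pointer register.
        idx, self.top = self.top, self.top + 1
        self.slots[idx] = flow_id  # exactly one array slot written
        return True

    def pop(self) -> Optional[int]:
        if self.top == 0:
            return None
        self.top -= 1  # conceptually one decrement on the pointer register
        return self.slots[self.top]  # exactly one array slot read
```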

[0050] The embodiments, including the example implementations, provide numerous advantages over the prior art. The embodiments achieve higher accuracy than prior art solutions, enabling network administrators/operators to react promptly to network events. The embodiments remove the need to rely on the traditional, slow, and inaccurate alternatives that are currently used to perform network monitoring at the network device. The embodiments improve compute resource utilization and reduce memory usage, which can be a limited resource on network devices. In some example implementations, the implementations achieve performance similar to existing network devices with almost half of the memory requirements on a 4-pipeline architecture. The cache mechanism of the embodiments has been evaluated using a simulator, which has shown that only very small queue sizes are necessary. In some example evaluations, assuming an even distribution of traffic, the maximum queue size is nearly 300 flow identifiers, occupying 9.37 KB and 37.5 KB for 2 and 4 pipelines, respectively.

[0051] Figure 2 is a diagram of one embodiment of a network device with components for the improved traffic monitoring system. The diagram of the example network device illustrates an embodiment that includes a monitoring cache 201. The monitoring cache 201 is a data storage device in each ingress pipeline 207A of the network device. The monitoring cache 201 is used by the processes of the embodiments to enable fine-grain monitoring and to improve its accuracy. The monitoring cache 201 can be a discrete memory device or a data structure in a memory device of the pipeline 207A. The monitoring cache 201 can function as a table with at least four columns, where each entry includes at least four fields.

[0052] The data in the monitoring cache 201 can include an egress flow identifier field, a counter field (e.g., a field containing an integer of any size), a timestamp field, and an output egress pipeline identifier field. Additional fields, or variations on these fields and their contents, can also be included in the monitoring cache 201 structure.

[0053] The egress flow identifier field stores information identifying one or more flows in the ingress pipeline and is used to update the correct entry in the data structures stored at the associated egress pipeline. For example, if the egress pipeline maintains a key-value data structure where the key is the 5-tuple of each packet, then the egress flow identifier for that egress pipeline is a 5-tuple. If instead the egress pipeline uses probabilistic data structures that require computing a hash of the IPv4 source address of a packet to update the correct counter, then the egress flow identifier is the hash of the IPv4 source address (or the IPv4 source address itself). These identifiers, or any similar identifier derived from the packets of an egress flow, can be used to identify the associated egress flow.

[0054] The timestamp field stores the last time instant at which the cache entry was updated. This field can be used to free up space in the cache whenever some information has not been updated for a certain amount of time, which can guarantee that all information is moved to the egress pipeline within a predefined time. The counter field stores traffic statistics for the specific flow identified by the egress flow identifier field; for example, a counter may count the number of packets or the number of bytes received for that egress flow identifier. The output egress pipeline field contains a value that identifies the egress pipeline storing the monitoring information for a given identifier. This field can be optional or unused where the egress flow identifier explicitly or implicitly identifies the egress pipeline, or where the egress pipeline is otherwise known.
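Timestamp-based expiry can be sketched as follows, assuming the cache is reduced to a mapping from egress flow identifier to last-update time. The function name `evict_expired` and the `max_age` parameter are hypothetical; the expired identifiers are returned so that the caller can decide whether to flush their pending counts to the egress pipeline or simply drop them.

```python
def evict_expired(last_update: dict[int, int], now: int, max_age: int) -> list[int]:
    """Remove identifiers not updated within max_age (in place) and return them.

    last_update maps an egress flow identifier to the timestamp of its most
    recent update, i.e. the cache's timestamp field.
    """
    expired = [flow_id for flow_id, ts in last_update.items() if now - ts > max_age]
    for flow_id in expired:
        del last_update[flow_id]
    return expired
```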

Table I
Egress flow ID   Counter   Timestamp    Output egress pipeline
ID1              1         1634739748   2
ID2              5         1634739790   0

[0055] Table I shows an example of a monitoring cache in which two egress flows, identified by ID1 and ID2, are tracked. The first egress flow (ID1) has had one packet received, with its last packet at timestamp 1634739748, and its tracking information is destined for egress pipeline 2. The second egress flow (ID2) has a count of 5 and timestamp 1634739790, and its tracking information is destined for egress pipeline 0. Such a monitoring cache table can be stored at any one of the ingress pipelines to track monitoring statistics for packets of egress flows that were forwarded to an egress pipeline other than the one where their metrics are tracked. Each monitoring cache can have any size; time limits and/or scheduled flushing can be used to ensure that the cache does not overflow.
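The entry layout of [0052] and the two rows of Table I can be written down as a small record type. This is a sketch only: the class name `CacheEntry`, the helper `entries_for`, and the integers 1 and 2 standing in for ID1 and ID2 are assumptions for illustration; a hardware cache would use fixed-width register fields rather than Python integers.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """One monitoring-cache entry kept in an ingress pipeline."""
    egress_flow_id: int   # identifier understood by the owning egress pipeline
    counter: int          # packets (or bytes) not yet reported to that pipeline
    timestamp: int        # last time this entry was updated
    egress_pipeline: int  # egress pipeline that stores this flow's statistics


# Table I, with the integers 1 and 2 standing in for ID1 and ID2.
table_i = [
    CacheEntry(egress_flow_id=1, counter=1, timestamp=1634739748, egress_pipeline=2),
    CacheEntry(egress_flow_id=2, counter=5, timestamp=1634739790, egress_pipeline=0),
]


def entries_for(cache: list[CacheEntry], egress_pipeline: int) -> list[CacheEntry]:
    """Entries waiting for a packet headed to the given egress pipeline."""
    return [e for e in cache if e.egress_pipeline == egress_pipeline]
```

A packet leaving for egress pipeline 0 would pick up only the ID2 entry; a packet for pipeline 1 would pick up nothing.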

[0056] To transfer the information stored in the monitoring cache 201 located in the ingress pipeline 207A, the ingress pipeline adds metadata to received packets. This added metadata will be parsed/processed at the egress pipeline 207B to update the traffic information maintained in a monitoring statistic data structure 203 at the egress pipeline 207B. Egress flow identifiers that are stored in the destination egress pipeline are carried within the packet metadata. Similar to the monitoring cache 201 in the ingress pipeline 207A, the monitoring statistic data structure 203 can be a discrete memory device, or a data structure in a shared or general purpose memory structure utilized by the egress pipeline 207B. The monitoring statistic data structure 203 can have a table format with a set of defined columns or fields including an egress flow ID field, counter, and timestamp. The egress flow ID identifies each egress flow, the counter tracks a metric such as processed or forwarded packet count for the egress flow, and the timestamp can indicate the last packet processed. Other or additional fields can be tracked that provide additional statistical information.

[0057] In one example embodiment, the metadata includes a set of fields including length, egress flow identifier, counter, and timestamp. The length field specifies the number of attached field groupings (each consisting of ID, counter, and timestamp) in the metadata. The ID field contains an egress flow identifier that is locally unique to the network device and is used across the ingress pipeline, the metadata, and the egress pipeline to track statistics. The counter field indicates the number of packets for the given identifier that should be counted in the designated egress pipeline (i.e., added to the count at the egress pipeline). The timestamp field contains the timestamp of the last received packet with the given ID. The timestamp fields can be used in the egress pipeline's logic module to perform additional time-based monitoring (e.g., tracking the number of received packets within a period).

[0058] Figure 3 is a diagram of one example embodiment of a set of metadata added to a packet to be forwarded to an egress pipeline. In the illustrated example, the metadata includes a single length field 301 that identifies the number of additional fields or groupings of fields 303A-N. The metadata can be part of an encapsulation of the received data packet, can be added to the header or tail of the data packet, or can be otherwise associated and forwarded with the received data packet. Based on the egress pipeline to which a packet is being forwarded, each cache entry whose egress pipeline matches the destination egress pipeline of the packet can be added to the metadata. A separate grouping 303A-N of metadata can be added to the packet for each entry in the monitoring cache that corresponds to the egress pipeline of the data packet being processed at the ingress pipeline. As the data packet and metadata are forwarded, the monitoring cache entries that were added to the metadata can be cleared, because the associated data has been forwarded to be collected in the monitoring statistics data structure at the egress pipeline.
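A minimal serializer for the Figure 3 layout is sketched below, assuming the metadata is placed in front of the packet bytes, a 1-byte length field, and 32-bit identifier, counter, and timestamp fields in network byte order. The field widths and the helper names `pack_metadata`/`unpack_metadata` are assumptions; the text leaves them to the implementation.

```python
import struct

# Assumed widths: 1-byte length, then 4-byte id, counter and timestamp per grouping.
_LEN = struct.Struct("!B")
_GROUP = struct.Struct("!III")


def pack_metadata(groups: list[tuple[int, int, int]]) -> bytes:
    """Serialize (flow_id, counter, timestamp) groupings behind a length field."""
    if len(groups) > 255:
        raise ValueError("too many groupings for a 1-byte length field")
    out = bytearray(_LEN.pack(len(groups)))
    for flow_id, counter, timestamp in groups:
        out += _GROUP.pack(flow_id, counter, timestamp)
    return bytes(out)


def unpack_metadata(data: bytes) -> tuple[list[tuple[int, int, int]], bytes]:
    """Parse the metadata and return (groupings, remaining packet bytes)."""
    (length,) = _LEN.unpack_from(data, 0)
    offset = _LEN.size
    groups = []
    for _ in range(length):
        groups.append(_GROUP.unpack_from(data, offset))
        offset += _GROUP.size
    return groups, data[offset:]
```

Removing the metadata at the egress pipeline then amounts to keeping only the returned remainder.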

[0059] The operations in the flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.

[0060] Figure 4 is a flowchart of one embodiment of a process for handling a data packet at the ingress pipeline to support fine-grained traffic monitoring. The processing of data packets at a network device implementing the embodiments includes three phases or related processes: the ingress pipeline processing, the traffic manager processing, and the egress pipeline processing.

[0061] In the ingress pipeline process, each data packet arrives at an input port of a given ingress pipeline (Block 401). The packet is then processed by the parser of the ingress pipeline, which can extract the headers from each received packet and perform similar tasks to prepare the data packet for the processing logic. The processing logic performs the relevant processing of the received data packet based on the installed configuration (e.g., as defined by the network administrator) and identifies the egress pipeline of the data packet based on the computed output port (Block 403). As an example, a data packet with IPv4 source address IP_A arrives at ingress pipeline 2, and the processing logic determines to send it to egress pipeline 1. The processing logic also calculates an egress flow identifier for the received packet (Block 405). For example, an egress flow identifier can be a hash of (or the value of) a specific portion of the packet (e.g., one packet header field), such as the hash of the IPv4 source address of the received packet computed using a cyclic redundancy check (e.g., CRC32).
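The identifier computation of Block 405 can be illustrated with Python's standard library. `zlib.crc32` implements the common CRC-32 polynomial; a switch may be configured with a different CRC variant, so the concrete values are illustrative only, and the function name `egress_flow_id` is hypothetical.

```python
import ipaddress
import zlib


def egress_flow_id(src_ip: str) -> int:
    """Hash the 4-byte IPv4 source address with CRC-32 to get an egress flow identifier."""
    return zlib.crc32(ipaddress.IPv4Address(src_ip).packed)


print(hex(egress_flow_id("192.0.2.1")))
```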

[0062] In the example, the value ID_A is calculated for the given packet. The processing logic then checks whether the egress flow identifier of the received packet should be monitored in the computed egress pipeline (Block 407). The primary monitoring egress pipeline (i.e., the egress pipeline containing the monitoring information for a specific egress flow identifier) can be calculated at run-time and/or determined by a predefined policy. For example, the primary egress pipeline can be predefined by a network administrator based on destination, source, or similar packet information. In some embodiments, if the received packet should be monitored in the computed egress pipeline, then the packet's egress flow identifier is added as metadata to the packet (Block 409). As an example, if a packet with IPv4 source address IP_A should be monitored in egress pipeline 1 and the logic is forwarding the packet to egress pipeline 1, then ID_A is added as metadata to the data packet with counter field = 1. In other embodiments, ID_A is not added as metadata in this example, because it can be recomputed at the egress pipeline from the packet's IPv4 source address.

[0063] If the data packet should not be monitored in the egress pipeline to which it is being forwarded, then the packet's egress flow identifier and other tracking information are added to the monitoring cache of the ingress pipeline (Block 411). As an example, if a packet with IPv4 source address IP_A should be monitored in egress pipeline 0 but the processing logic is forwarding the packet to egress pipeline 1, then the CRC32 hash of the packet's IPv4 source address is added to the monitoring cache as an entry associated with egress pipeline 0, along with a timestamp and the other tracking information maintained in the monitoring cache table.

[0064] Regardless of whether the data packet is destined for the egress pipeline that tracks the data for its egress flow, the processing logic determines whether the monitoring cache holds tracking information destined for the egress pipeline of the data packet (Block 413), i.e., it checks the monitoring cache for pending identifiers destined for that egress pipeline. If pending identifiers are found (i.e., there are entries in the monitoring cache whose egress pipeline value matches the destination egress pipeline of the data packet), then the processing logic adds these identifiers as additional metadata to the packet (Block 415). The processing logic then removes the information added as metadata from the monitoring cache (Block 417). It also removes expired entries from the cache based on the defined policy and the timestamps recorded in the monitoring cache. Continuing the example, if the monitoring cache of Table I also contains an entry with ID_B and output egress pipeline 1, the processing logic adds this entry as metadata to the data packet.
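Blocks 407 through 417 can be condensed into a short model of one ingress pipeline. In this sketch the cache is a dictionary keyed by (owning egress pipeline, flow identifier), the owning pipeline is supplied by the caller, and every packet counts as one; the class `IngressPipeline`, its `process` method, and the omission of capacity limits and expiry are assumptions made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class IngressPipeline:
    """Toy model of the monitoring cache in one ingress pipeline."""

    # (owning egress pipeline, egress flow id) -> [counter, last timestamp]
    cache: dict[tuple[int, int], list[int]] = field(default_factory=dict)

    def process(self, flow_id: int, dest_pipe: int, owner_pipe: int,
                now: int) -> list[tuple[int, int, int]]:
        """Return the (flow id, counter, timestamp) groupings to attach to the packet."""
        metadata = []
        if owner_pipe == dest_pipe:
            # Block 409: the packet goes where it is monitored; report it directly.
            metadata.append((flow_id, 1, now))
        else:
            # Block 411: keep the update in the cache until a packet heads to owner_pipe.
            entry = self.cache.setdefault((owner_pipe, flow_id), [0, now])
            entry[0] += 1
            entry[1] = now
        # Blocks 413-417: piggyback pending entries for dest_pipe, then clear them.
        pending = [key for key in self.cache if key[0] == dest_pipe]
        for key in pending:
            counter, timestamp = self.cache.pop(key)
            metadata.append((key[1], counter, timestamp))
        return metadata
```

Two packets of a flow owned by pipeline 0 but sent to pipeline 1 accumulate a count of 2 in the cache; the next packet that leaves for pipeline 0 carries that count along with its own grouping.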

[0065] Once the metadata has been added as needed and any other relevant processing and traffic metric updates in the ingress pipeline are complete, the ingress pipeline provides the data packet to the traffic manager to be forwarded to the appropriate egress pipeline(s) (Block 419). The traffic manager receives data packets from the ingress pipelines and sends them to the identified egress pipelines. In some cases, a data packet may be recirculated back to an ingress pipeline or multicast to the same pipeline or to multiple egress pipelines. The embodiments are compatible with these cases. However, multicasting can result in duplicate updates on the same egress pipeline. This issue can be solved by adding a "replication ID" to multicast data packets, signaling that the data packet has been replicated via multicast, to ensure proper tracking of metrics at the egress pipelines. Similarly, data packets that are sent back to the ingress pipeline or otherwise looped can be marked in their metadata to ensure proper traffic monitoring and to avoid redundant updates.

[0066] Figure 5 is a flowchart of one embodiment of the process of the egress pipeline in supporting the fine-grained traffic monitoring. The egress pipeline process can be triggered by the receipt of a data packet from the traffic manager (Block 501). The data packet can be received by the parser of the egress pipeline. Similar to the ingress pipeline, the processing logic of the egress pipeline applies relevant logic based on the installed configuration (e.g., defined by the network administrator) to the data packet. Additionally, the processing logic or parser can identify whether the data packet contains any packet identifiers stored in its metadata (Block 503). In other embodiments, the metadata can be extracted prior to the application of the relevant logic by the processing logic. As an example, the parser can extract the first field of the metadata header, which is the value of the metadata header length (e.g., a value of 1 indicating one grouping of metadata). The parser can then extract the packet egress flow identifier, timestamp, and counter according to the metadata format for the single grouping. The identification and extraction of metadata as referred to herein are specific to the metadata for fine-grained traffic monitoring as described herein as opposed to metadata related to other technologies or protocols that could be present in the received data packet.

[0067] If metadata exists in the data packet (e.g., the length field indicates a value greater than zero), then the processing logic iterates through the attached metadata and updates the relevant counters in the monitoring statistics data structure (Block 505). In the example, the processing logic extracts the information embedded in the metadata for ID_B and updates the relevant monitoring counters for that specific identifier. The metadata is removed from the packet after the monitoring statistics data structure 203 is updated (Block 507). The packet is then forwarded to the deparser and the relevant output port (Block 509).
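On the egress side (Blocks 503 to 507), applying the metadata amounts to adding each carried counter to the statistics of its identifier. The sketch keeps a counter and a latest timestamp per identifier; the class name `EgressPipeline` and the dictionary-based storage are assumptions, standing in for the register-based monitoring statistics data structure 203.

```python
from collections import defaultdict


class EgressPipeline:
    """Toy monitoring statistics keyed by egress flow identifier."""

    def __init__(self) -> None:
        self.counters: defaultdict[int, int] = defaultdict(int)
        self.last_seen: dict[int, int] = {}

    def apply_metadata(self, metadata: list[tuple[int, int, int]]) -> None:
        """Blocks 503-507: add every carried counter; the caller then strips the metadata."""
        for flow_id, counter, timestamp in metadata:
            self.counters[flow_id] += counter
            previous = self.last_seen.get(flow_id, timestamp)
            self.last_seen[flow_id] = max(previous, timestamp)
```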

[0068] In further embodiments, the processes are adapted for use in a network device that does not have a dedicated monitoring cache in the ingress pipelines. In this example, the data-plane monitoring statistics data structure is a Feed-forward Count-Min sketch (FCM), which consists of three hierarchical levels of registers that are updated for each received packet; a single packet may update all three levels. In some embodiments, the data structures have 2^19 8-bit entries for level 1, 2^16 16-bit entries for level 2, and 2^13 32-bit entries for level 3. The memory requirement for the implementation in the ingress pipeline is 688KB of SRAM in each ingress pipeline, i.e., 2.75MB over four pipelines. FCM is provided by way of example and not limitation; any other sketch implementation or similar mechanism can be used for data-plane monitoring.
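The memory figures follow directly from the register sizes. The check below reads KB and MB as decimal units (1,000 and 1,000,000 bytes), which is the reading under which 688KB and 2.75MB match; the names are illustrative.

```python
# (entries, bits per entry) for FCM levels 1, 2 and 3, as given in the text.
FCM_LEVELS = [(2**19, 8), (2**16, 16), (2**13, 32)]


def fcm_bytes(levels: list[tuple[int, int]]) -> int:
    """Register memory of one FCM instance, in bytes."""
    return sum(entries * bits // 8 for entries, bits in levels)


per_pipeline = fcm_bytes(FCM_LEVELS)  # 524288 + 131072 + 32768 = 688128 bytes
four_pipelines = 4 * per_pipeline     # 2752512 bytes
print(per_pipeline / 1e3, "KB per pipeline;", four_pipelines / 1e6, "MB in total")
```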

[0069] In this example embodiment, the monitoring process detects heavy hitters and uses the F1-score as a metric of system performance. Heavy hitters are classified based only on the IPv4 source address, and the process uses the hash of the IPv4 source address as the egress flow identifier. The same FCM data structures can be deployed in the egress pipelines, consuming roughly 25% of the memory required by a standard FCM implementation on a 4-pipeline switch. In the example embodiment, each egress flow identifier is mapped to exactly one egress pipeline, for instance using a hash function that spreads the identifiers across all pipelines, or by mapping all identifiers to a single egress pipeline. Data packets with the same source IP address are therefore tracked in a single egress pipeline rather than spread over multiple pipelines. This allows the embodiments to use memory more efficiently and to improve the F1-score for the heavy-hitter monitoring task.
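One possible way to map each egress flow identifier to exactly one egress pipeline is sketched below. The choice of the high-order bits is an assumption, not from the source: if the sketch indexes its registers with the low-order bits of the same hash, mapping pipelines by the low-order bits (for example, `flow_id % 4`) would confine each pipeline to a quarter of its level-1 register slots. The function name `owner_pipeline` is hypothetical.

```python
def owner_pipeline(flow_id: int, num_pipelines: int = 4) -> int:
    """Map a 32-bit egress flow identifier to exactly one egress pipeline.

    Uses the high-order bits of the identifier (an assumption), so the
    low-order bits stay free for indexing the sketch registers.
    """
    bits = num_pipelines.bit_length() - 1
    if num_pipelines < 1 or num_pipelines != 1 << bits:
        raise ValueError("num_pipelines must be a power of two")
    return (flow_id & 0xFFFFFFFF) >> (32 - bits) if bits else 0
```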

[0070] In this example, the packet processing steps are described for a case where a packet PKT with IPv4 source address SRCIPA arrives at an ingress pipeline PIP X and has a designated egress pipeline PIP Y. Three operations are performed for the received packet PKT. First, the ingress pipeline checks the content of its cache. If the cache holds any egress flow identifiers whose source IP address should be stored in egress pipeline PIP Y, the ingress pipeline fetches one or more of them and moves this information to the egress pipeline by attaching it as metadata to the incoming packet. The exact amount of information moved to the egress pipeline depends on the specific hardware.

[0071] In one example, the packet PKT will be forwarded to egress pipeline 0. The ingress pipeline then checks whether any egress flow identifier in the cache should be stored in egress pipeline 0 and, if so, adds the corresponding egress flow identifiers to the packet as metadata. A further operation depends on whether the received packet PKT should update registers in PIP Y. If the packet should update register entries that are stored in PIP Y, then the process adds the egress flow identifier of the packet to the metadata; in the specific case of FCM, this means storing the hash of the IPv4 source address of the packet in the metadata of the data packet. If the data packet should update register entries that are stored in an egress pipeline other than PIP Y, then the ingress pipeline caches the egress flow identifier of the packet (i.e., the IPv4 source address) in the ingress pipeline using a cache data structure. If the FCM data structure keeps two sketches that must be updated using two different hashes of the same header field, then the ingress pipeline stores either the header field or all the hashes in the metadata.

[0072] In one example, a data packet arrives at ingress pipeline 1 and is forwarded to egress pipeline 2, while the data structure to be updated for this packet is in egress pipeline 0. FCM only needs the hash of the IPv4 source address of the packet, so the ingress pipeline adds the hash of SRCIPA (i.e., the source IP address of the packet) to the cache. When a data packet arrives at the egress pipeline PIP Y, all the egress flow identifiers are extracted from the metadata of the packet and the corresponding register entries are updated. As an example, if a data packet carries metadata with three hashes of the IPv4 source addresses of three packets (e.g., three hashes fetched from the cache), then the egress pipeline iteratively updates the FCM counters for each of these hashes. Each hash may update counters at up to all three levels of FCM registers.
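The iterative egress update can be illustrated with a deliberately simplified three-level counter tree in the spirit of FCM: 8-, 16-, and 32-bit levels with a branching factor of 8 (the ratio between the level sizes given in [0068]), where a saturated counter passes further increments to its parent. This is not the published FCM algorithm (it uses a single tree and drops counts once the top level saturates); `SimplifiedFCM` and `apply_metadata` are hypothetical names.

```python
class SimplifiedFCM:
    """Single three-level counter tree loosely modeled on FCM (not the published algorithm).

    Level widths are 8, 16 and 32 bits; each parent counter covers `branching`
    children. A saturated counter forwards further increments to its parent.
    """

    WIDTHS = (8, 16, 32)

    def __init__(self, level1_size: int = 2**19, branching: int = 8) -> None:
        self.branching = branching
        self.levels = [[0] * (level1_size // branching**i) for i in range(3)]
        self.max_values = [(1 << w) - 1 for w in self.WIDTHS]

    def _path(self, flow_hash: int) -> list[int]:
        """Counter index at each level for this hash (leaf first)."""
        index = flow_hash % len(self.levels[0])
        path = []
        for _ in range(3):
            path.append(index)
            index //= self.branching
        return path

    def update(self, flow_hash: int, count: int = 1) -> None:
        remaining = count
        for level, index in enumerate(self._path(flow_hash)):
            step = min(self.max_values[level] - self.levels[level][index], remaining)
            self.levels[level][index] += step
            remaining -= step
            if remaining == 0:
                return
        # Counts beyond a saturated level-3 counter are dropped in this sketch.

    def query(self, flow_hash: int) -> int:
        """Sum counters along the path while the lower counter is saturated."""
        total = 0
        for level, index in enumerate(self._path(flow_hash)):
            value = self.levels[level][index]
            total += value
            if value < self.max_values[level]:
                break
        return total


def apply_metadata(sketch: SimplifiedFCM, metadata: list[tuple[int, int, int]]) -> None:
    """Egress step: update the sketch once per (hash, counter, timestamp) grouping."""
    for flow_hash, counter, _timestamp in metadata:
        sketch.update(flow_hash, counter)
```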

[0073] In some embodiments, the monitoring cache data structure can be implemented based on the Open Tofino specification. In an example embodiment, a programmable network device with four pipelines (both ingress and egress) is utilized. The Open Tofino specification sets forth that the registers of a pipeline are accessed via RegisterAction externs, which contain a function named apply that can read and update the value of one entry of a Register. Up to four separate RegisterActions may be defined for a single Register extern, but only one RegisterAction may be executed per packet for a given Register. This has an important implication for the implementation: the embodiments need to split/partition the data structure of the monitoring statistics module (e.g., FCM) into multiple registers (here called “slices”) so that multiple entries of the same data structure can be updated in parallel. This design enables updating register entries for both the original packet and any egress flow identifier fetched from the cache.

[0074] The embodiments address this constraint of Open Tofino by relying on a configurable number N of slices. In one example, each ingress pipeline uses four cache data structures, implemented as either queues (3 registers needed) or stacks (2 registers needed). Each cache data structure is associated with an egress pipeline and is further divided into N slices. The number of slices should be identical in the ingress and egress data structures. A slice X associated with pipeline Y only stores information for those packets whose source IP address should be updated in egress pipeline Y and in slice X of the register. The slice for a given packet can be selected, for instance, using a simple hash-based mechanism. The rationale for the slices is simple: if a packet's metadata contains egress flow identifiers belonging to different slices, then all these identifiers can be updated in their corresponding register data structures. An example of this configuration is shown in Figure 6, which is a diagram of a network device based on Open Tofino with data structures to support fine-grained traffic monitoring.

[0075] In some embodiments the slice-based technique, which is based on the Open Tofino specification, can process an example packet PKT that arrives at an ingress pipeline PIP X and the ingress logic designates PIP Y as the egress pipeline. The IPv4 source address SRCIPA of the packet can be stored in an egress pipeline PIP J and slice SL Z.

[0076] The ingress pipeline performs the following operations. For any slice SL_K different from SL Z, the ingress pipeline checks the content of the cache in PIP X associated with egress pipeline PIP Y and slice SL_K. If there is any egress flow identifier stored in that cache, the ingress pipeline fetches exactly one egress flow identifier and adds it as metadata to the packet. The ingress pipeline also decides which egress flow identifier, if any, goes into the metadata associated with slice SL Z (i.e., the slice of the received packet). This decision depends on whether packet PKT should update data structures in the egress pipeline to which the packet will be forwarded (i.e., PIP Y == PIP J) or not (i.e., PIP Y != PIP J). If the packet is being forwarded to the egress pipeline that stores its information, then the ingress pipeline adds the hash of the IPv4 source address of packet PKT to the metadata related to slice SL Z. If the data packet is being forwarded to another egress pipeline, then the ingress pipeline caches the hash of the IPv4 source address of packet PKT in the cache associated with egress pipeline PIP J and slice SL Z.
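The per-slice ingress decision can be sketched as follows, assuming one FIFO per (egress pipeline, slice) pair and that the owning pipeline and slice of each packet are computed elsewhere (for example, from bits of the hashed source address). The class name `SlicedIngressCache` and the use of `None` for an empty metadata slot are assumptions.

```python
from collections import deque
from typing import Optional


class SlicedIngressCache:
    """Caches in one ingress pipeline: one FIFO per (egress pipeline, slice)."""

    def __init__(self, num_pipelines: int, num_slices: int) -> None:
        self.num_slices = num_slices
        self.queues = {
            (pipe, sl): deque()
            for pipe in range(num_pipelines)
            for sl in range(num_slices)
        }

    def process(self, flow_hash: int, dest_pipe: int, owner_pipe: int,
                own_slice: int) -> list[Optional[int]]:
        """Return one metadata slot per slice; None marks an empty slot."""
        slots: list[Optional[int]] = [None] * self.num_slices
        # Other slices: fetch at most one cached identifier destined for dest_pipe.
        for sl in range(self.num_slices):
            queue = self.queues[(dest_pipe, sl)]
            if sl != own_slice and queue:
                slots[sl] = queue.popleft()
        # Own slice: report the packet directly or cache it for its owning pipeline.
        if dest_pipe == owner_pipe:
            slots[own_slice] = flow_hash
        else:
            self.queues[(owner_pipe, own_slice)].append(flow_hash)
        return slots
```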

[0077] The egress pipeline PIP Y performs the following operations to update the FCM data structures. All the egress flow identifiers, at most one per slice, are extracted from the metadata of the data packet, and the corresponding register entries (one register per level per slice in FCM) are updated. For example, with two slices, the FCM in the egress pipeline contains 6 registers: 3 for slice 0 and 3 for slice 1. A data packet arriving at the egress pipeline contains zero, one, or two egress flow identifiers. The egress pipeline first updates the counters related to the hash stored for slice 0 in the registers associated with slice 0, and then performs the same operation for slice 1.
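On the egress side, each slice owns its own set of registers, so the identifiers in one packet touch disjoint registers and each register is accessed at most once per packet. For brevity, the sketch increments every level as in a Count-Min row rather than applying FCM's overflow rule; the class name `SlicedEgressSketch` and the small level sizes used in the test are assumptions.

```python
from typing import Optional


class SlicedEgressSketch:
    """Egress registers split into slices: one counter array per level per slice."""

    def __init__(self, num_slices: int, level_sizes=(2**19, 2**16, 2**13)) -> None:
        self.registers = [[[0] * size for size in level_sizes] for _ in range(num_slices)]

    def process(self, slots: list[Optional[int]]) -> None:
        """Update each slice's registers with that slice's identifier, if any."""
        for sl, flow_hash in enumerate(slots):
            if flow_hash is None:
                continue
            # Every register of slice `sl` is touched exactly once for this packet.
            for level in self.registers[sl]:
                level[flow_hash % len(level)] += 1
```

With two slices and three levels this yields the six registers mentioned above.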

[0078] In some embodiments, two alternative implementations can be used for the monitoring cache. In one embodiment, the cache is a stack with push and pop operations, which can be realized using two registers (one for the head pointer and one for the data). In another embodiment, the cache is a queue with enqueue and dequeue operations, which can be realized using three registers. The number of slices affects the ability of the system to move the cached information to the egress data structures when traffic is imbalanced among the egress pipelines. Whenever high imbalances cause the caches to fill, the ingress logic starts recirculating truncated packets in the switch to drain the caches. If any cache associated with pipeline Y is above a certain threshold, the egress pipeline of the recirculated packets is set to Y, and packets keep being recirculated until the caches return to a lower level. An alternative solution is to use the packet generator on the Tofino.
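A queue built from three registers can be modeled as a ring buffer with a head index, a tail index, and a data array; a queue that grows past a threshold is where recirculation-based draining would start. The class `RegisterQueue`, the helper `pipelines_to_drain`, and the threshold semantics are assumptions; Python integers do not wrap, whereas hardware indices would be fixed-width.

```python
from typing import Optional


class RegisterQueue:
    """FIFO built from three registers: head index, tail index and a data array."""

    def __init__(self, capacity: int) -> None:
        self.head = 0                # register 1: next position to dequeue
        self.tail = 0                # register 2: next position to enqueue
        self.data = [0] * capacity   # register 3: ring-buffer storage

    def __len__(self) -> int:
        return self.tail - self.head

    def enqueue(self, value: int) -> bool:
        """Append value; return False if the queue is full."""
        if len(self) == len(self.data):
            return False
        self.data[self.tail % len(self.data)] = value
        self.tail += 1
        return True

    def dequeue(self) -> Optional[int]:
        """Remove and return the oldest value, or None if empty."""
        if len(self) == 0:
            return None
        value = self.data[self.head % len(self.data)]
        self.head += 1
        return value


def pipelines_to_drain(queues: dict[int, RegisterQueue], threshold: int) -> list[int]:
    """Egress pipelines whose cache exceeds threshold (targets for recirculation)."""
    return [pipe for pipe, queue in queues.items() if len(queue) > threshold]
```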

[0079] Figure 7 is a diagram of one embodiment of a monitoring cache implemented as a stack. In the illustrated example embodiment, a first register 701 or similar data store includes a set of pointers and counter values that correlate with a second register 703 or similar data store. In some embodiments, the pointers are explicit identifiers of the location of each of the correlated stacks. In other embodiments, the pointers are implicit, having a defined correlation with the stacks. The counter value for each of the correlated stacks indicates the number of items in that stack.

[0080] The items in each stack in register 703 can include the equivalent of the monitoring cache entries described herein with relation to Table I. Each stack can hold a given number of items that have tracking information that is waiting for a data packet that is destined for the correlated egress pipe. When a data packet destined for one of the egress pipes is received, then the items can be ‘popped’ from the stack for the pipeline to be added as metadata for the data packet. In this manner, two registers or similar storage mechanisms can be utilized to implement the monitoring cache in each ingress pipeline.
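Following the Figure 7 layout with implicit pointers, the first register can hold one item count per egress pipeline and the second register a flat array in which the stack for pipeline p starts at p x depth. The class name `RegisterStacks`, the fixed depth, and the rejection of pushes on a full stack are assumptions.

```python
from typing import Any, Optional


class RegisterStacks:
    """Per-egress-pipeline stacks stored in two registers (implicit pointers).

    counts: first register, one item count per egress pipeline.
    items:  second register, all stacks back to back; pipeline p's stack
            starts at index p * depth.
    """

    def __init__(self, num_pipelines: int, depth: int) -> None:
        self.depth = depth
        self.counts = [0] * num_pipelines
        self.items: list[Optional[Any]] = [None] * (num_pipelines * depth)

    def push(self, pipe: int, entry: Any) -> bool:
        """Push a cache entry for `pipe`; return False if that stack is full."""
        count = self.counts[pipe]
        if count == self.depth:
            return False
        self.items[pipe * self.depth + count] = entry
        self.counts[pipe] = count + 1
        return True

    def pop(self, pipe: int) -> Optional[Any]:
        """Pop the most recent entry for `pipe`, e.g. to attach it as metadata."""
        count = self.counts[pipe]
        if count == 0:
            return None
        count -= 1
        self.counts[pipe] = count
        return self.items[pipe * self.depth + count]
```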

[0081] Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention. Figure 8A shows NDs 800A-H, and their connectivity by way of lines between 800A-800B, 800B-800C, 800C-800D, 800D-800E, 800E-800F, 800F-800G, and 800A-800G, as well as between 800H and each of 800A, 800C, 800D, and 800G. These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link). An additional line extending from NDs 800A, 800E, and 800F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).

[0082] Two of the exemplary ND implementations in Figure 8A are: 1) a special-purpose network device 802 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general purpose network device 804 that uses common off-the-shelf (COTS) processors and a standard OS.

[0083] The special-purpose network device 802 includes networking hardware 810 comprising a set of one or more processor(s) 812, forwarding resource(s) 814 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 816 (through which network connections are made, such as those shown by the connectivity between NDs 800A-H), as well as non-transitory machine readable storage media 818 having stored therein networking software 820. During operation, the networking software 820 may be executed by the networking hardware 810 to instantiate a set of one or more networking software instance(s) 822. Each of the networking software instance(s) 822, and that part of the networking hardware 810 that executes that network software instance (be it hardware dedicated to that networking software instance and/or time slices of hardware temporally shared by that networking software instance with others of the networking software instance(s) 822), form a separate virtual network element 830A-R. Each of the virtual network element(s) (VNEs) 830A-R includes a control communication and configuration module 832A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 834A-R, such that a given virtual network element (e.g., 830A) includes the control communication and configuration module (e.g., 832A), a set of one or more forwarding table(s) (e.g., 834A), and that portion of the networking hardware 810 that executes the virtual network element (e.g., 830A).

[0084] The special-purpose network device 802 is often physically and/or logically considered to include: 1) a ND control plane 824 (sometimes referred to as a control plane) comprising the processor(s) 812 that execute the control communication and configuration module(s) 832A-R; and 2) a ND forwarding plane 826 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 814 that utilize the forwarding table(s) 834A-R and the physical NIs 816. By way of example, where the ND is a router (or is implementing routing functionality), the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 834A-R, and the ND forwarding plane 826 is responsible for receiving that data on the physical NIs 816 and forwarding that data out the appropriate ones of the physical NIs 816 based on the forwarding table(s) 834A-R.

[0085] In some embodiments, the networking software 820 can include a traffic monitor 865 that performs the traffic monitoring operations described herein, managing a monitoring cache in the ingress pipeline and/or updating the traffic monitoring information at the egress pipeline.

[0086] Figure 8B illustrates an exemplary way to implement the special-purpose network device 802 according to some embodiments of the invention. Figure 8B shows a special-purpose network device including cards 838 (typically hot pluggable). While in some embodiments the cards 838 are of two types (one or more that operate as the ND forwarding plane 826 (sometimes called line cards), and one or more that operate to implement the ND control plane 824 (sometimes called control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card). A service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway))). By way of example, a service card may be used to terminate IPsec tunnels and execute the attendant authentication and encryption algorithms. These cards are coupled together through one or more interconnect mechanisms illustrated as backplane 836 (e.g., a first full mesh coupling the line cards and a second full mesh coupling all of the cards).

[0087] Returning to Figure 8A, the general purpose network device 804 includes hardware 840 comprising a set of one or more processor(s) 842 (which are often COTS processors) and physical NIs 846, as well as non-transitory machine readable storage media 848 having stored therein software 850. During operation, the processor(s) 842 execute the software 850 to instantiate one or more sets of one or more applications 864A-R.
While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization.

[0088] In some embodiments, the software 850 can include a traffic monitor 865 that performs the traffic monitoring operations described herein, namely managing a monitoring cache in the ingress pipeline and/or updating traffic monitoring information at the egress pipeline.

[0089] For example, in one such alternative embodiment the virtualization layer 854 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 862A-R called software containers that may each be used to execute one (or more) of the sets of applications 864A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes. In another such alternative embodiment the virtualization layer 854 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 864A-R is run on top of a guest operating system within an instance 862A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor; the guest operating system and application may not know they are running on a virtual machine as opposed to running on a “bare metal” host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes.
In yet other alternative embodiments, one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application. As a unikernel can be implemented to run directly on hardware 840, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container, embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 854, unikernels running within software containers represented by instances 862A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).

[0090] The instantiation of the one or more sets of one or more applications 864A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 852. Each set of applications 864A-R, corresponding virtualization construct (e.g., instance 862A-R) if implemented, and that part of the hardware 840 that executes them (be it hardware dedicated to that execution and/or time slices of hardware temporally shared), forms a separate virtual network element(s) 860A-R.

[0091] The virtual network element(s) 860A-R perform similar functionality to the virtual network element(s) 830A-R, e.g., similar to the control communication and configuration module(s) 832A and forwarding table(s) 834A (this virtualization of the hardware 840 is sometimes referred to as network function virtualization (NFV)). Thus, NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which could be located in data centers, NDs, and customer premise equipment (CPE). While embodiments of the invention are illustrated with each instance 862A-R corresponding to one VNE 860A-R, alternative embodiments may implement this correspondence at a finer level of granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 862A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.

[0092] In certain embodiments, the virtualization layer 854 includes a virtual switch that provides forwarding services similar to those of a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 862A-R and the physical NI(s) 846, as well as optionally between the instances 862A-R; in addition, this virtual switch may enforce network isolation between the VNEs 860A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).

[0093] The third exemplary ND implementation in Figure 8A is a hybrid network device 806, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND. In certain embodiments of such a hybrid network device, a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 802) could provide for para-virtualization to the networking hardware present in the hybrid network device 806.

[0094] Regardless of the above exemplary implementations of an ND, when a single one of multiple VNEs implemented by an ND is being considered (e.g., only one of the VNEs is part of a given virtual network) or where only a single VNE is currently being implemented by an ND, the shortened term network element (NE) is sometimes used to refer to that VNE. Also in all of the above exemplary implementations, each of the VNEs (e.g., VNE(s) 830A-R, VNEs 860A-R, and those in the hybrid network device 806) receives data on the physical NIs (e.g., 816, 846) and forwards that data out the appropriate ones of the physical NIs (e.g., 816, 846). For example, a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet, where IP header information includes source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of an ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
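Of the header fields listed above, conventional IP routing relies primarily on the destination address, resolved by longest-prefix match against a forwarding table such as the FIB that the control plane programs into the forwarding plane. The Python sketch below illustrates that lookup under simplifying assumptions: the Fib class, its method names, and the interface names are hypothetical, and the linear scan is chosen for clarity; production forwarding planes typically use tries or TCAM hardware.

```python
import ipaddress
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class Fib:
    """Toy FIB mapping destination prefixes to outgoing interfaces (names hypothetical)."""

    def __init__(self) -> None:
        self._routes: dict[Network, str] = {}

    def add_route(self, prefix: str, out_interface: str) -> None:
        # ip_network() is strict by default and rejects prefixes with host bits set.
        self._routes[ipaddress.ip_network(prefix)] = out_interface

    def lookup(self, dst_ip: str) -> Optional[str]:
        """Return the interface of the longest prefix containing dst_ip, or None."""
        addr = ipaddress.ip_address(dst_ip)
        best: Optional[Network] = None
        for net in self._routes:
            # Membership is simply False when address and prefix families differ.
            if addr in net and (best is None or net.prefixlen > best.prefixlen):
                best = net
        return self._routes[best] if best is not None else None
```

With routes for 0.0.0.0/0, 10.0.0.0/8 and 10.1.0.0/16 installed, 10.1.2.3 resolves via the /16, 10.9.9.9 via the /8, and any other IPv4 destination via the default route.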

[0095] Figure 8C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention. Figure 8C shows VNEs 870A.1-870A.P (and optionally VNEs 870A.Q-870A.R) implemented in ND 800A and VNE 870H.1 in ND 800H. In Figure 8C, VNEs 870A.1-P are separate from each other in the sense that they can receive packets from outside ND 800A and forward packets outside of ND 800A; VNE 870A.1 is coupled with VNE 870H.1, and thus they communicate packets between their respective NDs; VNE 870A.2-870A.3 may optionally forward packets between themselves without forwarding them outside of the ND 800A; and VNE 870A.P may optionally be the first in a chain of VNEs that includes VNE 870A.Q followed by VNE 870A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 8C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).

[0096] The NDs of Figure 8A, for example, may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice over Internet Protocol (VoIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services. Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer-to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs. For instance, end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers. 
However, through compute and storage virtualization, one or more of the electronic devices operating as the NDs in Figure 8A may also host one or more such servers (e.g., in the case of the general purpose network device 804, one or more of the software instances 862A-R may operate as servers; the same would be true for the hybrid network device 806; in the case of the special-purpose network device 802, one or more such servers could also be run on a virtualization layer executed by the processor(s) 812); in which case the servers are said to be co-located with the VNEs of that ND.

[0097] A virtual network is a logical abstraction of a physical network (such as that in Figure 8A) that provides network services (e.g., L2 and/or L3 services). A virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPsec) to create the overlay network).
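As an illustration of the tunneling used to build an overlay, the sketch below prepends and strips the base 4-byte GRE header defined in RFC 2784 (a flags/version word followed by a 16-bit protocol type). It is a simplified example, not part of the described embodiments: the outer delivery IP header (IP protocol 47) that the underlay would add is omitted, and the optional checksum, key, and sequence fields are rejected rather than parsed. The function names are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # inner payload is an IPv4 packet (L3 overlay)
ETHERTYPE_TEB = 0x6558   # Transparent Ethernet Bridging: inner payload is an L2 frame


def gre_encapsulate(inner: bytes, protocol_type: int) -> bytes:
    """Prepend a base GRE header (C bit, reserved bits and version all zero)."""
    return struct.pack("!HH", 0, protocol_type) + inner


def gre_decapsulate(packet: bytes) -> tuple[int, bytes]:
    """Strip a base GRE header and return (protocol_type, inner payload)."""
    if len(packet) < 4:
        raise ValueError("packet shorter than a GRE header")
    flags_and_version, protocol_type = struct.unpack("!HH", packet[:4])
    if flags_and_version != 0:
        # Checksum/key/sequence flags or a non-zero version change the header
        # layout; this sketch only handles the 4-byte base header.
        raise ValueError("only the base 4-byte GRE header is supported")
    return protocol_type, packet[4:]
```

Carrying an Ethernet frame with protocol type 0x6558 corresponds to an L2 overlay service, while 0x0800 carries IPv4 packets for an L3 overlay.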

[0098] A network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network. A virtual network instance (VNI) is a specific instance of a virtual network on an NVE (e.g., an NE/VNE on an ND, or a part of an NE/VNE on an ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND). A virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be a physical or virtual port identified through a logical interface identifier (e.g., a VLAN ID).
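The relationship among VAPs, VNIs, and remote NVEs can be pictured as two lookups performed at the NVE. The Python sketch below is a hypothetical data model (the Nve class, its fields, and the use of a VLAN ID as the VAP identifier are illustrative assumptions, not taken from the embodiments): a frame arriving on a VAP is mapped to its VNI, and the NVE then finds the underlay addresses of the other NVEs participating in that VNI.

```python
from dataclasses import dataclass, field


@dataclass
class Nve:
    """Toy network virtualization edge (hypothetical model)."""
    # VAP (identified here by VLAN ID) -> VNI the attached external system belongs to.
    vap_to_vni: dict[int, int] = field(default_factory=dict)
    # VNI -> underlay addresses of remote NVEs that also serve that VNI.
    vni_peers: dict[int, set[str]] = field(default_factory=dict)

    def attach(self, vlan_id: int, vni: int) -> None:
        self.vap_to_vni[vlan_id] = vni

    def add_peer(self, vni: int, underlay_ip: str) -> None:
        self.vni_peers.setdefault(vni, set()).add(underlay_ip)

    def tunnel_targets(self, vlan_id: int) -> tuple[int, set[str]]:
        """Resolve a frame arriving on a VAP to its VNI and candidate tunnel endpoints."""
        vni = self.vap_to_vni[vlan_id]  # raises KeyError if the VAP is not attached
        return vni, set(self.vni_peers.get(vni, set()))
```

Returning every peer corresponds to flooding broadcast or unknown-destination traffic; an NVE that has learned where a destination resides would tunnel to a single peer instead.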

[0099] Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IPVPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network). Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network-originated attacks, to avoid malformed route announcements), and management capabilities (e.g., fault detection and processing).

[00100] Figure 8D illustrates a network with a single network element on each of the NDs of Figure 8A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention. Specifically, Figure 8D illustrates network elements (NEs) 870A-H with the same connectivity as the NDs 800A-H of Figure 8A.

[00101] Figure 8D illustrates that the distributed approach 872 distributes responsibility for generating the reachability and forwarding information across the NEs 870A-H; in other words, the process of neighbor discovery and topology discovery is distributed.

[00102] For example, where the special-purpose network device 802 is used, the control communication and configuration module(s) 832A-R of the ND control plane 824 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching (GMPLS) Signaling RSVP-TE))) that communicate with other NEs to exchange routes, and then select those routes based on one or more routing metrics. Thus, the NEs 870A-H (e.g., the processor(s) 812 executing the control communication and configuration module(s) 832A-R) perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information. Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 824. The ND control plane 824 programs the ND forwarding plane 826 with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane 824 programs the adjacency and route information into one or more forwarding table(s) 834A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 826. 
For layer 2 forwarding, the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 802, the same distributed approach 872 can be implemented on the general purpose network device 804 and the hybrid network device 806.

[00103] Figure 8D also illustrates a centralized approach 874 (also known as software defined networking (SDN)) that decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination. The illustrated centralized approach 874 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 876 (sometimes referred to as an SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized. The centralized control plane 876 has a south bound interface 882 with a data plane 880 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with an ND forwarding plane)) that includes the NEs 870A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes). The centralized control plane 876 includes a network controller 878, which includes a centralized reachability and forwarding information module 879 that determines the reachability within the network and distributes the forwarding information to the NEs 870A-H of the data plane 880 over the south bound interface 882 (which may use the OpenFlow protocol). Thus, the network intelligence is centralized in the centralized control plane 876 executing on electronic devices that are typically separate from the NDs. 
[00104] For example, where the special-purpose network device 802 is used in the data plane 880, each of the control communication and configuration module(s) 832A-R of the ND control plane 824 typically includes a control agent that provides the VNE side of the south bound interface 882. In this case, the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 832A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information, albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach).

[00105] While the above example uses the special-purpose network device 802, the same centralized approach 874 can be implemented with the general purpose network device 804 (e.g., each of the VNEs 860A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879; it should be understood that in some embodiments of the invention, the VNEs 860A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information, albeit less so than in the case of a distributed approach) and the hybrid network device 806. In fact, the use of SDN techniques can enhance the NFV techniques typically used in the general purpose network device 804 or hybrid network device 806 implementations, as NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run, and NFV and SDN both aim to make use of commodity server hardware and physical switches.

[00106] Figure 8D also shows that the centralized control plane 876 has a north bound interface 884 to an application layer 886, in which resides application(s) 888. The centralized control plane 876 has the ability to form virtual networks 892 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 870A-H of the data plane 880 being the underlay network)) for the application(s) 888. Thus, the centralized control plane 876 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal).

[00107] In some embodiments, the application layer 886 or a similar aspect of the centralized approach can include a traffic monitor 881 that performs the traffic monitoring operations described herein, namely managing a monitoring cache in the ingress pipeline and/or updating traffic monitoring information at the egress pipeline.

[00108] While Figure 8D shows the distributed approach 872 separate from the centralized approach 874, the effort of network control may be distributed differently or the two combined in certain embodiments of the invention. For example: 1) embodiments may generally use the centralized approach (SDN) 874, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree. Such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach.

[00109] While Figure 8D illustrates the simple case where each of the NDs 800A-H implements a single NE 870A-H, it should be understood that the network control approaches described with reference to Figure 8D also work for networks where one or more of the NDs 800A-H implement multiple VNEs (e.g., VNEs 830A-R, VNEs 860A-R, those in the hybrid network device 806). Alternatively or in addition, the network controller 878 may also emulate the implementation of multiple VNEs in a single ND. Specifically, instead of (or in addition to) implementing multiple VNEs in a single ND, the network controller 878 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 892 (all in the same one of the virtual network(s) 892, each in different ones of the virtual network(s) 892, or some combination). For example, the network controller 878 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 876 to present different VNEs in the virtual network(s) 892 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).

[00110] On the other hand, Figures 8E and 8F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 878 may present as part of different ones of the virtual networks 892. Figure 8E illustrates the simple case where each of the NDs 800A-H implements a single NE 870A-H (see Figure 8D), but the centralized control plane 876 has abstracted multiple of the NEs in different NDs (the NEs 870A-C and G-H) into (to represent) a single NE 870I in one of the virtual network(s) 892 of Figure 8D, according to some embodiments of the invention. Figure 8E shows that in this virtual network, the NE 870I is coupled to NEs 870D and 870F, which are both still coupled to NE 870E.

[00111] Figure 8F illustrates a case where multiple VNEs (VNE 870A.1 and VNE 870H.1) are implemented on different NDs (ND 800A and ND 800H) and are coupled to each other, and where the centralized control plane 876 has abstracted these multiple VNEs such that they appear as a single VNE 870T within one of the virtual networks 892 of Figure 8D, according to some embodiments of the invention. Thus, the abstraction of a NE or VNE can span multiple NDs.

[00112] While some embodiments of the invention implement the centralized control plane 876 as a single entity (e.g., a single instance of software running on a single electronic device), alternative embodiments may spread the functionality across multiple entities for redundancy and/or scalability purposes (e.g., multiple instances of software running on different electronic devices).

[00113] Similar to the network device implementations, the electronic device(s) running the centralized control plane 876, and thus the network controller 878 including the centralized reachability and forwarding information module 879, may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or hybrid device). These electronic device(s) would similarly include processor(s), a set of one or more physical NIs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software. For instance, Figure 9 illustrates a general purpose control plane device 904 including hardware 940 comprising a set of one or more processor(s) 942 (which are often COTS processors) and physical NIs 946, as well as non-transitory machine readable storage media 948 having stored therein centralized control plane (CCP) software 950.

[00114] In some embodiments, the non-transitory machine readable storage media 948 can include a traffic monitor 981 that performs the traffic monitoring operations described herein, namely managing a monitoring cache in the ingress pipeline and/or updating traffic monitoring information at the egress pipeline.

[00115] In embodiments that use compute virtualization, the processor(s) 942 typically execute software to instantiate a virtualization layer 954 (e.g., in one embodiment the virtualization layer 954 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 962A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more applications; in another embodiment the virtualization layer 954 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 962A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor; in another embodiment, an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application, and the unikernel can run directly on hardware 940, directly on a hypervisor represented by virtualization layer 954 (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container represented by one of instances 962A-R). Again, in embodiments where compute virtualization is used, during operation an instance of the CCP software 950 (illustrated as CCP instance 976A) is executed (e.g., within the instance 962A) on the virtualization layer 954. 
In embodiments where compute virtualization is not used, the CCP instance 976A is executed, as a unikernel or on top of a host operating system, on the “bare metal” general purpose control plane device 904. The instantiation of the CCP instance 976A, as well as the virtualization layer 954 and instances 962A-R if implemented, are collectively referred to as software instance(s) 952.

[00116] In some embodiments, the CCP instance 976A includes a network controller instance 978. The network controller instance 978 includes a centralized reachability and forwarding information module instance 979 (which is a middleware layer providing the context of the network controller 878 to the operating system and communicating with the various NEs), and a CCP application layer 980 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces). At a more abstract level, this CCP application layer 980 within the centralized control plane 876 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view.

[00117] The centralized control plane 876 transmits relevant messages to the data plane 880 based on CCP application layer 980 calculations and middleware layer mapping for each flow. A flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by the destination IP address for example; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers. Different NDs/NEs/VNEs of the data plane 880 may receive different messages, and thus different forwarding information. The data plane 880 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometimes referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
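Defining a flow as "a set of packets whose headers match a given pattern of bits" amounts to a value/mask comparison: mask bits set to 1 must match the pattern, and bits set to 0 are wildcards. The helper below is a minimal sketch of that idea; the function name and the representation of a header span as equal-length byte strings are assumptions made for illustration.

```python
def matches(header: bytes, value: bytes, mask: bytes) -> bool:
    """Return True if `header` equals `value` on every bit set in `mask`.

    Bits cleared in `mask` are wildcards. All three arguments are assumed to
    cover the same header span, for example the 4-byte IPv4 destination address.
    """
    if not (len(header) == len(value) == len(mask)):
        raise ValueError("header, value and mask must have equal length")
    return all((h & m) == (v & m) for h, v, m in zip(header, value, mask))
```

Destination-based IP forwarding is the special case in which only destination-address bits are unmasked: the flow 10.1.0.0/16 is value `bytes([10, 1, 0, 0])` with mask `bytes([255, 255, 0, 0])`. Richer flow definitions unmask bits in more header fields.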

[00118] Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets. The model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
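To illustrate header parsing feeding classification, the Python function below extracts the source and destination MAC addresses (plus the EtherType) from an untagged Ethernet II header to form a match key, mirroring the two key fields named in the example above. The function and type names are hypothetical, and VLAN tags and other encapsulations are not handled in this sketch.

```python
import struct
from typing import NamedTuple


class L2Key(NamedTuple):
    src_mac: str
    dst_mac: str
    ethertype: int


def _fmt_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_key(frame: bytes) -> L2Key:
    """Build a match key from an untagged Ethernet II header.

    On the wire the destination MAC comes first, then the source MAC, then the
    EtherType; the key lists the source MAC first to mirror the text's example.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return L2Key(_fmt_mac(src), _fmt_mac(dst), ethertype)
```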

[00119] Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched). Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet. Thus, a forwarding table entry for IPv4/IPv6 packets with a particular transmission control protocol (TCP) destination port could contain an action specifying that these packets should be dropped.
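The classification scheme described above (match criteria with wildcards, a first-match selection rule, and per-entry actions) can be sketched as follows. This is an illustrative model rather than OpenFlow's actual data structures: packets are dictionaries of already-parsed fields, the field names and action strings are hypothetical, and a field omitted from an entry's match acts as a wildcard.

```python
from dataclasses import dataclass
from typing import Optional

# A packet is modeled as a dict of parsed header fields (hypothetical field names).
Packet = dict[str, object]


@dataclass
class FlowEntry:
    match: dict[str, object]  # field -> required value; omitted fields are wildcards
    actions: list[str]        # e.g. ["drop"] or ["output:3"]

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(name) == value for name, value in self.match.items())


def classify(table: list[FlowEntry], pkt: Packet) -> Optional[FlowEntry]:
    """Return the first matching entry, or None on a table miss."""
    for entry in table:
        if entry.matches(pkt):
            return entry
    return None
```

An entry matching `{"ip_proto": 6, "tcp_dst": 23}` with action `["drop"]` drops TCP traffic to port 23 whether it is carried over IPv4 or IPv6, as in the example above. Real switches usually resolve overlapping entries by explicit priority rather than list order.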

[00120] Forwarding decisions are made and actions are performed by executing, on the packet, the set of actions identified in the forwarding table entry matched during packet classification.
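Executing the matched entry's action set can be sketched as an interpreter over the example actions listed in paragraph [00119] (push a header, forward on a port, flood, drop). The action encoding and the function `execute` are hypothetical, and treating `drop` as discarding the packet even after earlier outputs is a simplifying assumption of this sketch.

```python
from typing import List, Tuple


def execute(pkt: dict, actions: List[tuple], in_port: int, ports: List[int]) -> List[Tuple[int, dict]]:
    """Run a matched entry's action set; return the (port, packet) pairs to transmit."""
    pkt = {**pkt, "headers": list(pkt.get("headers", []))}  # work on a copy
    emitted: List[Tuple[int, dict]] = []

    def snapshot() -> dict:
        # Each transmitted copy reflects the header stack at the time it was emitted.
        return {**pkt, "headers": list(pkt["headers"])}

    for action in actions:
        kind = action[0]
        if kind == "drop":
            return []  # discard the packet entirely
        elif kind == "push_header":
            pkt["headers"].insert(0, action[1])  # becomes the new outermost header
        elif kind == "output":
            emitted.append((action[1], snapshot()))
        elif kind == "flood":
            emitted.extend((p, snapshot()) for p in ports if p != in_port)
        else:
            raise ValueError(f"unsupported action: {kind!r}")
    return emitted


tagged = execute({"payload": b"x"}, [("push_header", "vlan:10"), ("output", 2)], in_port=1, ports=[1, 2, 3])
flooded = execute({"payload": b"x"}, [("flood",)], in_port=1, ports=[1, 2, 3])
```

Because actions run in order, pushing a header before an output action changes what is transmitted; reversing the two actions would send the packet untagged.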

[00121] However, when an unknown packet (for example, a “missed packet” or a “match-miss” as used in OpenFlow parlance) arrives at the data plane 880, the packet (or a subset of the packet header and content) is typically forwarded to the centralized control plane 876. The centralized control plane 876 will then program forwarding table entries into the data plane 880 to accommodate packets belonging to the flow of the unknown packet. Once a specific forwarding table entry has been programmed into the data plane 880 by the centralized control plane 876, the next packet with matching credentials will match that forwarding table entry and take the set of actions associated with that matched entry.
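The miss-then-install sequence can be sketched with two cooperating objects. This is a toy model under stated assumptions: flows are keyed by destination address only, the controller holds a static route map, and the classes `Controller` and `DataPlane` and the method `on_packet_in` are hypothetical (they are not the OpenFlow message API).

```python
from typing import Dict, List, Optional


class Controller:
    """Hypothetical centralized control plane with a static destination -> port map."""

    def __init__(self, routes: Dict[str, int]):
        self.routes = routes
        self.packet_ins = 0  # number of match-miss packets punted to the controller

    def on_packet_in(self, dataplane: "DataPlane", dst: str) -> List[tuple]:
        self.packet_ins += 1
        port: Optional[int] = self.routes.get(dst)
        actions = [("output", port)] if port is not None else [("drop",)]
        dataplane.install(dst, actions)  # program an entry for the unknown packet's flow
        return actions


class DataPlane:
    def __init__(self, controller: Controller):
        self.flow_table: Dict[str, List[tuple]] = {}
        self.controller = controller

    def install(self, dst: str, actions: List[tuple]) -> None:
        self.flow_table[dst] = actions

    def process(self, dst: str) -> List[tuple]:
        actions = self.flow_table.get(dst)
        if actions is None:  # match-miss: forward to the control plane
            actions = self.controller.on_packet_in(self, dst)
        return actions


ctl = Controller({"10.0.0.2": 3})
dp = DataPlane(ctl)
first = dp.process("10.0.0.2")   # miss: punted to the controller, entry installed
second = dp.process("10.0.0.2")  # hit: served from the flow table
```

Only the first packet of the flow reaches the controller; every later packet with the same key is handled entirely in the data plane.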

[00122] While the flow diagrams in the figures show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).

[00123] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims

What is claimed is:

1. A method for network packet traffic monitoring in a networking device, the method comprising: receiving (401) a data packet at an ingress port of an ingress pipeline; determining (403) an egress pipeline for the data packet; and adding (411) tracking information to a monitoring cache in the ingress pipeline, in response to determining the data packet is not monitored in the egress pipeline.

2. The method of claim 1, further comprising: determining (413) whether monitoring cache entries that match the egress pipeline are present in the monitoring cache; and adding (415) tracking information from the matching monitoring cache entries as metadata for the data packet.

3. The method of claim 2, further comprising: removing (417) matching monitoring cache entries from the monitoring cache.

4. The method of claim 3, further comprising: forwarding (419) the data packet with the metadata to the egress pipeline.

5. The method of claim 4, further comprising: determining (503), by the egress pipeline, whether the metadata is present in the data packet.

6. The method of claim 5, further comprising: updating (505) counters in the monitoring statistics data structure based on the metadata in the data packet.

7. The method of claim 6, further comprising: removing (507) the metadata from the data packet; and forwarding (509) the data packet on an egress port of the egress pipeline.

8. The method of claim 1, further comprising: recirculating or multicasting a monitoring cache entry.


9. A machine-readable medium comprising program code which when executed by an electronic device carries out the method steps of any of claims 1-8.

10. An electronic device comprising: a non-transitory machine-readable medium having stored therein a traffic monitor; and a set of processing devices coupled to the non-transitory machine-readable medium, the set of processing devices to execute the traffic monitor, the traffic monitor to perform the operations of claims 1-8.
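The steps of claims 1 to 7 can be modelled in software to show how tracking information travels from an ingress pipeline's monitoring cache to the egress pipeline that keeps the counters. This is a behavioural sketch, not the switch implementation: it assumes per-flow packet and byte counters, a hypothetical `home` map naming the egress pipeline that holds each monitored flow's counters, and that the egress pipeline (403) has already been determined by the caller. The classes `Packet`, `EgressPipeline`, and `IngressPipeline` are illustrative names; the numbers in the comments are the step labels from the claims.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Packet:
    flow: str
    size: int
    metadata: List[Tuple[str, int]] = field(default_factory=list)  # piggybacked tracking info


class EgressPipeline:
    def __init__(self, monitored: Set[str]):
        self.monitored = monitored                # flows whose counters live in this pipeline
        self.stats = defaultdict(lambda: [0, 0])  # flow -> [packets, bytes]
        self.transmitted: List[Packet] = []

    def _count(self, flow: str, size: int) -> None:
        self.stats[flow][0] += 1
        self.stats[flow][1] += size

    def process(self, pkt: Packet) -> None:
        if pkt.flow in self.monitored:   # packet is monitored here: count it directly
            self._count(pkt.flow, pkt.size)
        for flow, size in pkt.metadata:  # (503) metadata present -> (505) update counters
            self._count(flow, size)
        pkt.metadata = []                # (507) remove the metadata
        self.transmitted.append(pkt)     # (509) stands in for the egress port


class IngressPipeline:
    def __init__(self, egress: Dict[int, EgressPipeline], home: Dict[str, int]):
        self.egress = egress
        self.home = home                              # flow -> egress pipeline holding its counters
        self.cache: List[Tuple[int, str, int]] = []   # monitoring cache: (home pipeline, flow, bytes)

    def receive(self, pkt: Packet, egress_id: int) -> None:
        # (401) packet received; egress_id is assumed to be the outcome of (403).
        home = self.home.get(pkt.flow)
        if home is not None and home != egress_id:                       # not monitored in its egress pipeline
            self.cache.append((home, pkt.flow, pkt.size))                # (411)
        matching = [e for e in self.cache if e[0] == egress_id]          # (413)
        pkt.metadata.extend((flow, size) for _, flow, size in matching)  # (415)
        self.cache = [e for e in self.cache if e[0] != egress_id]        # (417)
        self.egress[egress_id].process(pkt)                              # (419)


pipe0, pipe1 = EgressPipeline({"A"}), EgressPipeline(set())
ingress = IngressPipeline({0: pipe0, 1: pipe1}, home={"A": 0})
ingress.receive(Packet("A", 100), egress_id=1)  # A leaves via pipe 1: tracking info is cached
ingress.receive(Packet("B", 60), egress_id=0)   # next packet to pipe 0 carries A's tracking info
ingress.receive(Packet("A", 40), egress_id=0)   # A is counted directly in its home pipeline
```

After the three packets, pipeline 0 has counted both packets of flow A (140 bytes), even though one of them left through pipeline 1. In this sketch, cached entries for a pipeline that receives no further traffic would wait indefinitely; the recirculating or multicasting of a monitoring cache entry in claim 8 appears to address that case and is not modelled here.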
