CN111201757A - Network access node virtual fabric dynamically configured on underlying network
- Publication number: CN111201757A (application CN201880063473.2A)
- Authority: CN (China)
- Prior art keywords: packet, access node, FCP
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- H04L 12/4633 - Interconnection of networks using encapsulation techniques, e.g. tunneling
- H04L 69/26 - Special purpose or proprietary protocols or architectures
- H04L 45/16 - Multipoint routing
- H04L 45/42 - Centralised routing
- H04L 45/64 - Routing or path finding of packets using an overlay routing layer
- H04L 47/15 - Flow control; congestion control in relation to multipoint traffic
- H04L 47/18 - Flow control; congestion control, end to end
- H04L 47/52 - Queue scheduling by attributing bandwidth to queues
- H04L 49/25 - Routing or path finding in a switch fabric
- H04L 69/324 - Intralayer communication protocols among peer entities in the data link layer [OSI layer 2], e.g. HDLC
Abstract
A network access node virtual fabric dynamically configured over an underlay network is described. A centralized controller of a packet-switched network, such as a software-defined networking (SDN) controller, is configured to establish one or more virtual fabrics as overlay networks on top of the physical underlay network of the packet-switched network. For example, the SDN controller may define multiple sets of two or more access nodes connected to the packet-switched network, and the access nodes of a given one of the sets may use a new data transmission protocol, referred to generally herein as the Fabric Control Protocol (FCP), to dynamically set up tunnels as a virtual fabric over the packet-switched network. The FCP tunnels may include all or a subset of the parallel data paths through the packet-switched network between the access nodes of a given virtual fabric.
Description
This application claims the benefit of U.S. provisional application No. 62/566,060, filed September 29, 2017, and U.S. provisional application No. 62/638,788, filed March 5, 2018, each of which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to computer networks, and more particularly to data center networks.
Background
In a typical cloud-based data center, a large number of interconnected servers provide computing and/or storage capacity for executing various applications. For example, a data center may include facilities that host applications and services for subscribers, i.e., customers of the data center. For example, a data center may host all infrastructure equipment, such as computing nodes, networking and storage systems, power systems, and environmental control systems.
In most data centers, clusters of storage systems and application servers are interconnected via a high-speed switching fabric provided by one or more layers of physical network switches and routers. Data centers vary widely in size, with some large data centers containing many thousands of servers, and are often distributed across multiple sites for redundancy. A typical data center switching fabric includes multiple layers of interconnected switches and routers. In current implementations, packets of a given packet flow between an origin server and a destination server or storage system are always forwarded from the origin to the destination along a single path through the routers and switches comprising the switching fabric.
Disclosure of Invention
In general, this disclosure describes network access node virtual fabrics that are dynamically configured over an underlay network. In accordance with the disclosed techniques, a centralized controller of a packet-switched network, such as a software-defined networking (SDN) controller, is configured to establish one or more virtual fabrics as overlay networks on top of the physical underlay network of the packet-switched network. For example, the SDN controller may define multiple sets of two or more access nodes connected to the packet-switched network, and the access nodes of a given one of the sets may use a new data transmission protocol, referred to generally herein as the Fabric Control Protocol (FCP), to dynamically set up tunnels as a virtual fabric over the packet-switched network. The FCP tunnels may utilize all or a subset of the parallel data paths through the packet-switched network between the access nodes of a given virtual fabric.
Once the FCP tunnels are set up for the one or more virtual fabrics over the packet-switched network, FCP also enables any access node of a given virtual fabric to communicate packet data of a given packet flow (e.g., packets sharing the same five-tuple of packet header fields prior to tunneling) to any other access node of the same virtual fabric over the packet-switched network using any of the parallel data paths. As further described herein, example implementations of FCP enable the individual packets of a packet flow to be injected across some or all of the multiple parallel data paths through the packet-switched network, with the packets reordered for delivery to the destination.
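A minimal Python sketch of this behavior, using hypothetical helper names and a simple round-robin policy: the source assigns a per-flow sequence number to each packet and spreads the packets across the available parallel paths, and the destination buffers out-of-order arrivals and releases them in sequence order.

```python
import itertools
from typing import Dict, List

def inject_flow(packets: List[bytes], paths: List[str]) -> List[tuple]:
    """Assign a per-flow sequence number to each packet and spread the packets
    round-robin across the available parallel paths (one possible policy)."""
    path_cycle = itertools.cycle(paths)
    return [(seq, next(path_cycle), payload) for seq, payload in enumerate(packets)]

class Reorderer:
    """Destination-side reordering: buffer out-of-order arrivals and release
    packets in sequence-number order."""
    def __init__(self) -> None:
        self.expected = 0
        self.pending: Dict[int, bytes] = {}

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        self.pending[seq] = payload
        released: List[bytes] = []
        while self.expected in self.pending:
            released.append(self.pending.pop(self.expected))
            self.expected += 1
        return released

# Example: three packets of one flow injected over two parallel paths, arriving
# out of order at the destination and released back in sequence.
sent = inject_flow([b"p0", b"p1", b"p2"], ["path-A", "path-B"])
rx = Reorderer()
for seq, _path, payload in sorted(sent, reverse=True):
    rx.receive(seq, payload)
```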
Example implementations of a fabric control protocol for use within a data center or other computing environment are described. As one example, fabric control protocols may provide certain advantages in an environment where a switch fabric provides full mesh interconnectivity such that any server may communicate packet data of a given packet flow to any other server using any one of a plurality of parallel data paths within a data center switch fabric. As further described herein, example implementations of fabric control protocols enable individual packets to be injected for a given packet flow across some or all of multiple parallel data paths in a data center switch fabric, and optionally, reordered for delivery to a destination. In some examples, the fabric control protocol packet structure is carried over an underlying protocol, such as User Datagram Protocol (UDP).
The techniques described herein may provide certain advantages. For example, fabric control protocols may provide end-to-end bandwidth scaling and flow fairness within a single tunnel based on requests and grants for end-point control of flows. In addition, the fabric control protocol may delay packet fragmentation of the flow until a grant is received, provide fault tolerant and hardware-based adaptive rate control of requests and grants, provide adaptive request window scaling, encrypt and authenticate requests and grants, and improve Explicit Congestion Notification (ECN) marking support.
In some examples, the fabric control protocol includes an end-to-end admission control mechanism in which the sender explicitly requests permission from the receiver to transfer a certain number of payload data bytes. In response, the receiver issues grants based on its buffer resources, quality of service (QoS), and/or fabric congestion metrics. That is, the fabric control protocol includes an admission control mechanism by which a source node requests permission before transmitting packets on the fabric to a destination node. For example, the source node sends a request message to the destination node requesting a certain number of bytes to be transferred, and the destination node sends a grant message to the source node after reserving egress bandwidth. In addition, rather than using flow-based switching and equal-cost multi-path (ECMP) forwarding to send all packets of a Transmission Control Protocol (TCP) flow on the same path in order to avoid packet reordering, the fabric control protocol enables the packets of an individual packet flow to be injected across all available paths between a source node and a destination node. The source node assigns a packet sequence number to each packet of a flow, and the destination node can use the packet sequence numbers to put the incoming packets of the same flow back in order.
In one example, the present disclosure relates to a network system comprising: a plurality of servers; a packet-switched network comprising a centralized controller; and a plurality of access nodes, each of the access nodes coupled to a subset of the servers and to the packet-switched network. The centralized controller is configured to establish one or more virtual fabrics, wherein each of the virtual fabrics comprises two or more of the access nodes. When communicating a packet flow of packets between a source server and a destination server coupled to an access node for one of the virtual fabrics, a first access node of the access nodes coupled to the source server is configured to: the packets of the packet flow are injected across a plurality of parallel data paths through the packet-switched network to a second one of the access nodes coupled to the destination server, and the second one of the access nodes is configured to deliver the packets to the destination server.
In another example, the present disclosure is directed to a method comprising: interconnecting a plurality of servers by a packet-switched network and a plurality of access nodes, each of the access nodes being coupled to a subset of the servers and to the packet-switched network; establishing, by a centralized controller of a packet-switched network, one or more virtual fabrics, wherein each of the virtual fabrics comprises two or more of the access nodes; and communicating a packet flow of packets between a source server and a destination server coupled to an access node for one of the virtual fabrics, comprising: the method includes injecting, by a first one of the access nodes coupled to the source server, packets of the packet flow across a plurality of parallel data paths through the packet-switched network to a second one of the access nodes coupled to the destination server, and delivering, by the second one of the access nodes, the packets to the destination server.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example network with a data center in which examples of the techniques described herein may be implemented.
Fig. 2 is a block diagram illustrating in further detail the logical interconnectivity provided by the access nodes and the switching fabric within the data center.
Fig. 3 is a block diagram illustrating one example of a Network Storage Computing Unit (NSCU) 40 that includes a group of access nodes and their supporting servers.
Fig. 4 is a block diagram illustrating an example logical chassis arrangement including two NSCUs from fig. 3.
Fig. 5 is a block diagram illustrating an example of full mesh connectivity between two access node groups within a logical chassis.
Fig. 6 is a block diagram illustrating an example arrangement of a complete physical chassis including two logical chassis from fig. 4.
Fig. 7A is a block diagram illustrating a logical view of the networking data paths and operations within an access node.
Fig. 7B is a block diagram illustrating an example first level network fanout implemented between a set of access nodes within a logical chassis.
Fig. 8 is a block diagram illustrating example multi-level network fanout across a data center switching fabric between access nodes.
Fig. 9 is a block diagram illustrating an example access node including a networking element and two or more processing cores.
Fig. 10 is a block diagram illustrating an example networking element of an access node.
Fig. 11 is a conceptual diagram illustrating an example network architecture between a source access node and a destination access node.
Fig. 12 is a conceptual diagram illustrating an example fabric control protocol queue pair fabric between a source access node and a destination access node.
Fig. 13 is a conceptual diagram illustrating an example of fabric control protocol queue status at a source access node and a destination access node.
Fig. 14 is a conceptual diagram illustrating an example fabric control protocol operation for transmitting an input packet stream (stream) from a source access node to a destination access node.
Fig. 15 is a conceptual diagram illustrating an example fabric control protocol source access node operational flow.
Fig. 16 is a conceptual diagram illustrating an example fabric control protocol destination access node operational flow.
Fig. 17A and 17B are conceptual diagrams illustrating an example of flow fairness implemented at a destination access node using a fabric control protocol grant scheduler.
Fig. 18 is a conceptual diagram illustrating an example format of a fabric control protocol control packet for a request message or an authorization message.
Fig. 19 is a conceptual diagram illustrating an example format of a fabric control protocol data packet.
Fig. 20 is a block diagram illustrating an example system with a packet-switched network having multiple network access node virtual fabrics dynamically configured on the packet-switched network in accordance with the techniques described herein.
Fig. 21 is a flow chart illustrating an example of operation of a network system in accordance with the techniques described herein.
Fig. 22 is a flow chart illustrating another example of operation of a network system in accordance with the techniques described herein.
Detailed Description
Today's large data center networks may connect over 100,000 two-socket servers and are typically designed to operate at approximately 25% of bisection throughput. Thus, as capacity demands increase, most data centers will need to provide greater bisection bandwidth. Data centers must also support an ever-increasing variety of applications, from big data analytics to financial services. They must also be agile and allow applications to be deployed to any server in order to improve efficiency and cost-effectiveness.
Data centers utilize various flow scheduling techniques in an attempt to balance utilization of the underlying interconnect fabric of the network. For example, traffic between endpoints (servers) has traditionally relied on ECMP (equal-cost multi-path) based load balancing. However, because ECMP randomly hashes packet flows onto network paths, it often results in poor load balancing. Data center fabrics are frequently left severely unbalanced due to hash collisions and a small number of large flows. ECMP combined with flowlet switching can select a new path at each flowlet boundary, which improves load balancing to some extent. However, ECMP uses local decisions to divide traffic among equal-cost paths without any feedback about congestion or link failures downstream of any selected path. As a result, even though the network may have built-in redundancy, failures may significantly reduce effective throughput.
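For comparison with the techniques described below, a short Python sketch (illustrative hash only, not any particular switch implementation) of conventional 5-tuple ECMP path selection, in which every packet of a flow follows the single hashed path regardless of downstream load:

```python
import hashlib

def ecmp_path(five_tuple: tuple, num_paths: int) -> int:
    """Conventional ECMP: hash the flow 5-tuple once and pin every packet of the
    flow to that path, with no knowledge of downstream load or failures."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

flow_a = ("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
flow_b = ("10.0.0.2", "10.0.1.2", 40001, 80, "tcp")
# With only a handful of equal-cost paths, independent flow hashes collide
# fairly often, concentrating two large flows on one uplink while others idle.
print(ecmp_path(flow_a, num_paths=4), ecmp_path(flow_b, num_paths=4))
```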
Another flow scheduling technique (referred to as Hedera) attempts to provide dynamic flow scheduling for data center networks. Hedera collects flow information from constituent switches, computes non-conflicting paths for flows, and instructs the switches to re-route traffic accordingly. With a global view of routing and traffic demands, Hedera attempts to let the scheduling system see bottlenecks that switch-local load balancing cannot. However, Hedera is too slow for the traffic volatility of today's data centers because it requires monitoring flows for a period of time to estimate their ideal demand before making reallocation decisions.
MPTCP (Multipath Transmission Control Protocol) is another example flow scheduling technique. MPTCP splits a large TCP flow into multiple TCP subflows and stripes the payload among them so that each subflow is small enough that ECMP hash collisions do not create bottlenecks. However, MPTCP requires changes to the end-host network stack, which is typically not under the control of the network operator. Even where the network operator does control the network stack, some high-bandwidth, low-latency applications (such as storage traffic) bypass the kernel and implement their own transport. MPTCP further adds complexity to an already complex transport layer that is strained by the low-latency and burst-absorption requirements of today's data centers.
As another example, CONGA (distributed congestion-aware load balancing for data centers) splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. The feedback from the remote switches enables CONGA to seamlessly handle asymmetry without any TCP modifications. However, CONGA must be implemented in custom ASICs as part of a new network architecture in order to react to congestion at microsecond timescales.
Some of the problems evident in today's data centers are summarized as follows:
Despite built-in redundancy, the fabric is underutilized due to load imbalance.
The fabric is unable to react to changes in traffic patterns, and component/link failures result in further loss of efficiency.
TCP congestion avoidance uses AIMD (additive increase/multiplicative decrease) mechanisms, with multiple congestion management algorithms that cause traffic to fluctuate whenever the network encounters congestion.
There is no admission control at the end hosts, so TCP slow start is required between hosts to prevent oversubscription of end-point and network resources, at the cost of latency.
Mechanisms such as ECN (explicit congestion notification) react only to the local congestion seen by individual switch elements and, depending on the traffic profile and network topology, achieve congestion control at the cost of unfairness between TCP flows.
The present disclosure describes a new data transmission protocol, referred to herein as the Fabric Control Protocol (FCP), designed to address several of the problems in today's data centers. In various example implementations, FCP may significantly improve network throughput, e.g., to 90% or higher. The protocols and techniques described herein differ from existing protocols in a number of example ways, described in turn below. The following examples may be used in any combination and sub-combination to provide various implementations of the techniques described herein. Also, FCP may be used in place of, or in conjunction with, other transport protocols.
As a first example, FCP may provide fabric admission control, as described herein. The source node maintains a queue for each destination node and traffic class. Before transmitting packets over the fabric, the source node requests permission by sending a request message to the destination node asking to transmit a certain number of bytes. After reserving egress bandwidth, the destination node sends a grant message to the source. The source node then transmits packets until it has sent the granted number of bytes to the destination (stopping at a packet boundary).
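A minimal Python sketch of this admission control, under assumed names and greatly simplified state: the source keeps a per-(destination, traffic class) queue and transmits only up to the granted byte count, stopping at a packet boundary, while the destination grants no more than its available buffer.

```python
from collections import deque

class FcpSourceQueue:
    """Per-(destination, traffic class) queue at the source (simplified sketch)."""
    def __init__(self) -> None:
        self.queue = deque()          # packets awaiting a grant
        self.granted_bytes = 0        # bytes authorized by the destination

    def request_size(self) -> int:
        """Bytes to ask for in the next request message."""
        return sum(len(p) for p in self.queue)

    def on_grant(self, grant_bytes: int) -> list:
        """Transmit packets until the granted bytes are consumed, stopping at a
        packet boundary (a partially covered packet waits for the next grant)."""
        self.granted_bytes += grant_bytes
        sent = []
        while self.queue and len(self.queue[0]) <= self.granted_bytes:
            pkt = self.queue.popleft()
            self.granted_bytes -= len(pkt)
            sent.append(pkt)
        return sent

class FcpDestination:
    """Destination grant logic (sketch): never grant more than the free buffer."""
    def __init__(self, buffer_bytes: int) -> None:
        self.free_buffer = buffer_bytes

    def on_request(self, requested_bytes: int) -> int:
        grant = min(requested_bytes, self.free_buffer)
        self.free_buffer -= grant
        return grant

src, dst = FcpSourceQueue(), FcpDestination(buffer_bytes=4096)
src.queue.extend([b"x" * 1500, b"y" * 1500, b"z" * 1500])
sent = src.on_grant(dst.on_request(src.request_size()))  # only two full packets fit
```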
Second, FCP enables packets of the same packet flow to be injected over all available paths between the source node and the destination node, if needed. For example, a data center network has many paths from a source node to a destination node through a typical leaf/spine topology. Traditionally, to maintain the packet order of a TCP flow, a switch element determines the path of the flow through a 5-tuple hash and an ECMP forwarding algorithm, and all packets of the flow (based on the hash bucket) travel on the same path to avoid packet reordering. The paths connecting the multiple layers of switches in the network use relatively low bandwidth links, and a single low bandwidth link limits the maximum bandwidth that a TCP flow can carry. Because FCP allows packets to be injected over all available links between the source node and the destination node, an individual TCP flow is no longer limited to the bandwidth of a single link. The source node assigns a packet sequence number to each packet, and the destination node may use the packet sequence numbers to put incoming packets in order before delivering them to higher layers (such as TCP).
Third, example implementations of FCP may provide resiliency against request/grant packet loss and out-of-order delivery. The request and grant messages do not need to be reordered by the end nodes and do not carry packet sequence numbers. Instead, the request/grant messages convey size information using sliding-window-based markers, which makes the underlying transport of the request/grant messages tolerant of loss/drop or out-of-order delivery. As described above, data packets carrying payload are explicitly reordered by the destination node using packet sequence numbers. Packet losses are handled through reordering timeouts, and lost packets are recovered by higher layers, such as TCP, through retransmission.
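A short Python sketch (field names assumed) of such sliding-window markers: requests and grants carry cumulative byte counts rather than per-message deltas, so a lost, duplicated, or reordered control message is simply superseded by a later one.

```python
class RequestWindow:
    """Cumulative byte markers make request/grant messages idempotent: each side
    keeps the highest marker seen, so loss or reordering of individual control
    messages does not corrupt the window state."""
    def __init__(self) -> None:
        self.request_marker = 0   # total bytes requested so far (source view)
        self.grant_marker = 0     # total bytes granted so far (destination view)

    def build_request(self, new_bytes: int) -> int:
        self.request_marker += new_bytes
        return self.request_marker          # carried in every request message

    def on_request(self, marker: int) -> None:
        # Out-of-order or duplicated requests are absorbed by taking the max.
        self.request_marker = max(self.request_marker, marker)

    def on_grant(self, marker: int) -> None:
        self.grant_marker = max(self.grant_marker, marker)

    def sendable_bytes(self, sent_so_far: int) -> int:
        return self.grant_marker - sent_so_far
```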
Fourth, FCP supports adaptive and low-latency fabric implementation. The source/destination nodes use adaptive bandwidth control techniques through outgoing request and grant messages that react to long-term fabric congestion caused by fabric failures. By adaptively controlling the request and grant rates, the amount of data entering/leaving the structure is controlled. By making the destination node's throughput slightly lower than the maximum supported throughput via grant rate limiting, FCP maintains congestion-free fabric operation, thereby achieving predictable latency for packets traversing through the fabric.
Fifth, in some examples, FCP provides failure recovery that can accommodate network switch/link failures with minimal impact. FCP can react to fabric failures detected by hardware within a round-trip time (RTT), thereby minimizing packet loss.
Sixth, in some examples, FCP has reduced or minimized protocol overhead cost. FCP involves an explicit request/grant message exchange for each segment of payload to be transferred between nodes. To facilitate protocol operation, the payload packet is encapsulated with a UDP + FCP header. FCP provides the various advantages listed herein at the expense of some latency and a certain amount of bandwidth. For small flows, the latency impact is minimized via unsolicited packet transmission without an explicit request-grant handshake.
Seventh, in some examples, FCP provides support for unsolicited packet transfer. FCP allows a limited amount of fabric bandwidth to be used for sending unsolicited packets from sender to receiver (without an explicit request-grant handshake). At the receiver, a small amount of credit may be configured to allow a small amount of bandwidth for unsolicited transmissions. Unsolicited traffic is permitted only from very shallow queues (based on a threshold). The request/grant rate limiters account for unsolicited traffic and non-FCP traffic so as not to cause persistent fabric congestion.
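A brief Python sketch, with assumed parameter names and thresholds, of the unsolicited-send decision: a packet may bypass the request-grant handshake only while its queue is below a shallow-queue threshold and the receiver-configured unsolicited credit has not been exhausted.

```python
def may_send_unsolicited(queue_bytes: int, packet_len: int,
                         unsolicited_credit: int,
                         shallow_queue_threshold: int = 2 * 1500) -> bool:
    """Bypass the request/grant handshake only for very shallow queues and only
    while unsolicited credit (configured at the receiver) remains."""
    return (queue_bytes <= shallow_queue_threshold
            and packet_len <= unsolicited_credit)

# A short flow at the head of an otherwise empty queue goes out immediately;
# a deep queue must fall back to the normal request-grant path.
assert may_send_unsolicited(queue_bytes=1200, packet_len=1200, unsolicited_credit=4096)
assert not may_send_unsolicited(queue_bytes=9000, packet_len=1500, unsolicited_credit=4096)
```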
Eighth, in some examples, FCP provides support for node coexistence with/without FCP support. FCP allows non-FCP capable nodes (non-FCP) to co-exist in the same network as FCP capable nodes. non-FCP nodes may use ECMP or any other mode of packet transport and load balancing.
Ninth, in some examples, FCP provides flow-aware fair bandwidth allocation. Traffic is managed by a flow-aware admission control scheduler at the destination node. The request/grant mechanism uses a "pull" model (via grants) and ensures that flow-aware fair bandwidth allocation is achieved among the incast flows.
Tenth, in some examples, FCP provides transmit buffer management through adaptive request window scaling. The destination node provides a scale factor based on a global view of the active incast flows. The source node adjusts its outstanding request window based on the scale factor, thereby limiting the total transmit buffer used by each FCP queue based on its drain rate. Thus, the transmit buffer is used efficiently for flows of various sizes based on their respective drain rates.
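A minimal Python sketch (names assumed) of request window scaling: the destination advertises a scale factor derived from how many flows are simultaneously active toward it, and each source caps its outstanding requested-but-not-granted bytes accordingly.

```python
def scale_factor(active_incast_flows: int) -> float:
    """Destination-side: more simultaneously active flows -> smaller per-flow window."""
    return 1.0 / max(1, active_incast_flows)

def outstanding_request_limit(base_window_bytes: int, factor: float,
                              min_window_bytes: int = 16 * 1024) -> int:
    """Source-side: cap on outstanding requested bytes for one FCP queue."""
    return max(min_window_bytes, int(base_window_bytes * factor))

# With 8 active flows at the destination, each source queue shrinks its
# outstanding-request window to 1/8 of the base, bounding total transmit buffer.
limit = outstanding_request_limit(base_window_bytes=1 << 20,
                                  factor=scale_factor(active_incast_flows=8))
```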
Eleventh, in some examples, FCP enables grant management based on receive buffer occupancy. FCP controls the generation of grants through an explicit grant pacing algorithm. Grant generation reacts to the receive buffer occupancy, the number of granted blocks still in the fabric, and the number of blocks in the reorder buffer.
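A minimal Python sketch, with assumed quantities, of such occupancy-aware grant pacing: new grants are limited by the buffer headroom that remains once occupied buffer, bytes granted but still in flight in the fabric, and bytes held in the reorder buffer are accounted for.

```python
def next_grant_bytes(pending_request_bytes: int,
                     rx_buffer_capacity: int,
                     rx_buffer_occupied: int,
                     granted_in_fabric: int,
                     reorder_buffer_bytes: int,
                     max_grant: int = 64 * 1024) -> int:
    """Grant only what the receive buffer can still absorb after everything
    already granted or buffered is accounted for."""
    headroom = rx_buffer_capacity - (rx_buffer_occupied
                                     + granted_in_fabric
                                     + reorder_buffer_bytes)
    return max(0, min(pending_request_bytes, headroom, max_grant))
```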
Twelfth, in some examples, FCP supports improved end-to-end QoS. FCP provides improved end-to-end QoS through the grant scheduler at the destination. The destination looks at incoming requests from multiple sources, grouped by priority, and schedules grants based on the expected QoS behavior among the priority groups. Because FCP enables low-latency fabric operation due to admission control, QoS-aware grant scheduling removes any dependency of QoS behavior on the underlying fabric.
Thirteenth, in some examples, FCP supports security through encryption and end-to-end authentication. FCP supports end-to-end privacy through encryption and also supports authentication of FCP packets, thereby protecting all FCP-specific protocol handshakes.
Fourteenth, in some examples, FCP enables improved ECN marking support. The FCP grant scheduler provides a unique view of the total load based on the sum of all pending requests seen at the grant scheduler. ECN marking based on the global load seen by the destination endpoint is a significant improvement over ECN marking based on the local congestion seen by individual switches/paths of the fabric. Because data center TCP implementations rely on extensive use of ECN to manage congestion, ECN marking based on a global view of the output egress queues at the grant scheduler is a significant improvement over the disjoint, local views of individual paths through the fabric and provides better congestion management at the TCP level.
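A short Python sketch (threshold and rate values assumed for illustration) of ECN marking driven by the grant scheduler's global view: the marking decision uses the total backlog of pending requests for a destination egress rather than the depth of any single switch queue.

```python
def should_mark_ecn(total_pending_request_bytes: int,
                    egress_rate_bytes_per_s: float,
                    target_delay_s: float = 0.000050) -> bool:
    """Mark ECN when the aggregate requested backlog at this destination would
    exceed the target queuing delay at the egress drain rate."""
    backlog_delay = total_pending_request_bytes / egress_rate_bytes_per_s
    return backlog_delay > target_delay_s

# 1 MB of total pending requests draining at ~12.5 GB/s (100 Gb/s) implies about
# 80 us of backlog, which exceeds a 50 us target and would be marked.
print(should_mark_ecn(1_000_000, 12.5e9))
```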
Fig. 1 is a block diagram illustrating an example system 8 having a data center 10 that may implement examples of the techniques described herein. In general, the data center 10 provides an operating environment for applications and services for customers 11 coupled to the data center through the content/service provider network 7 and the gateway device 20. In other examples, the content/service provider network 7 may be a data center wide area network (DC WAN), a private network, or other type of network. The data center 10 may, for example, host infrastructure equipment such as computing nodes, networking and storage systems, redundant power supplies, and environmental controls. The content/service provider network 7 may be coupled to one or more networks managed by other providers and may therefore form part of a large-scale public network infrastructure, such as the internet.
In some examples, data center 10 may represent one of many geographically distributed network data centers. In the example of fig. 1, data center 10 is a facility that provides information services to customers 11. The clients 11 may be collective entities such as businesses and governments, or individuals. For example, a network data center may host web services for multiple enterprises and end users. Other exemplary services may include data storage, virtual private networks, file serving services, data mining services, scientific computing or supercomputing services, and the like.
In this example, the data center 10 includes a set of storage systems and application servers 12 interconnected via a high-speed switching fabric 14. In some examples, the servers 12 are arranged into a plurality of different server groups, each including any number of servers up to, for example, n servers 12-1 through 12-n. The servers 12 provide the computing and storage facilities for applications and data associated with the customers 11 and may be physical (bare-metal) servers, virtual machines running on physical servers, virtualized containers running on physical servers, or combinations thereof.
In the example of fig. 1, a Software Defined Network (SDN) controller 21 provides a high-level controller for configuring and managing the routing and switching infrastructure of data center 10. SDN controller 21 provides a logically and, in some cases, physically centralized controller for facilitating operation of one or more virtual networks within data center 10 in accordance with one or more embodiments of the present disclosure. In some examples, SDN controller 21 may operate in response to configuration input received from a network administrator.
In some examples, in accordance with the techniques described herein, SDN controller 21 operates to configure access node 17 to logically establish one or more virtual fabrics as an overlay network that is dynamically configured over a physical underlay network provided by switch fabric 14. The operation of the virtual fabric and the access node to establish the virtual fabric is described below with respect to fig. 20.
For example, although not shown, the data center 10 may also include one or more non-edge switches, routers, hubs, gateways, security devices such as firewalls, intrusion detection and/or intrusion prevention devices, servers, computer terminals, laptops, printers, databases, wireless mobile devices such as cellular phones or personal digital assistants, wireless access points, bridges, cable modems, application accelerators, or other network devices.
In the example of fig. 1, each server 12 is coupled to the switch fabric 14 through an access node 17. As further described herein, in one example, each access node 17 is a highly programmable I/O processor specially designed to offload certain functions from the servers 12. In one example, each access node 17 includes one or more processing cores consisting of multiple internal processor clusters (e.g., MIPS cores), equipped with hardware engines that offload cryptographic functions, compression and regular expression (RegEx) processing, data storage functions, and networking operations. In this manner, each access node 17 includes components for fully implementing and processing the network and storage stacks on behalf of one or more servers 12. In addition, the access nodes 17 may be programmatically configured to act as security gateways for their respective servers 12, freeing up the servers' processors to dedicate resources to application workloads. In some example implementations, each access node 17 may be viewed as a network interface subsystem that enables full offloading of data packet processing (with zero copies in server memory) and storage acceleration for the attached server systems. In one example, each access node 17 may be implemented as one or more Application Specific Integrated Circuits (ASICs) or other hardware and software components, each supporting a subset of the servers.
The access nodes 17 may also be referred to as Data Processing Units (DPUs), or devices including DPUs. In other words, the term access node may be used interchangeably herein with the term DPU. Additional example details of various example DPUs are described in U.S. patent application No. 16/031,921, entitled "Data Processing Unit for Compute Nodes and Storage Nodes," filed July 10, 2018, and U.S. patent application No. 16/031,945, entitled "Data Processing Unit for Stream Processing," filed July 10, 2018, each of which is incorporated herein by reference in its entirety.
In an example implementation, the access nodes 17 may be configured to operate in a standalone network device having one or more access nodes. For example, the access nodes 17 may be arranged into a plurality of different access node groups 19, each including any number of access nodes up to, for example, x access nodes 17-1 through 17-x. As such, multiple access nodes 17 may be grouped (e.g., within a single electronic device or network device), referred to herein as an access node group 19, for providing services to a group of servers supported by the set of access nodes internal to the device. In one example, an access node group 19 may comprise four access nodes 17, each supporting four servers, so as to support a group of sixteen servers.
In the example of fig. 1, each access node 17 provides connectivity to the switch fabric 14 of a different set of servers 12 and may be assigned a respective IP address and provide routing operations for the servers 12 coupled thereto. As described herein, the access nodes 17 provide routing and/or switching functions for communications from/to the respective servers 12. For example, as shown in fig. 1, each access node 17 includes a set of edge-facing electrical or optical local bus interfaces for communicating with a respective group of servers 12, and one or more core-facing electrical or optical interfaces for communicating with core switches within the switching fabric 14. In addition, the access nodes 17 described herein may provide additional services such as storage (e.g., integration of solid state storage devices), security (e.g., encryption), acceleration (e.g., compression), I/O offload, and the like. In some examples, one or more access nodes 17 may include a storage device, such as a high speed solid state drive or a rotating hard drive, configured to provide network-accessible storage for use by applications executing on a server. Although not shown in fig. 1, the access nodes 17 may be directly coupled to each other, such as between access nodes in a common access node group 19, to provide direct interconnectivity between access nodes of the same group. For example, multiple access nodes 17 (e.g., 4 access nodes) may be located within a common access node group 19 to serve a group of servers (e.g., 16 servers).
As one example, each access node group 19 of the plurality of access nodes 17 may be configured as a standalone network device and may be implemented as two rack unit (2RU) devices occupying two rack units (e.g., slots) of an equipment rack. In another example, the access node 17 may be integrated within a server, such as a single 1RU server, with four CPUs coupled to the forwarding ASIC described herein on a motherboard disposed within a common computing device. In yet another example, one or more of access node 17 and server 12 may be integrated in a suitably sized (e.g., 10RU) framework, which in such an example may become a Network Storage Computing Unit (NSCU) for data center 10. For example, the access node 17 may be integrated within the motherboard of the server 12, or otherwise located in a single chassis with the server.
In accordance with the techniques herein, an example implementation is described in which access nodes 17 interface and utilize switch fabric 14 to provide full mesh (any to any) interconnectivity such that any one of servers 12 may use any one of a plurality of parallel data paths within data center 10 to communicate packet data of a given packet flow to any other one of the servers. Example network architectures and techniques are described in which, in an example implementation, access nodes inject individual packets for packet flows between access nodes across some or all of multiple parallel data paths in a data center switching fabric 14, and optionally reorder the packets for delivery to a destination, in order to provide full mesh connectivity.
As described herein, the techniques of the present invention introduce a new data transfer protocol, referred to as the Fabric Control Protocol (FCP), which may be used by the different operating networking components of any access node 17 to facilitate data communication across the switch fabric 14. As further described, FCP is an end-to-end admission control protocol in which, in one example, a sender explicitly requests a receiver to transmit a certain number of payload data bytes. In response, the receiver issues grants based on its buffer resources, QoS, and/or fabric congestion metrics. In general, FCP enables packet injection of flows to all paths between source and destination nodes, and may provide any of the advantages and techniques described herein, including resiliency to request/grant packet loss, adaptive and low latency fabric implementation, fault recovery, reduced or minimal protocol overhead costs, support for unsolicited packet delivery, support for node coexistence with/without FCP support, flow-aware fair bandwidth allocation, transport buffer management through adaptive request window scaling, grant management based on receive buffer occupancy, improved end-to-end QoS, security through encryption and end-to-end authentication, and/or improved ECN marking support.
These techniques may provide certain advantages. For example, these techniques may significantly improve the bandwidth utilization of the underlying switch fabric 14. Moreover, in example implementations, these techniques may provide full grid interconnectivity between servers of a data center and yet may be non-blocking and non-dropping. More specifically, based on the end-to-end admission control mechanism of FCP and based on packets being injected in proportion to available bandwidth, the switch fabric 14 may include an efficient non-drop fabric without using link-level flow control.
Although the access nodes 17 are depicted in fig. 1 with respect to the switch fabric 14 of data center 10, in other examples, the access nodes may provide full mesh interconnectivity over any packet-switched network. For example, the packet-switched network may include a local area network (LAN), a wide area network (WAN), or a collection of one or more networks. The packet-switched network may have any topology, e.g., flat or multi-tiered, as long as there is full connectivity between the access nodes. The packet-switched network may use any technology, including IP over Ethernet, among others. Regardless of the type of packet-switched network, in accordance with the techniques described in this disclosure, access nodes may inject individual packets of a packet flow between access nodes across multiple parallel data paths in the packet-switched network and, optionally, reorder the packets for delivery to the destination, in order to provide full mesh connectivity.
Fig. 2 is a block diagram illustrating in further detail the logical interconnectivity provided by the access node 17 and the switching fabric 14 within the data center. As shown in this example, access node 17 and switch fabric 14 may be configured to provide full mesh interconnectivity such that access node 17 may communicate packet data of any one of servers 12 to any other one of servers 12 using any one of a plurality M of parallel data paths to any one of core switches 22A-22M (collectively, "core switches 22"). Also, in accordance with the techniques described herein, the access nodes 17 and the switch fabric 14 may be configured and arranged in a manner such that the M parallel data paths in the switch fabric 14 provide reduced L2/L3 hops and full mesh interconnections (e.g., bipartite graph) between the servers 12, even in large data centers having thousands of servers. It is noted that in this example, the switches 22 are not connected to each other, which makes it more likely that any failures of one or more switches will be independent of each other. In other examples, the switch fabric itself may be implemented using multiple layers of interconnected switches, as in a CLOS network.
In some example implementations, therefore, each access node 17 may have multiple parallel data paths to reach any given other access node 17, as well as servers 12 reachable through those access nodes. In some examples, rather than sending all packets of a given flow along a single path in the switch fabric, the switch fabric 14 may be configured such that for any given packet flow between servers 12, the access node 17 may inject packets of the packet flow across all or a subset of the M parallel data paths of the switch fabric 14 through which a given destination access node 17 of a destination server 12 may be reached.
In accordance with the disclosed technique, the access node 17 may inject packets of each packet flow end-to-end across the M paths, thereby forming a virtual tunnel between the source access node and the destination access node. In this manner, the number of layers included in the switch fabric 14 or the number of hops along the M parallel data paths may be insignificant for implementing the packet injection techniques described in this disclosure.
However, the technique of injecting packets of the respective packet streams across all or a subset of the M parallel data paths of the switch fabric 14 reduces the number of layers of network devices within the switch fabric 14 to, for example, an absolute minimum, i.e., 1. Further, it makes it possible to construct a fabric in which switches are not connected to each other, reducing the possibility of failure-related connections between two switches, thereby improving the reliability of the switching fabric. Flattening the switch fabric 14 may reduce cost by eliminating layers of network devices that require power and reduce latency by eliminating layers of network devices that perform packet switching. In one example, the flat topology of the switching fabric 14 may result in a core layer that includes only one layer of backbone switches, such as core switches 22, that may not communicate directly with each other but form a single hop along the M parallel data paths. In this example, any access node 17 that supplies traffic into the switching fabric 14 may reach any other access node 17 through a single one-hop L3 lookup by one of the core switches 22.
The access node 17 serving the source server 12 of a packet flow may use any technique for injecting the packets across the available parallel data paths, such as available-bandwidth-based, random, round-robin, hash-based, or other mechanisms, which may be designed, for example, to maximize bandwidth utilization or otherwise avoid congestion. In some example implementations, flow-based load balancing need not necessarily be utilized; more efficient bandwidth utilization may be achieved by allowing the packets of a given packet flow (five-tuple) sourced by a server 12 to traverse different paths of the switch fabric 14 between the access nodes 17 coupled to the source and destination servers. In some examples, the respective destination access node 17 associated with the destination server 12 may be configured to reorder the variable-length IP packets of the packet flow into the original sequence in which they were sent and to deliver the reordered packets to the destination server.
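A brief Python sketch (illustrative only, with hypothetical headroom figures) of one such injection policy: choosing the path for each packet in proportion to each path's currently available bandwidth rather than hashing the whole flow onto one path.

```python
import random

def pick_path(available_bw: dict) -> str:
    """Weighted random choice: paths with more spare bandwidth receive
    proportionally more of the injected packets (one possible policy)."""
    paths, weights = zip(*available_bw.items())
    return random.choices(paths, weights=weights, k=1)[0]

# Path B currently has twice the spare capacity of path A, so on average it
# carries about two-thirds of the injected packets of the flow.
spare = {"path-A": 25.0, "path-B": 50.0}   # Gbps of headroom (hypothetical)
chosen = [pick_path(spare) for _ in range(6)]
print(chosen)
```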
In other examples, the respective destination access node 17 associated with the destination server 12 may not reorder the packets of the packet flow prior to delivering the packets to the destination server. In these examples, destination access node 17 may instead deliver the packets to the destination server in the order in which they arrived at destination access node 17. For example, packets that include storage access requests or responses to a destination storage device may not need to be reordered to send their original sequence. Instead, such storage access requests and responses may be delivered to the destination storage device in the order in which they arrived.
In some example implementations, each access node 17 implements at least four different operational networking components or functions: (1) a source component operable to receive traffic from the server 12, (2) a source switching component operable to switch source traffic to other source switching components or core switches 22 of different access nodes 17 (possibly different groups of access nodes), (3) a destination switching component operable to switch inbound traffic received from other source switching components or core switches 22, and (4) a destination component operable to reorder the packet flow and provide the packet flow to the destination server 12.
In this example, the servers 12 are connected to the source components of the access nodes 17 to inject traffic into the switch fabric 14, and the servers 12 are similarly coupled to the destination components within the access nodes 17 to receive traffic from it. Because of the full mesh of parallel data paths provided by the switch fabric 14, each source switching component and destination switching component within a given access node 17 need not perform L2/L3 switching. Instead, the access node 17 may inject the packets of a packet flow, e.g., based on available bandwidth, randomly, round-robin, based on quality of service (QoS)/scheduling, or otherwise, to efficiently forward the packets without requiring packet analysis and lookup operations.
The destination switching component of access node 17 may provide the limited look-up necessary only to select the appropriate output port to forward the packet to local server 12. Thus, with respect to the complete routing table of the data center, only the core switches 22 may need to perform a complete lookup operation. Thus, the switching fabric 14 provides a highly scalable, flat, high-speed interconnect, where in some embodiments the server 12 is actually one L2/L3 hop from any other server 12 within the data center.
Flow-based routing and switching over equal-cost multi-path (ECMP) paths through a network can be affected by highly variable load-dependent delays. For example, a network may include many small bandwidth flows and some large bandwidth flows. In the case of routing and switching ECMP paths, the source access node may select the same path for two larger bandwidth flows, resulting in a larger delay on that path. To avoid this problem and keep latency on the network low, for example, administrators may be forced to keep network utilization below 25-30%. The techniques described in this disclosure for configuring access node 17 to inject packets of various packet flows on all available paths enable higher network utilization, e.g., 85-90%, while maintaining bounded or limited latency. The packet injection technique enables the source access node 17 to fairly allocate packets for a given flow over all available paths while accounting for link failures. In this way, the load can be fairly distributed across the available paths of the entire network, regardless of the bandwidth size of a given flow, to avoid over-utilizing a particular path. The disclosed techniques enable the same number of network devices to deliver three times the amount of data traffic over the network while maintaining low latency characteristics and reducing the number of layers of energy consuming network devices.
As shown in the example of fig. 2, in some example implementations, the access nodes 17 may be arranged into a plurality of different access node groups 19-1 through 19-Y (ANGs in fig. 2), each access node group including any number of access nodes 17 up to, for example, x access nodes 17-1 through 17-x. As such, multiple access nodes 17 may be grouped and arranged (e.g., within a single electronic device or network device), referred to herein as an access node group (ANG) 19, to provide services to a group of servers supported by the set of access nodes internal to the device.
As described, each access node group 19 may be configured as a standalone network device and may be implemented as a device configured to be installed within a computing, storage, or converged rack. In general, each access node group 19 may be configured to act as a high performance I/O hub designed to aggregate and process network and/or storage I/O for multiple servers 12. As described above, the set of access nodes 17 within each access node group 19 provides highly programmable, dedicated I/O processing circuitry to handle networking and communication operations on behalf of the server 12. Additionally, in some examples, each access node group 19 may include a storage device 27, such as a high-speed solid state drive, configured to provide network accessible storage for use by applications executing on the server. Each access node group 19, including its set of access nodes 17, storage devices 27, and the set of servers 12 supported by the access nodes 17 of that access node group, may be referred to herein as a Network Storage Computing Unit (NSCU) 40.
Fig. 3 is a block diagram illustrating one example of a Network Storage Computing Unit (NSCU) 40 including an access node group 19 and its supported servers 52. The access node group 19 may be configured to act as a high-performance I/O hub designed to aggregate and process network and storage I/O to multiple servers 52. In the particular example of fig. 3, the access node group 19 includes four access nodes 17-1 through 17-4 (collectively, "access nodes 17") connected to a pool of local solid state storage 41. In the illustrated example, the access node group 19 supports a total of sixteen server nodes 12-1 through 12-16 (collectively, "server nodes 12"), with each of the four access nodes 17 within the access node group 19 supporting four of the server nodes 12. In some examples, each of the four server nodes 12 supported by each access node 17 may be arranged as a server 52. In some examples, the "servers 12" described throughout this application may be dual-socket or dual-processor "server nodes" arranged in groups of two or more within a standalone server device, e.g., servers 52.
Although the access node group 19 is illustrated in fig. 3 as including four access nodes 17 all connected to a single pool of solid state storage 41, an access node group may be arranged in other ways. In one example, each of the four access nodes 17 may be included on an individual access node sled that also includes solid state storage and/or other types of storage for the access node. In this example, an access node group may include four access node sleds, each having an access node and a set of local storage devices.
In one example implementation, the access nodes 17 within the access node group 19 are connected to the servers 52 and the solid state storage 41 using Peripheral Component Interconnect express (PCIe) links 48, 50, and are connected to other access nodes and the data center switch fabric 14 using Ethernet links 42, 44, 46. For example, each access node 17 may support six high-speed Ethernet connections, including two externally available Ethernet connections 42 for communicating with the switch fabric, one externally available Ethernet connection 44 for communicating with other access nodes in other access node groups, and three internal Ethernet connections 46 for communicating with the other access nodes 17 in the same access node group 19. In one example, each externally available connection 42 may be a 100 Gigabit Ethernet (GE) connection. In this example, the access node group 19 has 8x100 GE externally available ports to connect to the switch fabric 14.
Within the access node group 19, the connections 42 may be copper wires, i.e., electrical links, disposed between each access node 17 and the optical ports of the access node group 19 as 8x25 GE links. Between the access node group 19 and the switching fabric, the connection 42 may be an optical ethernet connection coupled to an optical port of the access node group 19. The optical ethernet connection may be connected to one or more optical devices within the switching fabric, such as an optical permutation device described in more detail below. Optical ethernet connections can support more bandwidth than electrical connections without increasing the number of cables in the switch fabric. For example, each fiber optic cable coupled to access node group 19 may carry 4x100 GE fibers, each carrying optical signals at four different wavelengths or λ. In other examples, the externally available connection 42 may remain as an ethernet power connection to the switching fabric.
The remaining four ethernet connections supported by each access node 17 include: an ethernet connection 44 for communicating with other access nodes in other access node groups; and three ethernet connections 46 for communication with the other three access nodes within the same access node group 19. In some examples, connection 44 may be referred to as an "access node inter-group link" and connection 46 may be referred to as an "access node intra-group link".
The ethernet connections 44, 46 provide full mesh connectivity between access nodes within a given structural unit. In one example, such a structural unit may be referred to herein as a logical chassis (e.g., a half-chassis or a half-physical chassis) that includes two NSCUs 40 having two AGNs 19 and supports an 8-way grid of eight access nodes 17 of these AGNs. In this particular example, the connections 46 will provide full mesh connectivity between four access nodes 17 within the same access node group 19, and the connections 44 will provide full mesh connectivity between each access node 17 and four other access nodes within one other access node group of the logical chassis (i.e., the structural unit). In addition, the access node group 19 may have enough (e.g., 16) externally available ethernet ports to connect to four access nodes in another access node group.
In the case of an 8-way grid of access nodes (i.e., logical chassis of two NSCUs 40), each of the access nodes 17 may be connected to each of the other seven access nodes by a 50GE connection. For example, each connection 46 between four access nodes 17 within the same access node group 19 may be a 50GE connection arranged as a 2x25 GE link. Each of the connections 44 between the four access nodes 17 and the four access nodes in the other access node group may include four 50GE links. In some examples, each of the four 50GE links may be arranged as a 2x25 GE link such that each of the connections 44 includes an 8x25 GE link to other access nodes in another access node group. This example is described in more detail below with respect to fig. 5.
In another example, the ethernet connections 44, 46 provide full mesh connectivity between access nodes within a given structural unit that is a full chassis or full physical chassis including four NSCUs 40 with four AGNs 19 and supporting a 16-way mesh of the access nodes 17 of those AGNs. In this example, connections 46 provide full mesh connectivity between the four access nodes 17 within the same access node group 19, and connections 44 provide full mesh connectivity between each access node 17 and twelve other access nodes within three other access node groups. In addition, the access node group 19 may have enough (e.g., 48) externally available ethernet ports to connect to the twelve access nodes in the other three access node groups.
In the case of a 16-way mesh of access nodes, for example, each access node 17 may be connected to each of the other fifteen access nodes by 25GE connections. In other words, in this example, each connection 46 between four access nodes 17 within the same access node group 19 may be a single 25GE link. Each of the connections 44 between the four access nodes 17 and the twelve other access nodes in the three other access node groups may comprise 12x25 GE links.
As shown in fig. 3, each access node 17 within the access node group 19 may also support a set of high-speed PCIe connections 48, 50, e.g., PCIe Gen 3.0 or PCIe Gen 4.0 connections, for communicating with solid state storage 41 within the access node group 19 and with servers 52 within the NSCU 40. Each of the servers 52 includes four server nodes 12 supported by one of the access nodes 17 within the access node group 19. The solid state storage 41 may be non-volatile memory express (NVMe)-based solid state drive (SSD) storage devices accessible by each access node 17 via connections 48.
In one example, solid state storage 41 may include twenty-four SSD devices with six SSD devices per access node 17. The twenty-four SSD devices may be arranged into four rows of six SSD devices, with each row of SSD devices connected to one of the access nodes 17. Each SSD device may provide up to 16 Terabytes (TB) of storage, for a total of 384TB per access node group 19. As described in more detail below, in some cases a physical rack may include four access node groups 19 and their supported servers 52. In that case, a typical physical rack may support approximately 1.5 petabytes (PB) of local solid state storage. In another example, solid state storage 41 may include up to 32 U.2 x4 SSD devices. In other examples, NSCU 40 may support other SSD devices, such as 2.5" Serial ATA (SATA) SSDs, mini SATA (mSATA) SSDs, M.2 SSDs, and so on.
In the above example, where each access node 17 is included on a separate access node chassis with local storage for the access node, each access node chassis may include four SSD devices and some additional storage, which may be hard disk drives or solid state drive devices. In this example, four SSD devices and additional storage may provide approximately the same amount of storage for each access node as the six SSD devices described in the previous example.
In one example, each access node 17 supports a total of 96 PCIe lanes. In this example, each connection 48 may be an 8x4-lane PCIe Gen 3.0 connection via which each access node 17 may communicate with up to eight SSD devices within solid state storage 41. Additionally, each connection 50 between a given access node 17 and the four server nodes 12 within the server 52 supported by the access node 17 may be a 4x16-lane PCIe Gen 3.0 connection. In this example, the access node group 19 has a total of 256 externally facing PCIe links that interface with the servers 52. In some scenarios, the access nodes 17 may support redundant server connectivity such that each access node 17 connects to eight server nodes 12 within two different servers 52 using an 8x8-lane PCIe Gen 3.0 connection.
In another example, each access node 17 supports a total of 64 PCIe lanes. In this example, each connection 48 may be an 8x4-lane PCIe Gen 3.0 connection via which each access node 17 may communicate with up to eight SSD devices within solid state storage 41. Additionally, each connection 50 between a given access node 17 and the four server nodes 12 within the server 52 supported by the access node 17 may be a 4x8-lane PCIe Gen 4.0 connection. In this example, the access node group 19 has a total of 128 externally facing PCIe links that interface with the servers 52.
FIG. 4 is a block diagram illustrating an example logical chassis arrangement 60 including two NSCUs 40₁ and 40₂ from FIG. 3. In some examples, each NSCU 40 may be referred to as a "compute sandwich" based on the structural arrangement of the access node group 19 being "sandwiched" between two servers 52 on the top and two servers 52 on the bottom. For example, server 52A may be referred to as the top second server, server 52B may be referred to as the top server, server 52C may be referred to as the bottom server, and server 52D may be referred to as the bottom second server. Each server 52 may include four server nodes, and each server node may be a dual-socket or dual-processor server chassis.
Each access node group 19 is connected to the servers 52 using PCIe links 50 and to the switch fabric 14 using Ethernet links 42. Access node groups 19₁ and 19₂ may each include four access nodes connected to each other using Ethernet links, and local solid state storage connected to the access nodes using PCIe links, as described above with respect to fig. 3. The access nodes within access node groups 19₁ and 19₂ are connected to each other in a full mesh 64, which is described in more detail with respect to fig. 5.
In addition, each access node group 19 supports PCIe connections 50 to the servers 52. In one example, each connection 50 may be a 4x16-lane PCIe Gen 3.0 connection such that the access node group 19 has a total of 256 externally available PCIe links that interface with the servers 52. In another example, each connection 50 may be a 4x8-lane PCIe Gen 4.0 connection for communication between the access nodes within the access node group 19 and the server nodes within the servers 52. In either example, the connections 50 may provide a raw throughput of 512 gigabits per access node group 19, or a bandwidth of approximately 128 gigabits per server node, without accounting for any overhead bandwidth costs.
As discussed above with respect to fig. 3, each NSCU 40 supports 8x100 GE links 42 from the access node group 19 to the switch fabric 14. Each NSCU 40 thus provides support for up to sixteen server nodes in four servers 52, local solid state storage, and 800 Gbps of full-duplex (i.e., bidirectional) network bandwidth. Each access node group 19 can therefore provide a true hyper-convergence of compute, storage, networking, and security for the servers 52. The logical chassis 60, including the two NSCUs 40, thus provides support for up to 32 server nodes in eight servers 52, local solid state storage at the access node groups 19, and 16x100 GE links 42 to the switch fabric 14, which results in a full-duplex network bandwidth of 1.6 terabits per second (Tbps).
Fig. 5 is a block diagram illustrating an example of full mesh connectivity between two access node groups 19₁, 19₂ within a logical chassis 60. As illustrated in fig. 5, access node group 19₁ includes four access nodes 17₁-17₄ and access node group 19₂ also includes four access nodes 17₅-17₈. Each access node 17 is connected to the other access nodes within the logical chassis in a mesh topology. The eight access nodes 17 included in the mesh topology may be referred to as an access node "cluster". In this way, each access node 17 is able to inject incoming packets to each of the other access nodes in the cluster.
In the illustrated configuration of an 8-way mesh interconnecting two access node groups 19, each access node 17 is connected to each of the other seven access nodes in the cluster via full mesh connectivity. The mesh topology between the access nodes 17 includes intra-access node group links 46 between the four access nodes included in the same access node group 19, and inter-access node group links 44 between access nodes 17₁-17₄ in access node group 19₁ and access nodes 17₅-17₈ in access node group 19₂. Although illustrated as a single connection between each access node 17, each of the connections 44, 46 is bidirectional such that each access node is connected to every other access node in the cluster via a separate link.
Each of the access nodes 17₁-17₄ within first access node group 19₁ has three intra-access node group connections 46 to the other access nodes in first access node group 19₁. As illustrated in first access node group 19₁, access node 17₁ supports connection 46A to access node 17₄, connection 46B to access node 17₃, and connection 46C to access node 17₂. Access node 17₂ supports connection 46C to access node 17₁, connection 46D to access node 17₄, and connection 46E to access node 17₃. Access node 17₃ supports connection 46B to access node 17₁, connection 46E to access node 17₂, and connection 46F to access node 17₄. Access node 17₄ supports connection 46A to access node 17₁, connection 46D to access node 17₂, and connection 46F to access node 17₃. Access nodes 17₅-17₈ are similarly connected to each other within second access node group 19₂.
Each of the access nodes 17₁-17₄ within first access node group 19₁ also has four inter-access node group connections 44 to the access nodes 17₅-17₈ within second access node group 19₂. As illustrated in fig. 5, first access node group 19₁ and second access node group 19₂ each have sixteen externally available ports 66 with which to connect to one another. For example, access node 17₁ supports connections 44A, 44B, 44C, and 44D to access nodes 17₅-17₈ through four externally available ports 66 of first access node group 19₁ to second access node group 19₂. Specifically, access node 17₁ supports connection 44A to access node 17₅ within second access node group 19₂, connection 44B to access node 17₆ within second access node group 19₂, connection 44C to access node 17₇ within second access node group 19₂, and connection 44D to access node 17₈ within second access node group 19₂. The remaining access nodes 17₂-17₄ within first access node group 19₁ are similarly connected to access nodes 17₅-17₈ within second access node group 19₂. In addition, in the reverse direction, access nodes 17₅-17₈ are similarly connected to access nodes 17₁-17₄ within first access node group 19₁.
Each access node 17 may be configured to support up to 400 gigabits of bandwidth to connect to other access nodes in the cluster. In the illustrated example, each access node 17 may support up to eight 50GE links to the other access nodes. In this example, since each access node 17 is connected to only seven other access nodes, 50 gigabits of bandwidth may be reserved and used for managing the access node. In some examples, each of the connections 44, 46 may be a single 50GE connection. In other examples, each of the connections 44, 46 may be a 2x25 GE connection. In still other examples, each of the intra-access node group connections 46 may be a 2x25 GE connection and each of the inter-access node group connections 44 may be a single 50GE connection to reduce the number of inter-box cables. For example, a 4x50 GE link may be taken off each of the access nodes 17₁-17₄ within first access node group 19₁ to connect to the access nodes 17₅-17₈ in second access node group 19₂. In some examples, the 4x50 GE link may be taken off each access node 17 using a DAC cable.
Fig. 6 is a block diagram illustrating an example arrangement of a full physical rack 70 including two logical racks 60 from fig. 4. In the illustrated example of fig. 6, the rack 70 has 42 rack units or slots in vertical height, including a 2-rack-unit (2RU) top of rack (TOR) device 72 for providing connectivity to devices within the switch fabric 14. In one example, the TOR device 72 comprises a top-of-rack Ethernet switch. In other examples, the TOR device 72 comprises an optical permutation device. In some examples, the rack 70 may not include an additional TOR device 72 and instead have the typical 40 rack units.
In the illustrated example, the rack 70 includes four access node groups 19₁-19₄, each of which is a separate network device 2RU in height. Each access node group 19 includes four access nodes and may be configured as shown in the example of fig. 3. For example, access node group 19₁ includes access nodes AN1-AN4, access node group 19₂ includes access nodes AN5-AN8, access node group 19₃ includes access nodes AN9-AN12, and access node group 19₄ includes access nodes AN13-AN16. The access nodes AN1-AN16 may be substantially similar to the access nodes 17 described above.
In this example, each access node group 19 supports sixteen server nodes. For example, access node group 19₁ supports server nodes A1-A16, access node group 19₂ supports server nodes B1-B16, access node group 19₃ supports server nodes C1-C16, and access node group 19₄ supports server nodes D1-D16. A server node may be a dual-socket or dual-processor server chassis that is 1/2 rack in width and 1RU in height. As described with respect to fig. 3, four of the server nodes may be arranged into a server 52 that is 2RU in height. For example, server 52A includes server nodes A1-A4, server 52B includes server nodes A5-A8, server 52C includes server nodes A9-A12, and server 52D includes server nodes A13-A16. The server nodes B1-B16, C1-C16, and D1-D16 may similarly be arranged into servers 52.
The access node groups 19 and the servers 52 are arranged into NSCUs 40 from figs. 3-4. The NSCUs 40 are 10RU in height and each include one 2RU access node group 19 and four 2RU servers 52. As illustrated in fig. 6, the access node groups 19 and the servers 52 may be structured as a compute sandwich, in which each access node group 19 is "sandwiched" between two servers 52 on the top and two servers 52 on the bottom. For example, with respect to access node group 19₁, server 52A may be referred to as the top second server, server 52B may be referred to as the top server, server 52C may be referred to as the bottom server, and server 52D may be referred to as the bottom second server. In the illustrated structural arrangement, the access node groups 19 are separated by eight rack units to accommodate the bottom two 2RU servers 52 supported by one access node group and the top two 2RU servers 52 supported by another access node group.
The NSCUs 40 may be arranged into the logical racks 60, i.e., half physical racks, from fig. 5. Each logical rack 60 is 20RU in height and includes two NSCUs 40 having full mesh connectivity. In the illustrated example of fig. 6, access node group 19₁ and access node group 19₂ are included in the same logical rack 60 along with their respective supported server nodes A1-A16 and B1-B16. As described in more detail above with respect to fig. 5, the access nodes AN1-AN8 included in the same logical rack 60 are connected to each other in an 8-way mesh. The access nodes AN9-AN16 may similarly be connected in an 8-way mesh within another logical rack 60 that includes access node groups 19₃ and 19₄ along with their respective server nodes C1-C16 and D1-D16.
The logical racks 60 within the rack 70 may be connected to the switch fabric directly or through an intervening TOR device 72. As described above, in one example, the TOR device 72 comprises a top-of-rack Ethernet switch. In other examples, the TOR device 72 comprises an optical permutation device that transports optical signals between the access nodes 17 and the core switches 22 and that is configured such that optical communications are "permuted" based on wavelength in order to provide full mesh connectivity between upstream and downstream ports without any optical interference.
In the illustrated example, each access node group 19 may connect to the TOR device 72 via one or more of the 8x100 GE links supported by the access node group to reach the switching fabric. In one case, two logical racks 60 within a rack 70 may each be connected to one or more ports of the TOR device 72, and the TOR device 72 may also receive signals from one or more logical racks within adjacent physical racks. In other examples, the rack 70 may not include the TOR device 72 itself, but rather the logical rack 60 may be connected to one or more TOR devices included in one or more adjacent physical racks.
For a standard rack size of 40RU, it may be desirable to stay within typical power limits, such as a 15 kilowatt (kW) power limit. In the example of the rack 70, even with the additional 2RU TOR device 72, sixty-four server nodes, and four access node groups, it may be easy to stay within or near the 15kW power limit. For example, each access node group 19 may use approximately 1kW of power, yielding approximately 4kW of power for the access node groups. In addition, each server node may use approximately 200W of power, yielding approximately 12.8kW of power for the servers 52. In this example, the 40RU arrangement of access node groups 19 and servers 52 therefore uses approximately 16.8kW of power.
Fig. 7A is a block diagram illustrating a logical view of the networking data paths and operations within access node 17. As shown in the example of fig. 7A, in some example implementations, each access node 17 implements at least four different operational networking components or functions: (1) a Source (SF) component 30 operable to receive traffic from a set of servers 12 supported by the access node, (2) a source Switch (SX) component 32 operable to switch source traffic to other source switch components of different access nodes 17 (which may be of different access node groups) or to the core switch 22, (3) a destination switch (DX) component 34 operable to switch inbound traffic received from other source switch components or the core switch 22, and (4) a Destination (DF) component 36 operable to reorder the packet flow and provide the packet flow to the destination server 12.
In some examples, different operational networking components of access node 17 may perform flow-based switching and ECMP-based load balancing for Transmission Control Protocol (TCP) packet flows. However, in general, ECMP load balancing is poor because it randomly hashes flows to paths, making it possible to assign some large flows to the same path and severely unbalancing the structure. In addition, ECMP relies on local path decisions and does not use any feedback on possible congestion or downstream link failure of any selected path.
The techniques described in this disclosure introduce a new data transfer protocol, referred to as the Fabric Control Protocol (FCP), which may be used by the different operating networking components of the access node 17. FCP is an end-to-end admission control protocol in which a sender explicitly requests a receiver to transfer a certain number of payload data bytes. In response, the receiver issues grants based on its buffer resources, QoS, and/or fabric congestion metrics.
For example, FCP includes a permission control mechanism by which a source node requests permission before transmitting a packet on a fabric to a destination node. For example, the source node sends a request message to the destination node requesting a certain number of bytes to be transferred, and the destination node sends a grant message to the source node after reserving the egress bandwidth. In addition, FCP enables packets of a single packet flow to be injected on all available links between a source node and a destination node, rather than flow-based switching and ECMP forwarding for sending all packets of a TCP flow on the same path, in order to avoid packet reordering. The source node assigns a packet sequence number to each packet of a flow, and the destination node can use the packet sequence number to order incoming packets of the same flow in order.
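For illustration only, the following Python sketch models this request/grant admission control: a sender explicitly asks to transfer a number of payload bytes, and the receiver grants up to what its remaining buffer resources allow. The class and field names (Receiver, Sender, buffer_bytes) are hypothetical simplifications, not part of the FCP specification.

```python
# Minimal sketch of FCP-style admission control: the sender requests permission
# to transfer payload bytes; the receiver grants up to its available buffer.
# Names and sizes are illustrative assumptions only.

class Receiver:
    def __init__(self, buffer_bytes):
        self.available = buffer_bytes  # egress buffer not yet promised to any sender

    def handle_request(self, requested_bytes):
        """Return the number of bytes granted for this request."""
        granted = min(requested_bytes, self.available)
        self.available -= granted      # reserve egress bandwidth/buffer for the sender
        return granted

    def release(self, delivered_bytes):
        """Free reserved buffer once the granted data has been delivered."""
        self.available += delivered_bytes


class Sender:
    def __init__(self, receiver):
        self.receiver = receiver

    def send(self, payload):
        granted = self.receiver.handle_request(len(payload))
        in_flight = payload[:granted]  # only transmit what was explicitly granted
        # ... packets carrying `in_flight` would be injected over all available paths ...
        self.receiver.release(len(in_flight))
        return len(in_flight)


if __name__ == "__main__":
    rx = Receiver(buffer_bytes=4096)
    tx = Sender(rx)
    print(tx.send(b"x" * 1500))  # 1500: fully granted
    print(tx.send(b"x" * 8000))  # larger than the buffer: only 4096 granted
```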
The SF component 30 of the access node 17 is considered a source node of the fabric. In accordance with the disclosed techniques, for FCP traffic, SF component 30 is configured to inject its input bandwidth (e.g., 200 Gbps) over links to a plurality of SX components of access nodes within the logical chassis. For example, as described in more detail with respect to fig. 7B, SF component 30 may inject packets of the same flow across eight links to SX component 32 and to seven other SX components of the other access nodes within the logical chassis. For non-FCP traffic, SF component 30 is configured to select one of the connected SX components to which to send packets of the same flow.
The DX component 34 of the access node 17 can receive incoming packets from the plurality of core switches directly or via one or more intermediary devices (e.g., TOR ethernet switches, electrical permutation devices, or optical permutation devices). For example, the DX component 34 can receive incoming packets from eight core switches or four or eight intermediate devices. The DX component 34 is configured to select the DF component to which to send the received packet. For example, the DX component 34 may be connected to the DF component 36 within the logical chassis and seven other DF components of other access nodes. In some cases, the DX component 34 may become a point of congestion because the DX component 34 may receive a large amount of bandwidth (e.g., 200Gbps) that will all be sent to the same DF component. In the case of FCP traffic, the DX component 34 can use the admission control mechanism of FCP to avoid long-term congestion.
The DF component 36 of the access node 17 can receive incoming packets from multiple DX components of the access node within the logical chassis (e.g., the DX component 34 within the logical chassis and seven other DX components of other access nodes). The DF component 36 is considered the destination node of the fabric. For FCP traffic, the DF component 36 is configured to reorder packets of the same flow prior to transmitting the flow to the destination server 12.
In some examples, SX component 32 and DX component 34 of access node 17 may use the same forwarding table to perform packet switching. In this example, the personality (personality) of the access node 17 and the next hop for the same destination IP address identified by the forwarding table may depend on the source port type of the received data packet. For example, if a source packet is received from an SF component, access node 17 operates as an SX component 32 and determines the next hop to forward the source packet through the fabric to the destination node. If a packet is received from a fabric-facing port, access node 17 operates as a DX component 34 and determines the final next hop to forward the incoming packet directly to the destination node. In some examples, a received packet may include an incoming label specifying its source port type.
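A minimal sketch of this source-port-dependent behavior follows, assuming a toy forwarding-table layout; the table format, IP address, and next-hop labels are illustrative assumptions rather than the actual forwarding-table structure.

```python
# Sketch: one forwarding table, two personalities. The next hop chosen for the
# same destination IP depends on whether the packet arrived from an SF component
# (host-facing) or from a fabric-facing port. All names are illustrative.

FORWARDING_TABLE = {
    # destination IP: {source port type: next hop}
    "10.0.2.1": {"SF": "spray-to-fabric", "FABRIC": "deliver-to-DF"},
}

def next_hop(dest_ip, source_port_type):
    entry = FORWARDING_TABLE[dest_ip]
    if source_port_type == "SF":
        # Operate as SX: pick a fabric-facing next hop toward the destination node.
        return entry["SF"]
    # Operate as DX: final next hop directly toward the destination node.
    return entry["FABRIC"]

print(next_hop("10.0.2.1", "SF"))      # spray-to-fabric
print(next_hop("10.0.2.1", "FABRIC"))  # deliver-to-DF
```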
Fig. 7B is a block diagram illustrating an example first-level network fanout implemented between a set of access nodes 17₁-17₈ within a logical chassis 60. In the illustrated example of fig. 7B, the logical chassis 60 includes two access node groups 19₁ and 19₂ containing eight access nodes 17₁-17₈ and the server nodes 12 supported by each of the access nodes.
As shown in fig. 7B, the SF components 30A-30H and SX components 32A-32H of the access nodes 17 within the logical chassis 60 have full mesh connectivity, with each SF component 30 connected to all of the SX components 32 of the eight access nodes 17 within the logical chassis 60. As described above, the eight access nodes 17 within the logical chassis 60 may be connected to each other by an 8-way mesh of electrical Ethernet connections. In the case of FCP traffic, the SF components 30 of the access nodes 17 within the logical chassis 60 apply an injection algorithm to inject packets of any given packet flow over all available links to the SX components 32. In this manner, the SF components 30 need not perform a full lookup operation for L2/L3 switching of outbound packets of packet flows originating from the servers 12. In other words, packets of a given packet flow may be received by an SF component 30, such as SF component 30A, and injected over some or all of the links to the SX components 32 of the logical chassis 60. In this way, the access nodes 17 of the logical chassis achieve, in this example, a first-stage fanout of 1:8 and, in some examples, may do so without incurring any L2/L3 forwarding lookups relative to keying information in the packet headers. As such, packets of a single packet flow need not follow the same path when injected by a given SF component 30.
Thus, in accordance with the disclosed techniques, after receiving source traffic from one of the servers 12, SF component 30A implemented by access node 17₁, for example, performs an 8-way injection of packets of the same flow across all available links to the SX components 32 implemented by the access nodes 17 included in the logical chassis 60. More specifically, SF component 30A injects across one internal SX component 32A of the same access node 17₁ and seven external SX components 32B-32H of the other access nodes 17₂-17₈ within the logical chassis 60. In some implementations, this 8-way injection between the SF components 30 and the SX components 32 within the logical chassis 60 may be referred to as first-stage injection. As described in other portions of this disclosure, second-stage injection may be performed over a second-level network fanout within the switch fabric between the access nodes 17 and the core switches 22. For example, the second-stage injection may be performed via an intermediate device, such as a TOR Ethernet switch, an electrical permutation device, or an optical permutation device.
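For illustration, the following Python sketch shows a first-stage 8-way injection in which each packet of one flow is tagged with a sequence number and distributed across eight SX links. The round-robin selection here is a simplification adopted only for readability, since the text describes load- and weight-aware selection for FCP traffic; link names are made up.

```python
# Sketch of a first-stage 8-way spray: packets of one flow receive a packet
# sequence number and are distributed across all eight SX links of the logical
# chassis. Round-robin is illustrative only; FCP uses load/weight-aware choice.
import itertools

SX_LINKS = [f"SX{chr(ord('A') + i)}" for i in range(8)]  # SXA..SXH

def spray(packets):
    rr = itertools.cycle(range(len(SX_LINKS)))
    for psn, pkt in enumerate(packets):
        link = SX_LINKS[next(rr)]
        yield {"psn": psn, "link": link, "payload": pkt}

for p in spray([b"pkt%d" % i for i in range(10)]):
    print(p["psn"], p["link"])
```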
As described in more detail above, in some examples, the first four access nodes 17₁-17₄ may be included in a first access node group 19₁ and the second four access nodes 17₅-17₈ may be included in a second access node group 19₂. The access nodes 17 within the first and second access node groups 19 may be connected to each other via a full mesh in order to allow the 8-way injection between the SF components 30 and the SX components 32 within the logical chassis 60. In some examples, the logical chassis 60 including the two access node groups and their supported servers 12 may be referred to as a half rack or a half physical rack. In other examples, more or fewer access nodes may be connected together using full mesh connectivity. In one example, sixteen access nodes 17 may be connected together in a full mesh to support first-stage 16-way injection within a full physical rack.
Fig. 8 is a block diagram illustrating an example multi-level network fanout across the data center switch fabric between the access nodes 17. In the illustrated example of fig. 8, each logical chassis 60 includes eight access nodes 17₁-17₈ and the server nodes 12 supported by each of the access nodes. The first logical chassis 60₁ is connected to the second logical chassis 60₂ through the core switches 22 within the switch fabric. In some examples, the first logical chassis 60₁ and the second logical chassis 60₂ may be the same logical chassis.
In accordance with the disclosed techniques, the switch fabric implements FCP-based flow control and network communication within the network fabric. The network fabric may be visualized as including a plurality of channels, e.g., a request channel, a grant channel, an FCP data channel, and a non-FCP data channel, as described in more detail with respect to fig. 11. As illustrated in fig. 8, the FCP data channel carries data packets via a logical tunnel 100 that includes all paths between a source node (e.g., SF component 30A of access node 17₁ within the first logical chassis 60₁) and a destination node (e.g., DF component 36A of access node 17₁ within the second logical chassis 60₂). The FCP data channel carries the data packets using the FCP protocol. The FCP packets are injected over the fabric from the source node to the destination node through a suitable load balancing scheme. The FCP packets may be delivered out of order, and the destination node may perform packet reordering. For example, packets of a traffic flow received by SF component 30A of access node 17₁ from a source server 12 may be injected over some or all of the possible links within the logical tunnel 100 toward DF component 36A of access node 17₁.
In some examples, the DF component 36A is configured to reorder the received packets to recreate the original sequence of the packet flow before transmitting the packet flow to the destination server 12. In other examples, the DF component 36A may not need to reorder the packets of the packet flow before transmitting the packet flow to the destination server 12. In these examples, the DF component 36A may instead deliver the packets to the destination server 12 in the order in which they arrive. For example, packets that include storage access requests or responses to a destination storage device may not need to be reordered into their original sequence.
A request channel within the network fabric may be used to carry the FCP request message from the source node to the destination node. Similar to an FCP data packet, an FCP request message may be injected on all available paths to a destination node, but the request message need not be reordered. In response, a grant channel within the network fabric may be used to carry the FCP grant message from the destination node to the source node. The FCP grant message may also be injected towards the source node on all available paths and no reordering of the grant message is required. non-FCP data channels within the network fabric carry data packets that do not use the FCP protocol. ECMP-based load balancing may be used to forward or route non-FCP data packets and for a given flow identified by five tuples, the packets are expected to be delivered to the destination node in order.
The example of fig. 8 illustrates both the first-level network fanout between the access nodes 17 within the first logical chassis 60₁, as described above with respect to fig. 7B, and a second-level network fanout between the access nodes 17 and the core switches 22. As described above with respect to figs. 3-4, the eight access nodes 17 within the first logical chassis 60₁ are connected to the core switches 22 using electrical or optical Ethernet connections. The eight access nodes 17 within the second logical chassis 60₂ are similarly connected to the core switches 22. In some examples, each access node 17 may be connected to eight core switches 22. In the case of FCP traffic, the SX components 32 of the access nodes 17 within the first logical chassis 60₁ apply an injection algorithm to inject packets of any given packet flow across all available paths to the core switches 22. In this manner, the SX components 32 need not perform a full lookup operation for L2/L3 switching of received packets.
After receiving source traffic from one of the servers 12, access node 17₁ in the first logical chassis 60₁ performs an 8-way injection of the FCP packets of the traffic flow across all available paths to the SX components 32 implemented by the access nodes 17 in the first logical chassis 60₁. As further illustrated in fig. 8, each SX component 32 then injects the FCP packets of the traffic flow across all available paths to the core switches 22. In the illustrated example, the multi-level fanout is 8x8 and therefore supports up to sixty-four core switches 22₁-22₆₄. In other examples, in which the first-stage fanout is 1:16 across a full physical rack, the multi-level fanout may be 16x16 and support up to 256 core switches.
Although illustrated in fig. 8 as occurring directly between the access nodes 17 and the core switches 22, the second-level fanout may be performed through one or more TOR devices, such as top-of-rack Ethernet switches, optical permutation devices, or electrical permutation devices. The multi-level network fanout enables packets of a traffic flow received at any of the access nodes 17 within the first logical chassis 60₁ to reach the core switches 22 for further forwarding to any of the access nodes 17 within the second logical chassis 60₂.
In accordance with the disclosed techniques, in one example implementation, each of the SF components 30 and SX components 32 uses an FCP injection engine configured to apply a suitable load balancing scheme to inject packets of a given FCP packet flow over all available paths to the destination node. In some examples, the load balancing scheme may direct each FCP packet of the packet flow to the one of the parallel data paths selected based on available bandwidth (i.e., the least loaded path). In other examples, the load balancing scheme may direct each FCP packet of the packet flow to a randomly, pseudo-randomly, or round-robin selected one of the parallel data paths. In a further example, the load balancing scheme may direct each FCP packet of the packet flow to a weighted randomly selected one of the parallel data paths in proportion to available bandwidth in the switch fabric. In the example of least loaded path selection, the FCP injection engine may track the number of bytes transmitted on each path in order to select the least loaded path on which to forward a packet. In addition, in the example of weighted random path selection, the FCP injection engine may track downstream path failures in order to provide flow fairness by injecting packets over each active path in proportion to its bandwidth weight. For example, if one of the core switches 22₁-22₈ connected to SX component 32A fails, the path weights between SF component 30A and the SX components 32 change to reflect the smaller proportion of switch fabric bandwidth available behind access node 17₁ within the first logical chassis 60₁. In this example, SF component 30A will inject to the SX components 32 in proportion to the available bandwidth behind the access nodes 17 within the first logical chassis 60₁. More specifically, SF component 30A will inject fewer packets toward SX component 32A than toward the other SX components 32, based on the reduced switch fabric bandwidth behind access node 17₁ within the first logical chassis 60₁ due to the failure of one of its connected core switches 22₁-22₈. In this way, the injection of packets over the available paths toward the destination node may not be uniform, but the bandwidth will be balanced across the active paths even over a relatively short period of time.
In this example, the source node (e.g., SF component 30A of access node 17₁ within the first logical chassis 60₁) sends a request message to the destination node (e.g., DF component 36A of access node 17₁ within the second logical chassis 60₂) requesting a certain weight or bandwidth, and the destination node sends a grant message to the source node after reserving the egress bandwidth. The source node also determines whether any link failures have occurred between the core switches 22 and the logical chassis 60₂ that includes the destination node. The source node may then use all active links in proportion to the source and destination bandwidths. As an example, assume that there are N links between the source node and the destination node, and that link i has source bandwidth Sbi and destination bandwidth Dbi, where i = 1..N. To account for failures, the actual bandwidth from the source node to the destination node is equal to min(Sb, Db), determined on a link-by-link basis. More specifically, the source bandwidth (Sb) is equal to the sum of the source bandwidths of all links (Sb = Sb1 + ... + SbN), the destination bandwidth (Db) is equal to the sum of the destination bandwidths of all links (Db = Db1 + ... + DbN), and the bandwidth of each link (bi) is equal to min(Sbi, Dbi). The bandwidth weight used on link i is equal to bi / (b1 + ... + bN).
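A small numerical sketch of the link-weight computation just described follows; the per-link bandwidth values (in Gbps) are made-up examples chosen only to show the arithmetic.

```python
# Sketch of the per-link bandwidth weight computation described above:
# b_i = min(Sb_i, Db_i) per link, and weight_i = b_i / sum(b).
# The example bandwidths (in Gbps) are arbitrary illustrative values.

def link_weights(source_bw, dest_bw):
    effective = [min(s, d) for s, d in zip(source_bw, dest_bw)]
    total = sum(effective)
    return [b / total for b in effective] if total else [0.0] * len(effective)

# Three healthy links and one link whose destination side lost half its capacity.
Sb = [50, 50, 50, 50]
Db = [50, 50, 50, 25]
print(link_weights(Sb, Db))  # [0.2857..., 0.2857..., 0.2857..., 0.1428...]
```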
In the case of FCP traffic, the SF components 30 and SX components 32 use the FCP injection engine to distribute the FCP packets of the traffic flow based on the load on each link toward the destination node, in proportion to its weight. The injection engine maintains a credit store to track the credits (i.e., available bandwidth) of each next-hop member link, deducts credits (i.e., reduces available bandwidth) using the packet length included in the FCP header, and associates a given packet with the one of the active links having the most credits (i.e., the least loaded link). In this manner, for FCP packets, the SF components 30 and SX components 32 inject packets over the member links of the next hop toward the destination node in proportion to the bandwidth weights of the member links. More details on fabric failure resiliency may be found in U.S. provisional patent application No. 62/638,725, entitled "Resilient Network Communication Using Selective Multipath Packet Flow Spraying" (attorney docket No. 1242-015USP1), filed on 3/5/2018, the entire contents of which are incorporated herein by reference.
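The credit-tracking behavior of the injection engine may be sketched as follows; the refill constant, link names, and seeding policy are illustrative assumptions rather than the actual hardware design.

```python
# Sketch of a credit-based spray decision: each next-hop member link holds a
# credit balance proportional to its bandwidth weight; a packet is steered to
# the link with the most remaining credits, and that link's credits are reduced
# by the packet length. Refill policy and numbers are illustrative only.

class InjectionEngine:
    def __init__(self, link_weights, refill_bytes=10_000):
        # Seed each link's credits in proportion to its bandwidth weight.
        self.credits = {link: w * refill_bytes for link, w in link_weights.items()}

    def pick_link(self, packet_len):
        link = max(self.credits, key=self.credits.get)  # least loaded = most credits
        self.credits[link] -= packet_len
        return link

engine = InjectionEngine({"link0": 0.3, "link1": 0.3, "link2": 0.3, "link3": 0.1})
for length in [1500, 1500, 9000, 64, 1500]:
    print(engine.pick_link(length))
```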
In another example implementation, each of the SF components 30 or SX components 32 modifies the UDP portion of the header of each FCP packet of the packet flow in order to force the packets to be injected downstream toward the core switches 22. More specifically, each of the SF components 30 or SX components 32 is configured to randomly set a different UDP source port in the UDP portion of the header of each FCP packet of the packet flow. Each core switch 22 computes a hash over N fields of the UDP portion of the header of each FCP packet and, based on the randomly set UDP source port of each FCP packet, selects one of the parallel data paths on which to inject the FCP packet. This example implementation enables injection through the core switches 22 without modifying the core switches 22 to understand FCP.
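For illustration, the following sketch shows how randomizing the UDP source port per packet causes a switch that hashes the usual header fields to spread packets of a single flow across its uplinks. The hash function, port values, and field names are illustrative assumptions and do not represent an actual switch ASIC or the FCP wire format.

```python
# Sketch: per-packet UDP source-port randomization at the sender, and a
# 5-tuple-style hash at an unmodified core switch selecting among its uplinks.
import random
import zlib

UPLINKS = 8

def encapsulate(flow, payload):
    return {
        "src_ip": flow["src_ip"], "dst_ip": flow["dst_ip"],
        "udp_sport": random.randint(1024, 65535),  # randomized per packet
        "udp_dport": flow["udp_dport"],            # placeholder destination port
        "payload": payload,
    }

def core_switch_pick_uplink(pkt):
    key = f'{pkt["src_ip"]}{pkt["dst_ip"]}{pkt["udp_sport"]}{pkt["udp_dport"]}'.encode()
    return zlib.crc32(key) % UPLINKS

flow = {"src_ip": "10.0.1.1", "dst_ip": "10.0.2.1", "udp_dport": 6511}
for _ in range(5):
    pkt = encapsulate(flow, b"data")
    print(core_switch_pick_uplink(pkt))
```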
The core switches 22 operate as a single hop along the logical tunnel 100 between the source node (e.g., SF component 30A of access node 17₁ within the first logical chassis 60₁) and the destination node (e.g., DF component 36A of access node 17₁ within the second logical chassis 60₂). The core switches 22 perform a full lookup operation for L2/L3 switching of the received packets. In this manner, the core switches 22 may forward all packets of the same traffic flow toward the destination node within the second logical chassis 60₂ that supports the destination server 12, e.g., DF component 36A of access node 17₁. Although illustrated in fig. 8 as occurring directly between the core switches 22 and the destination access node 17₁ of the second logical chassis 60₂, the core switches 22 may instead forward all packets of the same traffic flow to an intermediate TOR device that has connectivity to the destination node. In some examples, the intermediate TOR device may forward all packets of the traffic flow directly to DX component 34A implemented by access node 17₁ of the second logical chassis 60₂. In other examples, the intermediate TOR device may be an optical or electrical permutation device configured to provide another fanout over which packets may be injected between the input and output ports of the permutation device. In this example, all or some portion of the DX components 34 of the access nodes 17 of the second logical chassis 60₂ may receive the injected packets of the same traffic flow.
The DX components 34 and DF components 36 of the access nodes 17 within the second logical chassis 60₂ also have full mesh connectivity in that each DX component 34 is connected to all of the DF components 36 within the second logical chassis 60₂. When any of the DX components 34 receives packets of the traffic flow from the core switches 22, the DX component 34 forwards the packets on a direct path to DF component 36A of access node 17₁. DF component 36A may perform only the limited lookup necessary to select the proper output port for forwarding the packets to the destination server 12. In response to receiving the packets of the traffic flow, DF component 36A of access node 17₁ within the second logical chassis 60₂ may reorder the packets of the traffic flow based on their sequence numbers. Thus, with respect to a full routing table for the data center, only the core switches 22 may need to perform a full lookup operation. The switch fabric thus provides a highly scalable, flat, high-speed interconnect in which the servers are effectively one L2/L3 hop from any other server 12 within the data center.
More details regarding the Data Center Network architecture and interconnected access nodes illustrated in fig. 1-8B may be found in U.S. patent application No. 15/939,227 (attorney docket No. 1242-002US01), entitled "Non-Blocking Any to Any Data Center Network with packet Spraying Over Multiple Alternate Data Paths," filed on 28/3/2018, the entire contents of which are incorporated herein by reference.
A brief description of one example of the FCP of fig. 8 and its operation is included here. In the example of fig. 8, the access nodes 17 are fabric endpoints (FEPs) of the network fabric, which is made up of switching elements (e.g., the core switches 22) arranged in a leaf-spine topology. The network fabric allows one access node 17 to communicate with another access node over multiple paths. The core switches 22 inside the network fabric have shallow packet buffers. The cross-sectional bandwidth of the network fabric is equal to or greater than the sum of the bandwidths of all the endpoints. In this way, if each access node 17 limits the rate at which it injects data into the network fabric, no path within the network fabric should, with high probability, be congested for long periods of time.
As described above, FCP data packets are sent from a source node (e.g., SF component 30A of access node 17₁ within the first logical chassis 60₁) to a destination node (e.g., DF component 36A of access node 17₁ within the second logical chassis 60₂) via the logical tunnel 100. Before any traffic is sent through the tunnel 100 using FCP, a connection must be established between the endpoints. A control plane protocol executed by the access nodes 17 may be used to set up a pair of tunnels, one in each direction, between the two FCP endpoints. The FCP tunnels are optionally secured (e.g., encrypted and authenticated). The tunnel 100 is considered to be unidirectional from the source node to the destination node, and an FCP partner tunnel may be established in the other direction from the destination node to the source node. The control plane protocol negotiates the capabilities (e.g., block size, Maximum Transmission Unit (MTU) size, etc.) of both endpoints and establishes the FCP connection between the endpoints by setting up the tunnel 100 and its partner tunnel and initializing a queue state context for each tunnel.
Each endpoint is assigned a source tunnel ID and a corresponding destination tunnel ID. At each endpoint, a queue ID for a given tunnel queue is derived based on the assigned tunnel ID and priority. For example, each FCP endpoint may allocate a local tunnel handle from a handle pool and pass the handle to its FCP connection partner endpoint. The FCP partner tunnel handle is stored in a lookup table and referenced from the local tunnel handle. For the source endpoint, e.g., access node 17₁ within the first logical chassis 60₁, the source queue is identified by the local tunnel ID and priority, and the destination tunnel ID is identified from the lookup table based on the local tunnel ID. Similarly, for the destination endpoint, e.g., access node 17₁ within the second logical chassis 60₂, the destination queue is identified by the local tunnel ID and priority, and the source tunnel ID is identified from the lookup table based on the local tunnel ID.
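A toy sketch of this handle and queue-ID bookkeeping is shown below; the bit packing of the queue ID and the handle allocation policy are assumptions made only for illustration, not the actual encoding.

```python
# Sketch of local/partner tunnel handle bookkeeping: each endpoint allocates a
# local tunnel handle, stores the partner's handle in a lookup table keyed by
# the local handle, and derives a queue ID from (tunnel ID, priority).
# The packing scheme (priority in the low 3 bits) is an illustrative assumption.

class Endpoint:
    def __init__(self):
        self._next_handle = 1
        self.partner_by_local = {}   # local tunnel handle -> partner tunnel handle

    def allocate_local_handle(self):
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def bind_partner(self, local_handle, partner_handle):
        self.partner_by_local[local_handle] = partner_handle

    @staticmethod
    def queue_id(tunnel_id, priority):
        return (tunnel_id << 3) | (priority & 0x7)   # up to 8 queues per tunnel

a, b = Endpoint(), Endpoint()
a_handle, b_handle = a.allocate_local_handle(), b.allocate_local_handle()
a.bind_partner(a_handle, b_handle)   # A references B's handle via its own handle
b.bind_partner(b_handle, a_handle)
print(a.partner_by_local[a_handle], Endpoint.queue_id(a_handle, priority=2))
```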
An FCP tunnel queue is defined as a bucket of independent traffic flows that use FCP to transport payloads over a network fabric. The FCP queue for a given tunnel is identified by a tunnel ID and a priority, and the tunnel ID is identified by the source/destination endpoint pair for the given tunnel. Alternatively, the endpoint may use a mapping table to derive a tunnel ID and priority based on the internal FCP queue ID for a given tunnel. In some examples, the fabric tunnel (e.g., logical tunnel 100) of each tunnel may support 1, 2, 4, or 8 queues. The number of queues per tunnel is a network fabric attribute and can be configured at deployment time. All tunnels within the network fabric may support the same number of queues for each tunnel. Each endpoint can support up to 16,000 queues.
When a source node communicates with a destination node, the source node encapsulates the packet using the FCP encapsulated over UDP. The FCP header carries fields that identify the tunnel ID, queue ID, Packet Sequence Number (PSN) of the packet, and request, grant, and data block sequence number between the two endpoints. At the destination node, the incoming tunnel ID is unique for all packets from a particular source node. The tunnel encapsulation carries packet forwarding and reordering information used by the destination node. A single tunnel carries packets in one or more queues between a source node and a destination node. Packets within only a single tunnel are reordered based on sequence number labels that span the same tunnel queue. When sending packets through the tunnel to the destination node, the source node will mark the packet with the tunnel PSN. The destination node may reorder the packets based on the tunnel ID and PSN. At the end of the reordering, the destination node strips the tunnel encapsulation and forwards the packet to the corresponding destination queue.
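The destination-side reordering described above can be pictured with the following sketch, in which packets of a single tunnel are released strictly in packet sequence number (PSN) order regardless of arrival order; the buffering strategy is a simplification for illustration.

```python
# Sketch of destination-side reordering: packets of one tunnel are released to
# the destination queue strictly in PSN order, however they arrived.
import heapq

class TunnelReorderer:
    def __init__(self):
        self.expected_psn = 0
        self.pending = []            # min-heap of (psn, packet)

    def receive(self, psn, packet):
        heapq.heappush(self.pending, (psn, packet))
        released = []
        while self.pending and self.pending[0][0] == self.expected_psn:
            released.append(heapq.heappop(self.pending)[1])
            self.expected_psn += 1
        return released              # packets now safe to forward in order

r = TunnelReorderer()
for psn, pkt in [(1, "p1"), (0, "p0"), (3, "p3"), (2, "p2")]:
    print(psn, "->", r.receive(psn, pkt))
```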
Described here is how an IP packet entering FCP tunnel 100 at a source endpoint is transmitted to the destination endpoint. A source server 12 having IP address A0 sends an IP packet to a destination server 12 having IP address B0. The source FCP endpoint (e.g., access node 17₁ within the first logical chassis 60₁) transmits an FCP request packet carrying source IP address A and destination IP address B. The FCP request packet has an FCP header that carries a Request Block Number (RBN) and other fields. The FCP request packet is transmitted over UDP over IP. The destination FCP endpoint (e.g., access node 17₁ within the second logical chassis 60₂) sends an FCP grant packet back to the source FCP endpoint. The FCP grant packet has an FCP header that carries a Grant Block Number (GBN) and other fields. The FCP grant packet is transmitted over UDP over IP. The source endpoint transmits the FCP data packets after receiving the FCP grant packet. The source endpoint appends a new (IP + UDP + FCP) data header to each incoming data packet. The destination endpoint removes the appended (IP + UDP + FCP) data header before delivering the packets to the destination host server.
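For illustration, the following sketch walks through the request/grant/data exchange as nested header dictionaries; the field layout, port handling, and values are assumptions for readability and are not the actual FCP wire format.

```python
# Sketch of the request/grant/data exchange carried over UDP-over-IP. The RBN,
# GBN, and PSN field names follow the text; everything else is illustrative.

def fcp_request(src_ip, dst_ip, rbn):
    return {"ip": (src_ip, dst_ip), "udp": "fcp", "fcp": {"type": "REQUEST", "RBN": rbn}}

def fcp_grant(src_ip, dst_ip, gbn):
    return {"ip": (src_ip, dst_ip), "udp": "fcp", "fcp": {"type": "GRANT", "GBN": gbn}}

def fcp_data(src_ip, dst_ip, psn, inner_packet):
    # The source endpoint appends a new (IP + UDP + FCP) header to the incoming
    # data packet; the destination strips it before delivery to the host server.
    return {"ip": (src_ip, dst_ip), "udp": "fcp",
            "fcp": {"type": "DATA", "PSN": psn}, "inner": inner_packet}

inner = {"ip": ("A0", "B0"), "payload": b"app bytes"}   # host-to-host packet
req   = fcp_request("A", "B", rbn=1)                    # source endpoint asks to send
grant = fcp_grant("B", "A", gbn=1)                      # destination reserves bandwidth
data  = fcp_data("A", "B", psn=0, inner_packet=inner)   # data follows the grant
print(req["fcp"], grant["fcp"], data["inner"]["ip"])
```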
Fig. 9 is a block diagram illustrating an example access node 130 that includes a networking unit 142 and two or more processing cores 140A-140N (collectively, "cores 140"). Access node 130 generally represents a hardware chip implemented in digital logic circuitry. As various examples, the access node 130 may be provided as an integrated circuit mounted on a motherboard of a computing device or mounted on a card connected to the motherboard of the computing device via PCIe, or the like. In some examples, access node 130 may be an integrated circuit within an access node group (e.g., one of the access node groups 19) configured as a standalone network device for installation within a compute rack, a storage rack, or a converged rack.
The access node 130 may operate substantially similar to any of the access nodes 17 of fig. 1-8. Thus, the access node 130 may be communicatively coupled to a data center fabric (e.g., the switch fabric 14), one or more server devices (e.g., the server node 12 or the server 52), a storage medium (e.g., the solid state storage 41 of fig. 3), one or more network devices, random access memory, etc., for interconnecting each of these various elements, e.g., via PCIe, ethernet (wired or wireless), or other such communication medium.
In the illustrated example of fig. 9, the access node 130 includes a plurality of cores 140 coupled to the on-chip memory unit 134. In some examples, memory unit 134 may include a cache memory. In other examples, memory unit 134 may include two types of memory or memory devices, namely, a coherent cache memory and a non-coherent buffer memory. More details regarding bifurcated Memory systems may be found in U.S. patent application No. 15/949,892 entitled "relay consistent Memory Management in a Multiple Processor System" (attorney docket No. 1242-008US01), filed on 2018, month 4, and day 10, the entire contents of which are incorporated herein by reference.
In some examples, plurality of cores 140 may include at least two processing cores. In one particular example, the plurality of cores 140 may include six processing cores 140. The access node 130 also includes a networking unit 142, one or more host units 146, a memory controller 144, and one or more accelerators 148. As illustrated in fig. 9, each of cores 140, networking unit 142, memory controller 144, host unit 146, accelerator 148, and memory unit 134 are communicatively coupled to one another. In addition, the access node 130 is coupled to an off-chip external memory 150. The external memory 150 may include a Random Access Memory (RAM) or a Dynamic Random Access Memory (DRAM).
In this example, access node 130 represents a high-performance, hyper-converged network, storage, and data processor and input/output hub. Cores 140 may include one or more of MIPS (microprocessor without interlocked pipeline stages) cores, ARM (advanced RISC (reduced instruction set computing) machine) cores, PowerPC (performance optimization with enhanced RISC performance computing) cores, RISC-V (RISC five) cores, or CISC (complex instruction set computing, or x86) cores. Each of the cores 140 may be programmed to process one or more events or activities related to a given data packet (e.g., a network packet or a storage packet). Each of the cores 140 may be programmed using a high-level programming language (e.g., C, C++, etc.).
As described herein, utilizing the new processing architecture of access node 130 may be particularly effective for flow processing applications and environments. For example, stream processing is a data processing architecture that is well suited for high performance and efficient processing. A flow is defined as an ordered one-way sequence of computational objects, which may be unlimited or indeterminate in length. In a simple embodiment, the stream originates at the producer and terminates at the consumer, and the operations are performed in sequence. In some embodiments, a stream may be defined as a sequence of stream segments; each stream segment includes a memory block that is contiguously addressable in physical address space, an offset for the block, and an effective length. The stream may be discrete (such as a sequence of packets received from a network) or continuous (such as a byte stream read from a storage device). As a result of the processing, streams of one type may be transformed into streams of another type. For example, TCP receive (Rx) processing consumes segments (fragments) to produce an ordered byte stream. The reverse process is performed in the transmission (Tx) direction. Regardless of the stream type, stream manipulation requires efficient fragment manipulation, where fragments are defined as above.
In some examples, the plurality of cores 140 may be capable of processing a plurality of events related to each of one or more data packets received by the networking unit 142 and/or host units 146, in a sequential manner, using one or more "work units". In general, a work unit is a set of data exchanged between the cores 140 and the networking unit 142 and/or host units 146, where each work unit may represent one or more events related to a given data packet of a flow. As one example, a work unit (WU) is a container that is associated with a stream state and used to describe (i.e., point to) data within a (stored) stream. For example, a work unit may dynamically originate within a peripheral unit coupled to the multiprocessor system (e.g., injected by a networking unit, a host unit, or a solid state drive interface) or within a processor itself, in association with one or more streams of data, and terminate at another peripheral unit or another processor of the system. The work unit is associated with an amount of work that is relevant to the entity executing the work unit for processing a respective portion of a stream. In some examples, one or more processing cores 140 of the access node 130 may be configured to execute program instructions using a work unit (WU) stack.
In some examples, in processing the plurality of events related to each data packet, a first core of the plurality of cores 140 (e.g., core 140A) may process a first event of the plurality of events. Moreover, first core 140A may provide a first work unit of the one or more work units to a second core of the plurality of cores 140 (e.g., core 140B). Furthermore, second core 140B may process a second event of the plurality of events in response to receiving the first work unit from first core 140A.
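A minimal sketch of such work-unit hand-off between two cores follows; the WorkUnit fields and the queueing mechanism are hypothetical simplifications of the hardware behavior described in the text.

```python
# Sketch of work-unit hand-off between cores: a work unit points at stream data
# and names the handler to run next; core A processes one event, then enqueues
# a follow-on work unit for core B. Field names and queueing are illustrative.
from collections import deque
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorkUnit:
    handler: Callable          # what to run for this event
    frame: bytes               # the stream data this work unit describes
    flow_state: dict           # per-flow context carried between events

core_b_queue = deque()

def core_a_parse(wu: WorkUnit):
    wu.flow_state["parsed_len"] = len(wu.frame)
    # First event done; hand the next event of this packet to core B.
    core_b_queue.append(WorkUnit(core_b_deliver, wu.frame, wu.flow_state))

def core_b_deliver(wu: WorkUnit):
    print("deliver", wu.flow_state["parsed_len"], "bytes")

core_a_parse(WorkUnit(core_a_parse, b"\x00" * 64, {}))
while core_b_queue:
    wu = core_b_queue.popleft()
    wu.handler(wu)
```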
More details about Access nodes, including their operation and example architecture, may be found in U.S. patent application No. 16/031,676 entitled "Access Node for Data Centers" filed on 7/10 2018 (attorney docket No. 1242-005US01), the entire contents of which are incorporated herein by reference.
Fig. 10 is a block diagram illustrating an example networking unit 142 of the access node 130 from fig. 9 in more detail. A Networking Unit (NU)142 exposes ethernet ports, also referred to herein as fabric ports, to connect the access node 130 to the switching fabric. The NU142 is connected to the processing core 140 and an external server and/or storage device, such as an SSD device, via endpoint ports. The NU142 supports switching packets from one fabric port to another without storing complete packets (i.e., transit (transit) switching), which helps achieve low latency for transit traffic. In this manner, the NU142 enables the creation of a fabric of access nodes with or without external switching elements. The NU142 may perform the following functions: (1) transmitting packets from the PCIe device (server and/or SSD) to the switch fabric, and receiving packets from the switch fabric and sending them to the PCIe device; (2) supporting switching of packets from one switch fabric port to another switch fabric port; (3) supporting sending of network control packets to an access node controller; and (4) implementing FCP tunneling.
As illustrated in fig. 10, NU142 includes a Fabric Port Group (FPG) 170. In other examples, NU142 may include multiple FPGs 170. FPG 170 includes two or more fabric ports connected to a switching network. FPG 170 is configured to receive ethernet packets from the switch fabric and transmit the packets to the switch fabric. FPG 170 may be responsible for generating and receiving link pause and Priority Flow Control (PFC) frames. In the receive direction, FPG 170 may have a flexible parser to parse incoming bytes and generate Parsed Result Vectors (PRVs). In the transmit direction, FPG 170 may have a packet rewrite subunit to modify an outgoing packet based on rewrite instructions stored with the packet.
NU142 has a single forwarding block 172 to forward packets from fabric ports of FPG 170 and endpoint ports of source proxy block 180. Forwarding block 172 has a fixed pipeline configured to process one PRV received from FPG 170 and/or source proxy block 180 per cycle. The forwarding pipeline of forwarding block 172 may include the following processing portions: attributes, ingress filters, packet lookup, next hop parsing, egress filters, packet replication, and statistics.
In the attribute processing section, different forwarding attributes are determined, such as virtual layer 2 interfaces, virtual routing interfaces and traffic classes. These forwarding attributes are passed on to further processing sections in the pipeline. In the ingress filter processing portion, search keys may be prepared from different fields of the PRV and searched according to programmed rules. The ingress filter block may be used to modify normal forwarding behavior using a rule set. In the packet lookup processing section, certain fields of the PRV are looked up in a table to determine the next hop index. The packet lookup block supports exact match and longest prefix match lookups.
In the next-hop resolution processing section, the next-hop instructions are resolved and the destination egress port and egress queue are determined. The next-hop resolution block supports different next hops, such as a final next hop, an indirect next hop, an Equal Cost Multipath (ECMP) next hop, and a Weighted Cost Multipath (WCMP) next hop. The final next hop stores information about the egress flow and how the egress packet should be rewritten. The indirect next hop may be used by software to embed the address of a next hop in memory, which may be used to perform atomic next-hop updates.
A WCMP next hop may have multiple members and is used to inject packets on all links between the SF component and the SX components of the access node (see, e.g., SF component 30 and SX component 32 of fig. 8). Due to link failures between the rack switches and the backbone switches, the SF component may need to inject among the SX components based on the active links for the destination rack IP address. For FCP traffic, the FCP injection engine injects packets based on the load on each link in proportion to its weight. The WCMP next hop stores the address of the credit memory, and the FCP injection engine selects the link with the highest credit and deducts its credit based on the packet length. An ECMP next hop may have multiple members and is used to inject packets on all links to the backbone switches connected to the access node (see, e.g., core switches 22 of fig. 8). For FCP traffic, the FCP injection engine again injects packets based on the load on each link in proportion to its weight. The ECMP next hop stores the address of the credit memory, and the FCP injection engine selects the link with the highest credit and deducts its credit based on the packet length.
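The credit-based link selection used by the injection engine can be sketched as follows. This is a minimal illustration only: the credit units, initial credit values, and replenishment policy are assumptions and not taken from the hardware design.

```python
class CreditNextHop:
    """Illustrative WCMP/ECMP next hop: packets are injected on the member
    link with the highest remaining credit, which is then charged by the
    packet length."""

    def __init__(self, link_weights):
        self.weights = dict(link_weights)   # configured per-link weights
        self.credits = dict(link_weights)   # credits start at the weights (assumed)

    def select_link(self, packet_len):
        link = max(self.credits, key=self.credits.get)   # highest-credit link
        self.credits[link] -= packet_len                  # deduct by packet length
        return link

    def replenish(self, amount):
        # periodic refill in proportion to each link's weight (assumed policy)
        total = sum(self.weights.values())
        for link, weight in self.weights.items():
            self.credits[link] += amount * weight / total


nh = CreditNextHop({"sx0": 100_000, "sx1": 100_000, "sx2": 50_000})
print(nh.select_link(1500))   # picks the least-loaded link; load then shifts elsewhere
```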
In the egress filter processing section, packets are filtered based on egress ports and egress queues. The egress filter block cannot change the egress destination or egress queue, but can sample or mirror packets using a rule set. If any of the processing stages has determined that a copy of a packet is to be created, the packet replication block generates its associated data. The NU142 may create only one additional copy of the incoming packet. The statistics processing section has a set of counters to collect statistics for network management purposes. The statistics block also supports metering to control the packet rate to certain ports or queues.
The NU142 also includes a packet buffer 174 that stores packets for port bandwidth oversubscription. Packet buffer 174 may be used to store three types of packets: (1) a transport packet received from processing core 140 on an endpoint port of source agent block 180 to be transported to a fabric port of FPG 170; (2) received packets received from fabric ports of FPG 170 to be transmitted to processing core 140 via endpoint ports of destination proxy block 182; and (3) transit packets that enter on fabric ports of FPG 170 and exit on fabric ports of FPG 170.
The packet buffer 174 tracks memory usage for traffic in different directions and priorities. Based on a programmed profile, if an egress port or queue is very congested, the packet buffer 174 may decide to drop packets, assert flow control to the work unit scheduler, or send a pause frame to the other end. Key features supported by packet buffer 174 may include: cut-through for transit packets, Weighted Random Early Detection (WRED) dropping for non-Explicit Congestion Notification (ECN)-aware packets, ECN marking for ECN-aware packets, input- and output-based buffer resource management, and PFC support.
The cell link list manager subunit maintains a list of cells that represent packets. The cell link list manager may be built from memory with one write port and one read port. The packet queue manager subunit maintains a queue of packet descriptors for egress nodes. The packet scheduler subunit schedules packets based on different priorities among queues. For example, the packet scheduler may be a three-level scheduler: port, channel, queue. In one example, each FPG port of FPG 170 has sixteen queues, and each endpoint port of source agent block 180 and destination agent block 182 has eight queues.
For a scheduled packet, the packet reader subunit reads the cells from the packet memory and sends them to FPG 170. In some examples, the first 64 bytes of the packet may carry the overwrite information. The resource manager subunit tracks the usage of the packet memory for the different pools and queues. The packet writer block consults the resource manager block to determine whether the packet should be dropped. The resource manager block may be responsible for asserting flow control to the work unit scheduler or sending PFC frames to the ports. The cell free pool subunit manages a free pool of packet buffer cell pointers. The cell free pool allocates a cell pointer when the packet writer block wants to write a new cell to the packet buffer memory, and releases the cell pointer when the packet reader block dequeues the cell from the packet buffer memory.
The NU142 includes a source agent control block 180 and a destination agent control block 182 that are collectively responsible for FCP control packets. In other examples, the source agent control block 180 and the destination agent control block 182 may comprise a single control block. The source agent control block 180 generates an FCP request message for each tunnel. In response to an FCP grant message (which is received in response to an FCP request message), source agent block 180 instructs packet buffer 174 to send FCP data packets based on the amount of bandwidth allocated by the FCP grant message. In some examples, the NU142 includes an endpoint transmit pipeline (not shown) that sends packets to packet buffer 174. The endpoint transmit pipeline may perform the following functions: packet injection, fetching packets from memory 178, packet segmentation based on programmed MTU size, packet encapsulation, packet encryption, and packet parsing to create a PRV. In some examples, the endpoint transmit pipeline may be included in source agent block 180 or packet buffer 174.
The destination agent control block 182 generates an FCP grant message for each tunnel. In response to a received FCP request message, the destination agent block 182 updates the state of the tunnel as appropriate and sends an FCP grant message that allocates bandwidth on the tunnel. In response to FCP data packets (which are received in response to the FCP grant message), the packet buffer 174 sends the received data packets to the packet reordering engine 176 for reordering and reassembly prior to storage in the memory 178. The memory 178 may include on-chip memory or external off-chip memory. Memory 178 may include RAM or DRAM. In some examples, the NU142 includes an endpoint receive pipeline (not shown) that receives packets from the packet buffer 174. The endpoint receive pipeline may perform the following functions: packet decryption, packet parsing to create a PRV, flow key generation based on the PRV, determining one of the processing cores 140 for the incoming packet and allocating a buffer handle in buffer memory, sending incoming FCP request and grant packets to the destination agent block 182, and writing incoming data packets with the allocated buffer handle to the buffer memory.
Fig. 11 is a conceptual diagram illustrating example FCP-based flow control and network communications within a network fabric 200, such as a data center switch fabric or other packet-based network. As illustrated, when FCP is used, the network fabric 200 is visualized as a fabric having multiple channels between the source access node 196 and the destination access node 198. The FCP data channel 206 carries traffic for multiple tunnels and multiple queues within each tunnel. Each channel is designated for a particular type of service. The various channels and their attributes are described below.
The control channel 202 has a strict priority higher than all other channels. The intended use of this channel is to carry grant messages. The grant messages are injected on all available paths towards the requesting or source node (e.g., source access node 196), and they are not expected to arrive at the requesting node in order. The control channel 202 is rate limited to minimize overhead on the network fabric 200. The high priority channel 204 has a higher priority than the data and non-FCP channels. The high priority channel 204 is used to carry FCP request messages. The messages are injected on all available paths towards the granting or destination node (e.g., destination access node 198), and they are not expected to arrive at the granting node in order. The high priority channel 204 is rate limited to minimize overhead on the fabric.
The FCP data channel 206 carries data packets using FCP. The data channel 206 has a higher priority than the non-FCP data channel. FCP packets are injected onto the network fabric 200 through a suitable load balancing scheme. FCP packets are not expected to be delivered in order at the destination access node 198, and the destination access node 198 is expected to provide a packet reordering implementation. The non-FCP data channel 208 carries data packets that do not use FCP. The non-FCP data channel 208 has the lowest priority compared to all other channels. The FCP data channel 206 has a higher strict priority than the non-FCP data channel 208. Thus, non-FCP packets use opportunistic bandwidth in the network and, as required, the FCP data rate can be controlled by the request/grant pacing scheme to allow non-FCP traffic to obtain the required share of bandwidth. Non-FCP data packets are forwarded/routed using ECMP-based load balancing, and for a given flow (identified by the five-tuple), the packets are expected to always be delivered in order at the destination access node 198. The non-FCP data channel 208 may have multiple queues that apply any priority/QoS in scheduling packets to the fabric. The non-FCP data channel 208 may support eight queues on each link port based on the priority of the packet flow.
FCP data packets are sent between source access node 196 and destination access node 198 via a logical tunnel. A tunnel is considered unidirectional, and an incoming tunnel Identifier (ID) is unique for all packets from a particular source node to the destination. The tunnel encapsulation carries the packet forwarding and reordering information. A single tunnel carries packets for one or more source queues (210) between source access node 196 and destination access node 198. Packets within a tunnel are reordered based on packet sequence numbers that span the queues of the same tunnel. When a packet is sent from the source access node 196, the packet is tagged with a tunnel Packet Sequence Number (PSN). The destination access node 198 reorders the packets based on the tunnel ID and the PSN (212). The tunnel encapsulation is stripped at the end of reordering, and the packets are forwarded to the corresponding destination queues (214).
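The per-tunnel reordering described above can be sketched as follows; the buffering structure is illustrative and the reordering-timeout handling discussed later in this disclosure is omitted.

```python
from collections import defaultdict

class TunnelReorderer:
    """Holds out-of-order packets per tunnel and releases them in PSN order."""

    def __init__(self):
        self.expected_psn = defaultdict(int)   # next PSN expected per tunnel
        self.pending = defaultdict(dict)       # tunnel ID -> {psn: packet}

    def receive(self, tunnel_id, psn, packet):
        released = []
        self.pending[tunnel_id][psn] = packet
        # drain as long as the next expected PSN is present
        while self.expected_psn[tunnel_id] in self.pending[tunnel_id]:
            nxt = self.expected_psn[tunnel_id]
            released.append(self.pending[tunnel_id].pop(nxt))
            self.expected_psn[tunnel_id] += 1
        return released   # in-order packets, forwarded to the destination queue
```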
A queue is defined as a bucket of independent traffic flows that use FCP to transport payload across the network fabric 200. An FCP queue is identified by the tunnel ID and priority, and the tunnel ID is identified by the source/destination access node pair. Alternatively, the access nodes 196, 198 may use a mapping table to derive the tunnel ID and queue/priority pair based on an internal FCP queue ID. A fabric tunnel may support 1, 2, 4, or 8 queues per tunnel. The number of queues per tunnel is a network fabric attribute and may be configured at deployment time. An access node may support up to 16K queues. All tunnels within the network fabric 200 may support the same number of queues per tunnel.
As indicated above, FCP messages include request messages, grant messages, and data messages. The request message is generated when the source access node 196 wishes to transmit a certain amount of data to the destination access node 198. The request message carries a destination tunnel ID, a queue ID, a Request Block Number (RBN) of the queue, and metadata. The request message is sent over a high priority channel 204 on the network fabric 200 and is injected on all available paths. The metadata may be used to indicate request retries, etc. The grant message is generated when the destination access node 198 responds to a request from the source access node 196 to transmit an amount of data. The grant message carries the source tunnel ID, queue ID, Grant Block Number (GBN) of the queue, metadata (scale factor, etc.), and a timestamp. The grant message is sent over a control channel 202 on the network fabric 200 and is injected on all available paths. The control packet structure of the request and grant messages is described below with respect to fig. 18. The FCP data packet carries an FCP header that contains a destination tunnel ID, a queue ID, a Packet Sequence Number (PSN) and a Data Block Number (DBN), and metadata. The average size of the FCP data packets may be about 800B. The Maximum Transmission Unit (MTU) of FCP may be about 1.6KB-2KB to minimize packet delay jitter in the fabric. The FCP data packet structure is described below with respect to fig. 19.
Fig. 12 is a conceptual diagram illustrating an example FCP queue pair structure between a source access node and a destination access node. FCP is an end-to-end admission control protocol. The sender explicitly requests the receiver with the intention to transfer a certain amount of payload data. The receiver issues grants based on its buffer resources, QoS, and/or fabric congestion metrics. Fabric Endpoint (FEP) nodes are nodes connected to the fabric, which consists of switching elements (leaf-backbone topology). This architecture allows one endpoint to communicate with another endpoint through multiple paths. The switching elements inside the fabric have shallow packet buffers. The cross-sectional bandwidth of the fabric is equal to or greater than the sum of the bandwidths of all the fabric endpoints. If each fabric endpoint limits its incoming data rate to the fabric, no path inside the fabric should be congested for a long duration with high probability.
As illustrated in fig. 12, the FCP establishes a pair of tunnels 220, 222 between two FCP endpoints (i.e., source access node 216 and destination access node 218), as each tunnel 220, 222 is considered unidirectional. Each node 216, 218 has been assigned a source tunnel ID and a corresponding destination tunnel ID. The queue ID is derived based on the assigned tunnel ID and priority at each endpoint. When one endpoint communicates with another endpoint, it encapsulates the packets using UDP + FCP encapsulation. Each node 216, 218 communicates from its local queues to the remote queues through the set of tunnels 220, 222. The FCP header carries fields identifying the tunnel ID, queue ID, packet sequence number of the packet, and the request, grant, and data block sequence numbers between the source access node 216 and the destination access node 218.
Before any traffic can be sent using FCP, a connection must be established between the two endpoints 216, 218. The control plane protocols negotiate the capabilities of both endpoints (e.g., block size, MTU size, etc.) and establish an FCP connection between them by establishing tunnels 220, 222 and initializing queue state contexts. Each endpoint 216, 218 allocates a local tunnel handle from the handle pool and passes the handle to its FCP connection partner (e.g., in fig. 12, the destination access node 218 is the FCP connection partner of the source access node 216). The local tunnel handle may be stored in a local tunnel ID table (e.g., local tunnel ID table 226 of source access node 216 and local tunnel ID table 228 of destination access node 218). The FCP partner tunnel handle is stored in a lookup table (e.g., mapping table 224 of the source access node 216 and mapping table 230 of the destination access node 218) and referenced from the local tunnel handle.
For the sender, the source queue is identified by [ local tunnel ID, priority ], and the destination tunnel ID is identified by MAP [ local tunnel ID ]. For the receiver, the queue is identified by [ local tunnel ID, priority ]. As illustrated in fig. 12, the source access node 216 has a source or local tunnel ID "4" in the local tunnel ID table 226 that maps to a remote or destination tunnel ID "1024" in the mapping table 224. Instead, destination access node 218 has a source or local tunnel ID "1024" in local tunnel ID table 228 that maps to a remote or destination tunnel ID "4" in mapping table 230.
Fig. 13 is a conceptual diagram illustrating an example of FCP queue states at a source access node and a destination access node. Each FCP queue at an access node endpoint maintains a set of block sequence numbers for the corresponding transmitter/receiver queue to track queue state. The sequence numbers indicate the amount of data flowing through the queue at any given time. The sequence numbers may be in bytes (similar to TCP) or in blocks (to reduce FCP header overhead). The block size may be 64, 128, or 256 bytes and may be negotiated at the time of FCP connection establishment. As one example, with a block size of 128 bytes, the FCP header may carry a 16-bit block sequence number and span 8 megabytes of data before wrapping around. In this example, it is assumed that the Round Trip Time (RTT) or network delay is small enough that the sequence number cannot wrap around within one RTT.
Each access node endpoint maintains a set of block sequence numbers to track blocks that are enqueued, pending requests, or pending/ungranted blocks. The queue tail block number (QBN) represents the tail block in the transmit queue 240 at the source access node 236. The fabric transmit/output queue 240 tracks incoming packets (WUs), in blocks, available for transmission to the destination access node 238. Once a WU is added to the queue 240, the QBN is incremented as follows: QBN += WU_size / block_size. The transmit queue 240 tracks only the WU boundaries on dequeue, which ensures that a partial WU is never transmitted on the fabric. However, at the time of transmission, a WU may be split into multiple MTU-sized packets.
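The QBN update can be sketched as follows; rounding a work unit up to a whole number of blocks is an assumption consistent with the block-based accounting described here, not a statement of the hardware behavior.

```python
BLOCK_SIZE = 128  # bytes; negotiated at FCP connection setup (64, 128, or 256)

def enqueue_work_unit(qbn, wu_size, block_size=BLOCK_SIZE):
    """Advance the queue tail block number when a WU is added to the transmit
    queue; a partial block is rounded up to a whole block (assumed)."""
    blocks = (wu_size + block_size - 1) // block_size
    return qbn + blocks

qbn = 0
qbn = enqueue_work_unit(qbn, wu_size=9000)   # a 9000B WU occupies 71 blocks of 128B
print(qbn)  # 71
```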
At the source access node 236, the Request Block Number (RBN) indicates the last block for which the source access node 236 has sent a request on the fabric. The difference between QBN and RBN at the source access node 236 represents the number of unrequested blocks in the transmit queue 240. If QBN is greater than RBN, the source access node 236 may send a request message for the unrequested blocks through the local request scheduler. The local request scheduler may rate limit the outgoing request messages. It may also reduce the overall requested bandwidth throughput via a request rate limiter, depending on long-term "near" fabric congestion. Near-fabric congestion refers to a localized phenomenon at the sender access node 236 due to backbone link loss. The RBN is incremented based on the maximum allowed/configured request size. The outgoing request message carries the updated RBN value. At the destination access node 238, the RBN indicates the last block for which the destination access node 238 has received a request from the fabric.
When request messages arrive at destination access node 238 out of order, destination access node 238 updates its RBN to the message RBN if the request message RBN is newer than the previously accepted RBN. If the RBN carried by an out-of-order request message is older than the accepted RBN, it is discarded. When a request message is lost, a subsequent request message carrying a newer RBN successfully updates the RBN at destination access node 238, thereby recovering from the lost request message.
If source access node 236 sent its last request message and the request message was lost, destination access node 238 would not perceive the request message as lost because it was the last request from source access node 236. The source access node 236 may maintain a request retry timer and if at the end of the timeout the source access node 236 has not received the grant message, the source access node 236 may retransmit the request again in an attempt to recover from the assumed loss.
At destination access node 238, the Grant Block Number (GBN) indicates the last granted block in the receive queue 242. The distance between the RBN and the GBN represents the number of ungranted blocks at the receive queue 242. The egress grant scheduler may move the GBN forward after issuing a grant for the receive queue 242. The GBN is updated by the smaller of the maximum allowed grant size and the difference between the RBN and the GBN. At source access node 236, the GBN indicates the last block number granted by destination access node 238. Like the RBN, the GBN may not fall on a WU boundary in the output queue 240. The distance between the RBN and the GBN represents the number of ungranted blocks at the transmit queue 240. The transmitter is allowed to go past the GBN to complete the processing of the current WU.
When grant messages arrive at source access node 236 out of order, source access node 236 updates its GBN to the message GBN if it is newer than the previously accepted GBN. If the GBN carried by an out-of-order grant message is older than the accepted GBN, it is discarded. When a grant message is lost, a subsequent grant message successfully updates the GBN at the source access node 236, thereby recovering from the lost grant message.
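Because the block sequence numbers are carried in a fixed-width field and wrap around (see the fig. 13 discussion above), accepting an out-of-order request or grant amounts to a wrap-aware "is newer" comparison. A sketch, assuming 16-bit sequence numbers and that two compared values are never more than half the sequence space apart:

```python
SEQ_BITS = 16
SEQ_MOD = 1 << SEQ_BITS

def is_newer(candidate, accepted):
    """True if candidate is ahead of accepted, allowing for wrap-around."""
    return candidate != accepted and ((candidate - accepted) % SEQ_MOD) < (SEQ_MOD // 2)

def update_block_number(current, message_value):
    # out-of-order request/grant messages carrying an older value are discarded
    return message_value if is_newer(message_value, current) else current

print(update_block_number(100, 103))     # 103: newer value accepted
print(update_block_number(103, 100))     # 103: stale value discarded
print(update_block_number(65_530, 4))    # 4: accepted across the wrap boundary
```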
When destination access node 238 sends the last grant message and the grant message is lost, or when source access node 236 receives the grant and sends a packet that is dropped in the fabric, destination access node 238 does not know whether the grant message or the packet was lost; it only knows that it sent a grant and failed to receive a packet. If there are more packets in the tunnel, the tunnel recovers from the loss through the reordering timeout. Destination access node 238 may maintain a timeout, and if destination access node 238 has not received the packet at the end of the timeout, destination access node 238 retransmits the grant in an attempt to recover from the grant/packet loss. In response to such a timeout grant, if the source access node 236 has already sent the packet, the source access node 236 may send a packet with zero payload, carrying only the DBN. The zero-length packet travels through the regular data channel and updates the receiver state for the lost packet. In response to the timeout grant, if the source access node 236 did not receive the earlier grant, it responds to the timeout grant with a regular packet transmission.
At the source access node 236, the Data Block Number (DBN) indicates the last block transmitted from the transmit queue 240. The distance between the GBN and the DBN represents the number of granted blocks to be transmitted. The transmitter is allowed to transmit blocks up to the end of the current WU segment. At destination access node 238, the DBN indicates the last block that has been received after the reordering process is complete. The DBN is updated when a packet is received from the fabric. The distance between the GBN and the DBN represents the number of pending granted blocks that have either not been received or are waiting for reordering at the receive queue 242.
When a data packet arrives out of order at destination access node 238, it passes through a packet reordering engine. At the end of the reordering process, the packet is sent to one of the processing cores (e.g., core 140 from fig. 9). If a packet is lost in the fabric, the reordering engine times out and proceeds to the next packet if there are more packets in the tunnel after the lost packet. If the packet is the last packet in the sender queue at the source access node 236, then the loss may be detected after the above-described timeout grant. The source access node 236 may send a zero-length packet in response to the timeout authorization and the destination access node 238 updates its status when a zero-length packet is received. The lost packets are recovered by upper layer protocols.
Fig. 14 is a conceptual diagram illustrating example FCP operations for transferring an incoming packet flow from a source access node to a destination access node. The primary goal of the FCP protocol is to transfer an incoming packet flow from one endpoint to another endpoint with predictable latency in an efficient manner that maximizes fabric utilization. The source endpoint injects packets across the available paths. The destination endpoint reorders the packets of the queue pair based on the packet sequence numbers. Conceptually, fig. 14 depicts the handshake between the source/destination queues.
The example of fig. 14 includes two source access nodes 250A and 250B (collectively "source nodes 250"), each having a queue 254A, 254B of packets to be transmitted to the same destination access node ("DN") 252. Destination access node 252 maintains a request queue 256. Source access node 250 requests the bandwidth of packets within queues 254A, 254B by sending request messages (shown as dashed lines) to respective request queues 256 at destination access node 252. The request is throttled using a Rate Limiter (RL) of the source access node 250.
In response to the grant messages, source nodes 250 transmit packets (illustrated in dashed lines) from queues 254A, 254B to destination access node 252. At packet reordering engine 257 of destination access node 252, packets are reordered on a per-tunnel context before being pushed to application queues 259. The example of fig. 14 shows destination access node 252 performing packet reordering and enqueuing packets after reordering is complete. In the event of packet loss, the reordering engine times out and enqueues the next in-order packet for processing.
To reduce the amount of reordering resources required to support the protocol, an endpoint node does not reorder request/grant messages when it receives them. Instead, the sliding-window queue block sequence numbers are cumulative. Due to the sliding-window nature of the request/grant handshake, each new message provides updated information about the window. Therefore, the receiver only needs to pay attention to messages that move the window forward. The block sequence numbers are used so that an endpoint node only needs to remember the highest sequence number received for each type of message, which updates the forward movement of the window.
Fig. 15 is a conceptual diagram illustrating an example FCP source access node operational flow. First, packets/payloads to be transported over the network fabric are enqueued in a packet queue to await authorization (270), (272) for transmission of the packet/payload to a destination access node. Packet queue manager 260 maintains queues for both FCP and non-FCP traffic flows (272). FCP and non-FCP packets should be pushed into separate queues.
The packet queue manager 260 sends information about the enqueued packet/payload size to update the FCP source queue state at the FCP sender state handler 262 (274). FCP sender state handler 262 maintains the FCP state of each queue that is used to generate request messages (276), (278) to be sent to the destination access node. For non-FCP queues, FCP sender state handler 262 may operate in an infinite grant mode, where grants are generated internally as if they were received from the fabric. The non-FCP queues obtain the residual bandwidth after the FCP bandwidth demands are met. The FCP demands include request messages, grant messages, and FCP data packets.
Based on the FCP source queue state of a non-empty FCP queue (QBN > RBN), FCP sender state handler 262 participates in request generation by generating requests to the request scheduler 264 (276). The request scheduler 264 may include up to eight priority-based request queues to schedule request messages for transmission over the network fabric to the destination access node (278). The request messages are rate limited (e.g., in messages per second) and paced (controlling the bandwidth rate) based on the requested payload size to manage fabric congestion.
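A sketch of request generation as just described: a queue with unrequested blocks (QBN > RBN) is offered to the scheduler, and each request advances the RBN by at most a configured maximum request size. The queue representation and the sizes used are illustrative assumptions.

```python
MAX_REQUEST_BLOCKS = 64  # maximum configured request size, in blocks (assumed value)

def generate_request(queue_state):
    """Build one FCP request message for a queue with unrequested blocks."""
    qbn, rbn = queue_state["QBN"], queue_state["RBN"]
    if qbn <= rbn:
        return None                                   # nothing left to request
    request_blocks = min(qbn - rbn, MAX_REQUEST_BLOCKS)
    queue_state["RBN"] = rbn + request_blocks         # RBN advances with the request
    return {"tunnel": queue_state["tunnel"],
            "queue": queue_state["queue"],
            "RBN": queue_state["RBN"],                # updated RBN carried in the message
            "weight": queue_state["flow_weight"]}

q = {"tunnel": 4, "queue": 0, "QBN": 200, "RBN": 100, "flow_weight": 2}
print(generate_request(q))   # requests 64 blocks; RBN moves from 100 to 164
```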
FCP sender state handler 262 generates internal grants for non-FCP queues as well as for queues eligible for unsolicited transmission (i.e., queues where QBN - GBN < Unsolicited_Threshold). The non-FCP internal grants, unsolicited internal grants, and fabric grants are enqueued at the packet scheduler 266 (282). Since arrivals may be out of order, FCP sender state handler 262 resolves incoming fabric grants against the FCP source queue state (280). The accepted FCP grants are queued at the packet scheduler 266 (282).
The packet scheduler 266 maintains two sets of queues, one for non-FCP traffic and one for FCP traffic (based on grant messages). The packet scheduler 266 may be considered a hierarchical scheduler in which FCP packets have strict priority, allowing non-FCP packets to use the remaining bandwidth. Alternatively, packets may be scheduled between FCP/non-FCP flows based on Weighted Round Robin (WRR). In general, a global rate limiter should be used to limit the total bandwidth flowing out of the source node. The FCP packet queues may be serviced on a strict round robin (SRR) basis, and the winning packet is sent to the packet queue manager 260 (284) to dequeue and send the packet descriptor for transmission processing and queuing (286). The non-FCP packet queues may be serviced based on WRR scheduling.
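A minimal sketch of this dequeue policy, assuming simple in-memory queues: grant-driven FCP queues are served round robin with strict priority over non-FCP queues, which use whatever bandwidth is left. The WRR weighting and the global rate limiter are omitted for brevity.

```python
from collections import deque

class PacketScheduler:
    """Strict-priority dequeue: FCP queues first (round robin), then non-FCP."""

    def __init__(self):
        self.fcp_queues = deque()       # queues holding granted FCP packets
        self.non_fcp_queues = deque()   # queues holding non-FCP packets

    def next_packet(self):
        for queues in (self.fcp_queues, self.non_fcp_queues):
            for _ in range(len(queues)):
                q = queues[0]
                queues.rotate(-1)        # simple round robin among the queues
                if q:
                    return q.popleft()   # winning packet goes to the queue manager
        return None
```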
The packet queue manager 260, after dequeuing the packet/payload (286), sends a size update to the FCP source queue state at the FCP sender state handler 262 (274) and to the request pacer. In the case of a payload dequeue in response to a grant message, the payload may result in one or more packets due to MTU segmentation of the payload. Each new packet on a tunnel is tagged with a running per-tunnel packet sequence number. The packet buffer stores all outgoing FCP packets and the packet handles, including the tunnel ID and packet sequence number.
FCP source node operation can be divided into the following major parts: transmit buffer management, request generation, and the packet scheduler.
Transmit buffer management at a source access node is briefly described herein. The FCP queue stores packet descriptors to be transmitted. A packet descriptor has the size and address of the payload stored in the transmit buffer. The term payload is used to indicate a packet or large segment to be transported. The transmit buffer may be held in an external memory (e.g., external memory 150 from fig. 9), but an on-chip memory (buffer memory) may also be used as the transmit buffer (e.g., on-chip memory unit 134 from fig. 9). At the source access node, a flow processor (e.g., within networking unit 142 of fig. 9) is associated with the flow and is responsible for fetching the payload from host memory into the transmit buffer. The flow processor may be associated with a connection in a server and have credit-based flow control. The flow processor may retrieve an allocated number of descriptors from the descriptor queue to avoid head-of-line blocking.
For each FCP queue, four block numbers are maintained as the FCP queue state, as described above with respect to fig. 13. The window from RBN to GBN indicates the "request window" that has been requested on the fabric. The window from QBN to DBN indicates the "transmission window" and represents the blocks stored in the transmit buffer. Assuming that DBN equals GBN most of the time, the transmission window is equal to QBN - GBN. The window from QBN to RBN should be large enough to fetch data from host memory and generate work units for the FCP queues. The RBN eventually catches up with the QBN through the process of request generation, based on request-window-based back pressure sent to the flow processor of the source access node.
By default, the FCP limits the size of the "request window" to a Maximum Request Block Size (MRBS) based on the maximum queue drain rate and the round trip time from the destination queue (FCP request to FCP grant). The value of the MRBS is software programmed based on the estimated maximum queue drain rate and the RTT (also known as the BDP, or bandwidth-delay product). After the FCP queue has reached its maximum allowed request window, it should assert flow control to the flow processor. The maximum allowed request window is a function of the request window scale-down factor and the MRBS. The scale-down factor may be used directly to calculate the maximum allowed request window, or the window may be derived based on a table lookup. The maximum allowed request window determines the back pressure sent back to the flow processor based on the unrequested blocks in the queue.
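The relationship between the MRBS, the scale-down factor, and flow control can be sketched as follows; the direct-division formula and the exact blocks counted for back pressure are illustrative assumptions consistent with the fig. 13 queue-state definitions.

```python
def max_allowed_request_window(mrbs_blocks, scale_down):
    """Maximum request window as a function of the MRBS and the scale-down
    factor carried in FCP grant messages (direct calculation; a table lookup
    could be used instead)."""
    return max(1, mrbs_blocks // scale_down)

def assert_flow_control(rbn, gbn, mrbs_blocks, scale_down):
    # outstanding request window (RBN - GBN) has reached the allowed maximum,
    # so back pressure is asserted toward the flow processor
    return (rbn - gbn) >= max_allowed_request_window(mrbs_blocks, scale_down)

print(max_allowed_request_window(1024, 2))                                    # 512 blocks
print(assert_flow_control(rbn=900, gbn=300, mrbs_blocks=1024, scale_down=2))  # True
```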
The flow processor calculates the flow weights based on the amount of data that needs to be transferred using a given FCP queue. The derived flow weights are dynamic entities of the queues that are continually updated based on the dynamics of the transmit work requirements. The sender communicates the flow weight to the destination node via each outgoing FCP request message.
The destination estimates a source queue drain rate based on the source queue flow weights of all of its incast flows. In other words, it generates a scale-down factor for a given source based on the ratio of the amount of work required by that source node to the total amount of work that all active source nodes seen by the destination need to process. As requests arrive, the destination node maintains a sum of all flow weights by maintaining the flow weight of each individual queue in its database. The grant scheduler at the destination access node calculates the scale-down value for the source access node and sends the factor with each FCP grant message.
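One way to picture the ratio described above is sketched below; the inverse-share formula is an illustrative assumption, not the normative scale-down computation of the grant scheduler.

```python
def source_scale_down(flow_weights_by_source, source):
    """Illustrative scale-down factor: the inverse of a source's share of the
    total flow weight seen at the destination."""
    total = sum(flow_weights_by_source.values())
    share = flow_weights_by_source[source] / total
    return 1.0 / share

weights = {"source0": 2, "source1": 1}   # flow weights carried in request messages
print(source_scale_down(weights, "source0"))  # 1.5: more flows, smaller cut-back
print(source_scale_down(weights, "source1"))  # 3.0: fewer flows, larger cut-back
```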
When a queue becomes empty and the granted data has been received, the queue is considered idle and its flow weight may be reset by an aging timer so that it no longer participates in the overall flow weight. Similarly, once the queue is empty at the source, the transmitter may reset the scale-down factor through an aging timer. The software may also program a Global Transmission Buffer Size (GTBS). The value of the GTBS indicates the size of the transmit buffer. The software should reserve separate transmit buffers for different traffic priority classes. The FCP asserts flow control if the total transmit buffer usage across all FCP queues reaches the GTBS limit. The buffer may also be partitioned into separate GTBS pools on a priority/class basis, or it may be managed as a single entity with separate thresholds for each class/priority.
Request message generation at a source access node is now described. As one example implementation, the request scheduler portion of FCP operation can be divided into two functions: request scheduling and rate limiting.
In the request scheduling function, the FCP queues with pending requests are arbitrated by the request scheduler to issue requests. The FCP queues are divided into priority-based groups (e.g., up to 8 priorities) for scheduling purposes. The request scheduler may select one of the priority groups by a hierarchical deficit weighted round robin (DWRR) scheme. Once a priority group is selected, the FCP queues within the priority group are serviced in a round robin (RR) fashion.
When a queue is scheduled for an FCP request, the request may cover up to the maximum configured request size of requested blocks, or up to the end of the queue. An FCP queue is only allowed to participate in the request scheduler if it has unrequested blocks (QBN > RBN). It is assumed that the flow processor of the source access node will react to the request window scale-down factor from the destination and stop placing WUs in the source queue. The incoming grants carry a scale factor that may increase or decrease the allowed request window.
In the rate limiting function, the request rate is controlled so that a source access node does not request more data than it can transmit. This rate (enforced by the request data rate limiter) should be software programmable. As one example, a source access node may be able to obtain more than 400G of host bandwidth from its PCIe interface, but may only support 200G of outgoing network connectivity. If the source access node were allowed to send requests for all of its roughly 400G to different destination access nodes, and if the source access node then received an oversubscription of grants (grant collision), it would not be able to deliver the committed bandwidth to the destination access nodes. In this example, the source access node would cause near-end congestion, which would then become the controlling factor for traffic admitted into the fabric. The destination grant scheduler would no longer be able to pull data from the source access node with a predictable latency or RTT.
According to the techniques described in this disclosure, the request data rate limiter paces requests based on the rate at which the source can transmit data. The rate limiter uses the block size carried in the request messages to pace the request messages. For each packet, the size is rounded to a block boundary, and a correction is applied to the request pacer when the actual packet is transferred to the fabric. Similarly, the request data rate limiter is charged whenever a speculative or non-FCP packet is transmitted, so that the transmission bandwidth of the source node is never oversubscribed. Returning to the example above, where the source access node supports 200G of outgoing network connectivity, the outgoing requests may be paced to a throughput of approximately 200G x (1 - epsilon), where epsilon is a fraction between 0 and 1. By varying epsilon, the FCP can limit the rate at which the source access node can generate requests to the fabric. In some examples, the source access node may also control the bandwidth consumed by the request messages themselves. As a result, the source access node may include another rate limiter, referred to as a request control rate limiter.
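A token-bucket style sketch of the request data rate limiter, assuming an illustrative software pacer rather than the hardware one; the refill interval, the debt-based admission check, and the correction hook are all assumptions.

```python
import time

class RequestDataRateLimiter:
    """Paces request messages at roughly rate * (1 - epsilon), charged with the
    requested block sizes; corrections account for rounding, speculative, and
    non-FCP transmissions."""

    def __init__(self, rate_gbps, epsilon=0.01):
        self.rate_bytes_per_s = rate_gbps * (1 - epsilon) * 1e9 / 8
        self.tokens = 0.0
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens += (now - self.last) * self.rate_bytes_per_s
        self.last = now

    def try_send_request(self, requested_bytes):
        self._refill()
        if self.tokens < 0:
            return False                    # pacer still recovering from earlier debt
        self.tokens -= requested_bytes      # charge the requested payload size
        return True

    def correct(self, delta_bytes):
        # rounding / speculative / non-FCP corrections applied when data actually leaves
        self.tokens -= delta_bytes

rl = RequestDataRateLimiter(rate_gbps=200)
print(rl.try_send_request(64 * 128))   # True: first request admitted, pacer charged
```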
Packet scheduler operation at the source access node is briefly described herein. The source access node schedules FCP/non-FCP packets based on incoming grant messages (FCP) and based on scheduling criteria and buffer occupancy (non-FCP). Traffic flows from the FCP/non-FCP queues may optionally be rate limited and arbitrated separately via DWRR, or FCP traffic may be configured with strict priority. The overall traffic is limited by a global rate limiter to cap outgoing traffic at a maximum bandwidth throughput. The non-FCP scheduler may receive per-queue back pressure from the per-queue packet port buffers due to destination queue congestion, and schedules packets only to queues that are not back-pressured. When not rate limited or bandwidth-share limited, FCP packets are subject only to temporary link-level data path back pressure from downstream modules. If FCP grants cause temporary grant congestion at the source access node, the overall bandwidth rate limiter controls the amount of bandwidth injected into the network. Since the overall grant and request rates are controlled to operate at slightly less than the overall maximum bandwidth, source queue congestion should only be temporary. The shares of FCP traffic and non-FCP traffic may be explicitly divided. In addition, the network ensures that the delivery of FCP packets (i.e., data/requests/grants) is prioritized over non-FCP traffic. For example, if non-FCP traffic is congested, the network may drop non-FCP packets. However, FCP packets should not be dropped, because congestion in FCP traffic is expected to be temporary due to end-to-end admission control.
The non-FCP packet/payload segment is scheduled whenever the non-FCP queue is non-empty. If traffic needs to be shared between the FCP/non-FCP queues, outgoing non-FCP packets are enqueued with the packet scheduler where they are rate limited. Conventional FCP packets/payload segments are scheduled when grants for queues are received. The FCP packet queue has the highest priority and takes precedence over non-FCPs. The source access node sends traffic up to the current packet/segment boundary and updates the DBN based on the transmitted packet size. Any additional bytes sent by the source access node due to packet boundary transfer constraints are compensated at the grant pacer at the destination access node. Outgoing packets may not always end at a block boundary. For each outgoing packet, the rounding error is compensated at the request pacer.
In this manner, the techniques of this disclosure enable packet segmentation at the source access node to be delayed until the FCP grant message is received. Upon receipt of the grant message, transport layer FCP packet segmentation may be performed on the data identified in the queue. The generated FCP packets may then include additional data received from the processing cores after the request message was sent but before the grant message was received for the queue.
Allowing small flows to send packets without an explicit request-grant handshake may reduce latency and network overhead. However, this speculative bandwidth should be used with great care, as it may cause the destination access node to be overwhelmed by unsolicited incast traffic. According to the disclosed techniques, each source access node may be allowed to use a particular share of its bandwidth (and of the destination node buffer) for unsolicited traffic, and if the backlog of an ungranted queue is small and below a certain threshold, the queue may be allowed to send unsolicited packets without waiting for an explicit request/grant message exchange. Unsolicited packets may only be sent by the source access node if the ungranted queue size is small and the source access node has an available bandwidth share for the unsolicited traffic. FCP packets are serviced in the order of grant arrival for packets scheduled due to FCP grant arrival, or in enqueue order for unsolicited packets. Unsolicited packets may have lower latency because they avoid the round trip delay of the request and grant message exchange.
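The eligibility check for unsolicited (speculative) transmission described above can be sketched as follows; the threshold value and the way the unsolicited bandwidth share is tracked are illustrative assumptions.

```python
UNSOLICITED_THRESHOLD = 16   # blocks; assumed per-queue threshold

def may_send_unsolicited(qbn, gbn, unsolicited_share_available):
    """A queue may send speculative packets only when its ungranted backlog is
    small and the source still has unsolicited bandwidth share available."""
    ungranted_blocks = qbn - gbn
    return ungranted_blocks < UNSOLICITED_THRESHOLD and unsolicited_share_available

print(may_send_unsolicited(qbn=10, gbn=2, unsolicited_share_available=True))   # True
print(may_send_unsolicited(qbn=100, gbn=2, unsolicited_share_available=True))  # False
```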
Fig. 16 is a conceptual diagram illustrating an example FCP destination access node operational flow. The FCP receiver state processor 310 maintains an FCP egress context, such as an RBN, GBN, DBN, etc., for each queue. The egress reorder status processor 312 maintains a database of packet reordering contexts for each tunnel. The FCP grant scheduler 314 may support two or more grant queues for high and low priority. Grants may be rate limited/adjusted by grant rate limiter 316 based on fabric congestion.
The FCP receiver state handler 310 receives request messages from the fabric (290) and, after initial parsing (e.g., filtering duplicate entries), the accepted request messages update the per-queue FCP egress context at the FCP receiver state handler 310. Once a request queue at the FCP receiver state handler 310 is not empty, it is scheduled for grant generation by the grant scheduler 314 (292). When the grant rate limiter 316 allows the next grant message to be generated, the winner queue is allowed to send a grant message (294). The grant scheduler 314 reacts (296) to the reorder buffer status at the egress reorder status processor 312 and stops sending new grants if the reorder buffer status (out-of-order bytes, in-flight grants, and buffer occupancy) reaches a limit. Grant generation may also react to fabric congestion and failures, and the grant rate may be adjusted in response to fabric congestion metrics. The base grant rate is configured by software. The grant size of each grant is based on the request queue size and is limited to a maximum allowed grant size.
The network fabric interface receives the packets and stores them in packet receive buffer 318 to await reordering (298). Once the packets are reordered, they are enqueued to the downstream blocks (300). The egress reorder status processor 312 maintains a per-tunnel reorder state context. A reordering engine at the egress reorder status processor 312 performs reordering based on the arrival of packets on the tunnels and maintains a reordering timer on a per-tunnel basis. If a tunnel has out-of-order packets and the expected packet does not arrive within the reordering timer timeout period (~2 x RTT), the timeout causes the reordering engine to skip the packet and search for the next packet.
FCP destination node operation can be divided into the following major components: grant generation, fabric load balancing, and receive buffer management.
The generation of the grant at the destination access node is briefly described herein. The grant generation operation may be divided into a grant queue scheduler and a grant pacer. The grant scheduler provides flow fair bandwidth allocation for traffic delivered to the destination access node (described in more detail below with respect to fig. 17A-17B). The grant scheduler also limits grants based on buffer usage, number of outstanding grant blocks, and status of the reorder buffer.
The FCP queues are divided into tunnels and priorities. The FCP grant scheduler groups the queues for scheduling based on their priority (e.g., up to 8 priorities). The grant scheduler may select one of the priority groups by a strict priority or hierarchical deficit weighted round robin (DWRR) scheme. On top of each priority group schedule, a flow-aware algorithm may be used to arbitrate between the FCP queues that are part of the priority group. Incoming flow weights from the FCP queues may be normalized and used by the DWRR grant scheduler to update the credits of the arbitrated FCP queues.
The grant pacer provides admission control and manages fabric congestion. The grant pacer may be implemented as a leaky bucket that allows a grant to be sent whenever the bucket level drops below a certain threshold. When a grant is sent, the bucket is loaded with the size, in blocks, granted in the grant message. The bucket leaks at a rate (software programmed) that is a function of the incoming fabric rate and the number of active fabric links connected to the chassis. The grant pacer is compensated based on the actual arriving packet sizes and on non-FCP packets, so that the fabric remains uncongested over long periods of time.
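A leaky-bucket sketch of the grant pacer; the tick granularity, threshold, and leak rate are illustrative values, not the hardware parameters.

```python
class GrantPacer:
    """Leaky-bucket grant pacer: a grant may be issued while the bucket level is
    below a threshold; each grant loads the bucket with its size, and the bucket
    leaks at a software-programmed rate."""

    def __init__(self, leak_rate_bytes_per_tick, threshold_bytes):
        self.level = 0.0
        self.leak_rate = leak_rate_bytes_per_tick
        self.threshold = threshold_bytes

    def tick(self):
        self.level = max(0.0, self.level - self.leak_rate)   # periodic leak

    def may_grant(self):
        return self.level < self.threshold

    def on_grant_sent(self, grant_bytes):
        self.level += grant_bytes                              # load the granted size

    def compensate(self, delta_bytes):
        # correction for actual arriving packet sizes and non-FCP traffic
        self.level += delta_bytes

pacer = GrantPacer(leak_rate_bytes_per_tick=25_000, threshold_bytes=100_000)
if pacer.may_grant():
    pacer.on_grant_sent(64 * 128)   # 64 blocks of 128B granted
```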
The destination access node controls the rate of incoming data packets by adjusting the FCP grant using the grant data rate limiter and the grant control rate limiter, similar to the request data rate limiter and the request control rate limiter described above with respect to the source access node operation. In addition, the grant pacer tracks pending blocks on the fabric by incrementing an authorized block counter when sending the FCP grant message, and decrementing the counter with the data block count when receiving the FCP packet. The grant pacer also tracks pending packets in the reorder buffer and stops generating new FCP grants if the reordered pending packets are greater than a threshold.
In accordance with the techniques of this disclosure, a destination access node may perform Explicit Congestion Notification (ECN) marking of FCP packets based on a global view of packet flows in a switch fabric. The grant scheduler provides a unique view of the total load based on the total number of all pending requests seen at the grant scheduler. ECN marking based on global load seen by the destination endpoint provides a significant improvement over ECN marking based on local congestion seen by individual switches/paths of the fabric. Since data center TCP implementations rely on the widespread use of ECNs to manage congestion, ECN marking based on a global view of the output egress queues at the grant scheduler has a significant improvement compared to discontinuous and local views of some paths through the fabric and provides better congestion management at the TCP level.
Fabric load balancing at the destination access node is briefly described herein. FCP requires that all outgoing fabric links be balanced. One example implementation is to use a randomly shuffled deficit round robin (SDRR) scheme. The SDRR is a conventional deficit round robin scheduler that carries equal weights for all available links. The random shuffling of the RR pointer provides randomness in link selection and keeps the fabric from following a set pattern.
Receive buffer management at a destination access node is briefly described herein. If the RBN of the grant scheduler is ahead of the GBN and grant pacer credits are available, the grant scheduler generates an FCP grant message for the queue. The source access node transmits the data packet after receiving the queued FCP grant message. The destination access node stores the incoming data packet in a buffer memory. The destination access node reorders the work unit messages based on the packet sequence number and sends the work units to an associated flow processor in the destination access node. The stream processor may have descriptors (addresses of the host memory) and may move data from the receiver buffer in the on-chip buffer memory to the host memory in the server. If the stream processor cannot move data from the buffer memory to the host memory, it should move the data to an external memory (e.g., external memory 150 of FIG. 9).
Fig. 17A and 17B are conceptual diagrams illustrating an example of flow fairness implemented at a destination access node using an FCP grant scheduler. If the grant scheduler generates grants without knowledge of the number of flows per source access node, the bandwidth may be divided unfairly among the flows. The following example with respect to fig. 17A illustrates unfair bandwidth allocation. The bandwidth amounts in this example are purely illustrative and not limiting. Two sources (source 0 and source 1) are sending traffic to the destination. Two streams (stream 0 and stream 1) are active at source 0, and one stream (stream 2) is active at source 1. Each stream wants to send traffic at a rate of 100G, so source 0 sends a 200G request message and source 1 sends a 100G request message. The destination allocates bandwidth between the two sources regardless of the number of active streams at each source. The destination drain rate is 200G, so the destination divides the bandwidth by the number of sources (i.e., 2) and sends grant messages for 100G to source 0 and for 100G to source 1. Source 0 divides its 100G bandwidth between its two streams, so that stream 0 and stream 1 are each granted a 50G rate, whereas stream 2, active at source 1, is granted the full 100G rate. As a result, stream 0 and stream 1 sent from source 0 experience a higher end-to-end latency, while stream 2 sent from source 1 experiences a nominal, lower end-to-end latency.
In accordance with the techniques of this disclosure, as illustrated in fig. 17B, the grant scheduler is configured to allocate bandwidth in proportion to the number of active flows at each source and to equalize the latency experienced by all flows. Again, the bandwidth amounts in this example are purely illustrative and not limiting. To allow grants to be scheduled in a fair manner, each source (source 0 and source 1) sends its expected load to the destination through the flow weights carried in the request messages. In this example, source 0 sends a 200G request message with a flow count of 2 and source 1 sends a 100G request message with a flow count of 1 (e.g., weight = flow count, since all flows request the same bandwidth in this example). The destination grant scheduler schedules grants to the sources according to the conveyed weights. Again, the destination drain rate is 200G, and the destination divides the bandwidth by the number of streams (i.e., 3) and sends a grant message for 133.3G to source 0 and a grant message for 66.6G to source 1. Source 0 divides its 133.3G bandwidth between its two streams, so that stream 0 and stream 1 are each granted a 66.6G rate, and stream 2, active at source 1, is also granted a 66.6G rate.
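The weight-proportional split in the fig. 17B example reduces to the following arithmetic; the helper function and data layout are illustrative only.

```python
def grant_shares(destination_rate_g, requests):
    """Split the destination drain rate in proportion to the flow weights
    carried in the request messages (numbers mirror the fig. 17B example)."""
    total_weight = sum(weight for _, weight in requests)
    return {src: destination_rate_g * weight / total_weight for src, weight in requests}

# source 0 requests with flow weight 2, source 1 with flow weight 1
print(grant_shares(200, [("source0", 2), ("source1", 1)]))
# {'source0': 133.33..., 'source1': 66.66...}
```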
By performing flow-fair grant scheduling, the destination provides a fair allocation of bandwidth to the incast sources in response to their expected load. With this modification, the techniques can achieve flow fairness. As shown in fig. 17B, all streams (stream 0, stream 1, and stream 2) are granted similar bandwidth and experience similar latency. The grant scheduler may continually update the flow weights based on incoming requests. A source may change its expected weight at any time, and the grant scheduler may adjust the bandwidth allocation based on the new weights.
Fig. 18-19 illustrate example formats of FCP packets. In these examples, each FCP packet includes at least an ethernet header, an IP header, and an FCP header. The FCP data packet format of fig. 19 also includes a data payload. Each FCP packet may include an optional UDP header and an optional FCP security header and/or an optional Integrity Check Value (ICV). In some examples, the FCP packet may be carried over UDP over IPv4, thus including the optional UDP header. In other examples, FCP packets may be carried directly over IPv6.
Each example FCP packet includes an FCP header to carry information to the other end. The FCP header may be a multiple of 4 bytes and may vary in size. The FCP header may generally include an FCP version field, an FCP packet type field (e.g., request, grant, data, or control), a next protocol field identifying the protocol (e.g., IPv4 or IPv6) following the FCP header, FCP flags (e.g., Global Port Health (GPH) matrix size, timestamp present, FCP security header present), an FCP tunnel number local to the destination access node, an FCP QoS level, one or more FCP block sequence numbers, and optional fields for the GPH matrix, timestamp, and FCP security header indicated by the FCP flags. The FCP header fields may be protected with an ethernet frame Cyclic Redundancy Check (CRC) or the FCP security header (if present).
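A field-by-field sketch of the FCP header just described; the field widths, ordering, and representation are illustrative assumptions, not the normative wire format of fig. 18-19.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FcpHeader:
    """Illustrative view of the FCP header fields listed above."""
    version: int
    packet_type: str                 # "request", "grant", "data", or "control"
    next_protocol: str               # e.g. "IPv4", "IPv6", or "no payload"
    flags: int                       # GPH matrix size, timestamp present, security header present
    tunnel_id: int                   # tunnel number local to the destination access node
    qos_level: int
    block_seq_nums: Tuple[int, ...]  # e.g. (RBN,), (GBN,), or (PSN, DBN) depending on packet type
    gph_matrix: Optional[bytes] = None
    timestamp: Optional[int] = None
    security_header: Optional[bytes] = None

hdr = FcpHeader(version=1, packet_type="data", next_protocol="IPv4", flags=0,
                tunnel_id=1024, qos_level=0, block_seq_nums=(42, 7))
print(hdr.packet_type, hdr.block_seq_nums)
```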
As described above, the FCP control software establishes a bidirectional tunnel between a source access node and a destination access node. FCP tunnels are optionally secured (encrypted and authenticated). In examples where the FCP control software provides end-to-end encryption and authentication for a tunnel, the control protocol may handle the creation and distribution of keys used by the encryption algorithm. In these examples, the FCP frame format may include four different contiguous regions defined by whether the data is encrypted and/or authenticated. For example, the headers before the FCP header (e.g., the ethernet header, the IP header except for the source and destination addresses, and the UDP header) are neither encrypted nor authenticated; the source and destination addresses of the IP header, the FCP security header, and some of the payload (in the case of data packets) are authenticated but not encrypted; the remaining payload is both encrypted and authenticated; and the ICV is appended to the frame. In this way, the block sequence numbers (e.g., RBN, GBN, DBN, and/or PSN) carried in the FCP header are authenticated but not encrypted. Authenticating the block sequence numbers avoids spoofing of request and grant messages and protects the source/destination queue state machines. In addition, because the FCP packets of a packet flow are injected across all available data paths, it is difficult, if not impossible, to listen to or sniff the encrypted data within the packet flow, as a listener or sniffer would need access to the encrypted packets on every data path.
Fig. 18 is a conceptual diagram illustrating an example format of an FCP control packet for a request message or a grant message. In the case of a request message, the source access node generates an FCP request packet. The FCP header of the FCP request packet carries the RBN (request block number) and an FCP request weight field that identifies the flow weight of the request packet. A grant scheduler at the destination access node may use the flow weights to fairly allocate egress bandwidth for FCP grant generation. In the case of a grant message, the destination access node generates an FCP grant packet. The FCP header of the FCP grant packet carries the GBN (grant block number) and an FCP scale-down field that requests the request window to be scaled down at the source access node.
Fig. 19 is a conceptual diagram illustrating an example format of an FCP data packet. The source access node sends the FCP data packet in response to the FCP grant message. The FCP header of an FCP data packet includes the PSN (packet sequence number) and the DBN (data block number). The source access node may optionally send a null FCP data packet with zero payload bytes and a "next protocol" field programmed to "no payload".
Fig. 20 is a block diagram illustrating an example system with a packet-switched network having multiple network access node virtual fabrics dynamically configured on the packet-switched network in accordance with the techniques described herein. As illustrated in fig. 20, customers 411 are coupled to a packet-switched network 410 through a content/service provider network 407 and a gateway device 420. The service provider network 407 and the gateway device 420 may be substantially similar to the service provider network 7 and the gateway device 20 described with respect to fig. 1. Access nodes 417A-417G (collectively, "access nodes 417") are coupled to packet-switched network 410 to process information flows, such as network packets or storage packets, between groups of servers (not shown in fig. 20) connected to the access nodes 417, which provide the computing and storage facilities for applications and data associated with customers 411. Access node 417 may operate substantially similarly to any of access node 17 or access node 132 described in detail above. Access node 417 may also be referred to as a Data Processing Unit (DPU) or a device that includes a DPU.
In the illustrated example of fig. 20, a Software Defined Network (SDN) controller 421 provides a high-level, centralized controller for configuring and managing the routing and switching infrastructure of packet-switched network 410. SDN controller 421 provides a logically, and in some cases physically, centralized controller to facilitate the operation of one or more virtual networks within packet-switched network 410. In some examples, SDN controller 421 may operate in response to configuration input received from a network administrator.
In accordance with the described techniques, SDN controller 421 is configured to establish one or more virtual fabrics 430A-430D (collectively, "virtual fabrics 430") as overlay networks on top of the physical underlay network of packet-switched network 410. For example, SDN controller 421 learns and maintains knowledge of the access nodes 417 coupled to packet-switched network 410. SDN controller 421 then establishes a communication control channel with each of access nodes 417. SDN controller 421 uses its knowledge of access nodes 417 to define multiple sets (groups) of two or more access nodes 417 for which different virtual fabrics 430 are to be established over packet-switched network 410. More specifically, SDN controller 421 may use the communication control channels to notify each access node 417 of a given set as to which other access nodes are included in the same set. In response, access nodes 417 dynamically set up FCP tunnels with the other access nodes included in the same set as a virtual fabric over packet-switched network 410. In this way, SDN controller 421 defines the sets of access nodes 417 for each of virtual fabrics 430, and the access nodes are responsible for establishing virtual fabrics 430. Accordingly, packet-switched network 410 may be unaware of virtual fabrics 430.
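A minimal sketch of this controller-side bookkeeping is shown below; the class and method names (VirtualFabricController, notify) are hypothetical and stand in for whatever control-channel mechanism is actually used to inform the access nodes of their group membership.

```python
from typing import Dict, Set

class VirtualFabricController:
    """Hypothetical controller-side model of defining access-node sets."""

    def __init__(self) -> None:
        self.fabrics: Dict[str, Set[str]] = {}   # virtual fabric id -> member access nodes

    def define_fabric(self, fabric_id: str, members: Set[str]) -> None:
        self.fabrics[fabric_id] = set(members)
        # Tell every member which other access nodes are in the same set; the
        # members themselves then set up FCP tunnels with one another.
        for node in members:
            self.notify(node, fabric_id, set(members) - {node})

    def notify(self, node: str, fabric_id: str, peers: Set[str]) -> None:
        # Placeholder for the per-node communication control channel.
        print(f"{node}: join {fabric_id}, establish FCP tunnels with {sorted(peers)}")
```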
In general, access nodes 417 interface with and utilize packet-switched network 410 to provide full mesh (any-to-any) interconnectivity between access nodes of the same virtual fabric 430. In this way, a server connected to any one of the access nodes forming a given one of virtual fabrics 430 may communicate packet data for a given packet flow to any other server coupled to an access node of that virtual fabric, using any of a plurality of parallel data paths within packet-switched network 410 that interconnect the access nodes of that virtual fabric. Packet-switched network 410 may include the routing and switching fabric of one or more data centers, a local area network (LAN), a wide area network (WAN), or a collection of one or more networks. Packet-switched network 410 may have any topology, e.g., flat or multi-tiered, as long as there is full connectivity between the access nodes 417 of the same virtual fabric. Packet-switched network 410 may use any technology, including IP over Ethernet, among others.
In the example illustrated in fig. 20, SDN controller 421 defines four sets of access nodes for which respective virtual fabrics are to be established. SDN controller 421 defines the first set to include access nodes 417A and 417B, and access nodes 417A and 417B set up FCP tunnels as virtual fabric 430A, where the FCP tunnels are configured to traverse any available path through packet-switched network 410 between the two access nodes. In addition, SDN controller 421 defines the second set to include access nodes 417B-417D, and access nodes 417B-417D set up FCP tunnels as virtual fabric 430B, where the FCP tunnels are similarly configured to traverse any available paths between the access nodes through packet-switched network 410. SDN controller 421 defines the third set to include access nodes 417D and 417E, and access nodes 417D and 417E set up FCP tunnels as virtual fabric 430C. SDN controller 421 defines the fourth set to include access nodes 417E-417G, and access nodes 417E-417G set up FCP tunnels as virtual fabric 430D. Although shown generally as dashed arrows in fig. 20, the FCP tunnels for the four virtual fabrics 430 are configured by the access nodes 417 of each set to traverse any or a subset of the available paths through packet-switched network 410 for the access nodes of the particular virtual fabric.
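Expressed as plain data against the hypothetical controller sketch above, the four example sets of fig. 20 would simply be the following mapping; the identifiers are the reference numerals from the figure.

```python
# The four example sets from fig. 20; keys are virtual fabric numerals and the
# values are the access nodes the controller assigns to each set.
virtual_fabric_members = {
    "430A": {"417A", "417B"},
    "430B": {"417B", "417C", "417D"},
    "430C": {"417D", "417E"},
    "430D": {"417E", "417F", "417G"},
}
```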
The access nodes 417 of a defined set use the FCP control software to establish FCP tunnels with the other access nodes of the same set, thereby establishing the virtual fabric and supporting packet injection over the available paths. For example, for virtual fabric 430A, the FCP tunnels between access node 417A and access node 417B of virtual fabric 430A include all or a subset of the paths through packet-switched network 410 between access nodes 417A and 417B. Access node 417A may then inject individual packets of the same packet flow across some or all of the multiple parallel data paths in packet-switched network 410 toward access node 417B, and access node 417B may perform packet reordering, so as to provide full mesh connectivity within virtual fabric 430A.
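The per-packet path choice that this injection (spraying) relies on can be sketched in a few lines; random selection is shown here, with byte counts kept so that a least-loaded policy could be swapped in. The function name and data shapes are illustrative, not part of FCP.

```python
import random
from typing import Dict, Iterable, List, Tuple

def inject_packets(packets: Iterable[bytes], paths: List[str],
                   byte_counts: Dict[str, int]) -> List[Tuple[str, bytes]]:
    """Illustrative per-packet path selection for a single packet flow.

    Each packet is sent on an independently chosen parallel path; the
    destination access node reorders the flow afterwards.  Random choice is
    shown; round-robin or least-loaded (smallest byte count) selection would
    slot into the same place.
    """
    assignments = []
    for pkt in packets:
        path = random.choice(paths)                           # per-packet path choice
        byte_counts[path] = byte_counts.get(path, 0) + len(pkt)
        assignments.append((path, pkt))
    return assignments
```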
Each virtual fabric 430 may be isolated from other virtual fabrics established on packet switched network 410. In this manner, the access node of a given one of virtual fabrics 430 (e.g., virtual fabric 430A) may be reset without affecting other virtual fabrics 430 on packet-switched network 410, and in addition, different security parameters may be exchanged for the set of access nodes 417 defined for each virtual fabric 430. As described above, FCP supports end-to-end encryption of tunnels. In the case of virtual fabrics, SDN controller 421 may create and assign a different encryption key for each different virtual fabric 430 for use by access nodes within a defined set of access nodes. In this manner, only the set of access nodes of a given one of virtual fabrics 430 (e.g., virtual fabric 430A) may decrypt packets exchanged on virtual fabric 430A.
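A sketch of that per-fabric keying is shown below, under the assumption of one independent symmetric key per virtual fabric; the key length and the distribution mechanism are assumptions rather than details given in the disclosure.

```python
import os
from typing import Dict, Set

def assign_fabric_keys(fabric_members: Dict[str, Set[str]]) -> Dict[str, bytes]:
    """One independent 256-bit key per virtual fabric (illustrative sketch).

    Only the access nodes that receive a given fabric's key can decrypt
    traffic exchanged on that fabric, so fabrics stay cryptographically
    isolated from one another even over the shared underlay network.
    """
    return {fabric_id: os.urandom(32) for fabric_id in fabric_members}
```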
Fig. 21 is a flow chart illustrating an example of operation of a network system in accordance with the techniques described herein. For ease of illustration, the flow chart of fig. 21 is described with respect to network system 8 of fig. 1, including servers 12, access nodes 17, and switch fabric 14 of data center 10. However, the techniques illustrated in fig. 21 are readily applicable to other example network implementations described herein.
As shown in this example, a set of access nodes 17 exchange control plane messages to establish a logical tunnel over multiple parallel data paths, the logical tunnel providing packet-based connectivity between the access nodes (510). For example, with respect to fig. 1, switch fabric 14 may include one or more tiers of switches and/or routers that provide multiple paths for forwarding communications between access nodes 17. Respective pairs of access nodes 17, possibly in response to direction from SDN controller 21, exchange the control plane messages to negotiate a logical end-to-end tunnel configured over multiple parallel paths between the access nodes.
Once the logical tunnel is established, one of the access nodes (referred to as the "source access node" in fig. 21) may receive outbound packets associated with the same packet flow from, for example, application or storage source server 12 (512). In response, the source access node sends an FCP request message to request the amount of data to be transmitted in the packet flow (514). In response to receiving the FCP request message, another one of the access nodes (referred to as the "destination access node" in fig. 21) performs a grant schedule (522) and sends an FCP grant message (524) indicating the amount of bandwidth reserved for the packet flow.
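Conceptually, the grant scheduling step (522) can be modeled as a small bookkeeping loop at the destination access node. The sketch below uses a simple FIFO policy and block-granular accounting; the policy, names, and units are assumptions rather than the scheduler actually used.

```python
from collections import deque
from typing import Deque, List, Tuple

class GrantScheduler:
    """Toy model of destination-side grant scheduling for FCP requests."""

    def __init__(self, egress_capacity_blocks: int) -> None:
        self.available = egress_capacity_blocks           # unreserved egress blocks
        self.pending: Deque[Tuple[str, int]] = deque()    # (source node, requested blocks)

    def on_request(self, source: str, requested_blocks: int) -> None:
        # Corresponds to receiving an FCP request message (514).
        self.pending.append((source, requested_blocks))

    def schedule(self) -> List[Tuple[str, int]]:
        # Corresponds to performing grant scheduling (522); each tuple returned
        # would be carried back to the source in an FCP grant message (524).
        grants = []
        while self.pending and self.available > 0:
            source, wanted = self.pending.popleft()
            granted = min(wanted, self.available)
            self.available -= granted
            grants.append((source, granted))
        return grants
```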
Upon receiving the FCP grant message from the destination access node, the source access node encapsulates the outbound packets within the payloads of the FCP packets, thereby forming each FCP packet to have a header for traversing the logical tunnel and a payload containing one or more outbound packets (516). The source access node then forwards the FCP packet by injecting the FCP packet across the parallel data paths on the switch fabric 14 (518). In some example implementations, the source access node may inject FCP packets across a subset of access nodes, e.g., forming one or more access node groups (e.g., within one or more logical chassis groups near the source access node), prior to forwarding the FCP packets across the switch fabric 14, thereby providing a first stage fan-out for distributing FCP packets over parallel data paths. In addition, as the FCP packet traverses the parallel data paths, each subset of access nodes may inject FCP packets into a subset of core switches included in the switch fabric 14, thereby providing a second level of fan-out to additional parallel data paths, thus providing higher network system scalability while still providing superior connectivity between access nodes.
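The two levels of fan-out described above amount to two independent path choices, which the following illustrative sketch makes explicit; the node names, the dictionary shape, and the use of uniform random choice are assumptions.

```python
import random
from typing import Dict, List, Tuple

def two_stage_fanout(packet: bytes, group_members: List[str],
                     core_uplinks: Dict[str, List[str]]) -> Tuple[str, str, bytes]:
    """Pick a first-level hop within the logical group, then a core switch.

    Stage 1: the source access node injects the packet toward one member of
    its own access node group (first-level fan-out).  Stage 2: that member
    injects it onto one of its core-switch uplinks (second-level fan-out).
    """
    first_hop = random.choice(group_members)                 # first-level fan-out
    core_switch = random.choice(core_uplinks[first_hop])     # second-level fan-out
    return first_hop, core_switch, packet
```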
Upon receiving the FCP packets, the destination access node extracts the outbound packets encapsulated within the FCP packets (526) and delivers the outbound packets to the destination server (528). In some examples, prior to extracting and delivering the outbound packets, the destination access node first reorders the FCP packets into the original sequence of the packet flow sent by the source server. The source access node assigns a packet sequence number to each of the FCP packets of the packet flow, enabling the destination access node to reorder the FCP packets based on the packet sequence number of each of the FCP packets.
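Destination-side reordering on the packet sequence number can be sketched as a small buffer that releases packets only when the next expected PSN has arrived; the generator form and the starting PSN of zero are assumptions for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

def reorder_by_psn(received: Iterable[Tuple[int, bytes]],
                   first_psn: int = 0) -> Iterator[bytes]:
    """Yield payloads in original order given (PSN, payload) pairs out of order."""
    buffer: Dict[int, bytes] = {}
    expected = first_psn
    for psn, payload in received:
        buffer[psn] = payload
        # Release every packet that is now in sequence.
        while expected in buffer:
            yield buffer.pop(expected)
            expected += 1
```

For example, `list(reorder_by_psn([(1, b"b"), (0, b"a"), (2, b"c")]))` yields the payloads in the order a, b, c.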
Fig. 22 is a flow chart illustrating another example of operation of a network system in accordance with the techniques described herein. For ease of illustration, the flow diagram of fig. 22 is described with respect to network system 408 of fig. 20, including packet-switched network 410, access node 417, SDN controller 421, and virtual fabric 430. However, the technique illustrated in fig. 22 is readily applicable to other example network implementations described herein.
In this example, groups of servers are interconnected by access nodes 417 and packet-switched network 410 (610). SDN controller 421 of packet-switched network 410 provides a high-level, centralized controller for configuring and managing the routing and switching infrastructure of packet-switched network 410. SDN controller 421 provides a logically, and in some cases physically, centralized controller to facilitate the operation of one or more virtual networks within packet-switched network 410. SDN controller 421 establishes virtual fabrics 430, each virtual fabric comprising a set of two or more access nodes 417 (612). Virtual fabrics 430 are established as overlay networks on top of the physical underlay network of packet-switched network 410. More specifically, in response to notification from SDN controller 421, a given set of access nodes (e.g., access nodes 417B, 417C, and 417D) exchange control plane messages to establish logical tunnels as a virtual fabric (e.g., virtual fabric 430B) between the given set of access nodes over packet-switched network 410. The access nodes may use FCP to establish the tunnels as the virtual fabric.
A first access node of virtual fabric 430B may receive a packet flow of packets from a source server coupled to the first access node and directed to a destination server coupled to a second access node of virtual fabric 430B. In response, the first one of the access nodes injects the packets of the packet flow across the parallel data paths through packet-switched network 410 to the second one of the access nodes of virtual fabric 430B (614). Upon receiving the packets, the second one of the access nodes of virtual fabric 430B delivers the packets to the destination server (616). In some examples, before delivering the packets, the second access node reorders the packets into the original sequence of the packet flow sent by the source server.
Various examples have been described. These and other examples are within the scope of the following claims.
Claims (25)
1. A network system, comprising:
a plurality of servers;
a packet-switched network comprising a centralized controller; and
a plurality of access nodes, each of the access nodes coupled to a subset of the servers and to the packet-switched network,
wherein the centralized controller is configured to establish one or more virtual fabrics, wherein each of the virtual fabrics comprises two or more of the access nodes,
wherein when communicating a packet flow of packets between a source server and a destination server coupled to the access nodes for one of the virtual fabrics, a first one of the access nodes coupled to the source server is configured to: inject the packets of the packet flow across a plurality of parallel data paths through the packet-switched network to a second one of the access nodes coupled to the destination server, and
wherein the second one of the access nodes is configured to deliver the packets to the destination server.
2. The network system of claim 1, wherein to establish the one or more virtual fabrics, the centralized controller is configured to: define a plurality of sets of two or more of the access nodes for different ones of the virtual fabrics, and inform each of the access nodes for a given set of the other access nodes included in the same given set.
3. The network system of claim 2, wherein to establish the one or more virtual fabrics, the access nodes for the given set are configured to: establish a tunnel with the other access nodes included in the given set as the virtual fabric over the packet-switched network, wherein the tunnel includes all or a subset of the plurality of parallel data paths through the packet-switched network between the access nodes for the given set.
4. The network system of claim 3, wherein the access nodes for the given set use a Fabric Control Protocol (FCP) to establish the tunnel as the virtual fabric, the tunnel including all or the subset of the plurality of parallel data paths through the packet-switched network between the access nodes for the given set.
5. The network system according to claim 4, wherein,
wherein the first one of the access nodes is configured to: transmit an FCP request message for an amount of data to be transmitted in the packet flow, and, in response to receipt of an FCP grant message indicating an amount of bandwidth reserved for the packet flow, inject FCP packets of the packet flow across the plurality of parallel data paths in accordance with the reserved bandwidth, and
wherein the second one of the access nodes is configured to: perform grant scheduling in response to receipt of the FCP request message, send the FCP grant message indicating the amount of bandwidth reserved for the packet flow, and deliver the data transmitted in the packet flow to the destination server in response to receipt of the FCP packets of the packet flow.
6. The network system of claim 1, wherein to deliver the packets to the destination server, the second one of the access nodes is configured to: reorder the packets into the original sequence of the packet flow and deliver the reordered packets to the destination server.
7. The network system of claim 1, wherein the centralized controller is configured to: reset the access nodes for one of the virtual fabrics without affecting other virtual fabrics established on the packet-switched network.
8. The network system of claim 1, wherein the centralized controller is configured to: exchange different security parameters for each of different virtual fabrics established on the packet-switched network.
9. The network system of claim 8, wherein the centralized controller is configured to: assign a different encryption key for each of the different virtual fabrics for use by the access nodes included in the virtual fabrics, such that only the access nodes for a given one of the virtual fabrics may decrypt packets exchanged on the given one of the virtual fabrics.
10. The network system of claim 1, wherein each of the virtual fabrics comprises an overlay network and the packet-switched network comprises an underlay network.
11. The network system of claim 1, wherein to inject the packets of the packet flow across the plurality of parallel data paths, the first one of the access nodes is configured to: inject the packets of the packet flow by directing each of the packets to a randomly, pseudo-randomly, or cyclically (round-robin) selected one of the parallel data paths.
12. The network system of claim 1, wherein to inject the packets of the packet flow across the plurality of parallel data paths, the first one of the access nodes is configured to: inject the packets of the packet flow by directing each of the packets to a least-loaded one of the parallel data paths, selected based on a byte count of each path.
13. The network system of claim 1, wherein to inject the packets of the packet flow across the plurality of parallel data paths, the first one of the access nodes is configured to: inject the packets of the packet flow by directing each of the packets to one of the parallel data paths selected by a weighted random selection proportional to available bandwidth in the one of the virtual fabrics.
14. The network system of claim 1, wherein the access nodes for the one of the virtual fabrics are configured to: provide full mesh connectivity over the packet-switched network between any pairwise combination of the servers coupled to the access nodes for the one of the virtual fabrics.
15. The network system of claim 1, wherein the first one of the access nodes has full mesh connectivity with a subset of the access nodes included in a logical chassis as a first-level network fan-out, and wherein the first one of the access nodes is configured to: inject the packets of the packet flow to the subset of the access nodes included in the logical chassis over the first-level network fan-out.
16. The network system of claim 15, wherein each of the access nodes has full mesh connectivity with a subset of core switches included in the packet-switched network as a second-level network fan-out, and wherein each access node of the subset of access nodes included in the logical chassis is configured to: inject the packets of the packet flow to the subset of core switches over the second-level network fan-out.
17. The network system of claim 1, wherein the packet-switched network comprises a routing and switching fabric of one or more data centers, Local Area Networks (LANs), Wide Area Networks (WANs), or a collection of one or more networks.
18. A method, comprising:
interconnecting a plurality of servers by a packet-switched network and a plurality of access nodes, each of the access nodes being coupled to a subset of the servers and to the packet-switched network;
establishing, by a centralized controller of the packet-switched network, one or more virtual fabrics, wherein each of the virtual fabrics comprises two or more of the access nodes; and
communicating a packet flow of packets between a source server and a destination server coupled to the access node for one of the virtual fabrics, comprising:
injecting, by a first one of the access nodes coupled to the source server, the packets of the packet flow across multiple parallel data paths through the packet-switched network to a second one of the access nodes coupled to the destination server, and
delivering, by the second one of the access nodes, the packets to the destination server.
19. The method of claim 18, wherein establishing the one or more virtual fabrics comprises: defining, by the centralized controller, a plurality of sets of two or more of the access nodes for different ones of the virtual fabrics, and notifying each of the access nodes for a given set of the other access nodes included in the same given set.
20. The method of claim 19, wherein establishing the one or more virtual fabrics further comprises: establishing, by the access nodes for the given set, a tunnel with the other access nodes included in the given set using a Fabric Control Protocol (FCP) as the virtual fabric over the packet-switched network, wherein the tunnel includes all or a subset of the plurality of parallel data paths through the packet-switched network between the access nodes for the given set.
21. The method of claim 20, wherein communicating the packet flow of packets between the source server and the destination server comprises:
sending, by the first one of the access nodes coupled to the source server, an FCP request message for an amount of data to be transmitted in the packet flow; and
in response to receipt of an FCP grant message indicating an amount of bandwidth reserved for the packet flow, injecting, by the first one of the access nodes, FCP packets of the packet flow across the plurality of parallel data paths in accordance with the reserved bandwidth.
22. The method of claim 20, wherein communicating the packet flow of packets between the source server and the destination server comprises:
performing, by the second one of the access nodes coupled to the destination server, a grant schedule in response to receipt of an FCP request message for an amount of data to be transmitted in the packet flow;
sending, by the second one of the access nodes, an FCP grant message indicating an amount of bandwidth reserved for the packet flow; and
delivering, by the second one of the access nodes, the data transmitted in the packet flow to the destination server in response to receiving the FCP packet of the packet flow.
23. The method of claim 18, wherein delivering the packets to the destination server comprises: reordering, by the second one of the access nodes, the packets into the original sequence of the packet flow, and delivering the reordered packets to the destination server.
24. The method of claim 18, wherein the first one of the access nodes has full mesh connectivity with a subset of the access nodes included in a logical chassis as a first-level network fan-out, the method further comprising: injecting, by the first one of the access nodes, the packets of the packet flow to the subset of the access nodes included in the logical chassis over the first-level network fan-out.
25. The method of claim 24, wherein each of the access nodes has full mesh connectivity with a subset of core switches included in the packet-switched network as a second-level network fan-out, the method further comprising: injecting, by each access node in the subset of access nodes included in the logical chassis, the packets of the packet flow to the subset of core switches over the second-level network fan-out.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762566060P | 2017-09-29 | 2017-09-29 | |
US62/566,060 | 2017-09-29 | ||
US201862638788P | 2018-03-05 | 2018-03-05 | |
US62/638,788 | 2018-03-05 | ||
PCT/US2018/053586 WO2019068010A1 (en) | 2017-09-29 | 2018-09-28 | Network access node virtual fabrics configured dynamically over an underlay network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111201757A true CN111201757A (en) | 2020-05-26 |
CN111201757B CN111201757B (en) | 2022-04-26 |
Family
ID=63963474
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880063473.2A Active CN111201757B (en) | 2017-09-29 | 2018-09-28 | Network access node virtual structure dynamically configured on underlying network |
CN201880062872.7A Pending CN111149329A (en) | 2017-09-29 | 2018-09-28 | Architecture control protocol for data center networks with packet injection via multiple backup data paths |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880062872.7A Pending CN111149329A (en) | 2017-09-29 | 2018-09-28 | Architecture control protocol for data center networks with packet injection via multiple backup data paths |
Country Status (3)
Country | Link |
---|---|
US (4) | US10904367B2 (en) |
CN (2) | CN111201757B (en) |
WO (2) | WO2019068013A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113067819A (en) * | 2021-03-18 | 2021-07-02 | 哈尔滨工业大学 | Distributed asynchronous parallel detection algorithm for multi-path attack of MPTCP (Multi-path Transmission control protocol) |
Families Citing this family (96)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10805840B2 (en) | 2008-07-03 | 2020-10-13 | Silver Peak Systems, Inc. | Data transmission via a virtual wide area network overlay |
US10164861B2 (en) | 2015-12-28 | 2018-12-25 | Silver Peak Systems, Inc. | Dynamic monitoring and visualization for network health characteristics |
US9948496B1 (en) | 2014-07-30 | 2018-04-17 | Silver Peak Systems, Inc. | Determining a transit appliance for data traffic to a software service |
US9875344B1 (en) | 2014-09-05 | 2018-01-23 | Silver Peak Systems, Inc. | Dynamic monitoring and authorization of an optimization device |
US10397315B2 (en) * | 2016-05-26 | 2019-08-27 | Fujitsu Limited | Information processing apparatus and load distribution control method |
US10432484B2 (en) | 2016-06-13 | 2019-10-01 | Silver Peak Systems, Inc. | Aggregating select network traffic statistics |
US9967056B1 (en) | 2016-08-19 | 2018-05-08 | Silver Peak Systems, Inc. | Forward packet recovery with constrained overhead |
US11044202B2 (en) | 2017-02-06 | 2021-06-22 | Silver Peak Systems, Inc. | Multi-level learning for predicting and classifying traffic flows from first packet data |
US10771394B2 (en) | 2017-02-06 | 2020-09-08 | Silver Peak Systems, Inc. | Multi-level learning for classifying traffic flows on a first packet from DNS data |
US10892978B2 (en) | 2017-02-06 | 2021-01-12 | Silver Peak Systems, Inc. | Multi-level learning for classifying traffic flows from first packet data |
CN110731070A (en) | 2017-03-29 | 2020-01-24 | 芬基波尔有限责任公司 | Non-blocking arbitrary to arbitrary data center networks with grouped injection via multiple alternate data paths |
CN110710172A (en) | 2017-03-29 | 2020-01-17 | 芬基波尔有限责任公司 | Multiplexing non-blocking arbitrary to arbitrary data center networks of packet injection within a group of access nodes |
CN110710139A (en) | 2017-03-29 | 2020-01-17 | 芬基波尔有限责任公司 | Non-blocking full mesh data center network with optical displacers |
US10565112B2 (en) | 2017-04-10 | 2020-02-18 | Fungible, Inc. | Relay consistent memory management in a multiple processor system |
US10659254B2 (en) | 2017-07-10 | 2020-05-19 | Fungible, Inc. | Access node integrated circuit for data centers which includes a networking unit, a plurality of host units, processing clusters, a data network fabric, and a control network fabric |
CN117348976A (en) | 2017-07-10 | 2024-01-05 | 微软技术许可有限责任公司 | Data processing unit for stream processing |
US11212210B2 (en) | 2017-09-21 | 2021-12-28 | Silver Peak Systems, Inc. | Selective route exporting using source type |
WO2019068017A1 (en) | 2017-09-29 | 2019-04-04 | Fungible, Inc. | Resilient network communication using selective multipath packet flow spraying |
US10904367B2 (en) | 2017-09-29 | 2021-01-26 | Fungible, Inc. | Network access node virtual fabrics configured dynamically over an underlay network |
US10841245B2 (en) | 2017-11-21 | 2020-11-17 | Fungible, Inc. | Work unit stack data structures in multiple core processor system for stream data processing |
WO2019118356A1 (en) | 2017-12-11 | 2019-06-20 | Fungible, Inc. | Durable block storage in data center access nodes with inline erasure coding |
US10540288B2 (en) | 2018-02-02 | 2020-01-21 | Fungible, Inc. | Efficient work unit processing in a multicore system |
CN112866127B (en) * | 2018-02-14 | 2022-12-30 | 华为技术有限公司 | Method and device for controlling flow in packet network |
US10637721B2 (en) | 2018-03-12 | 2020-04-28 | Silver Peak Systems, Inc. | Detecting path break conditions while minimizing network overhead |
US11038993B2 (en) | 2018-03-14 | 2021-06-15 | Fungible, Inc. | Flexible processing of network packets |
WO2019237029A1 (en) | 2018-06-08 | 2019-12-12 | Fungible, Inc. | Directed graph traversal using content-addressable memory |
WO2019237010A1 (en) | 2018-06-08 | 2019-12-12 | Fungible, Inc. | Early acknowledgment for write operations |
US10656949B2 (en) | 2018-07-13 | 2020-05-19 | Fungible, Inc. | Instruction-based non-deterministic finite state automata accelerator |
US10983721B2 (en) | 2018-07-13 | 2021-04-20 | Fungible, Inc. | Deterministic finite automata node construction and memory mapping for regular expression accelerator |
WO2020047351A1 (en) | 2018-08-31 | 2020-03-05 | Fungible, Inc. | Rapidly establishing a chain of trust in a computing system |
WO2020051254A1 (en) | 2018-09-05 | 2020-03-12 | Fungible, Inc. | Dynamically changing configuration of data processing unit when connected to storage device or computing device |
US11102129B2 (en) * | 2018-09-09 | 2021-08-24 | Mellanox Technologies, Ltd. | Adjusting rate of outgoing data requests for avoiding incast congestion |
US11252109B1 (en) * | 2018-09-21 | 2022-02-15 | Marvell Asia Pte Ltd | Out of order placement of data in network devices |
CN110943933B (en) * | 2018-09-25 | 2023-09-01 | 华为技术有限公司 | Method, device and system for realizing data transmission |
US20210092103A1 (en) * | 2018-10-02 | 2021-03-25 | Arista Networks, Inc. | In-line encryption of network data |
US10958770B2 (en) | 2018-10-15 | 2021-03-23 | Fungible, Inc. | Realization of a programmable forwarding pipeline through packet header summaries in a data processing unit |
US11070474B1 (en) * | 2018-10-22 | 2021-07-20 | Juniper Networks, Inc. | Selective load balancing for spraying over fabric paths |
US10990478B2 (en) | 2019-02-01 | 2021-04-27 | Fungible, Inc. | Flexible reliability coding for storage on a network |
US10761931B2 (en) | 2018-10-24 | 2020-09-01 | Fungible, Inc. | Inline reliability coding for storage on a network |
US10929175B2 (en) | 2018-11-21 | 2021-02-23 | Fungible, Inc. | Service chaining hardware accelerators within a data stream processing integrated circuit |
US10887091B2 (en) * | 2018-11-27 | 2021-01-05 | Bae Systems Information And Electronic Systems Integration Inc. | Multi-hop security amplification |
US10932307B2 (en) * | 2018-12-31 | 2021-02-23 | Wipro Limited | Method and device for providing wireless data communication in datacenters |
WO2020190558A1 (en) | 2019-03-15 | 2020-09-24 | Fungible, Inc. | Providing scalable and concurrent file systems |
WO2020197720A1 (en) | 2019-03-27 | 2020-10-01 | Fungible, Inc. | Low latency packet switch architecture |
US10785094B1 (en) * | 2019-04-24 | 2020-09-22 | Cisco Technology, Inc. | Repairing fallen leaves in an SDN fabric using super pods |
US11418399B2 (en) * | 2019-04-30 | 2022-08-16 | Cisco Technology, Inc. | Multi-fabric deployment and management platform |
US11240143B2 (en) | 2019-05-02 | 2022-02-01 | Fungible, Inc. | Embedded network packet data for use of alternative paths within a group of network devices |
US11153202B2 (en) * | 2019-05-13 | 2021-10-19 | 128 Technology, Inc. | Service and topology exchange protocol |
US11329912B2 (en) | 2019-05-13 | 2022-05-10 | 128 Technology, Inc. | Source-based routing |
US11005749B2 (en) | 2019-05-13 | 2021-05-11 | 128 Technology, Inc. | Multicast source and receiver access control |
US11451464B2 (en) | 2019-05-13 | 2022-09-20 | 128 Technology, Inc. | Central authority for service and topology exchange |
US11070465B2 (en) | 2019-05-13 | 2021-07-20 | 128 Technology, Inc. | Distribution of multicast information in a routing system |
US10999182B2 (en) | 2019-05-13 | 2021-05-04 | 128 Technology, Inc. | Routing using segment-based metrics |
WO2020236272A1 (en) | 2019-05-23 | 2020-11-26 | Cray Inc. | System and method for facilitating fine-grain flow control in a network interface controller (nic) |
US11575777B2 (en) | 2019-05-27 | 2023-02-07 | Massachusetts Institute Of Technology | Adaptive causal network coding with feedback |
CN112039795B (en) * | 2019-06-04 | 2022-08-26 | 华为技术有限公司 | Load sharing method, device and network equipment |
US11064020B2 (en) | 2019-06-25 | 2021-07-13 | Western Digital Technologies, Inc. | Connection load distribution in distributed object storage systems |
US11343308B2 (en) * | 2019-06-25 | 2022-05-24 | Western Digital Technologies, Inc. | Reduction of adjacent rack traffic in multi-rack distributed object storage systems |
US20200412670A1 (en) * | 2019-06-28 | 2020-12-31 | Intel Corporation | Output queueing with scalability for segmented traffic in a high-radix switch |
US10877817B1 (en) * | 2019-06-28 | 2020-12-29 | Intel Corporation | Technologies for providing inter-kernel application programming interfaces for an accelerated architecture |
US11070480B2 (en) | 2019-07-03 | 2021-07-20 | Kaloom Inc. | Method and computing devices for enforcing packet order based on packet marking |
CN110474981A (en) * | 2019-08-13 | 2019-11-19 | 中科天御(苏州)科技有限公司 | A kind of software definition dynamic security storage method and device |
US11552907B2 (en) | 2019-08-16 | 2023-01-10 | Fungible, Inc. | Efficient packet queueing for computer networks |
US11263190B2 (en) | 2019-09-26 | 2022-03-01 | Fungible, Inc. | Data ingestion and storage by data processing unit having stream-processing hardware accelerators |
US11636154B2 (en) | 2019-09-26 | 2023-04-25 | Fungible, Inc. | Data flow graph-driven analytics platform using data processing units having hardware accelerators |
US11636115B2 (en) | 2019-09-26 | 2023-04-25 | Fungible, Inc. | Query processing using data processing units having DFA/NFA hardware accelerators |
US11552898B2 (en) | 2019-09-27 | 2023-01-10 | Amazon Technologies, Inc. | Managing data throughput in a distributed endpoint network |
US11425042B2 (en) * | 2019-09-27 | 2022-08-23 | Amazon Technologies, Inc. | Managing data throughput in a distributed endpoint network |
US11579802B2 (en) | 2019-10-04 | 2023-02-14 | Fungible, Inc. | Pipeline using match-action blocks |
US20210119930A1 (en) * | 2019-10-31 | 2021-04-22 | Intel Corporation | Reliable transport architecture |
ES2972036T3 (en) * | 2019-11-06 | 2024-06-10 | Deutsche Telekom Ag | Procedure and network device for multipath communication |
US11316796B2 (en) * | 2019-12-30 | 2022-04-26 | Juniper Networks, Inc. | Spraying for unequal link connections in an internal switch fabric |
US11637773B2 (en) | 2020-02-11 | 2023-04-25 | Fungible, Inc. | Scaled-out transport as connection proxy for device-to-device communications |
US11934964B2 (en) | 2020-03-20 | 2024-03-19 | Microsoft Technology Licensing, Llc | Finite automata global counter in a data flow graph-driven analytics platform having analytics hardware accelerators |
US11296987B2 (en) * | 2020-04-20 | 2022-04-05 | Hewlett Packard Enterprise Development Lp | Congestion management mechanism |
US11630729B2 (en) | 2020-04-27 | 2023-04-18 | Fungible, Inc. | Reliability coding with reduced network traffic |
US11489762B2 (en) * | 2020-06-02 | 2022-11-01 | Cisco Technology, Inc. | Distributed sub-controller permission for control of data-traffic flow within software-defined networking (SDN) mesh network |
WO2021263047A1 (en) | 2020-06-24 | 2021-12-30 | Juniper Networks, Inc. | Layer-2 network extension over layer-3 network using encapsulation |
EP4164187A4 (en) * | 2020-07-17 | 2023-11-29 | Huawei Technologies Co., Ltd. | Load balancing method, apparatus and system |
US11290380B2 (en) | 2020-07-30 | 2022-03-29 | S.C Correct Networks S.R.L. | Method for transferring information across a data center network |
US11671360B2 (en) * | 2020-11-04 | 2023-06-06 | Cisco Technology, Inc. | Selecting forwarding paths and return paths in a networked environment |
CN112911716B (en) * | 2021-02-05 | 2023-02-17 | 贵州久华信电子技术有限公司 | Data transmission method, device, equipment and storage medium |
US11469958B1 (en) * | 2021-02-25 | 2022-10-11 | Juniper Networks, Inc. | Network controller deployment |
CN113326275B (en) * | 2021-06-09 | 2022-09-13 | 烽火通信科技股份有限公司 | Data aging method and system for router |
US11729099B2 (en) * | 2021-07-30 | 2023-08-15 | Avago Technologies International Sales Pte. Limited | Scalable E2E network architecture and components to support low latency and high throughput |
CN113726614B (en) * | 2021-10-20 | 2023-01-24 | 迈普通信技术股份有限公司 | Method and device for preventing packet loss, distributed equipment and storage medium |
US11968115B2 (en) | 2021-10-31 | 2024-04-23 | Avago Technologies International Sales Pte. Limited | Method for verifying data center network performance |
US12063163B2 (en) | 2021-12-16 | 2024-08-13 | Microsoft Technology Licensing, Llc | Sending and receiving messages including training data using a multi-path packet spraying protocol |
US11743138B1 (en) * | 2022-03-29 | 2023-08-29 | Vmware, Inc. | Discovery of physical network architecture |
CN115118448B (en) * | 2022-04-21 | 2023-09-01 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
US20230353472A1 (en) * | 2022-04-28 | 2023-11-02 | Avago Technologies International Sales Pte. Limited | Method for verifying flow completion times in data centers |
US12081460B2 (en) * | 2022-05-27 | 2024-09-03 | Corning Research & Development Corporation | Out-of-order data packet processing in a wireless communications system (WCS) |
CN115086185B (en) * | 2022-06-10 | 2024-04-02 | 清华大学深圳国际研究生院 | Data center network system and data center transmission method |
WO2023244948A1 (en) | 2022-06-14 | 2023-12-21 | Microsoft Technology Licensing, Llc | Graph-based storage management |
US12120028B1 (en) * | 2023-03-31 | 2024-10-15 | Scatr, Corp | Secure data routing with channel resiliency |
US11929908B1 (en) * | 2023-05-22 | 2024-03-12 | Uab 360 It | Accessing local network devices via mesh network devices |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7738452B1 (en) * | 2005-06-22 | 2010-06-15 | Cisco Technology, Inc. | Techniques for load balancing subscriber-aware application proxies |
US20150244617A1 (en) * | 2012-06-06 | 2015-08-27 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
CN104937892A (en) * | 2013-01-23 | 2015-09-23 | 思科技术公司 | Multi-node virtual switching system (MVSS) |
Family Cites Families (216)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4872157A (en) | 1988-03-31 | 1989-10-03 | American Telephone And Telegraph Company, At&T Bell Laboratories | Architecture and organization of a high performance metropolitan area telecommunications packet network |
US4872159A (en) | 1988-03-31 | 1989-10-03 | American Telephone And Telegraph Company At&T Bell Laboratories | Packet network architecture for providing rapid response time |
JPH06222990A (en) | 1992-10-16 | 1994-08-12 | Fujitsu Ltd | Data processor |
US5301324A (en) | 1992-11-19 | 1994-04-05 | International Business Machines Corp. | Method and apparatus for dynamic work reassignment among asymmetric, coupled processors |
US7035914B1 (en) | 1996-01-26 | 2006-04-25 | Simpleair Holdings, Inc. | System and method for transmission of data |
US5812549A (en) | 1996-06-25 | 1998-09-22 | International Business Machines Corporation | Route restrictions for deadlock free routing with increased bandwidth in a multi-stage cross point packet switch |
US6021473A (en) | 1996-08-27 | 2000-02-01 | Vlsi Technology, Inc. | Method and apparatus for maintaining coherency for data transaction of CPU and bus device utilizing selective flushing mechanism |
US6055579A (en) | 1997-11-17 | 2000-04-25 | Silicon Graphics, Inc. | Distributed control and synchronization of multiple data processors using flexible command queues |
US6314491B1 (en) | 1999-03-01 | 2001-11-06 | International Business Machines Corporation | Peer-to-peer cache moves in a multiprocessor data processing system |
US20090219879A1 (en) * | 1999-05-21 | 2009-09-03 | Wi-Lan, Inc. | Method and apparatus for bandwidth request/grant protocols in a wireless communication system |
US6782210B1 (en) | 1999-08-25 | 2004-08-24 | Nippon Telegraph And Telephone Corporation | Optical packet routing network system based on optical label switching technique |
US7289964B1 (en) | 1999-08-31 | 2007-10-30 | Accenture Llp | System and method for transaction services patterns in a netcentric environment |
US6842906B1 (en) | 1999-08-31 | 2005-01-11 | Accenture Llp | System and method for a refreshable proxy pool in a communication services patterns environment |
US7102999B1 (en) | 1999-11-24 | 2006-09-05 | Juniper Networks, Inc. | Switching device |
US6990063B1 (en) | 2000-03-07 | 2006-01-24 | Cisco Technology, Inc. | Distributing fault indications and maintaining and using a data structure indicating faults to route traffic in a packet switching system |
US6591285B1 (en) | 2000-06-16 | 2003-07-08 | Shuo-Yen Robert Li | Running-sum adder networks determined by recursive construction of multi-stage networks |
US6901500B1 (en) | 2000-07-28 | 2005-05-31 | Silicon Graphics, Inc. | Method and apparatus for prefetching information and storing the information in a stream buffer |
US20030091271A1 (en) | 2000-07-31 | 2003-05-15 | Dragone Corrado P | Non-blocking switching arrangements with minimum number of 2x2 elements |
US20020015387A1 (en) | 2000-08-02 | 2002-02-07 | Henry Houh | Voice traffic packet capture and analysis tool for a data network |
US20020049859A1 (en) | 2000-08-25 | 2002-04-25 | William Bruckert | Clustered computer system and a method of forming and controlling the clustered computer system |
US7058009B1 (en) | 2000-09-15 | 2006-06-06 | Pluris, Inc. | Router-level automatic protection switching |
US6901451B1 (en) | 2000-10-31 | 2005-05-31 | Fujitsu Limited | PCI bridge over network |
US20020075862A1 (en) | 2000-12-20 | 2002-06-20 | Mayes Mark G. | Recursion based switch fabric for aggregate tipor |
US6553030B2 (en) * | 2000-12-28 | 2003-04-22 | Maple Optical Systems Inc. | Technique for forwarding multi-cast data packets |
US6999682B2 (en) | 2000-12-29 | 2006-02-14 | Nortel Networks Limited | Technique for optically converting wavelengths in a multi-wavelength system |
US7042883B2 (en) | 2001-01-03 | 2006-05-09 | Juniper Networks, Inc. | Pipeline scheduler with fairness and minimum bandwidth guarantee |
US6831891B2 (en) | 2001-03-06 | 2004-12-14 | Pluris, Inc. | System for fabric packet control |
US8429296B2 (en) | 2001-03-06 | 2013-04-23 | Pluris, Inc. | Method and apparatus for distributing routing instructions over multiple interfaces of a data router |
US7289513B1 (en) | 2001-06-15 | 2007-10-30 | Cisco Technology, Inc. | Switching fabric port mapping in large scale redundant switches |
US7215679B2 (en) | 2001-08-30 | 2007-05-08 | Thomson Licensing | Method, apparatus and data structure enabling multiple channel data stream transmission |
US7050663B2 (en) | 2001-10-17 | 2006-05-23 | Intel Corporation | Integrated optical circuit having an integrated arrayed waveguide grating (AWG) and optical amplifier(s) |
US7352694B1 (en) | 2001-12-14 | 2008-04-01 | Applied Micro Circuits Corporation | System and method for tolerating data link faults in a packet communications switch fabric |
CA2369201A1 (en) | 2002-01-24 | 2003-07-24 | Alcatel Canada Inc. | System and method for providing maintenance of fabric links for a network element |
US7039851B2 (en) | 2002-06-08 | 2006-05-02 | Axiowave Networks, Inc. | Method of and apparatus for correcting errors in data packet flow streams as in closed ring sequential address generators and the like without data flow stream interruption |
US7486678B1 (en) | 2002-07-03 | 2009-02-03 | Greenfield Networks | Multi-slice network processor |
US20050166086A1 (en) | 2002-09-20 | 2005-07-28 | Fujitsu Limited | Storage control apparatus, storage control method, and computer product |
US6993630B1 (en) | 2002-09-26 | 2006-01-31 | Unisys Corporation | Data pre-fetch system and method for a cache memory |
US7275103B1 (en) | 2002-12-18 | 2007-09-25 | Veritas Operating Corporation | Storage path optimization for SANs |
US7660239B2 (en) | 2003-04-25 | 2010-02-09 | Alcatel-Lucent Usa Inc. | Network data re-routing |
US7633861B2 (en) | 2003-04-25 | 2009-12-15 | Alcatel-Lucent Usa Inc. | Fabric access integrated circuit configured to bound cell reorder depth |
US7334089B2 (en) | 2003-05-20 | 2008-02-19 | Newisys, Inc. | Methods and apparatus for providing cache state information |
GB2406742B (en) * | 2003-10-03 | 2006-03-22 | 3Com Corp | Switching fabrics and control protocols for them |
US7623524B2 (en) | 2003-12-22 | 2009-11-24 | Intel Corporation | Scheduling system utilizing pointer perturbation mechanism to improve efficiency |
US7664110B1 (en) | 2004-02-07 | 2010-02-16 | Habanero Holdings, Inc. | Input/output controller for coupling the processor-memory complex to the fabric in fabric-backplane interprise servers |
US7843907B1 (en) | 2004-02-13 | 2010-11-30 | Habanero Holdings, Inc. | Storage gateway target for fabric-backplane enterprise servers |
US7895431B2 (en) | 2004-09-10 | 2011-02-22 | Cavium Networks, Inc. | Packet queuing, scheduling and ordering |
US20060112226A1 (en) | 2004-11-19 | 2006-05-25 | Hady Frank T | Heterogeneous processors sharing a common cache |
US7480304B2 (en) * | 2004-12-29 | 2009-01-20 | Alcatel Lucent | Predictive congestion management in a data communications switch using traffic and system statistics |
US7320078B2 (en) | 2005-06-03 | 2008-01-15 | Cisco Technology, Inc. | Controlling delivery of power and network communications to a set of devices |
US20080270220A1 (en) | 2005-11-05 | 2008-10-30 | Jorey Ramer | Embedding a nonsponsored mobile content within a sponsored mobile content |
US20070073966A1 (en) | 2005-09-23 | 2007-03-29 | Corbin John R | Network processor-based storage controller, compute element and method of using same |
GB0519981D0 (en) | 2005-09-30 | 2005-11-09 | Ignios Ltd | Scheduling in a multicore architecture |
US7720377B2 (en) | 2006-01-23 | 2010-05-18 | Hewlett-Packard Development Company, L.P. | Compute clusters employing photonic interconnections for transmitting optical signals between compute cluster nodes |
US20070174429A1 (en) | 2006-01-24 | 2007-07-26 | Citrix Systems, Inc. | Methods and servers for establishing a connection between a client system and a virtual machine hosting a requested computing environment |
US7404041B2 (en) | 2006-02-10 | 2008-07-22 | International Business Machines Corporation | Low complexity speculative multithreading system based on unmodified microprocessor core |
US7733781B2 (en) | 2006-04-24 | 2010-06-08 | Broadcom Corporation | Distributed congestion avoidance in a network switching system |
US7941610B2 (en) | 2006-04-27 | 2011-05-10 | Hewlett-Packard Development Company, L.P. | Coherency directory updating in a multiprocessor computing system |
US20080002702A1 (en) | 2006-06-30 | 2008-01-03 | Symbol Technologies, Inc. | Systems and methods for processing data packets using a multi-core abstraction layer (MCAL) |
US8050257B2 (en) | 2006-12-12 | 2011-11-01 | Maged E Beshai | Network with a fast-switching optical core |
US7937532B2 (en) | 2007-03-30 | 2011-05-03 | Intel Corporation | Method and apparatus for speculative prefetching in a multi-processor/multi-core message-passing machine |
US7743232B2 (en) | 2007-07-18 | 2010-06-22 | Advanced Micro Devices, Inc. | Multiple-core processor with hierarchical microcode store |
US8200992B2 (en) | 2007-09-24 | 2012-06-12 | Cognitive Electronics, Inc. | Parallel processing computer systems with reduced power consumption and methods for providing the same |
FI20085217A0 (en) | 2008-03-07 | 2008-03-07 | Nokia Corp | Data Processing device |
US8001283B2 (en) | 2008-03-12 | 2011-08-16 | Mips Technologies, Inc. | Efficient, scalable and high performance mechanism for handling IO requests |
US7822731B1 (en) | 2008-03-28 | 2010-10-26 | Emc Corporation | Techniques for management of information regarding a sequential stream |
GB2459838B (en) * | 2008-05-01 | 2010-10-06 | Gnodal Ltd | An ethernet bridge and a method of data delivery across a network |
US8094560B2 (en) | 2008-05-19 | 2012-01-10 | Cisco Technology, Inc. | Multi-stage multi-core processing of network packets |
US8160063B2 (en) | 2008-06-09 | 2012-04-17 | Microsoft Corporation | Data center interconnect and traffic engineering |
US8619769B2 (en) | 2008-06-12 | 2013-12-31 | Mark Henrik Sandstrom | Packet-layer transparent packet-switching network |
US8265071B2 (en) | 2008-09-11 | 2012-09-11 | Juniper Networks, Inc. | Methods and apparatus related to a flexible data center security architecture |
US8340088B2 (en) * | 2008-09-11 | 2012-12-25 | Juniper Networks, Inc. | Methods and apparatus related to a low cost data center architecture |
US8918631B1 (en) | 2009-03-31 | 2014-12-23 | Juniper Networks, Inc. | Methods and apparatus for dynamic automated configuration within a control plane of a switch fabric |
US9218290B2 (en) | 2009-04-27 | 2015-12-22 | Intel Corporation | Data caching in a network communications processor architecture |
US9183145B2 (en) | 2009-04-27 | 2015-11-10 | Intel Corporation | Data caching in a network communications processor architecture |
US9444757B2 (en) | 2009-04-27 | 2016-09-13 | Intel Corporation | Dynamic configuration of processing modules in a network communications processor architecture |
KR20100133649A (en) | 2009-06-12 | 2010-12-22 | 삼성전자주식회사 | Multi processor system having data loss protection function at power-off time in memory link architecture |
US9742862B2 (en) | 2009-07-14 | 2017-08-22 | Saguna Networks Ltd. | Methods circuits devices systems and associated machine executable code for efficient delivery of multi-unicast communication traffic |
US8745618B2 (en) | 2009-08-25 | 2014-06-03 | International Business Machines Corporation | Cache partitioning with a partition table to effect allocation of ways and rows of the cache to virtual machine in virtualized environments |
US8625427B1 (en) | 2009-09-03 | 2014-01-07 | Brocade Communications Systems, Inc. | Multi-path switching with edge-to-edge flow control |
US9639479B2 (en) | 2009-09-23 | 2017-05-02 | Nvidia Corporation | Instructions for managing a parallel cache hierarchy |
US9876735B2 (en) | 2009-10-30 | 2018-01-23 | Iii Holdings 2, Llc | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect |
US20110103391A1 (en) | 2009-10-30 | 2011-05-05 | Smooth-Stone, Inc. C/O Barry Evans | System and method for high-performance, low-power data center interconnect fabric |
US8599863B2 (en) | 2009-10-30 | 2013-12-03 | Calxeda, Inc. | System and method for using a multi-protocol fabric module across a distributed server interconnect fabric |
US9800495B2 (en) * | 2009-09-30 | 2017-10-24 | Infinera Corporation | Fast protection path activation using control plane messages |
TWI385523B (en) | 2009-11-06 | 2013-02-11 | Phison Electronics Corp | Data backup method for a flash memory and controller and storage system using the same |
US8838906B2 (en) | 2010-01-08 | 2014-09-16 | International Business Machines Corporation | Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution |
US8582440B2 (en) * | 2010-02-05 | 2013-11-12 | Juniper Networks, Inc. | Oversubscribed packet stream-based interconnect protocol |
WO2011099320A1 (en) | 2010-02-12 | 2011-08-18 | 株式会社日立製作所 | Information processing device, and method of processing information upon information processing device |
JP5506444B2 (en) | 2010-02-18 | 2014-05-28 | 株式会社日立製作所 | Information system, apparatus and method |
US8863144B2 (en) | 2010-03-15 | 2014-10-14 | International Business Machines Corporation | Method and apparatus for determining resources consumed by tasks |
US8358658B2 (en) | 2010-03-19 | 2013-01-22 | International Business Machines Corporation | Implementing ordered and reliable transfer of packets while spraying packets over multiple links |
US8694654B1 (en) * | 2010-03-23 | 2014-04-08 | Juniper Networks, Inc. | Host side protocols for use with distributed control plane of a switch |
US8645631B2 (en) | 2010-03-29 | 2014-02-04 | Via Technologies, Inc. | Combined L2 cache and L1D cache prefetcher |
US8848728B1 (en) | 2010-04-08 | 2014-09-30 | Marvell Israel (M.I.S.L) Ltd. | Dynamic load balancing switch architecture |
WO2012003486A1 (en) | 2010-07-01 | 2012-01-05 | Neodana, Inc. | A system and method for virtualization and cloud security |
US8473689B2 (en) | 2010-07-27 | 2013-06-25 | Texas Instruments Incorporated | Predictive sequential prefetching for data caching |
US8484287B2 (en) | 2010-08-05 | 2013-07-09 | Citrix Systems, Inc. | Systems and methods for cookie proxy jar management across cores in a multi-core system |
WO2012052773A1 (en) | 2010-10-21 | 2012-04-26 | Bluwireless Technology Limited | Data processing systems |
US8798077B2 (en) | 2010-12-29 | 2014-08-05 | Juniper Networks, Inc. | Methods and apparatus for standard protocol validation mechanisms deployed over a switch fabric system |
WO2012120769A1 (en) | 2011-03-09 | 2012-09-13 | パナソニック株式会社 | Relay device, method for controlling relay device, and program |
EP2501119B1 (en) | 2011-03-15 | 2013-08-07 | Alcatel Lucent | A gateway for the survivability of an enterprise network using sip |
CN102123052B (en) | 2011-03-30 | 2013-05-29 | 北京星网锐捷网络技术有限公司 | Method and system for estimating service system availability |
US9405550B2 (en) | 2011-03-31 | 2016-08-02 | International Business Machines Corporation | Methods for the transmission of accelerator commands and corresponding command structure to remote hardware accelerator engines over an interconnect link |
US9225628B2 (en) | 2011-05-24 | 2015-12-29 | Mellanox Technologies Ltd. | Topology-based consolidation of link state information |
US9042383B2 (en) | 2011-06-30 | 2015-05-26 | Broadcom Corporation | Universal network interface controller |
US20130024875A1 (en) | 2011-07-22 | 2013-01-24 | Yilin Wang | Event System And Methods For Using Same |
CN103858386B (en) | 2011-08-02 | 2017-08-25 | 凯为公司 | For performing the method and apparatus for wrapping classification by the decision tree of optimization |
CN103748841B (en) | 2011-08-18 | 2017-03-15 | 瑞典爱立信有限公司 | Method and central control entity for the data surface current of control data packet stream |
US9042229B2 (en) | 2011-10-06 | 2015-05-26 | International Business Machines Corporation | Partitioning a network switch into multiple switching domains |
US8560757B2 (en) | 2011-10-25 | 2013-10-15 | Cavium, Inc. | System and method to reduce memory access latencies using selective replication across multiple memory ports |
US8850125B2 (en) | 2011-10-25 | 2014-09-30 | Cavium, Inc. | System and method to provide non-coherent access to a coherent memory system |
WO2013063486A1 (en) | 2011-10-28 | 2013-05-02 | The Regents Of The University Of California | Multiple-core computer processor for reverse time migration |
US8689049B2 (en) | 2011-11-03 | 2014-04-01 | Hewlett-Packard Development Company, L.P. | Corrective actions based on probabilities |
CN102447712B (en) | 2012-01-20 | 2015-07-08 | 华为技术有限公司 | Method and system for interconnecting nodes in content delivery network (CDN) as well as nodes |
US9430391B2 (en) | 2012-03-29 | 2016-08-30 | Advanced Micro Devices, Inc. | Managing coherent memory between an accelerated processing device and a central processing unit |
WO2013164044A1 (en) | 2012-05-04 | 2013-11-07 | Deutsche Telekom Ag | Method and device for constructing and operating a modular, highly scalable, very simple, cost-efficient and sustainable transparent optically-routed network for network capacities of greater than 1 petabit(s) |
US9118573B2 (en) | 2012-05-31 | 2015-08-25 | International Business Machines Corporation | Multipath effectuation within singly contiguous network fabric via switching device routing logic programming |
US8750288B2 (en) | 2012-06-06 | 2014-06-10 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
US9485048B2 (en) | 2012-06-08 | 2016-11-01 | The Royal Institution For The Advancement Of Learning/Mcgill University | Methods and devices for space-time multi-plane optical networks |
US9100216B2 (en) | 2012-07-23 | 2015-08-04 | Cisco Technology, Inc. | System and method for scaling IPv6 on a three-tier network architecture at a large data center |
US8982884B2 (en) | 2012-08-07 | 2015-03-17 | Broadcom Corporation | Serial replication of multicast packets |
US8978031B2 (en) | 2012-08-21 | 2015-03-10 | International Business Machines Corporation | Processing of overlay networks using an accelerated network interface card |
CN104641609B (en) | 2012-09-10 | 2018-03-09 | 马维尔国际贸易有限公司 | Method and apparatus for transmitting packet between the interface control module of Line cards |
US8971321B2 (en) | 2012-12-11 | 2015-03-03 | Futurewei Technologies, Inc. | System and method for accelerating and decelerating packets |
CN103067291B (en) | 2012-12-24 | 2016-08-17 | Hangzhou H3C Technologies Co., Ltd. | Method and apparatus for associating uplinks and downlinks |
US10137376B2 (en) | 2012-12-31 | 2018-11-27 | Activision Publishing, Inc. | System and method for creating and streaming augmented game sessions |
US9755900B2 (en) | 2013-03-11 | 2017-09-05 | Amazon Technologies, Inc. | Managing configuration updates |
US10015111B2 (en) | 2013-03-15 | 2018-07-03 | Huawei Technologies Co., Ltd. | System and method for steering packet streams |
US9118984B2 (en) | 2013-03-15 | 2015-08-25 | International Business Machines Corporation | Control plane for integrated switch wavelength division multiplexing |
CN105706068B (en) | 2013-04-30 | 2019-08-23 | Hewlett Packard Enterprise Development LP | Storage network for routing memory traffic and I/O traffic |
US20150019702A1 (en) | 2013-07-10 | 2015-01-15 | Brocade Communications Systems, Inc. | Flexible flow offload |
EP3022858A4 (en) | 2013-07-16 | 2017-03-15 | ADC Telecommunications Inc. | Distributed wave division multiplexing systems |
US9124959B2 (en) | 2013-08-05 | 2015-09-01 | Telefonaktiebolaget L M Ericsson (Publ) | High connectivity multiple dimension optical network in glass |
US9369409B2 (en) | 2013-08-12 | 2016-06-14 | Nec Corporation | End-to-end hitless protection in packet switched networks |
US20150143073A1 (en) | 2013-10-23 | 2015-05-21 | Bluwireless Technology Limited | Data processing systems |
EP3605971B1 (en) * | 2013-11-05 | 2021-10-13 | Cisco Technology, Inc. | Network fabric overlay |
KR20150057798A (en) | 2013-11-20 | 2015-05-28 | 한국전자통신연구원 | Apparatus and method for controlling a cache |
US9300528B2 (en) | 2013-12-13 | 2016-03-29 | International Business Machines Corporation | Trill network with multipath redundancy |
US9225458B2 (en) | 2013-12-20 | 2015-12-29 | Alcatel Lucent | Wavelength-selective cross-connect device having a variable number of common ports |
WO2015095996A1 (en) | 2013-12-23 | 2015-07-02 | Telefonaktiebolaget L M Ericsson (Publ) | Technique for network service availability |
US9264308B2 (en) | 2013-12-27 | 2016-02-16 | Dell Products L.P. | N-node virtual link trunking (VLT) systems data plane |
US9626316B2 (en) | 2013-12-27 | 2017-04-18 | Intel Corporation | Managing shared resources between multiple processing devices |
US9369408B1 (en) | 2014-01-31 | 2016-06-14 | Google Inc. | High performance and resilience in wide area networking |
US9628382B2 (en) | 2014-02-05 | 2017-04-18 | Intel Corporation | Reliable transport of ethernet packet data with wire-speed and packet data rate match |
US9565114B1 (en) | 2014-03-08 | 2017-02-07 | Google Inc. | Weighted load balancing using scaled parallel hashing |
US9436972B2 (en) | 2014-03-27 | 2016-09-06 | Intel Corporation | System coherency in a distributed graphics processor hierarchy |
US9294304B2 (en) | 2014-03-31 | 2016-03-22 | Juniper Networks, Inc. | Host network accelerator for data center overlay network |
US9479457B2 (en) | 2014-03-31 | 2016-10-25 | Juniper Networks, Inc. | High-performance, scalable and drop-free data center switch fabric |
US9703743B2 (en) | 2014-03-31 | 2017-07-11 | Juniper Networks, Inc. | PCIe-based host network accelerators (HNAS) for data center overlay network |
CN105024844B (en) | 2014-04-30 | 2019-01-01 | China Telecom Corporation Limited | Method, server, and system for computing cross-domain routes |
US10225195B2 (en) * | 2014-05-07 | 2019-03-05 | Adtran, Inc. | Telecommunication systems and methods using dynamic shaping for allocating network bandwidth |
CN106415522B (en) | 2014-05-08 | 2020-07-21 | Micron Technology, Inc. | Lightweight coherency within memory |
US9672043B2 (en) | 2014-05-12 | 2017-06-06 | International Business Machines Corporation | Processing of multiple instruction streams in a parallel slice processor |
US11418629B2 (en) | 2014-05-19 | 2022-08-16 | Bay Microsystems, Inc. | Methods and systems for accessing remote digital data over a wide area network (WAN) |
CN104244118B (en) | 2014-08-20 | 2018-05-08 | Shanghai Jiao Tong University | Construction method for modular interconnection networks based on arrayed waveguide gratings |
US9760406B2 (en) | 2014-09-02 | 2017-09-12 | Ab Initio Technology Llc | Controlling data processing tasks |
WO2016037262A1 (en) | 2014-09-09 | 2016-03-17 | Viscore Technologies Inc. | Low latency optically distributed dynamic optical interconnection networks |
US9282384B1 (en) | 2014-10-07 | 2016-03-08 | Huawei Technologies Co., Ltd. | System and method for commutation in photonic switching |
US9632936B1 (en) | 2014-12-09 | 2017-04-25 | Parallel Machines Ltd. | Two-tier distributed memory |
US10387179B1 (en) | 2014-12-16 | 2019-08-20 | Amazon Technologies, Inc. | Environment aware scheduling |
US10218629B1 (en) * | 2014-12-23 | 2019-02-26 | Juniper Networks, Inc. | Moving packet flows between network paths |
US10003552B2 (en) | 2015-01-05 | 2018-06-19 | Brocade Communications Systems, Llc. | Distributed bidirectional forwarding detection protocol (D-BFD) for cluster of interconnected switches |
US20160210159A1 (en) | 2015-01-19 | 2016-07-21 | Microsoft Technology Licensing, LLC | User Mode Driver Extension and Preprocessing |
WO2016123042A1 (en) | 2015-01-26 | 2016-08-04 | Dragonfly Data Factory Llc | Data factory platform and operating system |
US9866427B2 (en) | 2015-02-16 | 2018-01-09 | Juniper Networks, Inc. | Multi-stage switch fabric fault detection and handling |
US9860614B2 (en) | 2015-05-13 | 2018-01-02 | Huawei Technologies Co., Ltd. | System and method for hybrid photonic electronic switching |
GB2539641B (en) | 2015-06-11 | 2019-04-03 | Advanced Risc Mach Ltd | Coherency between a data processing device and interconnect |
US9847936B2 (en) | 2015-06-25 | 2017-12-19 | Intel Corporation | Apparatus and method for hardware-accelerated packet processing |
US10572509B2 (en) * | 2015-07-27 | 2020-02-25 | Cisco Technology, Inc. | Scalable spine nodes with partial replication of routing information in a network environment |
US10223162B2 (en) | 2015-07-27 | 2019-03-05 | Advanced Micro Devices, Inc. | Mechanism for resource utilization metering in a computer system |
US10445850B2 (en) | 2015-08-26 | 2019-10-15 | Intel Corporation | Technologies for offloading network packet processing to a GPU |
US10198281B2 (en) | 2015-08-28 | 2019-02-05 | Vmware, Inc. | Hybrid infrastructure provisioning framework tethering remote datacenters |
US9912614B2 (en) * | 2015-12-07 | 2018-03-06 | Brocade Communications Systems LLC | Interconnection of switches based on hierarchical overlay tunneling |
US9946671B1 (en) | 2015-12-15 | 2018-04-17 | Cavium, Inc. | Methods and systems for processing read and write requests |
US10498654B2 (en) | 2015-12-28 | 2019-12-03 | Amazon Technologies, Inc. | Multi-path transport design |
US9906460B2 (en) | 2015-12-31 | 2018-02-27 | Alcatel-Lucent Usa Inc. | Data plane for processing function scalability |
WO2017119098A1 (en) | 2016-01-07 | 2017-07-13 | Hitachi, Ltd. | Computer system and method for controlling computer |
US10492104B2 (en) | 2016-03-10 | 2019-11-26 | Cable Television Laboratories, Inc. | Latency reduction in wireless service |
US10891145B2 (en) | 2016-03-30 | 2021-01-12 | Amazon Technologies, Inc. | Processing pre-existing data sets at an on demand code execution environment |
US10552205B2 (en) | 2016-04-02 | 2020-02-04 | Intel Corporation | Work conserving, load balancing, and scheduling |
US10305821B2 (en) | 2016-05-24 | 2019-05-28 | Avago Technologies International Sales Pte. Limited | Facilitating hot-swappable switch fabric cards |
US10034407B2 (en) | 2016-07-22 | 2018-07-24 | Intel Corporation | Storage sled for a data center |
US20180150256A1 (en) | 2016-11-29 | 2018-05-31 | Intel Corporation | Technologies for data deduplication in disaggregated architectures |
US11119923B2 (en) | 2017-02-23 | 2021-09-14 | Advanced Micro Devices, Inc. | Locality-aware and sharing-aware cache coherence for collections of processors |
CN110731070A (en) | 2017-03-29 | 2020-01-24 | Fungible, Inc. | Non-blocking any-to-any data center network with packet spraying over multiple alternate data paths |
CN110710172A (en) | 2017-03-29 | 2020-01-17 | Fungible, Inc. | Non-blocking any-to-any data center network with multiplexed packet spraying within access node groups |
CN110710139A (en) | 2017-03-29 | 2020-01-17 | Fungible, Inc. | Non-blocking full mesh data center network having optical permutors |
US10565112B2 (en) | 2017-04-10 | 2020-02-18 | Fungible, Inc. | Relay consistent memory management in a multiple processor system |
US10303594B2 (en) | 2017-04-17 | 2019-05-28 | Intel Corporation | Guaranteed forward progress mechanism |
US10409614B2 (en) | 2017-04-24 | 2019-09-10 | Intel Corporation | Instructions having support for floating point and integer data types in the same register |
US10304154B2 (en) | 2017-04-24 | 2019-05-28 | Intel Corporation | Coordination and increased utilization of graphics processors during inference |
US20180322386A1 (en) | 2017-05-05 | 2018-11-08 | Intel Corporation | Fine-grain compute communication execution for deep learning frameworks |
US10303603B2 (en) | 2017-06-13 | 2019-05-28 | Microsoft Technology Licensing, Llc | Low power multi-core coherency |
CN107196854B (en) | 2017-06-20 | 2020-08-25 | Xi'an Jiaotong University | Data plane exception handling method in a software-defined network |
US11238203B2 (en) | 2017-06-30 | 2022-02-01 | Intel Corporation | Systems and methods for accessing storage-as-memory |
CN117348976A (en) | 2017-07-10 | 2024-01-05 | Microsoft Technology Licensing, LLC | Data processing unit for stream processing |
US10659254B2 (en) | 2017-07-10 | 2020-05-19 | Fungible, Inc. | Access node integrated circuit for data centers which includes a networking unit, a plurality of host units, processing clusters, a data network fabric, and a control network fabric |
US11030126B2 (en) | 2017-07-14 | 2021-06-08 | Intel Corporation | Techniques for managing access to hardware accelerator memory |
US11194753B2 (en) | 2017-09-01 | 2021-12-07 | Intel Corporation | Platform interface layer and protocol for accelerators |
US11249779B2 (en) | 2017-09-01 | 2022-02-15 | Intel Corporation | Accelerator interconnect assignments for virtual environments |
US10303609B2 (en) | 2017-09-28 | 2019-05-28 | Intel Corporation | Independent tuning of multiple hardware prefetchers |
WO2019068017A1 (en) | 2017-09-29 | 2019-04-04 | Fungible, Inc. | Resilient network communication using selective multipath packet flow spraying |
US11263143B2 (en) | 2017-09-29 | 2022-03-01 | Intel Corporation | Coherent accelerator fabric controller |
US10904367B2 (en) | 2017-09-29 | 2021-01-26 | Fungible, Inc. | Network access node virtual fabrics configured dynamically over an underlay network |
US20200169513A1 (en) | 2017-09-29 | 2020-05-28 | Fungible, Inc. | Fabric control protocol for data center networks with packet spraying over multiple alternate data paths |
US10841245B2 (en) | 2017-11-21 | 2020-11-17 | Fungible, Inc. | Work unit stack data structures in multiple core processor system for stream data processing |
WO2019118356A1 (en) | 2017-12-11 | 2019-06-20 | Fungible, Inc. | Durable block storage in data center access nodes with inline erasure coding |
US10540288B2 (en) | 2018-02-02 | 2020-01-21 | Fungible, Inc. | Efficient work unit processing in a multicore system |
US10673737B2 (en) * | 2018-04-17 | 2020-06-02 | Cisco Technology, Inc. | Multi-VRF universal device internet protocol address for fabric edge devices |
US10645187B2 (en) | 2018-07-13 | 2020-05-05 | Fungible, Inc. | ARC caching for deterministic finite automata of regular expression accelerator |
US10951393B2 (en) | 2018-10-11 | 2021-03-16 | Fungible, Inc. | Multimode cryptographic processor |
US10761931B2 (en) | 2018-10-24 | 2020-09-01 | Fungible, Inc. | Inline reliability coding for storage on a network |
US10827191B2 (en) | 2018-11-02 | 2020-11-03 | Fungible, Inc. | Parallel coding of syntax elements for JPEG accelerator |
US20200159859A1 (en) | 2018-11-19 | 2020-05-21 | Fungible, Inc. | History-based compression pipeline for data compression accelerator of a data processing unit |
US10929175B2 (en) | 2018-11-21 | 2021-02-23 | Fungible, Inc. | Service chaining hardware accelerators within a data stream processing integrated circuit |
US11636154B2 (en) | 2019-09-26 | 2023-04-25 | Fungible, Inc. | Data flow graph-driven analytics platform using data processing units having hardware accelerators |
2018
- 2018-09-28 US US16/147,099 patent/US10904367B2/en active Active
- 2018-09-28 WO PCT/US2018/053591 patent/WO2019068013A1/en active Application Filing
- 2018-09-28 CN CN201880063473.2A patent/CN111201757B/en active Active
- 2018-09-28 US US16/147,070 patent/US11178262B2/en active Active
- 2018-09-28 WO PCT/US2018/053586 patent/WO2019068010A1/en active Application Filing
- 2018-09-28 CN CN201880062872.7A patent/CN111149329A/en active Pending
2021
- 2021-01-21 US US17/248,354 patent/US11412076B2/en active Active
- 2021-11-12 US US17/454,731 patent/US20220103661A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7738452B1 (en) * | 2005-06-22 | 2010-06-15 | Cisco Technology, Inc. | Techniques for load balancing subscriber-aware application proxies |
US20150244617A1 (en) * | 2012-06-06 | 2015-08-27 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
CN104937892A (en) * | 2013-01-23 | 2015-09-23 | Cisco Technology, Inc. | Multi-node virtual switching system (MVSS) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113067819A (en) * | 2021-03-18 | 2021-07-02 | Harbin Institute of Technology | Distributed asynchronous parallel detection algorithm for multipath attacks in MPTCP (Multipath Transmission Control Protocol) |
Also Published As
Publication number | Publication date |
---|---|
US20190104207A1 (en) | 2019-04-04 |
WO2019068010A1 (en) | 2019-04-04 |
WO2019068013A1 (en) | 2019-04-04 |
CN111201757B (en) | 2022-04-26 |
US20210176347A1 (en) | 2021-06-10 |
US11178262B2 (en) | 2021-11-16 |
US20190104206A1 (en) | 2019-04-04 |
CN111149329A (en) | 2020-05-12 |
US11412076B2 (en) | 2022-08-09 |
US10904367B2 (en) | 2021-01-26 |
US20220103661A1 (en) | 2022-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111201757B (en) | | Network access node virtual structure dynamically configured on underlying network |
US20240250919A1 (en) | | Fabric control protocol for data center networks with packet spraying over multiple alternate data paths |
US20210320820A1 (en) | | Fabric control protocol for large-scale multi-stage data center networks |
US11601359B2 (en) | | Resilient network communication using selective multipath packet flow spraying |
US11757764B2 (en) | | Optimized adaptive routing to reduce number of hops |
US11165887B2 (en) | | Per-input port, per-control plane network data traffic class control plane policing |
US20210297350A1 (en) | | Reliable fabric control protocol extensions for data center networks with unsolicited packet spraying over multiple alternate data paths |
Handley et al. | | Re-architecting datacenter networks and stacks for low latency and high performance |
US20210297351A1 (en) | | Fabric control protocol with congestion control for data center networks |
Wang et al. | | Rethinking the data center networking: Architecture, network protocols, and resource sharing |
Kumar et al. | | Beyond best effort: Router architectures for the differentiated services of tomorrow's internet |
US8625427B1 (en) | | Multi-path switching with edge-to-edge flow control |
EP3563535B1 (en) | | Transmission of messages by acceleration components configured to accelerate a service |
US8630296B2 (en) | | Shared and separate network stack instances |
Hoefler et al. | | Data center ethernet and remote direct memory access: Issues at hyperscale |
Chen et al. | | Promenade: Proportionally fair multipath rate control in datacenter networks with random network coding |
Hoefler et al. | | Datacenter ethernet and RDMA: Issues at hyperscale |
Hu et al. | | Aeolus: A building block for proactive transport in datacenter networks |
US20210297343A1 (en) | | Reliable fabric control protocol extensions for data center networks with failure resilience |
Kheirkhah Sabetghadam | | MMPTCP: a novel transport protocol for data centre networks |
McAlpine et al. | | An architecture for congestion management in ethernet clusters |
Chen et al. | | Alleviating flow interference in data center networks through fine-grained switch queue management |
Chen et al. | | CQRD: A switch-based approach to flow interference in Data Center Networks |
CN118631737A (en) | | Congestion management method, network equipment and data center |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20230921; Address after: Washington State; Patentee after: MICROSOFT TECHNOLOGY LICENSING, LLC; Address before: California, USA; Patentee before: FUNGIBLE Inc. |