CN103916326B - System, method and equipment for data center - Google Patents

Publication number
CN103916326B
Authority
CN
China
Prior art keywords
module
queue
peripheral processor
packet
switch core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410138824.5A
Other languages
Chinese (zh)
Other versions
CN103916326A (en
Inventor
P·辛德胡
G·艾贝
J-M·弗爱龙
A·文卡特马尼
Q·沃赫拉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peribit Networks Inc
Original Assignee
Peribit Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/242,224 (US8154996B2)
Priority claimed from US12/343,728 (US8325749B2)
Priority claimed from US12/345,500 (US8804710B2)
Priority claimed from US12/345,502 (US8804711B2)
Priority claimed from US12/495,361 (US8755396B2)
Priority claimed from US12/495,364 (US9847953B2)
Priority claimed from US12/495,337 (US8730954B2)
Priority claimed from US12/495,344 (US20100061367A1)
Priority claimed from US12/495,358 (US8335213B2)
Application filed by Peribit Networks Inc
Publication of CN103916326A
Publication of CN103916326B
Application granted
Legal status: Active
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system, method and equipment for a data center. In an embodiment of the disclosure, an apparatus includes a first edge device that can have a packet processing module. The first edge device is configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device has a packet processing module configured to reassemble the packet from the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity. The multi-stage switch fabric can have multiple switch modules. Each of the switch modules has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells so that the multiple cells are sent to the second edge device.

Description

System, method and equipment for data center
This application is a divisional application of Chinese Patent Application No. 200910246898.X, filed September 11, 2009 and entitled "System, method and equipment for data center".
Cross-reference to related applications
This patent application claims priority to and the benefit of U.S. Patent Application No. 61/098516, entitled "Systems, Apparatus and Methods for a Data Center" and filed September 19, 2008, and U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control within a Data Center" and filed September 11, 2008; both are incorporated herein by reference in their entirety.
This patent application is a continuation-in-part of U.S. Patent Application No. 12/343728, entitled "Methods and Apparatus for Transmission of Groups of Cells via a Switch Fabric" and filed December 24, 2008; a continuation-in-part of U.S. Patent Application No. 12/345500, entitled "System Architecture for a Scalable and Distributed Multi-Stage Switch Fabric" and filed December 29, 2008; a continuation-in-part of U.S. Patent Application No. 12/345502, entitled "Methods and Apparatus Related to a Modular Switch Architecture" and filed December 29, 2008; a continuation-in-part of U.S. Patent Application No. 12/242224, entitled "Methods and Apparatus for Flow Control Associated with Multi-Stage Queues" and filed September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control within a Data Center" and filed September 11, 2008; and a continuation-in-part of U.S. Patent Application No. 12/242230, entitled "Methods and Apparatus for Flow-Controllable Multi-Staged Queues" and filed September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control within a Data Center" and filed September 11, 2008. Each of the above-referenced applications is incorporated herein by reference in its entirety.
This patent application is also a continuation-in-part of U.S. Patent Application No. 12/495337, entitled "Methods and Apparatus Related to Any-to-Any Connectivity within a Data Center" and filed June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495344, entitled "Methods and Apparatus Related to Lossless Operation within a Data Center" and filed June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495358, entitled "Methods and Apparatus Related to Low Latency within a Data Center" and filed June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495361, entitled "Methods and Apparatus Related to Flow Control within a Data Center Switch Fabric" and filed June 30, 2009; and a continuation-in-part of U.S. Patent Application No. 12/495364, entitled "Methods and Apparatus Related to Virtualization of Data Center Resources" and filed June 30, 2009. Each of the above-referenced applications is incorporated herein by reference in its entirety.
Technical field
Embodiments relate generally to data center equipment and, more particularly, to architectures, apparatus, and methods for data center systems having a switch core and edge devices.
Background
Known architectures for data center systems involve overly cumbersome and complex approaches that increase the cost and latency of such systems. For example, some known data center networks consist of three or more layers of switches, where Ethernet and/or Internet Protocol (IP) packet processing is performed at each layer. Packet processing and queuing overhead is unnecessarily repeated at each layer, which directly increases cost and end-to-end latency. Similarly, such known data center networks do not typically scale in a cost-effective manner: for a given data center system, an increase in the number of servers often requires additional ports, so that more devices must be added at each layer of the data center system. Such poor scalability increases the cost of such data center systems.
Accordingly, a need exists for improved data center systems, including improved architectures, apparatus, and methods.
Summary of the invention
In one embodiment, an apparatus includes a first edge device that can have a packet processing module. The first edge device can be configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device can have a packet processing module configured to reassemble the packet based on the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity. The multi-stage switch fabric can have multiple switch modules. Each of the switch modules has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells so that the multiple cells are sent to the second edge device.
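The segmentation of a packet into cells at the first edge device and its reassembly at the second edge device can be illustrated with a minimal sketch. The cell payload size, the `Cell` fields, and the function names `segment` and `reassemble` are assumptions made for illustration only; the text above does not specify a cell format.

```python
from dataclasses import dataclass

# Assumed payload size per cell; the disclosure does not state one.
CELL_PAYLOAD_BYTES = 64


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell layout: enough metadata to rebuild the packet."""
    packet_id: int
    index: int      # position of this cell within the packet
    total: int      # number of cells produced for the packet
    payload: bytes


def segment(packet_id: int, packet: bytes, size: int = CELL_PAYLOAD_BYTES) -> list[Cell]:
    """Ingress edge device: produce multiple cells based on one packet."""
    chunks = [packet[i:i + size] for i in range(0, len(packet), size)] or [b""]
    return [Cell(packet_id, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(cells: list[Cell]) -> bytes:
    """Egress edge device: rebuild the packet, tolerating out-of-order arrival."""
    ordered = sorted(cells, key=lambda c: c.index)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("cannot reassemble: cells are missing")
    return b"".join(c.payload for c in ordered)
```

In this sketch the switch fabric would only ever see `Cell` objects, which is one way to read the statement that packet processing is performed at the edge devices rather than repeated inside the fabric.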
According to one aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core that defines a single logical entity and has a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric having multiple input ports and multiple output ports, the switch core being configured to be coupled to multiple peripheral processing devices via the multiple input ports and the multiple output ports, the switch core being configured to provide non-blocking connectivity at line rate between a first peripheral processing device disposed in a first chassis and a second peripheral processing device disposed in a second chassis.
According to one embodiment of the disclosure, the multiple peripheral processing devices include at least one peripheral processing device that has virtualized resources and at least one peripheral processing device that does not have virtualized resources.
According to one embodiment of the disclosure, the number of input ports and output ports is greater than 1000, and each of the input ports and each of the output ports is configured to operate at a rate of no less than 10 Gb/s.
According to one embodiment of the disclosure, each of the first peripheral processing device and the second peripheral processing device is one of a storage node device, a compute node device, a service node device, or a router.
According to one embodiment of the disclosure, the multiple peripheral processing devices include a third peripheral processing device; the switch core is configured to provide non-blocking connectivity at line rate between the second peripheral processing device and the third peripheral processing device; the switch core is configured to receive a first packet associated with the first peripheral processing device; the switch core is configured to send, in sequence and based on cells associated with the first packet, a second packet to the second peripheral processing device and a third packet to the third peripheral processing device; and the multi-stage switch fabric is configured to send the cells from an input port of the multiple input ports to an output port of the multiple output ports.
According to one embodiment of the disclosure, each of the first peripheral processing device and the third peripheral processing device is one of a storage node device, a compute node device, a service node device, or a router; and the second peripheral processing device is at least one of a firewall device, an intrusion detection device, or a load balancing device.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core having a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric having multiple input ports and multiple output ports, the switch core being configured to be coupled to multiple peripheral processing devices via the multiple input ports and the multiple output ports, the switch core being configured to provide, at line rate, each of the peripheral processing devices with connectivity to each remaining peripheral processing device, so that each of the output ports can be accessed equally by each of the peripheral processing devices via one of the input ports.
According to one embodiment of the disclosure, the multiple peripheral processing devices include at least one peripheral processing device coupled to the switch core via an Ethernet connection and at least one peripheral processing device coupled to the switch core via a non-Ethernet connection.
According to one embodiment of the disclosure, the multiple peripheral processing devices include at least one peripheral processing device that uses Layer 3 routing and at least one Layer 4 through Layer 7 device.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core that defines a single logical entity and has a multi-stage switch fabric, the multi-stage switch fabric having multiple stages physically distributed across multiple chassis, the stages collectively having multiple input ports and multiple output ports, the switch core being configured to be coupled to multiple peripheral processing devices via the multiple input ports and the multiple output ports, the switch core being configured to admit multiple cells associated with a packet at an input port of the multiple input ports when transmission of the cells through the multi-stage switch fabric can be substantially guaranteed without loss.
According to one embodiment of the disclosure, the multiple peripheral processing devices include a first peripheral processing device configured to communicate using the Fibre Channel protocol and a second peripheral processing device configured to communicate using the Fibre Channel over Ethernet protocol.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the multi-stage switch fabric admits the packet at the input port when the multiple cells can be sent to an output port of the multiple output ports within a predetermined time.
According to one embodiment of the disclosure, the switch core is configured to send the multiple cells associated with the packet from the input port to a first output port and a second output port of the multiple output ports, without performing packet loss processing in at least one stage of the multiple stages of the multi-stage switch fabric.
According to one embodiment of the disclosure, the switch core includes multiple edge devices coupled to the multi-stage switch fabric via the multiple input ports and the multiple output ports, the edge devices being coupled to the multiple peripheral processing devices, each of the edge devices being configured to receive the packet and to define the multiple cells based on the packet.
According to one embodiment of the disclosure, the switch core is configured to send the multiple cells associated with the packet from the input port to an output port of the multiple output ports via the multiple stages of the multi-stage switch fabric, without performing packet loss processing in at least one stage of the multiple stages.
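The admission behavior described in this aspect, where cells enter an input port only when their delivery to an output port can be substantially guaranteed, can be sketched as a simple reservation check. The class names, the per-window capacity model, and the `try_admit` and `release` methods are hypothetical; the text does not describe how the guarantee is computed.

```python
class OutputPort:
    """Hypothetical model: an output port can accept a fixed number of
    cells per scheduling window (the 'predetermined time' is assumed
    to be one window)."""

    def __init__(self, cells_per_window: int) -> None:
        self.cells_per_window = cells_per_window
        self.reserved = 0  # cells already admitted for this window


class AdmissionController:
    """Admit a packet's cells only if the destination port can take all of them."""

    def __init__(self, ports: dict[int, OutputPort]) -> None:
        self.ports = ports

    def try_admit(self, port_id: int, cell_count: int) -> bool:
        port = self.ports[port_id]
        if port.reserved + cell_count > port.cells_per_window:
            # Delivery cannot be guaranteed, so the cells stay outside the fabric.
            return False
        port.reserved += cell_count
        return True

    def release(self, port_id: int, cell_count: int) -> None:
        """Called once admitted cells have left the output port."""
        port = self.ports[port_id]
        port.reserved = max(0, port.reserved - cell_count)
```

Because rejected cells never enter the fabric in this model, the stages inside the fabric need no drop logic, which is consistent with the statement that no packet loss processing is performed in at least one stage.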
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core that defines a single logical entity and has a switch fabric, the switch fabric having multiple stages physically distributed across multiple chassis, the switch fabric having multiple input ports and multiple output ports, the switch core being configured to be coupled to multiple peripheral processing devices via the multiple input ports and the multiple output ports, the switch core being configured to receive a packet from an input port of the multiple input ports, the switch core being configured to send multiple cells associated with the packet from the input port to an output port of the multiple output ports via the multiple stages, without performing packet loss processing in at least one stage of the multiple stages of the switch fabric.
According to one embodiment of the disclosure, the switch fabric is configured as a deterministic network, such that a packet is admitted at an input port of the multiple input ports only when lossless transmission of the multiple cells associated with the packet within the switch fabric can be substantially guaranteed.
According to one embodiment of the disclosure, the output port is a first output port, and the switch core is configured to send the multiple cells associated with the packet from the input port to the first output port and to a second output port of the multiple output ports.
According to one embodiment of the disclosure, the switch core includes multiple edge devices coupled to the switch fabric via the multiple input ports and the multiple output ports, the edge devices being coupled to the multiple peripheral processing devices, each of the edge devices being configured to receive the packet and to define the multiple cells based on the packet.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core that defines a single logical entity and has a multi-stage switch fabric configured as a deterministic network, the multi-stage switch fabric having multiple input ports and multiple output ports, the switch core being configured to be coupled to multiple peripheral processing devices via the multiple input ports and the multiple output ports, the switch core being configured to receive a packet from an input port of the multiple input ports, the switch core being configured to send multiple cells associated with the packet from the input port to an output port of the multiple output ports.
According to one embodiment of the disclosure, the multi-stage switch fabric is physically distributed across multiple chassis.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that a packet is admitted at an input port of the multiple input ports only when lossless transmission of the multiple cells associated with the packet within the switch fabric can be substantially guaranteed.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the switch core admits the packet at an input port of the multiple input ports when the multiple cells associated with the packet can be sent to an output port of the multiple output ports within a predetermined time.
According to one embodiment of the disclosure, the switch core includes multiple edge devices coupled to the multi-stage switch fabric via the multiple input ports and the multiple output ports, the edge devices being coupled to the multiple peripheral processing devices, each of the edge devices being configured to receive the packet and to define the multiple cells based on the packet.
According to one embodiment of the disclosure, the switch core is configured to send the multiple cells associated with the packet from the input port to the output port via multiple stages of the multi-stage switch fabric, without performing packet loss processing in at least one stage of the multiple stages.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core having a multi-stage switch fabric physically distributed among multiple chassis, the multi-stage switch fabric having multiple input buffers and multiple output ports, the switch core being configured to be coupled to multiple edge devices; and a controller that is implemented in hardware without software during operation and that uses software during configuration and monitoring, the controller being coupled to the multiple input buffers and the multiple output ports, the controller being configured to send a flow control signal to an input buffer of the multiple input buffers when congestion at an output port of the multiple output ports is predicted and before congestion occurs within the switch core.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of in-fabric flow control for the multi-stage switch fabric of the switch core.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the multiple edge devices.
According to one embodiment of the disclosure, multiple peripheral processing devices are configured to be coupled to the multiple edge devices, and the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the multiple edge devices.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on cells buffered at the input buffer, independently of cell segments buffered at a stage of the multi-stage switch fabric and independently of packets buffered at an edge device of the multiple edge devices.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on cells buffered at the input buffer, independently of a flow control mechanism associated with Ethernet.
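The controller in this aspect signals an input buffer when congestion at an output port is predicted, before congestion occurs in the switch core. The following is a behavioral sketch only, since the text states that the controller is implemented in hardware during operation. The linear prediction model, the `PAUSE` and `RESUME` signal names, and all parameters are assumptions for illustration.

```python
def predict_depth(depth: float, arrival_rate: float,
                  drain_rate: float, horizon: float) -> float:
    """Extrapolate the output port queue depth (in cells) over a short horizon.

    Assumed model: depth changes linearly at (arrival_rate - drain_rate)
    cells per time unit and cannot go below zero.
    """
    return max(0.0, depth + (arrival_rate - drain_rate) * horizon)


def flow_control_signal(depth: float, arrival_rate: float, drain_rate: float,
                        horizon: float, capacity: float) -> str:
    """Return the signal to send to the input buffer feeding this output port.

    'PAUSE' is issued when the predicted depth reaches capacity, that is,
    before the output port is actually congested.
    """
    predicted = predict_depth(depth, arrival_rate, drain_rate, horizon)
    return "PAUSE" if predicted >= capacity else "RESUME"
```

While paused, cells wait at the input buffer rather than inside the fabric, which matches the embodiment in which a cell is buffered at the input buffer for a period of time associated with the end-to-end flow control.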
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core having a multi-stage switch fabric physically distributed among multiple chassis, the multi-stage switch fabric being configured to receive multiple cells associated with a packet and to switch multiple cell segments based on the multiple cells; an edge device of multiple edge devices coupled to the switch core, the edge device being configured to receive the packet and to send the multiple cells to the multi-stage switch fabric; and a controller coupled to the multi-stage switch fabric, the controller being configured to perform flow control on the multiple cells independently of flow control for the multiple edge devices and independently of in-fabric flow control for the multi-stage switch fabric.
According to one embodiment of the disclosure, the controller is implemented in hardware without software during operation, and uses software during configuration and monitoring.
According to one embodiment of the disclosure, the multi-stage switch fabric has multiple input buffers and multiple output ports, and the controller is configured to send a flow control signal to an input buffer of the multiple input buffers when congestion at an output port of the multiple output ports is predicted and before congestion occurs within the switch core.
According to one embodiment of the disclosure, the multi-stage switch fabric has multiple input buffers and multiple output ports, and the controller is configured to perform end-to-end flow control on cells buffered at an input buffer of the multiple input buffers, independently of a flow control mechanism associated with Ethernet.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core having a multi-stage switch fabric; a first plurality of peripheral processing devices coupled to the multi-stage switch fabric by multiple connections having a protocol, each peripheral processing device of the first plurality being a storage node having virtualized resources, the virtualized resources of the first plurality of peripheral processing devices collectively defining a virtual storage resource interconnected by the switch core; and a second plurality of peripheral processing devices coupled to the multi-stage switch fabric by multiple connections having a protocol, each peripheral processing device of the second plurality being a compute node having virtualized resources, the virtualized resources of the second plurality of peripheral processing devices collectively defining a virtual compute resource interconnected by the switch core.
According to one embodiment of the disclosure, each peripheral processing device of the first plurality has virtualized resources and is configured such that its virtualized resources can be replaced by virtualized resources of the remaining peripheral processing devices of the first plurality; and each peripheral processing device of the second plurality has virtualized resources and is configured such that its virtualized resources can be replaced by virtualized resources of the remaining peripheral processing devices of the second plurality.
According to one embodiment of the disclosure, the first plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol; and the second plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol.
According to another aspect of an embodiment of the present disclosure, there is provided an apparatus including: a switch core having a multi-stage switch fabric, the switch core being configured to be logically partitioned into a first virtual switch core and a second virtual switch core; and multiple peripheral processing devices coupled to the multi-stage switch fabric, the multiple peripheral processing devices including a first subset of peripheral processing devices operably coupled to the first virtual switch core and a second subset of peripheral processing devices operably coupled to the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core and the second virtual switch core are managed administratively independently of each other.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core has a bandwidth that is independent of the bandwidth of the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core has a bandwidth and administrative management that are independent of the bandwidth and administrative management of the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core operates using a Layer 2 protocol and the second virtual switch core operates using a Layer 2 protocol and a Layer 3 protocol.
According to one embodiment of the disclosure, the first subset of peripheral processing devices has virtualized resources, and the second subset of peripheral processing devices has virtualized resources.
According to one embodiment of the disclosure, the first subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node device, and a router, and a peripheral processing device that is a remaining one of a compute node, a storage node, a service node device, and a router; and the second subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node device, and a router, and a peripheral processing device that is a remaining one of a compute node, a storage node, a service node device, and a router.
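The logical partitioning of one switch core into virtual switch cores, each with its own device subset and its own bandwidth, can be modeled with a small data structure. The names `VirtualSwitchCore` and `partition`, the device labels, and the bandwidth figures in the usage lines are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualSwitchCore:
    """Hypothetical record for one logical partition of the switch core."""
    name: str
    bandwidth_gbps: float                      # budget independent of other partitions
    devices: set[str] = field(default_factory=set)


def partition(assignments: dict[str, str],
              budgets: dict[str, float]) -> dict[str, VirtualSwitchCore]:
    """Build virtual switch cores from device-to-partition assignments.

    assignments maps a peripheral processing device name to a partition name;
    budgets maps a partition name to its bandwidth budget.
    """
    cores = {name: VirtualSwitchCore(name, bw) for name, bw in budgets.items()}
    for device, core_name in assignments.items():
        cores[core_name].devices.add(device)
    return cores


# Usage: two partitions sharing one physical switch core.
cores = partition(
    {"compute-1": "A", "storage-1": "A", "router-1": "B"},
    {"A": 400.0, "B": 100.0},
)
```

Keeping a separate budget and device set per partition is one simple way to express that the virtual switch cores can be managed, and given bandwidth, independently of each other.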
Brief description of the drawings
Fig. 1 is a system block diagram of a data center (DC), according to one embodiment.
Fig. 2 is a schematic diagram showing an example of a portion of a data center having any-to-any connectivity, according to one embodiment.
Fig. 3 is a schematic diagram showing logical groups of resources associated with a data center, according to one embodiment.
Fig. 4A is a schematic diagram showing a switching fabric that can be included in a switch core, according to one embodiment.
Fig. 4B is a schematic diagram showing a switching table that can be stored in the memory module shown in Fig. 4A, according to one embodiment.
Fig. 5A is a schematic diagram showing a switching fabric system, according to one embodiment.
Fig. 5B is a schematic diagram showing an input/output module, according to one embodiment.
Fig. 6 is a schematic diagram showing a portion of the switching fabric system of Fig. 5A, according to one embodiment.
Fig. 7 is a schematic diagram showing a portion of the switching fabric system of Fig. 5A, according to one embodiment.
Figs. 8 and 9 show a front view and a rear view, respectively, of a housing used to house a switching fabric, according to one embodiment.
Fig. 10 shows a portion of the housing of Fig. 8, according to one embodiment.
Figs. 11 and 12 are schematic diagrams showing a switching fabric in a first configuration and a second configuration, respectively, according to another embodiment.
Fig. 13 is a schematic diagram showing data flow associated with a switching fabric, according to one embodiment.
Fig. 14 is a schematic diagram showing flow control within the switching fabric of Fig. 13, according to one embodiment.
Fig. 15 is a schematic diagram showing a buffer module, according to one embodiment.
Fig. 16A is a schematic block diagram of an ingress schedule module and an egress schedule module configured to coordinate transmission of groups of cells via the switching fabric of a switch core, according to one embodiment.
Fig. 16B is a signaling flow diagram showing signaling related to transmission of a group of cells, according to one embodiment.
Fig. 17 is a schematic block diagram showing two groups of cells queued at an ingress queue disposed on the ingress side of a switching fabric, according to one embodiment.
Fig. 18 is a schematic block diagram showing two groups of cells queued at an ingress queue disposed on the ingress side of a switching fabric, according to another embodiment.
Fig. 19 is a flowchart showing a method for scheduling transmission of a group of cells via a switching fabric, according to one embodiment.
Fig. 20 is a signaling flow diagram showing processing of request sequence values associated with transmission requests, according to one embodiment.
Fig. 21 is a signaling flow diagram showing response sequence values associated with transmission responses, according to one embodiment.
Fig. 22 is a schematic block diagram showing multi-stage flow-controllable queues, according to one embodiment.
Fig. 23 is a schematic block diagram showing multi-stage flow-controllable queues, according to one embodiment.
Fig. 24 is a schematic block diagram showing a destination control module configured to define a flow control signal associated with multiple receive queues, according to one embodiment.
Fig. 25 is a schematic diagram showing a flow control packet, according to one embodiment.
Detailed description
Fig. 1 is a schematic diagram showing a data center (DC) 100 (e.g., a super data center, an idealized data center), according to one embodiment. Data center 100 includes a switch core (SC) 180 operably connected to four types of peripheral processors 170: compute nodes 110, service nodes 120, routers 130, and storage nodes 140. In this embodiment, a data center management (DCM) module 190 is configured to control (e.g., manage) the operation of data center 100. In some embodiments, data center 100 can be referred to as a data center. In some embodiments, a peripheral processor can include one or more virtual resources, such as virtual machines.
Each peripheral processor 170 is configured to communicate via switch core 180 of data center 100. In particular, switch core 180 of data center 100 is configured to provide any-to-any connectivity between peripheral processors 170 at relatively low latency. For example, switch core 180 can be configured to send (e.g., transmit) data between one or more compute nodes 110 and one or more storage nodes 140. In some embodiments, switch core 180 can have at least hundreds or thousands of ports (e.g., egress ports and/or ingress ports) through which peripheral processors 170 can send and/or receive data. Peripheral processors 170 include one or more network interface devices (e.g., network interface cards (NICs), 10 Gigabit (Gb) Ethernet converged network adapter (CNA) devices) through which peripheral processors 170 can send signals to and/or receive signals from switch core 180. The signals can be sent to and/or received from switch core 180 via a physical link and/or a wireless link operably coupled to peripheral processors 170. In some embodiments, peripheral processors 170 can be configured to send data to and/or receive data from switch core 180 based on one or more protocols (e.g., an Ethernet protocol, a multiprotocol label switching (MPLS) protocol, a Fibre Channel protocol, a Fibre-Channel-over-Ethernet protocol, an Infiniband-related protocol).
In some embodiments, switch core 180 can be (e.g., can function as) a single consolidated switch (e.g., a single large-scale consolidated L2/L3 switch). In other words, switch core 180 can be configured to operate as a single logical entity (e.g., a single logical network element), as opposed to, for example, a collection of distinct network elements configured to communicate with one another via Ethernet connections. Switch core 180 can be configured to connect (e.g., facilitate communication between) compute nodes 110, storage nodes 140, service nodes 120, and/or routers 130 within data center 100. In some embodiments, switch core 180 can be configured to communicate via interface devices configured to send data at a rate of at least 10 Gb/s. In some embodiments, switch core 180 can be configured to communicate via interface devices (e.g., Fibre Channel interface devices) configured to send data at a link rate of, for example, 2 Gb/s, 4 Gb/s, 8 Gb/s, 10 Gb/s, 40 Gb/s, 100 Gb/s, and/or faster.
Although switch core 180 can be logically centralized, the implementation of switch core 180 can be highly distributed, for example, for reliability. For example, portions of switch core 180 can be physically distributed across, for example, many chassis. In some embodiments, for example, one processing stage of switch core 180 can be included in a first chassis and another processing stage of switch core 180 can be included in a second chassis. Both processing stages can logically function as part of a single consolidated switch. More details related to the architecture of switch core 180 are described in connection with Figs. 4 to 13.
As shown in Fig. 1, switch core 180 includes an edge portion 185 and a switching fabric 187. Edge portion 185 can include edge devices (not shown) that can function as gateway devices between switching fabric 187 and peripheral processors 170. In some embodiments, the edge devices in edge portion 185 can collectively have thousands of ports (e.g., 100,000 ports, 500,000 ports) through which data from peripheral processors 170 can be sent (e.g., routed) into one or more portions of switch core 180 and/or sent out of one or more portions of switch core 180. In some embodiments, an edge device can be referred to as an access switch, a network device, and/or an input/output module (shown, for example, in Fig. 5A and Fig. 5B). In some embodiments, an edge device can be included in, for example, the top of rack (TOR) of a chassis.
Data can be processed at peripheral processors 170, at switch core 180, at switching fabric 187 of switch core 180, and/or at edge portion 185 of switch core 180 (e.g., at edge devices included in edge portion 185) based on different platforms. For example, communication between one or more peripheral processors 170 and an edge device of edge portion 185 can be a stream of data packets defined based on an Ethernet protocol or a non-Ethernet protocol. In some embodiments, various types of data processing can be performed at edge devices in edge portion 185 rather than in switching fabric 187 of switch core 180. For example, data packets can be parsed into cells at an edge device of edge portion 185, and the cells can be sent from the edge device to switching fabric 187. The cells can be parsed into segments and sent within switching fabric 187 as segments (which in some embodiments can also be referred to as flits). In some embodiments, data packets can be parsed into cells at a portion of switching fabric 187. In some embodiments, congestion resolution and/or scheduling of data (e.g., cell) transmissions via switching fabric 187 can be performed at edge devices (e.g., access switches) within edge portion 185 of switch core 180. Congestion resolution and/or scheduling of data transmissions, however, may not be performed in the modules that define switching fabric 187. More details related to the processing of packets, cells, and/or segments within components of the data center are described below. For example, more details related to cell processing are described at least in connection with Figs. 16A to 21.
In some embodiments, the edge devices in edge portion 185 can be configured to classify, for example, data packets received at switch core 180 from peripheral processors 170. In particular, the edge devices in edge portion 185 of switch core 180 can be configured to perform Ethernet-type classification, which can include classification based on, for example, a layer-2 Ethernet address (e.g., a Media Access Control (MAC) address) and/or a layer-4 Ethernet address (e.g., a User Datagram Protocol (UDP) address). In some embodiments, a destination can be determined based on, for example, the classification of a packet at edge portion 185 of switch core 180. For example, a first edge device can identify, based on the classification of a packet, a second edge device as the destination of the packet. The packet can be parsed into cells that are sent from the first edge device to switching fabric 187. The cells can be switched through switching fabric 187 so that they are delivered to the second edge device. In some embodiments, the cells can be switched through switching fabric 187 based on information that relates to the destination and is associated with the cells.
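The edge-side sequence described above (classify a packet, determine the destination edge device, parse the packet into cells that carry the destination information) can be sketched as follows. This is a minimal illustration only: the cell payload size, the `MAC_TO_EDGE` table, and all function and field names are assumptions introduced here, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List

# Assumed cell payload size; the disclosure does not specify one.
CELL_PAYLOAD_BYTES = 64

# Hypothetical classification table: destination MAC address -> destination edge device.
MAC_TO_EDGE = {
    "00:11:22:33:44:55": "edge-2",
    "66:77:88:99:aa:bb": "edge-3",
}


@dataclass
class Cell:
    dest_edge: str   # destination determined by classification at the ingress edge device
    seq: int         # position of this cell within the original packet
    last: bool       # True for the final cell of the packet
    payload: bytes


def classify(dst_mac: str) -> str:
    """Layer-2 classification at the ingress edge device (lookup by MAC address)."""
    return MAC_TO_EDGE[dst_mac.lower()]


def packet_to_cells(dst_mac: str, payload: bytes) -> List[Cell]:
    """Classify once at the edge, then parse the packet into destination-tagged cells."""
    dest = classify(dst_mac)
    chunks = [payload[i:i + CELL_PAYLOAD_BYTES]
              for i in range(0, len(payload), CELL_PAYLOAD_BYTES)] or [b""]
    return [Cell(dest, i, i == len(chunks) - 1, chunk) for i, chunk in enumerate(chunks)]


def reassemble(cells: List[Cell]) -> bytes:
    """Reassembly at the destination edge device, ordered by sequence number."""
    return b"".join(c.payload for c in sorted(cells, key=lambda c: c.seq))
```

In this sketch the fabric would only need to read `dest_edge` from each cell; no classification is repeated after the ingress edge device.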
Security policies can be applied more effectively at switch core 180 because classification is performed at a single logical layer of switch core 180, namely at edge portion 185 of switch core 180. In particular, many security policies can be applied during classification at edge portion 185 of switch core 180 in a relatively uniform and seamless fashion.
More details related to packet classification within a data center are described in connection with, for example, Fig. 5A, Fig. 5B, and Fig. 19. Additional details related to packet classification associated with a data center are described in U.S. patent application Ser. No. 12/242,168, filed September 30, 2008, entitled "Methods and Apparatus Related to Packet Classification Associated with a Multi-Stage Switch," and in U.S. patent application Ser. No. 12/242,172, filed September 30, 2008, entitled "Methods and Apparatus for Packet Classification Based on Policy Vectors," both of which are incorporated herein by reference in their entireties.
Switch core 180 can be defined so that classification of data (e.g., data packets) is not performed in switching fabric 187. Accordingly, although switching fabric 187 can have multiple stages, the stages need not be topological hops at which data classification is performed, and switching fabric 187 can define a single topological hop. Instead, destination information determined based on classification at an edge device (e.g., an edge device within edge portion 185 of switch core 180) can be used for switching (e.g., switching of cells) within switching fabric 187. More details related to switching within switching fabric 187 are described in connection with, for example, Figs. 4A and 4B.
In some embodiments, processing related to classification can be performed at a classification module (not shown) included in an edge device (e.g., an input/output module). Parsing of packets into cells, scheduling of cell transmission via switching fabric 187, reassembly of packets and/or cells, and/or so forth can be performed at a processing module (not shown) of an edge device (e.g., an input/output module). In some embodiments, the classification module can be referred to as a packet classification module, and/or the processing module can be referred to as a packet processing module. More details related to an edge device that includes a classification module and a processing module are described in connection with Fig. 5B.
In some embodiments, one or more portions of data center 100 can be (or can include) a hardware-based module (e.g., an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA)) and/or a software-based module (e.g., a module of computer code, a set of processor-readable instructions that can be executed at a processor). In some embodiments, one or more functions associated with data center 100 can be included in different modules and/or combined into one or more modules. For example, data center management module 190 can be a combination of hardware modules and software modules configured to manage the resources of data center 100 (e.g., the resources of switch core 180).
One or more compute nodes 110 can be general-purpose computational engines that can include, for example, processors, memory, and/or one or more network interface devices (e.g., network interface cards (NICs)). In some embodiments, the processors within compute nodes 110 can be part of one or more cache coherency domains.
In some embodiments, compute nodes 110 can be host devices, servers, and/or so forth. In some embodiments, one or more compute nodes 110 can have virtualized resources so that any compute node 110 (or a portion thereof) can be substituted for any other compute node 110 (or a portion thereof) within data center 100.
One or more storage nodes 140 can be devices that include, for example, processors, memory, locally attached disk storage, and/or one or more network interface devices. In some embodiments, storage nodes 140 can have specialized modules (e.g., hardware modules and/or software modules) configured to enable, for example, one or more compute nodes 110 to read data from and/or write data to one or more storage nodes 140 via switch core 180. In some embodiments, one or more storage nodes 140 can have virtualized resources so that any storage node 140 (or a portion thereof) can be substituted for any other storage node 140 (or a portion thereof) within data center 100.
One or more service nodes 120 can be Open Systems Interconnection (OSI) layer-4 through layer-7 devices that can include, for example, processors (e.g., network processors), memory, and/or one or more network interface devices (e.g., 10 Gb Ethernet devices). In some embodiments, service nodes 120 can include hardware and/or software configured to perform computations on relatively heavy network workloads. In some embodiments, service nodes 120 can be configured to perform computations on a per-packet basis in a relatively efficient manner (e.g., more efficiently than can be performed at, for example, a compute node 110). The computations can include, for example, stateful firewall computations, intrusion detection and prevention (IDP) computations, Extensible Markup Language (XML) acceleration computations, Transmission Control Protocol (TCP) termination computations, and/or application-level load-balancing computations. In some embodiments, one or more service nodes 120 can have virtualized resources so that any service node 120 (or a portion thereof) can be substituted for any other service node 120 (or a portion thereof) within data center 100.
One or more routers 130 can be network devices configured to connect at least a portion of data center 100 to another network (e.g., the global Internet). For example, as shown in Fig. 1, switch core 180 can be configured to communicate with network 135 and network 137 through routers 130. Although not shown, in some embodiments, one or more routers 130 can enable communication between components within data center 100 (e.g., peripheral processors 170, portions of switch core 180). The communication can be defined based on, for example, a layer-3 routing protocol. In some embodiments, one or more routers 130 can have one or more network interface devices (e.g., 10 Gb Ethernet devices) through which routers 130 can send signals to and/or receive signals from, for example, switch core 180 and/or other peripheral processors 170.
More details related to virtualized resources within a data center are set forth in co-pending U.S. patent application Ser. No. 12/346,623, filed December 30, 2008, entitled "Methods and Apparatus for Determining a Network Topology During Network Provisioning"; co-pending U.S. patent application Ser. No. 12/346,632, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning"; and co-pending U.S. patent application Ser. No. 12/346,630, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning," all of which are incorporated herein by reference.
As described above, switch core 180 can be configured to function as a single universal switch that can connect any peripheral processor 170 in data center 100 to any other peripheral processor 170. In particular, switch core 180 can be configured to provide any-to-any connectivity between peripheral processors 170 (e.g., a relatively large number of peripheral processors 170) with substantially no perceptible limitations other than those imposed by the bandwidth of the network interface devices that connect peripheral processors 170 to switch core 180 and by speed-of-light signaling delays (also referred to as speed-of-light latency). In other words, switch core 180 can be configured so that each peripheral processor 170 appears to be interconnected directly to every other peripheral processor in data center 100. In some embodiments, switch core 180 can be configured so that peripheral processors 170 can communicate via switch core 180 at line rate (or at substantially line rate). A schematic illustration of any-to-any connectivity is shown in Fig. 2.
In addition, switch core 180 can handle, in a desirable fashion, migration of virtual resources between, for example, any peripheral processors 170 in communication with switch core 180, because switch core 180 functions as a single logical entity. Accordingly, the scope of migration of virtual resources among peripheral processors 170 can span substantially all ports coupled to switch core 180 (e.g., all ports of edge portion 185 of switch core 180).
In some embodiments, provisioning associated with migration of virtual resources can be handled, in part, by a network management module. A centralized network management entity or network management module can cooperate with network devices (e.g., portions of switch core 180) to collect and manage network topology information. For example, as resources are attached to or detached from network devices, the network devices can push information about the resources (virtual and physical) currently operably coupled to the network devices to the network management module. External management devices, such as peripheral processor management tools (e.g., server management tools) and/or network management tools, can communicate with the network management module to send network provisioning instructions to network devices and other resources in the network without a static description of the network. Such a system avoids the difficulties of static network descriptions and the network performance degradation caused by other types of peripheral processor 170 and network management systems.
In one embodiment, a server management tool or external management device communicates with the network management module to provision network devices for virtual resources associated with peripheral processors 170, and to determine the operational state or status (e.g., running, suspended, or migrating) of the virtual resources and their location in the network. A virtual resource can be a virtual machine executing on a peripheral processor 170 (e.g., a server) that is coupled to the switching fabric via an access switch in the data center (e.g., an access switch included in edge portion 185). Many types of peripheral processors 170 can be coupled to the switching fabric via access switches.
Rather than relying on static network descriptions for discovery and/or management of network topology information (including the binding of virtual resources to network devices), the network management module communicates and cooperates with access switches and external management devices to discover or determine network topology information. After a virtual machine is instantiated (and/or started) on a host (and/or another type of peripheral processor 170), the external management device can provide a device identifier of the virtual machine to the network management module. The device identifier can be, for example, a Media Access Control ("MAC") address of a network interface of the virtual machine or of peripheral processor 170, a name of the virtual machine or of peripheral processor 170, a globally unique identifier ("GUID"), and/or a universally unique identifier ("UUID") of the virtual resource or of peripheral processor 170. The GUID need not be globally unique across all networks, virtual resources, peripheral processors 170, and/or network devices, but it is unique within the network or network segment managed by the network management module. In addition, the external management entity can provide provisioning instructions for the port of the access switch to which the peripheral processor 170 hosting the virtual machine is connected. The access switch can detect that a virtual machine has been instantiated, started, and/or moved to peripheral processor 170. After detecting the virtual machine, the access switch can query peripheral processor 170 for information about peripheral processor 170 and/or the virtual machine, including, for example, the device identifier of peripheral processor 170 or of the virtual machine.
The access switch can query or request the device identifier of the virtual machine using, for example, the Link Layer Discovery Protocol ("LLDP"), some other standards-based or well-known protocol, or a proprietary protocol, where the virtual machine is configured to communicate via that protocol. Alternatively, after detecting that it has been connected to the access switch, the virtual machine can broadcast or multicast information about itself (including the device identifier of the virtual machine) using, for example, Ethernet or IP broadcast or multicast.
The access switch then pushes the device identifier of the virtual machine (sometimes referred to as a virtual device identifier) and, in some embodiments, other information received from the virtual machine to the network management module. In addition, the access switch can push to the network management module the device identifier of the access switch and a port identifier of the port of the access switch to which the peripheral processor 170 hosting the virtual machine is connected. This information functions as a description of the location of the virtual machine in the network, and defines, for the network management module and the external management device, the binding of the virtual machine to peripheral processor 170. In other words, after receiving this information, the network management module can associate the device identifier of the virtual machine with the particular port on the particular access switch to which the virtual machine (and/or the peripheral processor 170 on which the virtual machine runs) is connected.
The device identifier of the virtual machine, the device identifier of the access switch, the port identifier, and the provisioning instructions provided by the external management device can be stored in a memory accessible to the network management module. For example, the device identifier of the virtual machine, the device identifier of the access switch, and the port identifier can be stored in a memory configured as a database so that a database query based on the device identifier of the virtual machine returns the device identifier of the access switch, the port identifier, and the provisioning instructions.
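The lookup described above can be sketched as a small in-memory store keyed by the device identifier of the virtual machine. This is an illustrative sketch only: the class name `BindingStore`, its method names, and the string form of the provisioning instructions are hypothetical and are not taken from the disclosure, which says only that the memory can be configured as a database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Binding:
    access_switch_id: str          # device identifier of the access switch
    port_id: str                   # port identifier on that access switch
    provisioning: Optional[str]    # provisioning instructions, if any were supplied


class BindingStore:
    """Hypothetical memory of a network management module, keyed by VM device identifier."""

    def __init__(self) -> None:
        self._provisioning: Dict[str, str] = {}
        self._location: Dict[str, Tuple[str, str]] = {}

    def add_provisioning(self, vm_id: str, instructions: str) -> None:
        # Supplied by the external management device; it does not need to know the topology.
        self._provisioning[vm_id] = instructions

    def report_attachment(self, vm_id: str, switch_id: str, port_id: str) -> None:
        # Pushed by the access switch when it detects the VM; overwrites any old location.
        self._location[vm_id] = (switch_id, port_id)

    def lookup(self, vm_id: str) -> Binding:
        # A query by VM device identifier returns switch, port, and provisioning instructions.
        switch_id, port_id = self._location[vm_id]
        return Binding(switch_id, port_id, self._provisioning.get(vm_id))
```

Because the location is overwritten on each report, a migrated virtual machine keeps its provisioning instructions while its switch and port binding follow it.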
Because the network management module can associate the location of a virtual machine in the network with the device identifier of the virtual machine, the external management device need not be aware of the topology of the network or of the binding of virtual machines to peripheral processors 170 in order to provision network resources (e.g., network devices, virtual machines, virtual switches, or physical servers). In other words, the external management device can be agnostic to the interconnections in the network and to the location of a virtual machine in the network (e.g., on which peripheral processor 170, at which port of which access switch in the network), and can provision the access switches in the network based on the device identifiers of the virtual machines hosted by peripheral processors 170 in the network. In some embodiments, the external management device can also provision physical peripheral processors 170. In addition, because the network management module dynamically determines and manages network topology information, the external management device does not rely on a static description of the network to provision the network.
As used in this specification, provisioning can include various types or forms of device and/or software module setup, configuration, and/or adjustment. For example, provisioning can include configuring a network device such as a network switch based on a network policy. More specifically, for example, network provisioning can include: configuring a network device to operate as a layer-2 or layer-3 network switch; changing a routing table of a network device; updating security policies and/or device addresses or device identifiers of devices operably coupled to a network device; selecting which network protocols a network device will implement; setting a network segment identifier, such as a virtual local area network ("VLAN") tag, for a port of a network device; and/or applying access control lists ("ACLs") to a network device. The network switch can be provisioned or configured so that rules and/or access restrictions defined by the network policy are applied to data packets that pass through the network switch. In some embodiments, virtual devices can be provisioned. A virtual device can be, for example, a software module implementing a virtual switch, a virtual router, or a virtual gateway that is configured to operate as an intermediary between physical networks and that is hosted by a host device such as peripheral processor 170. In some embodiments, provisioning can include establishing a virtual port or a connection between a virtual resource and a virtual device.
Fig. 2 is a schematic diagram showing an example of a portion of a data center having any-to-any connectivity, according to one embodiment. As shown in Fig. 2, peripheral processor PD (from the group of peripheral processors 210) is connected to each of the peripheral processors 210 via switch core 280. For clarity, only the connections from peripheral processor PD to the other peripheral processors 210 (excluding peripheral processor PD) are shown.
In some embodiments, switch core 280 is defined so that switch core 280 is fair in the sense that the bandwidth of a destination link between peripheral processor PD and the other peripheral processors 210 is shared in a substantially equitable fashion among contending peripheral processors 210. For example, when several (or all) of the peripheral processors 210 shown in Fig. 2 attempt to access peripheral processor PD at a given time, the bandwidth (e.g., instantaneous bandwidth) available to each peripheral processor 210 for accessing peripheral processor PD will be substantially equal. In some embodiments, switch core 280 can be configured so that several (or all) of the peripheral processors 210 can communicate with peripheral processor PD at full bandwidth (e.g., the full bandwidth of peripheral processor PD) and/or in a non-blocking fashion. In addition, switch core 280 can be configured so that access to peripheral processor PD by a peripheral processor (from peripheral processors 210) is not limited by other links (e.g., existing or attempted) between other peripheral processors and peripheral processor PD.
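As a simple numerical illustration of the fairness property described above, the bandwidth of the destination link can be divided equally among the peripheral processors contending for it at a given time. The function name, the device labels, and the 10 Gb/s figure are assumptions chosen for illustration; the disclosure states only that the shares are substantially equal.

```python
from typing import Dict, Iterable


def fair_shares_gbps(link_gbps: float, contenders: Iterable[str]) -> Dict[str, float]:
    """Split a destination link's bandwidth equally among contending sources."""
    names = list(contenders)
    if not names:
        return {}
    share = link_gbps / len(names)
    return {name: share for name in names}


# Example: four peripheral processors contend for a 10 Gb/s link to PD.
shares = fair_shares_gbps(10.0, ["pd-1", "pd-2", "pd-3", "pd-4"])
```

With four contenders on a 10 Gb/s destination link, each would see 2.5 Gb/s; with a single contender, the full 10 Gb/s.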
In some embodiments, the attributes of switch core 280, such as any-to-any connectivity, low latency, fairness, and/or so forth, can enable peripheral processors 210 of a given type (e.g., of a storage node type, of a compute node type) connected to (e.g., in communication with) switch core 280 to be treated interchangeably (e.g., independently of their location relative to other peripheral processors 210 and switch core 280). This can be referred to as interchangeability, and can promote the efficiency and simplicity of a data center that includes switch core 280. Even though switch core 280 may have a large number of ports (e.g., more than 1,000 ports), switch core 280 can still have the attributes of any-to-any connectivity and/or fairness so that each port can operate at a relatively high speed (e.g., at a speed greater than 10 Gb/s). This can be achieved without including specialized interconnects, such as those of supercomputers, and/or without perfect a priori knowledge of all communication patterns. More details related to the architecture of a switch core having any-to-any connectivity and/or fairness are described, at least in part, in connection with Figs. 4 to 13.
Referring again to Fig. 1, in some embodiments, data center 100 is configured to allow flexible oversubscription. In some embodiments, through flexible oversubscription, the cost of network infrastructure (e.g., network infrastructure related to switch core 180) can be reduced relative to, for example, the cost of compute and storage. For example, the resources (e.g., all of the resources) within switch core 180 of data center 100 can operate as flexibly pooled resources so that underutilized resources associated with a first application (or set of applications) can be dynamically provisioned for, and used by, a second application (or set of applications) during, for example, peak processing by the second application. Accordingly, the resources (or a subset of the resources) of data center 100 can be configured to handle oversubscription more efficiently than if the resources were strictly allocated to a particular application (or set of applications) as siloed resources. If managed as siloed resources, oversubscription can be implemented only within each silo rather than, for example, across the entire data center 100. In some embodiments, one or more protocols and/or components within data center 100 can be based on open standards (e.g., Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Engineering Task Force (IETF) standards, InterNational Committee for Information Technology Standards (INCITS) standards).
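The benefit of pooling over rigid per-application allocation can be illustrated with a short calculation: if each application is given dedicated capacity sized for its own peak, the total is the sum of the peaks, whereas a shared pool only needs to cover the peak of the combined demand. The demand figures and function names below are hypothetical and serve only to make the arithmetic concrete.

```python
from typing import Dict, List


def dedicated_capacity(demand: Dict[str, List[int]]) -> int:
    """Capacity needed when each application gets resources sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: Dict[str, List[int]]) -> int:
    """Capacity needed when all applications draw from one flexibly shared pool."""
    return max(sum(step) for step in zip(*demand.values()))


# Hypothetical demand (arbitrary units) for two applications over three time steps.
demand = {
    "app-1": [8, 2, 2],   # peaks early
    "app-2": [2, 2, 8],   # peaks late
}

dedicated = dedicated_capacity(demand)  # 8 + 8
pooled = pooled_capacity(demand)        # max(10, 4, 10)
```

Here dedicated allocation would need 16 units while a shared pool would need 10, because the two applications do not peak at the same time.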
In some embodiments, the data center 100 can support security models that allow a wide range of policies to be implemented. For example, the data center 100 can support no-communication policies, in which applications reside in separate virtual data centers of the data center 100 but can share the same physical peripheral processors (e.g., compute nodes 110, storage nodes 140) and network infrastructure (e.g., exchange core 180). In some configurations, the data center 100 can support multiple processes that are part of the same application and that need to communicate with almost no limitations. In some configurations, the data center 100 can support policies that require, for example, deep packet inspection, stateful firewalls, and/or stateless filters.
The data center 100 can have an end-to-end application-to-application latency (also referred to as end-to-end latency) defined based on source latency, zero-load latency, congestion latency, and destination latency. In some embodiments, the source latency can be, for example, the time expended during processing at a source peripheral processor (e.g., the time expended by software and/or a NIC). Similarly, the destination latency can be, for example, the time expended during processing at a destination peripheral processor (e.g., the time expended by software and/or a NIC). In some embodiments, the zero-load latency can be the speed-of-light delay plus the processing and store-and-forward delays incurred inside, for example, the exchange core 180. In some embodiments, the congestion latency can be, for example, the queuing delay caused by congestion in the network. The data center 100 can have a low end-to-end latency that enables the desired performance of applications that are sensitive to latency, such as applications with real-time constraints and/or applications with high levels of inter-process communication.
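The four latency components named above can be illustrated with a minimal Python sketch. It assumes the end-to-end latency is the plain sum of the components, which the text does not state outright (it says only that the latency is defined based on them). The class name, field names, and microsecond values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Hypothetical per-path latency components, in microseconds."""
    source_us: float       # processing at the source peripheral processor
    zero_load_us: float    # speed-of-light delay plus processing and store-and-forward delay
    congestion_us: float   # queuing delay caused by congestion in the network
    destination_us: float  # processing at the destination peripheral processor

    def end_to_end_us(self) -> float:
        # Assumption: the components are additive.
        return (self.source_us + self.zero_load_us
                + self.congestion_us + self.destination_us)


# Illustrative numbers only; they are not taken from the text.
budget = LatencyBudget(source_us=5.0, zero_load_us=6.0,
                       congestion_us=2.0, destination_us=5.0)
total_us = budget.end_to_end_us()
```

Splitting the budget this way makes it easy to see which component dominates for a given path.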
The zero-load latency of the exchange core 180 can be significantly lower than that of a data center core having an Ethernet-based, hop-by-hop interconnect. In some embodiments, for example, the exchange core 180 can have a zero-load latency of less than 6 microseconds from an input port of the exchange core 180 to an output port of the exchange core 180 (excluding speed-of-light latency). In some embodiments, for example, the exchange core 180 can have a zero-load latency of less than 12 microseconds (excluding congestion latency and speed-of-light latency). Ethernet-based data center cores can have substantially higher latencies because of, for example, undesirable levels of congestion (e.g., congestion between links). Congestion within an Ethernet-based data center core can be aggravated by the inability of the core (or of management devices associated with the core) to handle congestion in a desirable manner. In addition, latency within an Ethernet-based data center core can be non-uniform, because the core can have different numbers of hops between different source-destination pairs and/or many store-and-forward switching nodes at which packet classification is performed. In contrast, classification in the exchange core 180 is performed at the edge portion 185 and not within the switching fabric 187, and the exchange core 180 has a deterministic, cell-based switching fabric 187. For example, the processing latency of cells through the switching fabric 187 (although not the path of the cells through the switching fabric 187) can be predictable.
The exchange core 180 of the data center 100 can provide lossless end-to-end packet delivery based, at least in part, on flow control mechanisms implemented within the data center 100. For example, transmission of data (e.g., data associated with packets) via the switching fabric 187 is scheduled on a cell-by-cell basis using a request-grant mechanism (also referred to as a request-authorization mechanism). Specifically, cells are sent into the switching fabric 187 (e.g., from the edge portion 185) only after a request to send the cells has been granted based on substantially guaranteed (lossless) delivery. Once admitted into the switching fabric 187, a cell is handled within the switching fabric 187 as a fragment. The flow of fragments within the switching fabric 187 can be further controlled so that, for example, fragments are not lost when congestion is detected within the switching fabric 187. More details related to the processing of cells and fragments within the exchange core 180 are described below.
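The request-grant idea can be sketched as a toy Python model. This is not the patented mechanism. It assumes a hypothetical fixed number of fabric slots, and a request is granted only while a slot is free, so a cell waits at the edge instead of being dropped. All class, method, and variable names are illustrative.

```python
from collections import deque


class RequestGrantEdge:
    """Toy edge scheduler: a cell enters the fabric only after a grant."""

    def __init__(self, fabric_slots: int) -> None:
        self.free_slots = fabric_slots  # hypothetical capacity the fabric can accept
        self.pending = deque()          # cells waiting at the edge for a grant
        self.in_fabric = []             # cells admitted into the fabric

    def request(self, cell: str) -> None:
        # A request only queues the cell; nothing is sent yet.
        self.pending.append(cell)

    def grant_round(self) -> list:
        # Grant requests in arrival order while capacity remains.
        granted = []
        while self.pending and self.free_slots > 0:
            cell = self.pending.popleft()
            self.free_slots -= 1
            self.in_fabric.append(cell)
            granted.append(cell)
        return granted

    def deliver(self, cell: str) -> None:
        # A delivered cell leaves the fabric and frees its slot.
        self.in_fabric.remove(cell)
        self.free_slots += 1


edge = RequestGrantEdge(fabric_slots=2)
for cell in ["c0", "c1", "c2"]:
    edge.request(cell)
first_round = edge.grant_round()   # c2 stays queued at the edge, not dropped
edge.deliver("c0")
second_round = edge.grant_round()  # the freed slot lets c2 in
```

Because ungranted cells stay queued at the edge, congestion shows up as waiting time outside the fabric rather than as loss inside it.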
In addition, the data flow through the switching fabric 187 from each peripheral processor 170 can be isolated from the data flows through the switching fabric 187 from the remaining peripheral processors 170. Specifically, data congestion at one or more peripheral processors 170 does not affect the flow of data through the switching fabric 187 of the exchange core 180 in an undesirable manner, because cells are sent into the switching fabric 187 of the exchange core 180 only after a request to send has been granted at the edge portion 185 of the exchange core 180. For example, a high level of data traffic at a first peripheral processor 170 can be handled based on the request-grant congestion resolution mechanism, so that the high level of data traffic at the first peripheral processor 170 does not adversely affect the access of a second peripheral processor 170 to the exchange core 180 as an independent logical entity. In other words, when admitted into the switching fabric 187 of the exchange core 180, traffic associated with the first peripheral processor 170 is isolated (e.g., isolated from a congestion perspective) from traffic associated with the second peripheral processor 170.
Likewise, the flow of packets through the exchange core 180, which can be parsed into cells and fragments, can be controlled at the peripheral processors 170 based on a fine-grained flow control mechanism. In some embodiments, fine-grained flow control is implemented based on stages of queues. This type of fine-grained flow control can prevent (or substantially prevent) head-of-line blocking, which can result in poor network utilization. Fine-grained flow control can also be used to reduce (or minimize) latency within the exchange core 180. In some embodiments, fine-grained flow control can enable high-performance, block-oriented disk traffic to and from the peripheral processors 170, which cannot be achieved in a desirable manner using Ethernet and Internet Protocol (IP) networks. More details related to fine-grained flow control are described with reference to Figs. 22 to 25.
In some embodiments, the data center 100 and, in particular, the exchange core 180 can have a modular architecture. Specifically, the exchange core 180 of the data center 100 can initially be implemented at a small scale and can be expanded as needed (e.g., expanded incrementally). The exchange core 180 can be expanded without substantially interrupting the continuous operation of an existing network and/or without constraints on where new equipment of the exchange core 180 must be physically placed.
In some embodiments, one or more portions of the exchange core 180 can be configured to operate based on virtual private networks (VPNs). Specifically, the exchange core 180 can be partitioned so that one or more peripheral processors 170 can be configured to communicate via overlapping or non-overlapping virtualized partitions of the exchange core 180. The exchange core 180 can also be divided into virtualized resources that form disjoint or overlapping subsets. In other words, the exchange core 180 can be a single switch that can be partitioned in a flexible manner. In some embodiments, this approach allows networking to be scaled once, within the consolidated exchange core 180 of the data center 100. This contrasts with data centers that are a collection of separate scalable networks, each having customized and/or specialized resources. In some embodiments, the network resources that define the exchange core 180 can be pooled so that they can be used efficiently.
In some embodiments, the data center management module 190 can be configured to define multiple levels of virtualization of the physical (and/or virtual) resources that define the data center 100. For example, the data center management module 190 can be configured to define multiple levels of virtualization that reflect the breadth of applications of the data center 100. In some embodiments, the lower level (of two levels) can include virtual application clusters (VACs), each of which can be a set of physical (or virtual) resources assigned to a single application and belonging to (e.g., controlled by) one or more entities (e.g., an administrative entity, a financial institution). The higher level (of the two levels) can include virtual data centers (VDCs), each of which can include a set of VACs belonging to (e.g., controlled by) one or more entities. In some embodiments, the data center 100 includes multiple VACs, each of which can belong to a different administrative entity.
Fig. 3 is a schematic diagram showing a logical group 300 of resources associated with a data center, according to one embodiment. As shown in Fig. 3, the logical group 300 includes virtual data center VDC1, virtual data center VDC2, and virtual data center VDC3 (collectively referred to as VDCs). Also as shown in Fig. 3, each VDC includes virtual application clusters (VACs) (e.g., VAC32 within VDC3). Each VDC represents a logical group of physical or virtual portions (e.g., portions of an exchange core, portions of peripheral processors, and/or virtual machines within peripheral processors) of a data center such as the data center 100 shown in Fig. 1. Each VAC within a VDC represents a logical group of, for example, peripheral processors such as compute nodes. For example, VDC1 can represent a logical group of portions of a typical data center, and VAC22 can represent a logical group of peripheral processors 370 within VDC1. As shown in Fig. 3, each VDC can be managed based on a set of policies PY (which can also be referred to as business rules) that can be configured, for example, to define the permitted range of operating parameters for applications operating within the VDC. In some embodiments, the VDCs can be referred to as a first tier of logical resources, and the VACs can be referred to as a second tier of logical resources.
In some embodiments, VDCs (and VACs) can be established so that the resources associated with a data center can be managed in a desirable manner by, for example, the entities that use (e.g., lease, own, communicate through) the data center resources and/or the administrators of the data center resources. For example, VDC1 can be a virtual data center associated with a financial institution, and VDC2 can be a virtual data center associated with a telecommunications service provider. Accordingly, policy PY1 can be defined by the financial institution so that VDC1 (and the physical and/or virtual data center resources associated with VDC1) can be managed differently from the way VDC2 (and the physical and/or virtual data center resources associated with VDC2) is managed based on policy PY2, which is defined by the telecommunications service provider. In some embodiments, one or more policies (e.g., a portion of policy PY1) are established by a network administrator so that, when implemented, they provide information security and/or a firewall between VDC1, which is associated with the financial institution, and VDC2, which is associated with the telecommunications service provider.
In some embodiments, policies can be associated with (or integrated into) a data center management (not shown). For example, VDC2 can be managed based on policy PY2 (or a subset of policy PY2). In some embodiments, the data center management can be configured, for example, to monitor the real-time performance of applications within a VDC and/or to automatically allocate or deallocate resources to satisfy the respective policies for the applications within the VDC. In some embodiments, policies can be configured to operate based on time thresholds. For example, one or more policies can be configured to operate based on periodic events (e.g., predictable periodic events), such as changes in a parameter value (e.g., a traffic level) at a particular time of day or on a particular day of the week.
In some embodiments, policies can be defined using a high-level language. Accordingly, policies can be specified in a relatively accessible manner. Examples of policies include information security policies, fault isolation policies, firewall policies, performance guarantee policies (e.g., policies related to service levels to be implemented by an application), and/or other administrative policies related to the protection or retrieval of information (e.g., management isolation policies).
In some embodiments, policies can be implemented at a packet classification module that can be configured, for example, to classify packets (e.g., IP packets, session control protocol packets, media packets, packets defined at a peripheral processor). For example, policies can be implemented in the packet classification module of an access switch within the edge portion of an exchange core. Classification can include any processing performed so that a packet can be processed within the data center (e.g., within the exchange core of the data center) based on a policy. In some embodiments, a policy includes one or more policy conditions associated with an instruction that can be executed. A policy can be, for example, a policy to route a packet to a particular destination (the instruction) if the packet has a particular type of network address (the policy condition). Packet classification can include determining whether a policy condition has been satisfied so that the instruction can be executed. For example, one or more portions of a packet (e.g., a field, a payload, an address portion, a port portion) can be analyzed by the packet classification module based on the policy condition defined in a policy. When the policy condition is satisfied, the packet can be processed based on the instruction associated with the policy condition.
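The pairing of a policy condition with an instruction can be sketched as follows. This is a minimal illustration rather than the classification module itself. The `Packet` fields, the `Policy` class, the 10.0.0.0/8 condition, and the `"route:DT3"` instruction string are all hypothetical, and first-match ordering is an assumption.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


@dataclass(frozen=True)
class Policy:
    name: str
    condition: Callable[[Packet], bool]  # the policy condition
    instruction: str                     # what to do when the condition holds


def classify(packet: Packet, policies: Sequence[Policy]) -> Optional[str]:
    """Return the instruction of the first policy whose condition is met."""
    for policy in policies:
        if policy.condition(packet):
            return policy.instruction
    return None  # no policy matched


PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")

policies = [
    Policy(
        name="route-private-destinations",
        # Condition: the destination address is of a particular type.
        condition=lambda p: ipaddress.ip_address(p.dst) in PRIVATE_NET,
        # Instruction: route to a particular destination (label is illustrative).
        instruction="route:DT3",
    ),
]

matched = classify(Packet("192.0.2.1", "10.1.2.3", 80), policies)
unmatched = classify(Packet("192.0.2.1", "192.0.2.9", 80), policies)
```

Keeping the condition and the instruction as separate fields mirrors the text: classification only decides whether the condition holds, and the instruction is applied afterwards.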
In some embodiments, one or more portions of the logical group 300 can be configured to operate in a "lights-out" mode from multiple remote locations, for example, a separate location for each VDC and one or two master locations that control the logical group 300. In some embodiments, a data center with logical groups such as those shown in Fig. 3 can be configured to operate without personnel physically present at the data center site. In some embodiments, the data center has sufficient redundant resources to accommodate failures such as the failure of one or more peripheral processors (e.g., a peripheral processor within a VAC), the failure of the data center management module, and/or the failure of a component of the exchange core. When monitoring software within the data center (e.g., within the data center management of the data center) indicates that failures have reached a predetermined threshold, personnel can be notified and/or dispatched to replace the failed components.
As shown in Fig. 3, the VDCs can be mutually independent logical groups. In some embodiments, the resources (e.g., virtual resources, physical resources) of a data center (such as that shown in Fig. 1) can be divided into logical groups 300 that differ from those shown in Fig. 3 (e.g., different tiers of logical groups). In some embodiments, two or more VDCs of the logical group 300 overlap. For example, a first VDC can share resources (e.g., physical resources, virtual resources) of the data center with a second VDC. Specifically, a portion of the exchange core of the first VDC can be shared with the second VDC. In some embodiments, for example, resources included in a VAC of the first VDC can also be included in a VAC of the second VDC.
In some embodiments, one or more VDCs can be defined manually (e.g., by a network administrator) and/or automatically (e.g., based on a policy). In some embodiments, a VDC can be configured to change (e.g., change dynamically). For example, a VDC (such as VDC1) can include a particular set of resources during one time period and a different set of resources (e.g., a mutually exclusive set of resources, an overlapping set of resources) during a different time period (e.g., a mutually exclusive time period, an overlapping time period).
In some embodiments, one or more portions of a data center can be provisioned dynamically in response to, before, or during a change related to a VDC (e.g., the migration of a portion of the VDC, such as a virtual machine of the VDC). For example, the exchange core of a data center can include multiple network devices, such as network switches, each of which stores a configuration template database that includes provisioning instructions for services provided and/or requested by virtual machines. When a virtual machine is migrated to and/or instantiated or started on a server connected to a port of a network switch of the exchange core, the server can send to the network switch an identifier of a service provided by the virtual machine. The network device can select a configuration template from the configuration template database based on the identifier, and can provision the port and/or the server based on that configuration template. In this way, the task of provisioning network ports and/or devices can be distributed across the network switches of the exchange core (e.g., distributed in an automated manner, without the need to redistribute templates), and can change dynamically as virtual machines or resources migrate between peripheral processors.
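The template-driven provisioning flow can be sketched in Python as follows. This is a hedged illustration only. The template contents, the service identifiers (`"web"`, `"storage"`), and the `NetworkSwitch` class and its method are hypothetical, chosen to show a switch selecting a locally stored template from the identifier a server sends.

```python
import copy

# Hypothetical configuration template database held by each network switch.
TEMPLATE_DB = {
    "web": {"vlan": 10, "acl": ["permit tcp any any eq 80"]},
    "storage": {"vlan": 20, "acl": ["permit tcp any any eq 3260"]},
}


class NetworkSwitch:
    """Toy switch that provisions a port from a locally stored template."""

    def __init__(self, templates: dict) -> None:
        self.templates = templates
        self.port_config = {}  # port number -> applied configuration

    def on_service_identifier(self, port: int, service_id: str) -> dict:
        # Called when a server reports the service of a newly started or
        # migrated virtual machine on the given port.
        template = self.templates.get(service_id)
        if template is None:
            raise KeyError(f"no configuration template for {service_id!r}")
        # Copy the template so later port edits do not alter the database.
        self.port_config[port] = copy.deepcopy(template)
        return self.port_config[port]


switch = NetworkSwitch(TEMPLATE_DB)
applied = switch.on_service_identifier(port=7, service_id="web")
```

Because every switch holds its own copy of the database, the lookup is local, which is one way to read the text's point that provisioning is distributed across the switches.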
In some embodiments, provisioning can include various types or forms of device and/or software module setup, configuration, and/or adjustment. For example, provisioning can include configuring a network device within the data center, such as a network switch, based on a policy such as one of the policies PY shown in Fig. 3. More specifically, provisioning related to the data center can include, for example, one or more of the following: configuring a network device to operate as a network router or a network switch; changing the routing tables of a network device; updating security policies and/or the addresses or identifiers of devices operatively coupled to a network device; selecting which network protocols a network device will implement; setting network segment identifiers, such as virtual local area network (VLAN) tags, for a port of a network device; and/or applying access control lists (ACLs) to a network device. A portion of the data center can be provisioned or configured so that the rules and/or access restrictions defined by a policy (e.g., PY3) are applied (e.g., applied through the classification process) to packets that pass through that portion of the data center.
In some embodiments, virtual resources associated with a data center can be provisioned. A virtual resource can be, for example, a software module implementing a virtual switch, a virtual router, or a virtual gateway that is configured to operate as an intermediary between a physical network and virtual resources hosted by a host device such as a server. In some embodiments, the virtual resource can be hosted by the host device. In some embodiments, provisioning can include establishing a virtual port or connection between a virtual resource and a virtual device.
More details related to virtual resources within a data center are set forth in co-pending U.S. Patent Application No. 12/346,623, entitled "Method and Apparatus for Determining a Network Topology During Network Provisioning" and filed on December 30, 2008; co-pending U.S. Patent Application No. 12/346,632, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed on December 30, 2008; and co-pending U.S. Patent Application No. 12/346,630, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed on December 30, 2008, all of which are incorporated herein by reference in their entirety.
Fig. 4A is a schematic diagram showing a switching fabric 400 that can be included in an exchange core, according to one embodiment. In some embodiments, the switching fabric 400 can be included in an exchange core such as the exchange core 180 shown in Fig. 1. As shown in Fig. 4A, the switching fabric 400 is a three-stage, non-blocking Clos network and includes a first stage 440, a second stage 442, and a third stage 444. The first stage 440 includes modules 412 (each of which can be referred to as a switch module or a cell switch). Each module 412 of the first stage 440 is an assembly of electronic components and circuitry. In some embodiments, for example, each module is an application-specific integrated circuit (ASIC). In other embodiments, multiple modules are contained on a single ASIC. In some embodiments, each module is an assembly of discrete electronic components. In some embodiments, a switching fabric with multiple stages can be referred to as a multi-stage switching fabric.
In some embodiments, each module 412 of the first stage 440 can be a cell switch. A cell switch is configured to efficiently redirect data (e.g., fragments) as it flows through the switching fabric 400. In some embodiments, for example, each module 412 of the first stage can be configured to redirect data based on information included in a switching table. In some embodiments, the redirection of data, such as cells, within the stages of the switching fabric 400 can be referred to as switching (e.g., data switching) or, if the data is in the form of cells within the switching fabric 400, as cell switching. In some embodiments, switching within the modules of the switching fabric 400 can be based on, for example, information (e.g., a header) associated with the data. The switching performed by the modules of the switching fabric 400 can differ from the Ethernet-type classification performed within an edge device (e.g., an edge device within the edge portion 185 of the exchange core 180 shown in Fig. 1). In other words, switching within the modules of the switching fabric 400 need not be based on, for example, a layer-2 Ethernet address and/or a layer-4 Ethernet address. More details related to switching data based on a switching table are described with reference to Fig. 4B.
In some embodiments, each cell switch also includes multiple input ports operatively coupled to the write interfaces of a memory buffer (e.g., a cut-through buffer). In some embodiments, the memory buffer is included in a buffer module. Similarly, a set of output ports can be operatively coupled to the read interfaces of the memory buffer. In some embodiments, the memory buffer can be a shared memory buffer implemented using on-chip static random access memory (SRAM) that provides sufficient bandwidth for all input ports to write one incoming cell (e.g., a portion of a packet) per time period and for all output ports to read one outgoing cell per time period. Each cell switch operates similarly to a crossbar switch that can be reconfigured after each time period.
In some embodiments, the memory buffer (e.g., the portions of the memory buffer associated with particular ports and/or flows) has a size (e.g., a length) sufficient for the modules (e.g., modules 412) within the switching fabric 400 to implement switching (e.g., cell switching, data switching) and/or synchronization of data (e.g., cells). However, the memory buffer can have a size that is insufficient (and/or a processing latency that is too short) for the modules (e.g., modules 412) within the switching fabric 400 to implement a congestion resolution scheme. A congestion resolution scheme such as a request-grant mechanism can be implemented at, for example, edge devices (not shown) associated with the exchange core, but cannot be implemented within the modules of the switching fabric 400 using the memory buffer to queue the data associated with the congestion resolution scheme. In some embodiments, one or more memory buffers within the modules (e.g., modules 414) have a size that is insufficient (and/or a processing latency that is too short) for, for example, reordering data (e.g., cells) at the modules. More details related to shared memory buffers are described with reference to Fig. 15 and in co-pending U.S. Patent Application No. 12/415,517, entitled "Methods and Apparatus Related to a Shared Memory Buffer for Variable-Sized Cells" and filed on March 31, 2009, which is incorporated herein by reference in its entirety.
In alternative embodiments, each module of the first stage can be a crossbar switch having input bars and output bars. Multiple switches within the crossbar switch connect each input bar to each output bar. When a switch within the crossbar switch is in the "on" position, the input is operatively coupled to the output and data can flow. Conversely, when a switch within the crossbar switch is in the "off" position, the input is not operatively coupled to the output and data cannot flow. In this way, the switches within the crossbar switch control which input bars are operatively coupled to which output bars.
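The on/off behavior of the crossbar switches can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the `Crossbar` class and its methods are hypothetical, each crosspoint is a boolean, and at most one input is expected to drive a given output at a time.

```python
class Crossbar:
    """Toy crossbar: crosspoint [i][o] couples input bar i to output bar o."""

    def __init__(self, n_inputs: int, n_outputs: int) -> None:
        self.n_outputs = n_outputs
        # All crosspoint switches start in the "off" position.
        self.on = [[False] * n_outputs for _ in range(n_inputs)]

    def set_switch(self, inp: int, out: int, is_on: bool) -> None:
        self.on[inp][out] = is_on

    def forward(self, inputs: list) -> list:
        # Data flows from an input bar to every output bar whose
        # crosspoint switch is "on"; other outputs carry nothing (None).
        outputs = [None] * self.n_outputs
        for inp, value in enumerate(inputs):
            for out, is_on in enumerate(self.on[inp]):
                if is_on:
                    outputs[out] = value
        return outputs


xbar = Crossbar(n_inputs=2, n_outputs=3)
xbar.set_switch(0, 2, True)   # input bar 0 -> output bar 2
xbar.set_switch(1, 0, True)   # input bar 1 -> output bar 0
result = xbar.forward(["a", "b"])
```

Turning a crosspoint off again with `set_switch(0, 2, False)` stops data from input bar 0 reaching output bar 2, which is the control described in the paragraph above.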
Each module 412 of the first stage 440 includes a set of input ports 460 configured to receive data as it enters the switching fabric 400. In this embodiment, each module 412 of the first stage 440 includes the same number of input ports 460.
Similar to the first stage 440, the second stage 442 of the switching fabric 400 includes modules 414. The modules 414 of the second stage 442 are structurally similar to the modules 412 of the first stage 440. Each module 414 of the second stage 442 is operatively coupled to each module of the first stage 440 by a data path 420. Each data path 420 between a module of the first stage 440 and a module 414 of the second stage 442 is configured to facilitate the transfer of data from the modules 412 of the first stage 440 to the modules 414 of the second stage 442.
The data paths 420 between the modules 412 of the first stage 440 and the modules 414 of the second stage 442 can be constructed in any manner configured to facilitate the transfer of data from the modules 412 of the first stage 440 to the modules 414 of the second stage 442 in a desirable manner (e.g., in an efficient manner). In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to the midplane described in more detail herein. Such a midplane can be used effectively to connect each module of the second stage to each module of the first stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
In some embodiments, the switching fabric 400 is a non-blocking Clos network. Accordingly, the number of modules 414 of the second stage 442 of the switching fabric 400 varies based on the number of input ports 460 of each module 412 of the first stage 440. In a rearrangeably non-blocking Clos network (e.g., a Benes network), the number of modules 414 of the second stage 442 is greater than or equal to the number of input ports 460 of each module 412 of the first stage 440. Thus, if n is the number of input ports 460 of each module 412 of the first stage 440 and m is the number of modules 414 of the second stage 442, then m ≥ n. In some embodiments, for example, each module of the first stage has five input ports. The second stage therefore has at least five modules. All five modules of the first stage are operatively coupled to all five modules of the second stage by data paths. In other words, each module of the first stage can send data to any module of the second stage.
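The sizing rule m ≥ n and the full mesh between the first and second stages can be checked with a short Python sketch. The function names are hypothetical, and the example uses the five-port, five-module case from the text.

```python
def is_rearrangeably_nonblocking(n_inputs_per_module: int,
                                 m_second_stage_modules: int) -> bool:
    """Check the condition m >= n for a three-stage Clos network."""
    return m_second_stage_modules >= n_inputs_per_module


def first_to_second_data_paths(r_first_stage_modules: int,
                               m_second_stage_modules: int) -> list:
    """List one data path for every (first-stage, second-stage) module pair."""
    return [(i, j)
            for i in range(r_first_stage_modules)
            for j in range(m_second_stage_modules)]


# Example from the text: five input ports per first-stage module,
# five first-stage modules, and five second-stage modules.
ok = is_rearrangeably_nonblocking(n_inputs_per_module=5,
                                  m_second_stage_modules=5)
paths = first_to_second_data_paths(r_first_stage_modules=5,
                                   m_second_stage_modules=5)
```

With five modules in each of the two stages, the full mesh needs 5 × 5 = 25 data paths, so every first-stage module can reach every second-stage module.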
The third stage 444 of the switching fabric 400 includes modules 416. The modules 416 of the third stage 444 are structurally similar to the modules 412 of the first stage 440. The number of modules 416 of the third stage 444 is equal to the number of modules 412 of the first stage 440. Each module 416 of the third stage 444 includes output ports 462 configured to allow data to exit the switching fabric 400. Each module 416 of the third stage 444 includes the same number of output ports 462. In addition, the number of output ports 462 of each module 416 of the third stage 444 is equal to the number of input ports 460 of each module 412 of the first stage 440.
Each module 416 of the third stage 444 is connected to each module 414 of the second stage 442 by a data path 424. The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 are configured to facilitate the transfer of data from the modules 414 of the second stage 442 to the modules 416 of the third stage 444.
The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 can be constructed in any manner configured to effectively facilitate the transfer of data from the modules 414 of the second stage 442 to the modules 416 of the third stage 444. In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to the midplane described in detail herein. Such a midplane can be used effectively to connect each module of the second stage to each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
Fig. 4B is a schematic diagram showing a switching table 49 that can be stored in a memory 498 of a module shown in Fig. 4A, according to one embodiment. A module (e.g., a switch module), such as one of the second-stage modules 414 shown in Fig. 4A, can be configured to perform cell switching based on a switching table such as the switching table 49 shown in Fig. 4B. For example, the switching table 49 (or a similarly configured switching table) can be used by (and/or included in) a module within one stage to determine, for example, whether a cell can be sent to its destination via a module within another stage. In some embodiments, a module via which a cell can be sent to its destination can be referred to as a switching destination. Specifically, the switching destination can be looked up in the switching table 49 based on, for example, destination information included in the cell (which can be determined outside the switching fabric 400).
Swap table 49 includes binary values (e.g., binary value "1", binary value "0") that indicate whether one or more destinations represented by destination values DT1 to DTk (shown at 47) can be reached via one or more modules (which can be located at an adjacent level) represented by module values SM1 to SMM (shown at 48). Specifically, swap table 49 includes a binary value "1" when the destination (e.g., destination DT1) in the row that includes the binary value can be reached via the module (e.g., module SM2) in the column that intersects the row. Swap table 49 includes a binary value "0" when the destination in the row that includes the binary value cannot be reached via the module in the column that intersects the row. For example, the binary value "1" in each of the entries at 46 indicates that if the module (that includes swap table 49) sends data to any of the modules represented by module values SM1 to SM3, the data can ultimately be sent to the destination represented by destination value DT3. In some embodiments, the module can be configured to randomly select one module from the group of modules represented by module values SM1 to SM3 (which are the switching destinations), and send the data to the selected module so that the data can be sent on to the destination represented by destination value DT3.
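The lookup described for swap table 49 can be illustrated with a minimal Python sketch. The table contents, the number of modules, and the function names below are hypothetical and chosen only for illustration; only the DT3 row mirrors the example in the text (reachable via SM1 to SM3).

```python
import random

# Hypothetical table contents: each row is a destination value, each column a
# module value. A 1 means the destination can be reached via that module.
MODULES = ["SM1", "SM2", "SM3", "SM4"]
SWAP_TABLE = {
    "DT1": [0, 1, 0, 0],
    "DT2": [0, 0, 1, 1],
    "DT3": [1, 1, 1, 0],  # matches the text: DT3 is reachable via SM1 to SM3
}


def switching_destinations(destination):
    """Return every module whose entry for this destination is 1."""
    row = SWAP_TABLE[destination]
    return [module for module, bit in zip(MODULES, row) if bit == 1]


def select_module(destination, rng=random):
    """Randomly pick one module through which the destination is reachable."""
    candidates = switching_destinations(destination)
    if not candidates:
        raise LookupError(f"no module reaches {destination}")
    return rng.choice(candidates)
```

Calling `select_module("DT3")` returns one of SM1, SM2 or SM3 at random, which corresponds to the random selection among switching destinations described above.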
In some embodiments, the destination values 47 can be destination port values associated with, for example, an edge device of the switch core (e.g., an access switch), a server in communication with an edge device, and so on. In some embodiments, a destination value (which corresponds to at least one of the destination values 47 included in swap table 49) can be associated with a cell (e.g., included in the cell header) based on, for example, classification of a packet included in the cell. The destination value associated with the cell can therefore be used by a module to look up a switching destination in swap table 49. The packet classification can be performed at an edge device of the switch core (e.g., an access switch).
In some embodiments, the memory (and thus swap table 49) can be included in a modular system that includes one or more modules. In some embodiments, swap table 49 can be associated with more than one input port and/or more than one output port of a modular system (or of multiple modular systems). More details related to modular systems are described with reference to Fig. 7.
Fig. 5A is a schematic diagram showing a switching fabric system 500, according to one embodiment. Switching fabric system 500 includes multiple input/output modules 502, a first cable collection 540, a second cable collection 542 and a switching fabric 575. Switching fabric 575 includes a first switching fabric part 571 deployed in a shell 570 or frame, and a second switching fabric part 573 deployed in a shell 572 or frame.
Input/output modules 502 (which can be, for example, edge devices) are configured to send data to and/or receive data from the first switching fabric part 571 and/or the second switching fabric part 573. In addition, each input/output module 502 includes a parsing function, a classification function, a forwarding function and/or a queuing and scheduling function. Thus, packet parsing, packet classification, packet forwarding, and packet queuing and scheduling all occur before a packet enters the first switching fabric part 571 and/or the second switching fabric part 573. Accordingly, these functions do not need to be performed at every level of switching fabric 575, and each module of switching fabric parts 571, 573 (described in further detail herein) does not need to include the capability to perform them. This can reduce the cost, power consumption, cooling requirements and/or physical area required for each module of switching fabric parts 571, 573. It can also reduce the latency associated with the switching fabric. In some embodiments, for example, the end-to-end latency (i.e., the time required to send data through the switching fabric from one input/output module to another input/output module) can be lower than the end-to-end latency of a switching fabric system that uses the Ethernet protocol. In some embodiments, the throughput of switching fabric parts 571, 573 is constrained only by the connection density of switching fabric system 500 and not by power and/or thermal limitations. In some embodiments, input/output modules 502 (and/or the functions associated with input/output modules 502) can be included in, for example, an edge device in an edge portion of the switch core as shown in Fig. 1. The parsing, classification, forwarding, and queuing and scheduling functions can be performed similarly to the functions disclosed in U.S. patent application serial number 12/242168, entitled "Methods and Apparatus Related to Packet Classification Associated with a Multi-Stage Switch" and filed September 30, 2008, and in U.S. patent application serial number 12/242172, entitled "Methods and Apparatus for Packet Classification Based on Policy Vectors" and filed September 30, 2008, both of which are incorporated herein by reference in their entirety.
Each input/output module 502 is configured to be coupled to a first end of a cable of the first cable collection 540 and to a first end of a cable of the second cable collection 542. Each cable 540 is deployed between an input/output module 502 and the first switching fabric part 571. Similarly, each cable 542 is deployed between an input/output module 502 and the second switching fabric part 573. Using the first cable collection 540 and the second cable collection 542, each input/output module 502 can send data to and/or receive data from the first switching fabric part 571 and/or the second switching fabric part 573, respectively.
The first cable collection 540 and the second cable collection 542 can be made of any material suitable for transferring data between input/output modules 502 and switching fabric parts 571, 573. In some embodiments, for example, each cable 540, 542 is made up of multiple optical fibers. In such embodiments, each cable 540, 542 can have 12 transmit fibers and 12 receive fibers. The 12 transmit fibers of each cable 540, 542 can include 8 fibers for sending data, 1 fiber for sending control signals, and 3 fibers for expanding data capacity and/or for redundancy. Similarly, the 12 receive fibers of each cable 540, 542 can include 8 fibers for data, 1 fiber for control signals, and 3 fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of fibers can be included in each cable.
The first switching fabric part 571 and the second switching fabric part 573 are used together for redundancy and/or greater capacity. In other embodiments, only one switching fabric part is used. In still other embodiments, more than 2 switching fabric parts are used for increased redundancy and/or greater capacity. For example, 4 switching fabric parts can be operably coupled to each input/output module by, for example, 4 cables. The second switching fabric part 573 is structurally and functionally similar to the first switching fabric part 571. Therefore, only the first switching fabric part 571 is described in detail here.
Fig. 5B is a schematic diagram showing an input/output module 502, according to one embodiment. As shown in Fig. 5B, input/output module 502 includes a sort module 596, a processing module 597, and a memory 598. Sort module 596 can be configured to perform classification of data, such as Ethernet type classification of a packet.
Various types of data processing can be performed at processing module 597. For example, data such as packets can be parsed into cells at processing module 597. In some embodiments, a congestion resolution scheme can be implemented at processing module 597, and/or transmission of data (e.g., cells) via the switching fabric (e.g., switching fabric 400 shown in Fig. 4A) can be scheduled at processing module 597. Processing module 597 can also be configured to concatenate information (e.g., header information, destination information, source information) with, for example, a cell payload, so that the information can be used to switch the cell through the switching fabric (e.g., switching fabric 400 shown in Fig. 4A) based on a swap table such as the one shown in Fig. 4B.
While data processing is performed at sort module 596 and/or processing module 597, one or more portions of the data (e.g., packets, cells) can be stored (e.g., queued) in memory 598. For example, data that has been parsed into cells can be queued in memory 598 while processing module 597 performs processing related to a congestion resolution scheme. Memory 598 can therefore be large enough to implement the congestion resolution schemes described with reference to Figs. 16A to 21.
Fig. 6 shows in greater detail a portion of the switching fabric system 500 of Fig. 5A that includes the first switching fabric part 571. The first switching fabric part 571 includes interface cards 510, which are associated with the first level and the third level of the first switching fabric part 571; interface cards 516, which are associated with the second level of the first switching fabric part 571; and a midplane 550. In some embodiments, the first switching fabric part 571 includes 8 interface cards 510 associated with the first level and the third level of the first switching fabric, and 8 interface cards 516 associated with the second level of the first switching fabric. In other embodiments, a different number of interface cards associated with the first level and the third level of the first switching fabric and/or a different number of interface cards associated with the second level of the first switching fabric can be used.
As shown in Fig. 6, each input/output module 502 is operably coupled to an interface card 510 via one cable of the first cable collection 540. In some embodiments, for example, each of the 8 interface cards 510 is operably coupled to 16 input/output modules 502, as described in further detail herein. Thus, the first switching fabric part 571 can be coupled to 128 input/output modules (16 × 8 = 128). Each of the 128 input/output modules 502 can send data to and receive data from the first switching fabric part 571.
Each interface card 510 is connected to each interface card 516 via midplane 550. Thus, each interface card 510 can send data to and receive data from each interface card 516, as described in further detail herein. Using midplane 550 to connect interface cards 510 to interface cards 516 reduces the number of cables needed to connect the levels of the first switching fabric part 571.
Fig. 7 shows in greater detail a first interface card 510', midplane 550, and a first interface card 516'. Interface card 510' is associated with the first level and the third level of the first switching fabric part 571, and interface card 516' is associated with the second level of the first switching fabric part 571. Each interface card 510 is structurally and functionally similar to first interface card 510'. Similarly, each interface card 516 is structurally and functionally similar to first interface card 516'.
First interface card 510' includes multiple cable connector ports 560, a first modular system 512, a second modular system 514, and multiple midplane connector ports 562. For example, Fig. 7 shows first interface card 510' with 16 cable connector ports 560 and 8 midplane connector ports 562. Each cable connector port 560 of first interface card 510' is configured to receive a second end of a cable from the first cable collection 540. Thus, as described above, the 16 cable connector ports 560 on each of the 8 interface cards 510 are used to receive 128 cables (16 × 8 = 128). Although 16 cable connector ports 560 are shown in Fig. 7, in other embodiments any number of cable connector ports can be used, so long as each cable of the first cable collection can be received by a cable connector port in the first switching fabric. For example, if 16 interface cards are used, each interface card can include 8 cable connector ports.
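The port counts in this passage follow from dividing the cables evenly across the interface cards. A small Python sketch of that arithmetic is shown here; the function name is hypothetical, and the even distribution of cables across cards is an assumption drawn from the two examples in the text.

```python
def cable_ports_per_card(total_cables: int, num_interface_cards: int) -> int:
    """Cable connector ports needed per card if cables are spread evenly."""
    if total_cables % num_interface_cards != 0:
        raise ValueError("cables cannot be spread evenly across the cards")
    return total_cables // num_interface_cards


# 128 cables over 8 cards, then the same 128 cables over 16 cards.
eight_card_layout = cable_ports_per_card(128, 8)
sixteen_card_layout = cable_ports_per_card(128, 16)
```

With 8 interface cards this gives 16 ports per card, and with 16 interface cards it gives 8 ports per card, matching the figures in the text.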
The first modular system 512 and the second modular system 514 of first interface card 510' each include modules of the first level of the first switching fabric part 571 and modules of the third level of the first switching fabric part 571. In some embodiments, 8 of the 16 cable connector ports 560 are operably coupled to the first modular system 512, and the remaining 8 of the 16 cable connector ports 560 are operably coupled to the second modular system 514. Both the first modular system 512 and the second modular system 514 can be operably coupled to each of the 8 midplane connector ports 562 of interface card 510'.
The first modular system 512 and the second modular system 514 of first interface card 510' are ASICs. The first modular system 512 and the second modular system 514 are instances of the same ASIC. Because multiple instances of a single ASIC can be produced, manufacturing cost can be reduced. In addition, modules of the first level of the first switching fabric part 571 and modules of the third level of the first switching fabric are both included on each ASIC.
In some embodiments, each of the 8 midplane connector ports 562 has twice the data capacity of each of the 16 cable connector ports 560. Thus, each of the 8 midplane connector ports 562 has 16 data transmit connections and 16 data receive connections, rather than 8 data transmit connections and 8 data receive connections. The bandwidth of the 8 midplane connector ports 562 is therefore equal to the bandwidth of the 16 cable connector ports 560. In other embodiments, each midplane connector port has 32 data transmit connections and 32 data receive connections. In such embodiments, each cable connector port has 16 data transmit connections and 16 data receive connections.
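The bandwidth balance described here can be checked with a short Python sketch. It assumes, for illustration only, that every connection carries the same data rate, so that bandwidth is proportional to the number of connections; the function name is hypothetical.

```python
def total_connections(num_ports: int, connections_per_port: int) -> int:
    """Total one-direction connections across a group of identical ports."""
    return num_ports * connections_per_port


# First embodiment: 16 cable ports with 8 transmit connections each,
# 8 midplane ports with 16 transmit connections each.
cable_side = total_connections(16, 8)
midplane_side = total_connections(8, 16)

# Other embodiment: 16 transmit connections per cable port,
# 32 transmit connections per midplane port.
cable_side_alt = total_connections(16, 16)
midplane_side_alt = total_connections(8, 32)
```

In both embodiments the two sides come out equal, which is why doubling the per-port capacity on the midplane side compensates for having half as many ports.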
The 8 midplane connector ports 562 of first interface card 510' are connected to midplane 550. Midplane 550 is configured to connect each interface card 510 associated with the first level and the third level of the first switching fabric part 571 to each interface card 516 associated with the second level of the first switching fabric part 571. Thus, midplane 550 ensures that each midplane connector port 562 of each interface card 510 is connected to a midplane connector port 580 of a different interface card 516. In other words, no two midplane connector ports of the same interface card 510 are operably coupled to the same interface card 516. Midplane 550 therefore allows each interface card 510 to send data to and receive data from any one of the 8 interface cards 516.
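One way to picture this connectivity is as a full mesh between 8 cards on each side of the midplane. The Python sketch below uses a hypothetical port assignment (port p of card c on the 510 side meets port c of card p on the 516 side); the text does not specify the actual assignment, only that each port of a card 510 reaches a different card 516.

```python
NUM_CARDS = 8  # 8 interface cards 510 and 8 interface cards 516


def midplane_peer(card_510: int, port_562: int) -> tuple[int, int]:
    """Return the (card 516 index, port 580 index) reached through the midplane.

    Assumed mapping for illustration: port p on card c connects to card p.
    """
    return (port_562, card_510)


def build_midplane() -> dict[tuple[int, int], tuple[int, int]]:
    """Map every (card 510, port 562) pair to its (card 516, port 580) peer."""
    return {
        (card, port): midplane_peer(card, port)
        for card in range(NUM_CARDS)
        for port in range(NUM_CARDS)
    }
```

Under this mapping the 8 ports of any one card 510 land on 8 different cards 516, and no port 580 is used twice, which is the property the midplane is described as guaranteeing.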
Although Fig. 7 shows a schematic diagram of first interface card 510', midplane 550 and first interface card 516', in some embodiments first interface card 510', midplane 550 and first interface card 516' are physically positioned similarly to the horizontally positioned interface card 620, midplane 640 and vertically positioned interface card 630, respectively, as shown in Figs. 5-7 and described in further detail herein. Thus, the modules associated with the first level and the modules associated with the third level (both on interface cards 510) are located on one side of midplane 550, and the modules associated with the second level (on interface cards 516) are located on the opposite side of midplane 550. Such a topology allows each module associated with the first level to be operably coupled to each module associated with the second level, and each module associated with the second level to be operably coupled to each module associated with the third level.
First interface card 516' includes multiple midplane connector ports 580, a first modular system 518, and a second modular system 519. The midplane connector ports 580 are configured to send data to and receive data from any interface card 510 via midplane 550. In some embodiments, first interface card 516' includes 8 midplane connector ports 580.
The first modular system 518 and the second modular system 519 of first interface card 516' are operably coupled to each midplane connector port 580 of first interface card 516'. Thus, through midplane 550, each modular system 512, 514 associated with the first level and the third level of the first switching fabric part 571 is operably coupled to each modular system 518, 519 associated with the second level of the first switching fabric part 571. In other words, each modular system 512, 514 associated with the first level and the third level of the first switching fabric part 571 can send data to and receive data from each modular system 518, 519 associated with the second level of the first switching fabric part 571, and vice versa. Specifically, a module associated with the first level in modular system 512 or 514 can send data to a module associated with the second level in modular system 518 or 519. Similarly, a module associated with the second level in modular system 518 or 519 can send data to a module associated with the third level in modular system 512 or 514. In other embodiments, a module associated with the third level can send data and/or control signals to a module associated with the second level, and a module associated with the second level can send data and/or control signals to a module associated with the first level.
In embodiments in which each module of the first level of the first switching fabric part 571 has 8 inputs (i.e., two modules per interface card 510), the second level of the first switching fabric part 571 has at least 8 modules so that the first switching fabric part 571 remains rearrangeably non-blocking. In some embodiments, twice that number of second-level modules is used to facilitate expansion of switching fabric system 500 from a three-level switching fabric to a five-level switching fabric, as described in further detail herein. In such a five-level switching fabric, the second level supports twice the switching throughput of the second level in the three-level switching fabric of switching fabric system 500. For example, in some embodiments, 16 second-level modules can be used to facilitate future expansion of switching fabric system 500 from a three-level switching fabric to a five-level switching fabric.
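The requirement of at least 8 second-level modules for 8 inputs per first-level module matches the standard Clos result that a three-level network with n inputs per first-level module and m second-level modules is rearrangeably non-blocking when m ≥ n. A minimal Python sketch of that check follows. The function names are hypothetical, and the strict-sense condition m ≥ 2n − 1 is included only for comparison; it is a standard result and is not stated in the text.

```python
def is_rearrangeably_nonblocking(inputs_per_first_level_module: int,
                                 second_level_modules: int) -> bool:
    """Clos condition m >= n for rearrangeable non-blocking operation."""
    return second_level_modules >= inputs_per_first_level_module


def is_strictly_nonblocking(inputs_per_first_level_module: int,
                            second_level_modules: int) -> bool:
    """Clos condition m >= 2n - 1 for strict-sense non-blocking operation."""
    return second_level_modules >= 2 * inputs_per_first_level_module - 1
```

With 8 inputs and 8 second-level modules the first check passes and the second does not; with 16 second-level modules both pass, although the text motivates the 16-module embodiment by future expansion rather than by strict-sense non-blocking operation.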
The first modular system 518 and the second modular system 519 of first interface card 516' are ASICs. The first modular system 518 and the second modular system 519 are instances of the same ASIC. In addition, in some embodiments, the first modular system 518 and the second modular system 519 associated with the second level of the first switching fabric part 571 are instances of the same ASIC used for the first modular system 512 and the second modular system 514 of first interface card 510' associated with the first level and the third level of the first switching fabric part 571. Because multiple instances of a single ASIC can then be used for every modular system of the first switching fabric part 571, manufacturing cost can be reduced.
In use, data is sent from a first input/output module 502 to a second input/output module 502 via the first switching fabric part 571. The first input/output module 502 sends the data to the first switching fabric part 571 via a cable of the first cable collection 540. The data passes through a cable connector port 560 of one of the interface cards 510' and is sent to a first-level module in modular system 512 or 514.
The first-level module in modular system 512 or 514 forwards the data to a second-level module in modular system 518 or 519 by sending it through one of the midplane connector ports 562 of interface card 510' and through midplane 550 to one of the interface cards 516'. The data enters interface card 516' through a midplane connector port 580 of interface card 516'. The data is then sent to the second-level module in modular system 518 or 519.
The second-level module determines how the second input/output module 502 is connected and redirects the data back to interface card 510' via midplane 550. Because each modular system 518 or 519 is operably coupled to each modular system 512 and 514 on interface card 510', the second-level module in modular system 518 or 519 can determine which third-level module in modular system 512 or 514 is operably coupled to the second input/output module and send the data accordingly.
The data is sent to a third-level module in modular system 512, 514 on interface card 510'. The third-level module then sends the data to the second input/output module of the input/output modules 502 through a cable connector port 560 and via a cable of the first cable collection 540.
In other embodiments, instead of sending the data to a single second-level module, the first-level module divides the data into separate portions (e.g., cells) and forwards a portion of the data to each second-level module to which the first-level module is operably coupled (e.g., in this embodiment, each second-level module receives a portion of the data). Each second-level module then determines how the second input/output module is connected and redirects its portion of the data to a single third-level module. The third-level module then reconstructs the data from the received portions and sends the data to the second input/output module.
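The split-and-reconstruct behavior of this embodiment can be sketched in Python as follows. The sequence index attached to each portion and the equal-size split are assumptions made for illustration; the text does not say how the third-level module orders the portions it receives.

```python
def split_into_portions(data: bytes, num_second_level_modules: int):
    """Split data into one portion per second-level module.

    Each portion is tagged with a sequence index (an assumed mechanism) so it
    can be put back in order regardless of arrival order.
    """
    size = -(-len(data) // num_second_level_modules)  # ceiling division
    return [
        (index, data[index * size:(index + 1) * size])
        for index in range(num_second_level_modules)
    ]


def reconstruct(portions) -> bytes:
    """Rebuild the original data at the third-level module."""
    ordered = sorted(portions, key=lambda portion: portion[0])
    return b"".join(chunk for _, chunk in ordered)
```

For example, splitting 100 bytes across 8 second-level modules yields 8 tagged portions, and `reconstruct` returns the original 100 bytes even if the portions arrive in a different order.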
Figs. 8-10 show a shell 600 (i.e., a frame) used to house a switching fabric (such as the first switching fabric part 571 described above), according to one embodiment. Shell 600 includes a casing 610, a midplane 640, horizontally positioned interface cards 620 and vertically positioned interface cards 630. Fig. 8 shows a front view of casing 610, in which the 8 horizontally positioned interface cards 620 deployed in casing 610 can be seen. Fig. 9 shows a rear view of casing 610, in which the 8 vertically positioned interface cards 630 deployed in casing 610 can be seen.
Each horizontally positioned interface card 620 is operably coupled to each vertically positioned interface card 630 via midplane 640 (see Fig. 10). Midplane 640 includes a front surface 642, a rear surface 644, and an array of jacks (receptacles) 650 that connect the front surface 642 and the rear surface 644, as described below. As shown in Fig. 10, each horizontally positioned interface card 620 includes multiple midplane connector ports 622 that connect to jacks on the front surface 642 of midplane 640. Similarly, each vertically positioned interface card 630 includes multiple midplane connectors 632 that connect to jacks on the rear surface 644 of midplane 640. In this way, the plane defined by each horizontally positioned interface card 620 intersects the plane defined by each vertically positioned interface card 630.
The jacks 650 of midplane 640 operably couple each horizontally positioned interface card 620 to each vertically positioned interface card 630. Jacks 650 facilitate signal transmission between the horizontally positioned interface cards 620 and the vertically positioned interface cards 630. In some embodiments, for example, jacks 650 can be multiple-pin connectors configured to receive the midplane connector ports 622, 632 disposed on interface cards 620, 630; hollow tubes that allow a horizontally positioned interface card 620 to connect directly to a vertically positioned interface card 630; and/or any other device configured to operably couple two interface cards. Using such a midplane 640, each horizontally positioned interface card 620 is operably coupled to each vertically positioned interface card 630 without routing connections (e.g., electrical traces) on the midplane.
Fig. 10 shows a midplane that includes a total of 64 jacks 650 arranged in an 8 × 8 array. In such an embodiment, 8 horizontally positioned interface cards 620 can be operably coupled to 8 vertically positioned interface cards 630. In other embodiments, any number of jacks can be included in the midplane and/or any number of horizontally positioned interface cards can be coupled to any number of vertically positioned interface cards through the midplane.
If the first switching fabric part 571 is housed in shell 600, for example, each interface card 510 associated with the first level and the third level of the first switching fabric part 571 can be horizontally positioned, and each interface card 516 associated with the second level of the first switching fabric part 571 can be vertically positioned. Thus, each interface card 510 associated with the first level and the third level of the first switching fabric part 571 can easily be connected through midplane 640 to each interface card 516 associated with the second level of the first switching fabric part 571. In other embodiments, each interface card associated with the first level and the third level of the first switching fabric part is vertically positioned, and each interface card associated with the second level of the first switching fabric part is horizontally positioned. In another embodiment, each interface card associated with the first level and the third level of the first switching fabric part can be positioned at any angle relative to the shell, and each interface card associated with the second level of the first switching fabric part can be positioned at an angle, relative to the shell, that is orthogonal to the angle of the interface cards associated with the first level and the third level.
Figs. 11 and 12 are schematic diagrams showing a switching fabric 1100 in a first configuration and a second configuration, respectively, according to one embodiment. Switching fabric 1100 includes multiple switching fabric systems 1108.
Each switching fabric system 1108 includes multiple input/output modules 1102, a first cable collection 1140, a second cable collection 1142, a first switching fabric part 1171 deployed in a shell 1170, and a second switching fabric part 1173 deployed in a shell 1172. The switching fabric systems 1108 are structurally and functionally similar to one another. In addition, the input/output modules 1102, the first cable collection 1140 and the second cable collection 1142 are structurally and functionally similar to input/output modules 202, first cable collection 240 and second cable collection 242, respectively.
When switching fabric 1100 is in the first configuration, the first switching fabric part 1171 and the second switching fabric part 1173 of each switching fabric system 1108 function similarly to the first switching fabric part 571 and the second switching fabric part 573 described above. Thus, when switching fabric 1100 is in the first configuration, the first switching fabric part 1171 and the second switching fabric part 1173 operate as stand-alone three-level switching fabrics. Accordingly, when switching fabric 1100 is in the first configuration, each switching fabric system 1108 acts as a stand-alone switching fabric system and is not operably coupled to the other switching fabric systems 1108.
In the second configuration (Fig. 12), switching fabric 1100 further includes a third cable collection 1144 and multiple connection switching fabrics 1191, each located in a shell 1190. Shell 1190 can be similar to shell 600 described in detail above. Each switching fabric part 1171, 1173 of each switching fabric system 1108 is operably coupled to each connection switching fabric 1191 via the third cable collection 1144. Thus, when switching fabric 1100 is in the second configuration, each switching fabric system 1108 is operably coupled to the other switching fabric systems 1108 via the connection switching fabrics 1191. Switching fabric 1100 in the second configuration is therefore a five-level Clos network.
The third cable collection 1144 can be made of any material suitable for transferring data between switching fabric parts 1171, 1173 and connection switching fabrics 1191. In some embodiments, for example, each cable 1144 is made up of multiple optical fibers. In such embodiments, each cable 1144 can have 36 transmit fibers and 36 receive fibers. The 36 transmit fibers of each cable 1144 can include 32 fibers for sending data and 4 fibers for expanding data capacity and/or for redundancy. Similarly, the 36 receive fibers of each cable 1144 can include 32 fibers for data and 4 fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of fibers can be included in each cable. Using cables with a larger number of fibers can effectively reduce the number of cables used.
As discussed above, flow control can be performed within a switching fabric of, for example, a data center. Figs. 13 and 14, together with the accompanying description, illustrate flow control within a switching fabric. Specifically, Fig. 13 is a schematic diagram showing data traffic associated with a switching fabric 1300, according to one embodiment. Switching fabric 1300 shown in Fig. 13 is similar to switching fabric 400 shown in Fig. 4A, and can be implemented in a data center such as data center 100 shown in Fig. 1. In this embodiment, switching fabric 1300 is a three-level non-blocking Clos network and includes a first level 1340, a second level 1342, and a third level 1344. The first level 1340 includes modules 1312, the second level 1342 includes modules 1314, and the third level 1344 includes modules 1316. In some embodiments, switching fabric 1300 can be a cell-switched switching fabric and each module 1312 of the first level 1340 can be a cell switch. Each module 1312 of the first level 1340 includes a set of input ports 1360 configured to receive data as it enters switching fabric 1300. Each module 1316 of the third level 1344 includes output ports 1362 configured to allow data to leave switching fabric 1300. Each module 1316 of the third level 1344 includes the same number of output ports 1362.
Each module 1314 of the second level 1342 is operably coupled to each module 1312 of the first level 1340 by a unidirectional data path 1320. Each unidirectional data path 1320 between a module 1312 of the first level 1340 and a module 1314 of the second level 1342 is configured to facilitate data transfer from the module 1312 of the first level 1340 to the module 1314 of the second level 1342. Because data path 1320 is unidirectional, it does not facilitate data transfer from the modules 1314 of the second level 1342 to the modules 1312 of the first level 1340. Compared with a similar bidirectional data path, such a unidirectional data path 1320 costs less, uses fewer data connections, and is easier to implement.
Each module 1316 of the third level 1344 is operably coupled to each module 1314 of the second level 1342 by a unidirectional data path 1324. Each unidirectional data path 1324 between a module 1314 of the second level 1342 and a module 1316 of the third level 1344 is configured to facilitate data transfer from the module 1314 of the second level 1342 to the module 1316 of the third level 1344. Because data path 1324 is unidirectional, it does not facilitate data transfer from the modules 1316 of the third level 1344 to the modules 1314 of the second level 1342. As described above, such a unidirectional data path 1324 costs less and uses less area than a similar bidirectional data path.
The unidirectional data paths 1320 between the modules 1312 of the first stage 1340 and the modules 1314 of the second stage 1342, and/or the unidirectional data paths 1324 between the modules 1314 of the second stage 1342 and the modules 1316 of the third stage 1344, can be constructed in any manner configured to facilitate data transfer effectively. In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane connector. Such a midplane connector can be similar to the midplane connectors described with reference to Figures 8 to 10. Such a midplane connector can be used effectively to connect each module of the second stage to each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the unidirectional data paths are electrical traces.
Each module 1312 of the first stage 1340 is physically close to a corresponding module 1316 of the third stage 1344. In other words, each module 1312 of the first stage 1340 is paired with a module 1316 of the third stage 1344. For example, in some embodiments, each module 1312 of the first stage 1340 is in the same chip package as a module 1316 of the third stage 1344. A bidirectional flow control path 1322 exists between each module 1312 of the first stage 1340 and its corresponding module 1316 of the third stage 1344. The flow control path 1322 allows a module 1312 of the first stage 1340 to send a flow control indicator to the corresponding module 1316 of the third stage 1344, and vice versa. As described in further detail herein, this allows any module in any stage of the switching fabric to send a flow control indicator to the module that sends it data. In some embodiments, the bidirectional flow control path 1322 is constructed from two separate unidirectional flow control paths. The two separate unidirectional flow control paths allow flow control indicators to pass between the module 1312 of the first stage 1340 and the module 1316 of the third stage 1344.
Figure 14 is a schematic diagram showing flow control in the switching fabric 1300 shown in Figure 13, according to an embodiment. In particular, the diagram shows a detailed view of a first row 1310 of the switching fabric 1300 shown in Figure 13. The first row includes a module 1312' of the first stage 1340, a module 1314' of the second stage 1342, and a module 1316' of the third stage 1344. The module 1312' of the first stage 1340 includes a processor 1330 and a memory 1332. The processor 1330 is configured to control receiving and sending data. The memory 1332 is configured to buffer data when the module 1314' of the second stage 1342 cannot yet receive data and/or the module 1312' of the first stage 1340 cannot yet send data. In some embodiments, for example, if the module 1314' of the second stage 1342 has sent a suspend indicator to the module 1312' of the first stage 1340, the module 1312' buffers data until the module 1314' can receive data. Similarly, in some embodiments, the module 1312' of the first stage 1340 can buffer data when it receives multiple data signals substantially simultaneously (for example, from multiple input ports). In such embodiments, if only a single data signal can be output by the module 1312' at a given time (for example, each clock cycle), the other received data signals can be buffered. Like the module 1312' of the first stage 1340, each module in the switching fabric 1300 includes a processor and a memory.
The module 1312' of the first stage 1340 and its paired module 1316' of the third stage 1344 are both included on a first chip package 1326. This allows the flow control path 1322 between the module 1312' of the first stage 1340 and the module 1316' of the third stage 1344 to be constructed easily. For example, the flow control path 1322 can be a trace on the first chip package 1326 between the module 1312' and the module 1316'. In other embodiments, the module of the first stage and the module of the third stage are on separate chip packages but in close proximity to each other, which still allows the flow control path between them to be established without a large amount of wiring and/or long traces.
The module 1314' of the second stage 1342 is included on a second chip package 1328. The unidirectional data path 1320 between the module 1312' of the first stage 1340 and the module 1314' of the second stage 1342, and the unidirectional data path 1324 between the module 1314' of the second stage 1342 and the module 1316' of the third stage 1344, operably connect the first chip package 1326 to the second chip package 1328. Although not shown in Figure 14, the module 1312' of the first stage 1340 and the module 1316' of the third stage 1344 are also connected to each module of the second stage by unidirectional data paths. As described above, the unidirectional data paths can be constructed in any manner configured to facilitate data transfer between the modules effectively.
The flow control path 1322 and the unidirectional data paths 1320, 1324 can be used effectively to send flow control indicators between the modules 1312', 1314', 1316'. For example, if the module 1312' of the first stage 1340 is sending data to the module 1314' of the second stage 1342 and the amount of data in a buffer of the module 1314' exceeds a threshold, the module 1314' can send a flow control indicator to the module 1316' of the third stage 1344 via the unidirectional data path 1324 between the module 1314' and the module 1316'. That flow control indicator triggers the module 1316' of the third stage 1344 to send a flow control indicator to the module 1312' of the first stage 1340 via the flow control path 1322. The flow control indicator sent from the module 1316' to the module 1312' triggers the module 1312' to stop sending data to the module 1314' of the second stage 1342. Similarly, a flow control indicator can be sent from the module 1314' of the second stage 1342 to the module 1312' of the first stage 1340, via the module 1316' of the third stage 1344, to request that the module 1312' send data to the module 1314' (that is, resume sending data).
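The relay described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the described implementation: the class names, the "suspend" and "resume" strings, and the cell-count threshold are all hypothetical. It shows only the routing of the indicator from the second stage, through the third stage, to the paired first-stage module.

```python
class FirstStage:
    """Stands in for module 1312'; stops or resumes sending on an indicator."""

    def __init__(self):
        self.sending = True

    def on_flow_control(self, indicator):
        # "suspend" stops transmission to the second stage; "resume" restarts it.
        self.sending = indicator == "resume"


class ThirdStage:
    """Stands in for module 1316'; relays indicators to its paired first stage."""

    def __init__(self, paired_first_stage):
        self.paired = paired_first_stage

    def on_indicator_from_second_stage(self, indicator):
        # Relay over the bidirectional flow control path (1322 in the text).
        self.paired.on_flow_control(indicator)


class SecondStage:
    """Stands in for module 1314'; signals when its buffer crosses a threshold."""

    def __init__(self, third_stage, threshold):
        self.third = third_stage
        self.threshold = threshold  # hypothetical limit, counted in cells
        self.buffered = 0
        self.suspended = False

    def receive(self, n_cells):
        self.buffered += n_cells
        self._check()

    def drain(self, n_cells):
        self.buffered = max(0, self.buffered - n_cells)
        self._check()

    def _check(self):
        over = self.buffered > self.threshold
        if over and not self.suspended:
            self.suspended = True
            # Sent forward on the unidirectional data path (1324 in the text).
            self.third.on_indicator_from_second_stage("suspend")
        elif not over and self.suspended:
            self.suspended = False
            self.third.on_indicator_from_second_stage("resume")
```

In this model the second stage never signals the first stage directly; the only route back is through the third-stage module that shares a package with the first-stage module, which is why the data paths themselves can remain one-way.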
Placing two stages of the switching fabric in the same chip package, with an on-chip bidirectional flow control path between them, minimizes connections between separate chip packages, which can be bulky and/or require a large amount of space. In addition, having two stages in the same package with an on-chip bidirectional flow control path between them allows the data paths between chip packages to be unidirectional while still providing the ability to communicate flow control between a sending module and a receiving module. More details related to bidirectional flow control paths within a switching fabric are described in co-pending U.S. Patent Application No. 12/345,490, entitled "Flow Control in a Switch Fabric" and filed on December 29, 2008, which is incorporated herein by reference in its entirety.
As described with reference to Figures 13 and 14, a buffer module can be included in a module of a stage of the switching fabric. More details related to a buffer module that can be included in, for example, a stage of a switching fabric are described with reference to Figure 15.
Figure 15 is a schematic diagram showing a buffer module 1500, according to an embodiment. As shown in Figure 15, data signals S0 to SM are received at the buffer module 1500 on an input side 1580 of the buffer module 1500 (for example, through input ports 1562 of the buffer module 1500). After being processed at the buffer module 1500, the data signals S0 to SM are sent from the buffer module 1500 on an output side 1585 of the buffer module 1500 (for example, through output ports 1564 of the buffer module 1500). Each of the data signals S0 to SM can define a channel (also referred to as a data channel). The data signals S0 to SM can be collectively referred to as data signals 1560. Although the input side 1580 and the output side 1585 are shown on different physical sides of the buffer module 1500, they are logically defined and do not preclude various physical configurations of the buffer module 1500. For example, one or more of the input ports 1562 and/or one or more of the output ports 1564 can be physically located on any side (and/or the same side) of the buffer module 1500.
The buffer module 1500 can be configured to process the data signals 1560 so that the processing latency of the data signals 1560 through the buffer module 1500 is relatively small and substantially constant. Accordingly, the bit rates of the data signals 1560 can be substantially unchanged as the data signals 1560 are processed through the buffer module 1500. For example, the processing latency of data signal S2 through the buffer module 1500 can be a substantially constant number of clock cycles (for example, a single clock cycle or a few clock cycles). Accordingly, data signal S2 can be time-shifted by that number of clock cycles, and the bit rate of data signal S2 sent into the input side 1580 of the buffer module 1500 is substantially the same as the bit rate of data signal S2 sent from the output side 1585 of the buffer module 1500.
The buffer module 1500 can be configured to modify the bit rate of one or more of the data signals 1560 in response to one or more portions of a flow control signal 1570. For example, the buffer module 1500 can be configured to delay data signal S2 received at the buffer module 1500 in response to a portion of the flow control signal 1570 indicating that data signal S2 should be delayed for a specified period of time. In particular, the buffer module 1500 can be configured to store (for example, hold) one or more portions of data signal S2 until the buffer module 1500 receives an indicator (for example, a portion of the flow control signal 1570) that data signal S2 should no longer be delayed. Accordingly, the bit rate of data signal S2 sent into the input side 1580 of the buffer module 1500 is different (for example, substantially different) from the bit rate of data signal S2 sent from the output side 1585 of the buffer module 1500.
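The per-channel hold behavior can be illustrated with the following Python sketch. It is a simplified model under assumptions: the class and method names are hypothetical, each call to `tick` stands for one clock cycle, and an unheld channel passes one data unit per cycle with no added delay.

```python
from collections import deque


class BufferModule:
    """Toy model of a buffer that can hold individual channels on request."""

    def __init__(self, channels):
        self.queues = {ch: deque() for ch in channels}
        self.held = set()

    def flow_control(self, channel, hold):
        # A portion of the flow control signal either holds or releases a channel.
        if hold:
            self.held.add(channel)
        else:
            self.held.discard(channel)

    def tick(self, inputs):
        """Accept one unit per channel (dict) and return the units sent this cycle."""
        for ch, unit in inputs.items():
            self.queues[ch].append(unit)
        out = {}
        for ch, q in self.queues.items():
            # A held channel keeps accumulating data; other channels drain one unit.
            if ch not in self.held and q:
                out[ch] = q.popleft()
        return out
```

While a channel is held, its input rate is unchanged but its output rate drops to zero, so the input and output bit rates differ, as described above; once released, the stored units leave in their original order.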
In some embodiments, processing at the buffer module 1500 can be performed at memory banks based on, for example, segments of variable-sized cells. For example, in some embodiments, segments of a cell can be processed through different memory banks (for example, static random-access memory (SRAM) banks) of the buffer module 1500 during a distribution process. The memory banks can collectively define a shared memory buffer. In some embodiments, segments of a data signal can be distributed to the memory banks in a predefined fashion (for example, in a predefined pattern according to a predefined algorithm) during the distribution process. For example, in some embodiments, leading segments of the data signals 1560 can be processed at portions of the buffer module 1500 (for example, specified memory banks of the buffer module 1500) that are different from the portions of the buffer module 1500 at which trailing segments are processed. In some embodiments, the segments of the data signals 1560 can be processed in a particular order. In some embodiments, for example, each segment of the data signals 1560 can be processed based on its respective position within a cell. After the segments of a cell have been processed through the shared memory buffer, the segments can be ordered during a reassembly process and sent from the buffer module 1500.
In some embodiments, for example, a read multiplexing module of the buffer module 1500 can be configured to reassemble the segments associated with the data signals 1560 and send (for example, transmit) the data signals 1560 from the buffer module 1500. The reassembly process can be defined based on the predefined methodology used to distribute segments to the memory banks of the buffer module 1500. For example, the read multiplexing module can be configured to first read the leading segments associated with a cell from the leading memory banks in a round-robin fashion (because the segments were written in a round-robin fashion), and then read the trailing segments associated with the cell from the trailing memory banks in a round-robin fashion. Accordingly, few control signals, if any, need to be sent between a write multiplexing module and the read multiplexing module. More details related to segment processing (for example, segment distribution and/or segment reassembly) are described in co-pending U.S. Patent Application No. 12/415,517, entitled "Methods and Apparatus Related to Shared Memory Buffer for Variable-Sized Cells" and filed on March 31, 2009, which is incorporated herein by reference in its entirety.
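The following Python sketch illustrates one way round-robin distribution and reassembly can stay in step without control signals. It is an assumption-laden illustration, not the method of the referenced application: the class name, the single leading segment per cell, and the trailing-segment count stored with the leading segment are all hypothetical choices.

```python
from collections import deque


class SharedBuffer:
    """Toy shared memory buffer split into leading and trailing banks."""

    def __init__(self, n_lead, n_trail):
        self.lead = [deque() for _ in range(n_lead)]
        self.trail = [deque() for _ in range(n_trail)]
        self.w_lead = self.w_trail = 0  # write pointers
        self.r_lead = self.r_trail = 0  # read pointers, advanced the same way

    def write_cell(self, segments):
        # The leading segment goes to the next leading bank, stored with the
        # number of trailing segments so the reader knows how many to fetch.
        self.lead[self.w_lead].append((segments[0], len(segments) - 1))
        self.w_lead = (self.w_lead + 1) % len(self.lead)
        for seg in segments[1:]:
            self.trail[self.w_trail].append(seg)
            self.w_trail = (self.w_trail + 1) % len(self.trail)

    def read_cell(self):
        head, n_trailing = self.lead[self.r_lead].popleft()
        self.r_lead = (self.r_lead + 1) % len(self.lead)
        cell = [head]
        for _ in range(n_trailing):
            cell.append(self.trail[self.r_trail].popleft())
            self.r_trail = (self.r_trail + 1) % len(self.trail)
        return cell
```

Because the reader advances its pointers by the same rule as the writer, each read lands on the bank that holds the next segment in order, so cells of different sizes are reassembled without any side channel between the two.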
Figure 16A is a schematic block diagram of an input scheduling module 1620 and an output scheduling module 1630 configured to coordinate transmission of cell groups via a switching fabric 1600 of a switch core 1690, according to an embodiment. Coordinating can include, for example, scheduling transmission of a cell group via the switching fabric 1600, tracking requests and/or responses related to transmission of the cell group, and so forth. The input scheduling module 1620 can be included on an input side of the switching fabric 1600, and the output scheduling module 1630 can be included on an output side of the switching fabric 1600. The switching fabric 1600 can include an input stage 1602, an intermediate stage 1604, and an output stage 1606. In some embodiments, the switching fabric 1600 can be defined based on a Clos network architecture (for example, a non-blocking Clos network, a strict-sense non-blocking Clos network, or a Benes network), and the switching fabric 1600 can include a data plane and a control plane. In some embodiments, the switching fabric 1600 can be the core of a data center (not shown), which can include a network or an interconnection of devices.
As shown in Figure 16A, input queues IQ1 to IQK (collectively referred to as input queues 1610) can be located on the input side of the switching fabric 1600. The input queues 1610 can be associated with the input stage 1602 of the switching fabric 1600. In some embodiments, the input queues 1610 can be included in a line card. In some embodiments, the input queues 1610 can be located outside the switching fabric 1600 and/or outside the switch core 1690. Each of the input queues 1610 can be a first-in-first-out (FIFO) queue. Although not shown, in some embodiments, each of the input queues IQ1 to IQK can be associated with (for example, uniquely associated with) an input/output port (for example, a 10 Gb/s port). In some embodiments, each of the input queues IQ1 to IQK can be large enough to implement a congestion control scheme, such as a request-grant congestion control scheme. For example, input queue IQK-1 can be large enough to hold a cell (or cell group) until the request-grant congestion control scheme has been executed for that cell (or cell group).
As shown in Figure 16A, output ports P1 to PL (collectively referred to as output ports 1640) can be located on the output side of the switching fabric 1600. The output ports 1640 can be associated with the output stage 1606 of the switching fabric 1600. In some embodiments, the output ports 1640 can be referred to as destination ports.
In some embodiments, the input queues 1610 can be included in one or more input line cards (not shown) located outside the input stage 1602 of the switching fabric 1600. In some embodiments, the output ports 1640 can be included in one or more output line cards (not shown) located outside the output stage 1606 of the switching fabric 1600. In some embodiments, one or more of the input queues 1610 and/or one or more of the output ports 1640 can be included in one or more stages (for example, the input stage 1602) of the switching fabric 1600. In some embodiments, the output scheduling module 1630 can be included in one or more output line cards and/or the input scheduling module 1620 can be included in one or more input line cards. In some embodiments, each line card (for example, output line card, input line card) associated with the switch core 1690 can include one or more scheduling modules (for example, an output scheduling module, an input scheduling module).
In some embodiments, the input queues 1610 and/or the output ports 1640 can be included in one or more gateway devices (not shown) located between the switching fabric 1600 and peripheral processing devices (not shown). The one or more gateway devices, the switching fabric 1600, and/or the peripheral processing devices can collectively define at least a portion of a data center (not shown). In some embodiments, the one or more gateway devices can be edge devices within an edge portion of the switch core 1690. In some embodiments, the switching fabric 1600 and the peripheral processing devices can be configured to process data based on different protocols. For example, the peripheral processing devices can include one or more host devices (for example, host devices configured to execute one or more virtual resources, or web servers) configured to communicate based on an Ethernet protocol, while the switching fabric 1600 can be a cell-based fabric. In other words, the one or more gateway devices can provide other devices, which are configured to communicate via one protocol, with access to the switching fabric 1600, which can be configured to communicate via another protocol. In some embodiments, the one or more gateway devices can be referred to as access switches or network devices. In some embodiments, the one or more gateway devices can be configured to function as routers, network hub devices, and/or network bridge devices.
In this embodiment, for example, the input scheduling module 1620 can be configured to define cell group GA queued at input queue IQ1 and cell group GC queued at input queue IQK-1. Cell group GA is queued at the front of input queue IQ1, and cell group GB is queued at input queue IQ1 behind cell group GA. Because input queue IQ1 is a FIFO queue, cell group GB cannot be sent via the switching fabric 1600 until cell group GA has been sent from input queue IQ1. Cell group GC is queued at the front of input queue IQK-1.
In some embodiments, a portion of the input queues 1610 can be mapped to (for example, assigned to) one or more of the output ports 1640. For example, input queues IQ1 to IQK-1 can be mapped to output port P1 so that all cells 310 queued at input queues IQ1 to IQK-1 are scheduled by the input scheduling module 1620 for transmission via the switching fabric 1600 to output port P1. Similarly, input queue IQK can be mapped to output port P2. The mapping can be stored in a memory (for example, memory 1622) as, for example, a lookup table that the input scheduling module 1620 can access when scheduling (for example, requesting) transmission of a cell group.
In some embodiments, one or more of the input queues 1610 can be associated with a priority value (also referred to as a transmission priority value). The input scheduling module 1620 can be configured to schedule transmission of cells from the input queues 1610 based on the priority values. For example, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1, the input scheduling module 1620 can be configured to request transmission of cell group GC to output port P1 before requesting transmission of cell group GA to output port P1. The priority values can be defined based on a level of service (for example, quality of service (QoS)). For example, in some embodiments, different types of network traffic can be associated with different levels of service (and therefore different priorities). For example, storage traffic (for example, read and write traffic), inter-processor communication, media signaling, session layer signaling, and so forth can each be associated with at least one level of service. In some embodiments, the priority values can be based on, for example, the IEEE 802.1Qbb protocol, which defines a priority-based flow control strategy.
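The queue-to-port lookup table and the priority ordering can be illustrated together in a brief Python sketch. The table contents follow the example in the text, but the numeric priority values, the dictionary names, and the function name are assumptions made for illustration.

```python
# Lookup table of the kind that could be held in memory 1622 (contents assumed).
QUEUE_TO_PORT = {"IQ1": "P1", "IQK-1": "P1", "IQK": "P2"}

# Hypothetical priority values; a larger number means a higher priority.
QUEUE_PRIORITY = {"IQ1": 1, "IQK-1": 3, "IQK": 2}


def request_order(head_groups):
    """Order head-of-queue cell groups for requesting, highest priority first.

    head_groups maps a queue name to the cell group at the front of that queue.
    Returns a list of (cell_group, destination_port) pairs.
    """
    ordered = sorted(head_groups, key=lambda q: QUEUE_PRIORITY[q], reverse=True)
    return [(head_groups[q], QUEUE_TO_PORT[q]) for q in ordered]


print(request_order({"IQ1": "GA", "IQK-1": "GC"}))
```

With these assumed values, the request for cell group GC is issued before the request for cell group GA, and both resolve to output port P1 through the table.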
In some embodiments, one or more of the input queues 1610 and/or one or more of the output ports 1640 can be paused. In some embodiments, they can be paused so that cells are not lost. For example, if output port P1 is temporarily unavailable, transmission of cells from input queue IQ1 and/or input queue IQK-1 can be paused so that cells are not lost at output port P1 because output port P1 is temporarily unavailable. In some embodiments, one or more of the input queues 1610 can be associated with a priority value. For example, if output port P1 is congested, transmission of cells from input queue IQ1 to output port P1 can be paused rather than transmission of cells from input queue IQK-1 to output port P1, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1.
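A priority-aware pause decision can be sketched as below. The function name, the dictionaries, and in particular the `min_priority_to_keep` cutoff are assumptions; the text says only that a lower-priority queue can be paused in preference to a higher-priority one.

```python
def paused_queues(congested_port, queue_to_port, queue_priority, min_priority_to_keep):
    """Return the input queues whose traffic to the congested port is paused.

    Queues mapped to the congested port are paused only if their priority is
    below the (hypothetical) cutoff; queues for other ports are unaffected.
    """
    return sorted(
        q
        for q, port in queue_to_port.items()
        if port == congested_port and queue_priority[q] < min_priority_to_keep
    )


mapping = {"IQ1": "P1", "IQK-1": "P1", "IQK": "P2"}
priority = {"IQ1": 1, "IQK-1": 3, "IQK": 2}
print(paused_queues("P1", mapping, priority, min_priority_to_keep=2))
```

Under these assumed values only input queue IQ1 is paused when output port P1 is congested, while input queue IQK-1 keeps transmitting to the same port.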
The input scheduling module 1620 can be configured to exchange signals with (for example, send signals to and receive signals from) the output scheduling module 1630 to coordinate transmission of cell group GA via the switching fabric 1600 to output port P1, and to coordinate transmission of cell group GC via the switching fabric 1600 to output port P1. Because cell group GA is to be sent to output port P1, output port P1 can be referred to as the destination port of cell group GA. Similarly, output port P1 can be referred to as the destination port of cell group GB. As shown in Figure 16A, cell group GA can be sent via a transmission path 4112 that is different from a transmission path 4114 via which cell group GC is sent.
Cell group GA and cell group GB are defined by the input scheduling module 1620 based on cells 4110 queued at input queue IQ1. In particular, cell group GA can be defined based on each cell in cell group GA having a common destination port and having a specified position within input queue IQ1. Similarly, cell group GC can be defined based on each cell in cell group GC having a common destination port and having a specified position within input queue IQK-1. Although not shown, in some embodiments, the cells 4110 can include, for example, content (for example, data packets) received at the switch core 1690 from one or more peripheral processing devices (for example, personal computers, servers, routers, personal digital assistants (PDAs)) via one or more networks (for example, a local area network (LAN), a wide area network (WAN), a virtual network) that can be wired and/or wireless. More details related to defining cell groups, such as cell group GA, cell group GB, and/or cell group GC, are discussed with reference to Figures 17 and 18.
Figure 16B is a signaling flow diagram showing signaling related to transmission of cell group GA, according to an embodiment. As shown in Figure 16B, time increases in the downward direction. After cell group GA has been defined (as shown in Figure 16A), the input scheduling module 1620 can be configured to send a request to schedule cell group GA for transmission via the switching fabric 1600; the request is shown as transmission request 22. Transmission request 22 can be defined as a request to send cell group GA to the destination port of cell group GA, namely output port P1. In some embodiments, the destination port of cell group GA can also be referred to as the target of transmission request 22 (also referred to as a target destination port). In some embodiments, transmission request 22 can include a request to send cell group GA through the switching fabric 1600 via a specified transmission path (such as transmission path 4112 shown in Figure 16A), or at a specified time. The input scheduling module 1620 can be configured to send transmission request 22 to the output scheduling module 1630 after transmission request 22 has been defined at the input scheduling module 1620.
In some embodiments, transmission request 22 can be queued on the input side of the switching fabric 1600 before being sent to the output side of the switching fabric 1600. In some embodiments, transmission request 22 can be queued until the input scheduling module 1620 triggers sending of transmission request 22 to the output side of the switching fabric 1600. In some embodiments, the input scheduling module 1620 can be configured to hold (or trigger holding of) transmission request 22 in, for example, an input transmission request queue (not shown) because the volume of transmission requests to be sent from the input side of the switching fabric 1600 is higher than a threshold value. The threshold value can be defined based on the latency of transmission via the switching fabric 1600.
In some embodiments, transmission request 22 can be queued at an output queue (not shown) on the output side of the switching fabric 1600. In some embodiments, the output queue can be included in a line card (not shown) that is inside or outside the switching fabric 1600, or outside the switch core 1690. Although not shown, in some embodiments, transmission request 22 can be queued at an output queue, or a portion of an output queue, associated with a specific input queue (for example, input queue IQ1). In some embodiments, each of the output ports 1640 can be associated with output queues that are associated with (for example, correspond to) the priority values of the input queues 1610. For example, output port P1 can be associated with an output queue (or a portion of an output queue) associated with input queue IQ1 (which has a specified priority value) and with an output queue (or a portion of an output queue) associated with input queue IQK (which has a specified priority value). Accordingly, transmission request 22, which is queued at input queue IQ1, can be queued at the output queue associated with input queue IQ1. In other words, transmission request 22 can be queued at an output queue (on the output side of the switching fabric 1600) associated with the priority value of at least one of the input queues 1610. Similarly, transmission request 22 can be queued at an input transmission request queue (not shown), or a portion of an input transmission request queue, associated with the priority value of at least one of the input queues 1610.
If the output scheduling module 1630 determines that the destination port of cell group GA (that is, output port P1 shown in Figure 16A) is available to receive cell group GA, the output scheduling module 1630 can be configured to send a transmission response 24 to the input scheduling module 1620. Transmission response 24 can be, for example, an authorization for cell group GA to be sent (for example, sent from input queue IQ1 shown in Figure 16A) to the destination port of cell group GA. An authorization to send a cell group can be referred to as a transmission authorization. In some embodiments, cell group GA and/or input queue IQ1 can be referred to as the target of transmission response 24. In some embodiments, the authorization to send cell group GA can be granted when transmission across the switching fabric 1600 is substantially guaranteed, for example, because the destination port is available.
In response to transmission response 24, the input scheduling module 1620 can be configured to send cell group GA from the input side of the switching fabric 1600 to the output side of the switching fabric 1600 via the switching fabric 1600. In some embodiments, transmission response 24 can include an instruction to send cell group GA through the switching fabric 1600 via a specified transmission path (such as transmission path 4112 shown in Figure 16A), or at a specified time. In some embodiments, the instruction can be defined based on, for example, a routing policy.
As shown in Figure 16B, transmission request 22 includes a cell quantity value 30, a destination identifier (ID) 32, a queue identifier (ID) 34, and a queue sequence value (SV) 36 (which can be collectively referred to as a request tag). The cell quantity value 30 can represent the number of cells included in cell group GA. For example, in this embodiment, cell group GA includes seven (7) cells (shown in Figure 16A). The destination identifier 32 can represent the destination port of cell group GA so that the target of transmission request 22 can be determined by the output scheduling module 1630.
The cell quantity value 30 and the destination identifier 32 can be used by the output scheduling module 1630 to schedule cell group GA for transmission via the switching fabric 1600 to output port P1 (shown in Figure 16A). As shown in Figure 16B, in this embodiment, the output scheduling module 1630 is configured to define and send transmission response 24 because the number of cells included in cell group GA can be handled (for example, can be received) at the destination port of cell group GA (for example, output port P1 shown in Figure 16A).
In some embodiments, if the number of cells included in cell group GA cannot be handled (for example, cannot be received) at the destination port of cell group GA (for example, output port P1 shown in Figure 16A) because the destination port is unavailable (for example, in an unavailable state or a congested state), the output scheduling module 1630 can be configured to communicate the unavailability to the input scheduling module 1620. In some embodiments, for example, the output scheduling module 1630 can be configured to deny the request (not shown) to send cell group GA via the switching fabric 1600 when the destination port of cell group GA is unavailable. The denial of transmission request 22 can be referred to as a transmission denial. In some embodiments, the transmission denial can include a response tag.
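The request, authorization, and denial exchange can be modeled with the Python sketch below. The four fields of `RequestTag` follow the request tag described in the text; everything else is assumed for illustration, including the class names, the free-cell-capacity test for port availability, and the shape of the returned response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTag:
    cell_count: int       # cell quantity value 30
    destination_id: str   # destination identifier 32
    queue_id: str         # queue identifier 34
    sequence_value: int   # queue sequence value 36


class OutputScheduler:
    """Toy output-side scheduler that grants or denies transmission requests."""

    def __init__(self, free_cells_per_port):
        # Hypothetical availability model: free cell slots per output port.
        self.free = dict(free_cells_per_port)

    def handle(self, tag):
        # Grant only if the destination port can take the whole cell group.
        if self.free.get(tag.destination_id, 0) >= tag.cell_count:
            self.free[tag.destination_id] -= tag.cell_count
            return ("grant", tag.queue_id, tag.sequence_value)
        return ("deny", tag.queue_id, tag.sequence_value)


scheduler = OutputScheduler({"P1": 10})
print(scheduler.handle(RequestTag(7, "P1", "IQ1", 0)))
print(scheduler.handle(RequestTag(7, "P1", "IQ1", 1)))
```

The response echoes the queue identifier and sequence value so the input side can match it to the cell group it asked about; with ten free slots assumed at P1, the first seven-cell request is granted and the second is denied.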
In some embodiments, the availability or unavailability of, for example, output port P1 (shown in Figure 16A) can be determined by the output scheduling module 1630 based on a condition being satisfied. For example, the condition can be related to a storage limit of a queue (not shown in Figure 16A) associated with output port P1 being exceeded, a rate of data traffic via output port P1, a number of cells already scheduled for transmission from the input queues 1610 via the switching fabric 1600 (shown in Figure 16A), and so forth. In some embodiments, when output port P1 is unavailable, output port P1 cannot receive cells via the switching fabric 1600.
As shown in Figure 16B, the queue identifier 34 and the queue sequence value 36 are sent to the output scheduling module 1630 in transmission request 22. The queue identifier 34 can represent and/or can be used to identify (for example, uniquely identify) input queue IQ1 (shown in Figure 16A), where cell group GA is queued. The queue sequence value 36 can represent the position of cell group GA relative to other cell groups within input queue IQ1. For example, cell group GA can be associated with a queue sequence value X, and cell group GB (queued at input queue IQ1 as shown in Figure 16A) can be associated with a queue sequence value Y. The queue sequence value X can indicate that cell group GA is to be sent from input queue IQ1 before cell group GB, which is associated with queue sequence value Y.
In certain embodiments, queue sequence value 36 is selected from a range of queue sequence values associated with input queue IQ1 (shown in Figure 16A). The range can be defined so that sequence values from the range are not repeated for input queue IQ1 within a specified period. For example, the range can be defined so that queue sequence values from the range are not repeated during at least the period required to flush several cycles of cells (for example, cells 160) queued at input queue IQ1 through switch core 1690 (shown in Figure 16A). In certain embodiments, a queue sequence value can be incremented (within the range of queue sequence values) by input scheduling module 1620 and associated with each cell group that is defined from the cells 4110 queued at input queue IQ1.
In certain embodiments, the range of queue sequence values associated with input queue IQ1 can overlap the range of queue sequence values associated with another of the input queues 1610 (shown in Figure 16A). Accordingly, queue sequence value 36, even if drawn from a non-unique range of queue sequence values, can be combined with queue identifier 34 (which can be unique) to uniquely identify cell group GA (at least during a specified period). In certain embodiments, queue sequence value 36 is unique within switching fabric 1600 or is a globally unique value (GUID) (for example, a universally unique identifier (UUID)).
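The pairing of a queue identifier with a wrapping queue sequence value can be illustrated with a minimal Python sketch. The names `RequestTag`, `SequenceAllocator` and `SEQ_RANGE`, and the range size of 8, are hypothetical and chosen only for illustration; they do not come from the described embodiments.

```python
from dataclasses import dataclass

SEQ_RANGE = 8  # hypothetical size of the per-queue sequence-value range


@dataclass(frozen=True)
class RequestTag:
    queue_id: str   # identifies the input queue, e.g. "IQ1"
    queue_seq: int  # position of the cell group within that queue


class SequenceAllocator:
    """Hands out per-queue sequence values that wrap around a fixed range."""

    def __init__(self, seq_range: int = SEQ_RANGE) -> None:
        self.seq_range = seq_range
        self._next = {}  # queue_id -> next sequence value to hand out

    def next_tag(self, queue_id: str) -> RequestTag:
        seq = self._next.get(queue_id, 0)
        # Wrap within the range; the range must be large enough that a value
        # is not reused while an earlier group with that value is pending.
        self._next[queue_id] = (seq + 1) % self.seq_range
        return RequestTag(queue_id, seq)


alloc = SequenceAllocator()
tag_ga = alloc.next_tag("IQ1")  # first group queued at IQ1
tag_gb = alloc.next_tag("IQ1")  # second group queued at IQ1
tag_gd = alloc.next_tag("IQK")  # first group queued at IQK
```

Here `tag_ga` and `tag_gd` carry the same sequence value because the two ranges overlap, yet the two tags differ because the queue identifier is part of each tag.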
In certain embodiments, input scheduling module 1620 can be configured to wait before defining a transmission request (not shown) associated with cell group GB. For example, input scheduling module 1620 can be configured to wait until transmission request 22 has been sent, or until a response to transmission request 22 (for example, transmission response 24 or a transmission refusal) has been received, before defining the transmission request associated with cell group GB.
As shown in Figure 16B, output scheduling module 1630 can be configured to include queue identifier 34 and queue sequence value 36 (which can be collectively referred to as a response tag) in transmission response 24. Queue identifier 34 and queue sequence value 36 can be included in transmission response 24 so that, when transmission response 24 is received at input scheduling module 1620, transmission response 24 can be associated with cell group GA at input scheduling module 1620. In particular, queue identifier 34 and queue sequence value 36 can be used together to identify cell group GA as being authorized for transmission via switching fabric 1600.
In certain embodiments, output scheduling module 1630 can be configured to delay sending transmission response 24 in response to transmission request 22. In certain embodiments, output scheduling module 1630 can be configured to delay the response if, for example, the destination port of cell group GA (that is, output port P1 shown in Figure 16A) is unavailable (for example, temporarily unavailable). In certain embodiments, output scheduling module 1630 can be configured to send transmission response 24 in response to output port P1 changing from an unavailable state to an available state.
In certain embodiments, output scheduling module 1630 can be configured to delay sending transmission response 24 because the destination port of cell group GA (that is, output port P1 shown in Figure 16A) is receiving data from another of the input queues 1610. For example, output port P1 can be unavailable to receive data from input queue IQ1 because output port P1 is receiving a different cell group (not shown) from, for example, input queue IQK (shown in Figure 16A). In certain embodiments, based on priority values associated with input queue IQ1 and input queue IQK, cell groups from input queue IQ1 can have a higher priority than cell groups from input queue IQK. Output scheduling module 1630 can be configured to delay sending transmission response 24 for a period calculated based on, for example, the size of the different cell group being received at output port P1. For example, output scheduling module 1630 can be configured to delay sending transmission response 24, which targets cell group GA, for the projected time needed to complete processing of the different cell group at output port P1. In other words, output scheduling module 1630 can be configured to delay sending transmission response 24, which targets cell group GA, based on the projected time at which output port P1 will change from an unavailable state to an available state.
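As a rough illustration of a delay calculated from the size of the cell group currently being received, the following Python sketch projects the time at which the transmission response could be sent. The function name, the per-cell service time and the linear model are assumptions made for this example only.

```python
def response_send_time(now_us: float, cells_remaining: int,
                       cell_time_us: float) -> float:
    """Project when a busy output port becomes available.

    Assumes (hypothetically) that the port drains the other cell group at a
    constant rate of one cell every `cell_time_us` microseconds.
    """
    if cells_remaining < 0 or cell_time_us < 0:
        raise ValueError("cells_remaining and cell_time_us must be non-negative")
    return now_us + cells_remaining * cell_time_us


# Port P1 still has 10 cells of another group to receive, 0.5 us per cell.
send_at = response_send_time(now_us=100.0, cells_remaining=10, cell_time_us=0.5)
```

A scheduler built this way would hold the transmission response until `send_at`, so that cell group GA arrives as the port becomes free.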
In certain embodiments, output scheduling module 1630 can be configured to delay sending transmission response 24 because at least a portion of the transmission path over which cell group GA is to be sent (for example, transmission path 4112 shown in Figure 16A) is unavailable (for example, congested). Output scheduling module 1630 can be configured to delay sending transmission response 24 until that portion of the transmission path is no longer congested, or based on a projected time at which that portion of the transmission path will no longer be congested.
As shown in Figure 16B, cell group GA can be sent to the destination port of cell group GA based on (for example, in response to) transmission response 24. In certain embodiments, cell group GA can be sent based on one or more instructions included in transmission response 24. For example, in certain embodiments, cell group GA can be sent via transmission path 4112 (shown in Figure 16A) based on an instruction included in transmission response 24, or based on one or more rules for transmitting cell groups via switching fabric 1600 (for example, rules for transmitting cell groups via a rearrangeable switching fabric). Although not shown, in certain embodiments, after cell group GA has been received at output port P1 (shown in Figure 16A), content from the cell group (for example, data packets) can be sent to one or more network entities (for example, a personal computer, a server, a router, a PDA) via one or more networks (for example, a LAN, a WAN, a virtual network), which can be wired and/or wireless.
Referring again to Figure 16A, in certain embodiments, cell group GA is sent via transmission path 4112 and received at an output queue (not shown) that is relatively small compared with, for example, input queues 1610. In certain embodiments, the output queue (or a portion of the output queue) can be associated with a priority value. The priority value can be associated with one or more of the input queues 1610. Output scheduling module 1630 can be configured to retrieve cell group GA from the output queue and to send cell group GA to output port P1.
In certain embodiments, cell group GA can be retrieved and sent to output port P1 using a response identifier that input scheduling module 1620 includes with cell group GA when cell group GA is sent to the output side of switching fabric 1600. The response identifier can be defined at output scheduling module 1630 and included in transmission response 24. In certain embodiments, if cell group GA is queued at an output queue (not shown) associated with the destination port of cell group GA, the response identifier can be used to retrieve cell group GA from the destination port of cell group GA, so that cell group GA can be sent from switching fabric 1600 via the destination port of cell group GA. The response identifier can be associated with a location in the output queue that output scheduling module 1630 has reserved for the queuing of cell group GA.
In certain embodiments, a cell group queued at input queues 1610 can be moved to memory 1622 when a transmission request associated with the cell group (such as transmission request 22 shown in Figure 16B) is defined. For example, cell group GD, queued at input queue IQK, can be moved to memory 1622 in response to a transmission request associated with cell group GD being defined. In certain embodiments, cell group GD can be moved to memory 1622 before the transmission request associated with cell group GD is sent from input scheduling module 1620 to output scheduling module 1630. Cell group GD can be stored in memory 1622 until cell group GD is sent from the input side of switching fabric 1600 to the output side of switching fabric 1600. In certain embodiments, the cell group can be moved to memory 1622 to reduce congestion (for example, head-of-line (HOL) blocking) at input queue IQK.
In certain embodiments, input scheduling module 1620 can be configured to retrieve a cell group stored in memory 1622 based on a queue identifier and/or a queue sequence value associated with the cell group. In certain embodiments, the location of the cell group in memory 1622 can be determined based on a lookup table and/or an index value. The cell group can be retrieved before it is sent from the input side of switching fabric 1600 to the output side of switching fabric 1600. For example, cell group GD can be associated with a queue identifier and/or a queue sequence value. The location at which cell group GD is stored in memory 1622 can be associated with the queue identifier and/or the queue sequence value. A transmission request defined at input scheduling module 1620 and sent to output scheduling module 1630 can include the queue identifier and/or the queue sequence value. A transmission response received from output scheduling module 1630 can include the queue identifier and/or the queue sequence value. In response to the transmission response, input scheduling module 1620 can be configured to retrieve cell group GD from memory 1622 at the location based on the queue identifier and/or the queue sequence value, and input scheduling module 1620 can trigger transmission of cell group GD.
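The lookup described above can be sketched in Python as a table keyed by the queue identifier and queue sequence value. The class `GroupMemory` and its method names are hypothetical, and a dictionary merely stands in for whatever lookup table or index a real memory 1622 would use.

```python
class GroupMemory:
    """Stores parked cell groups keyed by (queue identifier, sequence value)."""

    def __init__(self) -> None:
        self._slots = {}  # (queue_id, queue_seq) -> list of cells

    def store(self, queue_id: str, queue_seq: int, cells) -> None:
        key = (queue_id, queue_seq)
        if key in self._slots:
            # The sequence range must not repeat while a group is still parked.
            raise KeyError(f"slot {key} already occupied")
        self._slots[key] = list(cells)

    def retrieve(self, queue_id: str, queue_seq: int):
        # Removes and returns the group named by the tag in a transmission
        # response; raises KeyError if no such group is stored.
        return self._slots.pop((queue_id, queue_seq))


memory = GroupMemory()
memory.store("IQK", 3, ["cell-a", "cell-b"])  # group GD parked on request
group_gd = memory.retrieve("IQK", 3)          # tag echoed in the response
```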
In certain embodiments, the number of cells included in a cell group can be defined based on the amount of space available in memory 1622. For example, input scheduling module 1620 can be configured to determine the number of cells to include in cell group GD based on the amount of storage space available in memory 1622 when cell group GD is defined. In certain embodiments, the number of cells included in cell group GD can be increased if the amount of storage space available in memory 1622 increases. In certain embodiments, the number of cells included in cell group GD can be increased by input scheduling module 1620 before or after cell group GD is moved to memory 1622 for storage.
In certain embodiments, the number of cells included in a cell group can be defined based on the latency of transmission across, for example, switching fabric 1600. In particular, input scheduling module 1620 can be configured to define the size of a cell group so as to facilitate flow across switching fabric 1600 in view of the latency associated with switching fabric 1600. For example, input scheduling module 1620 can be configured to close a cell group (for example, define the size of the cell group) because the cell group has reached a threshold size defined based on the latency of switching fabric 1600. In certain embodiments, input scheduling module 1620 can be configured to send a data packet in a cell group immediately, rather than wait for additional data packets in order to define a larger cell group, because the latency across switching fabric 1600 is low.
In certain embodiments, input scheduling module 1620 is configured to limit the number of transmission requests sent from the input side of switching fabric 1600 to the output side of switching fabric 1600. In certain embodiments, the limit can be defined based on a policy stored at input scheduling module 1620. In certain embodiments, the limit can be defined based on priority values associated with one or more of the input queues 1610. For example, input scheduling module 1620 can be configured to permit (based on a threshold limit) more transmission requests associated with input queue IQ1 than transmission requests associated with input queue IQK, because input queue IQ1 has a higher priority value than input queue IQK.
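One way to picture a priority-based limit on transmission requests is the following Python sketch. The class `RequestLimiter`, the mapping from priority value to a cap on outstanding requests, and the specific caps are all hypothetical.

```python
class RequestLimiter:
    """Caps the outstanding transmission requests of each input queue."""

    def __init__(self, limit_by_priority) -> None:
        # Hypothetical policy: priority value -> maximum outstanding requests.
        self.limit_by_priority = dict(limit_by_priority)
        self.outstanding = {}  # queue_id -> requests awaiting a response

    def try_request(self, queue_id: str, priority: int) -> bool:
        used = self.outstanding.get(queue_id, 0)
        if used >= self.limit_by_priority[priority]:
            return False  # threshold reached; hold the request back
        self.outstanding[queue_id] = used + 1
        return True

    def on_response(self, queue_id: str) -> None:
        # A response or refusal frees one slot for that queue.
        self.outstanding[queue_id] = max(0, self.outstanding.get(queue_id, 0) - 1)


limiter = RequestLimiter({2: 3, 1: 1})  # priority 2 may have 3 pending, priority 1 only 1
iq1 = [limiter.try_request("IQ1", priority=2) for _ in range(4)]
iqk = [limiter.try_request("IQK", priority=1) for _ in range(2)]
```

In this sketch the higher-priority queue IQ1 is allowed three pending requests before being held back, while IQK is allowed one.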
In certain embodiments, one or more portions of input scheduling module 1620 and/or output scheduling module 1630 can be a hardware-based module (for example, a DSP, an FPGA) and/or a software-based module (for example, a module of computer code, a set of processor-readable instructions that can be executed at a processor). In certain embodiments, one or more of the functions associated with input scheduling module 1620 and/or output scheduling module 1630 can be included in different modules and/or combined into one or more modules. For example, cell group GA can be defined by a first submodule within input scheduling module 1620, and transmission request 22 (shown in Figure 16B) can be defined by a second submodule within input scheduling module 1620.
In certain embodiments, switching fabric 1600 has more or fewer stages than are shown in Figure 16A. In certain embodiments, switching fabric 1600 can be a reconfigurable (for example, rearrangeable) switching fabric and/or a time-division multiplexed switching fabric. In certain embodiments, switching fabric 1600 can be defined based on a Clos network architecture (for example, a strict-sense non-blocking Clos network, a Benes network).
Figure 17 is a schematic block diagram showing two cell groups queued at input queue 1720 on the input side of switching fabric 1700, according to one embodiment. The cell groups are defined by input scheduling module 1740 on the input side of switching fabric 1700. Switching fabric 1700 can be, for example, associated with and/or included in a switch core such as that shown in Figure 16A. Input queue 1720 is also on the input side of switching fabric 1700. In certain embodiments, input queue 1720 can be included in an input line card (not shown) associated with switching fabric 1700. Although not shown, in certain embodiments, one or more of the cell groups can include many cells (for example, 25 cells, 10 cells, 100 cells) or only one cell.
As shown in Figure 17, input queue 1720 includes cell 1 through cell T, which can be collectively referred to as queued cells 1710. Input queue 1720 is a FIFO-type queue, with cell 1 at the front end 1724 (or transmission end) of the queue and cell T at the back end 1722 (or entry end) of the queue. As shown in Figure 17, queued cells 1710 at input queue 1720 include first cell group 1712 and second cell group 1716. In certain embodiments, each of the queued cells 1710 has an equal length (for example, a 32-bit length, a 64-bit length). In certain embodiments, two or more of the queued cells 1710 have different lengths.
Each of the queued cells 1710 has content queued for transmission to one of four output ports 1770 (output port E, output port F, output port G or output port H), as indicated by the output port label (for example, letter "E", letter "F") on each of the queued cells 1710. The output port 1770 to which a cell is to be sent can be referred to as its destination port. Each of the queued cells 1710 can be sent to its respective destination port via switching fabric 1700. In certain embodiments, input scheduling module 1740 can be configured to determine the destination port of each of the queued cells 1710 based on, for example, a lookup table (LUT) such as a routing table. In certain embodiments, the destination port of each of the queued cells 1710 can be determined based on the destination of the content (for example, data) included in the cell. In certain embodiments, one or more of the output ports 1770 can be associated with an output queue, in which cells can be queued until they are sent via the output ports 1770.
First cell group 1712 and second cell group 1716 can be defined by input scheduling module 1740 based on the destination ports of queued cells 1710. As shown in Figure 17, each cell included in first cell group 1712 has the same destination port (that is, output port E), as indicated by output port label "E". Similarly, each cell included in second cell group 1716 has the same destination port (that is, output port F), as indicated by output port label "F".
A cell group (for example, first cell group 1712) can be defined based on a destination port because the cell group is sent via switching fabric 1700 as a group. For example, if cell 1 were included in first cell group 1712, first cell group 1712 could not be sent to a single destination port, because cell 1 has a different destination port (output port "F") than cell 2 through cell 7 (output port "E"). First cell group 1712 therefore could not be sent via switching fabric 1700 as a group.
Cell groups are defined as contiguous blocks of cells because a cell group is sent via switching fabric 1700 as a group and because input queue 1720 is a FIFO-type queue. For example, cell 12 and cell 2 through cell 7 cannot be defined as a cell group, because cell 12 cannot be sent together with the block of cells 2 through 7. Cell 8 through cell 11 are intervening cells that must be sent from input queue 1720 after cell 2 through cell 7 are sent from input queue 1720, but before cell 12 is sent from input queue 1720. In certain embodiments, if input queue 1720 is not a FIFO-type queue, one or more of the queued cells 1710 can be sent out of order, and a group can span intervening cells.
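The rule that a cell group is a contiguous run of cells sharing one destination port in a FIFO queue can be sketched in Python with `itertools.groupby`. The function name and the port labels assigned to cells 8 through 12 below are assumptions for illustration; only the labels of cells 1 through 7 are stated above.

```python
from itertools import groupby


def define_cell_groups(queued_cells):
    """Split a FIFO queue into contiguous runs with a common destination.

    `queued_cells` is a list of (cell_number, destination_port) pairs in
    queue order, front of the queue first.
    """
    groups = []
    for port, run in groupby(queued_cells, key=lambda cell: cell[1]):
        groups.append((port, [number for number, _ in run]))
    return groups


# Cell 1 -> F and cells 2..7 -> E as described; the remaining labels are assumed.
queue = [(1, "F")] + [(n, "E") for n in range(2, 8)] \
      + [(n, "F") for n in range(8, 12)] + [(12, "E")]
groups = define_cell_groups(queue)
```

Because `groupby` only merges adjacent items, cell 12 ends up in its own group rather than joining cells 2 through 7, which mirrors the intervening-cell constraint of a FIFO queue.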
Although not shown, each of the queued cells 1710 can have a sequence value, which can be referred to as a cell sequence value. The cell sequence value can represent, for example, the order of cell 2 relative to cell 3. The cell sequence value can be used to reorder cells at, for example, one or more of the output ports 1770 before the content associated with the cells is sent from the output ports 1770. For example, in certain embodiments, cell group 1712 can be received at an output queue (not shown) associated with output port E and reordered based on cell sequence values. In certain embodiments, the output queue can be relatively small compared with input queue 1720 (for example, a shallow output queue).
In addition, the data (for example, data packets) included in the cells can also have a sequence value, which can be referred to as a data sequence value. For example, the data sequence value can represent the relative order of a first data packet with respect to a second data packet. The data sequence value can be used to reorder data packets at, for example, one or more of the output ports 1770 before the data packets are sent from the output ports 1770.
Figure 18 is a schematic block diagram showing two cell groups queued at input queue 1820 on the input side of switching fabric 1800, according to another embodiment. The cell groups are defined by input scheduling module 1840 on the input side of switching fabric 1800. Switching fabric 1800 can be, for example, associated with and/or included in a switch core such as that shown in Figure 16A. Input queue 1820 is also on the input side of switching fabric 1800. In certain embodiments, input queue 1820 can be included in an input line card (not shown) associated with switching fabric 1800. Although not shown, in certain embodiments, one or more of the cell groups can include only one cell.
As shown in Figure 18, input queue 1820 includes cell 1 through cell Z, which can be collectively referred to as queued cells 1810. Input queue 1820 is a FIFO-type queue, with cell 1 at the front end 1824 (or transmission end) of the queue and cell Z at the back end 1822 (or entry end) of the queue. As shown in Figure 18, queued cells 1810 at input queue 1820 include first cell group 1812 and second cell group 1816. In certain embodiments, each of the queued cells 1810 has an equal length (for example, a 32-bit length, a 64-bit length). In certain embodiments, two or more of the queued cells 1810 have different lengths. In this embodiment, input queue 1820 is mapped to output port F2, so that all of the cells 1810 are scheduled by input scheduling module 1840 for transmission to output port F2 via switching fabric 1800.
Each of the queued cells 1810 has content associated with one or more data packets (for example, Ethernet data packets). The data packets are represented by the letters "Q" through "Y". For example, as shown in Figure 18, data packet R is divided into three different cells: cell 2, cell 3 and cell 4.
Cell groups (for example, first cell group 1812) are defined so that portions of a data packet are not associated with different cell groups. In other words, cell groups are defined so that each entire data packet is associated with a single cell group. The boundaries of the cell groups are defined based on the boundaries of the data packets queued at input queue 1820, so that a data packet is not split across different cell groups. Splitting a data packet across different cell groups could lead to undesirable results, such as buffering at the output side of switching fabric 1800. For example, if a first portion of data packet T (for example, cell 6) were included in first cell group 1812 and a second portion of data packet T (for example, cell 7) were included in second cell group 1816, the first portion of data packet T would have to be buffered in at least a portion of one or more output queues (not shown) at the output side of switching fabric 1800 until the second portion of data packet T was sent to the output side of switching fabric 1800, so that the entire data packet T could be sent from switching fabric 1800 via output port F2.
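Aligning cell-group boundaries with data packet boundaries can be sketched as follows in Python. The function name, the `max_cells` size target and the cell-to-packet layout (other than packet R occupying cells 2 through 4 and packet T including cells 6 and 7) are assumptions for this example.

```python
from itertools import groupby


def group_on_packet_boundaries(cell_packets, max_cells):
    """Build cell groups that never split a data packet.

    `cell_packets` lists, in FIFO order, the packet label carried by each
    cell. Adjacent packets are assumed to have distinct labels. A packet
    longer than `max_cells` becomes an oversized group of its own.
    """
    packets = [list(run) for _, run in groupby(cell_packets)]
    groups, current = [], []
    for packet in packets:
        # Close the current group if the whole packet would not fit.
        if current and len(current) + len(packet) > max_cells:
            groups.append(current)
            current = []
        current.extend(packet)
    if current:
        groups.append(current)
    return groups


# Cells 1..8: Q, R, R, R, S, T, T, U (layout beyond R and T is assumed).
cells = ["Q", "R", "R", "R", "S", "T", "T", "U"]
groups = group_on_packet_boundaries(cells, max_cells=6)
```

With a target of six cells, a naive split would cut packet T between cell 6 and cell 7; the sketch instead closes the first group after packet S, so packet T travels whole in the second group.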
In certain embodiments, the data packets included in queued cells 1810 can also have a sequence value, which can be referred to as a data sequence value. The data sequence value can represent, for example, the relative order of data packet R with respect to data packet S. The data sequence value can be used to reorder data packets at, for example, one or more of the output ports 1870 before the data packets are sent from the output ports 1870.
Figure 19 is a flowchart showing a method for scheduling the transmission of a cell group via a switching fabric, according to one embodiment. As shown in Figure 19, at 1900, an indicator that cells are queued at an input queue for transmission via the switching fabric is received. In certain embodiments, the switching fabric can be based on a Clos architecture and can have multiple stages. In certain embodiments, the switching fabric can be associated with (for example, can be within) a switch core. In certain embodiments, the indicator can be received when new cells are received at the input queue, or when the cells are ready (or nearly ready) to be sent via the switching fabric.
At 1910, a cell group having a common destination is defined from the cells queued at the input queue. The destination of each cell in the cell group can be determined based on a lookup table. In certain embodiments, the destination is determined based on a policy and/or a packet classification algorithm. In certain embodiments, the common destination can be a common destination port associated with an output portion of the switching fabric.
At 1920, a request tag is associated with the cell group. The request tag can include, for example, one or more of a cell quantity value, a destination identifier, a queue identifier, a queue sequence value, and so forth. The request tag can be associated with the cell group before the cell group is sent to the input side of the switching fabric.
At 1930, a transmission request that includes the request tag is sent to an output scheduling module. In certain embodiments, the transmission request can include a request to send at a specified time or via a specified transmission path. In certain embodiments, the transmission request can be sent after the cell group has been stored in a memory associated with an input stage of the switching fabric. In certain embodiments, the cell group can be moved to the memory to reduce the likelihood of congestion at the input queue. In other words, the cell group can be moved to the memory so that other cells queued behind the cell group can be prepared for transmission from the input queue without waiting for the cell group to be sent from the input queue. In certain embodiments, the transmission request can be a request to send to a specified output port (for example, a specified destination port).
When, in response to the transmission request, transmission via the switching fabric is not authorized at 1940, a transmission refusal that includes a response tag is sent to the input scheduling module at 1950. In certain embodiments, the transmission request can be refused because the switching fabric is congested, the destination port is unavailable, and so forth. In certain embodiments, the transmission request can be refused for a specified period. In certain embodiments, the response tag can include one or more identifiers that can be used to associate the transmission refusal with the cell group.
If transmission via the switching fabric is authorized at 1940, a transmission response that includes a response tag is sent to the input scheduling module at 1960. In certain embodiments, the transmission response can be a transmission authorization. In certain embodiments, the transmission response can be sent after the destination of the cell group is ready (or nearly ready) to receive the cell group.
At 1970, the cell group is retrieved based on the response tag. If the cell group has been moved to a memory, the cell group can be retrieved from the memory. If the cell group is queued at the input queue, the cell group can be retrieved from the input queue. The cell group can be retrieved based on a queue identifier and/or a queue sequence value included in the response tag. The queue identifier and/or the queue sequence value can come from the request tag.
At 1980, the cell group can be sent via the switching fabric. The cell group can be sent via the switching fabric according to an instruction included in the transmission response. In certain embodiments, the cell group can be sent at a specified time and/or via a specified transmission path. In certain embodiments, the cell group can be sent via the switching fabric to a destination such as an output port. In certain embodiments, after being sent via the switching fabric, the cell group can be queued at an output queue associated with the destination (for example, the destination port) of the cell group.
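The request, authorize-or-refuse, retrieve and send steps of Figure 19 can be walked through with a small Python sketch. All class and field names are hypothetical, and the authorization rule (the destination port is in a set of available ports) is a stand-in for whatever condition an actual output scheduling module would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    queue_id: str
    queue_seq: int
    destination: str
    cell_count: int


class OutputScheduler:
    def __init__(self, available_ports) -> None:
        self.available_ports = set(available_ports)

    def handle_request(self, tag: Tag):
        # 1940: authorize only if the destination port is available.
        if tag.destination in self.available_ports:
            return ("response", tag)  # 1960: response tag echoes the request tag
        return ("refusal", tag)       # 1950


class InputScheduler:
    def __init__(self) -> None:
        self.memory = {}  # parked groups keyed by (queue_id, queue_seq)

    def schedule(self, tag: Tag, cells, output_scheduler: OutputScheduler):
        self.memory[(tag.queue_id, tag.queue_seq)] = list(cells)  # park the group
        kind, echoed = output_scheduler.handle_request(tag)       # 1930
        if kind == "refusal":
            return None  # group stays parked until a later attempt
        # 1970: retrieve by the identifiers in the response tag; the returned
        # cells are what would be sent via the fabric at 1980.
        return self.memory.pop((echoed.queue_id, echoed.queue_seq))


out = OutputScheduler(available_ports={"P1"})
inp = InputScheduler()
sent = inp.schedule(Tag("IQ1", 0, "P1", 3), ["c1", "c2", "c3"], out)
held = inp.schedule(Tag("IQK", 0, "P2", 1), ["c9"], out)
```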
Figure 20 is a signaling flow diagram showing the processing of request sequence values associated with transmission requests, according to one embodiment. As shown in Figure 20, transmission request 52 is sent from input scheduling module 2020 on the input side of a switching fabric to output scheduling module 2030 on the output side of the switching fabric. Transmission request 56 is sent from input scheduling module 2020 to output scheduling module 2030 after transmission request 52 is sent. As shown in Figure 20, transmission request 54 is sent from input scheduling module 2020 but is not received by output scheduling module 2030. Transmission request 52, transmission request 54 and transmission request 56 are each associated with the same input queue IQ1, as indicated by their respective queue identifiers, and with the same destination port EP1, as indicated by their respective destination identifiers. Transmission request 52, transmission request 54 and transmission request 56 can be collectively referred to as transmission requests 58. As shown in Figure 20, time increases in the downward direction.
As shown in Figure 20, each of the transmission requests 58 can include a request sequence value (SV). The request sequence value can represent the order of a transmission request relative to other transmission requests. In this embodiment, the request sequence values are drawn from a range of request sequence values associated with destination port EP1 and are incremented in numerical order as whole integers. In certain embodiments, the request sequence values can be, for example, strings, and can be incremented in a different order (for example, in reverse numerical order). Transmission request 52 includes request sequence value 5200, transmission request 54 includes request sequence value 5201, and transmission request 56 includes request sequence value 5202. In this embodiment, request sequence value 5200 indicates that transmission request 52 was defined and sent before transmission request 54, which has request sequence value 5201.
Output scheduling module 2030 can determine, based on the request sequence values, that transmission of a transmission request from input scheduling module 2020 has failed. In particular, output scheduling module 2030 can determine that the transmission request associated with request sequence value 5201 was not received before transmission request 56, which is associated with request sequence value 5202, was received. In certain embodiments, output scheduling module 2030 can take an action with respect to the lost transmission request 54 when the period between receipt of transmission request 52 and receipt of transmission request 56 (shown as period 2040) exceeds a threshold period. In certain embodiments, output scheduling module 2030 can request that input scheduling module 2020 retransmit transmission request 54. Output scheduling module 2030 can include the missing request sequence value so that input scheduling module 2020 can identify the transmission request 54 that was not received. In certain embodiments, output scheduling module 2030 can refuse the request, included in transmission request 56, to transmit a cell group. In certain embodiments, output scheduling module 2030 can be configured to process and/or respond to transmission requests (such as transmission requests 58) based on queue sequence values in a manner substantially similar to the methods described in connection with request sequence values.
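Detecting a lost transmission request from a gap in the request sequence values can be sketched in Python as follows. The class `SequenceTracker` is hypothetical, and the sketch assumes integer sequence values that increase by one and do not wrap around within the window of interest.

```python
class SequenceTracker:
    """Tracks the next expected sequence value and reports gaps."""

    def __init__(self, first_expected: int) -> None:
        self.expected = first_expected

    def receive(self, seq: int):
        """Return the sequence values skipped before `seq` arrived."""
        missing = list(range(self.expected, seq))
        # A late or duplicate arrival never moves the expectation backwards.
        self.expected = max(self.expected, seq + 1)
        return missing


tracker = SequenceTracker(first_expected=5200)
gap_after_52 = tracker.receive(5200)  # transmission request 52 arrives
gap_after_56 = tracker.receive(5202)  # transmission request 56 arrives; 54 was lost
```

The values returned in `gap_after_56` are what a receiver could place in a retransmission request so that the sender can identify which transmission request was not received.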
Figure 21 is a signaling flow diagram showing response sequence values associated with transmission responses, according to one embodiment. As shown in Figure 21, transmission response 62 is sent from output scheduling module 2130 on the output side of a switching fabric to input scheduling module 2120 on the input side of the switching fabric. Transmission response 66 is sent from output scheduling module 2130 to input scheduling module 2120 after transmission response 62 is sent. As shown in Figure 21, transmission response 64 is sent from output scheduling module 2130 but is not received by input scheduling module 2120. Transmission response 62, transmission response 64 and transmission response 66 are associated with the same input queue IQ2, as indicated by their respective queue identifiers. Transmission response 62, transmission response 64 and transmission response 66 can be collectively referred to as transmission responses 68. As shown in Figure 21, time increases in the downward direction.
As shown in Figure 21, each of the transmission responses 68 can include a response sequence value (SV). The response sequence value can represent the order of a transmission response relative to other transmission responses. In this embodiment, the response sequence values are drawn from a range of response sequence values associated with input queue IQ2 and are incremented in numerical order as whole integers. In certain embodiments, the response sequence values can be, for example, strings, and can be incremented in a different order (for example, in reverse numerical order). Transmission response 62 includes response sequence value 5300, transmission response 64 includes response sequence value 5301, and transmission response 66 includes response sequence value 5302. In this embodiment, response sequence value 5300 indicates that transmission response 62 was defined and sent before transmission response 64, which has response sequence value 5301.
Input scheduling module 2120 can determine the biography from the transmission response of output scheduling module 2130 based on response sequence value It is defeated to have failed.Especially, input scheduling module 2120 can determine that the transmission response associated with response sequence value 5301 Do not received in transmission response 66 before receiving, transmission response 66 is associated with response sequence value 5302.In some embodiments In, when the period (being shown as the time cycle 2140) between the reception in transmission response 62 and transmission response 66 exceeding threshold value Between the cycle when, input scheduling module 2120 can perform the action of the transmission response 64 on loss.In certain embodiments, input Scheduler module 2120 can request that output scheduling module 2130 retransmits transmission response 64.Input scheduling module 2120 may include what is lost Response sequence value, so that output scheduling module 2130 can recognize that transmission response 64 is not received.In certain embodiments, when with biography When the associated transmission response of defeated request is not received within the specific time cycle, input scheduling module 2120 can dropped cell Group.
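The gap detection described for Figure 21 can be illustrated with a minimal sketch. The function names and the threshold check are hypothetical and are not taken from the described embodiments; the sketch assumes integer response sequence values that increase by one per transmission response.

```python
def missing_sequence_values(previous_sv, current_sv):
    """Return the response sequence values skipped between two receipts.

    Assumes sequence values are integers that increase by one per response.
    """
    return list(range(previous_sv + 1, current_sv))


def should_act_on_loss(gap_seconds, threshold_seconds, missing):
    """Act only if a value is missing and the receive gap exceeds the threshold."""
    return bool(missing) and gap_seconds > threshold_seconds


# Transmission response 62 (SV 5300) is received, then transmission response 66 (SV 5302).
missing = missing_sequence_values(5300, 5302)
print(missing)  # [5301], the value carried by the lost transmission response 64
```

The list of missing values is what an input scheduling module could include in a retransmission request so that the output scheduling module can identify the lost response.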
Figure 22 is a schematic block diagram showing multiple levels of flow-controllable queues, according to one embodiment. As shown in Figure 22, the sending side of first level queues 2210 and the sending side of second level queues 2220 are included in source entity 2230 on the sending side of physical link 2200. The receiving side of first level queues 2210 and the receiving side of second level queues 2220 are included in destination entity 2240 on the receiving side of physical link 2200. Source entity 2230 and/or destination entity 2240 can be any type of computing device (for example, a portion of the exchange core, a peripheral processor) that can be configured to receive and/or send data via physical link 2200. In some embodiments, source entity 2230 and/or destination entity 2240 can be associated with a data center.
As shown in Figure 22, first level queues 2210 include transmit queues A1 to A4 on the sending side of physical link 2200 (referred to as first level transmit queues 2234) and receive queues D1 to D4 on the receiving side of physical link 2200 (referred to as first level receive queues 2244). Second level queues 2220 include transmit queues B1 and B2 on the sending side of physical link 2200 (referred to as second level transmit queues 2232) and receive queues C1 and C2 on the receiving side of physical link 2200 (referred to as second level receive queues 2242).
The flow of data via physical link 2200 can be controlled (for example, modified, suspended) based on flow control signaling associated with a flow control loop between source entity 2230 and destination entity 2240. For example, data sent from source entity 2230 on the sending side of physical link 2200 can be received at destination entity 2240 on the receiving side of physical link 2200. When destination entity 2240 is unavailable to receive data from source entity 2230 via physical link 2200, a flow control signal can be defined at destination entity 2240 and/or sent from destination entity 2240 to source entity 2230. The flow control signal can be configured to trigger source entity 2230 to modify the flow of data from source entity 2230 to destination entity 2240.
For example, if receive queue D2 is unavailable to process data sent from transmit queue A1, destination entity 2240 can be configured to send a flow control signal associated with a flow control loop to source entity 2230. The flow control signal can be configured to trigger suspension of data transmission from transmit queue A1 to receive queue D2 via a transmission path that includes at least a portion of second level queues 2220 and physical link 2200. In some embodiments, receive queue D2 can be unavailable, for example, when receive queue D2 is too full to receive data. In some embodiments, receive queue D2 can change from an available state to an unavailable state (for example, a congestion state) in response to data previously received from transmit queue A1. In some embodiments, transmit queue A1 can be referred to as the target of the flow control signal. Transmit queue A1 can be identified in the flow control signal based on a queue identifier associated with transmit queue A1. In some embodiments, the flow control signal can be referred to as a feedback signal.
In this embodiment, a flow control loop is associated with physical link 2200 (referred to as the physical link control loop), a flow control loop is associated with first level queues 2210 (referred to as the first level control loop), and a flow control loop is associated with second level queues 2220 (referred to as the second level control loop). In particular, the physical link control loop is associated with a transmission path that includes physical link 2200 and excludes first level queues 2210 and second level queues 2220. The flow of data via physical link 2200 can be turned on and off based on flow control signaling associated with the physical link control loop.
The first level control loop can be based on data transmission from at least one of the transmit queues 2234 within first level queues 2210 and on a flow control signal defined based on the availability (for example, an indicator of availability) of at least one of the receive queues 2244 within first level queues 2210. Accordingly, the first level control loop can be referred to as being associated with first level queues 2210. The first level control loop can be associated with a transmission path that includes physical link 2200, at least a portion of second level queues 2220, and at least a portion of first level queues 2210. Flow control signaling associated with the first level control loop can trigger control of the flow of data from the transmit queues 2234 associated with first level queues 2210.
The second level control loop can be associated with a transmission path that includes physical link 2200 and at least a portion of second level queues 2220, but excludes first level queues 2210. The second level control loop can be based on data transmission from at least one of the transmit queues 2232 within second level queues 2220 and on a flow control signal defined based on the availability (for example, an indicator of availability) of at least one of the receive queues 2242 within second level queues 2220. Accordingly, the second level control loop can be referred to as being associated with second level queues 2220. Flow control signaling associated with the second level control loop can trigger control of the flow of data from the transmit queues 2232 associated with second level queues 2220.
In this embodiment, the flow control loop associated with second level queues 2220 is a priority-based flow control loop. In particular, each transmit queue from second level transmit queues 2232 is paired with a receive queue from second level receive queues 2242, and each queue pair is associated with a service class (also referred to as a class of service or quality of service). In this embodiment, second level transmit queue B1 and second level receive queue C1 define a queue pair associated with service class X. Second level transmit queue B2 and second level receive queue C2 define a queue pair associated with service class Y. In some embodiments, different types of network traffic can be associated with different service classes (that is, different priorities). For example, storage traffic (for example, read and write traffic), inter-processor communication, media signaling, session layer signaling, and so forth can each be associated with at least one service class. In some embodiments, the second level control loop can be based on, for example, the Institute of Electrical and Electronics Engineers (IEEE) 802.1Qbb protocol, which defines a priority-based flow control strategy.
The flow of data via transmission path 74, shown in Figure 22, can be controlled using at least one of the control loops. Transmission path 74 includes first level transmit queue A2, second level transmit queue B1, physical link 2200, second level receive queue C1, and first level receive queue D3. However, a change in the flow of data via a queue in one level of transmission path 74, made based on the flow control loop associated with that level, can affect the flow of data through another level of transmission path 74. Flow control at one level can affect the flow of data at another level because the queues in source entity 2230 (for example, transmit queues 2232, transmit queues 2234) and the queues in destination entity 2240 (for example, receive queues 2242, receive queues 2244) are arranged in levels. In other words, flow control based on one flow control loop can affect the flow of data via elements associated with a different flow control loop.
For example, the flow of data from first level transmit queue A2 to first level receive queue D3 via transmission path 74 can be modified based on one or more of the control loops: the first level control loop, the second level control loop, and/or the physical link control loop. Suspension of the flow of data to first level receive queue D3 may be triggered because first level receive queue D3 has changed from an available state to an unavailable state (for example, a congestion state).
If the flow of data to first level receive queue D3 is associated with service class X, the flow of data via second level transmit queue B1 and second level receive queue C1 (which define the queue pair associated with service class X) can be suspended based on flow control signaling associated with the second level control loop (which is a priority-based control loop). However, suspending data transmission via the queue pair associated with service class X can suspend data transmission from every transmit queue that feeds second level transmit queue B1. In particular, it can suspend data transmission not only from first level transmit queue A2 but also from first level transmit queue A1. In other words, the flow of data from first level transmit queue A1 is indirectly or collaterally affected. In some embodiments, the data received at transmit queue A1 and the data received at transmit queue A2 can be associated with the same service class X, but can come from, for example, different (for example, independent) network devices (not shown), such as peripheral processors, that can be associated with different service classes.
The flow of data to first level receive queue D3 can also be suspended by specifically suspending data transmission from first level transmit queue A2 based on flow control signaling associated with the first level control loop. By directly suspending data transmission from first level transmit queue A2, data transmission from first level transmit queue A1 is not disrupted. In other words, flow control of first level transmit queue A2 can be performed directly, based on a flow control signal associated with the first level control loop, without suspending data transmission from other first level transmit queues such as first level transmit queue A1.
The flow of data to first level receive queue D3 can also be controlled by suspending data transmission via physical link 2200 based on flow control signaling associated with the physical link control loop. However, suspending data transmission via physical link 2200 suspends all data transmission via physical link 2200.
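The different reach of the three control loops in Figure 22 can be sketched as follows. This is only an illustration under stated assumptions: the text says that transmit queues A1 and A2 feed second level transmit queue B1, while the mapping of A3 and A4 to B2 is assumed here, and the function and loop names are hypothetical.

```python
# Which second level transmit queue each first level transmit queue feeds.
# A1 and A2 feeding B1 follows the text; A3 and A4 feeding B2 is an assumption.
FAN_IN = {"A1": "B1", "A2": "B1", "A3": "B2", "A4": "B2"}


def paused_first_level_queues(loop, target=None):
    """Return the first level transmit queues affected by a pause on a given loop."""
    if loop == "first_level":
        return {target}  # only the targeted transmit queue stops
    if loop == "second_level":
        # every transmit queue feeding the targeted second level queue stops
        return {q for q, b in FAN_IN.items() if b == target}
    if loop == "physical_link":
        return set(FAN_IN)  # everything crossing the link stops
    raise ValueError(f"unknown control loop: {loop}")


print(sorted(paused_first_level_queues("first_level", "A2")))   # ['A2']
print(sorted(paused_first_level_queues("second_level", "B1")))  # ['A1', 'A2']
print(sorted(paused_first_level_queues("physical_link")))       # ['A1', 'A2', 'A3', 'A4']
```

The sketch shows why a pause on the first level control loop is the most selective of the three: it stops A2 without collaterally stopping A1.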
Queue on the sending side of physical link 2200 can be referred to as transmit queue 2236 and in physical link receiving side On queue can be referred to as receiving queue 2246.In certain embodiments, transmit queue 2236 can also be referred to as source queue, and connect Destination queue can be referred to as by receiving queue 2246.Although it is not shown, in certain embodiments, one or more transmit queues 2236 can be included in one or more interface cards associated with source entity 2230, and one or more receiving queues 2246 can be included in one or more interface cards relevant with destination entity 2240.
When source entity 2230 sends data via physical link 2200, source entity 2230 can be referred to as being located at physical link The transmitter of 2200 sending sides.Destination entity 2240 can be configured as receiving data and be referred to as receiving positioned at physical link 2200 Receiver on side.Although it is not shown, in certain embodiments, source entity 2230 (and associated element is (for example, hair Send queue 2236)) it can be configured as working as destination entity (for example, receiver) and destination entity 2240 is (and related Element (for example, receiving queue 2246)) it can be configured as working as source entity (for example, transmitter).In addition, physical link 2200 can work as bi-directional link.
In some embodiments, physical link 2200 can be a tangible link, such as an optical link (for example, a fiber optic cable, a plastic fiber cable), a cable link (for example, a copper-based wire), a twisted pair link (for example, a category 5 cable), and so forth. In some embodiments, physical link 2200 can be a wireless link. Data transmission via physical link 2200 can be defined based on a protocol such as an Ethernet protocol, a wireless protocol, a Fibre Channel protocol, a Fibre Channel over Ethernet protocol, an InfiniBand-related protocol, and/or the like.
In some embodiments, the second level control loop can be referred to as being nested within the first level control loop, because second level queues 2220, which are associated with the second level control loop, are located inside first level queues 2210, which are associated with the first level control loop. Similarly, the physical link control loop can be referred to as being nested within the second level control loop. In some embodiments, the second level control loop can be referred to as an inner control loop, and the first level control loop can be referred to as an outer control loop.
Figure 23 is a schematic block diagram showing multiple levels of flow-controllable queues, according to one embodiment. As shown in Figure 23, the sending side of first level queues 2310 and the sending side of second level queues 2320 are included in source entity 2330 on the sending side of physical link 2300. The receiving side of first level queues 2310 and the receiving side of second level queues 2320 are included in destination entity 2340 on the receiving side of physical link 2300. The queues on the sending side of physical link 2300 can be collectively referred to as transmit queues 2336, and the queues on the receiving side of physical link 2300 can be collectively referred to as receive queues 2346. Although not shown, in some embodiments, source entity 2330 can be configured to function as a destination entity, and destination entity 2340 can be configured to function as a source entity (for example, a transmitter). In addition, physical link 2300 can function as a bidirectional link.
As shown in Figure 23, source entity 2330 communicates with destination entity 2340 via physical link 2300. Source entity 2330 has queue QP1, which is configured to buffer data (if necessary) before the data is sent via physical link 2300, and destination entity 2340 has queue QP2, which is configured to buffer data received via physical link 2300 (if necessary) before the data is distributed within destination entity 2340. In some embodiments, the flow of data via physical link 2300 can be handled without buffering at queue QP1 and queue QP2.
Transmit queues QA1 to QAN, which are included in first level queues 2310, can each be referred to as a first level transmit queue and can be collectively referred to as transmit queues 2334 (or queues 2334). Transmit queues QB1 to QBM, which are included in second level queues 2320, can each be referred to as a second level transmit queue and can be collectively referred to as transmit queues 2332 (or queues 2332). Receive queues QD1 to QDR, which are included in first level queues 2310, can each be referred to as a first level receive queue and can be collectively referred to as receive queues 2344 (or queues 2344). Receive queues QC1 to QCM, which are included in second level queues 2320, can each be referred to as a second level receive queue and can be collectively referred to as receive queues 2342 (or queues 2342).
As shown in Figure 23, each queue from second level queues 2320 is located within a transmission path between physical link 2300 and at least one queue from first level queues 2310. For example, a portion of a transmission path can be defined by first level receive queue QD4, second level receive queue QC1, and physical link 2300. Second level receive queue QC1 is located within the transmission path between first level receive queue QD4 and physical link 2300.
In this embodiment, the physical link control loop is associated with physical link 2300, the first level control loop is associated with first level queues 2310, and the second level control loop is associated with second level queues 2320. In some embodiments, the second level control loop can be a priority-based control loop. In some embodiments, the physical link control loop can include physical link 2300, queue QP1, and queue QP2.
Flow control signals can be defined at, and/or sent between, source control module 2370 at source entity 2330 and destination control module 2380 at destination entity 2340. In some embodiments, source control module 2370 can be referred to as a source flow control module, and destination control module 2380 can be referred to as a destination flow control module. For example, destination control module 2380 can be configured to send a flow control signal to source control module 2370 when one or more of the receive queues 2346 at destination entity 2340 (for example, receive queue QD2) are unavailable to receive data. The flow control signal can be configured to trigger source control module 2370 to, for example, suspend the flow of data from one or more of the transmit queues 2336 to the one or more receive queues 2346.
Before data is sent, source control module 2370 associates a queue identifier with the data queued at a transmit queue from transmit queues 2336. The queue identifier can represent and/or be used to identify the transmit queue where the data is queued. For example, when a data packet is queued at first level transmit queue QA4, a queue identifier that uniquely identifies first level transmit queue QA4 can be appended to the data packet or included in a field (for example, a header portion, a trailer portion, a payload portion) within the data packet. In some embodiments, the queue identifier can be associated with the data at source control module 2370, or the association can be triggered by source control module 2370. In some embodiments, the queue identifier can be associated with the data just before the data is sent, or after the data has been sent from one of the transmit queues 2336.
The queue identifier can be associated with data sent from the sending side of physical link 2300 to the receiving side of physical link 2300 so that the source of the data (for example, the source queue) can be identified. Accordingly, a flow control signal can be defined to suspend transmission from one or more of the transmit queues 2336 based on the queue identifier. For example, a queue identifier associated with first level transmit queue QAN can be included in a data packet sent from first level transmit queue QAN to first level receive queue QD3. If, after receiving the data packet, first level receive queue QD3 is unable to receive another data packet from first level transmit queue QAN, a flow control signal requesting that first level transmit queue QAN suspend transmission of additional data packets to first level receive queue QD3 can be defined based on the queue identifier associated with first level transmit queue QAN. The queue identifier can be parsed from the data packet by destination control module 2380 and used by destination control module 2380 to define the flow control signal.
In some embodiments, data transmission to first level receive queue QDR from several of the transmit queues 2336 (for example, first level transmit queues 2334) can be suspended in response to first level receive queue QDR changing from an available state to an unavailable state. Each of the several transmit queues 2336 can be identified in the flow control signal based on its respective queue identifier.
In some embodiments, one or more of the transmit queues 2336 and/or one or more of the receive queues 2346 can be a virtual queue (for example, a logically defined group of queues). Accordingly, a queue identifier can be associated with (for example, can represent) the virtual queue. In some embodiments, a queue identifier can be associated with a queue from the set of queues that defines a virtual queue. In some embodiments, each queue identifier from the set of queue identifiers associated with physical link 2300 can be unique. For example, each of the transmit queues 2336 associated with physical link 2300 (for example, associated with a hop) can be associated with a unique queue identifier.
In some embodiments, source control module 2370 can be configured to associate queue identifiers only with a specified subset of the transmit queues 2336 and/or only with a subset of the data queued at one of the transmit queues 2336. For example, if data is sent from first level transmit queue QA2 to first level receive queue QD1 without a queue identifier, a flow control signal configured to request suspension of data transmission from first level transmit queue QA2 cannot be defined, because the source of the data is not known. Accordingly, a transmit queue from transmit queues 2336 can be exempted from flow control by not associating (for example, omitting) a queue identifier with the data when the data is sent from that transmit queue.
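The role of the queue identifier, including the exemption that results from omitting it, can be sketched as follows. This is a hedged illustration: the `Packet` class, its `queue_id` field, the `exempt` flag, and the dictionary form of the signal are all hypothetical and are not a wire format defined by the described embodiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    queue_id: Optional[str] = None  # hypothetical header field naming the transmit queue


def send_from(transmit_queue_id, payload, exempt=False):
    """Source side: tag the packet with its transmit queue unless the queue is exempt."""
    return Packet(payload, None if exempt else transmit_queue_id)


def define_pause_signal(packet):
    """Destination side: build a pause request from the parsed queue identifier."""
    if packet.queue_id is None:
        return None  # source unknown, so no signal can target it
    return {"action": "pause", "target_queue": packet.queue_id}


tagged = send_from("QAN", b"data")
untagged = send_from("QA2", b"data", exempt=True)
print(define_pause_signal(tagged))    # {'action': 'pause', 'target_queue': 'QAN'}
print(define_pause_signal(untagged))  # None
```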
In some embodiments, the unavailability of one or more of the receive queues 2346 at destination entity 2340 can be defined based on a condition being satisfied. The condition can relate to a storage limit of a queue, a queue access rate, a flow rate of data into the queue, and so forth. For example, a flow control signal can be defined at destination control module 2380 in response to the state of one or more of the receive queues 2346, such as second level receive queue QC2, changing from an available state to an unavailable state (for example, a congestion state) because a threshold storage limit has been exceeded. When in the unavailable state, second level receive queue QC2 is unavailable to receive data because, for example, second level receive queue QC2 is considered too full (as indicated by the threshold storage limit being exceeded). In some embodiments, one or more of the receive queues 2346 can be in an unavailable state when disabled. In some embodiments, the flow control signal can be defined based on a request to suspend data transmission to a receive queue from receive queues 2346 when that receive queue is unavailable to receive data. In some embodiments, the state of one or more of the receive queues 2346 can be changed from an available state to a congestion state (by destination control module 2380) in response to a specified subset of the receive queues 2346 (for example, the receive queues within a specified level) being in a congestion state.
In some embodiments, a flow control signal can be defined at destination control module 2380 to indicate that one of the receive queues 2346 has changed from an unavailable state to an available state. For example, destination control module 2380 can initially be configured to define and send a first flow control signal to source control module 2370 in response to first level receive queue QD3 changing from an available state to an unavailable state. First level receive queue QD3 can change from the available state to the unavailable state in response to data sent from first level transmit queue QA2. Accordingly, the target of the first flow control signal can be first level transmit queue QA2 (as indicated by its queue identifier). When first level receive queue QD3 changes from the unavailable state back to the available state, destination control module 2380 can be configured to define and send to source control module 2370 a second flow control signal that indicates the change from the unavailable state back to the available state. In some embodiments, source control module 2370 can be configured to trigger data transmission from one or more of the transmit queues 2336 to first level receive queue QD3 in response to the second flow control signal.
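The state changes and the two flow control signals described above can be sketched as a small state machine. This is a minimal sketch under assumptions: the `ReceiveQueue` class is hypothetical, the threshold storage limit is modeled as a count of queued items, and the signals are plain dictionaries.

```python
class ReceiveQueue:
    """Receive queue that becomes unavailable when a threshold storage limit is exceeded."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.items = []
        self.available = True
        self.paused_source = None

    def enqueue(self, item, source_queue_id):
        """Queue an item; return a pause signal if the queue just became unavailable."""
        self.items.append(item)
        if self.available and len(self.items) > self.threshold:
            self.available = False
            self.paused_source = source_queue_id
            return {"action": "pause", "target_queue": source_queue_id}
        return None

    def dequeue(self):
        """Remove an item; return a resume signal if the queue just became available."""
        item = self.items.pop(0)
        if not self.available and len(self.items) <= self.threshold:
            self.available = True
            return item, {"action": "resume", "target_queue": self.paused_source}
        return item, None


qd3 = ReceiveQueue(threshold=2)
signals = [qd3.enqueue(p, "QA2") for p in ("p1", "p2", "p3")]
item, resume = qd3.dequeue()
print(signals[-1])  # {'action': 'pause', 'target_queue': 'QA2'}
print(resume)       # {'action': 'resume', 'target_queue': 'QA2'}
```

Here the first signal corresponds to the change to the unavailable state and the second to the change back to the available state; a real implementation would likely use separate thresholds for the two transitions to avoid rapid toggling, which the sketch omits.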
In some embodiments, a flow control signal can have one or more parameter values that source control module 2370 uses to modify transmission from one of the transmit queues 2336 (identified in the flow control signal by its queue identifier). For example, the flow control signal can include a parameter value that triggers source control module 2370 to suspend transmission from one of the transmit queues 2336 for a specified period (for example, 10 milliseconds (ms)). In other words, the flow control signal can include a suspension period parameter value. In some embodiments, the suspension period can be indefinite. In some embodiments, the flow control signal can define a request to send data from one or more of the transmit queues 2336 at a specified rate (for example, a specified number of frames per second, a specified number of bits per second).
In some embodiments, a flow control signal (for example, the suspension period within the flow control signal) can be defined based on a flow control algorithm. The suspension period can be defined based on the period during which a receive queue from receive queues 2346 (for example, first level receive queue QD4) will be unavailable. In some embodiments, the suspension period can be defined based on more than one of the first level receive queues 2344 being unavailable. For example, in some embodiments, the suspension period can be increased when more or fewer than a specified number of the first level receive queues 2344 are in a congestion state. In some embodiments, this type of determination can be made at destination control module 2380. The period during which a receive queue will be unavailable can be a projected (for example, predicted) period calculated by destination control module 2380 based on, for example, a flow rate (for example, a historic flow rate, a prior flow rate) of data out of the receive queue.
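One way to project a suspension period from a drain rate is sketched below. The formula is an assumption chosen for illustration (excess queued data divided by the observed rate of data leaving the receive queue); the text does not specify the flow control algorithm, and all parameter names are hypothetical.

```python
def suspension_period_seconds(queued_bytes, target_bytes, drain_rate_bytes_per_s):
    """Project how long the receive queue needs to drain back to its target level.

    Assumed model: the queue drains at a steady historic rate while the
    sender is paused, so the period is the excess divided by that rate.
    """
    if drain_rate_bytes_per_s <= 0:
        raise ValueError("drain rate must be positive")
    excess = max(0, queued_bytes - target_bytes)
    return excess / drain_rate_bytes_per_s


# 10,000 bytes over target, draining at 1,000,000 bytes per second: about 10 ms.
period = suspension_period_seconds(15_000, 5_000, 1_000_000)
print(f"{period * 1000:.1f} ms")
```

The resulting value could be carried as the suspension period parameter of a flow control signal; a queue already at or below its target level yields a period of zero.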
In some embodiments, source control module 2370 can reject or modify a request to modify the flow of data from one or more of the transmit queues 2336. For example, in some embodiments, source control module 2370 can be configured to decrease or increase the suspension period. In some embodiments, rather than suspending data transmission in response to a flow control signal, source control module 2370 can be configured to modify the transmission path associated with one of the transmit queues 2336. For example, if first level transmit queue QA2 receives a request to suspend transmission based on a change in the state of first level receive queue QD2, source control module 2370 can be configured to trigger data transmission from first level transmit queue QA2 to, for example, first level receive queue QD3, rather than complying with the request to suspend transmission.
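The choices available to the source control module can be sketched as a small decision function. The function name, the `max_ms` cap, and the dictionary results are hypothetical; the sketch only illustrates that a request can be honored, adjusted, or replaced with a change of transmission path.

```python
def handle_pause_request(requested_ms, max_ms=None, alternate_receive_queue=None):
    """Decide how the source side responds to a request to suspend transmission."""
    if alternate_receive_queue is not None:
        # Send to a different receive queue instead of pausing.
        return {"action": "redirect", "receive_queue": alternate_receive_queue}
    if max_ms is not None:
        # Shorten the suspension period to a locally configured maximum.
        requested_ms = min(requested_ms, max_ms)
    return {"action": "pause", "pause_ms": requested_ms}


print(handle_pause_request(10))                                 # pause for the requested 10 ms
print(handle_pause_request(10, max_ms=4))                       # pause shortened to 4 ms
print(handle_pause_request(10, alternate_receive_queue="QD3"))  # redirect to QD3
```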
As shown in Figure 23, the queues within second level queues 2320 fan into or fan out of physical link 2300. For example, the transmit queues 2332 (that is, QB1 to QBM) on the sending side of physical link 2300 fan into queue QP1 on the sending side of physical link 2300. Accordingly, data queued at any of the transmit queues 2332 can be sent to queue QP1 of physical link 2300. On the receiving side of physical link 2300, data sent from physical link 2300 via queue QP2 can be broadcast to the receive queues 2342 (that is, queues QC1 to QCM).
Likewise, as shown in Figure 23, the transmit queues 2334 within first level queues 2310 fan into the transmit queues 2332 within second level queues 2320. For example, data queued at any of first level transmit queues QA1, QA4, and QAN-2 can be sent to second level transmit queue QB2. On the receiving side of physical link 2300, data sent from, for example, second level receive queue QCM can be broadcast to first level receive queues QDR-1 and QDR.
Because many of the flow control loops (for example, the first level control loop) are associated with different fan-in and fan-out architectures, the flow control loops can affect the flow of data via physical link 2300 in different ways. For example, when data transmission from second level transmit queue QB1 is suspended based on the second level control loop, data transmission from first level transmit queues QA1, QA2, QA3, and QAN-1 via second level transmit queue QB1 to one or more of the receive queues 2346 is also suspended. In this case, data transmission from one or more upstream queues (for example, first level transmit queue QA1) is suspended when transmission from a downstream queue (for example, second level transmit queue QB1) is suspended. In contrast, if data transmission from first level transmit queue QA1 along a transmission path that includes at least downstream second level transmit queue QB1 is suspended based on the first level control loop, the flow rate of data from second level transmit queue QB1 can be reduced without data transmission from second level transmit queue QB1 being suspended entirely; for example, first level transmit queues other than QA1, such as QA2, can still send data via second level transmit queue QB1.
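The contrast between suspending a downstream queue and suspending one upstream queue can be made concrete with a small calculation. The set of queues feeding QB1 follows the text; the per-queue rates are hypothetical numbers chosen only for illustration.

```python
# First level transmit queues that feed second level transmit queue QB1,
# with hypothetical data rates in arbitrary units.
FEEDS_QB1 = {"QA1": 4, "QA2": 3, "QA3": 2, "QAN-1": 1}


def qb1_outflow(paused_first_level=(), qb1_paused=False):
    """Return the rate of data leaving QB1 for a given set of suspensions."""
    if qb1_paused:
        return 0  # second level control loop: everything upstream stops too
    return sum(rate for queue, rate in FEEDS_QB1.items()
               if queue not in paused_first_level)


print(qb1_outflow())                            # 10: nothing suspended
print(qb1_outflow(paused_first_level={"QA1"}))  # 6: reduced, not stopped
print(qb1_outflow(qb1_paused=True))             # 0: all flows via QB1 stopped
```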
In some embodiments, the fan-in and fan-out architecture can differ from that shown in Figure 23. For example, in some embodiments, some of the queues within first level queues 2310 can be configured to fan into physical link 2300, bypassing second level queues 2320.
Flow control signaling associated with the transmit queues 2336 is handled by source control module 2370, and flow control signaling associated with the receive queues 2346 is handled by destination control module 2380. Although not shown, in some embodiments, flow control signaling can be handled by one or more control modules (or control submodules) that can be separate and/or integrated into a single control module. For example, flow control signaling associated with first level receive queues 2344 can be handled by a control module that is separate from the control module configured to handle flow control signaling associated with second level receive queues 2342. Similarly, flow control signaling associated with first level transmit queues 2334 can be handled by a control module that is separate from the control module configured to handle flow control signaling associated with second level transmit queues 2332. In some embodiments, one or more portions of source control module 2370 and/or destination control module 2380 can be a hardware-based module (for example, a DSP, an FPGA) and/or a software-based module (for example, a module of computer code, a set of processor-readable instructions that can be executed at a processor).
Figure 24 is a schematic block diagram showing purpose control module 2450 according to one embodiment, the purpose control module being configured to define flow control signal 6428 associated with multiple receiving queues. The queue stages include first order queue 2410 and second level queue 2420. As shown in Figure 24, source control module 2460 is associated with the sending side of first order queue 2410, and purpose control module 2450 is associated with the receiving side of first order queue 2410. The queues on the sending side of physical link 2400 can be collectively referred to as transmit queues 2470. The queues on the receiving side of physical link 2400 can be collectively referred to as receiving queues 2480.
Purpose control module 2450 is configured to send flow control signal 6428 to source control module 2460 in response to one or more receiving queues in first order queue 2410 being unavailable to receive data from a single source queue at first order queue 2410. Source control module 2460 is configured to suspend, based on flow control signal 6428, data transfer from the source queue at first order queue 2410 to multiple receiving queues of first order queue 2410.
Flow control signal 6428 can be defined by purpose control module 2450 based on information associated with each unavailable receiving queue in first order queue 2410. Purpose control module 2450 can be configured to collect the information associated with the unavailable receiving queues and to define flow control signal 6428 so that potentially conflicting flow control signals (not shown) are not sent to the single source queue at first order queue 2410. In certain embodiments, flow control signal 6428, defined based on the collected information, can be referred to as an aggregated flow control signal.
In particular, in this embodiment, purpose control module 2450 is configured to define flow control signal 6428 in response to two receiving queues, receiving queue 2442 and receiving queue 2446, at the receiving side of first order queue 2410 being unavailable to receive data from transmit queue 2412 at the sending side of first order queue 2410. In this embodiment, receiving queue 2442 and receiving queue 2446 change from an available state to an unavailable state in response to packets sent from transmit queue 2412 via transmission path 6422 and transmission path 6424, respectively. As shown in Figure 24, transmission path 6422 includes transmit queue 2412, transmit queue 2422 in second level queue 2420, physical link 2400, receiving queue 2432 in second level queue 2420, and receiving queue 2442. Transmission path 6424 includes transmit queue 2412, transmit queue 2422, physical link 2400, receiving queue 2432, and receiving queue 2446.
In certain embodiments, a flow control algorithm can be used to define flow control signal 6428 based on information about the unavailability of receiving queue 2442 and/or information about the unavailability of receiving queue 2446. For example, if purpose control module 2450 determines that receiving queue 2442 and receiving queue 2446 will be unavailable for different periods, purpose control module 2450 can be configured to define flow control signal 6428 based on the different periods. For example, purpose control module 2450 can request, via flow control signal 6428, that data transmission from transmit queue 2412 be suspended for a period calculated from the different periods (for example, a period equal to the average of the different periods, or a period equal to the larger of the different periods). In certain embodiments, flow control signal 6428 can be defined based on individual pause requests from the receiving side of first order queue 2410 (for example, a pause request associated with receiving queue 2442 and a pause request associated with receiving queue 2446).
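The two aggregation choices named above (the average or the larger of the per-queue periods) can be illustrated with a minimal sketch. The function name `aggregate_pause_period` and the `policy` argument are hypothetical and do not appear in the source; the periods are treated as plain numbers in an unspecified time unit.

```python
from statistics import mean


def aggregate_pause_period(periods, policy="max"):
    """Combine per-receiving-queue pause periods into one period.

    `periods` holds the unavailability periods reported for each congested
    receiving queue; `policy` (a hypothetical knob) selects how they combine.
    """
    if not periods:
        raise ValueError("at least one pause period is required")
    if policy == "max":
        # Pause long enough for the slowest receiving queue to recover.
        return max(periods)
    if policy == "mean":
        # Compromise between the receiving queues.
        return mean(periods)
    raise ValueError(f"unknown policy: {policy}")
```

With periods of 4.0 and 10.0 for two receiving queues, the "max" policy pauses the transmit queue for 10.0 and the "mean" policy for 7.0.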
In certain embodiments, flow control signal 6428 can be defined based on a maximum or minimum allowable period. In certain embodiments, flow control signal 6428 can be calculated based on an aggregate data flow rate from, for example, transmit queue 2412. For example, the pause period can be scaled based on the aggregate data flow rate from transmit queue 2412. In certain embodiments, for example, the pause period can be increased if the data flow rate from transmit queue 2412 is higher than a threshold, and decreased if the data flow rate from transmit queue 2412 is lower than the threshold.
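As a rough sketch of the threshold rule and the allowable bounds described above, the following function raises or lowers a pause period and then clamps it. The fixed `step` adjustment and all parameter names are assumptions made for illustration; the source does not specify how large the adjustment is.

```python
def adjust_pause_period(base_period, rate, threshold, step, min_period, max_period):
    """Adjust a pause period from the observed data flow rate, then clamp it.

    All arguments are hypothetical illustration parameters: `step` is an
    assumed fixed increment, `min_period`/`max_period` are the allowable bounds.
    """
    if rate > threshold:
        period = base_period + step   # heavier traffic: pause longer
    elif rate < threshold:
        period = base_period - step   # lighter traffic: pause for less time
    else:
        period = base_period
    # Keep the result inside the minimum and maximum allowable periods.
    return max(min_period, min(max_period, period))
```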
In certain embodiments, the flow control algorithm can be configured to wait for a specified period before defining and/or sending flow control signal 6428. The waiting period can be defined so that multiple pause requests related to transmit queue 2412, which may be received at different times within the waiting period, can be used to define flow control signal 6428. In certain embodiments, the waiting period is triggered in response to at least one pause request related to transmit queue 2412 being received.
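The waiting period can be sketched as a small collector that opens on the first pause request and releases one combined value once the period has elapsed. The class `PauseRequestWindow`, its methods, and the choice of the largest requested period as the combined value are assumptions for illustration only; timestamps are passed in explicitly so the behavior is deterministic.

```python
class PauseRequestWindow:
    """Collect pause requests for one transmit queue during a waiting period."""

    def __init__(self, wait):
        self.wait = wait          # length of the waiting period
        self.opened_at = None     # time of the first request, if any
        self.requests = []        # pause periods requested so far

    def add(self, now, pause_period):
        if self.opened_at is None:
            # The first pause request triggers the waiting period.
            self.opened_at = now
        self.requests.append(pause_period)

    def poll(self, now):
        """Return the combined pause period once the wait has elapsed, else None."""
        if self.opened_at is None or now - self.opened_at < self.wait:
            return None
        combined = max(self.requests)  # assumed policy: longest request wins
        self.opened_at = None
        self.requests = []
        return combined
```

Two requests arriving at times 0.0 and 1.5 within a waiting period of 2.0 are merged into a single signal when `poll` is called at time 2.0 or later.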
In certain embodiments, flow control signal 6428 can be defined by the flow control algorithm based on a priority value associated with each receiving queue in first order queue 2410. For example, if receiving queue 2442 has a higher priority value than the priority value associated with receiving queue 2446, purpose control module 2450 can be configured to define flow control signal 6428 based on information associated with receiving queue 2442 rather than receiving queue 2446. For example, flow control signal 6428 can be defined based on the pause period associated with receiving queue 2442 rather than the pause period associated with receiving queue 2446, because receiving queue 2442 has a higher priority value than the priority value associated with receiving queue 2446.
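A priority-based choice of this kind can be sketched in a few lines. The function name and the `(priority_value, pause_period)` tuple layout are hypothetical; the sketch also assumes that a larger number means a higher priority.

```python
def pause_period_by_priority(requests):
    """Pick the pause period of the highest-priority receiving queue.

    `requests` is a list of (priority_value, pause_period) pairs, one per
    unavailable receiving queue; larger priority values are assumed to win.
    """
    if not requests:
        raise ValueError("at least one pause request is required")
    _, pause_period = max(requests, key=lambda request: request[0])
    return pause_period
```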
In certain embodiments, flow control signal 6428 can be defined by the flow control algorithm based on an attribute associated with each receiving queue in first order queue 2410. For example, flow control signal 6428 can be defined based on receiving queue 2442 and/or receiving queue 2446 being a particular type of queue (for example, a last-in-first-out (LIFO) queue or a first-in-first-out (FIFO) queue). In certain embodiments, flow control signal 6428 can be defined based on receiving queue 2442 and/or receiving queue 2446 being configured to receive a particular type of data (for example, a control data/signal queue or a media data/signal queue).
Although not shown, one or more control modules associated with a queue stage (for example, first order queue 2410) can be configured to send information to a different control module, where the information is used to define a flow control signal. The different control module can be associated with a different queue stage. For example, a pause request associated with receiving queue 2442 and a pause request associated with receiving queue 2446 can be defined at purpose control module 2450. The pause requests can be sent to a purpose control module (not shown) associated with the receiving side of second level queue 2420. A flow control signal (not shown) can be defined at the purpose control module associated with the receiving side of second level queue 2420 based on the pause requests and based on a flow control algorithm.
Flow control signal 6428 can be based on the flow control ring associated with first order queue 2410 (for example, the first order Control ring) definition.One or more flow control signal (not shown) can also be based on the stream associated with second level queue 2420 Measure control ring and/or the flow control ring definition associated with physical link 2400.
Data transfer associated with the transmit queues in first order queue 2410 (other than transmit queue 2412) is substantially unrestricted by flow control signal 6428, because the data flow to receiving queues 2442 and 2446 is controlled based on the first order flow control ring. For example, transmit queue 2414 can continue to send data via transmit queue 2422 even though data transmission from transmit queue 2412 is suspended. For example, transmit queue 2414 can be configured to send data to receiving queue 2448 via transmission path 6426, which includes transmit queue 2422, even though data transmission from transmit queue 2412 via transmit queue 2422 has been suspended. In some embodiments, transmit queue 2422 can be configured to continue sending data from, for example, transmit queue 2416 to receiving queue 2442 even though data transfer from queue 2412 via transmission path 6422 has been suspended based on flow control signal 6428.
In contrast, if data transfer to receiving queues 2442 and 2446 were instead controlled by suspending the data flow via transmit queue 2422 based on a flow control signal (not shown) associated with the second level control ring, then data transfer from transmit queue 2414 and transmit queue 2416 via transmit queue 2422 would also be restricted (in addition to data transfer from transmit queue 2412). Data transfer from transmit queue 2422 could be suspended because it is associated with a particular level of service, and the data causing, for example, congestion at receiving queues 2442 and 2446 may be associated with that particular level of service.
One or more parameter values defined in flow control signal 6428 can be stored in memory 2452 of purpose control module 2450. In certain embodiments, the parameter values can be stored in memory 2452 of purpose control module 2450 after they are defined and/or when flow control signal 6428 is sent to source control module 2460. The parameter values defined in flow control signal 6428 can be used to track the state of, for example, transmit queue 2412. For example, an entry in memory 2452 can indicate that transmit queue 2412 is in a suspended state (for example, a non-sending state). The entry can be defined based on the pause period parameter value defined in flow control signal 6428. When the pause period has expired, the entry can be updated to indicate that the state of transmit queue 2412 has changed to, for example, an active state (for example, a sending state). Although not shown, in certain embodiments one or more parameter values can be stored in a memory outside purpose control module 2450 (for example, a remote memory).
In certain embodiments, one or more parameter values stored in memory 2452 of purpose control module 2450 (for example, status information defined based on the one or more parameter values) can be used by purpose control module 2450 to determine whether an additional flow control signal (not shown) should be defined. In certain embodiments, the one or more parameter values can be used by purpose control module 2450 to define one or more additional flow control signals.
For example, if receiving queue 2442 changes from an available state to an unavailable state (for example, a congested state) in response to a first packet received from transmit queue 2412, a request to suspend data transfer from transmit queue 2412 can be sent via flow control signal 6428. Flow control signal 6428 can indicate, based on a queue identifier, that transmit queue 2412 is the target of the request, and can specify a pause period. When flow control signal 6428 is sent to source control module 2460, the pause period and queue identifier associated with transmit queue 2412 can be stored in memory 2452 of purpose control module 2450. After flow control signal 6428 has been sent, receiving queue 2444 can change from an available state to a congested state in response to a second packet received from transmit queue 2412 (the transmission path is not shown in Figure 24). The second packet can have been sent from transmit queue 2412 before data transmission from transmit queue 2412 was suspended based on flow control signal 6428. Purpose control module 2450 can access the information stored in memory 2452 and, in response to the state change of receiving queue 2444, determine that an additional flow control signal targeting transmit queue 2412 should not be defined and sent to source control module 2460, because flow control signal 6428 has already been sent.
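The bookkeeping in this example can be sketched as a table, keyed by transmit queue identifier, that records when each pause expires and suppresses a second signal while a pause is still pending. The class `DestinationState`, its methods, and the use of explicit `now` timestamps are assumptions for illustration, not the patented implementation.

```python
class DestinationState:
    """Track paused transmit queues, standing in for memory 2452."""

    def __init__(self):
        self._paused_until = {}  # transmit queue id -> time the pause ends

    def is_paused(self, queue_id, now):
        until = self._paused_until.get(queue_id)
        if until is None:
            return False
        if now >= until:
            # The pause period has expired: mark the queue active again.
            del self._paused_until[queue_id]
            return False
        return True

    def on_congestion(self, queue_id, pause_period, now):
        """Return the (queue_id, pause_period) signal to send, or None."""
        if self.is_paused(queue_id, now):
            # A signal targeting this transmit queue was already sent.
            return None
        self._paused_until[queue_id] = now + pause_period
        return (queue_id, pause_period)
```

In the scenario above, the first congestion event for transmit queue 2412 yields a signal, a second event three time units later yields none, and a new signal is produced only after the recorded pause has expired.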
In certain embodiments, source control module 2460 can be configured to suspend transmission from transmit queue 2412 based on the most recent flow control signal parameter values. For example, after flow control signal 6428 targeting transmit queue 2412 has been sent to source control module 2460, a later flow control signal (not shown) targeting transmit queue 2412 can be received at source control module 2460. Source control module 2460 can be configured to apply one or more parameter values associated with the later flow control signal rather than the parameter values associated with flow control signal 6428. In certain embodiments, the later flow control signal can cause transmit queue 2412 to remain in the suspended state for a period longer or shorter than that indicated in flow control signal 6428.
In certain embodiments, source control module 2460 optionally applies the one or more parameter values associated with the later flow control signal when a priority value associated with those parameter values is higher (or lower) than a priority value associated with the one or more parameter values of flow control signal 6428. In certain embodiments, each priority value can be defined at purpose control module 2450, and each priority value can be defined based on a priority value associated with one or more of receiving queues 2480.
In certain embodiments, flow control signal 6428 and the later flow control signal (both targeting transmit queue 2412) can be defined in response to the same receiving queue among receiving queues 2480 being unavailable. For example, the later flow control signal can include an updated parameter value defined by purpose control module 2450 based on receiving queue 2442 remaining in the unavailable state for a longer period than previously calculated. In certain embodiments, flow control signal 6428 targeting transmit queue 2412 can be defined in response to one of receiving queues 2480 changing state (for example, changing from an available state to an unavailable state), and the later flow control signal targeting transmit queue 2412 can be defined in response to another of receiving queues 2480 changing state (for example, changing from an available state to an unavailable state).
In certain embodiments, multiple flow control signals can be defined at purpose control module 2450 to suspend transmission from multiple transmit queues of first order queue 2410. In certain embodiments, the multiple transmit queues can send data to a single receiving queue, such as receiving queue 2444. In certain embodiments, a history of the flow control signals sent to the multiple transmit queues of first order queue 2410 can be stored in memory 2452 of purpose control module 2450. In certain embodiments, a later flow control signal associated with the single receiving queue can be calculated based on the history of flow control signals.
In certain embodiments, pause periods associated with multiple transmit queues can be grouped together and included in a flow control packet. For example, the pause period associated with transmit queue 2412 and the pause period associated with transmit queue 2414 can be included in a flow control packet. More details about flow control packets are described with reference to Figure 25.
Figure 25 is a schematic diagram showing a flow control packet according to one embodiment. The flow control packet includes head 2510, tail 2520, and payload 2530, which includes pause period parameter values (shown in column 2512) for several transmit queues represented by queue identifiers (IDs) (shown in column 2514). As shown in Figure 25, the transmit queues represented by queue ID 1 through queue ID V are each associated with one of pause period parameter values 1 through V. Pause period parameter values 2514 indicate the periods during which the transmit queues represented by queue IDs 2512 should be suspended from (for example, prohibited from) sending data.
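A head, payload and tail layout of this kind can be sketched as a byte encoding. The source does not specify field widths or the contents of the head and tail, so everything concrete here is an assumption: a 2-byte marker plus 2-byte entry count as the head, 16-bit queue IDs and 16-bit pause periods in the payload, and a CRC-32 checksum as the tail.

```python
import struct
import zlib

MAGIC = 0xFC01  # hypothetical marker identifying a flow control packet


def encode_flow_control_packet(entries):
    """Pack (queue_id, pause_period) pairs into head + payload + tail."""
    head = struct.pack("!HH", MAGIC, len(entries))
    payload = b"".join(struct.pack("!HH", qid, period) for qid, period in entries)
    tail = struct.pack("!I", zlib.crc32(head + payload))  # assumed checksum tail
    return head + payload + tail


def decode_flow_control_packet(packet):
    """Return the list of (queue_id, pause_period) pairs, validating the tail."""
    magic, count = struct.unpack_from("!HH", packet, 0)
    if magic != MAGIC:
        raise ValueError("not a flow control packet")
    body_end = 4 + 4 * count
    if len(packet) != body_end + 4:
        raise ValueError("unexpected packet length")
    (checksum,) = struct.unpack_from("!I", packet, body_end)
    if checksum != zlib.crc32(packet[:body_end]):
        raise ValueError("checksum mismatch")
    return [struct.unpack_from("!HH", packet, 4 + 4 * i) for i in range(count)]
```

Encoding two entries produces a 16-byte packet (4-byte head, 8-byte payload, 4-byte tail), and decoding it returns the same queue ID and pause period pairs.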
In certain embodiments, the flow control packet can be defined at a purpose control module such as purpose control module 2450 shown in Figure 24. In certain embodiments, the purpose control module can be configured to define flow control packets at regular time intervals. For example, the purpose control module can be configured to define a flow control packet every 10 ms. In certain embodiments, the purpose control module can be configured to define a flow control packet at random times, when a pause period parameter value has been calculated, and/or when a given number of pause period parameter values have been calculated. In certain embodiments, the purpose control module can determine, based on, for example, one or more parameter values and/or status information accessible to the purpose control module, that at least part of a flow control packet should not be defined and/or sent.
Although not shown, in certain embodiments multiple queue IDs can be associated with a single pause period parameter value. In certain embodiments, at least one queue ID can be associated with a parameter value other than a pause period parameter value. For example, a queue ID can be associated with a flow rate parameter value. The flow rate parameter value can indicate the flow rate (for example, a maximum flow rate) at which the transmit queue (represented by the queue ID) should send data. In certain embodiments, the flow control packet can have one or more fields configured to indicate whether a particular receiving queue is available to receive data.
The flow control packet can be sent from the purpose control module to a source control module (such as source control module 2460 shown in Figure 24) via a flow control signal (such as flow control signal 6428 shown in Figure 24). In certain embodiments, the flow control packet can be defined based on a layer 2 (for example, layer 2 of the OSI model) protocol. In other words, the flow control packet can be defined at, and used within, layer 2 of a network system. In certain embodiments, the flow control packet can be sent between devices associated with layer 2 (for example, MAC devices).
Referring again to Figure 25, one or more parameter values associated with flow control signal 6428 (for example, status information defined based on the parameter values) can be stored in memory 2562 of source control module 2560. In certain embodiments, the one or more parameter values can be stored in memory 2562 of source control module 2560 when flow control signal 6428 is received at source control module 2560. The parameter values defined in flow control signal 6428 can be used to track the state of one or more receiving queues 2580 (for example, receiving queue 2542). For example, an entry in memory 2562 can indicate that receiving queue 2542 is unavailable to receive data. The entry can be defined based on the pause period parameter value defined in flow control signal 6428 and can be associated with an identifier (for example, a queue identifier) of receiving queue 2542. When the pause period expires, the entry can be updated to indicate that the state of receiving queue 2542 has changed to, for example, an active state. Although not shown, in certain embodiments one or more parameter values can be stored in a memory outside source control module 2560 (for example, a remote memory).
In certain embodiments, one or more parameter values at the memory 2562 of source control module 2560 are stored in (and/or status information) can be used to determine whether data should be sent to one or more reception teams by source control module 2560 Row 2580.For example, source control module 2560 can be configured as based on the state for being related to receiving queue 2544 and receiving queue 2542 Information sends data from transmit queue 2516 to receiving queue 2544 rather than receiving queue 2542.
In certain embodiments, source control module 2560 can analyze data transmission patterns to determine whether data should be sent from one or more source queues 2570 to one or more receiving queues 2580. For example, source control module 2560 can determine, based on parameter values stored in memory 2562 of source control module 2560, that transmit queue 2514 is sending a relatively high volume of data to receiving queue 2546. Based on this determination, source control module 2560 can trigger queue 2516 to send data to receiving queue 2548 rather than receiving queue 2546, because receiving queue 2546 is receiving a high volume of data from transmit queue 2514. By analyzing the transmission patterns associated with transmit queues 2570, the onset of congestion at one or more receiving queues 2580 can be substantially avoided.
In certain embodiments, source control module 2560 can analyze parameter values (and/or status information) stored in memory 2562 of source control module 2560 to determine whether data should be sent to one or more receiving queues 2580. By analyzing the stored parameter values (and/or status information), the onset of congestion at one or more receiving queues 2580 can be substantially avoided. For example, source control module 2560 can trigger data to be sent to receiving queue 2540 rather than receiving queue 2542 based on the historical availability of receiving queue 2540 compared with (for example, better than, worse than) the historical availability of receiving queue 2542. In some embodiments, for example, source control module 2560 can send data to receiving queue 2542 rather than receiving queue 2544 based on the historical performance of receiving queue 2542 compared with the historical performance of receiving queue 2544 with respect to data burst patterns. In some embodiments, the analysis of parameter values related to one or more receiving queues 2580 can be based on a specific time window, a particular type of network processing (for example, inter-processor communication), a particular level of service, and so on.
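One simple way to picture a history-based choice between receiving queues is to score each candidate by the fraction of past samples in which it was available. The function name, the dictionary layout, and the scoring rule are all assumptions for illustration; the source does not say how historical availability is measured.

```python
def pick_receiving_queue(history, candidates):
    """Choose the candidate receiving queue with the best historical availability.

    `history` maps a receiving queue id to a list of booleans, one per past
    observation (True means the queue was available). Queues with no history
    are scored as 0.0, which is an assumed conservative default.
    """
    def availability(queue_id):
        samples = history.get(queue_id, [])
        return sum(samples) / len(samples) if samples else 0.0

    return max(candidates, key=availability)
```

For instance, a queue that was available in three of four samples is preferred over one that was available in one of four.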
In certain embodiments, purpose control module 2550 can send status information about receiving queues 2580 (for example, current status information), which source control module 2560 can use to determine whether data should be sent from one or more source queues 2570. For example, source control module 2560 can trigger queue 2514 to send data to queue 2544 rather than queue 2546 because, as indicated by purpose control module 2550, queue 2546 has more available capacity than queue 2544. In some embodiments, any combination of current status information, transmission pattern analysis, and historical data analysis can be used to substantially prevent, or reduce the likelihood of, the onset of congestion at one or more receiving queues 2580.
In certain embodiments, flow control signal 6428 can be sent from purpose control module 2550 to source control module 2560 via an out-of-band transmission path. For example, flow control signal 6428 can be sent via a link dedicated to communications related to flow control signaling. In certain embodiments, flow control signal 6428 can be sent via a queue associated with second level queue 2520, a queue associated with first order queue 2510, and/or physical link 2500.
Some embodiments described herein relate to a computer storage product with a computer-readable medium (also known as a processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The medium and computer code (also known as code) can be those designed and constructed for a specific purpose or purposes. Examples of computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as compact discs/digital video discs (CD/DVDs), compact disc read-only memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical discs; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as ASICs, programmable logic devices (PLDs), and read-only memory (ROM) and RAM devices.
Examples of computer code include, but are not limited to, microcode or microinstructions, machine instructions such as those produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments can be implemented using Java, C++, or other programming languages (for example, object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
Although various embodiments have been described above, it should be understood that they have been presented by way of example only, not limitation, and that various changes in form and details may be made. Any part of the equipment and/or methods described herein can be combined in any combination, except mutually exclusive combinations. The embodiments described herein can include various combinations and/or sub-combinations of the functions, components, and/or features of the different embodiments described.

Claims (47)

1. Communication equipment, including:
An exchange core, the exchange core defining a single logic entity and having a multistage switching fabric that is physically distributed across multiple frames, the multistage switching fabric having multiple input ports and multiple output ports, the exchange core being configured to be coupled to multiple peripheral processors via the multiple input ports and the multiple output ports,
The exchange core being configured to provide non-blocking connectivity at line rate between a first peripheral processor arranged in a first frame and a second peripheral processor arranged in a second frame, the exchange core being configured to receive a first packet associated with the first peripheral processor, the exchange core being configured to send, based on cells associated with the first packet, a second packet to the second peripheral processor and a third packet to a third peripheral processor in sequence, the multistage switching fabric being configured to send the cells from an input port of the multiple input ports to an output port of the multiple output ports.
2. The communication equipment as claimed in claim 1, wherein the multiple peripheral processors include at least one peripheral processor having virtualized resources and at least one peripheral processor without virtualized resources.
3. The communication equipment as claimed in claim 1, wherein the number of the multiple input ports and the multiple output ports is more than 1000, and each input port of the multiple input ports and each output port of the multiple output ports is configured to operate at a speed of not less than 10 Gb/s.
4. The communication equipment as claimed in claim 1, wherein:
Each of the first peripheral processor and the second peripheral processor is one of a storage node device, a compute node device, a service node device, or a router.
5. The communication equipment as claimed in claim 1, wherein the exchange core is configured to provide non-blocking connectivity at line rate between the second peripheral processor and the third peripheral processor.
6. The communication equipment as claimed in claim 5, wherein:
Each of the first peripheral processor and the third peripheral processor is one of a storage node device, a compute node device, a service node device, or a router; and
The second peripheral processor is at least one of a firewall device, an intrusion detection device, or a load balancing device.
7. Communication equipment, including:
An exchange core, the exchange core having a multistage switching fabric that is physically distributed across multiple frames, the multistage switching fabric having multiple input ports and multiple output ports, the exchange core being configured to be coupled to multiple peripheral processors via the multiple input ports and the multiple output ports,
The exchange core being configured to provide, at line rate, each peripheral processor of the multiple peripheral processors with connectivity to each remaining peripheral processor of the multiple peripheral processors, so that each output port of the multiple output ports can be accessed equally by each peripheral processor of the multiple peripheral processors via an input port of the multiple input ports, the number of the multiple input ports and the multiple output ports being more than 1000, and each input port of the multiple input ports and each output port of the multiple output ports being configured to operate at a speed of not less than 10 Gb/s.
8. The communication equipment as claimed in claim 7, wherein the multiple peripheral processors include at least one peripheral processor coupled to the exchange core via an Ethernet connection and at least one peripheral processor coupled to the exchange core via a non-Ethernet connection.
9. The communication equipment as claimed in claim 7, wherein the multiple peripheral processors include at least one peripheral processor that uses layer 3 routing and at least one peripheral processor that is a layer 4 through layer 7 device.
10. Communication equipment, including:
An exchange core, the exchange core defining a single logic entity and having a multistage switching fabric, the multistage switching fabric having multiple stages that are physically distributed across multiple frames, the multiple stages collectively having multiple input ports and multiple output ports, the exchange core being configured to be coupled to multiple peripheral processors via the multiple input ports and the multiple output ports,
The exchange core being configured to admit multiple cells associated with a packet into an input port of the multiple input ports when transmission of the multiple cells through the multistage switching fabric without loss can be substantially guaranteed.
11. The communication equipment as claimed in claim 10, wherein the multiple peripheral processors include a first peripheral processor configured to communicate using a Fibre Channel protocol and a second peripheral processor configured to communicate using a Fibre Channel over Ethernet protocol.
12. The communication equipment as claimed in claim 10, wherein the multistage switching fabric is configured as a deterministic network.
13. The communication equipment as claimed in claim 10, wherein the multistage switching fabric is configured as a deterministic network, so that the multistage switching fabric admits the packet into the input port when the multiple cells can be sent to an output port of the multiple output ports within a predetermined time.
14. The communication equipment as claimed in claim 10, wherein:
The exchange core is configured to send the multiple cells associated with the packet from the input port to a first output port and a second output port of the multiple output ports, without performing packet loss processing at at least one stage of the multiple stages of the multistage switching fabric.
15. The communication equipment as claimed in claim 10, wherein:
The exchange core includes multiple edge devices coupled to the multistage switching fabric via the multiple input ports and the multiple output ports, the multiple edge devices being coupled to the multiple peripheral processors, and each edge device of the multiple edge devices being configured to receive the packet and define the multiple cells based on the packet.
16. The communication equipment as claimed in claim 10, wherein:
The exchange core is configured to send the multiple cells associated with the packet from the input port to an output port of the multiple output ports via the multiple stages of the multistage switching fabric, without performing packet loss processing at at least one stage of the multiple stages.
17. Communication equipment, comprising:
A switch core that defines a single logical entity and has a multi-stage switch fabric, the multi-stage switch fabric having a plurality of stages physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core being configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
The switch core being configured to receive a packet from an input port of the plurality of input ports, the switch core being configured to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports via the plurality of stages, without packet loss processing being performed at at least one stage of the plurality of stages of the multi-stage switch fabric.
18. The communication equipment of claim 17, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port of the plurality of input ports only when transmission of the plurality of cells associated with the packet within the switch fabric can be substantially guaranteed to be lossless.
19. The communication equipment of claim 17, wherein:
The output port is a first output port,
The switch core is configured to send the plurality of cells associated with the packet from the input port to the first output port and a second output port of the plurality of output ports.
20. The communication equipment of claim 17, wherein:
The switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices being coupled to the plurality of peripheral processors, each edge device of the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
21. Communication equipment, comprising:
A switch core that defines a single logical entity and has a multi-stage switch fabric configured as a deterministic network, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core being configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
The switch core being configured to receive a packet from an input port of the plurality of input ports, the switch core being configured to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports.
22. The communication equipment of claim 21, wherein the multi-stage switch fabric is physically distributed across a plurality of chassis.
23. The communication equipment of claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port of the plurality of input ports only when transmission of the plurality of cells associated with the packet within the multi-stage switch fabric can be substantially guaranteed to be lossless.
24. The communication equipment of claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port of the plurality of input ports when the plurality of cells associated with the packet can be sent to an output port of the plurality of output ports within a predetermined time.
25. The communication equipment of claim 21, wherein:
The switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices being coupled to the plurality of peripheral processors, each edge device of the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
26. The communication equipment of claim 21, wherein:
The switch core is configured to send the plurality of cells associated with the packet from the input port to the output port via a plurality of stages of the multi-stage switch fabric, without packet loss processing being performed at at least one stage of the plurality of stages.
27. Communication equipment, comprising:
A switch core having a multi-stage switch fabric physically distributed among a plurality of chassis, the multi-stage switch fabric having a plurality of input buffers and a plurality of output ports, the switch core being configured to be coupled to a plurality of edge devices; and
A controller implemented in hardware without software during operation and implemented with software during configuration and monitoring, the controller being coupled to the plurality of input buffers and the plurality of output ports, the controller being configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is predicted and before the congestion occurs within the switch core.
28. The communication equipment of claim 27, wherein the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control within the multi-stage switch fabric of the switch core.
29. The communication equipment of claim 27, wherein the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
30. The communication equipment of claim 27, further comprising:
A plurality of peripheral processors configured to be coupled to the plurality of edge devices,
The controller being configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
31. The communication equipment of claim 27, wherein the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
32. The communication equipment of claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer independently of cell segments buffered at a stage of the multi-stage switch fabric and independently of packets buffered at an edge device of the plurality of edge devices.
33. The communication equipment of claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer independently of a flow control mechanism associated with Ethernet.
34. Communication equipment, comprising:
A switch core having a multi-stage switch fabric physically distributed among a plurality of chassis, the multi-stage switch fabric being configured to receive a plurality of cells associated with a packet and to switch a plurality of cell segments based on the plurality of cells;
An edge device of a plurality of edge devices coupled to the switch core, the edge device being configured to receive the packet and to send the plurality of cells to the multi-stage switch fabric; and
A controller coupled to the multi-stage switch fabric, the controller being configured to perform flow control on the plurality of cells independently of flow control for the plurality of edge devices and independently of flow control within the multi-stage switch fabric.
35. The communication equipment of claim 34, wherein:
The controller is implemented in hardware without software during operation and is implemented with software during configuration and monitoring.
36. The communication equipment of claim 34, wherein:
The multi-stage switch fabric has a plurality of input buffers and a plurality of output ports,
The controller is configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is predicted and before the congestion occurs within the switch core.
37. The communication equipment of claim 34, wherein:
The multi-stage switch fabric has a plurality of input buffers and a plurality of output ports,
The controller is configured to perform end-to-end flow control on cells buffered at an input buffer of the plurality of input buffers independently of a flow control mechanism associated with Ethernet.
38. Communication equipment, comprising:
A switch core having a multi-stage switch fabric;
A first plurality of peripheral processors coupled to the multi-stage switch fabric by a plurality of connections having a protocol, each peripheral processor of the first plurality of peripheral processors being a storage node having virtualized resources, the virtualized resources of the first plurality of peripheral processors collectively defining a virtual storage resource interconnected by the switch core; and
A second plurality of peripheral processors coupled to the multi-stage switch fabric by a plurality of connections having a protocol, each peripheral processor of the second plurality of peripheral processors being a storage node having virtualized resources, the virtualized resources of the second plurality of peripheral processors collectively defining a virtual compute resource interconnected by the switch core.
39. The communication equipment of claim 38, wherein:
Each peripheral processor of the first plurality of peripheral processors has virtualized resources and is configured such that its virtualized resources can be replaced by virtualized resources of the remaining peripheral processors of the first plurality of peripheral processors; and
Each peripheral processor of the second plurality of peripheral processors has virtualized resources and is configured such that its virtualized resources can be replaced by virtualized resources of the remaining peripheral processors of the second plurality of peripheral processors.
40. The communication equipment of claim 38, wherein:
The first plurality of peripheral processors is associated with a packet-based communication protocol and with a security protocol; and
The second plurality of peripheral processors is associated with a packet-based communication protocol and with a security protocol.
41. Communication equipment, comprising:
A switch core having a multi-stage switch fabric, the switch core being configured to be logically partitioned into a first virtual switch core and a second virtual switch core;
A plurality of peripheral processors coupled to the multi-stage switch fabric, the plurality of peripheral processors having a first subset of peripheral processors operatively coupled to the first virtual switch core and a second subset of peripheral processors operatively coupled to the second virtual switch core.
42. The communication equipment of claim 41, wherein:
The switch core is configured such that the first virtual switch core and the second virtual switch core are administratively managed independently of each other.
43. The communication equipment of claim 41, wherein:
The switch core is configured such that the first virtual switch core has a bandwidth that is independent of a bandwidth of the second virtual switch core.
44. The communication equipment of claim 41, wherein:
The switch core is configured such that the first virtual switch core has a bandwidth and administrative management that are independent of a bandwidth and administrative management of the second virtual switch core.
45. The communication equipment of claim 41, wherein:
The switch core is configured such that the first virtual switch core operates using a Layer 2 protocol and the second virtual switch core operates using a Layer 2 protocol and a Layer 3 protocol.
46. The communication equipment of claim 41, wherein:
The first subset of peripheral processors has virtualized resources, and the second subset of peripheral processors has virtualized resources.
47. The communication equipment of claim 41, wherein:
The first subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device and a router, and includes a peripheral processor that is a remaining one of a compute node, a storage node, a service node device and a router; and
The second subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device and a router, and includes a peripheral processor that is a remaining one of a compute node, a storage node, a service node device and a router.
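As an illustration of the logical partitioning recited in claims 41 to 44, the following Python sketch models a physical core divided into two virtual cores, each with its own bandwidth budget, its own administrator and its own subset of attached peripheral processors. It is a minimal sketch under stated assumptions and not the patented implementation: the class names SwitchCore and VirtualSwitchCore, the methods partition, attach and reserve, and the bandwidth figures in the usage note are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualSwitchCore:
    """One logical slice of the physical core (hypothetical model)."""
    name: str
    bandwidth_gbps: float            # budget owned by this slice only
    admin: str                       # administrator of this slice only
    peripherals: set = field(default_factory=set)
    used_gbps: float = 0.0

    def reserve(self, gbps: float) -> bool:
        """Reserve bandwidth inside this slice; never borrows from another slice."""
        if self.used_gbps + gbps > self.bandwidth_gbps:
            return False
        self.used_gbps += gbps
        return True


class SwitchCore:
    """Physical core that can be logically partitioned (hypothetical model)."""

    def __init__(self, total_gbps: float):
        self.total_gbps = total_gbps
        self.virtual_cores = {}

    def partition(self, name: str, bandwidth_gbps: float, admin: str) -> VirtualSwitchCore:
        """Create a virtual core; the slices together may not exceed physical capacity."""
        allocated = sum(v.bandwidth_gbps for v in self.virtual_cores.values())
        if allocated + bandwidth_gbps > self.total_gbps:
            raise ValueError("partition exceeds physical capacity")
        core = VirtualSwitchCore(name, bandwidth_gbps, admin)
        self.virtual_cores[name] = core
        return core

    def attach(self, peripheral: str, core_name: str) -> None:
        """Attach a peripheral processor to exactly one virtual core."""
        for core in self.virtual_cores.values():
            if peripheral in core.peripherals:
                raise ValueError(f"{peripheral} is already attached to {core.name}")
        self.virtual_cores[core_name].peripherals.add(peripheral)
```

In this model, a 100 Gb/s core split into a 60 Gb/s slice and a 40 Gb/s slice lets the first slice exhaust its own budget without reducing what the second slice can reserve, which is the independence of bandwidth that claim 43 describes. Keeping a separate admin field per slice is one simple way to picture the independent administrative management of claim 42.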
CN201410138824.5A 2008-09-11 2009-09-11 System, method and equipment for data center Active CN103916326B (en)

Applications Claiming Priority (25)

Application Number Priority Date Filing Date Title
US9620908P 2008-09-11 2008-09-11
US61/096,209 2008-09-11
US9851608P 2008-09-19 2008-09-19
US61/098,516 2008-09-19
US12/242,230 2008-09-30
US12/242,224 US8154996B2 (en) 2008-09-11 2008-09-30 Methods and apparatus for flow control associated with multi-staged queues
US12/242,224 2008-09-30
US12/242,230 US8218442B2 (en) 2008-09-11 2008-09-30 Methods and apparatus for flow-controllable multi-staged queues
US12/343,728 US8325749B2 (en) 2008-12-24 2008-12-24 Methods and apparatus for transmission of groups of cells via a switch fabric
US12/343,728 2008-12-24
US12/345,502 2008-12-29
US12/345,502 US8804711B2 (en) 2008-12-29 2008-12-29 Methods and apparatus related to a modular switch architecture
US12/345,500 US8804710B2 (en) 2008-12-29 2008-12-29 System architecture for a scalable and distributed multi-stage switch fabric
US12/345,500 2008-12-29
US12/495,337 2009-06-30
US12/495,364 US9847953B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to virtualization of data center resources
US12/495,361 US8755396B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to flow control within a data center switch fabric
US12/495,337 US8730954B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to any-to-any connectivity within a data center
US12/495,358 2009-06-30
US12/495,344 US20100061367A1 (en) 2008-09-11 2009-06-30 Methods and apparatus related to lossless operation within a data center
US12/495,344 2009-06-30
US12/495,364 2009-06-30
US12/495,358 US8335213B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to low latency within a data center
US12/495,361 2009-06-30
CN200910246898.XA CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN200910246898.XA Division CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Publications (2)

Publication Number Publication Date
CN103916326A CN103916326A (en) 2014-07-09
CN103916326B true CN103916326B (en) 2017-10-31

Family

ID=43324725

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200910246898.XA Active CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre
CN201410138824.5A Active CN103916326B (en) 2008-09-11 2009-09-11 System, method and equipment for data center

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200910246898.XA Active CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Country Status (1)

Country Link
CN (2) CN101917331B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102420775A (en) * 2012-01-10 2012-04-18 西安电子科技大学 Routing method for module-expansion-based data center network topology system
US9094308B2 (en) 2012-06-06 2015-07-28 Juniper Networks, Inc. Finding latency through a physical network in a virtualized network
US8750288B2 (en) * 2012-06-06 2014-06-10 Juniper Networks, Inc. Physical path determination for virtual network packet flows
CN103023803B (en) * 2012-12-12 2015-05-20 华中科技大学 Method and system for optimizing virtual links of fiber channel over Ethernet
CN104871145A (en) * 2012-12-20 2015-08-26 马维尔国际贸易有限公司 Memory sharing in network device
US9419892B2 (en) * 2013-09-30 2016-08-16 Juniper Networks, Inc. Methods and apparatus for implementing connectivity between edge devices via a switch fabric
US9787559B1 (en) 2014-03-28 2017-10-10 Juniper Networks, Inc. End-to-end monitoring of overlay networks providing virtualized network services
CN105099939A (en) * 2014-04-23 2015-11-25 株式会社日立制作所 Method and device for implementing flow control among different data centers
CN105577575B (en) * 2014-10-22 2019-09-17 深圳市中兴微电子技术有限公司 Link control method and device
CN107104871B (en) * 2016-02-22 2021-11-19 中兴通讯股份有限公司 Subnet intercommunication method and device
CN105827544B (en) * 2016-03-14 2019-01-22 烽火通信科技股份有限公司 Congestion control method and device for a multi-stage Clos system
CN107276908B (en) * 2016-04-07 2021-06-11 深圳市中兴微电子技术有限公司 Routing information processing method and packet switching equipment
US10243840B2 (en) * 2017-03-01 2019-03-26 Juniper Networks, Inc. Network interface card switching for virtual networks
CN113099488B (en) * 2019-12-23 2024-04-09 中国移动通信集团陕西有限公司 Method, device, computing equipment and computer storage medium for solving network congestion
US11323312B1 (en) 2020-11-25 2022-05-03 Juniper Networks, Inc. Software-defined network monitoring and fault localization
CN113595935A (en) * 2021-07-20 2021-11-02 锐捷网络股份有限公司 Data center switch architecture and data center
CN113961628B (en) * 2021-12-20 2022-03-22 广州市腾嘉自动化仪表有限公司 Distributed data analysis control system
CN115225589A (en) * 2022-07-17 2022-10-21 奕德(广州)科技有限公司 CrossPoint switching method based on virtual packet switching

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5457682A (en) * 1993-05-05 1995-10-10 At&T Ipm Corp. Apparatus and method for supporting a line group apparatus remote from a line unit
US5945922A (en) * 1996-09-06 1999-08-31 Lucent Technologies Inc. Widesense nonblocking switching networks
CN1084579C (en) * 1997-03-27 2002-05-08 上海贝尔电话设备制造有限公司 S12 exchanger timing supply method and system thereof
JP2001313660A (en) * 2000-02-21 2001-11-09 Nippon Telegr & Teleph Corp <Ntt> Wavelength multiplexed optical network
US7420969B2 (en) * 2000-11-29 2008-09-02 Rmi Corporation Network switch with a parallel shared memory
US6567576B2 (en) * 2001-02-05 2003-05-20 Jds Uniphase Inc. Optical switch matrix with failure protection
CN101132286B (en) * 2006-08-21 2012-10-03 丛林网络公司 Multi-chassis router with multiplexed optical interconnects

Also Published As

Publication number Publication date
CN101917331B (en) 2014-05-07
CN103916326A (en) 2014-07-09
CN101917331A (en) 2010-12-15

Similar Documents

Publication Publication Date Title
CN103916326B (en) System, method and equipment for data center
US11451491B2 (en) Methods and apparatus related to virtualization of data center resources
US10454849B2 (en) Methods and apparatus related to a flexible data center security architecture
CN105721358B (en) Methods and apparatus for a switch fabric system with a multi-hop distributed control plane and a single-hop data plane
US8335213B2 (en) Methods and apparatus related to low latency within a data center
US8755396B2 (en) Methods and apparatus related to flow control within a data center switch fabric
Baransel et al. Routing in multihop packet switching networks: Gb/s challenge
CN103534997B (en) Port- and priority-based flow control mechanism for lossless Ethernet
CN104272653B (en) Congestion control in grouped data networking
CN105323185B (en) Method and apparatus for flow control relevant to switch architecture
CN106899503B (en) Route selection method and network manager for a data center network
US20100061394A1 (en) Methods and apparatus related to any-to-any connectivity within a data center
CN103516632B (en) Methods and apparatus for providing services in a distributed switch
US20100061391A1 (en) Methods and apparatus related to a low cost data center architecture
US20100061367A1 (en) Methods and apparatus related to lossless operation within a data center
CN107819695A (en) Distributed control load balancing system and method based on SDN
CN105187331B (en) System for dynamic resource management in the distributed control plane of a switch
EP2557742A1 (en) Systems, methods, and apparatus for a data centre
CN101697524A (en) Relay method and device in switch
US20220150185A1 (en) Methods and apparatus related to a flexible data center security architecture
CN209692803U (en) SDN switching network based on a fat-tree structure
Robles-Gomez et al. A complete topology management mechanism for the Advanced Switching Interconnect technology
CN109743266A (en) SDN switching network based on a fat-tree structure
Robles-Gomez et al. Evaluation of a Fabric Management Mechanism for Advanced Switching in Presence of Traffic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant