CN103916326A - System, method and apparatus used for data center - Google Patents

System, method and apparatus used for data center

Info

Publication number
CN103916326A
CN103916326A
Authority
CN
China
Prior art keywords
module
queue
peripheral processor
output port
switch core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410138824.5A
Other languages
Chinese (zh)
Other versions
CN103916326B (en)
Inventor
P·辛德胡
G·艾贝
J-M·弗爱龙
A·文卡特马尼
Q·沃赫拉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Juniper Networks Inc
Peribit Networks Inc
Original Assignee
Peribit Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/242,230 external-priority patent/US8218442B2/en
Priority claimed from US12/343,728 external-priority patent/US8325749B2/en
Priority claimed from US12/345,502 external-priority patent/US8804711B2/en
Priority claimed from US12/345,500 external-priority patent/US8804710B2/en
Priority claimed from US12/495,337 external-priority patent/US8730954B2/en
Priority claimed from US12/495,344 external-priority patent/US20100061367A1/en
Priority claimed from US12/495,364 external-priority patent/US9847953B2/en
Priority claimed from US12/495,361 external-priority patent/US8755396B2/en
Priority claimed from US12/495,358 external-priority patent/US8335213B2/en
Application filed by Peribit Networks Inc filed Critical Peribit Networks Inc
Publication of CN103916326A publication Critical patent/CN103916326A/en
Publication of CN103916326B publication Critical patent/CN103916326B/en
Application granted granted Critical
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

System, method and apparatus for a data center. In an embodiment of the present invention, the apparatus includes a first edge device that can have a packet processing module. The first edge device is configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device has a packet processing module configured to reassemble the packet based on the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity. The multi-stage switch fabric can have multiple switch modules, each of which has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells such that the multiple cells are sent to the second edge device.

Description

System, method and apparatus for a data center
This application is a divisional application of Chinese patent application No. 200910246898.X, filed on September 11, 2009 and entitled "System, method and apparatus for a data center".
Cross-reference to related applications
This application claims priority to and the benefit of U.S. Patent Application No. 61/098,516, entitled "Systems, Apparatus and Methods for a Data Centre" and filed on September 19, 2008, and of U.S. Patent Application No. 61/096,209, entitled "Methods and Apparatus Related to Flow Control within a Data Centre" and filed on September 11, 2008; both of which are incorporated herein by reference in their entireties.
This application is a continuation-in-part of: U.S. Patent Application No. 12/343,728, entitled "Methods and Apparatus for Transmission of Groups of Cells via a Switch Fabric" and filed on December 24, 2008; U.S. Patent Application No. 12/345,500, entitled "System Architecture for a Scalable and Distributed Multi-Stage Switch Fabric" and filed on December 29, 2008; U.S. Patent Application No. 12/345,502, entitled "Methods and Apparatus Related to a Modular Switch Architecture" and filed on December 29, 2008; U.S. Patent Application No. 12/242,224, entitled "Methods and Apparatus for Flow Control Associated with Multi-Stage Queues" and filed on September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096,209, entitled "Methods and Apparatus Related to Flow Control within a Data Center" and filed on September 11, 2008; and U.S. Patent Application No. 12/242,230, entitled "Methods and Apparatus for Flow-Controllable Multi-Staged Queues" and filed on September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096,209, entitled "Methods and Apparatus Related to Flow Control within a Data Centre" and filed on September 11, 2008. Each of the above-mentioned applications is incorporated herein by reference in its entirety.
This application is also a continuation-in-part of: U.S. Patent Application No. 12/495,337, entitled "Methods and Apparatus Related to Any-to-Any Connectivity within a Data Centre" and filed on June 30, 2009; U.S. Patent Application No. 12/495,344, entitled "Methods and Apparatus Related to Lossless Operation within a Data Centre" and filed on June 30, 2009; U.S. Patent Application No. 12/495,358, entitled "Methods and Apparatus Related to Low Latency within a Data Centre" and filed on June 30, 2009; U.S. Patent Application No. 12/495,361, entitled "Methods and Apparatus Related to Flow Control within a Data Centre Switch Fabric" and filed on June 30, 2009; and U.S. Patent Application No. 12/495,364, entitled "Methods and Apparatus Related to Virtualization of Data Centre Resources" and filed on June 30, 2009. Each of the above-mentioned applications is incorporated herein by reference in its entirety.
Technical field
Embodiments described herein relate generally to data center equipment, and more particularly to architectures, apparatus, and methods for data center systems having a switch core and edge devices.
Background
Known architectures for data center systems involve overly cumbersome and complex approaches that increase the cost and latency of such systems. For example, some known data center networks are built from three or more switching layers, with Ethernet and/or Internet Protocol (IP) packet processing performed at every layer. Packet processing and queuing overhead is needlessly repeated at each layer, directly increasing cost and end-to-end latency. Similarly, such known data center networks typically do not scale in a cost-effective manner: for a given data center system, an increase in the number of servers usually requires additional ports, which results in more equipment being added at every layer of the data center system. Such poor scalability increases the cost of these data center systems.
Thus, a need exists for data center systems that include improved architectures, apparatus, and methods.
Summary of the invention
In one embodiment, an apparatus includes a first edge device that can have a packet processing module. The first edge device can be configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device can have a packet processing module configured to reassemble the packet based on the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity. The multi-stage switch fabric can have multiple switch modules, each of which has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells such that the multiple cells are sent to the second edge device.
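As a concrete illustration of the cell-based forwarding described in this summary, the following is a minimal sketch (in Python; the cell size, field names, and helper functions are illustrative assumptions, not part of the disclosure) of how an ingress edge device might split a packet into cells and how an egress edge device might reassemble it.

```python
from dataclasses import dataclass

CELL_PAYLOAD_BYTES = 64  # assumed cell payload size, for illustration only

@dataclass
class Cell:
    packet_id: int   # identifies the original packet
    seq: int         # position of this cell within the packet
    total: int       # total number of cells for the packet
    dest_port: int   # egress port determined by classification at the edge
    payload: bytes

def packet_to_cells(packet: bytes, packet_id: int, dest_port: int) -> list[Cell]:
    """Split a packet into cells at the first (ingress) edge device."""
    chunks = [packet[i:i + CELL_PAYLOAD_BYTES]
              for i in range(0, len(packet), CELL_PAYLOAD_BYTES)] or [b""]
    return [Cell(packet_id, seq, len(chunks), dest_port, chunk)
            for seq, chunk in enumerate(chunks)]

def cells_to_packet(cells: list[Cell]) -> bytes:
    """Reassemble the original packet at the second (egress) edge device."""
    ordered = sorted(cells, key=lambda c: c.seq)
    assert len(ordered) == ordered[0].total, "missing cells"
    return b"".join(c.payload for c in ordered)

# Example: a packet is split at one edge device, switched as cells, reassembled at another.
cells = packet_to_cells(b"example payload" * 10, packet_id=1, dest_port=42)
assert cells_to_packet(cells) == b"example payload" * 10
```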
According to one aspect of embodiments of the present disclosure, an apparatus is provided that includes a switch core. The switch core defines a single logical entity and has a multi-stage switch fabric that is physically distributed across multiple chassis. The multi-stage switch fabric has multiple input ports and multiple output ports. The switch core is configured to be coupled to multiple peripheral processors via the input ports and the output ports. The switch core is configured to provide non-blocking connectivity at line rate between a first peripheral processor disposed in a first rack and a second peripheral processor disposed in a second rack.
According to an embodiment of the present disclosure, the multiple peripheral processors include at least one peripheral processor having virtual resources and at least one peripheral processor that does not have virtual resources.
According to an embodiment of the present disclosure, the number of input ports and output ports is greater than 1000, and each of the input ports and each of the output ports is configured to operate at a rate of no less than 10 Gb/s.
According to an embodiment of the present disclosure, the first peripheral processor and the second peripheral processor are each one of a storage node device, a compute node device, a service node device, or a router.
According to an embodiment of the present disclosure, the multiple peripheral processors include a third peripheral processor; the switch core is configured to provide non-blocking connectivity at line rate between the second peripheral processor and the third peripheral processor; the switch core is configured to receive a first packet associated with the first peripheral processor; the switch core is configured to, based on cells associated with the first packet, send in sequence a second packet to the second peripheral processor and a third packet to the third peripheral processor; and the multi-stage switch fabric is configured to send the cells from an input port of the multiple input ports to an output port of the multiple output ports.
According to an embodiment of the present disclosure, the first peripheral processor and the third peripheral processor are each one of a storage node device, a compute node device, a service node device, or a router; and the second peripheral processor is at least one of a firewall device, an intrusion detection device, or a load balancing device.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes a switch core. The switch core has a multi-stage switch fabric that is physically distributed across multiple chassis. The multi-stage switch fabric has multiple input ports and multiple output ports. The switch core is configured to be coupled to multiple peripheral processors via the input ports and the output ports. The switch core is configured to provide each peripheral processor with connectivity, at line rate, to each remaining peripheral processor of the multiple peripheral processors, such that each output port of the multiple output ports can be accessed equally by each peripheral processor via an input port of the multiple input ports.
According to an embodiment of the present disclosure, the multiple peripheral processors include at least one peripheral processor coupled to the switch core via an Ethernet connection and at least one peripheral processor coupled to the switch core via a non-Ethernet connection.
According to an embodiment of the present disclosure, the multiple peripheral processors include at least one peripheral processor that uses layer-3 routing and at least one peripheral processor that is a layer-4 through layer-7 device.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes a switch core. The switch core defines a single logical entity and has a multi-stage switch fabric. The multi-stage switch fabric has multiple stages that are physically distributed across multiple chassis, the multiple stages collectively having multiple input ports and multiple output ports. The switch core is configured to be coupled to multiple peripheral processors via the input ports and the output ports. The switch core is configured to allow multiple cells associated with a packet to enter an input port of the multiple input ports when transmission of the multiple cells through the multi-stage switch fabric can be substantially guaranteed and is lossless.
According to an embodiment of the present disclosure, the multiple peripheral processors include a first peripheral processor configured to communicate using a Fibre Channel protocol and a second peripheral processor configured to communicate using a Fibre Channel over Ethernet protocol.
According to an embodiment of the present disclosure, the multi-stage switch fabric is configured as a deterministic network.
According to an embodiment of the present disclosure, the multi-stage switch fabric is configured as a deterministic network such that the multi-stage switch fabric allows the packet to enter the input port when the multiple cells can be sent to an output port of the multiple output ports within a scheduled time.
According to an embodiment of the present disclosure, the output port is a first output port, and the switch core is configured to send the multiple cells associated with the packet from the input port to the first output port and a second output port of the multiple output ports, without needing to perform packet-drop processing at at least one stage of the multiple stages of the multi-stage switch fabric.
According to an embodiment of the present disclosure, the switch core includes multiple edge devices coupled to the multi-stage switch fabric via the input ports and the output ports; the multiple edge devices are coupled to the multiple peripheral processors; and each edge device of the multiple edge devices is configured to receive the packet and to define the multiple cells based on the packet.
According to an embodiment of the present disclosure, the switch core is configured to send the multiple cells associated with the packet via the multiple stages of the multi-stage switch fabric from the input port to an output port of the multiple output ports, without needing to perform packet-drop processing at at least one stage of the multiple stages.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes a switch core. The switch core defines a single logical entity and has a switch fabric. The switch fabric has multiple stages that are physically distributed across multiple chassis, and has multiple input ports and multiple output ports. The switch core is configured to be coupled to multiple peripheral processors via the input ports and the output ports. The switch core is configured to receive a packet from an input port of the multiple input ports, and to send multiple cells associated with the packet via the multiple stages from the input port to an output port of the multiple output ports, without needing to perform packet-drop processing at at least one stage of the multiple stages of the switch fabric.
According to an embodiment of the present disclosure, the multi-stage switch fabric is configured as a deterministic network such that the packet is allowed in from the input port of the multiple input ports only when transmission of the multiple cells associated with the packet through the switch fabric can be substantially guaranteed and is lossless.
According to an embodiment of the present disclosure, the output port is a first output port, and the switch core is configured to send the multiple cells associated with the packet from the input port to the first output port and a second output port of the multiple output ports.
According to an embodiment of the present disclosure, the switch core includes multiple edge devices coupled to the multi-stage switch fabric via the input ports and the output ports; the multiple edge devices are coupled to the multiple peripheral processors; and each edge device of the multiple edge devices is configured to receive the packet and to define the multiple cells based on the packet.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes a switch core. The switch core defines a single logical entity and has a multi-stage switch fabric configured as a deterministic network. The multi-stage switch fabric has multiple input ports and multiple output ports. The switch core is configured to be coupled to multiple peripheral processors via the input ports and the output ports. The switch core is configured to receive a packet from an input port of the multiple input ports, and to send multiple cells associated with the packet from the input port to an output port of the multiple output ports.
According to an embodiment of the present disclosure, the multi-stage switch fabric is physically distributed across multiple chassis.
According to an embodiment of the present disclosure, the multi-stage switch fabric is configured as a deterministic network such that the packet is allowed in from the input port of the multiple input ports only when transmission of the multiple cells associated with the packet through the switch fabric can be substantially guaranteed and is lossless.
According to an embodiment of the present disclosure, the multi-stage switch fabric is configured as a deterministic network such that the switch core accepts the packet from the input port of the multiple input ports when the multiple cells associated with the packet can be sent to an output port of the multiple output ports within a scheduled time.
According to an embodiment of the present disclosure, the switch core includes multiple edge devices coupled to the multi-stage switch fabric via the input ports and the output ports; the multiple edge devices are coupled to the multiple peripheral processors; and each edge device of the multiple edge devices is configured to receive the packet and to define the multiple cells based on the packet.
According to an embodiment of the present disclosure, the switch core is configured to send the multiple cells associated with the packet from the input port to the output port via the multiple stages of the multi-stage switch fabric, without needing to perform packet-drop processing at at least one stage of the multiple stages.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes: a switch core having a multi-stage switch fabric physically distributed among multiple chassis, the multi-stage switch fabric having multiple input buffers and multiple output ports, the switch core being configured to be coupled to multiple edge devices; and a controller implemented in hardware such that it does not require software during operation but does require software for configuration and monitoring. The controller is coupled to the input buffers and the output ports, and is configured to send a flow-control signal to an input buffer of the multiple input buffers when congestion at an output port of the multiple output ports is foreseen and before congestion occurs within the switch core.
According to an embodiment of the present disclosure, the controller is configured to perform end-to-end flow control on the input buffers and the output ports independently of intra-fabric flow control within the multi-stage switch fabric of the switch core.
According to an embodiment of the present disclosure, the controller is configured to perform end-to-end flow control on the input buffers and the output ports independently of flow control for the multiple edge devices.
According to an embodiment of the present disclosure, multiple peripheral processors are configured to be coupled to the multiple edge devices, and the controller is configured to perform end-to-end flow control on the input buffers and the output ports independently of flow control for the multiple edge devices.
According to an embodiment of the present disclosure, the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
According to an embodiment of the present disclosure, the controller is configured to perform end-to-end flow control on the cells buffered at the input buffer independently of cell segments buffered at a stage of the multi-stage switch fabric and independently of packets buffered at an edge device of the multiple edge devices.
According to an embodiment of the present disclosure, the controller is configured to perform end-to-end flow control on the cells buffered at the input buffer independently of flow-control mechanisms associated with Ethernet.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes: a switch core having a multi-stage switch fabric physically distributed among multiple chassis, the multi-stage switch fabric being configured to receive multiple cells associated with a packet and to switch multiple cell segments based on the multiple cells; multiple edge devices coupled to the switch core, an edge device of the multiple edge devices being configured to receive the packet and to send the multiple cells to the multi-stage switch fabric; and a controller coupled to the multi-stage switch fabric, the controller being configured to perform flow control on the multiple cells independently of flow control for the multiple edge devices and independently of intra-fabric flow control within the multi-stage switch fabric.
According to an embodiment of the present disclosure, the controller is implemented in hardware such that it does not require software during operation but does require software for configuration and monitoring.
According to an embodiment of the present disclosure, the multi-stage switch fabric has multiple input buffers and multiple output ports, and the controller is configured to send a flow-control signal to an input buffer of the multiple input buffers when congestion at an output port of the multiple output ports is foreseen and before congestion occurs within the switch core.
According to an embodiment of the present disclosure, the multi-stage switch fabric has multiple input buffers and multiple output ports, and the controller is configured to perform end-to-end flow control on the cells buffered at an input buffer of the multiple input buffers independently of flow-control mechanisms associated with Ethernet.
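The following sketch (Python; class names, the congestion threshold, and the signaling interface are illustrative assumptions, not taken from the disclosure) illustrates the kind of end-to-end flow control described for the controller above: when congestion at an output port is foreseen, a flow-control signal is sent to the corresponding input buffer before congestion actually occurs, independently of any Ethernet-level or fabric-internal flow control.

```python
class EgressPort:
    def __init__(self, port_id: int, capacity_cells: int, threshold: float = 0.8):
        self.port_id = port_id
        self.capacity = capacity_cells
        self.threshold = threshold   # fraction of capacity at which congestion is "foreseen"
        self.queued = 0

    def congestion_foreseen(self) -> bool:
        return self.queued >= self.threshold * self.capacity

class IngressBuffer:
    def __init__(self):
        self.paused_ports: set[int] = set()  # egress ports this buffer must stop sending toward
        self.cells = []                      # cells waiting to enter the fabric

    def flow_control(self, egress_port_id: int, pause: bool) -> None:
        """Receive an end-to-end flow-control signal from the controller."""
        (self.paused_ports.add if pause else self.paused_ports.discard)(egress_port_id)

class Controller:
    """Hardware-controller model: monitors egress ports and signals ingress buffers."""
    def __init__(self, ingress_buffers, egress_ports):
        self.ingress_buffers = ingress_buffers
        self.egress_ports = egress_ports

    def tick(self) -> None:
        for port in self.egress_ports:
            pause = port.congestion_foreseen()
            for buf in self.ingress_buffers:
                buf.flow_control(port.port_id, pause)  # signal sent before loss can occur

ports = [EgressPort(0, capacity_cells=100)]
buffers = [IngressBuffer()]
Controller(buffers, ports).tick()  # no congestion foreseen yet, so nothing is paused
```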
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes: a switch core having a multi-stage switch fabric; a first plurality of peripheral processors coupled to the multi-stage switch fabric by multiple connections having a protocol, each peripheral processor of the first plurality being a storage node having virtual resources, the virtual resources of the first plurality of peripheral processors collectively defining a virtual storage resource interconnected by the switch core; and a second plurality of peripheral processors coupled to the multi-stage switch fabric by multiple connections having a protocol, each peripheral processor of the second plurality being a compute node having virtual resources, the virtual resources of the second plurality of peripheral processors collectively defining a virtual compute resource interconnected by the switch core.
According to an embodiment of the present disclosure, each peripheral processor of the first plurality of peripheral processors has virtual resources and is configured such that its virtual resources can be substituted by the virtual resources of the remaining peripheral processors of the first plurality; and each peripheral processor of the second plurality of peripheral processors has virtual resources and is configured such that its virtual resources can be substituted by the virtual resources of the remaining peripheral processors of the second plurality.
According to an embodiment of the present disclosure, the first plurality of peripheral processors is associated with a packet-based communication protocol and with a security protocol; and the second plurality of peripheral processors is associated with a packet-based communication protocol and with a security protocol.
According to another aspect of embodiments of the present disclosure, an apparatus is provided that includes: a switch core having a multi-stage switch fabric, the switch core being configured to be logically partitioned into a first virtual switch core and a second virtual switch core; and multiple peripheral processors coupled to the multi-stage switch fabric, the multiple peripheral processors having a first subset of peripheral processors operatively coupled to the first virtual switch core and a second subset of peripheral processors operatively coupled to the second virtual switch core.
According to an embodiment of the present disclosure, the switch core is configured such that the first virtual switch core and the second virtual switch core are administratively managed independently of each other.
According to an embodiment of the present disclosure, the switch core is configured such that the first virtual switch core has a bandwidth that is independent of a bandwidth of the second virtual switch core.
According to an embodiment of the present disclosure, the switch core is configured such that the first virtual switch core has a bandwidth and an administrative management that are independent of the bandwidth and administrative management of the second virtual switch core.
According to an embodiment of the present disclosure, the switch core is configured such that the first virtual switch core operates using a layer-2 protocol, and the second virtual switch core operates using a layer-2 protocol and a layer-3 protocol.
According to an embodiment of the present disclosure, the first subset of peripheral processors has virtual resources, and the second subset of peripheral processors has virtual resources.
According to an embodiment of the present disclosure, the first subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device, or a router, and a peripheral processor that is another one of a compute node, a storage node, a service node device, or a router; and the second subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device, or a router, and a peripheral processor that is another one of a compute node, a storage node, a service node device, or a router.
Brief description of the drawings
Fig. 1 is a system block diagram of a data center (DC), according to an embodiment.
Fig. 2 is a schematic diagram illustrating an example of a portion of a data center having any-to-any connectivity, according to an embodiment.
Fig. 3 is a schematic diagram illustrating logical groupings of resources associated with a data center, according to an embodiment.
Fig. 4A is a schematic diagram illustrating a switch fabric included in a switch core, according to an embodiment.
Fig. 4B is a schematic diagram illustrating a switch table stored in the memory module shown in Fig. 4A, according to an embodiment.
Fig. 5A is a schematic diagram illustrating a switch fabric system, according to an embodiment.
Fig. 5B is a schematic diagram illustrating an input/output module, according to an embodiment.
Fig. 6 is a schematic diagram illustrating a portion of the switch fabric system of Fig. 5A, according to an embodiment.
Fig. 7 is a schematic diagram illustrating a portion of the switch fabric system of Fig. 5A, according to an embodiment.
Figs. 8 and 9 show, respectively, a front view and a rear view of a housing for enclosing a switch fabric, according to an embodiment.
Fig. 10 shows a portion of the housing of Fig. 8, according to an embodiment.
Figs. 11 and 12 are schematic diagrams illustrating a switch fabric in a first configuration and a second configuration, respectively, according to another embodiment.
Fig. 13 is a schematic diagram illustrating data flow associated with a switch fabric, according to an embodiment.
Fig. 14 is a schematic diagram illustrating flow control within the switch fabric shown in Fig. 13, according to an embodiment.
Fig. 15 is a schematic diagram illustrating a buffer module, according to an embodiment.
Fig. 16A is a schematic block diagram of an ingress scheduler module and an egress scheduler module configured to coordinate transmission of a group of cells via a switch fabric of a switch core, according to an embodiment.
Fig. 16B is a signaling flow diagram illustrating signaling related to the transmission of a group of cells, according to an embodiment.
Fig. 17 is a schematic block diagram illustrating two groups of cells queued in an ingress queue disposed on an ingress side of a switch fabric, according to an embodiment.
Fig. 18 is a schematic block diagram illustrating two groups of cells queued in an ingress queue disposed on an ingress side of a switch fabric, according to another embodiment.
Fig. 19 is a flow chart illustrating a method for scheduling transmission of a group of cells via a switch fabric, according to an embodiment.
Fig. 20 is a signaling flow diagram illustrating processing of a request sequence value associated with a transmission request, according to an embodiment.
Fig. 21 is a signaling flow diagram illustrating a response sequence value associated with a transmission response, according to an embodiment.
Fig. 22 is a schematic block diagram illustrating multi-stage flow-controllable queues, according to an embodiment.
Fig. 23 is a schematic block diagram illustrating multi-stage flow-controllable queues, according to an embodiment.
Fig. 24 is a schematic block diagram illustrating a destination control module configured to define flow-control signals associated with multiple receive queues, according to an embodiment.
Fig. 25 is a schematic diagram illustrating a flow-control packet, according to an embodiment.
Detailed description
Fig. 1 is a schematic diagram illustrating a data center (DC) 100 (e.g., a super data center, an idealized data center), according to an embodiment. The data center 100 includes a switch core (SC) 180 operatively coupled to four types of peripheral processors 170: compute nodes 110, service nodes 120, routers 130, and storage nodes 140. In this embodiment, a data center management (DCM) module 190 is configured to control (e.g., manage) operation of the data center 100. In some embodiments, the data center 100 can be referred to as a data center. In some embodiments, a peripheral processor can include one or more virtual resources, such as virtual machines.
Each peripheral processor 170 is configured to communicate via the switch core 180 of the data center 100. In particular, the switch core 180 of the data center 100 is configured to provide any-to-any connectivity between the peripheral processors 170 at relatively low latency. For example, the switch core 180 can be configured to send (e.g., transmit) data between one or more compute nodes 110 and one or more storage nodes 140. In some embodiments, the switch core 180 can have at least hundreds or thousands of ports (e.g., egress ports and/or ingress ports) through which the peripheral processors 170 can send and/or receive data. The peripheral processors 170 include one or more network interface devices (e.g., a network interface card (NIC), a 10 Gb Ethernet converged network adapter (CNA) device) through which the peripheral processors 170 can send signals to and/or receive signals from the switch core 180. The signals can be sent to and/or received from the switch core 180 via a physical link and/or a wireless link operatively coupled to the peripheral processors 170. In some embodiments, the peripheral processors 170 can be configured to send data to and/or receive data from the switch core 180 based on one or more protocols (e.g., an Ethernet protocol, a multi-protocol label switching (MPLS) protocol, a Fibre Channel protocol, a Fibre-Channel-over-Ethernet protocol, an Infiniband-related protocol).
In some embodiments, the switch core 180 can be (e.g., can function as) a single consolidated switch (e.g., a single large-scale consolidated L2/L3 switch). In other words, the switch core 180 can be configured to operate as a single logical entity (e.g., a single logical network element), as opposed to, for example, a collection of distinct network elements configured to communicate with one another via Ethernet connections. The switch core 180 can be configured to connect (e.g., facilitate communication among) the compute nodes 110, the storage nodes 140, the service nodes 120, and/or the routers 130 within the data center 100. In some embodiments, the switch core 180 can be configured to communicate via interface devices configured to transmit data at a rate of at least 10 Gb/s. In some embodiments, the switch core 180 can be configured to communicate via interface devices (e.g., Fibre Channel interface devices) configured to transmit data at link rates of, for example, 2 Gb/s, 4 Gb/s, 8 Gb/s, 10 Gb/s, 40 Gb/s, 100 Gb/s, and/or faster.
Although the switch core 180 can be logically centralized, the implementation of the switch core 180 can be highly distributed, for example, for reliability. For example, portions of the switch core 180 can be physically distributed across, for example, many chassis. In some embodiments, a processing stage of the switch core 180 can be included in a first chassis and another processing stage of the switch core 180 can be included in a second chassis. Both processing stages can logically function as part of a single consolidated switch. More details related to the architecture of the switch core 180 are described in connection with Figs. 4 through 13.
As shown in Fig. 1, the switch core 180 includes an edge portion 185 and a switch fabric 187. The edge portion 185 can include edge devices (not shown) that function as gateway devices between the switch fabric 187 and the peripheral processors 170. In some embodiments, the edge devices within the edge portion 185 can collectively have thousands of ports (e.g., 100,000 ports, 500,000 ports) through which data from the peripheral processors 170 can be sent into (e.g., routed into) one or more portions of the switch core 180 and/or sent from one or more portions of the switch core 180. In some embodiments, the edge devices can be referred to as access switches, network devices, and/or input/output modules (e.g., as shown in Fig. 5A and Fig. 5B). In some embodiments, the edge devices can be included at, for example, the top of rack (TOR) of a rack.
Data can be processed at the peripheral processors 170, at the edge portion 185 of the switch core 180 (e.g., at the edge devices included in the edge portion 185), and/or at the switch fabric 187 of the switch core 180 based on different platforms. For example, communication between one or more of the peripheral processors 170 and an edge device of the edge portion 185 can be a stream of data packets defined based on an Ethernet protocol or a non-Ethernet protocol. In some embodiments, various types of data processing can be performed at the edge devices within the edge portion 185 rather than within the switch fabric 187 of the switch core 180. For example, data packets can be parsed into cells at the edge devices of the edge portion 185, and the cells can be sent from the edge devices to the switch fabric 187. The cells can be parsed into segments and transmitted within the switch fabric 187 as flits (which in some embodiments can also be referred to as segments). In some embodiments, data packets can be parsed into cells at a portion of the switch fabric 187. In some embodiments, congestion resolution and/or scheduling of transmission of data (e.g., cells) via the switch fabric 187 can be implemented or performed at the edge devices (e.g., access switches) within the edge portion 185 of the switch core 180, rather than within the modules that define the switch fabric 187. More details related to the processing of data packets, cells, and/or flits within components of the data center are described below. For example, more details related to cell processing are described in connection with at least Figs. 16A through 21.
In some embodiments, the edge devices within the edge portion 185 can be configured to classify, for example, data packets received at the switch core 180 from the peripheral processors 170. In particular, the edge devices within the edge portion 185 of the switch core 180 can be configured to perform Ethernet-type classification, which can include, for example, classification based on layer-2 Ethernet addresses (e.g., media access control (MAC) addresses) and/or layer-4 Ethernet addresses (e.g., user datagram protocol (UDP) addresses). In some embodiments, a destination can be determined based on the classification of a packet at the edge portion 185 of the switch core 180. For example, a first edge device can identify, based on classification of a packet, a second edge device as the destination of that packet. The packet can be parsed into cells and sent from the first edge device to the switch fabric 187. The cells can be switched through the switch fabric 187 so that they are sent to the second edge device. In some embodiments, the cells can be switched by the switch fabric 187 based on information related to the destination that is associated with the cells.
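A minimal sketch of this edge classification step, under assumed data structures (the lookup table and names below are illustrative, not from the disclosure): the first edge device classifies a received packet by its layer-2 destination MAC address, resolves the destination (second) edge device, and tags the resulting cells with that destination.

```python
# Illustrative only: map a packet's destination MAC to the egress edge device and port.
mac_to_edge_device = {
    "00:11:22:33:44:55": ("edge-2", 7),   # (destination edge device, egress port)
    "66:77:88:99:aa:bb": ("edge-3", 3),
}

def classify(dest_mac: str):
    """Layer-2 (MAC-based) classification performed at the ingress edge device."""
    try:
        return mac_to_edge_device[dest_mac]
    except KeyError:
        raise LookupError(f"no destination known for {dest_mac}")

edge_device, egress_port = classify("00:11:22:33:44:55")
# The cells produced from this packet carry (edge_device, egress_port), so the
# switch fabric can forward them without re-classifying the packet at each stage.
```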
Security policies related to the switch core 180 can be applied more effectively because classification is performed within the edge portion 185, at a single logical layer of the switch core 180. In particular, many security policies can be applied in a relatively uniform and seamless fashion within the edge portion 185 of the switch core 180 during classification.
More details related to packet classification within a data center are described in connection with, for example, Fig. 5A, Fig. 5B, and Fig. 19. Additional details related to packet classification associated with a data center are described in U.S. Patent Application Serial No. 12/242,168, entitled "Methods and Apparatus Related to Packet Classification Associated with a Multi-Stage Switch" and filed on September 30, 2008, and in U.S. Patent Application Serial No. 12/242,172, entitled "Methods and Apparatus for Packet Classification Based on Policy Vectors" and filed on September 30, 2008, both of which are incorporated herein by reference in their entireties.
The switch core 180 can thus be defined so that classification of data (e.g., data packets) is not performed within the switch fabric 187. Accordingly, although the switch fabric 187 can have multiple stages, the stages do not constitute topological hops at which data classification is performed, and the switch fabric 187 can define a single topological hop. Instead, destination information determined through classification at the edge devices (e.g., the edge devices within the edge portion 185 of the switch core 180) can be used for switching (e.g., switching of cells) within the switch fabric 187. More details related to switching within the switch fabric 187 are described in connection with, for example, Figs. 4A and 4B.
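The following sketch shows, under stated assumptions, how a single switch module with a shared memory could forward cells purely on the destination information attached at the edge, so that no classification is repeated inside the fabric (the table layout and class names are hypothetical; cell objects are assumed to expose a dest_port attribute, as in the earlier sketch).

```python
class SwitchModule:
    """One stage of a multi-stage fabric: shared-memory buffering plus a switch table."""
    def __init__(self, switch_table: dict[int, int]):
        self.switch_table = switch_table   # destination port -> local output link
        self.shared_memory: list = []      # cells buffered in the shared memory device

    def enqueue(self, cell) -> None:
        self.shared_memory.append(cell)

    def forward(self):
        """Drain buffered cells, yielding (output_link, cell) pairs based only on cell metadata."""
        while self.shared_memory:
            cell = self.shared_memory.pop(0)
            yield self.switch_table[cell.dest_port], cell

# Example wiring: cells destined for egress port 42 leave this module on local link 3.
module = SwitchModule({42: 3, 7: 1})
```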
In some embodiments, processing related to classification can be performed at a classification module (not shown) included in an edge device (e.g., an input/output module). Parsing of packets into cells, scheduling of cell transmission via the switch fabric 187, reassembly of packets and/or cells, and so forth can be performed at a processing module (not shown) of an edge device (e.g., an input/output module). In some embodiments, the classification module can be referred to as a packet classification module, and/or the processing module can be referred to as a packet processing module. More details related to an edge device that includes a classification module and a processing module are described in connection with Fig. 5B.
In some embodiments, one or more portions of the data center 100 can be (or can include) a hardware-based module (e.g., an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA)) and/or a software-based module (e.g., a module of computer code, a set of processor-readable instructions executable on a processor). In some embodiments, one or more of the functions associated with the data center 100 can be included in different modules and/or combined into one or more modules. For example, the data center management module 190 can be a combination of a hardware module and a software module configured to manage the resources (e.g., resources of the switch core 180) within the data center 100.
One or more of the compute nodes 110 can be general-purpose computational engines that can include, for example, processors, memory, and/or one or more network interface devices (e.g., a network interface card (NIC)). In some embodiments, the processors within a compute node 110 can be part of one or more cache-coherent domains.
In some embodiments, the compute nodes 110 can be host devices, servers, and/or the like. In some embodiments, one or more of the compute nodes 110 can have virtualized resources such that any compute node 110 (or a portion thereof) can be substituted for any other compute node 110 (or a portion thereof) within the data center 100.
One or more of the storage nodes 140 can be devices that include, for example, processors, memory, locally-attached disk storage, and/or one or more network interface devices. In some embodiments, the storage nodes 140 can have specialized modules (e.g., hardware modules and/or software modules) configured to enable, for example, one or more of the compute nodes 110 to read data from and/or write data to one or more of the storage nodes 140 via the switch core 180. In some embodiments, one or more of the storage nodes 140 can have virtualized resources such that any storage node 140 (or a portion thereof) can be substituted for any other storage node 140 (or a portion thereof) within the data center 100.
One or more of the service nodes 120 can be open systems interconnection (OSI) layer-4 through layer-7 devices that can include, for example, processors (e.g., network processors), memory, and/or one or more network interface devices (e.g., 10 Gb Ethernet devices). In some embodiments, the service nodes 120 can include hardware and/or software configured to perform computations on relatively heavy network workloads. In some embodiments, the service nodes 120 can be configured to perform computations on a per-packet basis in a relatively efficient manner (e.g., more efficiently than such computations could be performed at, for example, a compute node 110). The computations can include, for example, stateful firewall computations, intrusion detection and prevention (IDP) computations, extensible markup language (XML) acceleration computations, transmission control protocol (TCP) termination computations, and/or application-level load-balancing computations. In some embodiments, one or more of the service nodes 120 can have virtualized resources such that any service node 120 (or a portion thereof) can be substituted for any other service node 120 (or a portion thereof) within the data center 100.
One or more of the routers 130 can be networking devices configured to connect at least a portion of the data center 100 to another network (e.g., the global Internet). For example, as shown in Fig. 1, the switch core 180 can be configured to communicate with network 135 and network 137 through the routers 130. Although not shown, in some embodiments one or more of the routers 130 can enable communication among components (e.g., the peripheral processors 170, portions of the switch core 180) within the data center 100. Such communication can be defined based on, for example, a layer-3 routing protocol. In some embodiments, one or more of the routers 130 can have one or more network interface devices (e.g., 10 Gb Ethernet devices) through which the routers 130 can send signals to and/or receive signals from, for example, the switch core 180 and/or other peripheral processors 170.
More details related to virtualized resources within a data center are described in co-pending U.S. Patent Application No. 12/346,623, entitled "Methods and Apparatus for Determining a Network Topology During Network Provisioning" and filed on December 30, 2008; co-pending U.S. Patent Application No. 12/346,632, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed on December 30, 2008; and co-pending U.S. Patent Application No. 12/346,630, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed on December 30, 2008, all of which are incorporated herein by reference.
As mentioned above, the switch core 180 can be configured to function as a single universal switch that can connect any peripheral processor 170 within the data center 100 to any other peripheral processor 170. In particular, the switch core 180 can be configured to provide any-to-any connectivity between the peripheral processors 170 (e.g., a relatively large number of peripheral processors 170) coupled to the switch core 180, with substantially no perceptible limitations other than those imposed by the bandwidth of the network interface devices that connect the peripheral processors 170 to the switch core 180 and by speed-of-light signaling delays (also referred to as speed-of-light latency). In other words, the switch core 180 can be configured such that each peripheral processor 170 appears to be directly interconnected to every other peripheral processor 170 within the data center 100. In some embodiments, the switch core 180 can be configured such that the peripheral processors 170 can communicate via the switch core 180 at line rate (or substantially at line rate). A schematic illustration of any-to-any connectivity is shown in Fig. 2.
In addition, because the switch core 180 functions as a single logical entity, the switch core 180 can handle, in a desirable fashion, for example, migration of virtual resources between any of the peripheral processors 170 in communication with the switch core 180. Accordingly, the virtual-resource migration boundary of the peripheral processors 170 can span substantially all of the ports coupled to the switch core 180 (e.g., all of the ports of the edge devices of the edge portion 185 of the switch core 180).
In some embodiments, provisioning associated with virtual-resource migration can be handled, in part, by a network management module. A centralized network management entity or network management module can cooperate with network devices (e.g., portions of the switch core 180) to collect and manage network topology information. For example, as resources are attached to or detached from network devices, the network devices can push information about the resources (virtual and physical) currently coupled to them to the network management module. Network management tools such as peripheral-processor management tools (e.g., server management tools) and/or external management entities can communicate with the network management module to send network provisioning instructions to the network devices and other resources in the network, without requiring a static description of the network. Such a system avoids the difficulties of static network descriptions and the network performance degradation caused by other types of peripheral processor 170 and network management systems.
In one embodiment; server admin instrument or external management entity and network management module communicate by letter to provide the virtual resource relevant with peripheral processor 170 to network equipment, and definite mode of operation or situation (for example move, suspend or move) and the position of virtual resource in network.Virtual resource can be the upper virtual machine of carrying out of peripheral processor 170 (for example, server) that for example, is coupled to switching fabric in the access exchange via in data center (, being included in the access exchange in marginal portion 187).Permitted eurypalynous peripheral processor 170 and can be coupled to switching fabric via access exchange.
Not to rely on the static network that network topological information is found and/or (comprise virtual resource is bundled on network equipment) manages to describe, thus also cooperation discovery or definite network topological information of network management module and access exchange and external management entity communication.After virtual machine on initialization (and/or beginning) main frame (and/or peripheral processor 170 of other types), external management entity can provide to network management module the device identifier of virtual machine.This device identifier can be, the universal unique identifier (" UUID ") of title, global unique identification symbol (" GUID ") and/or virtual resource or the peripheral processor 170 of media access protocol (" MAC ") address, virtual machine or the peripheral processor 170 of the network interface of for example virtual machine or peripheral processor 170.GUID needs not be globally unique about all-network, virtual resource, peripheral processor 170 and/or network equipment, but it is unique in the network of being managed by network management module or Webisode.In addition the port that the access that, external management entity can be provided for being connected to the peripheral processor 170 of managing virtual machines exchanges provides instruction.Access exchange energy detects virtual machine and is initialised, starts and/or move to peripheral processor 170.After virtual machine being detected, access exchange can be inquired the information of peripheral processor 170 about peripheral processor 170 and/or virtual machine, comprises the device identifier of for example peripheral processor 170 or virtual machine.
The access switch can query or request the device identifier of the virtual machine using, for example, the Link Layer Discovery Protocol ("LLDP"), some other standards-based or well-known protocol, or a proprietary protocol, where the virtual machine is configured to communicate via such a protocol. Alternatively, after detecting that it has been connected to the access switch, the virtual machine can broadcast or multicast information about itself (including the device identifier of the virtual machine) using, for example, Ethernet or IP.
The access switch then pushes the device identifier of the virtual machine (sometimes referred to as a virtual device identifier) and, in certain embodiments, other information received from the virtual machine to the network management module. In addition, the access switch can push to the network management module the device identifier of the access switch and the port identifier of the access switch port to which the peripheral processor 170 hosting the virtual machine is connected. This information functions as a description of the location of the virtual machine within the network, and defines the binding of the virtual machine to the peripheral processor 170 for the network management module and the external management entity. In other words, after receiving this information, the network management module can associate the device identifier of the virtual machine with the particular port of the particular access switch to which that virtual machine (and/or the peripheral processor 170 hosting the virtual machine) is connected.
The device identifier of the virtual machine, the device identifier of the access switch, the port identifier, and the provisioning instructions provided by the external management entity can be stored in a memory accessible to the network management module. For example, the device identifier of the virtual machine and the device identifier and port identifier of the access switch can be stored in a memory configured as a database, such that a database query based on the device identifier of the virtual machine returns the device identifier of the access switch, the port identifier, and the provisioning instructions.
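By way of illustration only, the lookup described above can be sketched as a simple in-memory mapping from a virtual machine's device identifier to its location and provisioning data. All names (VMRecord, topology_db, the example MAC address, switch, and port) are hypothetical and not taken from the specification:

```python
from dataclasses import dataclass

@dataclass
class VMRecord:
    """Location and provisioning data keyed by a virtual machine's device identifier."""
    access_switch_id: str   # device identifier of the access switch
    port_id: str            # access switch port to which the host is connected
    provisioning: dict      # provisioning instructions from the external management entity

# Hypothetical in-memory stand-in for the database accessible to the network management module.
topology_db: dict[str, VMRecord] = {}

def register_vm(vm_device_id: str, switch_id: str, port_id: str, provisioning: dict) -> None:
    """Called when an access switch pushes a newly detected virtual machine."""
    topology_db[vm_device_id] = VMRecord(switch_id, port_id, provisioning)

def locate_vm(vm_device_id: str) -> VMRecord | None:
    """Query by device identifier; returns access switch, port, and provisioning instructions."""
    return topology_db.get(vm_device_id)

# Example: a VM identified by its MAC address, bound to port 12 of access switch "AS-3".
register_vm("00:1b:44:11:3a:b7", "AS-3", "port-12",
            {"vlan": 100, "acl": ["permit tcp any any eq 443"]})
print(locate_vm("00:1b:44:11:3a:b7"))
```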
Because the network management module can associate a location within the network with a virtual machine based on the device identifier of the virtual machine, the external management entity need not be aware of the topology of the network, or of the binding of virtual machines to peripheral processors 170, in order to provision network resources (e.g., network devices, virtual machines, virtual switches, or physical servers). In other words, the external management entity can be agnostic with respect to the interconnections within the network and the locations of virtual machines within the network (e.g., on which port of which access switch, or on which peripheral processor 170), and the access switches in the network can be provisioned based on the device identifiers of the virtual machines hosted by the peripheral processors 170 in the network. In certain embodiments, the external management entity can also provision physical peripheral processors 170. In addition, because the network management module dynamically determines and manages the network topology information, the external management entity does not rely on a static description of the network to provision the network.
As used in this specification, provisioning can include various types or forms of device and/or software module setup, configuration, and/or adjustment. For example, provisioning can include configuring a network device such as a network switch based on a network policy. More specifically, network provisioning can include one or more of the following: configuring a network device to operate as a layer 2 or layer 3 network switch; changing a routing table of a network device; updating security policies and/or the device addresses or device identifiers of devices operatively coupled to a network device; selecting which network protocols a network device will implement; setting a network segment identifier such as a virtual local area network ("VLAN") tag for a port of a network device; and/or applying an access control list ("ACL") to a network device. A network switch can be provisioned or configured such that rules and/or access restrictions defined by a network policy are applied to data packets that pass through the network switch. In certain embodiments, a virtual device is provisioned. A virtual device can be, for example, a software module implementing a virtual switch, a virtual router, or a virtual gateway that is configured to operate as an intermediary between a physical network and virtual resources hosted by a host device such as a peripheral processor 170. In certain embodiments, provisioning can include establishing a virtual port or a connection between a virtual resource and a virtual device.
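As an illustrative sketch only, a provisioning instruction for one access-switch port might carry a VLAN tag and an ACL as described above. The field names and the delivery function below are assumptions for the example, not part of the described embodiments:

```python
# Hypothetical shape of a provisioning instruction for one access-switch port.
port_provisioning = {
    "mode": "layer2",                 # operate the port as a layer 2 switch port
    "vlan_tag": 100,                  # network segment identifier (VLAN tag)
    "acl": [                          # access control list applied to the port
        "permit tcp any any eq 443",
        "deny ip any any",
    ],
}

def apply_port_provisioning(switch_id: str, port_id: str, instruction: dict) -> None:
    """Sketch of a network management module pushing configuration to an access switch.
    The actual transport (CLI, NETCONF, proprietary API) is not specified here."""
    print(f"configuring {switch_id}/{port_id}: {instruction}")

apply_port_provisioning("AS-3", "port-12", port_provisioning)
```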
Fig. 2 is a schematic diagram illustrating a portion of a data center having any-to-any connectivity, according to an embodiment. As shown in Fig. 2, a peripheral processor PD (from the group of peripheral processors 210) is connected to each of the other peripheral processors 210 via an exchange core 280. For clarity, only the connections from peripheral processor PD to the other peripheral processors 210 (i.e., the peripheral processors 210 other than PD) are shown.
In certain embodiments, the exchange core 280 is defined such that the exchange core 280 is fair, in the sense that the bandwidth of a destination link between peripheral processor PD and the other peripheral processors 210 is shared substantially equitably among the competing peripheral processors 210. For example, when some (or all) of the peripheral processors 210 shown in Fig. 2 attempt to access peripheral processor PD at a given time, the bandwidth available to each peripheral processor 210 for accessing peripheral processor PD (e.g., bandwidth per unit time) will be substantially equal. In certain embodiments, the exchange core 280 can be configured such that some (or all) of the peripheral processors 210 can communicate with peripheral processor PD at full bandwidth (e.g., the full bandwidth of peripheral processor PD) and/or in a congestion-free manner. In addition, the exchange core 280 can be configured such that access to peripheral processor PD by a given peripheral processor (from the peripheral processors 210) is not limited by other links (e.g., existing or attempted links) between other peripheral processors and peripheral processor PD.
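As a simple illustration of the fairness property (a sketch of the arithmetic only, not a scheduling algorithm from the specification), the bandwidth of the destination link toward PD is divided substantially equally among the peripheral processors competing for it at a given time:

```python
def fair_share(link_bandwidth_gbps: float, competing_peripherals: int) -> float:
    """Substantially equal share of a destination link among competing peripheral processors."""
    return link_bandwidth_gbps / max(competing_peripherals, 1)

# Example: a 10 Gb/s link toward peripheral processor PD contended by 4 peripheral processors.
print(fair_share(10.0, 4))   # 2.5 Gb/s each
# With a single sender, PD can be reached at full bandwidth.
print(fair_share(10.0, 1))   # 10.0 Gb/s
```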
In certain embodiments, the properties of the exchange core 280, such as any-to-any connectivity, low latency, and fairness, can allow peripheral processors 210 of a given type (e.g., storage node type, compute node type) connected to (e.g., in communication with) the exchange core 280 to be treated interchangeably (e.g., independently of their positions relative to the other peripheral processors 210 and the exchange core 280). This can be referred to as interchangeability, and can promote the efficiency and simplicity of a data center that includes the exchange core 280. For example, even though the exchange core 280 may have a large number of ports (e.g., more than 1000 ports), the exchange core 280 can still have the any-to-any connectivity and/or fairness properties, with each port operating at a relatively high speed (e.g., at speeds exceeding 10 Gb/s). This can be achieved without the specialized interconnects included in, for example, supercomputers, and without complete advance knowledge of all communication patterns. More details related to exchange core architectures having any-to-any connectivity and/or fairness are described, at least in part, in connection with Figs. 4 through 13.
Referring again to Fig. 1, in certain embodiments, the data center 100 is configured to allow flexible oversubscription. With flexible oversubscription, the relative cost of the network infrastructure (e.g., the network infrastructure associated with the exchange core 180) can be lowered relative to, for example, the cost of compute and storage. For example, resources (e.g., all resources) within the exchange core 180 of the data center 100 can operate as a flexibly pooled resource, such that underutilized resources associated with a first application (or set of applications) can be dynamically provisioned for use by a second application (or set of applications) during, for example, peak processing of the second application. Accordingly, the resources of the data center 100 (or a subset of the resources) can handle oversubscription more efficiently than if the resources were statically assigned to particular applications (or sets of applications), for example, strictly assigned as storage resources. If managed as storage resources, oversubscription could only be implemented within the storage resources, rather than across, for example, the entire data center 100. In certain embodiments, one or more protocols and/or components within the data center 100 can be based on open standards (e.g., Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Engineering Task Force (IETF) standards, International Committee for Information Technology Standards (INCITS) standards).
In certain embodiments, the data center 100 can support a security model that allows a wide range of policies to be implemented. For example, the data center 100 can support a no-communication policy, in which applications reside in separate, independent virtual data centers of the data center 100, yet can share the same physical peripheral processors (e.g., compute nodes, storage nodes 140) and network infrastructure (e.g., the exchange core 180). In some configurations, the data center 100 can support multiprocessing of portions of the same application requiring nearly unrestricted communication. In some configurations, the data center 100 can support policies such as deep packet inspection, stateful firewalls, and/or stateless filters.
The data center 100 can have an end-to-end application latency (also referred to as an end-to-end latency) defined based on a source latency, a zero-load latency, a congestion latency, and a destination latency. In certain embodiments, the source latency can be, for example, time spent in processing at the source peripheral processor (e.g., time spent by software and/or a NIC). Similarly, the destination latency can be, for example, time spent in processing at the destination peripheral processor (e.g., time spent by software and/or a NIC). In certain embodiments, the zero-load latency can be the speed-of-light delay plus, for example, the processing and store-and-forward delays internal to the exchange core 180. In certain embodiments, the congestion latency can be, for example, the queueing delay caused by congestion in the network. The data center 100 can have a low end-to-end latency and can enable the desired application performance for latency-sensitive applications, for example, applications having real-time constraints and/or advanced inter-process processing requirements.
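The decomposition above can be restated compactly (this merely summarizes the definitions already given, with no additional claims):

```latex
T_{\text{end-to-end}} \;=\; T_{\text{source}} \;+\; T_{\text{zero-load}} \;+\; T_{\text{congestion}} \;+\; T_{\text{destination}},
\qquad
T_{\text{zero-load}} \;=\; T_{\text{speed-of-light}} \;+\; T_{\text{processing}} \;+\; T_{\text{store-and-forward}}
```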
The zero-load latency of the exchange core 180 can be significantly lower than that of a data center core having an interconnect based on Ethernet hops. In certain embodiments, for example, the exchange core 180 can have a zero-load latency (excluding the speed-of-light latency) of less than 6 microseconds from an input port of the exchange core 180 to an output port of the exchange core 180. In certain embodiments, for example, the exchange core 180 can have a latency (excluding the congestion latency and the speed-of-light latency) of less than 12 microseconds. Ethernet-based data center cores have significantly higher latency due to, for example, undesirable congestion levels (e.g., congestion on inter-switch links). Congestion in an Ethernet-based data center core may be aggravated by the inability of the Ethernet-based data center core (or the management devices associated with the Ethernet-based data center core) to handle congestion in a desirable manner. In addition, latency in an Ethernet-based data center core can be non-uniform, because different numbers of hops can exist between different source-destination pairs and/or through the many store-and-forward switching nodes at which data packet classification is performed. In contrast, classification is performed at the edge portion 185 of the exchange core 180 rather than within the switching fabric 187, and the exchange core 180 has a deterministic, cell-based switching fabric 187. For example, the cell processing latency through the switching fabric 187 (rather than the path of a cell through the switching fabric 187) can be predictable.
The exchange core 180 of the data center 100 can provide lossless end-to-end packet delivery based, at least in part, on flow control mechanisms executed within the data center 100. For example, transmission of data (e.g., data associated with data packets) via the switching fabric 187 is scheduled on a per-cell basis using a request-grant mechanism (also referred to as a request-authorization mechanism). In particular, a cell is sent into the switching fabric 187 (e.g., sent from the edge portion 185 to the switching fabric 187) only after a request to send the cell has been granted, where grants are issued on a substantially lossless basis. Once admitted into the switching fabric 187, the cell is handled within the switching fabric 187 as fragments. The flow of fragments within the switching fabric 187 can be further controlled, for example, so that fragments are not dropped when congestion is detected within the switching fabric 187. More details related to the handling of cells and fragments within the exchange core 180 are described below.
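A highly simplified sketch of the request-grant admission step described above is given below, assuming a single credit pool per destination. All names are illustrative, and the actual grant scheduling of the switching fabric is not reproduced here:

```python
import collections

class GrantScheduler:
    """Toy request-grant admission control: a cell enters the fabric only after its
    request is granted, and grants never exceed the free credits advertised for the
    destination, so admitted cells are not dropped inside the fabric."""

    def __init__(self, credits_per_destination: int):
        self.credits = collections.defaultdict(lambda: credits_per_destination)

    def request(self, destination: str) -> bool:
        """Edge device asks permission to send one cell toward `destination`."""
        if self.credits[destination] > 0:
            self.credits[destination] -= 1   # grant: reserve one credit
            return True
        return False                          # no grant: the cell waits at the edge

    def cell_delivered(self, destination: str) -> None:
        """Destination-side edge frees the credit once the cell has left the fabric."""
        self.credits[destination] += 1

scheduler = GrantScheduler(credits_per_destination=2)
print(scheduler.request("edge-B"))  # True  -> cell may enter the switching fabric
print(scheduler.request("edge-B"))  # True
print(scheduler.request("edge-B"))  # False -> held back at the edge, avoiding loss
scheduler.cell_delivered("edge-B")
print(scheduler.request("edge-B"))  # True again after a credit is returned
```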
In addition, data flows from each peripheral processor 170 through the switching fabric 187 can be decoupled from data flows from the remaining peripheral processors 170 through the switching fabric 187. In particular, data congestion at one or more peripheral processors 170 does not affect data flow through the switching fabric 187 of the exchange core 180 in an undesirable manner, because cells are sent into the switching fabric 187 of the exchange core 180 only after requests issued at the edge portion 185 of the exchange core 180 have been granted. For example, a high level of data traffic at a first peripheral processor 170 can be handled based on the request-grant congestion resolution mechanism, such that the high level of data traffic at the first peripheral processor 170 does not adversely affect the access of a second peripheral processor 170 to the single logical entity of the exchange core 180. In other words, once admitted into the switching fabric 187 of the exchange core 180, traffic associated with the first peripheral processor 170 is isolated (e.g., isolated from a congestion perspective) from traffic associated with the second peripheral processor 170.
Similarly, the flows of data packets that are parsed into cells and fragments within the exchange core 180 can be controlled at the peripheral processors 170 by fine-grain flow control mechanisms. In certain embodiments, fine-grain flow control is performed on a per-queue basis at several levels. This type of fine-grain flow control can prevent (or substantially prevent) the head-of-line blocking that leads to poor network utilization. Fine-grain flow control can also be used to reduce (or minimize) latency within the exchange core 180. In certain embodiments, fine-grain flow control can enable high-performance block-based disk traffic to be sent to and received from the peripheral processors 170, which cannot be achieved in a desirable manner using Ethernet and Internet Protocol (IP) networks. More details related to fine-grain flow control are described in connection with Figs. 22 through 25.
In certain embodiments, the data center 100, and in particular the exchange core 180, can have a modular architecture. Specifically, the exchange core 180 of the data center 100 can initially be deployed at a small scale and can be expanded as needed (e.g., scaled up). The exchange core 180 can be expanded substantially without interrupting the continued operation of the existing network and/or without constraints on where new equipment of the exchange core 180 must be physically placed.
In certain embodiments, one or more portions of the exchange core 180 can be configured to operate on the basis of virtual private networks ("VPNs"). In particular, the exchange core 180 can be partitioned such that one or more peripheral processors 170 can be configured to communicate via overlapping or non-overlapping virtual partitions of the exchange core 180. The exchange core 180 can also be decomposed into separate or overlapping subsets of virtual resources. In other words, the exchange core 180 can be a single switch that can be partitioned in a flexible manner. In certain embodiments, this approach enables networking within the data center 100 to be consolidated into the single merged exchange core 180. This is in contrast to a data center that is a collection of independently scaled networks, each having customized and/or dedicated resources. In certain embodiments, the network resources defining the exchange core 180 can be pooled so that they can be used efficiently.
In certain embodiments, the data center management module 190 can be configured to define a virtual multi-level hierarchy of the physical (and/or virtual) resources that define the data center 100. For example, the data center management module 190 can be configured to define a virtual hierarchy that reflects the breadth of applications of the data center 100. In certain embodiments, the lower of two levels can include virtual application clusters (VACs), each of which can be a set of physical (or virtual) resources allocated to (e.g., belonging to, controlled by) an individual application of one or more entities (e.g., an administrative entity, a financial institution). The higher of the two levels can include virtual data centers (VDCs), each of which can include a set of VACs belonging to (e.g., controlled by) one or more entities. In certain embodiments, the data center 100 includes multiple VACs, each of which can belong to a different administrative entity.
Fig. 3 is a schematic diagram illustrating a logical grouping 300 of resources associated with a data center, according to an embodiment. As shown in Fig. 3, the logical grouping 300 includes a virtual data center VDC1, a virtual data center VDC2, and a virtual data center VDC3 (collectively referred to as the VDCs). Also as shown in Fig. 3, each VDC includes virtual application clusters VACs (e.g., VAC32 in VDC3). Each VDC represents a logical grouping of physical or virtual portions of a data center such as the data center 100 shown in Fig. 1 (e.g., portions of the exchange core, peripheral processors, and/or virtual machines within peripheral processors). Each VAC within a VDC represents, for example, a logical grouping of peripheral processors functioning as compute nodes. For example, VDC1 can represent a logical grouping of physical data center resources, and VAC22 can represent a logical grouping of the peripheral processors 370 within VDC1. As shown in Fig. 3, each VDC can be managed based on a set of policies PY (which can also be referred to as business rules) that can be configured, for example, to define allowable ranges of operating parameters for the applications executing within the VDC. In certain embodiments, the VDCs can be referred to as a first tier of logical resources, and the VACs can be referred to as a second tier of logical resources.
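The two-tier grouping can be sketched as a simple data structure, purely for illustration; the field names and example members are assumptions and do not correspond to a specific implementation in the embodiments:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualApplicationCluster:
    """Lower tier: resources allocated to a single application."""
    name: str
    peripheral_processors: list[str] = field(default_factory=list)

@dataclass
class VirtualDataCenter:
    """Higher tier: a set of VACs managed under one set of policies."""
    name: str
    policies: dict = field(default_factory=dict)
    vacs: list[VirtualApplicationCluster] = field(default_factory=list)

# Example mirroring Fig. 3: VDC3 contains VAC32 and is governed by policy set PY3.
vdc3 = VirtualDataCenter(
    name="VDC3",
    policies={"PY3": "allowable operating-parameter ranges for applications in VDC3"},
    vacs=[VirtualApplicationCluster("VAC32", ["peripheral-processor-370-a",
                                              "peripheral-processor-370-b"])],
)
```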
In certain embodiments, the VDCs (and VACs) can be established so that the resources associated with the data center are managed in a desirable manner by, for example, an entity that uses (e.g., leases, owns, communicates through) the data center resources and/or an administrator of the data center resources. For example, VDC1 can be a virtual data center associated with a financial institution, and VDC2 can be a virtual data center associated with a telecommunications service provider. Accordingly, policy PY1 can be defined by the financial institution such that VDC1 (and the physical and/or virtual data center resources associated with VDC1) is managed in a manner different from the management of VDC2 (and the physical and/or virtual data center resources associated with VDC2) based on policy PY2, which is defined by the telecommunications service provider. In certain embodiments, one or more policies (e.g., a portion of policy PY1) are established by a network administrator such that, when implemented, information security and/or a firewall is provided between VDC1, which is associated with the financial institution, and VDC2, which is associated with the telecommunications service provider.
In certain embodiments, policies can be associated with (or integrated into) a data center management module (not shown). For example, VDC2 can be managed based on policy PY2 (or a subset of policy PY2). In certain embodiments, the data center management module can be configured to, for example, monitor the real-time performance of applications within a VDC and/or can be configured to automatically allocate or deallocate resources to satisfy the corresponding policies for the applications within the VDC. In certain embodiments, a policy can be configured to operate based on a time threshold. For example, one or more policies can be configured to act on changes in a parameter value (e.g., a traffic level) during particular times of a day or particular days of a week, for example, based on periodic events (e.g., predictable periodic events).
In certain embodiments, policies can be defined based on a high-level language. Accordingly, policies can be specified in a relatively accessible manner. Examples of policies include information security policies related to the protection of or access to information, fault isolation policies, firewall policies, performance guarantee policies (e.g., policies related to classes of service implemented for applications), and/or other administrative policies (e.g., administrative isolation policies).
In certain embodiments, policies can be implemented in a packet classification module, which can be configured, for example, to classify data packets (e.g., IP packets, session control protocol packets, media packets, data packets defined at a peripheral processor). For example, a policy can be implemented in the packet classification module of an access switch in the edge portion of the exchange core. Classification can include any processing performed so that a data packet can be processed within the data center (e.g., within the exchange core of the data center) based on a policy. In certain embodiments, a policy includes one or more policy conditions associated with an instruction that can be executed. A policy can be, for example, a policy that routes a data packet to a specific destination (the instruction) if the data packet has a particular type of network address (the policy condition). Packet classification can include determining whether a policy condition is satisfied, so that the instruction can be executed. For example, one or more portions of a data packet (e.g., a field, a payload, an address portion, a port portion) can be analyzed by the packet classification module based on the policy conditions defined in a policy. When a policy condition is satisfied, the instruction associated with the policy condition can be executed with respect to the data packet.
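The condition/instruction structure described above can be sketched as follows. This is an illustrative sketch only; the example condition, instruction, and packet fields are assumptions and not part of the specified classification pipeline:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    """One policy: a condition over a packet and the instruction executed when it matches."""
    condition: Callable[[dict], bool]
    instruction: Callable[[dict], None]

def classify(packet: dict, policies: list[Policy]) -> None:
    """Evaluate each policy condition against portions of the packet (fields, addresses,
    ports) and execute the associated instruction for every condition that is satisfied."""
    for policy in policies:
        if policy.condition(packet):
            policy.instruction(packet)

# Example policy: route packets addressed to the 10.1.0.0/16 range to a specific egress port.
policies = [
    Policy(
        condition=lambda p: p["dst_ip"].startswith("10.1."),
        instruction=lambda p: p.update(egress_port="AS-3/port-12"),
    ),
]

packet = {"dst_ip": "10.1.4.7", "dst_port": 443}
classify(packet, policies)
print(packet)   # egress_port has been set by the matching policy's instruction
```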
In certain embodiments, one or more portions of the logical grouping 300 can be configured to operate in a "lights-out" mode, in which the logical grouping 300 is controlled from multiple remote locations, for example, a separate location for each VDC and one or two master sites. In certain embodiments, a data center having a logical grouping such as that shown in Fig. 3 can be configured to operate without personnel being physically present at the data center site. In certain embodiments, the data center has sufficient redundant resources to accommodate the occurrence of faults, for example, the failure of one or more peripheral processors (e.g., a peripheral processor within a VAC), the failure of the data center management module, and/or the failure of an exchange core component. When monitoring software within the data center (e.g., within the data center management module of the data center) indicates that such faults have reached a predetermined threshold, personnel can be notified and/or dispatched to replace the failed components.
As shown in Fig. 3, the VDCs can be mutually independent logical groupings. In certain embodiments, the resources of a data center (such as that shown in Fig. 1) (e.g., virtual resources, physical resources) can be divided into a logical grouping 300 different from the logical grouping shown in Fig. 3 (e.g., with different tiers of logical groupings). In certain embodiments, two or more VDCs of the logical grouping 300 overlap. For example, a first VDC can share resources of the data center (e.g., physical resources, virtual resources) with a second VDC. In particular, a portion of the exchange core of a first VDC can be shared with a second VDC. In certain embodiments, for example, a resource included in a VAC of a first VDC can also be included in a VAC of a second VDC.
In certain embodiments, one or more VDCs can be defined manually (e.g., manually defined by a network administrator) and/or defined automatically (e.g., automatically defined based on a policy). In certain embodiments, a VDC can be configured to change (e.g., change dynamically). For example, a VDC (e.g., VDC1) can include one particular set of resources during one time period and a different set of resources (e.g., a mutually exclusive set of resources, an overlapping set of resources) during a different time period (e.g., a mutually exclusive time period, an overlapping time period).
In certain embodiments, one or more portions of the data center can be dynamically provisioned in response to, before, or during a change involving a VDC (e.g., migration of a virtual machine that is part of the VDC). For example, the exchange core of the data center can include multiple network devices, such as network switches, each storing a database of configuration templates associated with services provided and/or requested by virtual machines. When a virtual machine is migrated to, and/or initialized or started on, a server connected to a network switch port of the exchange core, the server can send to the network switch an identifier representing the service provided by the virtual machine. The network device can select a configuration template from the configuration template database based on this identifier, and can provision the port and/or the server based on that configuration template. In this manner, the task of provisioning network ports and/or devices can be distributed among the network switches in the exchange core (e.g., distributed in an automated manner, without a need to redefine the templates), even as virtual machines change dynamically or resources migrate between peripheral processors.
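A minimal sketch of the template selection step is given below, assuming a per-switch template database keyed by service identifier. The service identifiers and template contents are made up for the example:

```python
# Hypothetical configuration-template database held by one network switch.
configuration_templates = {
    "web-frontend": {"vlan_tag": 100, "acl": ["permit tcp any any eq 443"]},
    "database":     {"vlan_tag": 200, "acl": ["permit tcp 10.2.0.0/16 any eq 5432"]},
}

def provision_port(port_id: str, service_identifier: str) -> dict | None:
    """On receiving the service identifier from the server hosting a migrated or newly
    started virtual machine, select the matching template and apply it to the port."""
    template = configuration_templates.get(service_identifier)
    if template is not None:
        print(f"provisioning {port_id} with {template}")
    return template

# A virtual machine providing the "database" service appears on port 7 of this switch.
provision_port("port-7", "database")
```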
In certain embodiments, provisioning can include various types or forms of device and/or software module setup, configuration, and/or adjustment. For example, provisioning can include configuring a network device within the data center, such as a network switch, based on a policy such as one of the policies PY shown in Fig. 3. More specifically, provisioning related to the data center can include one or more of the following: configuring a network device to operate as a network router or a network switch; changing a routing table of a network device; updating security policies and/or the addresses or identifiers of devices operatively coupled to a network device; selecting which network protocols a network device will implement; setting a network segment identifier such as a virtual local area network ("VLAN") tag for a port of a network device; and/or applying an access control list ("ACL") to a network device. A portion of the data center can be provisioned or configured such that rules and/or access restrictions defined by a policy (e.g., PY3) are applied (e.g., applied through classification processing) to data packets passing through that portion of the data center.
In certain embodiments, virtual resources associated with the data center can also be provisioned. A virtual resource can be, for example, a software module implementing a virtual switch, a virtual router, or a virtual gateway configured to operate as an intermediary between a physical network and virtual resources hosted by a host device such as a server. In certain embodiments, the virtual resource can be controlled by the host device. In certain embodiments, provisioning can include establishing a virtual port or a connection between a virtual resource and a virtual device.
More details related to provisioning of virtual resources within a data center are set forth in co-pending U.S. Patent Application No. 12/346,623, filed December 30, 2008, entitled "Method and Apparatus for Determining a Network Topology During Network Provisioning"; co-pending U.S. Patent Application No. 12/346,632, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning"; and co-pending U.S. Patent Application No. 12/346,630, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning"; all of which are incorporated herein by reference in their entireties.
Fig. 4 A is the schematic diagram that shows to be included in the switching fabric 400 in exchcange core according to an embodiment.In certain embodiments, switching fabric 400 can be included in the exchcange core of routine exchcange core 180 as shown in Figure 1.As shown in Figure 4 A, switching fabric 400 is three grades, clog-free Clos (clo this) networks, and comprises the first order 440, the second level 442 and the third level 444.The first order 440 comprises module 412 (its each can be called as Switching Module or cell switching machine).Each module 412 of the first order 440 is the integrated of electronic building brick and circuit.In certain embodiments, for example, each module is application-specific integrated circuit (ASIC) (ASIC).In other embodiments, multiple modules are comprised on an independent ASIC.In certain embodiments, each module is the integrated of discrete electronic components.In certain embodiments, there is multistage switching fabric and can be called as multilevel interchange frame.
In certain embodiments, each module 412 of the first stage 440 can be a cell switch. The cell switches are configured to effectively redirect data (e.g., fragments) as it flows through the switching fabric 400. In certain embodiments, for example, each module 412 of the first stage can be configured to redirect data based on information contained in a switch table. In certain embodiments, the redirection of data such as cells within the stages of the switching fabric 400 can be referred to as switching (e.g., data switching), or as cell switching if the data is in the form of cells within the switching fabric 400. In certain embodiments, the switching within the modules of the switching fabric 400 can be based on, for example, information (e.g., headers) associated with the data. The switching performed by the modules of the switching fabric 400 can be different from, for example, the Ethernet-type classification performed within an edge device (e.g., an edge device in the edge portion 185 of the exchange core 180 shown in Fig. 1). In other words, the switching within the modules of the switching fabric 400 may not be based on, for example, layer 2 Ethernet addresses and/or layer 4 Ethernet addresses. More details related to data switching based on a switch table are described in connection with Fig. 4B.
In certain embodiments, each cell switch also includes multiple input ports operatively coupled to write interfaces on a memory buffer (e.g., a cut-through buffer). In certain embodiments, the memory buffer is included in a buffer module. Similarly, a set of output ports can be operatively coupled to read interfaces on the memory buffer. In certain embodiments, the memory buffer can be a shared memory buffer implemented using on-chip static random access memory (SRAM), providing sufficient bandwidth for all input ports to write one incoming cell (e.g., a portion of a data packet) per time period and for all output ports to read one outgoing cell per time period. Each cell switch operates similarly to a crossbar switch that can be reconfigured in each succeeding time period.
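The per-time-period behavior described above can be illustrated with a toy model (an illustrative sketch only; the queue-per-output organization and the names below are assumptions, not the actual buffer design):

```python
import collections

class SharedMemoryCellSwitch:
    """Toy model of a shared-memory cell switch: in each time period every input port may
    write one cell into the shared buffer and every output port may read one cell out,
    mimicking a crossbar that is reconfigured every time period."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        # One logical queue per output port, all held in the single shared buffer.
        self.shared_buffer = [collections.deque() for _ in range(num_ports)]

    def time_period(self, incoming: dict[int, tuple[int, str]]) -> dict[int, str]:
        """`incoming` maps input port -> (destination output port, cell).
        Returns the cell read by each output port during this time period, if any."""
        for _input_port, (output_port, cell) in incoming.items():
            self.shared_buffer[output_port].append(cell)         # one write per input port
        return {port: queue.popleft()                             # one read per output port
                for port, queue in enumerate(self.shared_buffer) if queue}

switch = SharedMemoryCellSwitch(num_ports=4)
print(switch.time_period({0: (2, "cell-A"), 1: (2, "cell-B"), 3: (0, "cell-C")}))
print(switch.time_period({}))   # cell-B, buffered last period, drains on the next read
```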
In certain embodiments, the memory buffers (e.g., portions of a memory buffer associated with a particular port and/or flow) have sufficient size (e.g., length) for the modules of the switching fabric 400 (e.g., the modules 412) to perform switching (e.g., cell switching, data switching) and/or to synchronize data (e.g., cells). The memory buffers, however, may have insufficient size (and/or too short a processing latency) for the modules within the switching fabric 400 (e.g., the modules 412) to implement congestion control schemes. A congestion control scheme such as a grant/request mechanism can be implemented at the edge devices (not shown) associated with the exchange core, but the data queueing associated with the congestion control scheme cannot be implemented using the memory buffers within the modules of the switching fabric 400. In certain embodiments, one or more memory buffers within the modules (e.g., the modules 414) have insufficient size (and/or too short a processing latency) for, for example, assembling data (e.g., cells) at the modules. More details related to shared memory buffers are described in connection with Fig. 15 and in co-pending U.S. Patent Application No. 12/415,517, filed March 31, 2009, entitled "Methods and Apparatus Related to a Shared Memory Buffer for Variable-Sized Cells," which is incorporated herein by reference in its entirety.
In alternative embodiments, each module of the first stage can be a crossbar switch having input bars and output bars. Multiple switches within the crossbar switch connect each input bar to each output bar. When a switch within the crossbar switch is in an "on" position, the input is operatively coupled to the output and data can flow. Alternatively, when a switch within the crossbar switch is in an "off" position, the input is not operatively coupled to the output and data cannot flow. In this manner, the switches within the crossbar switch control which input bars are operatively coupled to which output bars.
Each module 412 of the first stage 440 includes a set of input ports 460 configured to receive data as the data enters the switching fabric 400. In this embodiment, each module 412 of the first stage 440 includes the same number of input ports 460.
Similar to the first stage 440, the second stage 442 of the switching fabric 400 includes modules 414. The modules 414 of the second stage 442 are structurally similar to the modules 412 of the first stage 440. Each module 414 of the second stage 442 is operatively coupled to each module 412 of the first stage 440 by a data path 420. Each data path 420 between a module 412 of the first stage 440 and a module 414 of the second stage 442 is configured to facilitate data transfer from the module 412 of the first stage 440 to the module 414 of the second stage 442.
The data paths 420 between the modules 412 of the first stage 440 and the modules 414 of the second stage 442 can be constructed in any manner configured to facilitate data transfer from the modules 412 of the first stage 440 to the modules 414 of the second stage 442 in a desirable manner (e.g., in an effective manner). In certain embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to the midplane described in further detail herein. Such a midplane can be used effectively to connect each module of the second stage to each module of the first stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
In certain embodiments, the switching fabric 400 is a non-blocking Clos network. As such, the number of modules 414 of the second stage 442 of the switching fabric 400 varies based on the number of input ports 460 of each module 412 of the first stage 440. In a rearrangeably non-blocking Clos network (e.g., a Benes network), the number of modules 414 of the second stage 442 is greater than or equal to the number of input ports 460 of each module 412 of the first stage 440. Thus, if n is the number of input ports 460 of each module 412 of the first stage 440 and m is the number of modules 414 of the second stage 442, then m ≥ n. In certain embodiments, for example, each module of the first stage has five input ports. In such embodiments, the second stage has at least five modules. All five modules of the first stage are operatively coupled to all five modules of the second stage by data paths. In other words, each module of the first stage can send data to any module of the second stage.
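The rearrangeably non-blocking condition stated above can be summarized as follows; for comparison only, the second inequality is the classical strictly non-blocking Clos condition, which is a well-known result and is not asserted for this embodiment:

```latex
m \;\geq\; n \quad \text{(rearrangeably non-blocking; e.g.\ } n = 5 \Rightarrow m \geq 5\text{)},
\qquad
m \;\geq\; 2n - 1 \quad \text{(strictly non-blocking Clos, for comparison)}
```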
The third stage 444 of the switching fabric 400 includes modules 416. The modules 416 of the third stage 444 are structurally similar to the modules 412 of the first stage 440. The number of modules 416 of the third stage 444 equals the number of modules 412 of the first stage 440. Each module 416 of the third stage 444 includes output ports 462 configured to allow data to exit the switching fabric 400. Each module 416 of the third stage 444 includes the same number of output ports 462. In addition, the number of output ports 462 of each module 416 of the third stage 444 equals the number of input ports 460 of each module 412 of the first stage 440.
Each module 416 of the third stage 444 is connected to each module 414 of the second stage 442 by a data path 424. The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 are configured to facilitate data transfer from the modules 414 of the second stage 442 to the modules 416 of the third stage 444.
The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 can be constructed in any manner configured to effectively facilitate data transfer from the modules 414 of the second stage 442 to the modules 416 of the third stage 444. In certain embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to the midplane described in detail herein. Such a midplane can be used effectively to connect each module of the second stage to each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
Fig. 4 B is the schematic diagram that shows to be stored in the swap table 49 in the memory 498 of module as shown in Figure 4 A according to an embodiment.In routine second level module 414 as shown in Figure 4 A, the module (for example Switching Module) of can be configured to based on the example swap table execution cell switching machine of swap table 49 as shown in Figure 4 B.For example, swap table 49 (or swap table of similar configuration) can for example be used in by the module in (and/or being included) one-level module, determines that can cell be sent to its destination via the module in another grade of module.In certain embodiments, the module that cell can be sent to its destination via this module is called as switching purpose ground.Especially, can in swap table 49, search by the destination information (it can be determined outside switching fabric 400) based on comprising for example cell to switching purpose.
The switch table 49 includes binary values (e.g., a binary value "1", a binary value "0"), each of which represents whether one or more destinations represented by destination values DT1 through DTk (shown at 47) can be reached via one or more modules (which can be located in an adjacent stage) represented by module values SM1 through SMM (shown at 48). In particular, the switch table 49 includes a binary value "1" when the destination associated with the row containing that binary value can be reached via the module associated with the column intersecting that row. The switch table 49 includes a binary value "0" when the destination associated with the row containing that binary value cannot be reached via the module associated with the column intersecting that row. For example, the binary value "1" in each of the entries at 46 indicates that if the module (which includes the switch table 49) sends data to any of the modules represented by module values SM1 through SM3, the data can ultimately be sent to the destination represented by destination value DT3. In certain embodiments, the module can be configured to randomly select one module from the group of modules represented by module values SM1 through SM3 (which are switch destinations), and the data can be sent to the selected module so that the data can be sent to the destination represented by destination value DT3.
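A minimal sketch of this lookup-and-random-selection step is shown below. The table contents are invented for illustration; only the shape of the operation (binary reachability entries per destination, random choice among the marked switch destinations) follows the description above:

```python
import random

# Illustrative switch table: for each destination value, a 1 marks a module in the adjacent
# stage via which that destination can be reached (a "switch destination").
switch_table = {
    #        SM1  SM2  SM3  SM4
    "DT1":  [1,   0,   1,   0],
    "DT2":  [0,   1,   0,   1],
    "DT3":  [1,   1,   1,   0],
}
module_ids = ["SM1", "SM2", "SM3", "SM4"]

def select_switch_destination(destination_value: str) -> str:
    """Look up the modules via which the destination can be reached and pick one at random,
    mirroring the random selection among SM1..SM3 described for destination DT3."""
    candidates = [module_ids[i]
                  for i, reachable in enumerate(switch_table[destination_value]) if reachable]
    return random.choice(candidates)

print(select_switch_destination("DT3"))   # one of SM1, SM2, SM3
```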
In certain embodiments, the destination values 47 can be destination port values associated with, for example, an edge device (e.g., an access switch) of the exchange core, a server in communication with an edge device, and so forth. In certain embodiments, a destination value (corresponding to at least one of the destination values 47 included in the switch table 49) can be associated with a cell based on, for example, packet classification, and can be included in the cell (e.g., included in a cell header). Accordingly, the destination value associated with the cell can be used by a module to look up a switch destination using the switch table 49. The packet classification can be performed at, for example, an edge device (e.g., an access switch) of the exchange core.
In certain embodiments, the memory (with a switch table such as the switch table 49) can be included in a module system that includes one or more modules. In certain embodiments, the switch table 49 can be associated with more than one input port and/or more than one output port of a module system (or of multiple module systems). More details related to module systems are described in connection with Fig. 7.
Fig. 5 A is the schematic diagram that shows switching fabric system 500 according to an embodiment.Switching fabric system 500 comprises multiple input/output module 502, the first cable collection 540, the second cable collection 542 and switching fabric 575.Switching fabric 575 comprises the first switching fabric part 571 being deployed in shell 570 or frame, and is deployed in the second switching fabric part 573 in shell 572 and frame.
The input/output modules 502 (which can be, for example, edge devices) are configured to send data to and/or receive data from the first switching fabric portion 571 and/or the second switching fabric portion 573. Additionally, each input/output module 502 includes a parsing function, a classification function, a forwarding function, and/or a queueing and scheduling function. Thus, packet parsing, packet classification, packet forwarding, and packet queueing and scheduling all occur before a data packet enters the first switching fabric portion 571 and/or the second switching fabric portion 573. Accordingly, these functions need not be performed at every stage of the switching fabric 575, and the modules of the switching fabric portions 571, 573 (described in further detail herein) need not include the capability to perform these functions. This can reduce the cost, power consumption, cooling requirements, and/or physical area needed for each module of the switching fabric portions 571, 573. This can also reduce the latency associated with the switching fabric. In certain embodiments, for example, the end-to-end latency (i.e., the time needed to send data from one input/output module to another input/output module through the switching fabric) can be lower than the end-to-end latency of a switching fabric system using an Ethernet protocol. In certain embodiments, the throughput of the switching fabric portions 571, 573 is constrained only by the connection density of the switching fabric system 500 and not by power and/or thermal limits. In certain embodiments, the input/output modules 502 (and/or the functions associated with the input/output modules 502) can be included in, for example, the edge devices within the edge portion of an exchange core such as that shown in Fig. 1. The parsing function, classification function, forwarding function, and queueing and scheduling function can be performed similarly to the functions disclosed in U.S. Patent Application Serial No. 12/242,168, entitled "Methods and Apparatus Related to Packet Classification Associated with a Multi-Stage Switch," filed September 30, 2008, and U.S. Patent Application Serial No. 12/242,172, entitled "Methods and Apparatus for Packet Classification Based on Policy Vectors," filed September 30, 2008, both of which are incorporated herein by reference in their entireties.
Each input/output module 502 is configured to connect to a first end of a cable of the first set of cables 540 and to a first end of a cable of the second set of cables 542. Each cable 540 is disposed between an input/output module 502 and the first switching fabric portion 571. Similarly, each cable 542 is disposed between an input/output module 502 and the second switching fabric portion 573. Using the first set of cables 540 and the second set of cables 542, each input/output module 502 can send data to and/or receive data from the first switching fabric portion 571 and the second switching fabric portion 573, respectively.
The first set of cables 540 and the second set of cables 542 can be constructed of any material suitable for transferring data between the input/output modules 502 and the switching fabric portions 571, 573. In certain embodiments, for example, each cable 540, 542 is constructed of multiple optical fibers. In such embodiments, each cable 540, 542 can have twelve transmit fibers and twelve receive fibers. The twelve transmit fibers of each cable 540, 542 can include eight fibers for transmitting data, one fiber for transmitting a control signal, and three fibers for expanding data capacity and/or for redundancy. Similarly, the twelve receive fibers of each cable 540, 542 can include eight fibers for receiving data, one fiber for receiving a control signal, and three fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of fibers can be contained in each cable.
The first switching fabric portion 571 and the second switching fabric portion 573 are used together for redundancy and/or greater capacity. In other embodiments, only one switching fabric portion is used. In still other embodiments, more than two switching fabric portions are used for increased redundancy and/or greater capacity. For example, four switching fabric portions can be operatively coupled to each input/output module by, for example, four cables. The second switching fabric portion 573 is structurally and functionally similar to the first switching fabric portion 571. Accordingly, only the first switching fabric portion 571 is described in detail herein.
Fig. 5 B is the schematic diagram that shows input/output module 502 according to an embodiment.As shown in Figure 5 B, input/output module 502 comprises sort module 596, processing module 597, and memory 598.Sort module 596 can be configured to executing data classification, for example ethernet type of grouping classification.
Various types of data processing can be performed at the processing module 597. For example, data such as packets can be parsed into cells at the processing module 597. In certain embodiments, congestion control schemes can be implemented at the processing module 597, and/or the scheduling of data (e.g., cell) transmission via a switching fabric (e.g., the switching fabric 400 shown in Fig. 4A) can be performed at the processing module 597. The processing module 597 can also be configured to associate information (e.g., header information, destination information, source information) with, for example, a cell payload; the cell payload can then be switched through a switching fabric (e.g., the switching fabric 400 shown in Fig. 4A) based on a switch table such as that shown in Fig. 4B.
While data processing is being performed at the classification module 596 and/or the processing module 597, one or more portions of the data (e.g., packets, cells) can be stored in (e.g., queued within) the memory 598. For example, while the processing module 597 performs processing related to a congestion control scheme, data that has been parsed into cells can be queued in the memory 598. Accordingly, the memory 598 can have sufficient size to implement congestion control schemes such as those described in connection with Figs. 16A through 21.
Fig. 6 shows in greater detail a portion of the switching fabric system 500, including the first switching fabric portion 571 of Fig. 5A. The first switching fabric portion 571 includes interface cards 510, which are associated with the first stage and the third stage of the first switching fabric portion 571; interface cards 516, which are associated with the second stage of the first switching fabric portion 571; and a midplane 550. In certain embodiments, the first switching fabric portion 571 includes eight interface cards 510 associated with the first stage and the third stage of the first switching fabric portion, and eight interface cards 516 associated with the second stage of the first switching fabric portion. In other embodiments, a different number of interface cards associated with the first stage and the third stage of the first switching fabric portion and/or a different number of interface cards associated with the second stage of the first switching fabric portion can be used.
As shown in Fig. 6, each input/output module 502 is operatively coupled to an interface card 510 via a cable of the first set of cables 540. In certain embodiments, each of the, for example, eight interface cards 510 is operatively coupled to sixteen input/output modules 502, as described in further detail herein. In this manner, the first switching fabric portion 571 can be coupled to 128 input/output modules (16 × 8 = 128). Each of the 128 input/output modules 502 can send data to and receive data from the first switching fabric portion 571.
Each interface card 510 is connected to each interface card 516 via the midplane 550. In this manner, each interface card 510 can send data to and receive data from each interface card 516, as described in further detail herein. Using the midplane 550 to connect the interface cards 510 to the interface cards 516 reduces the number of cables needed to connect the stages of the first switching fabric portion 571.
Fig. 7 shows in greater detail a first interface card 510', the midplane 550, and a first interface card 516'. The interface card 510' is associated with the first stage and the third stage of the first switching fabric portion 571, and the interface card 516' is associated with the second stage of the first switching fabric portion 571. Each interface card 510 is structurally and functionally similar to the first interface card 510'. Similarly, each interface card 516 is structurally and functionally similar to the first interface card 516'.
The first interface card 510' includes multiple cable connector ports 560, a first module system 512, a second module system 514, and multiple midplane connector ports 562. For example, Fig. 7 shows the first interface card 510' having sixteen cable connector ports 560 and eight midplane connector ports 562. Each cable connector port 560 of the first interface card 510' is configured to receive a second end of a cable from the first set of cables 540. In this manner, as described above, the eight interface cards 510, each having sixteen cable connector ports 560, are used to receive 128 cables (16 × 8 = 128). Although sixteen cable connector ports 560 are shown in Fig. 7, in other embodiments any number of cable connector ports can be used, such that each cable of the first set of cables can be received by a cable connector port of the first switching fabric. For example, if sixteen interface cards are used, each interface card can include eight cable connector ports.
The first module system 512 and the second module system 514 of the first interface card 510' each include a module of the first stage of the first switching fabric portion 571 and a module of the third stage of the first switching fabric portion 571. In certain embodiments, eight of the sixteen cable connector ports 560 are operatively coupled to the first module system 512, and the remaining eight of the sixteen cable connector ports 560 are operatively coupled to the second module system 514. The first module system 512 and the second module system 514 can each be operatively coupled to each of the eight midplane connector ports 562 of the interface card 510'.
The first module system 512 and the second module system 514 of the first interface card 510' are ASICs. The first module system 512 and the second module system 514 are instances of the same ASIC. In this manner, manufacturing costs can be decreased because multiple instances of a single ASIC can be produced. In addition, a module of the first stage of the first switching fabric portion 571 and a module of the third stage of the first switching fabric portion are both included on each ASIC.
In certain embodiments, each of the eight midplane connector ports 562 has twice the data capacity of each of the sixteen cable connector ports 560. Accordingly, each of the eight midplane connector ports 562 has sixteen data transmit connections and sixteen data receive connections, rather than eight data transmit connections and eight data receive connections. In this manner, the bandwidth of the eight midplane connector ports 562 equals the bandwidth of the sixteen cable connector ports 560. In other embodiments, each midplane connector port has thirty-two data transmit connections and thirty-two data receive connections. In such embodiments, each cable connector port has sixteen data transmit connections and sixteen data receive connections.
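The bandwidth equivalence stated above is simply the following count of connections per direction (a restatement of the figures already given):

```latex
\underbrace{8 \times 16}_{\text{midplane connector ports}} \;=\; \underbrace{16 \times 8}_{\text{cable connector ports}} \;=\; 128 \ \text{transmit (and 128 receive) connections},
\qquad
8 \times 32 \;=\; 16 \times 16 \;=\; 256 .
```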
The eight midplane connector ports 562 of the first interface card 510' are connected to the midplane 550. The midplane 550 is configured to connect each interface card 510, associated with the first stage and the third stage of the first switching fabric portion 571, to each interface card 516, associated with the second stage of the first switching fabric portion 571. Thus, the midplane 550 ensures that each midplane connector port 562 of each interface card 510 is connected to a midplane connector port 580 of a different interface card 516. In other words, no two midplane connector ports of a given interface card 510 are operatively coupled to the same interface card 516. In this manner, the midplane 550 allows each interface card 510 to send data to and receive data from any of the eight interface cards 516.
Although Fig. 7 shows a schematic diagram of the first interface card 510', the midplane 550, and the first interface card 516', in certain embodiments the interface cards 510, the midplane 550, and the interface cards 516 are physically positioned similarly to the horizontally positioned interface cards 620, the midplane 640, and the vertically positioned interface cards 630, respectively, described in further detail herein. In this manner, the modules associated with the first stage and the modules associated with the third stage (both on the interface cards 510) are located on one side of the midplane 550, and the modules associated with the second stage (on the interface cards 516) are located on the opposite side of the midplane 550. Such a topology allows each module associated with the first stage to be operatively coupled to each module associated with the second stage, and each module associated with the second stage to be operatively coupled to each module associated with the third stage.
The first interface card 516' includes multiple midplane connector ports 580, a first modular system 518, and a second modular system 519. The multiple midplane connector ports 580 are configured to send data to and receive data from any interface card 510 via the midplane 550. In some embodiments, the first interface card 516' includes 8 midplane connector ports 580.
The first modular system 518 and the second modular system 519 of the first interface card 516' are operatively coupled to each midplane connector port 580 of the first interface card 516'. Thus, through the midplane 550, each modular system 512, 514 associated with the first stage and the third stage of the first switching fabric part 571 is operatively coupled to each modular system 518, 519 associated with the second stage of the first switching fabric part 571. In other words, each modular system 512, 514 associated with the first stage and the third stage of the first switching fabric part 571 can send data to and receive data from each modular system 518, 519 associated with the second stage of the first switching fabric part 571, and vice versa. Specifically, a module associated with the first stage within the modular system 512 or 514 can send data to a module associated with the second stage within the modular system 518 or 519. Similarly, a module associated with the second stage within the modular system 518 or 519 can send data to a module associated with the third stage within the modular system 512 or 514. In other embodiments, a module associated with the third stage can send data and/or control signals to a module associated with the second stage, and a module associated with the second stage can send data and/or control signals to a module associated with the first stage.
In embodiments where each module of the first stage of the first switching fabric part 571 has 8 inputs (two modules on each of the interface cards 510), the second stage of the first switching fabric part 571 has at least 8 modules for the first switching fabric part 571 to remain rearrangeably non-blocking. Thus, with at least 8 modules in its second stage, the first switching fabric part 571 is rearrangeably non-blocking. In some embodiments, twice that number of second-stage modules is used to facilitate future expansion of the switching fabric system 500 from a 3-stage switching fabric to a 5-stage switching fabric, as described in further detail herein. In such a 5-stage switching fabric, the second stage supports twice the switching throughput of the second stage of the 3-stage switching fabric of the switching fabric system 500. For example, in some embodiments, 16 second-stage modules can be used to facilitate future expansion of the switching fabric system 500 from a 3-stage switching fabric to a 5-stage switching fabric.
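The rearrangeably non-blocking condition referenced above follows the standard Clos criterion that the number of second-stage (middle-stage) modules be at least the number of inputs per first-stage module. A minimal sketch of that check, using illustrative parameter names that are not part of this disclosure:

```python
def is_rearrangeably_nonblocking(inputs_per_stage1_module: int,
                                 num_stage2_modules: int) -> bool:
    """Clos criterion: a 3-stage fabric is rearrangeably non-blocking when
    the middle stage has at least as many modules as each first-stage
    module has inputs (m >= n)."""
    return num_stage2_modules >= inputs_per_stage1_module

# The embodiment above: 8 inputs per first-stage module and 8 second-stage
# modules satisfy the criterion; 16 second-stage modules leave headroom
# for the expansion to a 5-stage fabric.
assert is_rearrangeably_nonblocking(8, 8)
assert is_rearrangeably_nonblocking(8, 16)
```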
The first modular system 518 and the second modular system 519 of the first interface card 516' are ASICs. The first modular system 518 and the second modular system 519 are instances of the same ASIC. Moreover, in some embodiments, the first modular system 518 and the second modular system 519 associated with the second stage of the first switching fabric part 571 are instances of the same ASIC as the first modular system 512 and the second modular system 514 of the first interface card 510' associated with the first stage and the third stage of the first switching fabric part 571. Thus, because multiple instances of a single ASIC can be used for every modular system of the first switching fabric part 571, manufacturing costs can be reduced.
In use, data is sent from a first input/output module 502 to a second input/output module 502 via the first switching fabric part 571. The first input/output module 502 sends the data to the first switching fabric part 571 via a cable of the first set of cables 540. The data passes through a cable connector port 560 of one of the interface cards 510' and is sent to a first-stage module within the modular system 512 or 514.
The first-stage module within the modular system 512 or 514 sends the data through one of the midplane connector ports 562 of the interface card 510', across the midplane 550, to one of the interface cards 516', and forwards the data to a second-stage module within the modular system 518 or 519. The data enters the interface card 516' through a midplane connector port 580 of the interface card 516' and is then sent to the second-stage module within the modular system 518 or 519.
The second-stage module determines how the second input/output module 502 is connected and redirects the data back to an interface card 510' via the midplane 550. Because each modular system 518 or 519 is operatively coupled to each modular system 512 and 514 on the interface cards 510', the second-stage module within the modular system 518 or 519 can determine which third-stage module within the modular system 512 or 514 is operatively coupled to the second input/output module and send the data accordingly.
The data is sent to the third-stage module within the modular system 512, 514 on the interface card 510'. The third-stage module then sends the data through a cable connector port 560, via a cable of the first set of cables 540, to the second input/output module of the input/output modules 502.
In other embodiments, instead of a first-stage module sending the data to a single second-stage module, the first-stage module divides the data into separate portions (e.g., cells) and forwards a portion of the data to each second-stage module to which the first-stage module is operatively coupled (e.g., in this embodiment, each second-stage module receives a portion of the data). Each second-stage module then determines how the second input/output module is connected and redirects the portions of the data to a single third-stage module. The third-stage module then reassembles the received portions of the data and sends the data to the second input/output module.
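A minimal sketch of this split, spray, and reassemble behavior, assuming a simple round-robin distribution of fixed-size cells across the second-stage modules; the distribution policy and cell format are illustrative assumptions, not specified by this disclosure:

```python
from typing import List, Tuple

def split_into_cells(data: bytes, cell_size: int) -> List[bytes]:
    """First-stage module: divide a packet into separate portions (cells)."""
    return [data[i:i + cell_size] for i in range(0, len(data), cell_size)]

def spray_cells(cells: List[bytes], num_stage2_modules: int) -> List[List[Tuple[int, bytes]]]:
    """Forward one portion to each coupled second-stage module, round-robin.
    Each cell keeps its sequence number so the third stage can reorder."""
    per_module = [[] for _ in range(num_stage2_modules)]
    for seq, cell in enumerate(cells):
        per_module[seq % num_stage2_modules].append((seq, cell))
    return per_module

def reassemble(received: List[List[Tuple[int, bytes]]]) -> bytes:
    """Third-stage module: reassemble the portions received via all paths."""
    ordered = sorted((seq, cell) for path in received for seq, cell in path)
    return b"".join(cell for _, cell in ordered)

packet = b"example payload crossing the fabric"
cells = split_into_cells(packet, cell_size=8)
assert reassemble(spray_cells(cells, num_stage2_modules=8)) == packet
```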
FIGS. 8-10 show a housing 600 (e.g., a chassis) used to house a switching fabric (e.g., the first switching fabric part 571 described above), according to an embodiment. The housing 600 includes an outer casing 610, a midplane 640, horizontally-positioned interface cards 620, and vertically-positioned interface cards 630. FIG. 8 shows a front view of the casing 610, in which the 8 horizontally-positioned interface cards 620 disposed within the casing 610 can be seen. FIG. 9 shows a rear view of the casing 610, in which the 8 vertically-positioned interface cards 630 disposed within the casing 610 can be seen.
Each horizontally-positioned interface card 620 is operatively coupled to each vertically-positioned interface card 630 via the midplane 640 (see FIG. 10). The midplane 640 includes a front surface 642, a rear surface 644, and an array 650 of receptacles connecting the front surface 642 and the rear surface 644, as described below. As shown in FIG. 10, each horizontally-positioned interface card 620 includes multiple midplane connector ports 622 connected to receptacles on the front surface 642 of the midplane 640. Similarly, each vertically-positioned interface card 630 includes multiple midplane connectors 632 connected to receptacles on the rear surface 644 of the midplane 640. In this manner, the plane defined by each horizontally-positioned interface card 620 intersects the plane defined by each vertically-positioned interface card 630.
The receptacles 650 of the midplane 640 operatively couple each horizontally-positioned interface card 620 to each vertically-positioned interface card 630. The receptacles 650 facilitate signal transmission between the horizontally-positioned interface cards 620 and the vertically-positioned interface cards 630. In some embodiments, for example, the receptacles 650 can be multi-pin connectors configured to receive multi-pin connectors disposed on the midplane connector ports 622, 632 of the interface cards 620, 630, openings that allow a horizontally-positioned interface card 620 to be connected directly to a vertically-positioned interface card 630, and/or any other device configured to operatively couple two interface cards. Using such a midplane 640, each horizontally-positioned interface card 620 is operatively coupled to each vertically-positioned interface card 630 without routing connections (e.g., electrical traces) on the midplane.
FIG. 10 shows the midplane with all 64 receptacles 650 arranged in an 8×8 array. In such an embodiment, 8 horizontally-positioned interface cards 620 can be operatively coupled to 8 vertically-positioned interface cards 630. In other embodiments, any number of receptacles can be included in the midplane and/or any number of horizontally-positioned interface cards can be coupled to any number of vertically-positioned interface cards through the midplane.
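A minimal sketch of the resulting full-mesh coupling, assuming each receptacle in the array joins exactly one horizontal card to one vertical card; the index convention is an illustrative assumption:

```python
def midplane_connections(num_horizontal: int, num_vertical: int):
    """Each receptacle (h, v) couples horizontal card h to vertical card v,
    so every horizontal card reaches every vertical card without any routed
    traces on the midplane itself."""
    return {(h, v) for h in range(num_horizontal) for v in range(num_vertical)}

receptacles = midplane_connections(8, 8)
assert len(receptacles) == 64      # the 8 x 8 array of receptacles
assert (3, 5) in receptacles       # horizontal card 3 reaches vertical card 5
```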
If the first switching fabric part 571 is disposed within the housing 600, for example, each interface card 510 associated with the first stage and the third stage of the first switching fabric part 571 can be horizontally positioned, and each interface card 516 associated with the second stage of the first switching fabric part 571 can be vertically positioned. Thus, each interface card 510 associated with the first stage and the third stage of the first switching fabric part 571 can be easily connected, through the midplane 640, to each interface card 516 associated with the second stage of the first switching fabric part 571. In other embodiments, each interface card associated with the first stage and the third stage of the first switching fabric part is vertically positioned and each interface card associated with the second stage of the first switching fabric part is horizontally positioned. In still other embodiments, each interface card associated with the first stage and the third stage of the first switching fabric can be placed at any angle with respect to the housing, and each interface card associated with the second stage of the first switching fabric can be placed at an angle orthogonal to the angle, with respect to the housing, of the interface cards associated with the first stage and the third stage of the first switching fabric part.
FIGS. 11 and 12 show schematic diagrams of a switching fabric 1100 in a first configuration and a second configuration, respectively, according to an embodiment. The switching fabric 1100 includes multiple switching fabric systems 1108.
Each switching fabric system 1108 includes multiple input/output modules 1102, a first set of cables 1140, a second set of cables 1142, a first switching fabric part 1171 disposed within a housing 1170, and a second switching fabric part 1173 disposed within a housing 1172. The switching fabric systems 1108 are structurally and functionally similar to one another. Further, the input/output modules 1102, the first set of cables 1140, and the second set of cables 1142 are structurally and functionally similar to the input/output modules 202, the first set of cables 240, and the second set of cables 242, respectively.
When the switching fabric 1100 is in the first configuration, the first switching fabric part 1171 and the second switching fabric part 1173 of each switching fabric system 1108 function similarly to the first switching fabric part 571 and the second switching fabric part 573 described above. Thus, when the switching fabric 1100 is in the first configuration, the first switching fabric part 1171 and the second switching fabric part 1173 operate as stand-alone 3-stage switching fabrics. Accordingly, when the switching fabric 1100 is in the first configuration, each switching fabric system 1108 acts as a stand-alone switching fabric system and is not operatively coupled to the other switching fabric systems 1108.
In the second configuration (FIG. 12), the switching fabric 1100 further includes a third set of cables 1144 and multiple connection switching fabrics 1191, each disposed within a housing 1190. The housings 1190 can be similar to the housing 600 described in detail above. Each switching fabric part 1171, 1173 of each switching fabric system 1108 is operatively coupled to each connection switching fabric 1191 via the third set of cables 1144. Thus, when the switching fabric 1100 is in the second configuration, each switching fabric system 1108 is operatively coupled to the other switching fabric systems 1108 via the connection switching fabrics 1191. Accordingly, the switching fabric 1100 in the second configuration is a 5-stage Clos network.
The third set of cables 1144 can be constructed of any material suitable for transferring data between the switching fabric parts 1171, 1173 and the connection switching fabrics 1191. In some embodiments, for example, each cable 1144 is constructed of multiple optical fibers. In such embodiments, each cable 1144 can have 36 transmit fibers and 36 receive fibers. The 36 transmit fibers of each cable 1144 can include 32 fibers for transmitting data and 4 fibers for expanding data capacity and/or for redundancy. Similarly, the 36 receive fibers of each cable 1144 can include 32 fibers for receiving data and 4 fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of fibers can be included in each cable. By using cables with a greater number of fibers, the number of cables used can be effectively reduced.
As discussed above, flow control can be performed within a switching fabric, for example, a switching fabric of a data center. FIGS. 13 and 14 and the description that follows illustrate flow control within a switching fabric. Specifically, FIG. 13 is a schematic diagram showing data flow associated with a switching fabric 1300, according to an embodiment. The switching fabric 1300 shown in FIG. 13 is similar to the switching fabric 400 shown in FIG. 4A, and can be implemented, for example, in a data center such as the data center 100 shown in FIG. 1. In this embodiment, the switching fabric 1300 is a 3-stage non-blocking Clos network and includes a first stage 1340, a second stage 1342, and a third stage 1344. The first stage 1340 includes modules 1312, the second stage 1342 includes modules 1314, and the third stage 1344 includes modules 1316. In some embodiments, the switching fabric 1300 can be a cell-switched switching fabric and each module 1312 of the first stage 1340 can be a cell switch. Each module 1312 of the first stage 1340 includes a set of input ports 1360 configured to receive data as it enters the switching fabric 1300. Each module 1316 of the third stage 1344 includes output ports 1362 configured to allow data to exit the switching fabric 1300. Each module 1316 of the third stage 1344 includes the same number of output ports 1362.
Each module 1314 of the second stage 1342 is operatively coupled to each module 1312 of the first stage 1340 by a unidirectional data path 1320. Each unidirectional data path 1320 between a module 1312 of the first stage 1340 and a module 1314 of the second stage 1342 is configured to facilitate data transfer from the module 1312 of the first stage 1340 to the module 1314 of the second stage 1342. Because the data paths 1320 are unidirectional, they do not facilitate data transfer from the modules 1314 of the second stage 1342 to the modules 1312 of the first stage 1340. Such unidirectional data paths 1320 cost less than comparable bidirectional data paths, use fewer data connections, and are easier to implement.
Each module 1316 of the third stage 1344 is operatively coupled to each module 1314 of the second stage 1342 by a unidirectional data path 1324. Each unidirectional data path 1324 between a module 1314 of the second stage 1342 and a module 1316 of the third stage 1344 is configured to facilitate data transfer from the module 1314 of the second stage 1342 to the module 1316 of the third stage 1344. Because the data paths 1324 are unidirectional, they do not facilitate data transfer from the modules 1316 of the third stage 1344 to the modules 1314 of the second stage 1342. As noted above, such unidirectional data paths 1324 cost less than comparable bidirectional data paths and use less area.
The unidirectional data paths 1320 between the modules 1312 of the first stage 1340 and the modules 1314 of the second stage 1342 and/or the unidirectional data paths 1324 between the modules 1314 of the second stage 1342 and the modules 1316 of the third stage 1344 can be constructed in any manner configured to effectively facilitate data transfer. In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane connector. Such a midplane connector can be similar to the midplane connectors described in connection with FIGS. 8 to 10. Such a midplane connector can be used to effectively connect each module of the second stage to each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the unidirectional data paths are electrical traces.
Each module 1312 of the first stage 1340 is physically proximate to a corresponding module 1316 of the third stage 1344. In other words, each module 1312 of the first stage 1340 is paired with a module 1316 of the third stage 1344. For example, in some embodiments, each module 1312 of the first stage 1340 is within the same chip package as a module 1316 of the third stage 1344. A bidirectional flow control path 1322 exists between each module 1312 of the first stage 1340 and its corresponding module 1316 of the third stage 1344. The flow control path 1322 allows a module 1312 of the first stage 1340 to send a flow control indicator to the corresponding module 1316 of the third stage 1344, and vice versa. As described in further detail herein, this allows a module of any stage of the switching fabric to send a flow control indicator to the module that transmits data to it. In some embodiments, the bidirectional flow control path 1322 is constructed of two separate unidirectional flow control paths. The two separate unidirectional flow control paths allow flow control indicators to pass between the module 1312 of the first stage 1340 and the module 1316 of the third stage 1344.
FIG. 14 is a schematic diagram showing flow control within the switching fabric 1300 shown in FIG. 13, according to an embodiment. Specifically, the schematic diagram shows a detailed view of a first row 1310 of the switching fabric 1300 shown in FIG. 13. The first row includes a module 1312' of the first stage 1340, a module 1314' of the second stage 1342, and a module 1316' of the third stage 1344. The module 1312' of the first stage 1340 includes a processor 1330 and a memory 1332. The processor 1330 is configured to control receiving and sending data. The memory 1332 is configured to buffer data when the module 1314' of the second stage 1342 cannot yet receive data and/or the module 1312' of the first stage 1340 cannot yet send data. In some embodiments, for example, if the module 1314' of the second stage 1342 has sent a suspension indicator to the module 1312' of the first stage 1340, the module 1312' of the first stage 1340 buffers data until the module 1314' of the second stage 1342 can receive data. Similarly, in some embodiments, the module 1312' of the first stage 1340 can buffer data when the module 1312' receives multiple data signals (e.g., from multiple input ports) substantially simultaneously. In such embodiments, if only a single data signal can be output by the module 1312' within a given time period (e.g., each clock cycle), the other received data signals can be buffered. Like the module 1312' of the first stage 1340, every module in the switching fabric 1300 includes a processor and a memory.
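A minimal sketch of this buffering behavior, assuming one cell can be forwarded per clock cycle and a simple paused flag driven by a suspension indicator; the class and attribute names are illustrative assumptions:

```python
from collections import deque

class StageOneModule:
    """Buffers cells when the downstream module has paused it or when more
    cells arrive in a cycle than it can forward (one per cycle)."""
    def __init__(self):
        self.buffer = deque()   # plays the role of memory 1332
        self.paused = False     # set by a suspension indicator

    def receive(self, cells):
        self.buffer.extend(cells)

    def on_flow_control(self, suspend: bool):
        self.paused = suspend

    def clock_cycle(self):
        """Forward at most one buffered cell per cycle, unless paused."""
        if self.paused or not self.buffer:
            return None
        return self.buffer.popleft()

m = StageOneModule()
m.receive(["cell-A", "cell-B"])      # two signals arrive in the same cycle
assert m.clock_cycle() == "cell-A"   # one is forwarded, the other buffered
m.on_flow_control(suspend=True)
assert m.clock_cycle() is None       # buffered until the indicator clears
```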
The module 1312' of the first stage 1340 and the module 1316' of the third stage 1344 with which it is paired are both included on a first chip package 1326. This allows the flow control path 1322 between the module 1312' of the first stage 1340 and the module 1316' of the third stage 1344 to be easily constructed. For example, the flow control path 1322 can be a trace on the first chip package 1326 between the module 1312' of the first stage 1340 and the module 1316' of the third stage 1344. In other embodiments, the module of the first stage and the module of the third stage are on separate chip packages but are positioned very close to each other, which still allows the flow control path between them to be established without a large amount of wiring and/or long traces.
The module 1314' of the second stage 1342 is included on a second chip package 1328. The unidirectional data path 1320 between the module 1312' of the first stage 1340 and the module 1314' of the second stage 1342, and the unidirectional data path 1324 between the module 1314' of the second stage 1342 and the module 1316' of the third stage 1344, operatively connect the first chip package 1326 to the second chip package 1328. Although not shown in FIG. 14, the module 1312' of the first stage 1340 and the module 1316' of the third stage 1344 are also connected to every other module of the second stage by unidirectional data paths. As noted above, the unidirectional data paths can be constructed in any manner configured to effectively facilitate data transfer between the modules.
The flow control path 1322 and the unidirectional data paths 1320, 1324 can be used to effectively transmit flow control indicators between the modules 1312', 1314', and 1316'. For example, if the module 1312' of the first stage 1340 is sending data to the module 1314' of the second stage 1342 and the amount of data in the buffer of the module 1314' of the second stage 1342 exceeds a threshold, the module 1314' of the second stage 1342 can send a flow control indicator to the module 1316' of the third stage 1344 via the unidirectional data path 1324 between the module 1314' of the second stage 1342 and the module 1316' of the third stage 1344. This flow control indicator triggers the module 1316' of the third stage 1344 to send a flow control indicator to the module 1312' of the first stage 1340 via the flow control path 1322. The flow control indicator sent from the module 1316' of the third stage 1344 to the module 1312' of the first stage 1340 causes the module 1312' of the first stage 1340 to stop sending data to the module 1314' of the second stage 1342. Similarly, a flow control indicator sent from the module 1314' of the second stage 1342 to the module 1312' of the first stage 1340 via the module 1316' of the third stage 1344 can request that the module 1312' of the first stage 1340 send data (i.e., resume sending data) to the module 1314' of the second stage 1342.
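A minimal sketch of this indirect flow-control loop, assuming a simple buffer-occupancy threshold at the second-stage module; the threshold value, class, and method names are illustrative assumptions:

```python
class FlowControlRow:
    """One row of the fabric: the second stage signals the third stage over
    the data path; the third stage relays the indicator to its paired
    first-stage module over the on-package flow control path."""
    BUFFER_THRESHOLD = 64  # illustrative occupancy threshold, in cells

    def __init__(self):
        self.stage2_buffer_level = 0     # occupancy at module 1314'
        self.stage1_paused = False       # state held at module 1312'

    def _stage3_relay(self, suspend: bool):
        # third stage -> paired first stage, via flow control path 1322
        self.stage1_paused = suspend

    def stage2_receives(self, num_cells: int):
        self.stage2_buffer_level += num_cells
        if self.stage2_buffer_level > self.BUFFER_THRESHOLD:
            # second stage -> third stage, via unidirectional data path 1324
            self._stage3_relay(suspend=True)

    def stage2_drains(self, num_cells: int):
        self.stage2_buffer_level = max(0, self.stage2_buffer_level - num_cells)
        if self.stage2_buffer_level <= self.BUFFER_THRESHOLD:
            self._stage3_relay(suspend=False)   # request to resume sending

row = FlowControlRow()
row.stage2_receives(80)
assert row.stage1_paused          # first stage told to stop sending
row.stage2_drains(40)
assert not row.stage1_paused      # first stage told to resume
```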
Placing two stages of the switching fabric, with a bidirectional on-chip flow control path between them, within the same chip package minimizes the connections between separate chip packages, which consume significant area and/or volume. In addition, having the two stages with an on-chip bidirectional flow control path within the same package allows the data paths between the chip packages to be unidirectional while still providing flow control communication capability between a sending module and a receiving module. More details relating to bidirectional flow control paths within a switching fabric are described in co-pending U.S. patent application Ser. No. 12/345,490, entitled "Flow Control in a Switch Fabric," filed December 29, 2008, which is incorporated herein by reference in its entirety.
As described in connection with FIGS. 13 and 14, buffer modules can be included in the modules within the stages of a switching fabric. More details relating to a buffer module that can be included within a stage of a switching fabric are described, for example, in connection with FIG. 15.
FIG. 15 is a schematic diagram showing a buffer module 1500, according to an embodiment. As shown in FIG. 15, data signals S0 through SM are received on an input side 1580 of the buffer module 1500 (e.g., through input ports 1562 of the buffer module 1500). After being processed at the buffer module 1500, the data signals S0 through SM are sent from an output side 1585 of the buffer module 1500 (e.g., through output ports 1564 of the buffer module 1500). Each of the data signals S0 through SM can define a channel (which can also be referred to as a data channel). The data signals S0 through SM can collectively be referred to as data signals 1560. Although the input side 1580 of the buffer module 1500 and the output side 1585 of the buffer module 1500 are shown on different physical sides of the buffer module 1500, the input side 1580 and the output side 1585 are defined logically and do not preclude various physical configurations of the buffer module 1500. For example, one or more of the input ports 1562 and/or one or more of the output ports 1564 of the buffer module 1500 can be physically located on any side (and/or the same side) of the buffer module 1500.
The buffer module 1500 can be configured to process the data signals 1560 so that the latency of processing the data signals 1560 through the buffer module 1500 is relatively small and substantially constant. Accordingly, as the data signals 1560 are processed by the buffer module 1500, the bit rates of the data signals 1560 can be substantially constant. For example, the latency of processing data signal S2 through the buffer module 1500 can be a substantially constant number of clock cycles (e.g., a single clock cycle, a few clock cycles). Accordingly, the data signal S2 can be time-shifted by several clock cycles, but the bit rate of the data signal S2 received at the input side 1580 of the buffer module 1500 will be substantially the same as the bit rate of the data signal S2 sent from the output side 1585 of the buffer module 1500.
The buffer module 1500 can be configured to modify the bit rate of one or more of the data signals 1560 in response to one or more portions of a flow control signal 1570. For example, the buffer module 1500 can be configured to delay the data signal S2 received at the buffer module 1500 in response to a portion of the flow control signal 1570 indicating that the data signal S2 should be delayed for a specified period of time. Specifically, the buffer module 1500 can be configured to store (e.g., hold) one or more portions of the data signal S2 until the buffer module 1500 receives an indicator (e.g., a portion of the flow control signal 1570) indicating that the data signal S2 should no longer be delayed. Accordingly, the bit rate of the data signal S2 received at the input side 1580 of the buffer module 1500 can be different (e.g., substantially different) from the bit rate of the data signal S2 sent from the output side 1585 of the buffer module 1500.
In some embodiments, processing at the buffer module 1500 can be performed on, for example, segments of variable-sized cells at memory banks. For example, in some embodiments, segments of a cell can be distributed to and processed at different memory banks (e.g., static random access memory (SRAM) banks) within the buffer module 1500 during a distribution process. The memory banks can collectively define a shared memory buffer. In some embodiments, the segments of a data signal can be distributed to the memory banks in a predefined fashion (e.g., according to a predefined pattern, according to a predefined algorithm) during the distribution process. For example, in some embodiments, leading segments of the data signals 1560 can be processed at portions of the buffer module 1500 (e.g., particular memory banks of the buffer module 1500) that are different from the portions of the buffer module 1500 at which trailing segments are processed. In some embodiments, the segments of the data signals 1560 can be processed in a specific order. In some embodiments, for example, each segment of the data signals 1560 can be processed based on its respective position within a cell. After the cell segments have been processed through the shared memory buffer, the cell segments can be reordered and sent from the buffer module 1500 during a reassembly process.
In some embodiments, for example, a read multiplexing module of the buffer module 1500 can be configured to reassemble the segments associated with the data signals 1560 and send (e.g., transmit) the data signals 1560 from the buffer module 1500. The reassembly process can be defined based on the predefined methodology used to distribute the segments to the memory banks of the buffer module 1500. For example, the read multiplexing module can be configured to first read the leading segments associated with a cell from the leading memory banks in a round-robin fashion (because the segments were written in a round-robin fashion), and then read the trailing segments associated with the cell from the trailing memory banks in a round-robin fashion. Accordingly, very few control signals, if any, need to be sent between the write multiplexing module and the read multiplexing module. More details relating to segment processing (e.g., segment distribution and/or segment reassembly) are described in co-pending U.S. patent application Ser. No. 12/415,517, entitled "Methods and Apparatus Related to Shared Memory Buffer for Variable-Sized Cells," filed March 31, 2009, which is incorporated herein by reference in its entirety.
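A minimal sketch of this write/read symmetry, assuming segments of a variable-sized cell are written across memory banks in round-robin order from a known starting bank so the read side can recover them with no per-segment control signals; the bank count, segment size, and class name are illustrative assumptions:

```python
class SharedMemoryBuffer:
    """Segments of each cell are sprayed across banks round-robin on write;
    the read side replays the same round-robin order to reassemble."""
    def __init__(self, num_banks: int = 4):
        self.banks = [[] for _ in range(num_banks)]

    def write_cell(self, cell: bytes, segment_size: int, start_bank: int) -> int:
        segments = [cell[i:i + segment_size]
                    for i in range(0, len(cell), segment_size)]
        for offset, segment in enumerate(segments):
            self.banks[(start_bank + offset) % len(self.banks)].append(segment)
        return len(segments)

    def read_cell(self, num_segments: int, start_bank: int) -> bytes:
        # Same round-robin order as the write, so no extra control signals
        # are needed between the write and read multiplexing logic.
        return b"".join(
            self.banks[(start_bank + offset) % len(self.banks)].pop(0)
            for offset in range(num_segments))

buf = SharedMemoryBuffer()
n = buf.write_cell(b"variable-sized cell payload", segment_size=6, start_bank=1)
assert buf.read_cell(n, start_bank=1) == b"variable-sized cell payload"
```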
FIG. 16A is a schematic block diagram of an input scheduling module 1620 and an output scheduling module 1630 configured to coordinate the transmission of cell groups via a switching fabric 1600 of a switch core 1690, according to an embodiment. The coordination can include, for example, scheduling the transmission of cell groups via the switching fabric 1600, tracking requests and/or responses related to the transmission of the cell groups, and so forth. The input scheduling module 1620 can be included on the ingress side of the switching fabric 1600 and the output scheduling module 1630 can be included on the egress side of the switching fabric 1600. The switching fabric 1600 can include an ingress stage 1602, a middle stage 1604, and an egress stage 1606. In some embodiments, the switching fabric 1600 can be defined based on a Clos network architecture (e.g., a non-blocking Clos network, a strictly non-blocking Clos network, a Benes network), and the switching fabric 1600 can include a data plane and a control plane. In some embodiments, the switching fabric 1600 can be the core of a data center (not shown) within which networks and/or devices are interconnected.
As shown in FIG. 16A, input queues IQ1 through IQK (collectively referred to as input queues 1610) can be disposed on the ingress side of the switching fabric 1600. The input queues 1610 can be associated with the ingress stage 1602 of the switching fabric 1600. In some embodiments, the input queues 1610 can be included in a line card. In some embodiments, the input queues 1610 can be disposed outside of the switching fabric 1600 and/or outside of the switch core 1690. Each of the input queues 1610 can be a first-in first-out (FIFO) type queue. Although shown here for illustration, in some embodiments each of the input queues IQ1 through IQK can be associated (e.g., uniquely associated) with an input/output port (e.g., a 10 Gb/s port). In some embodiments, each of the input queues IQ1 through IQK can be sized sufficiently to implement a congestion resolution scheme, such as a request-grant congestion resolution scheme. For example, input queue IQK-1 can be sized sufficiently to hold a cell (or cell group) until a request-grant congestion resolution process has been performed for the cell (or cell group).
As shown in FIG. 16A, output ports P1 through PL (collectively referred to as output ports 1640) can be disposed on the egress side of the switching fabric 1600. The output ports 1640 can be associated with the egress stage 1606 of the switching fabric 1600. In some embodiments, the output ports 1640 can be referred to as destination ports.
In some embodiments, the input queues 1610 can be included in one or more input line cards (not shown) disposed outside of the ingress stage 1602 of the switching fabric 1600. In some embodiments, the output ports 1640 can be included in one or more output line cards (not shown) disposed outside of the egress stage 1606 of the switching fabric 1600. In some embodiments, one or more of the input queues 1610 and/or one or more of the output ports 1640 can be included in one or more stages (e.g., the ingress stage 1602) of the switching fabric 1600. In some embodiments, the input scheduling module 1620 can be included in one or more of the input line cards and/or the output scheduling module 1630 can be included in one or more of the output line cards. In some embodiments, each line card associated with the switch core 1690 (e.g., each input line card, each output line card) can include one or more scheduling modules (e.g., an input scheduling module, an output scheduling module).
In some embodiments, the input queues 1610 and/or the output ports 1640 can be included in one or more gateway devices (not shown) disposed between the switching fabric 1600 and peripheral processors (not shown). The one or more gateway devices, the switching fabric 1600, and/or the peripheral processors can collectively define at least a portion of a data center (not shown). In some embodiments, the one or more gateway devices can be edge devices within an edge portion of the switch core 1690. In some embodiments, the switching fabric 1600 and the peripheral processors can be configured to handle data based on different protocols. For example, the peripheral processors can include one or more host devices (e.g., a host device configured to execute one or more virtual resources, a web server) that can be configured to communicate based on an Ethernet protocol, whereas the switching fabric 1600 can be a cell-based fabric. In other words, the one or more gateway devices can provide devices configured to communicate via one protocol with access to the switching fabric 1600, which can be configured to communicate via another protocol. In some embodiments, the one or more gateway devices can be referred to as access switches or as network devices. In some embodiments, the one or more gateway devices can be configured to function as routers, hub devices, and/or network bridge devices.
In this embodiment, for example, the input scheduling module 1620 can be configured to define a cell group GA queued at input queue IQ1 and a cell group GC queued at input queue IQK-1. The cell group GA is queued at the front of input queue IQ1, and a cell group GB is queued behind the cell group GA in input queue IQ1. Because input queue IQ1 is a FIFO-type queue, the cell group GB cannot be sent from input queue IQ1 via the switching fabric 1600 until the cell group GA has been sent. The cell group GC is queued at the front of input queue IQK-1.
In some embodiments, portions of the input queues 1610 can be mapped to (e.g., assigned to) one or more of the output ports 1640. For example, input queues IQ1 through IQK-1 can be mapped to output port P1, so that all of the cells queued at input queues IQ1 through IQK-1 will be scheduled by the input scheduling module 1620 for transmission via the switching fabric 1600 to output port P1. Similarly, input queue IQK can be mapped to output port P2. The mapping can be stored in a memory (e.g., memory 1622) as, for example, a lookup table that can be accessed by the input scheduling module 1620 when scheduling (e.g., requesting) the transmission of a cell group.
In some embodiments, one or more of the input queues 1610 can be associated with a priority value (also referred to as a transmission priority value). The input scheduling module 1620 can be configured to schedule the transmission of cells from the input queues 1610 based on the priority values. For example, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1, the input scheduling module 1620 can be configured to request that cell group GC be transmitted to output port P1 before requesting that cell group GA be transmitted to output port P1. The priority values can be defined based on a class of service (e.g., quality of service (QoS)). For example, in some embodiments, different types of network traffic can be associated with different classes of service (and different priorities). For example, storage traffic (e.g., read and write traffic), inter-processor communication, media signaling, session-layer signaling, and so forth can each be associated with at least one class of service. In some embodiments, the priority values can be based on, for example, the IEEE 802.1qbb protocol, which defines a priority-based flow control strategy.
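A minimal sketch of priority-ordered request scheduling across input queues, assuming a higher numeric priority value is served first and FIFO order is preserved within each queue; the priority values and queue names are illustrative assumptions:

```python
from collections import deque

# Illustrative priority values per input queue (higher is served first).
queue_priority = {"IQ1": 1, "IQK-1": 3}

queues = {
    "IQ1":   deque(["GA", "GB"]),   # GA queued ahead of GB
    "IQK-1": deque(["GC"]),
}

def next_transmission_request():
    """Pick the head-of-queue cell group from the non-empty queue with the
    highest priority value; FIFO order within each queue is preserved."""
    eligible = [name for name, q in queues.items() if q]
    if not eligible:
        return None
    chosen = max(eligible, key=lambda name: queue_priority[name])
    return chosen, queues[chosen].popleft()

assert next_transmission_request() == ("IQK-1", "GC")  # higher priority first
assert next_transmission_request() == ("IQ1", "GA")    # then FIFO within IQ1
```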
In some embodiments, one or more of the input queues 1610 and/or one or more of the output ports 1640 can be paused. In some embodiments, one or more of the input queues 1610 and/or one or more of the output ports 1640 can be paused so that cells will not be dropped. For example, if output port P1 is temporarily unavailable, transmission of cells from input queue IQ1 and/or input queue IQK-1 can be paused so that cells will not be dropped while output port P1 is temporarily unavailable. In some embodiments, the pausing can be based on the priority values associated with the input queues 1610. For example, if output port P1 is congested, transmission of cells from input queue IQ1 to output port P1 can be paused while cells from input queue IQK-1 continue to be transmitted to output port P1, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1.
The input scheduling module 1620 can be configured to exchange signals with (e.g., transmit signals to and receive signals from) the output scheduling module 1630 to coordinate the transmission of cell group GA via the switching fabric 1600 to output port P1, and to coordinate the transmission of cell group GC via the switching fabric 1600 to output port P1. Because cell group GA is to be transmitted to output port P1, output port P1 can be referred to as the destination port of cell group GA. Similarly, output port P1 can be referred to as the destination port of cell group GB. As shown in FIG. 16A, cell group GA can be transmitted via a transmission path 4112 that is different from the transmission path 4114 via which cell group GC is transmitted.
Cell group GA and cell group GB are defined by the input scheduling module 1620 based on the cells 4110 queued at input queue IQ1. Specifically, cell group GA can be defined based on each cell within cell group GA having a common destination port and having a particular position within input queue IQ1. Similarly, cell group GC can be defined based on each cell within cell group GC having a common destination port and having a particular position within input queue IQK-1. Although not shown, in some embodiments the cells 4110 can be defined based on content (e.g., data packets) received at the switch core 1690 from one or more peripheral processors (e.g., personal computers, servers, routers, personal digital assistants (PDAs)) via one or more networks that can be wired and/or wireless (e.g., a local area network (LAN), a wide area network (WAN), a virtual network). More details relating to the definition of cell groups, such as cell group GA, cell group GB, and/or cell group GC, are discussed in connection with FIGS. 17 and 18.
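A minimal sketch of defining a cell group from the head of an input queue, assuming a group collects consecutive cells that share a destination port; the grouping rule beyond "common destination port and position" is an illustrative assumption:

```python
from typing import List, Tuple

def define_cell_group(queue: List[Tuple[str, bytes]]) -> Tuple[str, List[bytes]]:
    """Collect consecutive cells at the front of a FIFO input queue that
    share the same destination port into one cell group."""
    if not queue:
        return "", []
    destination_port, _ = queue[0]
    group = []
    for port, cell in queue:
        if port != destination_port:
            break   # a cell for another destination ends the group
        group.append(cell)
    return destination_port, group

iq1 = [("P1", b"c1"), ("P1", b"c2"), ("P1", b"c3"), ("P2", b"c4")]
dest, group_ga = define_cell_group(iq1)
assert dest == "P1" and len(group_ga) == 3   # three cells bound for P1
```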
FIG. 16B is a signaling flow diagram showing the signaling related to the transmission of cell group GA, according to an embodiment. As shown in FIG. 16B, time increases in the downward direction. After cell group GA has been defined (as shown in FIG. 16A), the input scheduling module 1620 can be configured to send a request to schedule cell group GA for transmission via the switching fabric 1600; this request is shown as transmission request 22. The transmission request 22 can be defined as a request to send cell group GA to the destination port of cell group GA, output port P1. In some embodiments, the destination port of cell group GA can also be referred to as the target (also referred to as the target destination port) of the transmission request 22. In some embodiments, the transmission request 22 can include a request to send cell group GA via a specific transmission path through the switching fabric 1600 (e.g., transmission path 4112 shown in FIG. 16A), or at a specific time. The input scheduling module 1620 can be configured to send the transmission request 22 to the output scheduling module 1630 after the transmission request 22 has been defined at the input scheduling module 1620.
In some embodiments, the transmission request 22 can be queued on the ingress side of the switching fabric 1600 before being sent to the egress side of the switching fabric 1600. In some embodiments, the transmission request 22 can be queued until the input scheduling module 1620 triggers sending of the transmission request 22 to the egress side of the switching fabric 1600. In some embodiments, the input scheduling module 1620 can be configured to hold (or trigger holding of) the transmission request 22, for example in an input transmission request queue (not shown), because the volume of transmission requests being sent from the ingress side of the switching fabric 1600 is above a threshold. The threshold can be defined based on a transmission latency through the switching fabric 1600.
In some embodiments, the transmission request 22 can be queued at an output queue (not shown) on the egress side of the switching fabric 1600. In some embodiments, the output queue can be included within or outside of the switching fabric 1600, or can be disposed within a line card (not shown) outside of the switch core 1690. Although not shown, in some embodiments the transmission request 22 can be queued at an output queue, or a portion of an output queue, associated with a specific input queue (e.g., input queue IQ1). In some embodiments, each of the output ports 1640 can be associated with an output queue that is associated with (e.g., corresponds to) a priority value of one of the input queues 1610. For example, output port P1 can be associated with an output queue (or a portion of an output queue) associated with input queue IQ1 (which has a specific priority value) and with an output queue (or a portion of an output queue) associated with input queue IQK (which has a specific priority value). Accordingly, the transmission request 22, which originates from input queue IQ1, can be queued at the output queue associated with input queue IQ1. In other words, the transmission request 22 can be queued (on the egress side of the switching fabric 1600) at an output queue associated with the priority value of at least one of the input queues 1610. Similarly, the transmission request 22 can be queued at an input transmission request queue (not shown), or a portion of an input transmission request queue, associated with the priority value of at least one of the input queues 1610.
If the output scheduling module 1630 determines that the destination port of cell group GA (output port P1 shown in FIG. 16A) is available to receive cell group GA, the output scheduling module 1630 can be configured to send a transmission response 24 to the input scheduling module 1620. The transmission response 24 can be, for example, an authorization for cell group GA to be sent (e.g., transmitted) from input queue IQ1 shown in FIG. 16A to the destination port of cell group GA. An authorization to send a cell group can be referred to as a transmission grant. In some embodiments, cell group GA and/or input queue IQ1 can be referred to as the target of the transmission response 24. In some embodiments, the authorization for cell group GA to be sent can be granted when transmission through the switching fabric 1600 is substantially guaranteed, for example because the destination port is available.
In response to the transmission response 24, the input scheduling module 1620 can be configured to send cell group GA from the ingress side of the switching fabric 1600 to the egress side of the switching fabric 1600 via the switching fabric 1600. In some embodiments, the transmission response 24 can include an instruction to send cell group GA via a particular transmission path through the switching fabric 1600 (e.g., transmission path 4112 shown in FIG. 16A), or at a specific time. In some embodiments, the instruction can be defined based on, for example, a routing policy.
As shown in FIG. 16B, the transmission request 22 includes a cell quantity value 30, a destination identifier (ID) 32, a queue identifier (ID) 34, and a queue sequence value (SV) 36 (which can collectively be referred to as a request tag). The cell quantity value 30 can represent the number of cells included in cell group GA. For example, in this embodiment, cell group GA includes seven (7) cells (as shown in FIG. 16A). The destination identifier 32 can represent the destination port of cell group GA, and thus the target of the transmission request 22, and can be used by the output scheduling module 1630.
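A minimal sketch of the request tag carried by a transmission request, using the four fields named above; the field types and the example values are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransmissionRequest:
    """Request tag sent from the input scheduling module to the output
    scheduling module for one cell group."""
    cell_quantity: int       # cell quantity value 30: cells in the group
    destination_id: str      # destination identifier 32: target output port
    queue_id: str            # queue identifier 34: originating input queue
    queue_sequence: int      # queue sequence value 36: position among groups

# The request for cell group GA in this embodiment: seven cells queued at
# IQ1 and destined for output port P1 (the sequence value is illustrative).
request_ga = TransmissionRequest(cell_quantity=7, destination_id="P1",
                                 queue_id="IQ1", queue_sequence=0)
```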
The cell quantity value 30 and the destination identifier 32 can be used by the output scheduling module 1630 to schedule the transmission of cell group GA via the switching fabric 1600 to output port P1 (shown in FIG. 16A). As shown in FIG. 16B, in this embodiment, because the number of cells included in cell group GA can be handled (e.g., can be received) at the destination port of cell group GA (e.g., output port P1 shown in FIG. 16A), the output scheduling module 1630 can be configured to define and send the transmission response 24.
In some embodiments, if the number of cells included in cell group GA cannot be handled (e.g., cannot be received) at the destination port of cell group GA (e.g., output port P1 shown in FIG. 16A), for example because the destination port of cell group GA is unavailable (e.g., in an unavailable state, in a congested state), the output scheduling module 1630 can be configured to communicate the unavailability to the input scheduling module 1620. In some embodiments, for example, the output scheduling module 1630 can be configured to deny the request to send cell group GA via the switching fabric 1600 (denial not shown) when the destination port of cell group GA is unavailable. A denial of the transmission request 22 can be referred to as a transmission denial. In some embodiments, the transmission denial can include a response tag.
In some embodiments, the availability or unavailability of, for example, output port P1 (shown in FIG. 16A) can be determined by the output scheduling module 1630 based on a condition being satisfied. For example, the condition can relate to a storage limit of a queue (not shown in FIG. 16A) associated with output port P1 being exceeded, a data rate of traffic through output port P1, the number of cells ready to be scheduled for transmission from the input queues 1610 via the switching fabric 1600 (shown in FIG. 16A), and so forth. In some embodiments, when output port P1 is unavailable, output port P1 is unavailable to receive cells via the switching fabric 1600.
As shown in FIG. 16B, the queue identifier 34 and the queue sequence value 36 are sent to the output scheduling module 1630 in the transmission request 22. The queue identifier 34 can represent and/or can be used to identify (e.g., uniquely identify) the input queue IQ1 (shown in FIG. 16A) at which cell group GA is queued. The queue sequence value 36 can represent the position of cell group GA relative to other cell groups in input queue IQ1. For example, cell group GA can be associated with a queue sequence value X, and cell group GB (queued at input queue IQ1 as shown in FIG. 16A) can be associated with a queue sequence value Y. The queue sequence value X can indicate that cell group GA is to be sent from input queue IQ1 before cell group GB, which is associated with queue sequence value Y.
In some embodiments, the queue sequence value 36 is selected from a range of queue sequence values associated with input queue IQ1 (shown in FIG. 16A). The range of queue sequence values can be defined so that sequence values from the range do not repeat for input queue IQ1 within a specified period of time. For example, the range of queue sequence values can be defined so that queue sequence values from the range do not repeat within at least a period of time needed by the switch core 1690 (shown in FIG. 16A) to flush a number of cells queued at input queue IQ1. In some embodiments, queue sequence values can be incremented (within the range of queue sequence values) by the input scheduling module 1620 and associated with each cell group defined based on the cells 4110 queued at input queue IQ1.
In some embodiments, the range of queue sequence values associated with input queue IQ1 can overlap with a range of queue sequence values associated with another one of the input queues 1610 (shown in FIG. 16A). Accordingly, the queue sequence value 36, even if drawn from a range of queue sequence values that is not unique, can be combined with (e.g., included with) the queue identifier 34 (which can be unique) to uniquely identify cell group GA (at least during a specified period of time). In some embodiments, the queue sequence value 36 is a value unique within the switching fabric 1600 or a globally unique identifier (GUID) (e.g., a universally unique identifier (UUID)).
In some embodiments, the input scheduling module 1620 can be configured to wait before defining a transmission request (not shown) associated with cell group GB. For example, the input scheduling module 1620 can be configured to wait to define the transmission request associated with cell group GB until the transmission request 22 has been sent, or until a response to the transmission request 22 (e.g., the transmission response 24, a transmission denial) has been received.
As shown in FIG. 16B, the output scheduling module 1630 can be configured to include the queue identifier 34 and the queue sequence value 36 (which can collectively be referred to as a response tag) in the transmission response 24. The queue identifier 34 and the queue sequence value 36 can be included in the transmission response 24 so that, when the transmission response 24 is received at the input scheduling module 1620, the transmission response 24 can be associated with cell group GA at the input scheduling module 1620. Specifically, the queue identifier 34 and the queue sequence value 36 can collectively be used to identify cell group GA as being authorized for transmission via the switching fabric 1600.
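A minimal sketch of matching a transmission response back to its pending cell group using the (queue identifier, queue sequence value) pair as a key; the pending-request table and grant handling are illustrative assumptions:

```python
# Pending cell groups awaiting a grant, keyed by (queue ID, queue sequence value).
pending = {("IQ1", 0): "GA", ("IQ1", 1): "GB", ("IQK-1", 7): "GC"}

def handle_transmission_response(queue_id: str, queue_sequence: int) -> str:
    """Identify which cell group the response tag authorizes and release it
    for transmission via the switching fabric."""
    cell_group = pending.pop((queue_id, queue_sequence))
    return cell_group   # the caller would now trigger transmission of this group

assert handle_transmission_response("IQ1", 0) == "GA"   # GA is authorized
assert ("IQ1", 1) in pending                            # GB still awaits a grant
```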
In some embodiments, the output scheduling module 1630 can be configured to delay sending the transmission response 24 corresponding to the transmission request 22. In some embodiments, for example, the output scheduling module 1630 can be configured to delay the response if the destination port of cell group GA (output port P1 shown in FIG. 16A) is unavailable (e.g., temporarily unavailable). In some embodiments, the output scheduling module 1630 can be configured to send the transmission response 24 in response to output port P1 changing from an unavailable state to an available state.
In some embodiments, the output scheduling module 1630 can be configured to delay sending the transmission response 24 because the destination port of cell group GA (output port P1 shown in FIG. 16A) is receiving data from another one of the input queues 1610. For example, output port P1 can be unavailable to receive data from input queue IQ1 because output port P1 is receiving a different cell group (not shown) from, for example, input queue IQK (shown in FIG. 16A). In some embodiments, based on the priority values associated with input queue IQ1 and input queue IQK, the cell group from input queue IQK can have a higher priority value than the cell group from input queue IQ1. The output scheduling module 1630 can be configured to delay sending the transmission response 24 for a time period calculated based on, for example, the size of the different cell group being received at output port P1. For example, the output scheduling module 1630 can be configured to delay sending the transmission response 24, which targets cell group GA, for a time period expected for processing of the different cell group at output port P1 to be completed. In other words, the output scheduling module 1630 can be configured to send the transmission response 24 targeting cell group GA after a predetermined delay based on output port P1 changing from the unavailable state to the available state.
In some embodiments, the output scheduling module 1630 can be configured to delay sending the transmission response 24 because at least a portion of the transmission path through which cell group GA is to be transmitted (e.g., transmission path 4112 shown in FIG. 16A) is unavailable (e.g., congested). The output scheduling module 1630 can be configured to delay sending the transmission response 24 until that portion of the transmission path is no longer congested, or based on a scheduled time at which that portion of the transmission path will no longer be congested.
As shown in FIG. 16B, cell group GA can be transmitted to the destination port of cell group GA based on (e.g., in response to) the transmission response 24. In some embodiments, cell group GA can be transmitted based on one or more instructions included in the transmission response 24. For example, in some embodiments, cell group GA can be transmitted via transmission path 4112 (shown in FIG. 16A) based on an instruction included in the transmission response 24, or based on one or more rules for cell group transmission via the switching fabric 1600 (e.g., rules for cell group transmission via a rearrangeable switching fabric). Although not shown, in some embodiments, after cell group GA has been received at output port P1 (shown in FIG. 16A), the content from the cell group (e.g., data packets) can be sent to one or more network entities (e.g., personal computers, servers, routers, PDAs) via one or more networks that can be wired and/or wireless (e.g., a LAN, a WAN, a virtual network).
Referring again to FIG. 16A, in some embodiments, cell group GA is transmitted via transmission path 4112 and received at an output queue (not shown) that is relatively small compared with, for example, the input queues 1610. In some embodiments, the output queue (or a portion of the output queue) can be associated with a priority value. The priority value can be associated with one or more of the input queues 1610. The output scheduling module 1630 can be configured to retrieve cell group GA from the output queue and can be configured to send cell group GA to output port P1.
In some embodiments, when cell group GA is transmitted to the egress side of the switching fabric 1600, cell group GA is accompanied by a response identifier included in cell group GA by the input scheduling module 1620, so that cell group GA can be retrieved and sent to output port P1. The response identifier can be defined at the output scheduling module 1630 and included in the transmission response 24. In some embodiments, if cell group GA is queued at an output queue (not shown) associated with the destination port of cell group GA, the response identifier can be used to retrieve cell group GA for the destination port of cell group GA, so that cell group GA can be sent from the switching fabric 1600 via the destination port of cell group GA. The response identifier can be associated with a location in the output queue that has been reserved by the output scheduling module 1630 for queuing cell group GA.
In some embodiments, a cell group queued at one of the input queues 1610 can be moved to the memory 1622 when a transmission request associated with the cell group (e.g., the transmission request 22 shown in FIG. 16B) is defined. For example, a cell group GD queued at input queue IQK can be moved to the memory 1622 in response to a transmission request associated with cell group GD being defined. In some embodiments, cell group GD can be moved to the memory 1622 before the transmission request associated with cell group GD is sent from the input scheduling module 1620 to the output scheduling module 1630. Cell group GD can be stored in the memory 1622 until cell group GD is sent from the ingress side of the switching fabric 1600 to the egress side of the switching fabric 1600. In some embodiments, cell groups can be moved to the memory 1622 to reduce congestion (e.g., head-of-line (HOL) blocking) at input queue IQK.
In some embodiments, the input scheduling module 1620 can be configured to retrieve a cell group stored in the memory 1622 based on the queue identifier and/or the queue sequence value associated with the cell group. In some embodiments, the location of the cell group in the memory 1622 can be determined based on a lookup table and/or an index value. A cell group can be retrieved before the cell group is sent from the ingress side of the switching fabric 1600 to the egress side of the switching fabric 1600. For example, cell group GD can be associated with a queue identifier and/or a queue sequence value. The location at which cell group GD is stored in the memory 1622 can be associated with the queue identifier and/or the queue sequence value. The transmission request defined at the input scheduling module 1620 and sent to the output scheduling module 1630 can include the queue identifier and/or the queue sequence value. The transmission response received from the output scheduling module 1630 can include the queue identifier and/or the queue sequence value. In response to the transmission response, the input scheduling module 1620 can be configured to retrieve cell group GD from the memory 1622 at the location based on the queue identifier and/or the queue sequence value, and the input scheduling module 1620 can trigger transmission of cell group GD.
In some embodiments, the number of cells included in a cell group can be defined based on the amount of space available in memory 1622. For example, input scheduling module 1620 can be configured to determine the number of cells included in cell group GD, when cell group GD is being defined, based on the amount of storage space available in memory 1622. In some embodiments, if the amount of storage space available in memory 1622 increases, the number of cells included in cell group GD can be increased. In some embodiments, the number of cells included in cell group GD can be increased by input scheduling module 1620 before or after cell group GD is moved to memory 1622 for storage.
In some embodiments, the number of cells included in a cell group can be defined based on a latency of transmission via, for example, switch fabric 1600. Specifically, input scheduling module 1620 can be configured to define the size of a cell group to promote flow through switch fabric 1600 in view of the latency associated with switch fabric 1600. For example, input scheduling module 1620 can be configured to close a cell group (e.g., fix the size of the cell group) because the cell group has reached a threshold size defined based on the latency of switch fabric 1600. In some embodiments, input scheduling module 1620 can be configured to send the data packets in a cell group immediately, rather than waiting for additional data packets to define a larger cell group, because the latency of transmission via switch fabric 1600 is short.
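A minimal sketch of the sizing logic described in the two paragraphs above is shown below; the function name and the two threshold parameters (available_memory_cells, latency_threshold_cells) are assumptions made for illustration and are not taken from the patent.

```python
# Hedged sketch: close (finalize) a cell group when it would no longer fit in
# memory or when it has reached the size threshold derived from fabric latency.

def should_close_cell_group(group_len, available_memory_cells, latency_threshold_cells):
    if group_len >= available_memory_cells:
        return True   # the memory on the ingress side cannot absorb a larger group
    if group_len >= latency_threshold_cells:
        return True   # threshold size defined based on the fabric latency
    return False

# With a short fabric latency the threshold can be set very low, so a packet's
# cells are sent right away rather than waiting to build a larger group.
print(should_close_cell_group(group_len=1, available_memory_cells=64,
                              latency_threshold_cells=1))  # True
```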
In some embodiments, input scheduling module 1620 can be configured to limit the number of transmission requests sent from the ingress side of switch fabric 1600 to the egress side of switch fabric 1600. In some embodiments, the limit can be defined based on a policy stored at input scheduling module 1620. In some embodiments, the limit can be defined based on priority values associated with one or more of the input queues 1610. For example, input scheduling module 1620 can be configured to allow (based on a threshold limit) more transmission requests associated with input queue IQ1 than transmission requests from input queue IQK because input queue IQ1 has a higher priority value than input queue IQK.
In some embodiments, one or more portions of input scheduling module 1620 and/or output scheduling module 1630 can be hardware-based modules (e.g., a DSP, an FPGA) and/or software-based modules (e.g., a module of computer code, a set of processor-readable instructions that can be executed at a processor). In some embodiments, one or more of the functions associated with input scheduling module 1620 and/or output scheduling module 1630 can be included in different modules and/or combined into one or more modules. For example, cell group GA can be defined by a first sub-module within input scheduling module 1620, and transmission request 22 (shown in Figure 16B) can be defined by a second sub-module within input scheduling module 1620.
In some embodiments, switch fabric 1600 can have more or fewer stages than shown in Figure 16A. In some embodiments, switch fabric 1600 can be a reconfigurable (e.g., rearrangeable) switch fabric and/or a time-division multiplexed switch fabric. In some embodiments, switch fabric 1600 can be defined based on a Clos network architecture (e.g., a strictly non-blocking Clos network, a Benes network).
Figure 17 is a schematic block diagram that illustrates two cell groups queued at an input queue 1720 disposed on an ingress side of a switch fabric 1700, according to an embodiment. The cell groups are defined on the ingress side of switch fabric 1700 by an input scheduling module 1740, and switch fabric 1700 can be, for example, associated with a switch core and/or included in a switch core such as that shown in Figure 16A. Input queue 1720 is also on the ingress side of switch fabric 1700. In some embodiments, input queue 1720 can be included in an input line card (not shown) associated with switch fabric 1700. Although not shown, in some embodiments one or more cell groups can include multiple cells (e.g., 25 cells, 10 cells, 100 cells) or only a single cell.
As shown in Figure 17, input queue 1720 includes cells 1 through T (i.e., cell 1 through cell T), which can be referred to collectively as queued cells 1710. Input queue 1720 is a first-in first-out (FIFO) type queue, with cell 1 at the front end 1724 (or transmission end) of the queue and cell T at the back end 1722 (or entry end) of the queue. As shown in Figure 17, the queued cells 1710 at input queue 1720 include a first cell group 1712 and a second cell group 1716. In some embodiments, each of the queued cells 1710 has the same length (e.g., 32 bits, 64 bits). In some embodiments, two or more of the queued cells 1710 can have different lengths.
Each of the queued cells 1710 has content queued for transmission to one of four output ports 1770—output port E, output port F, output port G, or output port H—as indicated by the output-port label (e.g., the letter "E", the letter "F") shown on each of the queued cells 1710. The output port 1770 to which a cell is to be sent can be referred to as the destination port of that cell. Each of the queued cells 1710 can be sent to its respective destination port via switch fabric 1700. In some embodiments, input scheduling module 1740 can be configured to determine the destination port of each of the queued cells 1710 based on a lookup table (LUT) such as a routing table. In some embodiments, the destination port of each of the queued cells 1710 can be determined based on a destination of content (e.g., data) included in the cell. In some embodiments, one or more of the output ports 1770 can be associated with an output queue where cells can be queued until transmitted via the output port 1770.
The first cell group 1712 and the second cell group 1716 can be defined by input scheduling module 1740 based on the destination ports of the queued cells 1710. As shown in Figure 17, each cell included in the first cell group 1712 has the same destination port (i.e., output port E), as indicated by the output-port label "E". Similarly, each cell included in the second cell group 1716 has the same destination port (i.e., output port F), as indicated by the output-port label "F".
A cell group (e.g., the first cell group 1712) can be defined based on destination port because the cell group is transmitted as a group via switch fabric 1700. For example, if cell 1 were included in the first cell group 1712, the first cell group 1712 could not be sent to a single destination port, because cell 1 has a destination port (output port "F") different from that of cells 2 through 7 (output port "E"). As such, a first cell group 1712 defined in that way could not be transmitted as a group via switch fabric 1700.
Cell groups are defined as contiguous blocks of cells because a cell group is transmitted as a group via switch fabric 1700 and because input queue 1720 is a FIFO-type queue. For example, cell 12 and cells 2 through 7 cannot be defined as a cell group, because cell 12 cannot be transmitted in a single block of cells together with cells 2 through 7. Cells 8 through 11 are disposed between them and must be transmitted from input queue 1720 after cells 2 through 7 are transmitted from input queue 1720 but before cell 12 is transmitted from input queue 1720. In some embodiments, if input queue 1720 is not a FIFO-type queue, one or more of the queued cells 1710 may be transmitted out of order, and a cell group may span intervening cells.
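The grouping rule above can be expressed compactly: walk the FIFO queue from the transmission end and cut a new group whenever the destination port changes. The sketch below is an illustrative assumption (the function name and tuple layout are invented), not the patent's implementation.

```python
# Hedged sketch: every group is a contiguous block of cells with one destination.

def define_cell_groups(queued_cells):
    """queued_cells: list of (cell_id, destination_port) in FIFO order."""
    groups = []
    current = []
    current_port = None
    for cell_id, port in queued_cells:
        if current and port != current_port:
            groups.append((current_port, current))  # close the previous group
            current = []
        current_port = port
        current.append(cell_id)
    if current:
        groups.append((current_port, current))
    return groups

# Mirrors Figure 17: cell 1 is destined for port F, cells 2-7 for port E.
cells = [(1, "F"), (2, "E"), (3, "E"), (4, "E"), (5, "E"), (6, "E"), (7, "E"), (8, "G")]
print(define_cell_groups(cells))
# [('F', [1]), ('E', [2, 3, 4, 5, 6, 7]), ('G', [8])]
```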
Although not shown, each of the queued cells 1710 can have a sequence value that can be referred to as a cell sequence value. A cell sequence value can represent, for example, the order of cell 2 relative to cell 3. The cell sequence values can be used to reorder cells at, for example, one or more of the output ports 1770 before the content associated with the cells is transmitted from the output ports 1770. For example, in some embodiments, cell group 1712 can be received at an output queue (not shown) associated with output port E and reordered based on the cell sequence values. In some embodiments, the output queue can be relatively small (e.g., a shallow output queue) compared with input queue 1720.
In addition, the data (e.g., data packets) included in the cells can have sequence values referred to as data sequence values. For example, a data sequence value can represent the relative order of a first data packet with respect to a second data packet. The data sequence values can be used to reorder the data packets at, for example, one or more of the output ports 1770 before the data packets are transmitted from the output ports 1770.
Figure 18 is a schematic block diagram that illustrates two cell groups queued at an input queue 1820 disposed on an ingress side of a switch fabric 1800, according to another embodiment. The cell groups are defined on the ingress side of switch fabric 1800 by an input scheduling module 1840, and switch fabric 1800 can be, for example, associated with a switch core and/or included in a switch core such as that shown in Figure 16A. Input queue 1820 is also on the ingress side of switch fabric 1800. In some embodiments, input queue 1820 can be included in an input line card (not shown) associated with switch fabric 1800. Although not shown, in some embodiments one or more cell groups can include only a single cell.
As shown in Figure 18, input queue 1820 includes cells 1 through Z (i.e., cell 1 through cell Z), which are referred to collectively as queued cells 1810. Input queue 1820 is a FIFO-type queue, with cell 1 at the front end 1824 (or transmission end) of the queue and cell Z at the back end 1822 (or entry end) of the queue. As shown in Figure 18, the queued cells 1810 at input queue 1820 include a first cell group 1812 and a second cell group 1816. In some embodiments, each of the queued cells 1810 has the same length (e.g., 32 bits, 64 bits). In some embodiments, two or more of the queued cells 1810 have different lengths. In this embodiment, all of the queued cells 1810 in input queue 1820 are mapped to output port F2 and are thus scheduled by input scheduling module 1840 for transmission via switch fabric 1800 to output port F2.
Each of the queued cells 1810 has content associated with one or more data packets (e.g., Ethernet data packets). The data packets are represented by the letters "Q" through "Y". For example, as shown in Figure 18, data packet R is divided across three different cells: cell 2, cell 3, and cell 4.
The cell groups (e.g., the first cell group 1812) are defined so that portions of a data packet are not associated with different cell groups. In other words, the cell groups are defined so that each data packet is associated with a single cell group. The boundaries of the cell groups are defined based on the boundaries of the data packets queued at input queue 1820, so that a data packet is not included in different cell groups. Splitting a data packet across different cell groups could lead to undesirable results such as, for example, buffering on the egress side of switch fabric 1800. For example, if a first portion of data packet T (e.g., cell 6) were included in the first cell group 1812 and a second portion of data packet T (e.g., cell 7) were included in the second cell group 1816, the first portion of data packet T would have to be buffered at, for example, at least a portion of one or more output queues (not shown) on the egress side of switch fabric 1800 until the second portion of data packet T is transmitted to the egress side of switch fabric 1800, so that the entire data packet T can be transmitted from switch fabric 1800 via output port E2.
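One way to picture the boundary rule above is shown in the sketch below; it is an illustrative assumption (the per-cell end-of-packet flag and the helper name close_index are invented here), not the patent's method. A group is only allowed to end at a cell that completes a packet.

```python
# Hedged sketch: how many leading cells can form a group without splitting a packet.

def close_index(cells, max_cells):
    """cells: list of (packet_id, is_last_cell_of_packet) in FIFO order."""
    last_boundary = 0
    for i, (_, is_last) in enumerate(cells[:max_cells], start=1):
        if is_last:
            last_boundary = i   # a group may end here without fragmenting a packet
    return last_boundary

# Packet R spans cells 2-4; with max_cells=3 the group closes after cell 1
# (end of packet Q) rather than splitting packet R across two cell groups.
queued = [("Q", True), ("R", False), ("R", False), ("R", True), ("S", True)]
print(close_index(queued, max_cells=3))  # 1
```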
In some embodiments, the data packets included in the queued cells 1810 can also have sequence values, referred to as data sequence values. A data sequence value can represent, for example, the relative order of data packet R with respect to data packet S. The data sequence values can be used to reassemble the data packets at, for example, one or more output ports 1870 before the data packets are transmitted from the output ports 1870.
Figure 19 is a flowchart of a method for scheduling transmission of a cell group via a switch fabric, according to an embodiment. As shown in Figure 19, at 1900, an indicator that cells are queued at an input queue for transmission via a switch fabric is received. In some embodiments, the switch fabric can be based on a Clos architecture and can have multiple stages. In some embodiments, the switch fabric can be associated with (e.g., included within) a switch core. In some embodiments, the indicator can be received when a new cell is received at the input queue, or when a cell is ready (or nearly ready) to be transmitted via the switch fabric.
At 1910, a cell group having a common destination is defined from the cells queued at the input queue. The destination of each cell in the cell group can be determined based on a lookup table. In some embodiments, the destination can be determined based on a policy and/or based on a packet classification algorithm. In some embodiments, the common destination can be a common destination port associated with the ingress portion of the switch fabric.
At 1920, a request tag is associated with the cell group. The request tag can include, for example, one or more of a cell quantity value, a destination identifier, a queue identifier, a queue sequence value, and so forth. The request tag can be associated with the cell group before the cell group is sent from the ingress side of the switch fabric.
At 1930, a transmission request including the request tag is sent to an output scheduling module. In some embodiments, the transmission request can include a request for transmission at a specific time or via a specific transmission path. In some embodiments, the transmission request can be sent after the cell group has been stored in a memory associated with an input stage of the switch fabric. In some embodiments, the cell group can be moved to the memory to reduce the likelihood of congestion at the input queue. In other words, the cell group can be moved to the memory so that other cells queued behind the cell group can be prepared for transmission from the input queue without waiting for the cell group to be transmitted from the input queue. In some embodiments, the transmission request can be a request to transmit to a specific output port (e.g., a specific destination port).
If transmission via the switch fabric is not authorized at 1940 in response to the transmission request, then at 1950 a transmission denial including a response tag is sent to the input scheduling module. In some embodiments, the transmission request can be denied because, for example, the switch fabric is congested, the destination port is unavailable, and so forth. In some embodiments, the transmission request can be denied for a specific period of time. In some embodiments, the response tag can include one or more identifiers that can be used to associate the transmission denial with the cell group.
If transmission via the switch fabric is authorized at 1940, then at 1960 a transmission response including a response tag is sent to the input scheduling module. In some embodiments, the transmission response can be a transmission grant. In some embodiments, the transmission response can be sent as soon as the destination of the cell group is ready (or about to be ready) to receive the cell group.
At 1970, the cell group is retrieved based on the response tag. If the cell group has been moved to a memory, the cell group can be retrieved from the memory. If the cell group is queued at the input queue, the cell group can be retrieved from the input queue. The cell group can be retrieved based on a queue identifier and/or a queue sequence value included in the response tag. The queue identifier and/or the queue sequence value can be, for example, the values included in the request tag.
At 1980, the cell group is transmitted via the switch fabric. The cell group can be transmitted via the switch fabric according to an instruction included in the transmission response. In some embodiments, the cell group can be transmitted at a specific time and/or via a specific transmission path. In some embodiments, the cell group can be transmitted via the switch fabric to a destination such as an output port. In some embodiments, after being transmitted via the switch fabric, the cell group can be queued at an output queue associated with the destination (e.g., the destination port) of the cell group.
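The request/grant exchange of Figure 19 can be compressed into a short sketch. Everything below (class names, the set of "ready" ports, the in-memory staging) is an assumption made for illustration and is not the patent's implementation.

```python
# Hedged sketch of the 1930 -> 1940 -> 1960/1950 -> 1970 -> 1980 flow.

class OutputScheduler:
    def __init__(self, ready_ports):
        self.ready_ports = set(ready_ports)

    def handle_request(self, request_tag, dest_port):
        # Grant only if the destination port can accept the cell group (1940).
        granted = dest_port in self.ready_ports
        return {"granted": granted, "response_tag": dict(request_tag)}


class InputScheduler:
    def __init__(self):
        self.memory = {}            # stands in for memory on the ingress side

    def stage(self, tag, cells):
        self.memory[(tag["queue_id"], tag["queue_seq"])] = cells

    def send_if_granted(self, response, transmit):
        tag = response["response_tag"]
        if response["granted"]:     # 1960 -> 1970 -> 1980
            cells = self.memory.pop((tag["queue_id"], tag["queue_seq"]))
            transmit(cells)
        # else: 1950, transmission denial; the group stays staged for a retry


sent = []
inp, outp = InputScheduler(), OutputScheduler(ready_ports={"EP1"})
tag = {"queue_id": "IQ1", "queue_seq": 42, "cell_count": 3}
inp.stage(tag, ["c1", "c2", "c3"])
resp = outp.handle_request(tag, dest_port="EP1")
inp.send_if_granted(resp, transmit=sent.extend)
print(sent)  # ['c1', 'c2', 'c3']
```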
Figure 20 is a signaling flow diagram that illustrates processing of request sequence values associated with transmission requests, according to an embodiment. As shown in Figure 20, transmission request 52 is sent from an input scheduling module 2020 on an ingress side of a switch fabric to an output scheduling module 2030 on an egress side of the switch fabric. Transmission request 56 is sent from input scheduling module 2020 to output scheduling module 2030 after transmission request 52 is sent. As shown in Figure 20, transmission request 54 is sent from input scheduling module 2020 but is not received by output scheduling module 2030. Transmission request 52, transmission request 54, and transmission request 56 are each associated with the same input queue IQ1, as indicated by their respective queue identifiers, and with the same destination port EP1, as indicated by their respective destination identifiers. Transmission request 52, transmission request 54, and transmission request 56 can be referred to collectively as transmission requests 58. As shown in Figure 20, time increases in a downward direction.
As shown in Figure 20, each of the transmission requests 58 can include a request sequence value (SV). A request sequence value can represent the order of a transmission request relative to other transmission requests. In this embodiment, the request sequence values are from a range of request sequence values associated with destination port EP1 and increase as whole integers in numerical order. In some embodiments, the request sequence values can be, for example, strings, and can increase in a different order (e.g., reverse numerical order). Transmission request 52 includes request sequence value 5200, transmission request 54 includes request sequence value 5201, and transmission request 56 includes request sequence value 5202. In this embodiment, request sequence value 5200 indicates that transmission request 52 was defined and sent before transmission request 54, which has request sequence value 5201.
Output scheduling module 2030 can determine, based on the request sequence values, that transmission of a transmission request from input scheduling module 2020 has failed. Specifically, output scheduling module 2030 can determine that the transmission request associated with request sequence value 5201 was not received before transmission request 56, which is associated with request sequence value 5202, was received. In some embodiments, when the time period between receipt of transmission request 52 and receipt of transmission request 56 (shown as time period 2040) exceeds a threshold time period, output scheduling module 2030 can take an action with respect to the lost transmission request 54. In some embodiments, output scheduling module 2030 can request that input scheduling module 2020 retransmit transmission request 54. Output scheduling module 2030 can include the missing request sequence value in the request so that input scheduling module 2020 can identify transmission request 54 as not having been received. In some embodiments, output scheduling module 2030 can deny the request, included in transmission request 56, to transmit a cell group. In some embodiments, output scheduling module 2030 can be configured to process and/or respond to transmission requests (e.g., transmission requests 58) based on queue sequence values in a manner substantially similar to that described with respect to request sequence values.
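The gap detection described above amounts to noticing missing integers in the sequence space. The sketch below is an illustrative assumption (the callback for requesting retransmission is invented), not the patent's implementation.

```python
# Hedged sketch: detect missing request sequence values and ask for retransmission.

def check_request_sequence(last_seen, received, request_retransmit):
    """Report every sequence value between last_seen and received that was skipped."""
    for missing in range(last_seen + 1, received):
        request_retransmit(missing)   # ask the ingress side to resend that request
    return max(last_seen, received)

lost = []
last = check_request_sequence(5200, 5202, lost.append)
print(lost, last)  # [5201] 5202  -> the request with sequence value 5201 was never received
```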
Figure 21 is a signaling flow diagram that illustrates response sequence values associated with transmission responses, according to an embodiment. As shown in Figure 21, transmission response 62 is sent from an output scheduling module 2130 on an egress side of a switch fabric to an input scheduling module 2120 on an ingress side of the switch fabric. Transmission response 66 is sent from output scheduling module 2130 to input scheduling module 2120 after transmission response 62 is sent. As shown in Figure 21, transmission response 64 is sent from output scheduling module 2130 but is not received by input scheduling module 2120. Transmission response 62, transmission response 64, and transmission response 66 are each associated with the same input queue IQ2, as indicated by their respective queue identifiers. Transmission response 62, transmission response 64, and transmission response 66 can be referred to collectively as transmission responses 68. As shown in Figure 21, time increases in a downward direction.
As shown in Figure 21, each of the transmission responses 68 can include a response sequence value (SV). A response sequence value can represent the order of a transmission response relative to other transmission responses. In this embodiment, the response sequence values are from a range of response sequence values associated with input queue IQ2 and increase as whole integers in numerical order. In some embodiments, the response sequence values can be, for example, strings, and can increase in a different order (e.g., reverse numerical order). Transmission response 62 includes response sequence value 5300, transmission response 64 includes response sequence value 5301, and transmission response 66 includes response sequence value 5302. In this embodiment, response sequence value 5300 indicates that transmission response 62 was defined and sent before transmission response 64, which has response sequence value 5301.
Input scheduling module 2120 can determine, based on the response sequence values, that transmission of a transmission response from output scheduling module 2130 has failed. Specifically, input scheduling module 2120 can determine that the transmission response associated with response sequence value 5301 was not received before transmission response 66, which is associated with response sequence value 5302, was received. In some embodiments, when the time period between receipt of transmission response 62 and receipt of transmission response 66 (shown as time period 2140) exceeds a threshold time period, input scheduling module 2120 can take an action with respect to the lost transmission response 64. In some embodiments, input scheduling module 2120 can request that output scheduling module 2130 retransmit transmission response 64. Input scheduling module 2120 can include the missing response sequence value in the request so that output scheduling module 2130 can identify transmission response 64 as not having been received. In some embodiments, input scheduling module 2120 can drop a cell group when the transmission response associated with the corresponding transmission request is not received within a specific period of time.
Figure 22 is a schematic block diagram that illustrates multiple stages of flow-controllable queues, according to an embodiment. As shown in Figure 22, the transmit side of first-stage queues 2210 and the transmit side of second-stage queues 2220 are included in a source entity 2230 on the transmit side of a physical link 2200. The receive side of first-stage queues 2210 and the receive side of second-stage queues 2220 are included in a destination entity 2240 on the receive side of physical link 2200. Source entity 2230 and/or destination entity 2240 can be any type of computing device (e.g., a portion of a switch core, a peripheral processor) that can be configured to receive and/or transmit data via physical link 2200. In some embodiments, source entity 2230 and/or destination entity 2240 can be associated with a data center.
As shown in Figure 22, first-stage queues 2210 include transmit queues A1 through A4 (referred to as first-stage transmit queues 2234) on the transmit side of physical link 2200 and receive queues D1 through D4 (referred to as first-stage receive queues 2244) on the receive side of physical link 2200. Second-stage queues 2220 include transmit queues B1 and B2 (referred to as second-stage transmit queues 2232) on the transmit side of physical link 2200 and receive queues C1 and C2 (referred to as second-stage receive queues 2242) on the receive side of physical link 2200.
Data flow via physical link 2200 can be controlled (e.g., modified, suspended) based on flow-control signaling associated with flow-control loops between source entity 2230 and destination entity 2240. For example, data sent from source entity 2230 on the transmit side of physical link 2200 can be received by destination entity 2240 on the receive side of physical link 2200. When destination entity 2240 is unavailable to receive data from source entity 2230 via physical link 2200, a flow-control signal can be defined at destination entity 2240 and/or sent from destination entity 2240 to source entity 2230. The flow-control signal can be configured to trigger source entity 2230 to modify the flow of data from source entity 2230 to destination entity 2240.
For example, if receive queue D2 is unavailable to handle data transmitted from transmit queue A1, destination entity 2240 can be configured to send to source entity 2230 a flow-control signal associated with a flow-control loop; the flow-control signal can be configured to trigger suspension of data transmission from transmit queue A1 to receive queue D2 via a transmission path that includes at least a portion of second-stage queues 2220 and physical link 2200. In some embodiments, receive queue D2 may be unavailable, for example, when receive queue D2 is too full to receive data. In some embodiments, receive queue D2 can change from an available state to an unavailable state (e.g., a congested state) in response to data previously received from transmit queue A1. In some embodiments, transmit queue A1 can be referred to as the target of the flow-control signal. Transmit queue A1 can be identified in the flow-control signal based on a queue identifier associated with transmit queue A1. In some embodiments, the flow-control signal can be referred to as a feedback signal.
In this embodiment, a flow-control loop is associated with physical link 2200 (referred to as a physical-link control loop), a flow-control loop is associated with first-stage queues 2210 (referred to as a first-stage control loop), and a flow-control loop is associated with second-stage queues 2220 (referred to as a second-stage control loop). Specifically, the physical-link control loop is associated with a transmission path that includes physical link 2200 but does not include first-stage queues 2210 or second-stage queues 2220. Data flow via physical link 2200 can be turned on and turned off based on flow-control signaling associated with the physical-link control loop.
The first-stage control loop can be defined based on data transmission from at least one of the transmit queues 2234 in first-stage queues 2210 and based on a flow-control signal defined based on the availability (e.g., an indicator of the availability) of at least one of the receive queues 2244 in first-stage queues 2210. As such, the first-stage control loop can be referred to as being associated with first-stage queues 2210. The first-stage control loop can be associated with a transmission path that includes physical link 2200, at least a portion of second-stage queues 2220, and at least a portion of first-stage queues 2210. Flow-control signaling associated with the first-stage control loop can trigger control of data flow from the transmit queues 2234 associated with first-stage queues 2210.
The second-stage control loop can be associated with a transmission path that includes physical link 2200 and at least a portion of second-stage queues 2220 but does not include first-stage queues 2210. The second-stage control loop can be defined based on data transmission from at least one of the transmit queues 2232 in second-stage queues 2220 and based on a flow-control signal defined based on the availability (e.g., an indicator of the availability) of at least one of the receive queues 2242 in second-stage queues 2220. As such, the second-stage control loop can be referred to as being associated with second-stage queues 2220. Flow-control signaling associated with the second-stage control loop can trigger control of data flow from the transmit queues 2232 associated with second-stage queues 2220.
In this embodiment, the flow-control loop associated with second-stage queues 2220 is a priority-based flow-control loop. Specifically, each transmit queue from second-stage transmit queues 2232 is paired with a receive queue from second-stage receive queues 2242, and each queue pair is associated with a level of service (also referred to as a class of service or quality of service). In this embodiment, second-stage transmit queue B1 and second-stage receive queue C1 define a queue pair and are associated with level-of-service X. Second-stage transmit queue B2 and second-stage receive queue C2 define a queue pair and are associated with level-of-service Y. In some embodiments, different types of network traffic can be associated with different levels of service (i.e., different priorities). For example, storage traffic (e.g., read and write traffic), inter-processor communication, media signaling, session-layer signaling, and so forth can each be associated with at least one level of service. In some embodiments, the second-stage control loop can be based on, for example, the Institute of Electrical and Electronics Engineers (IEEE) 802.1qbb protocol, which defines a priority-based flow-control strategy.
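The priority-based behavior above can be sketched as a per-class pause flag: pausing the queue pair for one level of service does not stop the pair for another. The sketch is an illustrative assumption (class name and bookkeeping are invented here); it is not the IEEE 802.1qbb frame format or the patent's implementation.

```python
# Hedged sketch of priority-based pausing across the second-stage queue pairs.

class PriorityPauser:
    def __init__(self, classes):
        self.paused = {c: False for c in classes}   # one flag per level of service

    def pause(self, level_of_service):
        self.paused[level_of_service] = True        # pause only that queue pair

    def resume(self, level_of_service):
        self.paused[level_of_service] = False

    def can_send(self, level_of_service):
        return not self.paused[level_of_service]


pauser = PriorityPauser(classes=["X", "Y"])
pauser.pause("X")                 # queue pair B1/C1 (level-of-service X) is paused
print(pauser.can_send("X"))       # False
print(pauser.can_send("Y"))       # True - queue pair B2/C2 keeps flowing
```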
Data traffic via transmission path 74, shown in Figure 22, can be controlled using at least one of the control loops. Transmission path 74 includes first-stage transmit queue A2, second-stage transmit queue B1, physical link 2200, second-stage receive queue C1, and first-stage receive queue D3. A change in data flow at one stage of transmission path 74, based on the flow-control loop associated with that stage, can, however, affect data flow through another stage of transmission path 74. Flow control at one stage can affect data flow at another stage because, for example, the queues within source entity 2230 (e.g., transmit queues 2232, transmit queues 2234) and the queues within destination entity 2240 (e.g., receive queues 2242, receive queues 2244) are staged. In other words, flow control based on one flow-control loop can have an effect on data flow via elements associated with a different flow-control loop.
For example, data flow from first-stage transmit queue A1 to first-stage receive queue D3 via transmission path 74 can be modified based on one or more of the control loops—the first-stage control loop, the second-stage control loop, and/or the physical-link control loop. A suspension of data flow to first-stage receive queue D3 can be triggered by, for example, first-stage receive queue D3 changing from an available state to an unavailable state (e.g., a congested state).
If the data flow to first-stage receive queue D3 is associated with level-of-service X, data flow via second-stage transmit queue B1 and second-stage receive queue C1 (which define the queue pair associated with level-of-service X) can be suspended based on flow-control signaling associated with the second-stage control loop (which is a priority-based control loop). Suspending data transmission via the queue pair associated with level-of-service X, however, can cause suspension of data transmission from the transmit queues that feed into second-stage transmit queue B1. Specifically, suspending data transmission via the queue pair associated with level-of-service X can cause suspension not only of data transmission from first-stage transmit queue A2 but also of data transmission from first-stage transmit queue A1. In other words, data flow from first-stage transmit queue A1 is indirectly, or collaterally, affected. In some embodiments, the data received at transmit queue A1 and the data received at transmit queue A2 can be associated with the same level-of-service X, yet the data received at transmit queue A1 and the data received at transmit queue A2 may originate from, for example, different (e.g., independent) network devices (not shown), such as peripheral processors, that can be associated with different levels of service.
Data flow to first-stage receive queue D3 can also be suspended by specifically suspending data transmission from first-stage transmit queue A2 based on flow-control signaling associated with the first-stage control loop. By directly suspending data transmission from first-stage transmit queue A2, data transmission from first-stage transmit queue A1 need not be interrupted. In other words, flow control of first-stage transmit queue A2 can be exercised directly, based on a flow-control signal associated with the first-stage control loop, without suspending data transmission from other first-stage transmit queues such as first-stage transmit queue A1.
Data flow to first-stage receive queue D3 can also be controlled by suspending data transmission via physical link 2200 based on flow-control signaling associated with the physical-link control loop. Suspending data transmission via physical link 2200, however, causes suspension of all data transmission via physical link 2200.
The queues on the transmit side of physical link 2200 can be referred to collectively as transmit queues 2236, and the queues on the receive side of the physical link can be referred to collectively as receive queues 2246. In some embodiments, transmit queues 2236 can also be referred to as source queues, and receive queues 2246 can be referred to as destination queues. Although not shown, in some embodiments one or more of the transmit queues 2236 can be included in one or more interface cards associated with source entity 2230, and one or more of the receive queues 2246 can be included in one or more interface cards associated with destination entity 2240.
When source entity 2230 transmits data via physical link 2200, source entity 2230 can be referred to as a transmitter disposed on the transmit side of physical link 2200. Destination entity 2240 can be configured to receive the data and can be referred to as a receiver disposed on the receive side of physical link 2200. Although not shown, in some embodiments source entity 2230 (and its associated elements (e.g., transmit queues 2236)) can be configured to function as a destination entity (e.g., a receiver), and destination entity 2240 (and its associated elements (e.g., receive queues 2246)) can be configured to function as a source entity (e.g., a transmitter). In addition, physical link 2200 can function as a bidirectional link.
In some embodiments, physical link 2200 can be a tangible link such as an optical link (e.g., a fiber-optic cable, a plastic-fiber cable), a cable link (e.g., a copper-based wire), a twisted-pair link (e.g., a category-5 cable), and so forth. In some embodiments, physical link 2200 can be a wireless link. Data transmission via physical link 2200 can be defined based on a protocol such as, for example, an Ethernet protocol, a wireless protocol, a Fibre Channel protocol, a Fibre-Channel-over-Ethernet protocol, an Infiniband-related protocol, and so forth.
In some embodiments, the second-stage control loop can be referred to as being nested within the first-stage control loop because the second-stage queues 2220 associated with the second-stage control loop are disposed within the first-stage queues 2210 associated with the first-stage control loop. Similarly, the physical-link control loop can be referred to as being nested within the second-stage control loop. In some embodiments, the second-stage control loop can be referred to as an inner control loop, and the first-stage control loop can be referred to as an outer control loop.
Figure 23 is a schematic block diagram that illustrates multiple stages of flow-controllable queues, according to an embodiment. As shown in Figure 23, the transmit side of first-stage queues 2310 and the transmit side of second-stage queues 2320 are included in a source entity 2330 disposed on the transmit side of a physical link 2300. The receive side of first-stage queues 2310 and the receive side of second-stage queues 2320 are included in a destination entity 2340 disposed on the receive side of physical link 2300. The queues on the transmit side of physical link 2300 can be referred to collectively as transmit queues 2336, and the queues on the receive side of the physical link can be referred to collectively as receive queues 2346. Although not shown, in some embodiments source entity 2330 can be configured to function as a destination entity, and destination entity 2340 can be configured to function as a source entity (e.g., a transmitter). In addition, physical link 2300 can function as a bidirectional link.
As shown in Figure 23, source entity 2330 communicates with destination entity 2340 via physical link 2300. Source entity 2330 has a queue QP1 configured to buffer data (if needed) before the data are transmitted via physical link 2300, and destination entity 2340 has a queue QP2 configured to buffer data received via physical link 2300 (if needed) before the data are distributed within destination entity 2340. In some embodiments, data flow via physical link 2300 can be processed without buffering at queue QP1 and queue QP2.
Each of the transmit queues QA1 through QAN included in first-stage queues 2310 can be referred to as a first-stage transmit queue, and these queues can be referred to collectively as transmit queues 2334 (or queues 2334). Each of the transmit queues QB1 through QBM included in second-stage queues 2320 can be referred to as a second-stage transmit queue, and these queues can be referred to collectively as transmit queues 2332 (or queues 2332). Each of the receive queues QD1 through QDR included in first-stage queues 2310 can be referred to as a first-stage receive queue, and these queues can be referred to collectively as receive queues 2344 (or queues 2344). Each of the receive queues QC1 through QCM included in second-stage queues 2320 can be referred to as a second-stage receive queue, and these queues can be referred to collectively as receive queues 2342 (or queues 2342).
As shown in Figure 23, each queue from second-stage queues 2320 is disposed within a transmission path between physical link 2300 and at least one queue from first-stage queues 2310. For example, a portion of a transmission path can be defined by first-stage receive queue QD4, second-stage receive queue QC1, and physical link 2300. Second-stage receive queue QC1 is disposed within the transmission path between first-stage receive queue QD4 and physical link 2300.
In this embodiment, a physical-link control loop is associated with physical link 2300, a first-stage control loop is associated with first-stage queues 2310, and a second-stage control loop is associated with second-stage queues 2320. In some embodiments, the second-stage control loop can be a priority-based control loop. In some embodiments, the physical-link control loop includes physical link 2300, queue QP1, and queue QP2.
Flow-control signals can be defined at, and/or transmitted between, a source control module 2370 at source entity 2330 and a destination control module 2380 at destination entity 2340. In some embodiments, source control module 2370 can be referred to as a source flow-control module, and destination control module 2380 can be referred to as a destination flow-control module. For example, destination control module 2380 can be configured to send a flow-control signal to source control module 2370 when one or more of the receive queues 2346 (e.g., receive queue QD2) at destination entity 2340 is unavailable to accept data. The flow-control signal can be configured to trigger source control module 2370 to, for example, suspend the flow of data from one or more of the transmit queues 2336 to one or more of the receive queues 2346.
Before data are transmitted, source control module 2370 associates a queue identifier with the data queued at a transmit queue from transmit queues 2336. The queue identifier can represent and/or be used to identify the transmit queue at which the data are queued. For example, when a data packet is queued at first-stage transmit queue QA4, a queue identifier that uniquely identifies first-stage transmit queue QA4 can be appended to the data packet or included in a field (e.g., a header, a trailer, a payload) within the data packet. In some embodiments, the queue identifier can be associated with the data at source control module 2370 or triggered by source control module 2370. In some embodiments, the queue identifier can be associated with the data only just before the data are transmitted, or after the data have been transmitted from one of the transmit queues 2336.
The queue identifier can be associated with the data transmitted from the transmit side of physical link 2300 to the receive side of physical link 2300 so that the source of the data (e.g., the source queue) can be identified. Accordingly, a flow-control signal can be defined based on the queue identifier to suspend transmission from one or more of the transmit queues 2336. For example, a queue identifier associated with first-stage transmit queue QAN can be included in a data packet sent from first-stage transmit queue QAN to first-stage receive queue QD3. If, after receiving the data packet, first-stage receive queue QD3 is unable to receive another data packet from first-stage transmit queue QAN, a flow-control signal requesting that first-stage transmit queue QAN suspend transmission of additional data packets to first-stage receive queue QD3 can be defined based on the queue identifier associated with first-stage transmit queue QAN. The queue identifier can be parsed from the data packet by destination control module 2380 and used by destination control module 2380 to define the flow-control signal.
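The tagging-and-targeting mechanism above can be sketched as follows. This is a minimal sketch under stated assumptions: the dict-based "packet" and the field name src_queue_id are invented for illustration and are not fields defined by the patent.

```python
# Hedged sketch: tag data with its source queue, then target that queue
# when the destination defines a flow-control signal.

def tag_packet(packet, queue_id):
    packet["header"]["src_queue_id"] = queue_id   # associated before/at transmission
    return packet

def define_flow_control(packet):
    """At the destination control module: parse the source queue and target it."""
    target = packet["header"].get("src_queue_id")
    if target is None:
        return None            # an untagged source cannot be targeted (exempt)
    return {"suspend_queue": target}

pkt = tag_packet({"header": {}, "payload": b"..."}, queue_id="QAN")
print(define_flow_control(pkt))   # {'suspend_queue': 'QAN'}
```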
In some embodiments, data transmission to first-stage receive queue QDR from several of the transmit queues 2336 (e.g., several of the first-stage transmit queues 2334) can be suspended in response to first-stage receive queue QDR changing from an available state to an unavailable state. Each of the several transmit queues 2336 can be identified in the flow-control signal based on its respective queue identifier.
In some embodiments, one or more of the transmit queues 2336 and/or one or more of the receive queues 2346 can be virtual queues (e.g., logically defined sets of queues). Accordingly, a queue identifier can be associated with (e.g., can represent) a virtual queue. In some embodiments, a queue identifier can be associated with a queue from a set of queues that defines a virtual queue. In some embodiments, each queue identifier from the set of queue identifiers associated with physical link 2300 can be unique. For example, each transmit queue 2336 associated with physical link 2300 can be associated with a unique queue identifier.
In some embodiments, source control module 2370 can be configured to associate queue identifiers with only a particular subset of the transmit queues 2336 and/or with only a subset of the data queued at one of the transmit queues 2336. For example, if data are sent from first-stage transmit queue QA2 to first-stage receive queue QD1 without an accompanying queue identifier, a flow-control signal configured to request suspension of data transmission from first-stage transmit queue QA2 may not be defined, because the source of the data is not known. Accordingly, a transmit queue from transmit queues 2336 can be exempted from flow control by not associating (e.g., by omitting) a queue identifier with data when the data are transmitted from that transmit queue.
In some embodiments, the unavailability of one or more of the receive queues 2346 at destination entity 2340 can be defined based on a condition being satisfied. The condition can be related to, for example, a storage limit of a queue, a queue access rate, a rate of data traffic into a queue, and so forth. For example, a flow-control signal can be defined at destination control module 2380 in response to a state change of one or more of the receive queues 2346—for example, second-stage receive queue QC2 changing from an available state to an unavailable state (e.g., a congested state) based on a threshold storage limit being exceeded. When in the unavailable state, second-stage receive queue QC2 is unavailable to receive data because, for example, second-stage receive queue QC2 is considered too full (as indicated by the threshold storage limit being exceeded). In some embodiments, one or more of the receive queues 2346 can be in an unavailable state when disabled. In some embodiments, a flow-control signal can be defined based on a request to suspend transmission of data to a receive queue from receive queues 2346 when that receive queue is unavailable to receive data. In some embodiments, the state of one or more of the receive queues 2346 can be changed (e.g., by destination control module 2380) from an available state to a congested state in response to a particular subset of the receive queues 2346 (e.g., the receive queues within a specific stage) being in a congested state.
In some embodiments, a flow-control signal can be defined at destination control module 2380 to indicate that one of the receive queues 2346 has changed from an unavailable state to an available state. For example, destination control module 2380 can initially be configured to define and send a first flow-control signal to source control module 2370 in response to first-stage receive queue QD3 changing from an available state to an unavailable state. First-stage receive queue QD3 may have changed from the available state to the unavailable state in response to data sent from first-stage transmit queue QA2. Accordingly, the target of the first flow-control signal can be first-stage transmit queue QA2 (indicated based on a queue identifier). When first-stage receive queue QD3 changes back from the unavailable state to the available state, destination control module 2380 can be configured to define and send to source control module 2370 a second flow-control signal indicating the change back from the unavailable state to the available state. In some embodiments, source control module 2370 can be configured to trigger data transmission from one or more of the transmit queues 2336 to first-stage receive queue QD3 in response to the second flow-control signal.
In some embodiments, a flow-control signal can have one or more parameter values that can be used by source control module 2370 to modify transmission from one of the transmit queues 2336 (identified in the flow-control signal by a queue identifier). For example, the flow-control signal can include a parameter value that triggers source control module 2370 to suspend transmission from one of the transmit queues 2336 for a specific period of time (e.g., 10 milliseconds (ms)). In other words, the flow-control signal can include a suspension-time-period parameter value. In some embodiments, the suspension time period can be indefinite. In some embodiments, the flow-control signal can define a request to transmit data from one or more of the transmit queues 2336 at a specific rate (e.g., a specified number of frames per second, a specified number of bits per second).
In some embodiments, a flow-control signal (e.g., the suspension time period within a flow-control signal) can be defined based on a flow-control algorithm. The suspension time period can be defined based on, for example, a time period during which a receive queue from receive queues 2346 (e.g., first-stage receive queue QD4) is expected to remain in an unavailable state. In some embodiments, the suspension time period can be defined based on the unavailable states of more than one of the first-stage receive queues 2344. For example, in some embodiments, the suspension time period increases when more than a specified number of the first-stage receive queues 2344 are in a congested state. In some embodiments, such a determination can be made at destination control module 2380. The time period during which a receive queue is expected to be unavailable can be a projected (e.g., estimated) time period calculated by destination control module 2380 based on, for example, a flow rate of data out of the receive queue (e.g., a historical flow rate, a prior flow rate).
In some embodiments, source control module 2370 can deny or change a request to modify the flow of data from one or more of the transmit queues 2336. For example, in some embodiments, source control module 2370 can be configured to decrease or increase a suspension time period. In some embodiments, rather than suspending data transmission in response to a flow-control signal, source control module 2370 can be configured to modify a transmission path associated with one of the transmit queues 2336. For example, if first-stage transmit queue QA2 receives a request to suspend transmission based on a state change of first-stage receive queue QD2, source control module 2370 can be configured to trigger transmission of data from first-stage transmit queue QA2 to, for example, first-stage receive queue QD3 instead of complying with the request to suspend transmission.
As shown in Figure 23, the queues within second-stage queues 2320 fan into or fan out of physical link 2300. For example, the transmit queues 2332 (i.e., QB1 through QBM) on the transmit side of physical link 2300 fan into queue QP1 on the transmit side of physical link 2300. Accordingly, data queued at any of the transmit queues 2332 can be transmitted to queue QP1 of physical link 2300. On the receive side of physical link 2300, data transmitted from physical link 2300 via queue QP2 can be fanned out to the receive queues 2342 (i.e., queues QC1 through QCM).
Similarly, as shown in Figure 23, the transmit queues 2334 within first-stage queues 2310 fan into the transmit queues 2332 within second-stage queues 2320. For example, data queued at any of first-stage transmit queues QA1, QA4, and QAN-2 can be transmitted to second-stage transmit queue QB2. On the receive side of physical link 2300, data transmitted from, for example, second-stage receive queue QCM can be fanned out to first-stage receive queues QDR-1 and QDR.
Because the flow-control loops (e.g., the first-stage control loop) are associated with different fan-in and fan-out architectures, the flow-control loops can have different effects on data flow via physical link 2300. For example, when data transmission from second-stage transmit queue QB1 is suspended based on the second-stage control loop, data transmission from first-stage transmit queues QA1, QA2, QA3, and QAN-1 to one or more of the receive queues 2346 via second-stage transmit queue QB1 is also suspended. In this case, data transmission from one or more upstream queues (e.g., first-stage transmit queue QA1) is suspended when transmission from a downstream queue (e.g., second-stage transmit queue QB1) is suspended. In contrast, if data transmission from first-stage transmit queue QA1 along a transmission path that includes at least downstream second-stage transmit queue QB1 is suspended based on the first-stage control loop, the rate of data flow out of second-stage transmit queue QB1 can decrease without all data transmission from second-stage transmit queue QB1 being suspended; other first-stage transmit queues, for example, can still transmit data via second-stage transmit queue QB1.
In some embodiments, the fan-in and fan-out architecture can differ from that shown in Figure 23. For example, in some embodiments, some of the queues within first-stage queues 2310 can be configured to fan into physical link 2300 while bypassing second-stage queues 2320.
Flow-control signaling associated with the transmit queues 2336 is handled by source control module 2370, and flow-control signaling associated with the receive queues 2346 is handled by destination control module 2380. Although not shown, in some embodiments flow-control signaling can be handled by one or more control modules (or control sub-modules) that can be separate and/or integrated into a single control module. For example, flow-control signaling associated with first-stage receive queues 2344 can be handled by a control module that is separate from a control module configured to handle flow-control signaling associated with second-stage receive queues 2342. Similarly, flow-control signaling associated with first-stage transmit queues 2334 can be handled by a control module that is separate from a control module configured to handle flow-control signaling associated with second-stage transmit queues 2332. In some embodiments, one or more portions of source control module 2370 and/or destination control module 2380 can be hardware-based modules (e.g., a DSP, an FPGA) and/or software-based modules (e.g., a module of computer code, a set of processor-readable instructions that can be executed at a processor).
Figure 24 is a schematic block diagram that illustrates a destination control module 2450 configured to define a flow-control signal 6428 associated with multiple receive queues, according to an embodiment. The queue stages include first-stage queues 2410 and second-stage queues 2420. As shown in Figure 24, a source control module 2460 is associated with the transmit side of first-stage queues 2410, and destination control module 2450 is associated with the receive side of first-stage queues 2410. The queues on the transmit side of physical link 2400 can be referred to collectively as transmit queues 2470. The queues on the receive side of physical link 2400 can be referred to collectively as receive queues 2480.
Destination control module 2450 is configured to send flow-control signal 6428 to source control module 2460 in response to multiple receive queues within first-stage queues 2410 being unavailable to receive data from a single source queue within first-stage queues 2410. Source control module 2460 is configured to suspend data transmission from the source queue within first-stage queues 2410 to the multiple receive queues within first-stage queues 2410 based on flow-control signal 6428.
Flow-control signal 6428 can be defined by destination control module 2450 based on information associated with each of the unavailable receive queues within first-stage queues 2410. Destination control module 2450 can be configured to collect the information associated with the unavailable receive queues and to define flow-control signal 6428 so that potentially conflicting flow-control signals (not shown) are not sent to the single source queue within first-stage queues 2410. In some embodiments, a flow-control signal 6428 defined based on the collected information can be referred to as an aggregated flow-control signal.
Specifically, in this example, destination control module 2450 is configured to define flow-control signal 6428 in response to two receive queues on the receive side of first-stage queues 2410—receive queue 2442 and receive queue 2446—being unavailable to receive data from a transmit queue 2412 on the transmit side of first-stage queues 2410. In this embodiment, receive queue 2442 and receive queue 2446 change from an available state to an unavailable state in response to data packets sent from transmit queue 2412 via transmission path 6422 and transmission path 6424, respectively. As shown in Figure 24, transmission path 6422 includes transmit queue 2412, a transmit queue 2422 within second-stage queues 2420, physical link 2400, a receive queue 2432 within second-stage queues 2420, and receive queue 2442. Transmission path 6424 includes transmit queue 2412, transmit queue 2422, physical link 2400, receive queue 2432, and receive queue 2446.
In some embodiments, a flow control algorithm can be used to define flow control signal 6428 based on information related to the unavailability of receive queue 2442 and/or information related to the unavailability of receive queue 2446. For example, if destination control module 2450 determines that receive queue 2442 and receive queue 2446 are unavailable for different time periods, destination control module 2450 can be configured to define flow control signal 6428 based on the different time periods. For example, destination control module 2450 can request, via flow control signal 6428, that data transmission from transmit queue 2412 be suspended for a time period calculated based on the different time periods (for example, a time period equal to the mean of the different time periods, or a time period equal to the larger of the different time periods). In some embodiments, flow control signal 6428 can be defined based on separate suspension requests originating from the receive side of the first-stage queues 2410 (for example, a suspension request associated with receive queue 2442 and a suspension request associated with receive queue 2446).
In some embodiments, flow control signal 6428 can be defined based on a maximum or minimum allowable time period. In some embodiments, flow control signal 6428 can be calculated based on the aggregate data flow rate from, for example, transmit queue 2412. For example, the suspension time period can be scaled based on the aggregate data flow rate from transmit queue 2412. In some embodiments, the suspension time period can be increased if, for example, the data flow rate from transmit queue 2412 is above a threshold value, and the suspension time period can be decreased if the data flow rate from transmit queue 2412 is below the threshold value.
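A minimal sketch of how a suspension period might be derived from the individual unavailability periods and then adjusted by the aggregate data rate, assuming hypothetical names, scaling factors, and threshold values that are not specified in the description:

```python
from statistics import mean
from typing import Iterable

def compute_suspension_time(per_queue_timeouts_s: Iterable[float],
                            aggregate_rate_bps: float,
                            rate_threshold_bps: float,
                            policy: str = "max",
                            min_s: float = 0.001,
                            max_s: float = 1.0) -> float:
    """Combine the unavailability periods reported by the congested receive
    queues (mean or larger value), scale the result up or down depending on
    whether the transmit queue's aggregate data rate is above or below a
    threshold, and clamp to minimum/maximum allowable periods."""
    timeouts = list(per_queue_timeouts_s)
    base = max(timeouts) if policy == "max" else mean(timeouts)
    # Higher offered load leads to a longer suspension; lower load, shorter.
    scaled = base * (1.5 if aggregate_rate_bps > rate_threshold_bps else 0.5)
    return min(max(scaled, min_s), max_s)
```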
In some embodiments, the flow control algorithm can be configured to wait for a specified time period before defining and/or sending flow control signal 6428. The wait time period can be defined so that multiple suspension requests related to transmit queue 2412 that are received at different times within the wait time period can be used to define flow control signal 6428. In some embodiments, the wait time period is triggered in response to at least one suspension request related to transmit queue 2412 being received.
In some embodiments, flow control signal 6428 can be defined by the flow control algorithm based on a priority value associated with each receive queue in the first-stage queues 2410. For example, if receive queue 2442 has a higher priority value than the priority value associated with receive queue 2446, destination control module 2450 can be configured to define flow control signal 6428 based on information associated with receive queue 2442 rather than receive queue 2446. For example, flow control signal 6428 can be defined based on a suspension time period associated with receive queue 2442 rather than a suspension time period associated with receive queue 2446, because receive queue 2442 has a higher priority value than the priority value associated with receive queue 2446.
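The priority-based variant can be sketched as follows (illustrative only, with hypothetical names); the highest-priority unavailable receive queue simply dictates the suspension period and information from lower-priority queues is ignored:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PauseRequest:
    receive_queue_id: int
    timeout_s: float
    priority: int   # larger value = higher-priority receive queue

def select_governing_timeout(requests: List[PauseRequest]) -> float:
    """Return the suspension period of the highest-priority congested receive
    queue, as described above for receive queues 2442 and 2446."""
    governing = max(requests, key=lambda r: r.priority)
    return governing.timeout_s
```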
In some embodiments, flow control signal 6428 can be defined by the flow control algorithm based on an attribute associated with each receive queue within the first-stage queues 2410. For example, flow control signal 6428 can be defined based on receive queue 2442 and/or receive queue 2446 being a particular type of queue (for example, a last-in first-out (LIFO) queue or a first-in first-out (FIFO) queue). In some embodiments, flow control signal 6428 can be defined based on receive queue 2442 and/or receive queue 2446 being configured to receive a particular type of data (for example, a control data/signal queue or a media data/signal queue).
Although not shown, one or more control modules associated with a queue stage (for example, the first-stage queues 2410) can be configured to send information to a different control module, where this information is used to define a flow control signal. The different control module is associated with a different queue stage. For example, a suspension request associated with receive queue 2442 and a suspension request associated with receive queue 2446 can be defined at destination control module 2450. The suspension requests can be sent to a destination control module (not shown) associated with the receive side of the second-stage queues 2420. A flow control signal (not shown) can then be defined at the destination control module associated with the receive side of the second-stage queues 2420 based on the suspension requests and based on a flow control algorithm.
Flow control signal 6428 can be defined based on a flow control loop (for example, a first-stage flow control loop) associated with the first-stage queues 2410. One or more flow control signals (not shown) can also be defined based on a flow control loop associated with the second-stage queues 2420 and/or a flow control loop associated with physical link 2400.
Data transmissions associated with the transmit queues within the first-stage queues 2410 other than transmit queue 2412 are substantially unrestricted by flow control signal 6428, because the data flow to receive queues 2442 and 2446 is controlled based on the first-stage flow control loop. For example, even though data transmission from transmit queue 2412 is suspended, transmit queue 2414 can continue to send data via transmit queue 2422. For example, even though data transmission from transmit queue 2412 via transmit queue 2422 is suspended, transmit queue 2414 can be configured to send data to receive queue 2448 via transmission path 6426, which includes transmit queue 2422. In some embodiments, even though data transmission from transmit queue 2412 via transmission path 6422 is suspended based on flow control signal 6428, transmit queue 2422 can continue to send data from, for example, transmit queue 2416 to receive queue 2442.
In contrast, if the data flow to receive queues 2442 and 2446 were controlled by suspending data flow through transmit queue 2422 based on a flow control signal (not shown) associated with the second-stage flow control loop, data transmissions from transmit queue 2414 and transmit queue 2416 via transmit queue 2422 (in addition to the data transmission from transmit queue 2412) would also be restricted. Data transmission from transmit queue 2422 would be suspended because transmit queue 2422 is associated with a particular service level, which would, for example, cause data associated with that service level to become congested at receive queues 2442 and 2446.
One or more parameter values defined within flow control signal 6428 can be stored in a memory 2452 of destination control module 2450. In some embodiments, the parameter values can be stored at the memory 2452 of destination control module 2450 after the one or more parameter values have been defined and/or when flow control signal 6428 is sent to source control module 2460. The parameter values defined within flow control signal 6428 can be used to track, for example, the state of transmit queue 2412. For example, an entry in memory 2452 can indicate that transmit queue 2412 is in a suspended state (for example, a non-transmitting state). The entry can be defined based on a suspension-time-period parameter value defined within flow control signal 6428. When the suspension time period expires, the entry can be updated to indicate that the state of transmit queue 2412 has changed to, for example, an active state (for example, a transmitting state). Although not shown, in some embodiments one or more parameter values can be stored in a memory outside of destination control module 2450 (for example, a remote memory).
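One way to keep the per-transmit-queue state described above is a small table keyed by queue identifier whose entries expire when the suspension period ends; the following sketch is illustrative only, and the class and method names are hypothetical:

```python
import time

class SuspensionTable:
    """Tracks which transmit queues are currently suspended and until when."""

    def __init__(self):
        self._until = {}   # queue_id -> absolute time at which suspension ends

    def record(self, queue_id: int, timeout_s: float) -> None:
        """Store an entry when a flow control signal is sent for queue_id."""
        self._until[queue_id] = time.monotonic() + timeout_s

    def is_suspended(self, queue_id: int) -> bool:
        """True while the suspension period has not yet expired; expired
        entries are removed, modeling the state change back to active."""
        deadline = self._until.get(queue_id)
        if deadline is None:
            return False
        if time.monotonic() >= deadline:
            del self._until[queue_id]
            return False
        return True
```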
In some embodiments, one or more parameter values stored in the memory 2452 of destination control module 2450 (for example, state information defined based on the one or more parameter values) can be used by destination control module 2450 to determine whether an additional flow control signal (not shown) should be defined. In some embodiments, the one or more parameter values can be used by destination control module 2450 to define one or more additional flow control signals.
For example, if receive queue 2442 changes from an available state to an unavailable state (for example, a congested state) in response to a first data packet received from transmit queue 2412, a request to suspend data transmission from transmit queue 2412 can be sent via flow control signal 6428. Flow control signal 6428 can indicate, based on a queue identifier, that transmit queue 2412 is the target of the request and can specify a suspension time period. When flow control signal 6428 is sent to source control module 2460, the suspension time period and the queue identifier associated with transmit queue 2412 can be stored in the memory 2452 of destination control module 2450. After flow control signal 6428 has been sent, receive queue 2444 can change from an available state to a congested state in response to a second data packet received from transmit queue 2412 (the transmission path is not shown in Figure 24). The second data packet can have been sent from transmit queue 2412 before data transmission from transmit queue 2412 was suspended based on flow control signal 6428. Destination control module 2450 can access the information stored in memory 2452 and, in response to the change in state associated with receive queue 2444, determine that an additional flow control signal targeting transmit queue 2412 should not be defined and sent to source control module 2460, because flow control signal 6428 has already been sent.
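Building on the hypothetical SuspensionTable sketched above, the decision not to define an additional flow control signal reduces to a single check before a new signal is defined; this is again only an illustrative sketch:

```python
def maybe_send_flow_control(table, source_queue_id: int, timeout_s: float,
                            send_signal) -> bool:
    """Define and send a flow control signal targeting source_queue_id only if
    no signal for that queue is still in force; returns True if one was sent.
    `send_signal` stands for whatever mechanism delivers the signal to the
    source control module."""
    if table.is_suspended(source_queue_id):
        # A signal (e.g., flow control signal 6428) was already sent and has
        # not yet timed out, so no additional signal is defined.
        return False
    table.record(source_queue_id, timeout_s)
    send_signal(source_queue_id, timeout_s)
    return True
```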
In some embodiments, source control module 2460 can be configured to suspend transmission from transmit queue 2412 based on the parameter values of the most recently received flow control signal. For example, after flow control signal 6428 targeting transmit queue 2412 has been sent to source control module 2460, a later flow control signal (not shown) targeting transmit queue 2412 can be received at source control module 2460. Source control module 2460 can be configured to apply the one or more parameter values associated with the later flow control signal instead of the parameter values associated with flow control signal 6428. In some embodiments, the later flow control signal can trigger transmit queue 2412 to remain in a suspended state for a time period that is longer or shorter than the time period indicated in flow control signal 6428.
In some embodiments, source control module 2460 applies the one or more parameter values associated with the later flow control signal only when a priority value associated with those parameter values is higher than (or lower than) a priority value associated with the one or more parameter values of flow control signal 6428. In some embodiments, each priority value can be defined at destination control module 2450, and each priority value can be defined based on a priority value associated with one or more of the receive queues 2480.
In some embodiments, flow control signal 6428 and the later flow control signal (both targeting transmit queue 2412) are both defined in response to the same receive queue from the receive queues 2480 being unavailable. For example, the later flow control signal can include updated parameter values defined by destination control module 2450 based on receive queue 2442 remaining in the unavailable state for a time period longer than previously calculated. In some embodiments, flow control signal 6428 targeting transmit queue 2412 can be defined in response to a change in state of one of the receive queues 2480 (for example, a change from an available state to an unavailable state), and the later flow control signal targeting transmit queue 2412 can be defined in response to a change in state of another of the receive queues 2480 (for example, a change from an available state to an unavailable state).
In some embodiments, multiple flow control signals can be defined at destination control module 2450 to suspend transmissions from multiple transmit queues of the first-stage queues 2410. In some embodiments, the multiple transmit queues can be sending data to a single receive queue, for example receive queue 2444. In some embodiments, a history of the flow control signals sent to the multiple transmit queues of the first-stage queues 2410 can be stored in the memory 2452 of destination control module 2450. In some embodiments, a later flow control signal associated with the single receive queue can be calculated based on the history of flow control signals.
In some embodiments, the suspension time periods related to multiple transmit queues can be grouped together and included in a flow control packet. For example, a suspension time period associated with transmit queue 2412 and a suspension time period associated with transmit queue 2414 can be included in a single flow control packet. More details related to flow control packets are described in connection with Figure 25.
Figure 25 is a schematic diagram that illustrates a flow control packet according to an embodiment. The flow control packet includes a header 2510, a trailer 2520, and a payload 2530 that includes suspension-time-period parameter values (shown in column 2514) for several transmit queues represented by queue identifiers (IDs) (shown in column 2512). As shown in Figure 25, each of queue IDs 1 through V is associated with one of the suspension-time-period parameter values 1 through V (that is, queue ID 1 is associated with suspension time period 1, and so on through queue ID V and suspension time period V). A suspension-time-period parameter value in column 2514 indicates the time period during which the transmit queue represented by the corresponding queue ID in column 2512 should be suspended (for example, disabled) from sending data.
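The queue-ID/suspension-period layout of Figure 25 can be illustrated with a simple fixed-width encoding; the field widths, the microsecond unit, and the helper names below are assumptions made for this sketch and are not specified by the description:

```python
import struct
from typing import Dict

# Assumed layout: a 2-byte queue ID and a 4-byte suspension period in
# microseconds per payload entry; header and trailer contents are out of scope.
_ENTRY = struct.Struct("!HI")

def pack_payload(timeouts_us: Dict[int, int]) -> bytes:
    """Build the payload 2530: one (queue ID, suspension period) pair per
    suspended transmit queue."""
    return b"".join(_ENTRY.pack(qid, t) for qid, t in sorted(timeouts_us.items()))

def unpack_payload(payload: bytes) -> Dict[int, int]:
    """Recover the queue-ID-to-suspension-period mapping at the source side."""
    pairs = (_ENTRY.unpack_from(payload, off)
             for off in range(0, len(payload), _ENTRY.size))
    return {qid: t for qid, t in pairs}
```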
In some embodiments, the flow control packet can be defined at a destination control module, for example destination control module 2450 shown in Figure 24. In some embodiments, the destination control module can be configured to define flow control packets at regular time intervals. For example, the destination control module can be configured to define one flow control packet every 10 ms. In some embodiments, the destination control module can be configured to define a flow control packet at random times, when a suspension-time-period parameter value is calculated, and/or when a given number of suspension-time-period parameter values have been calculated. In some embodiments, the destination control module can determine, based on, for example, one or more parameter values and/or state information accessed by the destination control module, that at least a portion of a flow control packet should not be defined and/or sent.
Although not shown, in some embodiments multiple queue IDs can be associated with a single suspension-time-period parameter value. In some embodiments, at least one queue ID can be associated with a parameter value other than a suspension-time-period parameter value. For example, a queue ID can be associated with a flow-rate parameter value. The flow-rate parameter value can indicate a flow rate (for example, a maximum flow rate) at which the transmit queue represented by the queue ID should send data. In some embodiments, the flow control packet can have one or more fields configured to indicate whether a particular receive queue is available to receive data.
A flow control packet can be sent from a destination control module to a source control module (for example, source control module 2460 shown in Figure 24) via a flow control signal (for example, flow control signal 6428 shown in Figure 24). In some embodiments, the flow control packet can be defined based on a layer-2 protocol (for example, layer 2 of the OSI model). In other words, the flow control packet can be defined at, and used within, layer 2 of a network system. In some embodiments, the flow control packet can be transmitted between devices associated with layer 2 (for example, MAC devices).
Referring again to Figure 25, one or more parameter values associated with flow control signal 6428 (for example, state information defined based on the parameter values) can be stored in a memory 2562 of source control module 2560. In some embodiments, the one or more parameter values can be stored in the memory 2562 of source control module 2560 when flow control signal 6428 is received at source control module 2560. The parameter values defined within flow control signal 6428 can be used to track the state of one or more of the receive queues 2580 (for example, receive queue 2542). For example, an entry in memory 2562 can indicate that receive queue 2542 is unavailable to receive data. The entry can be defined based on a suspension-time-period parameter value defined within flow control signal 6428 and can be associated with an identifier (for example, a queue identifier) of receive queue 2542. When the suspension time period expires, the entry can be updated to indicate that the state of receive queue 2542 has changed to, for example, an active state. Although not shown, in some embodiments one or more parameter values can be stored in a memory outside of source control module 2560 (for example, a remote memory).
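On the source side, the same expiring-entry idea can be applied to receive queues; the following sketch (hypothetical names, illustrative only) shows how a source control module such as source control module 2560 might consult the stored suspension periods before selecting a destination:

```python
import time
from typing import Dict, Iterable, Optional

class ReceiveQueueState:
    """Mirrors, at the source control module, which receive queues are
    currently marked unavailable and until when."""

    def __init__(self):
        self._unavailable_until: Dict[int, float] = {}

    def mark_unavailable(self, receive_queue_id: int, timeout_s: float) -> None:
        """Called when a flow control packet entry for this queue is received."""
        self._unavailable_until[receive_queue_id] = time.monotonic() + timeout_s

    def available(self, receive_queue_id: int) -> bool:
        deadline = self._unavailable_until.get(receive_queue_id)
        return deadline is None or time.monotonic() >= deadline

    def pick_available(self, candidates: Iterable[int]) -> Optional[int]:
        """Return the first candidate receive queue that is currently available,
        for example preferring receive queue 2544 over receive queue 2542."""
        for rq in candidates:
            if self.available(rq):
                return rq
        return None
```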
In some embodiments, one or more parameter values (and/or state information) stored in the memory 2562 of source control module 2560 can be used by source control module 2560 to determine whether data should be sent to one or more of the receive queues 2580. For example, source control module 2560 can be configured to send data from transmit queue 2516 to receive queue 2544 rather than receive queue 2542, based on state information related to receive queue 2544 and receive queue 2542.
In some embodiments, source control module 2560 can analyze data transmission patterns to determine whether data should be sent from one or more of the source queues 2570 to one or more of the receive queues 2580. For example, source control module 2560 can determine, based on parameter values stored in the memory 2562 of source control module 2560, that transmit queue 2514 is sending a relatively high volume of data to receive queue 2546. Based on this determination, source control module 2560 can trigger transmit queue 2516 to send data to receive queue 2548 rather than receive queue 2546, because receive queue 2546 is receiving a high volume of data from transmit queue 2514. By analyzing the transmission patterns associated with the transmit queues 2570, the onset of congestion at one or more of the receive queues 2580 can be substantially avoided.
In some embodiments, source control module 2560 can analyze parameter values (and/or state information) stored in the memory 2562 of source control module 2560 to determine whether data should be sent to one or more of the receive queues 2580. By analyzing the stored parameter values (and/or state information), the onset of congestion at one or more of the receive queues 2580 can be substantially avoided. For example, source control module 2560 can trigger data to be sent to receive queue 2540 rather than receive queue 2542 based on the historical availability of receive queue 2540 compared with the historical availability of receive queue 2542 (for example, better or worse availability). In some embodiments, source control module 2560 can send data to receive queue 2542 rather than receive queue 2544 based on, for example, data burst patterns related to the historical performance of receive queue 2542 compared with the historical performance of receive queue 2544. In some embodiments, the analysis of parameter values related to one or more of the receive queues 2580 can be based on a specific time window, a particular type of network processing (for example, inter-processor communication), a particular service level, and so forth.
In some embodiments, destination control module 2550 can send state information about the receive queues 2580 (for example, current state information), which can be used by source control module 2560 to determine whether data should be sent from one or more of the source queues 2570. For example, source control module 2560 can trigger transmit queue 2514 to send data to receive queue 2544 rather than receive queue 2546, because receive queue 2544 has more available capacity than receive queue 2546, as indicated by destination control module 2550. In some embodiments, any combination of current state information, transmission-pattern analysis, and historical data analysis can be used to substantially prevent, or reduce the likelihood of, the onset of congestion at one or more of the receive queues 2580.
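The combination of current state information, historical availability, and observed load described in the preceding paragraphs can be summarized as a simple scoring step; the weights and field names here are hypothetical and only illustrate how such inputs might be combined:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReceiveQueueInfo:
    queue_id: int
    currently_available: bool       # current state reported by the destination side
    historical_availability: float  # fraction of time available, 0.0..1.0
    observed_load: float            # recent data volume directed at this queue, 0.0..1.0

def choose_receive_queue(candidates: List[ReceiveQueueInfo]) -> int:
    """Prefer receive queues that are available now, have been available
    historically, and are lightly loaded, so the onset of congestion can be
    substantially avoided."""
    usable = [c for c in candidates if c.currently_available]
    if not usable:
        raise RuntimeError("no receive queue currently available")
    best = max(usable, key=lambda c: 0.6 * c.historical_availability
                                     - 0.4 * c.observed_load)
    return best.queue_id
```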
In some embodiments, flow control signal 6428 can be sent from destination control module 2550 to source control module 2560 via an out-of-band transmission path. For example, flow control signal 6428 can be sent via a dedicated link reserved for flow-control-signaling communications. In some embodiments, flow control signal 6428 can be sent via queues associated with the second-stage queues 2520, via queues associated with the first-stage queues 2510, and/or via physical link 2500.
Some embodiments described herein relate to a computer storage product with a computer-readable medium (also referred to as a processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The media and computer code (also referred to as code) can be those designed and constructed for a specific purpose or purposes. Examples of computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as compact discs/digital video discs (CD/DVDs), compact disc read-only memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as ASICs, programmable logic devices (PLDs), and read-only memory (ROM) and RAM devices.
Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions such as those produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments can be implemented using Java, C++, or other programming languages (for example, object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation, and that various changes in form and details can be made. Any portion of the apparatus and/or methods described herein can be combined in any combination, except mutually exclusive combinations. The embodiments described herein can include various combinations and/or sub-combinations of the functions, components, and/or features of the different embodiments described.

Claims (47)

1. An apparatus, comprising:
a switch core, the switch core defining a single logical entity and having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
the switch core configured to provide non-blocking connectivity at line rate between a first peripheral processor disposed in a first chassis and a second peripheral processor disposed in a second chassis.
2. The apparatus of claim 1, wherein the plurality of peripheral processors includes at least one peripheral processor having virtual resources and at least one peripheral processor not having virtual resources.
3. The apparatus of claim 1, wherein the number of input ports in the plurality of input ports and of output ports in the plurality of output ports is greater than 1000, and each input port in the plurality of input ports and each output port in the plurality of output ports is configured to operate at a rate of no less than 10 Gb/s.
4. The apparatus of claim 1, wherein:
the first peripheral processor and the second peripheral processor are each one of a storage node device, a compute node device, a service node device, or a router.
5. The apparatus of claim 1, wherein:
the plurality of peripheral processors includes a third peripheral processor, the switch core configured to provide non-blocking connectivity at line rate between the second peripheral processor and the third peripheral processor,
the switch core configured to receive a first packet associated with the first peripheral processor, the switch core configured to send, based on cells associated with the first packet, a second packet to the second peripheral processor and then a third packet to the third peripheral processor, the multi-stage switch fabric configured to send the cells from an input port of the plurality of input ports to an output port of the plurality of output ports.
6. The apparatus of claim 5, wherein:
the first peripheral processor and the third peripheral processor are each one of a storage node device, a compute node device, a service node device, or a router; and
the second peripheral processor is at least one of a firewall device, an intrusion detection device, or a load balancing device.
7. An apparatus, comprising:
a switch core, the switch core having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
the switch core configured to provide, at line rate, connectivity from each peripheral processor in the plurality of peripheral processors to each remaining peripheral processor in the plurality of peripheral processors, such that each output port in the plurality of output ports can be accessed equally by each peripheral processor in the plurality of peripheral processors via an input port of the plurality of input ports.
8. The apparatus of claim 7, wherein the plurality of peripheral processors includes at least one peripheral processor coupled to the switch core via an Ethernet connection and at least one peripheral processor coupled to the switch core via a non-Ethernet connection.
9. The apparatus of claim 7, wherein the plurality of peripheral processors includes at least one peripheral processor using layer-3 routing and at least one layer-4 through layer-7 peripheral processor device.
10. An apparatus, comprising:
a switch core, the switch core defining a single logical entity and having a multi-stage switch fabric, the multi-stage switch fabric having a plurality of stages physically distributed across a plurality of chassis, the plurality of stages collectively having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
the switch core configured to admit a plurality of cells associated with a packet into an input port of the plurality of input ports when transmission of the plurality of cells through the multi-stage switch fabric can be substantially guaranteed without loss.
11. The apparatus of claim 10, wherein the plurality of peripheral processors includes a first peripheral processor configured to communicate using a Fibre Channel protocol and a second peripheral processor configured to communicate using a Fibre Channel over Ethernet protocol.
12. The apparatus of claim 10, wherein the multi-stage switch fabric is configured as a deterministic network.
13. The apparatus of claim 10, wherein the multi-stage switch fabric is configured as a deterministic network such that the multi-stage switch fabric admits the packet into the input port when the plurality of cells can be sent to an output port of the plurality of output ports within a predetermined time.
14. The apparatus of claim 10, wherein:
the output port is a first output port,
the switch core is configured to send the plurality of cells associated with the packet from the input port to the first output port and a second output port of the plurality of output ports, without a need to perform packet-drop processing at at least one stage of the plurality of stages of the multi-stage switch fabric.
15. The apparatus of claim 10, wherein:
the switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices coupled to the plurality of peripheral processors, each edge device in the plurality of edge devices configured to receive the packet and to define the plurality of cells based on the packet.
16. The apparatus of claim 10, wherein:
the switch core is configured to send the plurality of cells associated with the packet from the input port to an output port of the plurality of output ports via the plurality of stages of the multi-stage switch fabric, without a need to perform packet-drop processing at at least one stage of the plurality of stages.
17. An apparatus, comprising:
a switch core, the switch core defining a single logical entity and having a switch fabric, the switch fabric having a plurality of stages physically distributed across a plurality of chassis, the switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
the switch core configured to receive a packet from an input port of the plurality of input ports, the switch core configured to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports via the plurality of stages, without a need to perform packet-drop processing at at least one stage of the plurality of stages of the switch fabric.
18. The apparatus of claim 17, wherein the switch fabric is configured as a deterministic network such that a packet is admitted from an input port of the plurality of input ports only when transmission of the plurality of cells associated with the packet through the switch fabric can be substantially guaranteed to be lossless.
19. The apparatus of claim 17, wherein:
the output port is a first output port,
the switch core is configured to send the plurality of cells associated with the packet from the input port to the first output port and a second output port of the plurality of output ports.
20. The apparatus of claim 17, wherein:
the switch core includes a plurality of edge devices coupled to the switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices coupled to the plurality of peripheral processors, each edge device in the plurality of edge devices configured to receive the packet and to define the plurality of cells based on the packet.
21. An apparatus, comprising:
a switch core, the switch core defining a single logical entity and having a multi-stage switch fabric configured as a deterministic network, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processors via the plurality of input ports and the plurality of output ports,
the switch core configured to receive a packet from an input port of the plurality of input ports, the switch core configured to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports.
22. The apparatus of claim 21, wherein the multi-stage switch fabric is physically distributed across a plurality of chassis.
23. The apparatus of claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted from the input port of the plurality of input ports only when transmission of the plurality of cells associated with the packet through the switch fabric can be substantially guaranteed to be lossless.
24. The apparatus of claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the switch core admits the packet from the input port of the plurality of input ports when the plurality of cells associated with the packet can be sent to an output port of the plurality of output ports within a predetermined time.
25. The apparatus of claim 21, wherein:
the switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices coupled to the plurality of peripheral processors, each edge device in the plurality of edge devices configured to receive the packet and to define the plurality of cells based on the packet.
26. The apparatus of claim 21, wherein:
the switch core is configured to send the plurality of cells associated with the packet from the input port to the output port via a plurality of stages of the multi-stage switch fabric, without a need to perform packet-drop processing at at least one stage of the plurality of stages.
27. An apparatus, comprising:
a switch core, the switch core having a multi-stage switch fabric physically distributed among a plurality of chassis, the multi-stage switch fabric having a plurality of input buffers and a plurality of output ports, the switch core configured to be coupled to a plurality of edge devices; and
a controller implemented in hardware that does not require software during operation and requires software during configuration and monitoring, the controller coupled to the plurality of input buffers and the plurality of output ports, the controller configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is predicted and before the congestion occurs in the switch core.
28. The apparatus of claim 27, wherein the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of intra-fabric flow control for the multi-stage switch fabric of the switch core.
29. The apparatus of claim 27, wherein the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
30. The apparatus of claim 27, further comprising:
a plurality of peripheral processors configured to be coupled to the plurality of edge devices,
the controller configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
31. The apparatus of claim 27, wherein the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
32. The apparatus of claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer independently of cell segments buffered at a stage of the multi-stage switch fabric and independently of packets buffered at an edge device of the plurality of edge devices.
33. The apparatus of claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer independently of a flow control mechanism associated with Ethernet.
34. An apparatus, comprising:
a switch core, the switch core having a multi-stage switch fabric physically distributed among a plurality of chassis, the multi-stage switch fabric configured to receive a plurality of cells associated with a packet and configured to switch a plurality of cell segments based on the plurality of cells;
a plurality of edge devices coupled to the switch core, an edge device of the plurality of edge devices configured to receive the packet, the edge device configured to send the plurality of cells to the multi-stage switch fabric; and
a controller coupled to the multi-stage switch fabric, the controller configured to perform flow control on the plurality of cells independently of flow control for the plurality of edge devices and independently of intra-fabric flow control for the multi-stage switch fabric.
35. The apparatus of claim 34, wherein:
the controller is implemented in hardware that does not require software during operation and requires software during configuration and monitoring.
36. The apparatus of claim 34, wherein:
the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports,
the controller is configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is predicted and before the congestion occurs in the switch core.
37. The apparatus of claim 34, wherein:
the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports,
the controller is configured to perform end-to-end flow control on cells buffered at an input buffer of the plurality of input buffers independently of a flow control mechanism associated with Ethernet.
38. An apparatus, comprising:
a switch core, the switch core having a multi-stage switch fabric;
a first plurality of peripheral processors coupled to the multi-stage switch fabric by a plurality of connections having a protocol, each peripheral processor in the first plurality of peripheral processors being a storage node having virtual resources, the virtual resources of the first plurality of peripheral processors collectively defining a virtual storage resource interconnected by the switch core; and
a second plurality of peripheral processors coupled to the multi-stage switch fabric by a plurality of connections having the protocol, each peripheral processor in the second plurality of peripheral processors being a compute node having virtual resources, the virtual resources of the second plurality of peripheral processors collectively defining a virtual compute resource interconnected by the switch core.
39. The apparatus of claim 38, wherein:
each peripheral processor in the first plurality of peripheral processors has virtual resources, and each peripheral processor in the first plurality of peripheral processors is configured such that its virtual resources can be substituted by virtual resources from the remaining peripheral processors of the first plurality of peripheral processors; and
each peripheral processor in the second plurality of peripheral processors has virtual resources, and each peripheral processor in the second plurality of peripheral processors is configured such that its virtual resources can be substituted by virtual resources from the remaining peripheral processors of the second plurality of peripheral processors.
40. The apparatus of claim 38, wherein:
the first plurality of peripheral processors is associated with a packet-based communication protocol and is associated with a security protocol; and
the second plurality of peripheral processors is associated with a packet-based communication protocol and is associated with a security protocol.
41. An apparatus, comprising:
a switch core, the switch core having a multi-stage switch fabric, the switch core configured to be logically partitioned into a first virtual switch core and a second virtual switch core; and
a plurality of peripheral processors coupled to the multi-stage switch fabric, the plurality of peripheral processors having a first subset of peripheral processors operatively coupled to the first virtual switch core and a second subset of peripheral processors operatively coupled to the second virtual switch core.
42. The apparatus of claim 41, wherein:
the switch core is configured such that the first virtual switch core and the second virtual switch core are administratively managed independently of each other.
43. The apparatus of claim 41, wherein:
the switch core is configured such that the first virtual switch core has a bandwidth independent of a bandwidth of the second virtual switch core.
44. The apparatus of claim 41, wherein:
the switch core is configured such that the first virtual switch core has a bandwidth and an administrative management independent of a bandwidth and an administrative management of the second virtual switch core.
45. The apparatus of claim 41, wherein:
the switch core is configured such that the first virtual switch core operates using a layer-2 protocol and the second virtual switch core operates using a layer-2 protocol and a layer-3 protocol.
46. The apparatus of claim 41, wherein:
the first subset of peripheral processors has virtual resources, and the second subset of peripheral processors has virtual resources.
47. The apparatus of claim 41, wherein:
the first subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device, or a router, and includes a peripheral processor that is another one of a compute node, a storage node, a service node device, or a router; and
the second subset of peripheral processors includes a peripheral processor that is one of a compute node, a storage node, a service node device, or a router, and includes a peripheral processor that is another one of a compute node, a storage node, a service node device, or a router.
CN201410138824.5A 2008-09-11 2009-09-11 System, method and equipment for data center Active CN103916326B (en)

Applications Claiming Priority (25)

Application Number Priority Date Filing Date Title
US9620908P 2008-09-11 2008-09-11
US61/096,209 2008-09-11
US9851608P 2008-09-19 2008-09-19
US61/098,516 2008-09-19
US12/242,230 US8218442B2 (en) 2008-09-11 2008-09-30 Methods and apparatus for flow-controllable multi-staged queues
US12/242,224 2008-09-30
US12/242,230 2008-09-30
US12/242,224 US8154996B2 (en) 2008-09-11 2008-09-30 Methods and apparatus for flow control associated with multi-staged queues
US12/343,728 2008-12-24
US12/343,728 US8325749B2 (en) 2008-12-24 2008-12-24 Methods and apparatus for transmission of groups of cells via a switch fabric
US12/345,500 US8804710B2 (en) 2008-12-29 2008-12-29 System architecture for a scalable and distributed multi-stage switch fabric
US12/345,500 2008-12-29
US12/345,502 US8804711B2 (en) 2008-12-29 2008-12-29 Methods and apparatus related to a modular switch architecture
US12/345,502 2008-12-29
US12/495,364 2009-06-30
US12/495,358 2009-06-30
US12/495,337 2009-06-30
US12/495,344 2009-06-30
US12/495,344 US20100061367A1 (en) 2008-09-11 2009-06-30 Methods and apparatus related to lossless operation within a data center
US12/495,337 US8730954B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to any-to-any connectivity within a data center
US12/495,364 US9847953B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to virtualization of data center resources
US12/495,361 US8755396B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to flow control within a data center switch fabric
US12/495,358 US8335213B2 (en) 2008-09-11 2009-06-30 Methods and apparatus related to low latency within a data center
US12/495,361 2009-06-30
CN200910246898.XA CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN200910246898.XA Division CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Publications (2)

Publication Number Publication Date
CN103916326A true CN103916326A (en) 2014-07-09
CN103916326B CN103916326B (en) 2017-10-31

Family

ID=43324725

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200910246898.XA Active CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre
CN201410138824.5A Active CN103916326B (en) 2008-09-11 2009-09-11 System, method and equipment for data center

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200910246898.XA Active CN101917331B (en) 2008-09-11 2009-09-11 Systems, methods, and apparatus for a data centre

Country Status (1)

Country Link
CN (2) CN101917331B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102420775A (en) * 2012-01-10 2012-04-18 西安电子科技大学 Routing method for module-expansion-based data center network topology system
US9064216B2 (en) * 2012-06-06 2015-06-23 Juniper Networks, Inc. Identifying likely faulty components in a distributed system
US8755377B2 (en) 2012-06-06 2014-06-17 Juniper Networks, Inc. Facilitating operation of one or more virtual networks
CN103023803B (en) * 2012-12-12 2015-05-20 华中科技大学 Method and system for optimizing virtual links of fiber channel over Ethernet
WO2014096970A2 (en) * 2012-12-20 2014-06-26 Marvell World Trade Ltd. Memory sharing in a network device
US9419892B2 (en) * 2013-09-30 2016-08-16 Juniper Networks, Inc. Methods and apparatus for implementing connectivity between edge devices via a switch fabric
US9787559B1 (en) 2014-03-28 2017-10-10 Juniper Networks, Inc. End-to-end monitoring of overlay networks providing virtualized network services
CN105099939A (en) * 2014-04-23 2015-11-25 株式会社日立制作所 Method and device for implementing flow control among different data centers
CN105577575B (en) * 2014-10-22 2019-09-17 深圳市中兴微电子技术有限公司 A kind of chainlink control method and device
CN105827544B (en) * 2016-03-14 2019-01-22 烽火通信科技股份有限公司 A kind of jamming control method and device for multistage CLOS system
CN107276908B (en) * 2016-04-07 2021-06-11 深圳市中兴微电子技术有限公司 Routing information processing method and packet switching equipment
US10243840B2 (en) * 2017-03-01 2019-03-26 Juniper Networks, Inc. Network interface card switching for virtual networks
US11323312B1 (en) 2020-11-25 2022-05-03 Juniper Networks, Inc. Software-defined network monitoring and fault localization
CN113595935A (en) * 2021-07-20 2021-11-02 锐捷网络股份有限公司 Data center switch architecture and data center
CN113961628B (en) * 2021-12-20 2022-03-22 广州市腾嘉自动化仪表有限公司 Distributed data analysis control system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7420969B2 (en) * 2000-11-29 2008-09-02 Rmi Corporation Network switch with a parallel shared memory

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1098236A (en) * 1993-05-05 1995-02-01 美国电话电报公司 Be used to support installation method away from the line group device of subscriber line equipment
US5945922A (en) * 1996-09-06 1999-08-31 Lucent Technologies Inc. Widesense nonblocking switching networks
CN1167417A (en) * 1997-03-27 1997-12-10 上海贝尔电话设备制造有限公司 S12 exchanger timing supply method and system thereof
EP1128585A2 (en) * 2000-02-21 2001-08-29 Nippon Telegraph and Telephone Corporation Node apparatus and optical wavelength division multiplexing network, and system switching method
US20020136484A1 (en) * 2001-02-05 2002-09-26 Macdonald Robert I. Optical switch matrix with failure protection
CN101132286A (en) * 2006-08-21 2008-02-27 丛林网络公司 Multi-chassis router with multiplexed optical interconnects

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107104871A (en) * 2016-02-22 2017-08-29 中兴通讯股份有限公司 Subnet interoperability methods and device
CN107104871B (en) * 2016-02-22 2021-11-19 中兴通讯股份有限公司 Subnet intercommunication method and device
CN113099488A (en) * 2019-12-23 2021-07-09 中国移动通信集团陕西有限公司 Method, device, computing equipment and computer storage medium for solving network congestion
CN113099488B (en) * 2019-12-23 2024-04-09 中国移动通信集团陕西有限公司 Method, device, computing equipment and computer storage medium for solving network congestion
CN113630809A (en) * 2021-08-12 2021-11-09 迈普通信技术股份有限公司 Service forwarding method, device and computer readable storage medium
CN115225589A (en) * 2022-07-17 2022-10-21 奕德(广州)科技有限公司 CrossPoint switching method based on virtual packet switching

Also Published As

Publication number Publication date
CN101917331A (en) 2010-12-15
CN103916326B (en) 2017-10-31
CN101917331B (en) 2014-05-07

Similar Documents

Publication Publication Date Title
CN101917331B (en) Systems, methods, and apparatus for a data centre
US11451491B2 (en) Methods and apparatus related to virtualization of data center resources
US10454849B2 (en) Methods and apparatus related to a flexible data center security architecture
US8730954B2 (en) Methods and apparatus related to any-to-any connectivity within a data center
US8335213B2 (en) Methods and apparatus related to low latency within a data center
US8340088B2 (en) Methods and apparatus related to a low cost data center architecture
US8755396B2 (en) Methods and apparatus related to flow control within a data center switch fabric
CN105323185B (en) Method and apparatus for flow control relevant to switch architecture
US20100061367A1 (en) Methods and apparatus related to lossless operation within a data center
CN105721358B (en) The method and apparatus in multi-hop distributed controll face and single-hop data surface switching fabric system
CN1969509B (en) Network device architecture for centralized packet processing
CN103534989B (en) Flow control based on priority in distributed frame agreement (DFP) exchange network framework
EP2680513B1 (en) Methods and apparatus for providing services in a distributed switch
EP2557742A1 (en) Systems, methods, and apparatus for a data centre
CN102546384B (en) Dynamic resource management method
MXPA04000969A (en) Scalable switching system with intelligent control.
EP2680536B1 (en) Methods and apparatus for providing services in a distributed switch
US9172645B1 (en) Methods and apparatus for destination based hybrid load balancing within a switch fabric
CN102546742A (en) Methods and apparatus for managing next hop identifiers in a distributed switch fabric system
US20220150185A1 (en) Methods and apparatus related to a flexible data center security architecture
Wang et al. Improved power of two choices for fat-tree routing
Yuan A Novel Architecture, Topology, and Flow Control for Data Center Networks
Robles-Gomez et al. Evaluation of a Fabric Management Mechanism for Advanced Switching in Presence of Traffic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant