CN103916326B - System, method and equipment for data center - Google Patents
- Publication number: CN103916326B (application number CN201410138824.5A)
- Authority: CN (China)
- Prior art keywords: module, queue, peripheral processor, packet, switch core
- Legal status: Active
- Landscapes: Data Exchanges In Wide-Area Networks (AREA)
Abstract
A system, method and apparatus for a data center. In an embodiment of the disclosure, an apparatus includes a first edge device that can have a packet processing module. The first edge device is configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device has a packet processing module configured to reassemble the packet from the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity. The multi-stage switch fabric can have multiple switch modules, each of which has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells so that they are sent to the second edge device.
Description
This application is a divisional application of Chinese patent application No. 200910246898.X, filed on September 11, 2009, and entitled "System, method and equipment for data center".
Cross-reference to related applications
This patent application claims priority to and the benefit of U.S. Patent Application No. 61/098516, entitled "Systems, Apparatus and Methods for a Data Centre" and filed on September 19, 2008, and of U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control Within a Data Centre" and filed on September 11, 2008; both are incorporated herein by reference in their entireties.
This patent application is a continuation-in-part of U.S. Patent Application No. 12/343728, entitled "Methods and Apparatus for Transmission of Groups of Cells via a Switch Fabric" and filed on December 24, 2008; a continuation-in-part of U.S. Patent Application No. 12/345500, entitled "System Architecture for a Scalable and Distributed Multi-Stage Switch Fabric" and filed on December 29, 2008; a continuation-in-part of U.S. Patent Application No. 12/345502, entitled "Methods and Apparatus Related to a Modular Switch Architecture" and filed on December 29, 2008; a continuation-in-part of U.S. Patent Application No. 12/242224, entitled "Methods and Apparatus for Flow Control Associated with Multi-Stage Queue" and filed on September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control within a Data Center" and filed on September 11, 2008; and a continuation-in-part of U.S. Patent Application No. 12/242230, entitled "Methods and Apparatus for Flow-Controllable Multi-Staged Queues" and filed on September 30, 2008, which claims priority to and the benefit of U.S. Patent Application No. 61/096209, entitled "Methods and Apparatus Related to Flow Control within a Data Centre" and filed on September 11, 2008. Each of the above-referenced applications is incorporated herein by reference in its entirety.
This patent application is also a continuation-in-part of U.S. Patent Application No. 12/495337, entitled "Methods and Apparatus Related to Any-to-Any Connectivity within a Data Centre" and filed on June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495344, entitled "Methods and Apparatus Related to Lossless Operation within a Data Centre" and filed on June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495358, entitled "Methods and Apparatus Related to Low Latency within a Data Centre" and filed on June 30, 2009; a continuation-in-part of U.S. Patent Application No. 12/495361, entitled "Methods and Apparatus Related to Flow Control within a Data Centre Switch Fabric" and filed on June 30, 2009; and a continuation-in-part of U.S. Patent Application No. 12/495364, entitled "Methods and Apparatus Related to Virtualization of Data Centre Resources" and filed on June 30, 2009. Each of the above-referenced applications is incorporated herein by reference in its entirety.
Technical field
Embodiments relate generally to data center equipment and, more particularly, to architectures, apparatus and methods for data center systems having a switch core and edge devices.
Background
Known architectures for data center systems rely on overly cumbersome and complex approaches that increase the cost and latency of such systems. For example, some known data center networks consist of three or more switching layers, each of which performs Ethernet and/or Internet Protocol (IP) packet processing. Packet processing and queuing overhead is unnecessarily repeated at each layer, directly increasing cost and end-to-end latency. Similarly, such known data center networks typically do not scale cost-effectively: for a given data center system, an increase in the number of servers usually requires additional ports, which in turn adds more devices at each layer of the data center system. Such poor scalability increases the cost of these data center systems.
Accordingly, a need exists for data center systems with improved architectures, apparatus and methods.
Summary of the invention
In one embodiment, a communication apparatus includes a first edge device that can have a packet processing module. The first edge device can be configured to receive a packet. The packet processing module of the first edge device can be configured to produce multiple cells based on the packet. A second edge device can have a packet processing module configured to reassemble the packet based on the multiple cells. A multi-stage switch fabric can be coupled to the first edge device and the second edge device. The multi-stage switch fabric can define a single logical entity and can have multiple switch modules, each of which has a shared memory device. The multi-stage switch fabric can be configured to switch the multiple cells so that the multiple cells are sent to the second edge device.
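The segmentation and reassembly performed by the two edge devices can be illustrated with a minimal Python sketch. The cell size, the `Cell` fields and the function names below are hypothetical choices for illustration; this section does not specify a cell format. Reassembly sorts by sequence number so that cells arriving out of order through the fabric still yield the original packet.

```python
from dataclasses import dataclass
from typing import List

CELL_PAYLOAD_BYTES = 64  # hypothetical cell payload size; not specified in this section


@dataclass(frozen=True)
class Cell:
    packet_id: int  # hypothetical field identifying the source packet
    seq: int        # position of this cell within the packet
    last: bool      # True for the final cell of the packet
    payload: bytes


def segment(packet_id: int, packet: bytes, size: int = CELL_PAYLOAD_BYTES) -> List[Cell]:
    """First edge device: split a packet into fixed-size cells (the last one may be short)."""
    chunks = [packet[i:i + size] for i in range(0, len(packet), size)] or [b""]
    return [
        Cell(packet_id, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(cells: List[Cell]) -> bytes:
    """Second edge device: rebuild the packet, tolerating out-of-order arrival."""
    ordered = sorted(cells, key=lambda c: c.seq)
    contiguous = [c.seq for c in ordered] == list(range(len(ordered)))
    if not ordered or not ordered[-1].last or not contiguous:
        raise ValueError("incomplete cell set")
    return b"".join(c.payload for c in ordered)
```

For example, a 200-byte packet becomes four cells (64 + 64 + 64 + 8 bytes), and `reassemble` recovers it even if the cells arrive in reverse order.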
According to one aspect of the embodiments of the disclosure, a communication apparatus is provided, including a switch core that defines a single logical entity and has a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports. The switch core is configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports, and is configured to provide non-blocking connectivity at line rate between a first peripheral processing device disposed in a first chassis and a second peripheral processing device disposed in a second chassis.
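This section does not state the fabric's topology. Purely as an illustrative assumption, if the multi-stage switch fabric were a symmetric three-stage Clos network with n ports on each first-stage switch and m middle-stage switches, the classic circuit-switching conditions below would indicate whether an idle input can always reach an idle output. A cell-based fabric may approach line-rate non-blocking behavior by other means, so this is only a sketch of the topology reasoning; the function name is hypothetical.

```python
def clos_blocking_class(n: int, m: int) -> str:
    """Classify a symmetric 3-stage Clos network by its middle-stage width.

    n: input ports per first-stage switch
    m: number of middle-stage switches
    The number of first-stage switches does not affect the classification.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if m >= 2 * n - 1:
        return "strictly non-blocking"       # Clos's condition: no rearrangement ever needed
    if m >= n:
        return "rearrangeably non-blocking"  # Slepian-Duguid: existing paths may need to move
    return "blocking"
```

With 8 ports per first-stage switch, 15 middle-stage switches suffice for strictly non-blocking operation, while 8 give only rearrangeable non-blocking.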
According to one embodiment of the disclosure, the plurality of peripheral processing devices includes at least one peripheral processing device having virtualized resources and at least one peripheral processing device without virtualized resources.
According to one embodiment of the disclosure, the number of the plurality of input ports and the plurality of output ports is greater than 1000, and each input port of the plurality of input ports and each output port of the plurality of output ports is configured to operate at a rate of not less than 10 Gb/s.
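As a rough, illustrative reading of this embodiment (assuming the figure of more than 1000 applies to the input ports alone, and using symbols introduced here rather than taken from the patent), the port count and per-port rate bound the aggregate input capacity that the fabric would need to carry at line rate:

$$
N_{\mathrm{in}} > 1000,\quad R_{\mathrm{port}} \ge 10\ \mathrm{Gb/s}
\;\Longrightarrow\;
B_{\mathrm{agg}} = N_{\mathrm{in}}\, R_{\mathrm{port}} > 1000 \times 10\ \mathrm{Gb/s} = 10\ \mathrm{Tb/s}
$$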
According to one embodiment of the disclosure, each of the first peripheral processing device and the second peripheral processing device is one of a storage node device, a compute node device, a service node device, or a router.
According to one embodiment of the disclosure, the plurality of peripheral processing devices includes a third peripheral processing device, and the switch core is configured to provide non-blocking connectivity at line rate between the second peripheral processing device and the third peripheral processing device. The switch core is configured to receive a first packet associated with the first peripheral processing device and, based on cells associated with the first packet, to sequentially send a second packet to the second peripheral processing device and a third packet to the third peripheral processing device. The multi-stage switch fabric is configured to send the cells from an input port of the plurality of input ports to an output port of the plurality of output ports.
According to one embodiment of the disclosure, each of the first peripheral processing device and the third peripheral processing device is one of a storage node device, a compute node device, a service node device, or a router, and the second peripheral processing device is at least one of a firewall device, an intrusion detection device, or a load balancing device.
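The sequential delivery described in the two preceding embodiments resembles a service chain: traffic from the first device passes through an intermediate service device (such as a firewall) before reaching its destination. The sketch below models each pass through the switch core as a function call; `run_chain`, `firewall` and its drop rule are hypothetical stand-ins for real devices, not the patent's mechanism.

```python
from typing import Callable, List, Optional

# Hypothetical service device: returns the (possibly modified) packet, or None to drop it.
Service = Callable[[bytes], Optional[bytes]]


def firewall(packet: bytes) -> Optional[bytes]:
    # Toy rule for illustration only: drop packets that start with b"BAD".
    return None if packet.startswith(b"BAD") else packet


def run_chain(packet: bytes, chain: List[Service], deliver: Callable[[bytes], None]) -> bool:
    """Send a packet through each intermediate device in order, then to its final destination.

    Each loop iteration stands in for one trip through the switch core: the core sends
    the packet to the next device, and that device's output is the next packet sent.
    """
    for service in chain:
        result = service(packet)
        if result is None:
            return False  # dropped by an intermediate device (e.g., the firewall)
        packet = result
    deliver(packet)       # final hop, e.g., to a compute or storage node
    return True
```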
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including a switch core having a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports. The switch core is configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports, and is configured to provide, at line rate, each peripheral processing device of the plurality of peripheral processing devices with connectivity to each remaining peripheral processing device of the plurality, such that each output port of the plurality of output ports is equally accessible to each peripheral processing device via an input port of the plurality of input ports.
According to one embodiment of the disclosure, the plurality of peripheral processing devices includes at least one peripheral processing device coupled to the switch core via an Ethernet connection and at least one peripheral processing device coupled to the switch core via a non-Ethernet connection.
According to one embodiment of the disclosure, the plurality of peripheral processing devices includes at least one peripheral processing device that uses layer-3 routing and at least one layer-4 to layer-7 peripheral processing device.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including a switch core that defines a single logical entity and has a multi-stage switch fabric whose multiple stages are physically distributed across multiple chassis, the multiple stages collectively having a plurality of input ports and a plurality of output ports. The switch core is configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports, and is configured to allow a plurality of cells associated with a packet to enter an input port of the plurality of input ports when transmission of the plurality of cells through the multi-stage switch fabric without loss can be substantially ensured.
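One way to "substantially ensure" lossless transmission before admitting cells is credit-based admission, shown below as a minimal sketch. The credit mechanism and the `CreditGate` class are assumptions for illustration; this section states only the admission condition, not how it is implemented.

```python
class CreditGate:
    """Admit a packet's cells into the fabric only if every cell has a reserved downstream slot.

    Hypothetical model: `credits` counts free cell buffers reserved along the path to the
    output port. Admission is all-or-nothing, so an admitted packet is never partially dropped.
    """

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def try_admit(self, num_cells: int) -> bool:
        if num_cells <= self.credits:
            self.credits -= num_cells  # reserve buffer space before any cell enters
            return True
        return False                   # hold the packet at the edge instead of dropping cells

    def release(self, num_cells: int) -> None:
        self.credits += num_cells      # credits return as cells drain at the output port
```

The design choice here is to make the edge wait rather than let the fabric discard: a packet that cannot be fully accommodated is simply not admitted yet.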
According to one embodiment of the disclosure, the plurality of peripheral processing devices includes a first peripheral processing device configured to communicate using the Fibre Channel protocol and a second peripheral processing device configured to communicate using the Fibre Channel over Ethernet protocol.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the multi-stage switch fabric allows the packet to enter the input port when the plurality of cells can be sent to an output port of the plurality of output ports within a predetermined time.
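A deterministic admission test can compare a worst-case delivery-time bound with the predetermined time. The latency model below (cells already queued are serialized ahead of the new ones, plus a fixed transit delay per stage) and all parameter names are hypothetical simplifications, not the patent's method.

```python
def can_deliver_in_time(num_cells: int, cell_time_ns: float, stages: int,
                        per_stage_latency_ns: float, queued_cells: int,
                        deadline_ns: float) -> bool:
    """Admit only if a worst-case delivery-time bound for all cells meets the deadline."""
    # Our cells wait behind the cells already queued, then are serialized one per cell time.
    serialization = (queued_cells + num_cells) * cell_time_ns
    # Assumed fixed traversal delay through each stage of the multi-stage fabric.
    transit = stages * per_stage_latency_ns
    return serialization + transit <= deadline_ns
```

For instance, 4 cells at 50 ns each through 3 stages of 100 ns meet a 600 ns deadline on an empty queue (500 ns), but not behind 10 queued cells (1000 ns).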
According to one embodiment of the disclosure, the switch core is configured to send a plurality of cells associated with the packet from the input port to a first output port and a second output port of the plurality of output ports without performing packet-drop processing at at least one stage of the multiple stages of the multi-stage switch fabric.
According to one embodiment of the disclosure, the switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device of the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
According to one embodiment of the disclosure, the switch core is configured to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports via the multiple stages of the multi-stage switch fabric, without performing packet-drop processing at at least one stage of the multiple stages.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including a switch core that defines a single logical entity and has a switch fabric whose multiple stages are physically distributed across multiple chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports. The switch core is configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports, to receive a packet from an input port of the plurality of input ports, and to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports via the multiple stages, without performing packet-drop processing at at least one stage of the multiple stages of the switch fabric.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the packet is allowed to enter an input port of the plurality of input ports only when lossless transmission of the plurality of cells associated with the packet through the switch fabric can be substantially ensured.
According to one embodiment of the disclosure, the output port is a first output port, and the switch core is configured to send the plurality of cells associated with the packet from the input port to the first output port and to a second output port of the plurality of output ports.
According to one embodiment of the disclosure, the switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device of the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including a switch core that defines a single logical entity and has a multi-stage switch fabric configured as a deterministic network, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports. The switch core is configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports, to receive a packet from an input port of the plurality of input ports, and to send a plurality of cells associated with the packet from the input port to an output port of the plurality of output ports.
According to one embodiment of the disclosure, the multi-stage switch fabric is physically distributed across multiple chassis.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the packet is allowed to enter an input port of the plurality of input ports only when lossless transmission of the plurality of cells associated with the packet through the switch fabric can be substantially ensured.
According to one embodiment of the disclosure, the multi-stage switch fabric is configured as a deterministic network, such that the switch core allows the packet to enter an input port of the plurality of input ports when the plurality of cells associated with the packet can be sent to an output port of the plurality of output ports within a predetermined time.
According to one embodiment of the disclosure, the switch core includes a plurality of edge devices coupled to the multi-stage switch fabric via the plurality of input ports and the plurality of output ports, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device of the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
According to one embodiment of the disclosure, the switch core is configured to send the plurality of cells associated with the packet from the input port to the output port via the multiple stages of the multi-stage switch fabric, without performing packet-drop processing at at least one stage of the multiple stages.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including: a switch core having a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric having a plurality of input buffers and a plurality of output ports, and the switch core being configured to be coupled to a plurality of edge devices; and a controller that is implemented in hardware without requiring software during operation, and that requires software during configuration and monitoring. The controller is coupled to the plurality of input buffers and the plurality of output ports and is configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is anticipated and before congestion occurs in the switch core.
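Anticipating congestion, rather than reacting to it, can be sketched with a high watermark set below an output queue's capacity, plus a low watermark for resuming. The watermark fractions, the `XOFF`/`XON` signal names and the `OutputPortMonitor` class are illustrative assumptions; the controller described above is a hardware implementation, and this Python sketch only models its decision logic.

```python
from typing import Dict, Set


class OutputPortMonitor:
    """Hypothetical decision logic for one output port.

    Sends XOFF to the feeding input buffers when the port's queue crosses a high watermark
    set below its capacity (congestion anticipated, not yet occurred), and XON once the
    queue drains below a low watermark. The gap between the two avoids rapid toggling.
    """

    def __init__(self, capacity: int, high: float = 0.75, low: float = 0.25) -> None:
        self.high = int(capacity * high)
        self.low = int(capacity * low)
        self.paused: Set[str] = set()  # input buffers currently told to stop sending

    def update(self, depth: int, sources: Set[str]) -> Dict[str, str]:
        """Return the flow-control signals to send, keyed by input-buffer id."""
        signals: Dict[str, str] = {}
        if depth >= self.high:
            for src in sources - self.paused:
                signals[src] = "XOFF"
            self.paused |= sources
        elif depth <= self.low:
            for src in self.paused:
                signals[src] = "XON"
            self.paused.clear()
        return signals
```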
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of intra-fabric flow control for the multi-stage switch fabric of the switch core.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
According to one embodiment of the disclosure, the communication apparatus further includes a plurality of peripheral processing devices configured to be coupled to the plurality of edge devices, and the controller is configured to perform end-to-end flow control on the input buffer and the output port independently of flow control for the plurality of edge devices.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
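The buffering period tied to end-to-end flow control can be modeled as a pause deadline on the input buffer, as in the minimal sketch below. Expressing the flow-control signal as a pause duration is an assumption, as are the class and method names.

```python
from collections import deque
from typing import Deque, List


class InputBuffer:
    """Hypothetical input buffer that holds cells while a flow-control pause is in effect."""

    def __init__(self) -> None:
        self.cells: Deque[str] = deque()
        self.paused_until = 0.0  # time before which no cell may leave

    def on_flow_control(self, now: float, pause_duration: float) -> None:
        # Extend (never shorten) the pause requested by the end-to-end controller.
        self.paused_until = max(self.paused_until, now + pause_duration)

    def dequeue_ready(self, now: float, budget: int) -> List[str]:
        """Release up to `budget` cells toward the output port once the pause has expired."""
        if now < self.paused_until:
            return []
        out: List[str] = []
        while self.cells and len(out) < budget:
            out.append(self.cells.popleft())
        return out
```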
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on cells buffered at the input buffer independently of buffering cells at a stage of the multi-stage switch fabric and independently of buffering packets at an edge device of the plurality of edge devices.
According to one embodiment of the disclosure, the controller is configured to perform end-to-end flow control on the cells buffered at the input buffer independently of flow control mechanisms associated with Ethernet.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including: a switch core having a multi-stage switch fabric physically distributed across multiple chassis, the multi-stage switch fabric being configured to receive a plurality of cells associated with a packet and to switch the plurality of cells; an edge device from a plurality of edge devices coupled to the switch core, the edge device being configured to receive the packet and to send the plurality of cells to the multi-stage switch fabric; and a controller coupled to the multi-stage switch fabric, the controller being configured to control the flow of the plurality of cells independently of flow control for the plurality of edge devices and of intra-fabric flow control for the multi-stage switch fabric.
According to one embodiment of the disclosure, the controller is implemented in hardware without requiring software during operation, and requires software during configuration and monitoring.
According to one embodiment of the disclosure, the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports, and the controller is configured to send a flow control signal to an input buffer of the plurality of input buffers when congestion at an output port of the plurality of output ports is anticipated and before congestion occurs in the switch core.
According to one embodiment of the disclosure, the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports, and the controller is configured to perform end-to-end flow control on cells buffered at an input buffer of the plurality of input buffers independently of flow control mechanisms associated with Ethernet.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including: a switch core having a multi-stage switch fabric; a first plurality of peripheral processing devices coupled to the multi-stage switch fabric via a plurality of connections having a protocol, each peripheral processing device of the first plurality being a storage node having virtualized resources, the virtualized resources of the first plurality of peripheral processing devices collectively defining a virtual storage resource interconnected by the switch core; and a second plurality of peripheral processing devices coupled to the multi-stage switch fabric via a plurality of connections having a protocol, each peripheral processing device of the second plurality being a compute node having virtualized resources, the virtualized resources of the second plurality of peripheral processing devices collectively defining a virtual compute resource interconnected by the switch core.
According to one embodiment of the disclosure, each peripheral processing device of the first plurality has virtualized resources and is configured such that its virtualized resources can be substituted by virtual resources of the remaining peripheral processing devices of the first plurality; and each peripheral processing device of the second plurality has virtualized resources and is configured such that its virtualized resources can be substituted by virtual resources of the remaining peripheral processing devices of the second plurality.
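Substituting one device's virtualized resources with those of the remaining devices in the same group can be sketched as a re-placement step triggered when a device leaves the pool. The least-loaded placement policy and the `VirtualPool` class are hypothetical; this section does not specify how substitution is performed.

```python
from typing import Dict, List


class VirtualPool:
    """Hypothetical pool of virtual resources hosted on the peripheral devices of one group.

    If a device fails, each of its virtual resources is re-hosted on the least-loaded
    remaining device of the same group, so the group's collective resource survives.
    """

    def __init__(self, devices: List[str]) -> None:
        self.hosting: Dict[str, List[str]] = {d: [] for d in devices}

    def place(self, resource: str) -> str:
        # On ties, min() returns the first device in insertion order.
        device = min(self.hosting, key=lambda d: len(self.hosting[d]))
        self.hosting[device].append(resource)
        return device

    def fail(self, device: str) -> Dict[str, str]:
        """Remove a device and return the new host of each displaced resource."""
        displaced = self.hosting.pop(device)
        if displaced and not self.hosting:
            raise RuntimeError("no remaining device in the group")
        return {r: self.place(r) for r in displaced}
```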
According to one embodiment of the disclosure, the first plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol, and the second plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol.
According to another aspect of the embodiments of the disclosure, a communication apparatus is provided, including: a switch core having a multi-stage switch fabric, the switch core being configured to be logically partitioned into a first virtual switch core and a second virtual switch core; and a plurality of peripheral processing devices coupled to the multi-stage switch fabric, the plurality of peripheral processing devices having a first subset operatively coupled to the first virtual switch core and a second subset operatively coupled to the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core and the second virtual switch core are administratively managed independently of each other.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core has a bandwidth independent of the bandwidth of the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core has a bandwidth and administrative management independent of the bandwidth and administrative management of the second virtual switch core.
According to one embodiment of the disclosure, the switch core is configured such that the first virtual switch core operates using a layer-2 protocol, and the second virtual switch core operates using a layer-2 protocol and a layer-3 protocol.
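The partitioning of one switch core into virtual switch cores with independent bandwidth and protocol sets can be sketched as a port-ownership table. The class names, the bandwidth field and the string protocol labels are illustrative assumptions; the sketch checks only isolation between partitions and protocol membership, and does not enforce bandwidth.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VirtualSwitchCore:
    name: str
    bandwidth_gbps: float  # allotted independently of other partitions (recorded, not enforced)
    protocols: Set[str]    # e.g., {"L2"} or {"L2", "L3"}
    ports: Set[int] = field(default_factory=set)


class PartitionedSwitchCore:
    """Hypothetical model of one physical switch core logically split into virtual cores."""

    def __init__(self) -> None:
        self.cores: Dict[str, VirtualSwitchCore] = {}
        self.port_owner: Dict[int, str] = {}

    def add_core(self, core: VirtualSwitchCore) -> None:
        # Validate first so a rejected core leaves the existing partitions untouched.
        for p in core.ports:
            if p in self.port_owner:
                raise ValueError(f"port {p} already belongs to {self.port_owner[p]}")
        self.cores[core.name] = core
        for p in core.ports:
            self.port_owner[p] = core.name

    def may_forward(self, src_port: int, dst_port: int, layer: str) -> bool:
        """Forward only within one virtual core, and only for a protocol layer it runs."""
        owner = self.port_owner.get(src_port)
        return (owner is not None
                and owner == self.port_owner.get(dst_port)
                and layer in self.cores[owner].protocols)
```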
According to one embodiment of the disclosure, the first subset of peripheral processing devices has virtual resources, and the second subset of peripheral processing devices has virtual resources.
According to one embodiment of the disclosure, the first subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node device and a router, and a peripheral processing device that is a remaining one of the compute node, the storage node, the service node device and the router; and the second subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node device and a router, and a peripheral processing device that is a remaining one of the compute node, the storage node, the service node device and the router.
Brief description of the drawings
Fig. 1 is a system block diagram of a data center (DC) according to one embodiment.
Fig. 2 is a schematic diagram illustrating an example of a portion of a data center having any-to-any connectivity according to one embodiment.
Fig. 3 is a schematic diagram illustrating logical groups of resources associated with a data center according to one embodiment.
Fig. 4A is a schematic diagram illustrating a switch fabric that can be included in a switch core according to one embodiment.
Fig. 4B is a schematic diagram illustrating a switch table that can be stored in the memory module shown in Fig. 4A according to one embodiment.
Fig. 5A is a schematic diagram illustrating a switch fabric system according to one embodiment.
Fig. 5B is a schematic diagram illustrating an input/output module according to one embodiment.
Fig. 6 is a schematic diagram illustrating a portion of the switch fabric system of Fig. 5A according to one embodiment.
Fig. 7 is a schematic diagram illustrating a portion of the switch fabric system of Fig. 5A according to one embodiment.
Figs. 8 and 9 respectively illustrate front and rear views of a housing that houses a switch fabric according to one embodiment.
Fig. 10 illustrates a portion of the housing of Fig. 8 according to one embodiment.
Figs. 11 and 12 are schematic diagrams illustrating a switch fabric in a first configuration and a second configuration, respectively, according to another embodiment.
Fig. 13 is a schematic diagram illustrating data flow associated with a switch fabric according to one embodiment.
Fig. 14 is a schematic diagram illustrating flow control within the switch fabric shown in Fig. 13 according to one embodiment.
Fig. 15 is a schematic diagram illustrating a buffer module according to one embodiment.
Fig. 16A is a schematic block diagram of an ingress schedule module and an egress schedule module configured to coordinate transmission of groups of cells via a switch fabric of a switch core according to one embodiment.
Fig. 16B is a signaling flow diagram illustrating signaling related to transmission of a group of cells according to one embodiment.
Fig. 17 is a schematic block diagram illustrating two groups of cells queued at an ingress queue disposed on an ingress side of a switch fabric according to one embodiment.
Fig. 18 is a schematic block diagram illustrating two groups of cells queued at an ingress queue disposed on an ingress side of a switch fabric according to another embodiment.
Fig. 19 is a flowchart illustrating a method for scheduling transmission of a group of cells via a switch fabric according to one embodiment.
Fig. 20 is a signaling flow diagram illustrating processing of request sequence values associated with transmission requests according to one embodiment.
Fig. 21 is a signaling flow diagram illustrating response sequence values associated with transmission responses according to one embodiment.
Fig. 22 is a schematic block diagram illustrating multi-stage flow-controllable queues according to one embodiment.
Fig. 23 is a schematic block diagram illustrating multi-stage flow-controllable queues according to one embodiment.
Fig. 24 is a schematic block diagram illustrating a destination control module configured to define a flow control signal associated with multiple receive queues according to one embodiment.
Fig. 25 is a schematic diagram illustrating a flow control packet according to one embodiment.
Detailed Description of Embodiments
Fig. 1 is a schematic diagram showing a data center (DC) 100 (e.g., a super data center, an idealized data center), according to one embodiment. Data center 100 includes a switch core (SC) 180 operably connected to four types of peripheral processing devices 170: compute nodes 110, service nodes 120, routers 130, and storage nodes 140. In this embodiment, a data center management (DCM) module 190 is configured to control (e.g., manage) the operation of data center 100. In some embodiments, data center 100 can be referred to as a data center. In some embodiments, a peripheral processing device can include one or more virtual resources, such as virtual machines.
Each peripheral processing device 170 is configured to communicate via switch core 180 of data center 100. Specifically, switch core 180 is configured to provide any-to-any connectivity with relatively low latency between peripheral processing devices 170. For example, switch core 180 can be configured to transmit (e.g., convey) data between one or more compute nodes 110 and one or more storage nodes 140. In some embodiments, switch core 180 can have at least hundreds or thousands of ports (e.g., egress ports and/or ingress ports) through which peripheral processing devices 170 can send and/or receive data. Peripheral processing devices 170 can include one or more network interface devices (e.g., a network interface card (NIC), a 10 Gigabit (Gb) Ethernet converged network adapter (CNA) device) through which peripheral processing devices 170 can send signals to and/or receive signals from switch core 180. The signals can be sent to and/or received from switch core 180 via physical links and/or wireless links operably coupled to peripheral processing devices 170. In some embodiments, peripheral processing devices 170 can be configured to send data to and/or receive data from switch core 180 based on one or more protocols (e.g., an Ethernet protocol, a multiprotocol label switching (MPLS) protocol, a Fibre Channel protocol, a Fibre-Channel-over-Ethernet protocol, an Infiniband-related protocol).
In some embodiments, switch core 180 can be (e.g., can function as) a single consolidated switch (e.g., a single large-scale consolidated L2/L3 switch). In other words, switch core 180 can be configured to operate as a single logical entity (e.g., a single logical network element), as opposed to, for example, a collection of distinct network elements configured to communicate with each other via Ethernet connections. Switch core 180 can be configured to connect (e.g., facilitate communication between) compute nodes 110, storage nodes 140, service nodes 120, and/or routers 130 within data center 100. In some embodiments, switch core 180 can be configured to communicate via interface devices configured to transmit data at a rate of at least 10 Gb/s. In some embodiments, switch core 180 can be configured to communicate via interface devices (e.g., Fibre Channel interface devices) configured to transmit data at link rates of, for example, 2 Gb/s, 4 Gb/s, 8 Gb/s, 10 Gb/s, 40 Gb/s, 100 Gb/s, and/or faster.
Although switch core 180 can be logically centralized, its implementation can be highly distributed, for example, for reliability. For example, portions of switch core 180 can be physically distributed across, for example, multiple chassis. In some embodiments, for example, one processing stage of switch core 180 can be included in a first chassis and another processing stage of switch core 180 can be included in a second chassis. Both processing stages can logically function as part of a single consolidated switch. More details related to the architecture of switch core 180 are described in connection with Figs. 4 through 13.
As shown in Fig. 1, switch core 180 includes an edge portion 185 and a switching fabric 187. Edge portion 185 can include edge devices (not shown) that can function as gateway devices between switching fabric 187 and peripheral processing devices 170. In some embodiments, the edge devices within edge portion 185 can collectively have thousands of ports (e.g., 100,000 ports, 500,000 ports) through which data from peripheral processing devices 170 can be transmitted (e.g., routed) into and/or out of one or more portions of switch core 180. In some embodiments, the edge devices can be referred to as access switches, network devices, and/or input/output modules (shown, for example, in Fig. 5A and Fig. 5B). In some embodiments, an edge device can be included in, for example, a top-of-rack (TOR) device of a rack.
Data can be processed based on different platforms at peripheral processing devices 170, at switch core 180, at switching fabric 187 of switch core 180, and/or at edge portion 185 of switch core 180 (e.g., at the edge devices included in edge portion 185). For example, communication between one or more peripheral processing devices 170 and the edge devices of edge portion 185 can be a stream of data packets defined based on an Ethernet protocol or a non-Ethernet protocol. In some embodiments, various types of data processing can be performed at the edge devices in edge portion 185 rather than within switching fabric 187 of switch core 180. For example, data packets can be parsed into cells at the edge devices of edge portion 185, and the cells can be transmitted from the edge devices to switching fabric 187. The cells can in turn be parsed into segments and transmitted within switching fabric 187 as segments (also referred to as flits in some embodiments). In some embodiments, data packets can be parsed into cells at a portion of switching fabric 187. In some embodiments, a congestion resolution scheme and/or scheduling of the transmission of data (e.g., cells) via switching fabric 187 can be performed at the edge devices (e.g., access switches) within edge portion 185 of switch core 180. A congestion resolution scheme and/or data transmission scheduling, however, need not be performed within the modules that define switching fabric 187. More details related to packet, cell, and/or segment processing within components of the data center are described below. For example, more details related to cell processing are described in connection with at least Figs. 16A through 21.
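The division of labor described above, where packets are parsed into cells at the ingress edge and rebuilt at the egress edge, can be illustrated with a minimal Python sketch. It assumes a hypothetical fixed cell payload of 64 bytes and a hypothetical cell header (`packet_id`, `seq`, `last`, `dest_edge`); this description specifies neither the cell size nor the cell format, and the sketch ignores the further split of cells into segments (flits) inside the fabric.

```python
from dataclasses import dataclass
from typing import List

CELL_PAYLOAD_BYTES = 64  # hypothetical cell payload size; not specified in the text


@dataclass(frozen=True)
class Cell:
    packet_id: int   # identifies the packet the cell was parsed from
    seq: int         # position of the cell within that packet
    last: bool       # True only for the final cell of the packet
    dest_edge: str   # destination edge device, chosen at the ingress edge
    payload: bytes


def segment(packet: bytes, packet_id: int, dest_edge: str) -> List[Cell]:
    """Parse a packet into fixed-size cells at the ingress edge device."""
    chunks = [packet[i:i + CELL_PAYLOAD_BYTES]
              for i in range(0, len(packet), CELL_PAYLOAD_BYTES)] or [b""]
    return [Cell(packet_id, i, i == len(chunks) - 1, dest_edge, chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(cells: List[Cell]) -> bytes:
    """Rebuild the packet at the egress edge device, tolerating reordering."""
    ordered = sorted(cells, key=lambda c: c.seq)
    complete = (bool(ordered) and ordered[-1].last
                and [c.seq for c in ordered] == list(range(len(ordered))))
    if not complete:
        raise ValueError("missing or duplicate cells")
    return b"".join(c.payload for c in ordered)
```

For example, a 200-byte packet becomes four cells carrying 64, 64, 64, and 8 bytes, and `reassemble` restores it even if the cells arrive in reverse order. Carrying `dest_edge` in every cell is what lets the fabric switch cells without looking inside the original packet.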
In some embodiments, the edge devices in edge portion 185 can be configured to classify, for example, data packets received at switch core 180 from peripheral processing devices 170. Specifically, the edge devices in edge portion 185 of switch core 180 can be configured to perform Ethernet-type classification, which can include classification based on, for example, a layer-2 Ethernet address (e.g., a media access control (MAC) address) and/or a layer-4 address (e.g., a User Datagram Protocol (UDP) port). In some embodiments, a destination can be determined based on, for example, the classification of a data packet at edge portion 185 of switch core 180. For example, a first edge device can identify, based on the classification of a data packet, a second edge device as the destination of that data packet. The data packet can be parsed into cells and transmitted from the first edge device to switching fabric 187. The cells can be switched by switching fabric 187 so that they are sent to the second edge device. In some embodiments, the cells can be switched by switching fabric 187 based on information that relates to the destination and is associated with the cells.
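As an illustration of Ethernet-type classification at an ingress edge device, the following Python sketch reads the destination MAC address (a layer-2 key) and, for untagged IPv4/UDP frames, the UDP destination port (a layer-4 key), then looks up the destination edge device. `FORWARDING_TABLE` and `classify` are hypothetical names; a real classifier would match many more fields against policy, and this one ignores VLAN-tagged and IPv6 frames.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table held by the ingress edge device:
# destination MAC address -> edge device serving that address.
FORWARDING_TABLE: Dict[str, str] = {
    "02:00:00:00:00:0b": "edge-2",
    "02:00:00:00:00:0c": "edge-3",
}


def classify(frame: bytes, table: Dict[str, str]) -> Tuple[Optional[str], Optional[int]]:
    """Return (destination edge device, UDP destination port) for an Ethernet frame.

    Either element is None when it cannot be determined.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst_mac = ":".join(f"{b:02x}" for b in frame[:6])        # layer-2 key
    ethertype = int.from_bytes(frame[12:14], "big")
    udp_port = None
    if ethertype == 0x0800 and len(frame) >= 34:              # untagged IPv4
        ihl = (frame[14] & 0x0F) * 4                           # IPv4 header length in bytes
        udp = 14 + ihl                                         # offset of the UDP header
        if frame[14 + 9] == 17 and len(frame) >= udp + 4:      # IPv4 protocol 17 = UDP
            udp_port = int.from_bytes(frame[udp + 2:udp + 4], "big")  # layer-4 key
    return table.get(dst_mac), udp_port
```

The returned edge device is the destination information that would be written into each cell; the port is shown only to indicate where a layer-4 policy match could hook in.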
Security policies can be applied more efficiently across switch core 180 because classification is performed at a single logical layer of switch core 180, namely in edge portion 185. Specifically, many security policies can be applied in edge portion 185 of switch core 180 during classification in a relatively uniform and seamless manner.
Such as Fig. 5 A, Fig. 5 B and Figure 19 descriptions will be combined by being related to the more details of the packet classification in data center.It is related to
The additional detail for the packet classification being associated in data center is in entitled " Methods and Apparatus Related to
Packet Classification Associated with a Multi-Stage Switch (are related to relevant with multistage exchange
Packet classification method and apparatus) " and the U.S. patent application serial number 12/242168 submitted for 30th in September in 2008 and
Entitled " Methods and Apparatus for Packet Classification Based on Policy Vectors
(method and apparatus of the packet classification based on strategic vector) " and the U.S. patent application serial number submitted for 30th in September in 2008
Described in 12/242172, both is all fully incorporated by reference herein.
Switch core 180 can be defined so that classification of data (e.g., data packets) is not performed within switching fabric 187. Accordingly, although switching fabric 187 can have multiple stages, those stages need not be topological hops at which data classification is performed, and switching fabric 187 can define a single topological hop. Instead, destination information determined based on classification at the edge devices (e.g., the edge devices within edge portion 185 of switch core 180) can be used for switching (e.g., switching of cells) within switching fabric 187. More details related to switching within switching fabric 187 are described in connection with, for example, Figs. 4A and 4B.
In some embodiments, processing related to classification can be performed by a classification module (not shown) included in an edge device (e.g., an input/output module). Parsing of data packets into cells, scheduling of cell transmission via switching fabric 187, reassembly of data packets and/or cells, and/or so forth can be performed at a processing module (not shown) of the edge device (e.g., the input/output module). In some embodiments, the classification module can be referred to as a packet classification module, and/or the processing module can be referred to as a packet processing module. More details related to edge devices that include a classification module and a processing module are described in connection with Fig. 5B.
In some embodiments, one or more portions of data center 100 can be (or can include) hardware-based modules (e.g., an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA)) and/or software-based modules (e.g., a module of computer code, a set of processor-readable instructions that can be executed at a processor). In some embodiments, one or more functions associated with data center 100 can be included in different modules and/or combined into one or more modules. For example, data center management module 190 can be a combination of hardware modules and software modules configured to manage resources (e.g., resources of switch core 180) within data center 100.
One or more compute nodes 110 can be general-purpose computational engines that can include, for example, processors, memory, and/or one or more network interface devices (e.g., a network interface card (NIC)). In some embodiments, the processors within a compute node 110 can be part of one or more cache coherent domains.
In some embodiments, compute nodes 110 can be host devices, servers, and/or so forth. In some embodiments, one or more compute nodes 110 can have virtualized resources, so that any compute node 110 (or a portion thereof) can be substituted for any other compute node 110 (or a portion thereof) within data center 100.
One or more storage nodes 140 can be devices that include, for example, processors, memory, locally-attached disk storage, and/or one or more network interface devices. In some embodiments, storage nodes 140 can have specialized modules (e.g., hardware modules and/or software modules) configured to enable, for example, one or more compute nodes 110 to read data from and/or write data to one or more storage nodes 140 via switch core 180. In some embodiments, one or more storage nodes 140 can have virtualized resources, so that any storage node 140 (or a portion thereof) can be substituted for any other storage node 140 (or a portion thereof) within data center 100.
One or more service nodes 120 can be open systems interconnection (OSI) layer-4 through layer-7 devices that can include, for example, processors (e.g., network processors), memory, and/or one or more network interface devices (e.g., 10 Gb Ethernet devices). In some embodiments, service nodes 120 can include hardware and/or software configured to perform computations on relatively heavy network workloads. In some embodiments, service nodes 120 can be configured to perform computations on a per-packet basis in a relatively efficient manner (e.g., more efficiently than such computations can be performed at, for example, a compute node 110). The computations can include, for example, stateful firewall computations, intrusion detection and prevention (IDP) computations, extensible markup language (XML) acceleration computations, transmission control protocol (TCP) termination computations, and/or application-level load-balancing computations. In some embodiments, one or more service nodes 120 can have virtualized resources, so that any service node 120 (or a portion thereof) can be substituted for any other service node 120 (or a portion thereof) within data center 100.
One or more routers 130 can be network devices configured to connect at least a portion of data center 100 to another network (e.g., the global Internet). For example, as shown in Fig. 1, switch core 180 can be configured to communicate with network 135 and network 137 via routers 130. Although not shown, in some embodiments, one or more routers 130 can enable communication between components inside data center 100 (e.g., peripheral processing devices 170, portions of switch core 180). The communication can be defined based on, for example, a layer-3 routing protocol. In some embodiments, one or more routers 130 can have one or more network interface devices (e.g., 10 Gb Ethernet devices) through which routers 130 can send signals to and/or receive signals from, for example, switch core 180 and/or other peripheral processing devices 170.
More details related to virtualized resources within a data center are set forth in co-pending U.S. Patent Application No. 12/346,623, filed December 30, 2008, entitled "Methods and Apparatus for Determining a Network Topology During Network Provisioning"; co-pending U.S. Patent Application No. 12/346,632, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning"; and co-pending U.S. Patent Application No. 12/346,630, filed December 30, 2008, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning," all of which are incorporated herein by reference.
As described above, switch core 180 can be configured to function as a single universal switch that can connect any peripheral processing device 170 within data center 100 to any other peripheral processing device 170. Specifically, switch core 180 can be configured to provide any-to-any connectivity between peripheral processing devices 170 (e.g., a relatively large number of peripheral processing devices 170) with substantially no visible limitations other than those imposed by the bandwidth of the network interface devices that connect peripheral processing devices 170 to switch core 180 and by speed-of-light signaling delay (also referred to as speed-of-light latency). In other words, switch core 180 can be configured so that each peripheral processing device 170 appears to be directly interconnected to every other peripheral processing device 170 within data center 100. In some embodiments, switch core 180 can be configured to enable peripheral processing devices 170 to communicate via switch core 180 at line rate (or substantially at line rate). A schematic illustration of any-to-any connectivity is shown in Fig. 2.
In addition, because switch core 180 functions as a single logical entity, it can handle, in a desirable manner, the migration of virtual resources between any of the peripheral processing devices 170 that communicate with switch core 180. Accordingly, the boundary for virtual resource migration among peripheral processing devices 170 can span substantially all ports coupled to switch core 180 (e.g., all ports of the edge devices in edge portion 185 of switch core 180).
In some embodiments, provisioning associated with virtual resource migration can be handled in part by a network management module. A centralized network management entity or network management module can cooperate with network devices (e.g., portions of switch core 180) to collect and manage network topology information. For example, as resources are attached to or detached from network devices, the network devices can push information about the resources (virtual and physical) currently operatively coupled to them to the network management module. External management entities, such as peripheral processing device management tools (e.g., server management tools) and/or network management tools, can communicate with the network management module to send network provisioning instructions to network devices and other resources in the network without a static description of the network. Such a system avoids the difficulties of static network descriptions and the degradation in network performance caused by separate peripheral processing device management systems and network management systems.
In one embodiment, a server management tool or external management entity communicates with the network management module to provision, at network devices, virtual resources associated with peripheral processing devices 170, and to determine the operational state or status (e.g., running, paused, or migrating) and the location of those virtual resources within the network. A virtual resource can be a virtual machine executing on a peripheral processing device 170 (e.g., a server) that is coupled to the switching fabric via an access switch within the data center (e.g., an access switch included in edge portion 185). Many types of peripheral processing devices 170 can be coupled to the switching fabric via access switches.
Rather than relying on a static network description (including the binding of virtual resources to network devices) that is managed independently of network topology discovery, the network management module communicates and cooperates with the access switches and the external management entity to discover or determine network topology information. After a virtual machine is initialized (and/or started) on a host (and/or another type of peripheral processing device 170), the external management entity can provide a device identifier of the virtual machine to the network management module. The device identifier can be, for example, a media access control ("MAC") address of a network interface of the virtual machine or of peripheral processing device 170, a name of the virtual machine or of peripheral processing device 170, and/or a globally unique identifier ("GUID") or universally unique identifier ("UUID") of the virtual resource or of peripheral processing device 170. The GUID need not be globally unique across all networks, virtual resources, peripheral processing devices 170, and/or network devices; it need only be unique within the network or network segment managed by the network management module. In addition, the external management entity can provide provisioning instructions for the port of the access switch to which the peripheral processing device 170 that manages the virtual machine is connected. The access switch can detect that the virtual machine has been initialized, started, and/or moved to the peripheral processing device 170. After detecting the virtual machine, the access switch can query the peripheral processing device 170 for information about the peripheral processing device 170 and/or the virtual machine, including, for example, the device identifier of the peripheral processing device 170 or of the virtual machine.
The access switch can query or request the device identifier information of the virtual machine using, for example, the Link Layer Discovery Protocol ("LLDP"), another standard or known protocol, or a proprietary protocol that the virtual machine is configured to use. Alternatively, after detecting that it has been connected to an access switch, the virtual machine can broadcast or multicast information about itself (including its device identifier) using, for example, Ethernet or IP.
The access switch then pushes the device identifier of the virtual device (sometimes referred to as a virtual device identifier) and, in some embodiments, other information received from the virtual machine to the network management module. In addition, the access switch can push to the network management module the device identifier of the access switch and the port identifier of the access switch port to which the peripheral processing device 170 that controls the virtual machine is connected. This information functions as a description of the location of the virtual machine in the network and defines, for the network management module and the external management entity, the binding of the virtual machine to the peripheral processing device 170. In other words, after receiving this information, the network management module can associate the device identifier of the virtual machine with the particular port on the particular access switch to which the virtual machine (and/or the peripheral processing device 170 running the virtual machine) is connected.
The device identifier of the virtual machine, the device identifier of the access switch, the port identifier, and the provisioning instructions provided by the external management entity can be stored in a memory accessible to the network management module. For example, the device identifier of the virtual machine, the device identifier of the access switch, and the port identifier can be stored in a memory configured as a database, so that a database query based on the device identifier of the virtual machine returns the device identifier of the access switch, the port identifier, and the provisioning instructions.
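The association kept by the network management module can be pictured as a table keyed by the virtual machine's device identifier. The Python sketch below is a hypothetical in-memory stand-in for that database: the external management entity supplies provisioning instructions, the access switch reports its own identifier and the port, and a lookup by device identifier returns all three. The class, method, and identifier names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Binding:
    access_switch_id: str
    port_id: str
    instructions: Tuple[str, ...]


class NetworkManagementModule:
    """Hypothetical in-memory stand-in for the database described above."""

    def __init__(self) -> None:
        self._instructions: Dict[str, Tuple[str, ...]] = {}
        self._location: Dict[str, Tuple[str, str]] = {}

    def receive_instructions(self, vm_id: str, instructions: Tuple[str, ...]) -> None:
        # Pushed by the external management entity, which knows nothing about topology.
        self._instructions[vm_id] = tuple(instructions)

    def receive_report(self, vm_id: str, access_switch_id: str, port_id: str) -> None:
        # Pushed by the access switch after it detects the VM; a later report
        # (e.g., after migration) overwrites the previous location.
        self._location[vm_id] = (access_switch_id, port_id)

    def lookup(self, vm_id: str) -> Optional[Binding]:
        """Query by VM device identifier: returns switch, port, and instructions."""
        if vm_id not in self._location:
            return None
        switch_id, port_id = self._location[vm_id]
        return Binding(switch_id, port_id, self._instructions.get(vm_id, ()))
```

Because a later report simply replaces the stored location, a migration to another access switch updates the binding without the external management entity ever needing to know where the virtual machine is attached.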
Because the network management module can associate a virtual machine with its location in the network based on the device identifier of the virtual machine, the external management entity does not need to be aware of the network topology, or of the binding of virtual machines to peripheral processing devices 170, in order to provision network resources (e.g., network devices, virtual machines, virtual switches, or physical servers). In other words, the external management entity can be agnostic to the interconnections in the network and to the locations of virtual machines in the network (e.g., which port of which access switch, and which peripheral processing device 170), and can provision access switches in the network based on the device identifiers of the virtual machines controlled by peripheral processing devices 170 in the network. In some embodiments, the external management entity can also provision physical peripheral processing devices 170. Furthermore, because the network management module dynamically determines and manages network topology information, the external management entity does not rely on a static description of the network to provision the network.
As used in this specification, provisioning can include various types or forms of setting up, configuring, and/or adjusting devices and/or software modules. For example, provisioning can include configuring a network device, such as a network switch, based on a network policy. More specifically, network provisioning can include, for example: configuring a network device to operate as a layer-2 or layer-3 network switch; altering a routing table of a network device; updating security policies and/or device addresses or device identifiers of devices operatively coupled to a network device; selecting which network protocols a network device will implement; setting network segment identifiers, such as virtual local area network ("VLAN") tags, for ports of a network device; and/or applying access control lists ("ACLs") to a network device. A network switch can be provisioned or configured so that the rules and/or access restrictions defined by a network policy are applied to data packets passing through the network switch. In some embodiments, virtual devices can be provisioned. A virtual device can be, for example, a software module implementing a virtual switch, virtual router, or virtual gateway that is configured to operate as an intermediary to a physical network and is controlled by a host device such as a peripheral processing device 170. In some embodiments, provisioning can include establishing a virtual port or connection between a virtual resource and a virtual device.
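Two of the provisioning actions listed above, setting a VLAN tag on a port and applying an ACL, can be sketched in Python as follows. The sketch assumes first-match ACL semantics with an implicit final deny, a common convention that this description does not state, and it limits VLAN IDs to the 1 through 4094 range usable under IEEE 802.1Q. `AclRule`, `PortConfig`, `provision_port`, and `admit` are hypothetical names.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AclRule:
    action: str                      # "permit" or "deny"
    src: str                         # source prefix, e.g. "10.0.0.0/8"
    dst_port: Optional[int] = None   # None matches any destination port


@dataclass
class PortConfig:
    vlan_id: int
    acl: List[AclRule] = field(default_factory=list)


def provision_port(vlan_id: int, rules: Iterable[AclRule]) -> PortConfig:
    """Build the configuration pushed to one access-switch port."""
    if not 1 <= vlan_id <= 4094:     # 0 and 4095 are reserved in 802.1Q
        raise ValueError("VLAN ID must be in 1..4094")
    return PortConfig(vlan_id, list(rules))


def admit(config: PortConfig, src_ip: str, dst_port: int) -> bool:
    """Apply the port's ACL: the first matching rule wins, otherwise deny."""
    for rule in config.acl:
        if ip_address(src_ip) in ip_network(rule.src) and rule.dst_port in (None, dst_port):
            return rule.action == "permit"
    return False
```

Ordering matters under first-match semantics: a narrow deny placed before a broad permit carves an exception out of the permitted range.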
Fig. 2 is a schematic diagram showing an example of a portion of a data center with any-to-any connectivity, according to one embodiment. As shown in Fig. 2, peripheral processing device PD is connected via switch core 280 to each of the peripheral processing devices 210 (from a group of peripheral processing devices 210). For clarity, only the connections from peripheral processing device PD to the other peripheral processing devices 210 (excluding peripheral processing device PD itself) are shown.
In some embodiments, switch core 280 is defined so that it is fair, in the sense that the bandwidth of a destination link between peripheral processing device PD and the other peripheral processing devices 210 is shared substantially fairly among the competing peripheral processing devices 210. For example, when some (or all) of the peripheral processing devices 210 shown in Fig. 2 attempt to access peripheral processing device PD at a given time, the bandwidth for accessing peripheral processing device PD (e.g., the instantaneous bandwidth) available to each peripheral processing device 210 will be substantially the same. In some embodiments, switch core 280 can be configured so that some (or all) peripheral processing devices 210 can communicate with peripheral processing device PD at full bandwidth (e.g., the full bandwidth of peripheral processing device PD) and/or in a non-blocking manner. In addition, switch core 280 can be configured so that access to peripheral processing device PD by a peripheral processing device (from peripheral processing devices 210) is not limited by other links (e.g., existing or attempted links) between other peripheral processing devices and peripheral processing device PD.
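This description does not define fairness beyond substantially equal sharing among competing devices. One common formalization is max-min fairness, sketched below in Python as an assumption: sources whose demand is below the equal share are satisfied in full, and the leftover capacity is split evenly among the rest. The function name and the demand figures in the usage are hypothetical.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split a destination link's capacity among competing sources (max-min fairness)."""
    allocation: Dict[str, float] = {}
    remaining = dict(demands)
    spare = capacity
    while remaining:
        share = spare / len(remaining)                # equal split of what is left
        satisfied = {k: d for k, d in remaining.items() if d <= share}
        if not satisfied:                             # all want more than an equal share
            for k in remaining:
                allocation[k] = share
            break
        for k, d in satisfied.items():                # small demands are met in full
            allocation[k] = d
            spare -= d
            del remaining[k]
    return allocation
```

With a 10 Gb/s destination link and four sources each demanding 10 Gb/s, every source gets 2.5 Gb/s; if one source wants only 1 Gb/s, it gets exactly that and the other two each get 4.5 Gb/s.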
In some embodiments, the attributes of switch core 280, such as any-to-any connectivity, low latency, fairness, and/or so forth, can enable peripheral processing devices 210 of a given type (e.g., a storage node type, a compute node type) connected to (e.g., in communication with) switch core 280 to be treated interchangeably (e.g., independent of their locations relative to the other peripheral processing devices 210 and to switch core 280). This can be referred to as interchangeability, and it can promote the efficiency and simplicity of a data center that includes switch core 280. Even if switch core 280 has a large number of ports (e.g., more than 1000 ports), switch core 280 can still have the attributes of any-to-any connectivity and/or fairness with each port operating at a relatively high speed (e.g., at a rate greater than 10 Gb/s). This can be achieved without special-purpose interconnects such as those of supercomputers and/or without complete prior knowledge of all communication patterns. More details related to switch core architectures with any-to-any connectivity and/or fairness are described, at least in part, in connection with Figs. 4 through 13.
Referring again to Fig. 1, in some embodiments, data center 100 is configured to allow flexible oversubscription. In some embodiments, flexible oversubscription allows the cost of network infrastructure (e.g., the network infrastructure associated with switch core 180) to be reduced relative to, for example, the cost of computation and storage. For example, resources (e.g., all resources) within switch core 180 of data center 100 can operate as a flexibly consolidated resource, so that resources underutilized by a first application (or set of applications) can be dynamically provisioned to and used by a second application (or set of applications) during, for example, peak processing of the second application. Accordingly, the resources of data center 100 (or a subset of them) can be configured to handle oversubscription more efficiently than if the resources were strictly allocated as siloed resources dedicated to particular applications (or sets of applications). If the resources were managed as silos, oversubscription could be implemented only within each silo rather than, for example, across the entire data center 100. In some embodiments, one or more protocols and/or components within data center 100 can be based on open standards (e.g., Institute of Electrical and Electronics Engineers (IEEE) standards, Internet Engineering Task Force (IETF) standards, InterNational Committee for Information Technology Standards (INCITS) standards).
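The advantage of a consolidated resource over siloed allocation can be shown with a small calculation: each silo must be sized for its own application's peak, whereas a shared pool only needs to cover the peak of the combined demand. The Python sketch below assumes demand is sampled at the same instants for every application; the function names and the sample series in the usage are hypothetical.

```python
from typing import Dict, List


def siloed_capacity(demand: Dict[str, List[float]]) -> float:
    """Capacity needed if each application gets a dedicated silo sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: Dict[str, List[float]]) -> float:
    """Capacity needed if all applications share one consolidated pool."""
    return max(sum(sample) for sample in zip(*demand.values()))
```

For example, two applications that each peak at 8 units, but at different times and otherwise idle at 2 units, need 16 units when siloed and only 10 units when pooled. The gap between these two numbers is the headroom that flexible oversubscription exploits.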
In some embodiments, data center 100 can support security models that allow a wide range of policies to be implemented. For example, data center 100 can support a no-communication policy in which applications reside in separate virtual data centers of data center 100 but share the same physical peripheral processing devices (e.g., compute nodes 110, storage nodes 140) and network infrastructure (e.g., switch core 180). In some configurations, data center 100 can support multiple parts of the same application that need to communicate with almost no restrictions. In some configurations, data center 100 can support policies that require, for example, deep packet inspection, stateful firewalls, and/or stateless filters.
Data center 100 can have an application latency (also referred to as end-to-end latency) that is defined based on a source latency, a zero-load latency, a congestion latency, and a destination latency. In some embodiments, the source latency can be, for example, the time incurred during processing at the source peripheral processing device (e.g., time spent by software and/or a network interface card (NIC)). Similarly, the destination latency can be, for example, the time incurred during processing at the destination peripheral processing device (e.g., time spent by software and/or a NIC). In some embodiments, the zero-load latency can be the speed-of-light delay plus the processing and store-and-forward delays inside, for example, switch core 180. In some embodiments, the congestion latency can be, for example, the queuing delay caused by congestion in the network. Data center 100 can have a low end-to-end latency to enable the desired performance of latency-sensitive applications, for example applications with real-time constraints and/or with high inter-process communication requirements.
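As a rough illustration, and assuming the four components simply add (the text says only that the end-to-end latency is defined based on them), the latency budget can be modeled as below. The class and field names are hypothetical, and the sample values are illustrative rather than measurements of data center 100.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    """Hypothetical per-packet latency components, in microseconds."""
    source_us: float       # software/NIC time at the source peripheral processing device
    zero_load_us: float    # speed-of-light + processing + store-and-forward delay in the core
    congestion_us: float   # queuing delay caused by congestion in the network
    destination_us: float  # software/NIC time at the destination peripheral processing device

    def end_to_end_us(self) -> float:
        """Application (end-to-end) latency, assumed here to be the sum of the components."""
        return self.source_us + self.zero_load_us + self.congestion_us + self.destination_us


if __name__ == "__main__":
    sample = LatencyBreakdown(source_us=2.0, zero_load_us=5.5, congestion_us=0.0, destination_us=2.5)
    print(sample.end_to_end_us())  # 10.0
```

Separating the terms this way makes clear which part a switch core can influence (zero-load and congestion latency) and which part is spent in the end hosts.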
The zero-load latency of switch core 180 can be significantly lower than that of a data center core whose interconnect is based on Ethernet hops. In some embodiments, for example, switch core 180 can have a zero-load latency of less than 6 microseconds (excluding speed-of-light latency) from an input port of switch core 180 to an output port of switch core 180. In some embodiments, for example, switch core 180 can have a zero-load latency of less than 12 microseconds (excluding congestion latency and speed-of-light latency). Ethernet-based data center cores have substantially higher latency because of, for example, undesirable levels of congestion (e.g., congestion between links). Congestion in an Ethernet-based data center core can be aggravated by the inability of that core (or of the devices that manage it) to handle congestion in a desired manner. In addition, latency in an Ethernet-based data center core can be non-uniform, because the core can have different numbers of store-and-forward hops between different source-destination pairs and/or between switching nodes, with packet classification performed in those store-and-forward switching nodes. In contrast, classification in switch core 180 is performed in edge portion 185 rather than in switching fabric 187, and switch core 180 has a deterministic, cell-based switching fabric 187. For example, the cell processing latency through switching fabric 187 (as opposed to the cell path through switching fabric 187) can be predictable.
The switch core 180 of data center 100 can provide lossless end-to-end packet transfer based, at least in part, on flow control mechanisms implemented in data center 100. For example, transmission scheduling of data (e.g., data associated with packets) through switching fabric 187 is performed on a per-cell basis using a request-grant mechanism (also referred to as a request-authorization mechanism). Specifically, a cell is sent to switching fabric 187 (e.g., from edge portion 185 to switching fabric 187) only after a request to send the cell has been granted, the grant indicating that the transmission can be accepted substantially without loss. Once admitted into switching fabric 187, the cell is processed within switching fabric 187 as fragments. The flow of fragments in switching fabric 187 can be further controlled, for example, so that no fragments are lost when congestion is detected in switching fabric 187. More details related to cell and fragment processing in switch core 180 are described below.
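The request-grant behavior can be sketched as follows. This is a simplified model under stated assumptions, not the switch core's actual protocol: the hypothetical `FabricPort` grants a request only when it can reserve a buffer slot, and the hypothetical `EdgeDevice` transmits a cell only after its request is granted, so no cell is dropped for lack of space.

```python
from collections import deque


class FabricPort:
    """Hypothetical fabric-side egress with a fixed number of buffer slots."""

    def __init__(self, slots: int):
        self.free_slots = slots
        self.buffer = deque()
        self.delivered = []

    def request(self) -> bool:
        """Grant only if a slot can be reserved, so the granted cell cannot overflow."""
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        return False

    def accept(self, cell) -> None:
        # A slot was already reserved when the grant was issued.
        self.buffer.append(cell)

    def drain(self, n: int = 1) -> None:
        """Forward up to n buffered cells downstream and release their slots."""
        for _ in range(min(n, len(self.buffer))):
            self.delivered.append(self.buffer.popleft())
            self.free_slots += 1


class EdgeDevice:
    """Hypothetical edge device that holds cells until each request is granted."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, cell) -> None:
        self.pending.append(cell)

    def try_send(self, port: FabricPort) -> int:
        """Send queued cells in order while requests are granted; return how many were sent."""
        sent = 0
        while self.pending and port.request():
            port.accept(self.pending.popleft())
            sent += 1
        return sent


if __name__ == "__main__":
    port, edge = FabricPort(slots=2), EdgeDevice()
    for c in ["c0", "c1", "c2"]:
        edge.enqueue(c)
    print(edge.try_send(port))  # 2: the third cell waits at the edge instead of being dropped
    port.drain(1)
    print(edge.try_send(port))  # 1
```

Cells that cannot be granted stay queued at the edge device, which is where the text places the congestion handling.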
In addition, the data flow from each peripheral processing device 170 through switching fabric 187 can be isolated from the data flows from the remaining peripheral processing devices 170 through switching fabric 187. Specifically, data congestion at one or more peripheral processing devices 170 does not undesirably affect data flows through switching fabric 187 of switch core 180, because edge portion 185 of switch core 180 sends cells to switching fabric 187 only after a request to send has been granted. For example, a high level of data traffic at a first peripheral processing device 170 can be handled by the request-grant congestion resolution mechanism, so that the high level of data traffic at the first peripheral processing device 170 does not negatively affect the access of a second peripheral processing device 170 to the single logical entity of switch core 180. In other words, once admitted into switching fabric 187 of switch core 180, traffic associated with the first peripheral processing device 170 is isolated (e.g., from a congestion perspective) from traffic associated with the second peripheral processing device 170.
Likewise, data packet flows, which can be resolved into cells and fragments within switch core 180, can be controlled at peripheral processing devices 170 based on a fine-grained flow control mechanism. In some embodiments, the fine-grained flow control is performed at the queue level. This type of fine-grained flow control can prevent (or substantially prevent) head-of-line blocking, which leads to poor network utilization. Fine-grained flow control can also be used to reduce latency within switch core 180. In some embodiments, fine-grained flow control can enable high-performance block transfer of disk traffic to and from peripheral processing devices 170, which cannot be achieved in the desired manner with Ethernet and Internet Protocol (IP) networks. More details related to fine-grained flow control are described in connection with accompanying drawings 22 to 25.
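To illustrate why queue-level flow control avoids head-of-line blocking, the sketch below compares a single shared FIFO with one queue per destination. The per-destination (virtual-output-queue style) arrangement is an assumption for illustration; the text specifies only that flow control is applied at the queue level. Cells are hypothetical `(destination, sequence)` tuples, and `paused` holds destinations that flow control has currently stopped.

```python
from collections import deque


def drain_single_fifo(cells, paused):
    """One shared FIFO: sending stops at the first cell whose destination is paused."""
    fifo = deque(cells)
    sent = []
    while fifo and fifo[0][0] not in paused:
        sent.append(fifo.popleft())
    return sent


def drain_per_queue(cells, paused):
    """One queue per destination: flow control pauses a single queue, not the whole port."""
    queues = {}
    for cell in cells:
        queues.setdefault(cell[0], deque()).append(cell)
    sent = []
    for dest, queue in queues.items():
        if dest in paused:
            continue  # only this destination's queue waits
        sent.extend(queue)
    return sent


if __name__ == "__main__":
    cells = [("B", 0), ("A", 0), ("A", 1), ("C", 0)]
    print(drain_single_fifo(cells, paused={"B"}))  # [] : B at the head blocks A and C
    print(drain_per_queue(cells, paused={"B"}))    # [('A', 0), ('A', 1), ('C', 0)]
```

With one shared FIFO, a single paused destination at the head stalls every cell behind it; with per-destination queues, only the paused destination waits.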
In some embodiments, data center 100, and in particular switch core 180, can have a modular architecture. Specifically, switch core 180 of data center 100 can be deployed starting at a small scale and expanded as needed (e.g., incrementally). Switch core 180 can be expanded without substantially interrupting the continuous operation of the existing network and/or without constraints on the physical placement of new switch core 180 equipment.
In some embodiments, one or more portions of switch core 180 can be configured to operate based on a virtual private network ("VPN"). Specifically, switch core 180 can be partitioned so that one or more peripheral processing devices 170 can be configured to communicate via overlapping or non-overlapping virtualized partitions of switch core 180. Switch core 180 can also be divided into separate or overlapping subsets of virtualized resources. In other words, switch core 180 can be a single switch that can be partitioned in a flexible manner.
In some embodiments, this approach can enable a single scalable network within the pooled switch core 180 of data center 100. This contrasts with other data centers, which can be a collection of independently scaled networks, each with customized and/or dedicated resources. In some embodiments, the network resources that define switch core 180 can be pooled so that they can be used efficiently.
In some embodiments, data center management module 190 can be configured to define multiple levels of virtualization of the physical (and/or virtual) resources that define data center 100. For example, data center management module 190 can be configured to define multiple virtual levels that reflect the range of applications in data center 100. In some embodiments, the lower of two levels can include virtual application clusters (VACs), which can be sets of physical (or virtual) resources allocated to, and belonging to (e.g., controlled by), one or more entities (e.g., an administrative entity, a financial institution) for their separate use. The higher of the two levels can include virtual data centers (VDCs), each of which can include a set of VACs belonging to (e.g., controlled by) one or more entities. In some embodiments, data center 100 includes multiple VACs, each of which may belong to a different administrative entity.
Fig. 3 is a schematic diagram showing a logical grouping 300 of resources associated with a data center, according to one embodiment. As shown in Fig. 3, logical grouping 300 includes virtual data center VDC1, virtual data center VDC2, and virtual data center VDC3 (collectively referred to as VDCs). Also as shown in Fig. 3, each VDC includes virtual application clusters VAC (e.g., VAC32 in VDC3). Each VDC represents a logical grouping of physical or virtual portions of a data center such as data center 100 shown in Fig. 1 (e.g., portions of a switch core, portions of peripheral processing devices, and/or virtual machines within peripheral processing devices). For example, each VAC in a VDC represents a logical grouping of peripheral processing devices such as compute nodes. For example, VDC1 can represent a logical grouping of typical data center components, and VAC22 represents a logical grouping of peripheral processing devices 370 within VDC1. As shown in Fig. 3, each VDC can be managed based on a set of policies PY (also referred to as business rules) that can be configured, for example, to define allowable ranges of operating parameters for applications running in the VDC. In some embodiments, the VDCs can be referred to as a first tier of logical resources, and the VACs as a second tier of logical resources.
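The two-tier grouping can be represented with a simple data model. The sketch assumes that a policy is a mapping from an operating parameter to an allowed (min, max) range, which is one possible reading of "allowable ranges of operating parameters"; the class names mirror the VDC/VAC terms, while the parameter and resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VAC:
    """Second tier: a cluster of resources (e.g., compute nodes) for one group of applications."""
    name: str
    resources: set = field(default_factory=set)


@dataclass
class VDC:
    """First tier: a set of VACs managed under one policy set PY."""
    name: str
    policy: dict  # hypothetical: parameter name -> (min, max) allowed range
    vacs: list = field(default_factory=list)

    def all_resources(self) -> set:
        """Union of the resources of every VAC in this VDC."""
        out = set()
        for vac in self.vacs:
            out |= vac.resources
        return out

    def within_policy(self, metrics: dict) -> bool:
        """True if every reported metric named in the policy lies inside its range.
        Metrics that are not reported are skipped."""
        for name, (lo, hi) in self.policy.items():
            value = metrics.get(name)
            if value is not None and not (lo <= value <= hi):
                return False
        return True


if __name__ == "__main__":
    vdc3 = VDC("VDC3", policy={"latency_us": (0, 12)},
               vacs=[VAC("VAC31", {"node-1", "node-2"}), VAC("VAC32", {"node-3"})])
    print(sorted(vdc3.all_resources()))          # ['node-1', 'node-2', 'node-3']
    print(vdc3.within_policy({"latency_us": 8}))  # True
```

Because resources are plain set members, two VDCs can reference the same resource, which matches the overlapping groupings the text allows.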
In some embodiments, VDCs (and VACs) can be established so that the resources associated with a data center can be managed in a desired manner by, for example, entities that use (e.g., lease, own, communicate through) the data center resources and/or administrators of the data center resources. For example, VDC1 can be a virtual data center associated with a financial institution, and VDC2 can be a virtual data center associated with a telecommunications service provider. Accordingly, policy PY1 can be defined by the financial institution so that VDC1 (and the physical and/or virtual data center resources associated with VDC1) can be managed differently from VDC2 (and the physical and/or virtual data center resources associated with VDC2), which is managed based on policy PY2 as defined by the telecommunications service provider. In some embodiments, one or more policies (e.g., part of policy PY1) are established by a network administrator so that, when implemented, they provide information security and/or a firewall between VDC1, associated with the financial institution, and VDC2, associated with the telecommunications service provider.
In some embodiments, policies can be associated with (or integrated into) data center management (not shown). For example, VDC2 can be managed based on policy PY2 (or a subset of policy PY2). In some embodiments, the data center management can be configured to, for example, monitor the real-time performance of applications in a VDC and/or automatically allocate or deallocate resources to satisfy the policies corresponding to the applications within the VDC. In some embodiments, policies can be configured to operate based on time thresholds. For example, one or more policies can be configured to act on periodic events (e.g., predictable periodic events), such as changes in a parameter value (e.g., traffic level) at particular times of day or on particular days of the week.
In some embodiments, policies can be defined based on a high-level language, so that they can be specified in a relatively accessible manner. Examples of policies include information security policies, fault isolation policies, firewall policies, performance guarantee policies (e.g., policies relating to the service levels implemented by applications), and/or other management policies relating to information protection or acquisition (e.g., administrative isolation policies).
In some embodiments, policies can be implemented in a packet classification module, which can be configured to classify data packets (e.g., IP packets, session control protocol packets, media packets) defined at, for example, peripheral processing devices. For example, policies can be implemented in the packet classification module of an access switch in the edge portion of the switch core. Classification can include any processing performed so that a data packet can be processed within the data center (e.g., within the switch core of the data center) based on a policy. In some embodiments, a policy includes one or more policy conditions associated with an instruction to be executed. A policy can be, for example, a policy to route a data packet to a specific destination (the instruction) if the data packet has a certain type of network address (the policy condition). Packet classification can include determining whether the policy condition has been satisfied so that the instruction can be executed. For example, one or more portions of a data packet (e.g., a field, the payload, an address portion, a port portion) can be analyzed by the packet classification module based on the policy conditions defined in the policy. When a policy condition is satisfied, the data packet can be processed based on the instruction associated with that policy condition.
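The condition/instruction structure of a policy can be sketched as a list of predicates, each paired with an action. The sketch assumes first-match semantics and uses hypothetical address ranges and action labels; it is an illustration of the concept, not the classification module's actual implementation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass
class Policy:
    condition: Callable[[Packet], bool]  # policy condition evaluated on packet fields
    instruction: str                     # hypothetical action label


def classify(packet: Packet, policies: List[Policy]) -> Optional[str]:
    """Return the instruction of the first policy whose condition the packet satisfies."""
    for policy in policies:
        if policy.condition(packet):
            return policy.instruction
    return None  # no condition met: no policy-driven action


STORAGE_NET = ipaddress.ip_network("10.20.0.0/16")  # hypothetical address range
POLICIES = [
    # "If the packet has a certain type of network address, route it to a specific destination."
    Policy(lambda p: ipaddress.ip_address(p.dst_ip) in STORAGE_NET, "route:storage"),
    Policy(lambda p: p.dst_port == 23, "drop"),
]

if __name__ == "__main__":
    print(classify(Packet("10.1.1.1", "10.20.3.4", 3260), POLICIES))  # route:storage
    print(classify(Packet("10.1.1.1", "192.0.2.7", 443), POLICIES))   # None
```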
In some embodiments, one or more portions of logical grouping 300 can be configured to operate in a "lights-out" mode, with logical grouping 300 controlled from multiple remote locations, for example a separate location for each VDC plus one or two primary locations. In some embodiments, a data center with logical groupings such as those shown in Fig. 3 can be configured to operate without personnel physically present at the data center site. In some embodiments, the data center has enough redundant resources to accommodate failures, such as the failure of one or more peripheral processing devices (e.g., peripheral processing devices in a VAC), the failure of a data center management module, and/or the failure of switch core components. When monitoring software in the data center (e.g., in the data center management of the data center) indicates that failures have reached a predetermined threshold, personnel can be notified and/or dispatched to replace the failed components.
As shown in Fig. 3, the VDCs can be logical groupings that are independent of one another. In some embodiments, the resources (e.g., virtual resources, physical resources) of a data center (such as the one shown in Fig. 1) can be divided into logical groupings 300 that differ from those shown in Fig. 3 (e.g., different tiers of logical groupings). In some embodiments, two or more VDCs of logical grouping 300 overlap. For example, a first VDC can share resources (e.g., physical resources, virtual resources) of the data center with a second VDC. In particular, a portion of the switch core of the first VDC can be shared with the second VDC. In some embodiments, for example, resources included in a VAC of the first VDC can also be included in a VAC of the second VDC.
In some embodiments, one or more VDCs can be defined manually (e.g., by a network administrator) and/or automatically (e.g., based on a policy). In some embodiments, VDCs can be configured to change (e.g., change dynamically). For example, a VDC (e.g., VDC1) can include a specific set of resources during one time period and a different set of resources (e.g., a disjoint set of resources, an overlapping set of resources) during a different time period (e.g., a non-overlapping time period, an overlapping time period).
In some embodiments, one or more portions of a data center can be dynamically provisioned in response to, before, or during a change related to a VDC (e.g., migration of a virtual machine that is part of the VDC). For example, the switch core of a data center can include multiple network devices, such as network switches, each of which stores a configuration template database containing templates for the services provided and/or requested by virtual machines. When a virtual machine migrates to, and/or is initialized or started on, a server connected to a network switch port of the switch core, the server can send the network switch an identifier of the service provided by the virtual machine. The network device can select a configuration template from the configuration template database based on the identifier, and provision the port and/or the server based on that configuration template. In this way, the task of provisioning network ports and/or devices can be distributed among the network switches in the switch core (e.g., distributed in an automated manner, without redefining the template distribution), and can change dynamically as virtual machines or resources migrate between peripheral processing devices.
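The per-switch template lookup can be sketched as follows. The service identifiers, template contents, and the `on_vm_started` hook are hypothetical; the point is that each switch provisions its own port from its local template database when a server reports the service identifier of a started or migrated virtual machine.

```python
# Hypothetical configuration template database held locally by each network switch.
TEMPLATES = {
    "web": {"vlan": 110, "acl": ["allow-tcp-443"]},
    "storage": {"vlan": 220, "acl": ["allow-tcp-3260"]},
}


class NetworkSwitch:
    def __init__(self, templates: dict):
        self.templates = templates
        self.port_config = {}  # port number -> configuration applied to that port

    def on_vm_started(self, port: int, service_id: str) -> dict:
        """Called when the server on `port` reports the service identifier of a new
        or migrated virtual machine; provisions the port from the matching template."""
        template = self.templates.get(service_id)
        if template is None:
            raise KeyError(f"no configuration template for service {service_id!r}")
        self.port_config[port] = dict(template)  # copy so later edits stay per-port
        return self.port_config[port]


if __name__ == "__main__":
    old_switch, new_switch = NetworkSwitch(TEMPLATES), NetworkSwitch(TEMPLATES)
    old_switch.on_vm_started(3, "storage")
    # After migration, the destination switch provisions itself; no central push is needed.
    print(new_switch.on_vm_started(7, "storage")["vlan"])  # 220
```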
In some embodiments, provisioning can include setting up, configuring, and/or adjusting multiple types or forms of devices and/or software modules. For example, provisioning can include configuring network devices in the data center, such as network switches, based on a policy such as one of the policies PY shown in Fig. 3. More specifically, for example, provisioning related to a data center can include one or more of the following: configuring a network device to operate as a network router or a network switch; changing the routing table of a network device; updating security policies and/or the addresses or identifiers of devices operably coupled to a network device; selecting which network protocols a network device will implement; setting network segment identifiers, such as virtual local area network ("VLAN") tags, for ports of a network device; and/or applying access control lists ("ACLs") to a network device. A portion of a data center can be provisioned or configured so that the rules and/or access restrictions defined by a policy (e.g., PY3) are applied (e.g., through classification processing) to data packets passing through that portion of the data center.
In some embodiments, virtual resources associated with a data center can be provisioned. A virtual resource can be, for example, a software module implementing a virtual switch, a virtual router, or a virtual gateway configured to operate as an intermediary between a physical network and virtual resources that are controlled by a host device such as a server. In some embodiments, a virtual resource can be controlled by a host device. In some embodiments, provisioning can include establishing virtual ports or connections between virtual resources and virtual devices.
More details related to virtual resources in a data center are described in co-pending U.S. Patent Application No. 12/346623, entitled "Method and Apparatus for Determining a Network Topology During Network Provisioning" and filed December 30, 2008; co-pending U.S. Patent Application No. 12/346632, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed December 30, 2008; and co-pending U.S. Patent Application No. 12/346630, entitled "Methods and Apparatus for Distributed Dynamic Network Provisioning" and filed December 30, 2008, all of which are incorporated herein by reference in their entireties.
Fig. 4A is a schematic diagram showing a switching fabric 400 that can be included in a switch core, according to one embodiment. In some embodiments, switching fabric 400 can be included in a switch core such as switch core 180 shown in Fig. 1. As shown in Fig. 4A, switching fabric 400 is a three-stage, non-blocking Clos network that includes a first stage 440, a second stage 442, and a third stage 444. The first stage 440 includes modules 412 (each of which can be referred to as a switching module or a cell switch). Each module 412 of the first stage 440 is an assembly of electronic components and circuitry. In some embodiments, for example, each module is an application-specific integrated circuit (ASIC). In other embodiments, multiple modules are contained on a single ASIC. In some embodiments, each module is an assembly of discrete electronic components. In some embodiments, a switching fabric with multiple stages can be referred to as a multi-stage switching fabric.
In some embodiments, each module 412 of the first stage 440 can be a cell switch. The cell switches are configured to efficiently redirect data (e.g., fragments) as it flows through switching fabric 400. In some embodiments, for example, each module 412 of the first stage can be configured to redirect data based on information included in a switch table. In some embodiments, the redirection of data such as cells within the stages of switching fabric 400 can be referred to as switching (e.g., data switching) or, if the data is in the form of cells within switching fabric 400, as cell switching. In some embodiments, switching in the modules of switching fabric 400 can be based on, for example, information (e.g., a header) associated with the data. The switching performed by the modules of switching fabric 400 can differ from the Ethernet-type classification performed within an edge device (e.g., an edge device in the edge portion 185 of switch core 180 shown in Fig. 1). In other words, switching in the modules of switching fabric 400 is not based on, for example, layer-2 Ethernet addresses and/or layer-4 addresses. More details related to switch-table-based data switching are described in connection with Fig. 4B.
In some embodiments, each cell switch also includes multiple input ports operably coupled to the write interface of a memory buffer (e.g., a cut-through buffer). In some embodiments, the memory buffer is included in a buffer module. Similarly, a set of output ports can be operably coupled to the read interface of the memory buffer. In some embodiments, the memory buffer can be a shared memory buffer implemented using on-chip static random access memory (SRAM), which provides enough bandwidth for every input port to write one incoming cell (e.g., a portion of a data packet) per time period and for every output port to read one outgoing cell per time period. Each cell switch operates much like a crossbar switch that can be reconfigured after every time period.
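A toy model of one shared-memory cell switch makes the per-time-period behavior concrete. It assumes the buffer always has enough bandwidth and capacity, keeps one logical FIFO per output port inside the shared buffer (an assumption, not something the text specifies), and ignores cut-through timing.

```python
from collections import deque


class SharedMemoryCellSwitch:
    """Toy shared-memory cell switch: in each time period every input port may write
    one cell and every output port may read one cell."""

    def __init__(self, num_outputs: int):
        # One logical FIFO per output port, all held in the same shared buffer.
        self.queues = [deque() for _ in range(num_outputs)]

    def time_period(self, arrivals: dict) -> dict:
        """arrivals maps input port -> (output port, cell) or None.
        Returns a dict mapping output port -> cell read out in this time period."""
        # Write phase: at most one cell per input port (one dict entry per port).
        for in_port in sorted(arrivals):
            item = arrivals[in_port]
            if item is not None:
                out_port, cell = item
                self.queues[out_port].append(cell)
        # Read phase: at most one cell per output port.
        departures = {}
        for out_port, queue in enumerate(self.queues):
            if queue:
                departures[out_port] = queue.popleft()
        return departures


if __name__ == "__main__":
    sw = SharedMemoryCellSwitch(num_outputs=2)
    print(sw.time_period({0: (1, "a"), 1: (1, "b")}))  # {1: 'a'}; 'b' stays buffered
    print(sw.time_period({}))                           # {1: 'b'}
```

Because the write and read phases are re-evaluated in every time period, the connection pattern between inputs and outputs can change each period, which is the crossbar-like behavior the text describes.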
In some embodiments, the memory buffer (e.g., the portions of the memory buffer associated with particular ports and/or flows) has a sufficient size (e.g., length) for the modules (e.g., modules 412) in switching fabric 400 to perform switching (e.g., cell switching, data switching) and/or synchronization of data (e.g., cells). However, the memory buffer can have an insufficient size (and/or too short a processing latency) for implementing a congestion resolution scheme in the modules (e.g., modules 412) of switching fabric 400. For example, a congestion resolution scheme such as a request/grant mechanism can be implemented at edge devices (not shown) associated with, for example, the switch core, but the data queuing associated with the congestion resolution scheme cannot be implemented with the memory buffer in the modules of switching fabric 400. In some embodiments, one or more memory buffers in a module (e.g., module 414) have an insufficient size (and/or too short a processing latency) to, for example, queue data (e.g., cells) at the module. More details related to shared memory buffers are described in connection with accompanying drawing 15 and in co-pending U.S. Patent Application No. 12/415517, entitled "Methods and Apparatus Related to a Shared Memory Buffer for Variable-Sized Cells" and filed March 31, 2009, which is incorporated herein by reference in its entirety.
In alternative embodiments, each module of the first stage can be a crossbar switch having input bars and output bars. Multiple switches within the crossbar switch connect each input bar to each output bar. When a switch within the crossbar switch is in the "on" position, the input is operably coupled to the output and data can flow. Alternatively, when a switch within the crossbar switch is in the "off" position, the input is not operably coupled to the output and data cannot flow. Thus, the switches within the crossbar switch control which input bars are operably coupled to which output bars.
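The crossbar described above can be modeled as a matrix of on/off crosspoints. The class and method names are hypothetical, and the sketch assumes at most one crosspoint is on in each output column, so no two input bars drive the same output bar.

```python
from typing import List, Optional


class Crossbar:
    """Crossbar with one on/off crosspoint for every (input bar, output bar) pair."""

    def __init__(self, n_inputs: int, n_outputs: int):
        self.n_outputs = n_outputs
        self.on = [[False] * n_outputs for _ in range(n_inputs)]

    def set_crosspoint(self, i: int, o: int, state: bool) -> None:
        """Turn the switch between input bar i and output bar o on or off."""
        self.on[i][o] = state

    def transfer(self, inputs: List[Optional[str]]) -> List[Optional[str]]:
        """Data on input bar i reaches output bar o only if crosspoint (i, o) is on."""
        outputs: List[Optional[str]] = [None] * self.n_outputs
        for i, data in enumerate(inputs):
            if data is None:
                continue
            for o, closed in enumerate(self.on[i]):
                if closed:
                    outputs[o] = data
        return outputs


if __name__ == "__main__":
    xb = Crossbar(2, 2)
    xb.set_crosspoint(0, 1, True)   # input bar 0 -> output bar 1
    xb.set_crosspoint(1, 0, True)   # input bar 1 -> output bar 0
    print(xb.transfer(["x", "y"]))  # ['y', 'x']
```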
Each module 412 of the first stage 440 includes a set of input ports 460 configured to receive data as the data enters switching fabric 400. In this embodiment, each module 412 of the first stage 440 includes the same number of input ports 460.
Similar to the first stage 440, the second stage 442 of switching fabric 400 includes modules 414. The modules 414 of the second stage 442 are structurally similar to the modules 412 of the first stage 440. Each module 414 of the second stage 442 is operably coupled to each module 412 of the first stage 440 by a data path 420. Each data path 420 between a given module 412 of the first stage 440 and a given module 414 of the second stage 442 is configured to facilitate data transfer from the module 412 of the first stage 440 to the module 414 of the second stage 442.
The data paths 420 between the modules 412 of the first stage 440 and the modules 414 of the second stage 442 can be constructed in any manner configured to facilitate data transfer from the modules 412 of the first stage 440 to the modules 414 of the second stage 442 in a desired manner (e.g., efficiently). In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to that described in further detail herein. Such a midplane can be used to effectively connect each module of the second stage with each module of the first stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
In some embodiments, switching fabric 400 is a non-blocking Clos network. Thus, the number of modules 414 of the second stage 442 of switching fabric 400 varies based on the number of input ports 460 of each module 412 of the first stage 440. In a rearrangeably non-blocking Clos network (e.g., a Benes network), the number of modules 414 of the second stage 442 is greater than or equal to the number of input ports 460 of each module 412 of the first stage 440. Thus, if n is the number of input ports 460 of each module 412 of the first stage 440 and m is the number of modules 414 of the second stage 442, then m ≥ n. In some embodiments, for example, each module of the first stage has five input ports. Thus, the second stage has at least five modules. All five modules of the first stage are operably coupled to all five modules of the second stage by data paths. In other words, each module of the first stage can send data to any module of the second stage.
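The sizing rule and the full-mesh wiring can be checked with a few lines of code. The function names are hypothetical; `r` denotes the number of first-stage modules (equal to the number of third-stage modules), `n` the number of input ports per first-stage module, and `m` the number of second-stage modules, following the m ≥ n condition stated above for a rearrangeably non-blocking network.

```python
def is_rearrangeably_nonblocking(n: int, m: int) -> bool:
    """Condition from the text: m second-stage modules, n input ports per first-stage module."""
    return m >= n


def build_clos_links(r: int, m: int):
    """Full-mesh data paths: every first-stage module to every second-stage module (paths 420)
    and every second-stage module to every third-stage module (paths 424)."""
    stage1_to_2 = {(a, b) for a in range(r) for b in range(m)}
    stage2_to_3 = {(b, c) for b in range(m) for c in range(r)}
    return stage1_to_2, stage2_to_3


if __name__ == "__main__":
    n, m, r = 5, 5, 5  # the five-port, five-module example from the text
    links_420, links_424 = build_clos_links(r, m)
    print(is_rearrangeably_nonblocking(n, m), len(links_420), len(links_424))  # True 25 25
```

With n = 5 and m = 5, the condition holds, and each of the five first-stage modules has a data path to each of the five second-stage modules.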
The third stage 444 of switching fabric 400 includes modules 416. The modules 416 of the third stage 444 are structurally similar to the modules 412 of the first stage 440. The number of modules 416 of the third stage 444 is equal to the number of modules 412 of the first stage 440. Each module 416 of the third stage 444 includes output ports 462 configured to allow data to exit switching fabric 400. Each module 416 of the third stage 444 includes the same number of output ports 462. Further, the number of output ports 462 of each module 416 of the third stage 444 is equal to the number of input ports 460 of each module 412 of the first stage 440.
Each module 416 of the third stage 444 is connected to each module 414 of the second stage 442 by a data path 424. The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 are configured to facilitate data transfer from the modules 414 of the second stage 442 to the modules 416 of the third stage 444.
The data paths 424 between the modules 414 of the second stage 442 and the modules 416 of the third stage 444 can be constructed in any manner configured to effectively facilitate data transfer from the modules 414 of the second stage 442 to the modules 416 of the third stage 444. In some embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane. Such a midplane can be similar to that described in further detail herein. Such a midplane can be used to effectively connect each module of the second stage with each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the data paths are electrical traces.
Fig. 4B is a schematic diagram showing a switch table 49 that can be stored in a memory 498 of a module such as those shown in Fig. 4A, according to one embodiment. A module (e.g., a switching module), such as one of the second-stage modules 414 shown in Fig. 4A, can be configured to perform cell switching based on a switch table such as switch table 49 shown in Fig. 4B. For example, switch table 49 (or a similarly configured switch table) can be used by a module in (and/or included in) a previous stage to determine whether a cell can be sent to its destination via a module in another stage. In some embodiments, a module via which a cell can be sent to its destination can be referred to as a switching destination. Specifically, a switching destination can be looked up in switch table 49 based on, for example, destination information included in the cell (which can be determined outside switching fabric 400).
The switching table 49 includes binary values (e.g., a binary value "1" and a binary value "0") that indicate whether the one or more destinations represented by the destination values DT1 to DTk (shown at 47) can be reached via the one or more modules represented by the module values SM1 to SMM (shown at 48), which can be located in an adjacent stage. In particular, the switching table 49 includes a binary value "1" when the destination of the row containing that value (e.g., destination DT1) can be reached via the module of the column intersecting that row (e.g., module SM2), and a binary value "0" when it cannot. For example, the binary value "1" in each of the entries at 46 indicates that if the module that includes the switching table 49 sends data to any of the modules represented by the module values SM1 to SM3, the data can ultimately be delivered to the destination represented by the destination value DT3. In some embodiments, the module can be configured to randomly select one module from the group of modules represented by the module values SM1 to SM3 (which are switching destinations) and send the data to the selected module, so that the data can be delivered to the destination represented by the destination value DT3.
In some embodiments, a destination value 47 can be a destination port value associated with, for example, an edge device of the switch core (e.g., an access switch), a server in communication with an edge device, and so forth. In some embodiments, a destination value (corresponding to at least one of the destination values 47 included in the switching table 49) can be associated with a cell (e.g., included in the cell header) based on, for example, classification of the packet included in the cell. A module can therefore use the destination value associated with a cell to look up switching destinations in the switching table 49. The packet classification can be performed at an edge device of the switch core (e.g., an access switch).
In some embodiments, the memory (and thus the switching table 49) can be included in a module system containing one or more modules. In some embodiments, the switching table 49 can be associated with more than one input port and/or more than one output port of a module system (or of multiple module systems). Further details related to module systems are described with reference to Fig. 7.
Fig. 5 A are the schematic diagrames for showing switching fabric system 500 according to one embodiment.Switching fabric system 500 includes many
Individual input/output module 502, the first cable collection 540, the second cable collection 542 and switching fabric 575.Switching fabric 575 includes portion
The first switching fabric part 571 in shell 570 or frame is affixed one's name to, and second be deployed in shell 572 and frame exchanges
Structure division 573.
The input/output modules 502 (which can be, for example, edge devices) are configured to send data to and/or receive data from the first switching fabric portion 571 and/or the second switching fabric portion 573. In addition, each input/output module 502 includes parsing, classification, forwarding, and/or queuing and scheduling functions. Thus, packet parsing, packet classification, packet forwarding, and packet queuing and scheduling all occur before a packet enters the first switching fabric portion 571 and/or the second switching fabric portion 573. As a result, these functions need not be performed at every stage of the switching fabric 575, and the modules of the switching fabric portions 571, 573 (described in further detail herein) need not be capable of performing them. This can reduce the cost, power consumption, cooling requirements, and/or physical space requirements of each module of the switching fabric portions 571, 573. It can also reduce the latency associated with the switching fabric. In some embodiments, for example, the end-to-end latency (i.e., the time required to send data through the switching fabric from one input/output module to another input/output module) can be lower than the end-to-end latency of a switching fabric system that uses an Ethernet protocol. In some embodiments, the throughput of the switching fabric portions 571, 573 is constrained only by connection density, rather than by the power and/or thermal limits of the switching fabric system 500. In some embodiments, the input/output modules 502 (and/or the functions associated with them) can be included in, for example, an edge device in the edge portion of the switch core shown in Fig. 1. The parsing, classification, forwarding, and queuing and scheduling functions can be performed in a manner similar to the functions disclosed in U.S. Patent Application Serial No. 12/242168, entitled "Methods and Apparatus Related to Packet Classification Associated with a Multi-Stage Switch," filed on September 30, 2008, and U.S. Patent Application Serial No. 12/242172, entitled "Methods and Apparatus for Packet Classification Based on Policy Vectors," filed on September 30, 2008, both of which are incorporated herein by reference in their entireties.
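The division of labor described above (all parsing, classification and forwarding decisions at the edge; fabric modules acting only on a header value) can be sketched as below. This is an illustrative assumption, not the method of the incorporated applications: the names `classify_at_edge` and `fabric_lookup`, the forwarding table, and the drop-on-unknown policy are hypothetical. The MAC addresses come from the documentation range, and the EtherType values are the standard IPv4, IPv6 and ARP codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ethertype: int
    dst_mac: str
    payload: bytes


@dataclass
class CellHeader:
    destination: str  # destination value, e.g. "DT3"
    traffic_class: str


# Hypothetical lookup state held only by the edge device (input/output module).
ETHERTYPE_CLASSES = {0x0800: "ipv4", 0x86DD: "ipv6", 0x0806: "arp"}
FORWARDING_TABLE = {"00:00:5e:00:53:01": "DT3", "00:00:5e:00:53:02": "DT1"}


def classify_at_edge(packet: Packet) -> Optional[CellHeader]:
    """Parse, classify and make the forwarding decision before the fabric."""
    destination = FORWARDING_TABLE.get(packet.dst_mac)
    if destination is None:
        return None  # this sketch simply drops unknown destinations
    traffic_class = ETHERTYPE_CLASSES.get(packet.ethertype, "other")
    return CellHeader(destination, traffic_class)


def fabric_lookup(header: CellHeader, reachability: dict) -> list:
    """A fabric module reads only the destination value in the header."""
    return reachability.get(header.destination, [])


header = classify_at_edge(Packet(0x0800, "00:00:5e:00:53:01", b"payload"))
print(header)  # CellHeader(destination='DT3', traffic_class='ipv4')
print(fabric_lookup(header, {"DT3": ["SM1", "SM2", "SM3"]}))  # ['SM1', 'SM2', 'SM3']
```

Note that `fabric_lookup` never touches the packet itself, which is the property that lets the fabric modules omit parsing and classification hardware.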
Each input/output module 502 is configured to be connected to a first end of a cable of the first cable set 540 and to a first end of a cable of the second cable set 542. Each cable 540 is deployed between an input/output module 502 and the first switching fabric portion 571. Similarly, each cable 542 is deployed between an input/output module 502 and the second switching fabric portion 573. Using the first cable set 540 and the second cable set 542, each input/output module 502 can send data to and/or receive data from the first switching fabric portion 571 and/or the second switching fabric portion 573, respectively.
The first cable set 540 and the second cable set 542 can be constructed of any material suitable for transferring data between the input/output modules 502 and the switching fabric portions 571, 573. In some embodiments, for example, each cable 540, 542 is constructed of multiple optical fibers. In such embodiments, each cable 540, 542 can have 12 transmit fibers and 12 receive fibers. The 12 transmit fibers of each cable 540, 542 can include 8 fibers for transmitting data, 1 fiber for transmitting control signals, and 3 fibers for expanding data capacity and/or for redundancy. Similarly, the 12 receive fibers of each cable 540, 542 can include 8 fibers for data, 1 fiber for control signals, and 3 fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of optical fibers can be included in each cable.
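The per-direction fiber allocation stated above can be captured in a small data model that makes the totals explicit. The class names `FiberGroup` and `Cable` are illustrative assumptions; the counts (8 data, 1 control, 3 spare in each direction) are the ones given for cables 540, 542.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiberGroup:
    """Fibers of one direction (transmit or receive) in a multi-fiber cable."""
    data: int
    control: int
    spare: int  # for expanding data capacity and/or for redundancy

    @property
    def total(self) -> int:
        return self.data + self.control + self.spare


@dataclass(frozen=True)
class Cable:
    transmit: FiberGroup
    receive: FiberGroup

    @property
    def total_fibers(self) -> int:
        return self.transmit.total + self.receive.total


# Allocation stated for each cable 540, 542
direction = FiberGroup(data=8, control=1, spare=3)
cable_540 = Cable(transmit=direction, receive=direction)
print(direction.total)         # 12
print(cable_540.total_fibers)  # 24
```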
The first switching fabric portion 571 and the second switching fabric portion 573 are used together for redundancy and/or greater capacity. In other embodiments, only one switching fabric portion is used. In still other embodiments, more than two switching fabric portions are used for increased redundancy and/or greater capacity. For example, four switching fabric portions can be operatively coupled to each input/output module by, for example, four cables. The second switching fabric portion 573 is structurally and functionally similar to the first switching fabric portion 571. Therefore, only the first switching fabric portion 571 is described in detail herein.
Fig. 5B is a schematic diagram showing an input/output module 502 according to one embodiment. As shown in Fig. 5B, the input/output module 502 includes a classification module 596, a processing module 597, and a memory 598. The classification module 596 can be configured to perform data classification, such as EtherType classification of packets.
Various types of data processing can be performed at the processing module 597. For example, data such as packets can be parsed (i.e., divided) into cells at the processing module 597. In some embodiments, a congestion resolution scheme can be implemented at the processing module 597, and/or the scheduling of data (such as cells) for transmission through a switching fabric (e.g., the switching fabric 400 shown in Fig. 4A) can be performed at the processing module 597. The processing module 597 can also be configured to append information (e.g., header information, destination information, source information) to, for example, a cell payload, so that the cell payload can be switched as a cell by a switching fabric (e.g., the switching fabric 400 shown in Fig. 4A) based on a switching table such as the one shown in Fig. 4B.
When data processing is performed at the classification module 596 and/or the processing module 597, one or more portions of the data (such as packets or cells) can be stored (e.g., queued) in the memory 598. For example, when the processing module 597 performs processing related to a congestion resolution scheme, data that has been parsed into cells can be queued in the memory 598. Accordingly, the memory 598 can be large enough to implement the congestion resolution schemes described with reference to Figs. 16A through 21.
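The steps above (dividing a packet into cells, appending header information to each cell payload, and queuing the cells in memory) can be sketched as follows. This is a sketch under stated assumptions: the 64-byte payload size, the `Cell` fields (`packet_id`, `seq`, `last`), and the one-FIFO-per-destination queue layout are hypothetical choices, not details given in the text.

```python
from collections import deque
from dataclasses import dataclass

CELL_PAYLOAD_BYTES = 64  # hypothetical cell payload size


@dataclass
class Cell:
    destination: str  # looked up in a switching table inside the fabric
    source: str
    packet_id: int
    seq: int          # position of this cell within its packet
    last: bool        # True for the final cell of the packet
    payload: bytes


def packet_to_cells(payload, destination, source, packet_id,
                    size=CELL_PAYLOAD_BYTES):
    """Divide a packet into cells and attach header information to each."""
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)] or [b""]
    return [
        Cell(destination, source, packet_id, seq, seq == len(chunks) - 1, chunk)
        for seq, chunk in enumerate(chunks)
    ]


# Memory 598 modeled as one FIFO queue per destination value.
queues = {}


def enqueue(cell):
    queues.setdefault(cell.destination, deque()).append(cell)


cells = packet_to_cells(bytes(150), destination="DT3", source="IO-1", packet_id=7)
for cell in cells:
    enqueue(cell)
print([len(c.payload) for c in cells])  # [64, 64, 22]
print(len(queues["DT3"]))               # 3
```

Queuing per destination is only one possible layout; it makes it easy for a scheduler or congestion resolution scheme to hold back traffic toward a congested destination without blocking the others.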
One of Fig. 5 A switching fabric system 500 including the first switching fabric part 571 is shown in greater detail in Fig. 6
Point.First switching fabric part 571 includes interface card 510, its first order and third level phase with the first switching fabric part 571
Association;Interface card 516, it is associated with the second level of the first switching fabric part 571;And midplane 550.In some implementations
The first switching fabric part 571 includes 8 interface cards 510 in example, and it is related to the first order of the first switching fabric and the third level
Connection, and 8 interface cards 516, it is associated with the second level of the first switching fabric.In other embodiments, can use with
The different numbers for the interface card that the first switching fabric first order and the third level are associated and/or with the first switching fabric second level phase
The different numbers of the interface card of association.
As shown in Fig. 6, each input/output module 502 is operatively coupled to an interface card 510 via one cable of the first cable set 540. In some embodiments, for example, each of the 8 interface cards 510 is operatively coupled to 16 input/output modules 502, as described in further detail herein. Thus, the first switching fabric portion 571 can be coupled to 128 input/output modules (16 × 8 = 128). Each of the 128 input/output modules 502 can send data to and receive data from the first switching fabric portion 571.
Each interface card 510 is connected to each interface card 516 via the midplane 550. Thus, each interface card 510 can send data to and receive data from each interface card 516, as described in further detail herein. Using the midplane 550 to connect the interface cards 510 to the interface cards 516 reduces the number of cables needed to connect the stages of the first switching fabric portion 571.
Fig. 7 shows in greater detail a first interface card 510', the midplane 550, and a first interface card 516'. The interface card 510' is associated with the first stage and the third stage of the first switching fabric portion 571, and the interface card 516' is associated with the second stage of the first switching fabric portion 571. Each interface card 510 is structurally and functionally similar to the first interface card 510'. Similarly, each interface card 516 is structurally and functionally similar to the first interface card 516'.
The first interface card 510' includes multiple cable connector ports 560, a first module system 512, a second module system 514, and multiple midplane connector ports 562. For example, Fig. 7 shows the first interface card 510' with 16 cable connector ports 560 and 8 midplane connector ports 562. Each cable connector port 560 of the first interface card 510' is configured to receive the second end of a cable from the first cable set 540. Thus, as described above, the 16 cable connector ports 560 on each of the 8 interface cards 510 are used to receive 128 cables (16 × 8 = 128). Although Fig. 7 shows 16 cable connector ports 560, in other embodiments any number of cable connector ports can be used, such that each cable of the first cable set can be received by a cable connector port in the first switching fabric. For example, if 16 interface cards are used, each interface card can include 8 cable connector ports.
The first module system 512 and the second module system 514 of the first interface card 510' each include modules of the first stage of the first switching fabric portion 571 and modules of the third stage of the first switching fabric portion 571. In some embodiments, 8 of the 16 cable connector ports 560 are operatively coupled to the first module system 512, and the remaining 8 of the 16 cable connector ports 560 are operatively coupled to the second module system 514. The first module system 512 and the second module system 514 can each be operatively coupled to each of the 8 midplane connector ports 562 of the interface card 510'.
The first module system 512 and the second module system 514 of the first interface card 510' are ASICs. The first module system 512 and the second module system 514 are instances of the same ASIC. Manufacturing costs can thus be reduced, because multiple instances of a single ASIC can be produced. In addition, both the first-stage modules and the third-stage modules of the first switching fabric portion 571 are included on each ASIC.
In some embodiments, each of the 8 midplane connector ports 562 has twice the data capacity of each of the 16 cable connector ports 560. Thus, rather than having 8 data transmit connections and 8 data receive connections, each of the 8 midplane connector ports 562 has 16 data transmit connections and 16 data receive connections. As a result, the bandwidth of the 8 midplane connector ports 562 is equal to the bandwidth of the 16 cable connector ports 560. In other embodiments, each midplane connector port has 32 data transmit connections and 32 data receive connections. In such embodiments, each cable connector port has 16 data transmit connections and 16 data receive connections.
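The bandwidth balance described above is simple arithmetic: ports multiplied by connections per port must match on both sides of the interface card. The helper below assumes, as the paragraph implies but does not state, that every data connection runs at the same rate; the function name is illustrative.

```python
def aggregate_connections(ports: int, connections_per_port: int) -> int:
    """Total data transmit (or receive) connections across a group of ports."""
    return ports * connections_per_port


# First embodiment of interface card 510'
cable_side = aggregate_connections(ports=16, connections_per_port=8)
midplane_side = aggregate_connections(ports=8, connections_per_port=16)
print(cable_side, midplane_side)  # 128 128

# Alternative embodiment: 16 connections per cable port, 32 per midplane port
print(aggregate_connections(16, 16) == aggregate_connections(8, 32))  # True
```

Under the equal-rate assumption, halving the port count while doubling the connections per port keeps the card from becoming a bottleneck between the cable side and the midplane side.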
The 8 midplane connector ports 562 of the first interface card 510' are connected to the midplane 550. The midplane 550 is configured to connect each interface card 510 associated with the first and third stages of the first switching fabric portion 571 to each interface card 516 associated with the second stage of the first switching fabric portion 571. Thus, the midplane 550 ensures that each midplane connector port 562 of each interface card 510 is connected to a midplane connector port 580 of a different interface card 516. In other words, no two midplane connector ports of the same interface card 510 are operatively coupled to the same interface card 516. The midplane 550 therefore allows each interface card 510 to send data to and receive data from any of the 8 interface cards 516.
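The wiring property of the midplane 550 (each port of a card 510 lands on a different card 516) can be checked mechanically. The wiring rule used here, port j of card 510 #i joined to port i of card 516 #j, is a hypothetical example of a full mesh; the text specifies the property, not the particular rule.

```python
from itertools import product


def build_midplane(num_510=8, num_516=8):
    """Map (card_510, port_562) -> (card_516, port_580).

    Hypothetical wiring rule: midplane port j of interface card 510 #i is
    joined to midplane port i of interface card 516 #j.
    """
    return {(i, j): (j, i) for i, j in product(range(num_510), range(num_516))}


def is_full_mesh(links, num_510=8, num_516=8):
    """Check the property stated for the midplane 550."""
    for i in range(num_510):
        reached = sorted(links[(i, j)][0] for j in range(num_516))
        # No two ports of one card 510 may land on the same card 516.
        if reached != list(range(num_516)):
            return False
    # Every port 580 on the card-516 side is used exactly once.
    return len(set(links.values())) == num_510 * num_516


links = build_midplane()
print(links[(2, 5)])        # (5, 2)
print(is_full_mesh(links))  # True
```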
Although Fig. 7 shows a schematic diagram of the first interface card 510', the midplane 550, and the first interface card 516', in some embodiments the interface cards 510, the midplane 550, and the interface cards 516 are physically positioned in a manner similar to the horizontally positioned interface cards 620, the midplane 640, and the vertically positioned interface cards 630, respectively, as shown in Figs. 8-10 and described in further detail herein. Thus, the modules associated with the first stage and the modules associated with the third stage (on the interface cards 510) are located on one side of the midplane 550, and the modules associated with the second stage (on the interface cards 516) are located on the opposite side of the midplane 550. Such a topology allows each module associated with the first stage to be operatively coupled to each module associated with the second stage, and each module associated with the second stage to be operatively coupled to each module associated with the third stage.
The first interface card 516' includes multiple midplane connector ports 580, a first module system 518, and a second module system 519. The midplane connector ports 580 are configured to send data to and receive data from any interface card 510 via the midplane 550. In some embodiments, the first interface card 516' includes 8 midplane connector ports 580.
The first module system 518 and the second module system 519 of the first interface card 516' are operatively coupled to each midplane connector port 580 of the first interface card 516'. Thus, through the midplane 550, each module system 512, 514 associated with the first and third stages of the first switching fabric portion 571 is operatively coupled to each module system 518, 519 associated with the second stage of the first switching fabric portion 571. In other words, each module system 512, 514 associated with the first and third stages of the first switching fabric portion 571 can send data to and receive data from each module system 518, 519 associated with its second stage, and vice versa. In particular, a first-stage module in module system 512 or 514 can send data to a second-stage module in module system 518 or 519. Similarly, a second-stage module in module system 518 or 519 can send data to a third-stage module in module system 512 or 514. In other embodiments, a third-stage module can send data and/or control signals to a second-stage module, and a second-stage module can send data and/or control signals to a first-stage module.
In embodiments in which each first-stage module of the first switching fabric portion 571 has 8 inputs (i.e., with two such modules on each interface card 510), the second stage of the first switching fabric portion 571 has at least 8 modules so that the first switching fabric portion 571 remains rearrangeably non-blocking. Thus, the second stage of the first switching fabric portion 571 has at least 8 modules and is rearrangeably non-blocking. In some embodiments, twice that number of second-stage modules is used to facilitate expanding the switching fabric system 500 from a three-stage switching fabric to a five-stage switching fabric, as described in further detail herein. In such a five-stage switching fabric, the second stage supports twice the switching throughput of the second stage in the three-stage switching fabric of the switching fabric system 500. For example, in some embodiments, 16 second-stage modules can be used to facilitate a future expansion of the switching fabric system 500 from a three-stage switching fabric to a five-stage switching fabric.
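The "at least 8 second-stage modules" requirement matches the standard result for a symmetric three-stage Clos network with n inputs per first-stage module and m second-stage modules: it is rearrangeably non-blocking when m ≥ n (Slepian-Duguid theorem). The sketch below also checks Clos's stricter condition for strictly non-blocking operation, m ≥ 2n - 1; that second condition is standard background and is not a claim made in this description. The function name is illustrative.

```python
def clos_properties(n: int, m: int) -> dict:
    """Classify a symmetric three-stage Clos network.

    n: number of inputs on each first-stage module
    m: number of second-stage modules (one link from each first-stage module)
    """
    return {
        # Slepian-Duguid: rearrangeably non-blocking iff m >= n
        "rearrangeably_nonblocking": m >= n,
        # Clos (1953): strictly non-blocking iff m >= 2n - 1
        "strictly_nonblocking": m >= 2 * n - 1,
    }


print(clos_properties(n=8, m=8))
# {'rearrangeably_nonblocking': True, 'strictly_nonblocking': False}
print(clos_properties(n=8, m=16))
# {'rearrangeably_nonblocking': True, 'strictly_nonblocking': True}
```

Under these standard results, the 16-module option would also meet the strict condition for the three-stage fabric (16 ≥ 15), although the description motivates that option by five-stage expansion rather than by this property.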
The first module system 518 and the second module system 519 of the first interface card 516' are ASICs. The first module system 518 and the second module system 519 are instances of the same ASIC. In addition, in some embodiments, the first module system 518 and the second module system 519 associated with the second stage of the first switching fabric portion 571 are instances of the same ASIC used for the first module system 512 and the second module system 514 of the first interface card 510', which are associated with the first and third stages of the first switching fabric portion 571. Manufacturing costs can thus be reduced, because multiple instances of a single ASIC can be used for every module system of the first switching fabric portion 571.
In use, data is sent from a first input/output module 502 to a second input/output module 502 via the first switching fabric portion 571. The first input/output module 502 sends the data to the first switching fabric portion 571 via a cable of the first cable set 540. The data passes through a cable connector port 560 of one of the interface cards 510 and is sent to a first-stage module in module system 512 or 514.
The first-stage module in module system 512 or 514 sends the data through a midplane connector port 562 of the interface card 510 and across the midplane 550 to one of the interface cards 516, thereby forwarding the data to a second-stage module in module system 518 or 519. The data enters the interface card 516 through a midplane connector port 580 of that interface card, and is then sent to the second-stage module in module system 518 or 519.
The second-stage module determines how the second input/output module 502 is connected and redirects the data back to an interface card 510 via the midplane 550. Because each module system 518 or 519 is operatively coupled to each module system 512 and 514 on the interface cards 510, the second-stage module in module system 518 or 519 can determine which third-stage module in module system 512 or 514 is operatively coupled to the second input/output module, and send the data accordingly.
The data is sent to the third-stage module in a module system 512, 514 on an interface card 510. The third-stage module then sends the data through a cable connector port 560 and a cable of the first cable set 540 to the second input/output module 502.
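The three-hop path just described can be traced with a small model. The port numbering in `attach` (input/output module k on cable port k % 16 of card 510 #(k // 16), with the lower 8 ports on module system 512) is a hypothetical convention consistent with the 16-port, 8/8 split stated earlier; picking the second-stage module at random is likewise an assumption of the sketch.

```python
import random
from dataclasses import dataclass

PORTS_PER_CARD = 16    # cable connector ports 560 on each interface card 510
PORTS_PER_SYSTEM = 8   # 8 ports go to module system 512, 8 to module system 514
NUM_516_CARDS = 8      # second-stage interface cards 516


@dataclass(frozen=True)
class Hop:
    stage: int
    card: str
    module_system: str


def attach(io_index):
    """Hypothetical numbering: I/O module k uses port k % 16 of card 510 #(k // 16)."""
    card = io_index // PORTS_PER_CARD
    system = "512" if io_index % PORTS_PER_CARD < PORTS_PER_SYSTEM else "514"
    return f"510#{card}", system


def route(src_io, dst_io, rng=random):
    """Stage 1 (ingress card 510) -> stage 2 (any card 516) -> stage 3 (egress card 510)."""
    in_card, in_system = attach(src_io)
    out_card, out_system = attach(dst_io)
    # Every second-stage module system reaches every third-stage module system,
    # so this sketch simply picks one at random.
    mid_card = f"516#{rng.randrange(NUM_516_CARDS)}"
    mid_system = rng.choice(["518", "519"])
    return [
        Hop(1, in_card, in_system),
        Hop(2, mid_card, mid_system),
        Hop(3, out_card, out_system),
    ]


for hop in route(src_io=3, dst_io=120):
    print(hop)
```

Because stage 1 and stage 3 share the same interface cards 510, the ingress and egress hops are on the same side of the midplane, and only the middle hop crosses it twice.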
In other embodiments, instead of sending the data to a single second-stage module, the first-stage module divides the data into separate portions (e.g., cells) and forwards a portion of the data to each second-stage module to which it is operatively coupled (i.e., in such an embodiment, each second-stage module receives a portion of the data). Each second-stage module then determines how the second input/output module is connected and redirects its portion of the data back to a single third-stage module. The third-stage module then reassembles the received portions of the data and sends the data to the second input/output module.
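The split-and-reassemble variant above can be sketched as follows. Round-robin distribution and per-cell sequence numbers are assumptions of this sketch (the text only says that each second-stage module receives a portion and that the third-stage module reassembles them); the shuffle stands in for cells arriving through different second-stage modules in no guaranteed order.

```python
import random


def spray(data, num_second_stage, cell_size):
    """First-stage module: cut data into cells and spread them round-robin.

    Tagging each cell with a sequence number is an assumption of this
    sketch; it lets the third-stage module restore the original order.
    """
    cells = [(seq, data[i:i + cell_size])
             for seq, i in enumerate(range(0, len(data), cell_size))]
    per_module = {m: [] for m in range(num_second_stage)}
    for seq, chunk in cells:
        per_module[seq % num_second_stage].append((seq, chunk))
    return per_module


def reassemble(received):
    """Third-stage module: rebuild the data from cells in sequence order."""
    return b"".join(chunk for _, chunk in sorted(received))


data = b"abcdefghijklmnopqrstuvwxyz"
per_module = spray(data, num_second_stage=8, cell_size=3)
# Cells reach the third stage through different second-stage modules, so
# their arrival order is not guaranteed; shuffle to mimic that.
arrivals = [cell for cells in per_module.values() for cell in cells]
random.shuffle(arrivals)
print(reassemble(arrivals) == data)  # True
```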
Figs. 8-10 show a housing 600 (i.e., a rack) for accommodating a switching fabric (such as the first switching fabric portion 571 described above) according to one embodiment. The housing 600 includes an enclosure 610, a midplane 640, horizontally positioned interface cards 620, and vertically positioned interface cards 630. Fig. 8 shows a front view of the enclosure 610, in which the 8 horizontally positioned interface cards 620 deployed in the enclosure 610 can be seen. Fig. 9 shows a rear view of the enclosure 610, in which the 8 vertically positioned interface cards 630 deployed in the enclosure 610 can be seen.
Each horizontally positioned interface card 620 is operatively coupled to each vertically positioned interface card 630 via the midplane 640 (see Fig. 10). The midplane 640 includes a front surface 642, a rear surface 644, and an array of receptacles 650 connecting the front surface 642 and the rear surface 644, as described below. As shown in Fig. 10, the horizontally positioned interface cards 620 include multiple midplane connector ports 622 connected to receptacles on the front surface 642 of the midplane 640. Similarly, the vertically positioned interface cards 630 include multiple midplane connectors 632 connected to receptacles on the rear surface 644 of the midplane 640. In this way, the plane defined by each horizontally positioned interface card 620 intersects the plane defined by each vertically positioned interface card 630.
The receptacles 650 of the midplane 640 operatively couple each horizontally positioned interface card 620 to each vertically positioned interface card 630, and facilitate the transmission of signals between the horizontally positioned interface cards 620 and the vertically positioned interface cards 630. In some embodiments, for example, the receptacles 650 can be multi-pin connectors configured to receive multi-pin connectors disposed on the midplane connector ports 622, 632 of the interface cards 620, 630; hollow tubes that allow the horizontally positioned interface cards 620 to be connected directly to the vertically positioned interface cards 630; and/or any other devices configured to operatively couple two interface cards. Using such a midplane 640, each horizontally positioned interface card 620 is operatively coupled to each vertically positioned interface card 630 without routed connections (e.g., electrical traces) on the midplane.
Fig. 10 shows a midplane including 64 receptacles 650 arranged in an 8 × 8 array. In such an embodiment, 8 horizontally positioned interface cards 620 can be operatively coupled to 8 vertically positioned interface cards 630. In other embodiments, any number of receptacles can be included in the midplane, and/or any number of horizontally positioned interface cards can be coupled to any number of vertically positioned interface cards via the midplane.
If the first switching fabric portion 571 is housed in the housing 600, for example, each interface card 510 associated with the first and third stages of the first switching fabric portion 571 can be horizontally positioned, and each interface card 516 associated with the second stage of the first switching fabric portion 571 can be vertically positioned. Each interface card 510 associated with the first and third stages can thus be easily connected through the midplane 640 to each interface card 516 associated with the second stage. In other embodiments, each interface card associated with the first and third stages of the first switching fabric portion is vertically positioned, and each interface card associated with its second stage is horizontally positioned. In still other embodiments, each interface card associated with the first and third stages of the first switching fabric can be positioned at any angle relative to the housing, and each interface card associated with its second stage can be positioned, relative to the housing, at an angle orthogonal to the interface cards associated with the first and third stages.
Figs. 11 and 12 are schematic diagrams showing a switching fabric 1100 in a first configuration and a second configuration, respectively, according to one embodiment. The switching fabric 1100 includes multiple switching fabric systems 1108.
Each switching fabric system 1108 includes multiple input/output modules 1102, a first cable set 1140, a second cable set 1142, a first switching fabric portion 1171 deployed in a housing 1170, and a second switching fabric portion 1173 deployed in a housing 1172. The switching fabric systems 1108 are structurally and functionally similar to one another. In addition, the input/output modules 1102, the first cable set 1140, and the second cable set 1142 are structurally and functionally similar to the input/output modules 202, the first cable set 240, and the second cable set 242, respectively.
When the switching fabric 1100 is in the first configuration, the first switching fabric portion 1171 and the second switching fabric portion 1173 of each switching fabric system 1108 are functionally similar to the first switching fabric portion 571 and the second switching fabric portion 573 described above. Thus, in the first configuration, the first switching fabric portion 1171 and the second switching fabric portion 1173 each operate as a standalone three-stage switching fabric. Accordingly, in the first configuration, each switching fabric system 1108 operates as a standalone switching fabric system and is not operatively coupled to the other switching fabric systems 1108.
In the second configuration (Fig. 12), the switching fabric 1100 further includes a third cable set 1144 and multiple connection switching fabrics 1191, each deployed in a housing 1190. The housing 1190 can be similar to the housing 600 described in detail above. Each switching fabric portion 1171, 1173 of each switching fabric system 1108 is operatively coupled to each connection switching fabric 1191 via the third cable set 1144. Thus, when the switching fabric 1100 is in the second configuration, each switching fabric system 1108 is operatively coupled to the other switching fabric systems 1108 via the connection switching fabrics 1191. Accordingly, the switching fabric 1100 in the second configuration is a five-stage Clos network.
The third cable set 1144 can be constructed of any material suitable for transferring data between the switching fabric portions 1171, 1173 and the connection switching fabrics 1191. In some embodiments, for example, each cable 1144 is constructed of multiple optical fibers. In such embodiments, each cable 1144 can have 36 transmit fibers and 36 receive fibers. The 36 transmit fibers of each cable 1144 can include 32 fibers for transmitting data and 4 fibers for expanding data capacity and/or for redundancy. Similarly, the 36 receive fibers of each cable 1144 include 32 fibers for data and 4 fibers for expanding data capacity and/or for redundancy. In other embodiments, any number of optical fibers can be included in each cable. Using cables with a larger number of fibers can effectively reduce the number of cables used.
As discussed above, flow control can be performed within a switching fabric, such as a switching fabric of a data center. Figs. 13 and 14, together with the accompanying description, illustrate flow control within a switching fabric. In particular, Fig. 13 is a schematic diagram showing data traffic associated with a switching fabric 1300, according to one embodiment. The switching fabric 1300 shown in Fig. 13 is similar to the switching fabric 400 shown in Fig. 4A, and can be implemented in a data center such as the data center 100 shown in Fig. 1. In this embodiment, the switching fabric 1300 is a three-stage non-blocking Clos network and includes a first stage 1340, a second stage 1342, and a third stage 1344. The first stage 1340 includes modules 1312, the second stage 1342 includes modules 1314, and the third stage 1344 includes modules 1316. In some embodiments, the switching fabric 1300 can be a cell-switched switching fabric, and each module 1312 of the first stage 1340 can be a cell switch. Each module 1312 of the first stage 1340 includes a set of input ports 1360 configured to receive data as it enters the switching fabric 1300. Each module 1316 of the third stage 1344 includes a set of output ports 1362 configured to allow data to exit the switching fabric 1300. Each module 1316 of the third stage 1344 includes the same number of output ports 1362.
Each module 1314 of the second stage 1342 is operatively coupled to each module 1312 of the first stage 1340 by a unidirectional data path 1320. Each unidirectional data path 1320 between a module 1312 of the first stage 1340 and a module 1314 of the second stage 1342 is configured to facilitate the transfer of data from the module 1312 of the first stage 1340 to the module 1314 of the second stage 1342. Because the data paths 1320 are unidirectional, they do not facilitate the transfer of data from the modules 1314 of the second stage 1342 to the modules 1312 of the first stage 1340. Such unidirectional data paths 1320 cost less than comparable bidirectional data paths and are easier to implement, because they use fewer data connections.
Each module 1316 of the third stage 1344 is operably coupled to each module 1314 of the second stage 1342 by unidirectional data paths 1324. Each unidirectional data path 1324 between a module 1314 of the second stage 1342 and a module 1316 of the third stage 1344 is configured to facilitate the transfer of data from the module 1314 of the second stage 1342 to the module 1316 of the third stage 1344. Because data paths 1324 are unidirectional, they do not facilitate the transfer of data from the modules 1316 of the third stage 1344 to the modules 1314 of the second stage 1342. As described above, such unidirectional data paths 1324 cost less than comparable bidirectional data paths and use less area.
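The stage-to-stage connectivity described above can be pictured as a set of directed links. The following is a minimal Python illustration, not part of the disclosure: the module counts and labels such as "S1-0" are hypothetical. It shows that every first-stage module reaches every second-stage module, every second-stage module reaches every third-stage module, and no link points back toward an earlier stage.

```python
from itertools import product


def build_three_stage_links(n_first, n_second, n_third):
    """Return the module labels of each stage and the set of one-way (src, dst) links."""
    first = [f"S1-{i}" for i in range(n_first)]
    second = [f"S2-{j}" for j in range(n_second)]
    third = [f"S3-{k}" for k in range(n_third)]
    # Full mesh stage 1 -> stage 2 (like paths 1320) and stage 2 -> stage 3 (like paths 1324).
    links = set(product(first, second)) | set(product(second, third))
    return first, second, third, links


first, second, third, links = build_three_stage_links(4, 4, 4)
print(len(links))  # 32
# No link runs backwards toward an earlier stage.
print(any((dst, src) in links for src, dst in links))  # False
```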
The unidirectional data paths 1320 between the modules 1312 of the first stage 1340 and the modules 1314 of the second stage 1342, and/or the unidirectional data paths 1324 between the modules 1314 of the second stage 1342 and the modules 1316 of the third stage 1344, can be constructed in any manner configured to effectively facilitate data transfer. In certain embodiments, for example, the data paths are optical connectors between the modules. In other embodiments, the data paths are within a midplane connector. Such a midplane connector can be similar to the midplane connectors described with respect to Figures 8 to 10, and can be used to efficiently connect each module of the second stage to each module of the third stage. In still other embodiments, the modules are contained within a single chip package and the unidirectional data paths are electrical traces.
Each module 1312 of the first stage 1340 is physically near a corresponding module 1316 of the third stage 1344. In other words, each module 1312 of the first stage 1340 is paired with a module 1316 of the third stage 1344. For example, in certain embodiments, each module 1312 of the first stage 1340 is in the same chip package as its paired module 1316 of the third stage 1344. A bidirectional flow control path 1322 exists between each module 1312 of the first stage 1340 and the corresponding module 1316 of the third stage 1344. Flow control path 1322 allows the module 1312 of the first stage 1340 to send flow control indicators to the corresponding module 1316 of the third stage 1344, and vice versa. As described in further detail herein, this allows a module in any stage of the switching fabric to send flow control indicators to the modules that send data to it. In certain embodiments, bidirectional flow control path 1322 is constructed from two separate unidirectional flow control paths, which together allow flow control indicators to pass in both directions between the module 1312 of the first stage 1340 and the module 1316 of the third stage 1344.
Figure 14 is a schematic diagram illustrating flow control within switching fabric 1300 of Figure 13, according to an embodiment. In particular, Figure 14 shows a detailed view of the first row 1310 of switching fabric 1300 shown in Figure 13. The first row includes module 1312' of the first stage 1340, module 1314' of the second stage 1342, and module 1316' of the third stage 1344. Module 1312' of the first stage 1340 includes a processor 1330 and a memory 1332. Processor 1330 is configured to control the receiving and sending of data. Memory 1332 is configured to buffer data when module 1314' of the second stage 1342 cannot receive data and/or when module 1312' of the first stage 1340 cannot yet send data. In certain embodiments, for example, if module 1314' of the second stage 1342 has sent a stop indicator to module 1312' of the first stage 1340, module 1312' buffers data until module 1314' of the second stage 1342 can receive data. Similarly, in certain embodiments, module 1312' of the first stage 1340 can buffer data when it receives multiple data signals substantially simultaneously (e.g., from multiple input ports). In such embodiments, if only a single data signal can be output by module 1312' in a given time period (e.g., each clock cycle), the other received data signals can be buffered. Like module 1312' of the first stage 1340, each module in switching fabric 1300 includes a processor and a memory.
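The buffering behaviour of module 1312' can be illustrated with a small model. In this sketch, the class name, the one-cell-per-clock-cycle output limit and the boolean stop flag are illustrative assumptions rather than details of the embodiment; memory 1332 is modelled as a simple FIFO.

```python
from collections import deque


class FirstStageModule:
    """Hypothetical model of module 1312': sends at most one cell per clock
    cycle and buffers everything else in its memory."""

    def __init__(self):
        self.buffer = deque()   # stands in for memory 1332
        self.stopped = False    # set while a stop indicator from the next stage is in effect

    def tick(self, arriving):
        """Accept the cells arriving this cycle; return the cell sent, or None."""
        self.buffer.extend(arriving)
        if self.stopped or not self.buffer:
            return None
        return self.buffer.popleft()


m = FirstStageModule()
print(m.tick(["a", "b", "c"]))  # a   ('b' and 'c' stay buffered)
m.stopped = True
print(m.tick(["d"]))            # None (stop indicator received, all cells held)
m.stopped = False
print(m.tick([]))               # b
```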
Module 1312' of the first stage 1340 and its paired module 1316' of the third stage 1344 are both included on a first chip package 1326. This allows flow control path 1322 between module 1312' of the first stage 1340 and module 1316' of the third stage 1344 to be built easily. For example, flow control path 1322 can be a trace on the first chip package 1326 between module 1312' of the first stage 1340 and module 1316' of the third stage. In other embodiments, the module of the first stage and the module of the third stage are in separate chip packages but are close enough to each other that a flow control path between them can still be established without extensive wiring and/or long traces.
Module 1314' of the second stage 1342 is included on a second chip package 1328. The unidirectional data path 1320 between module 1312' of the first stage 1340 and module 1314' of the second stage 1342, and the unidirectional data path 1324 between module 1314' of the second stage 1342 and module 1316' of the third stage 1344, operatively connect the first chip package 1326 to the second chip package 1328. Although not shown in Figure 14, module 1312' of the first stage 1340 and module 1316' of the third stage 1344 are also connected to each of the other modules of the second stage by unidirectional data paths. As described above, the unidirectional data paths can be constructed in any manner configured to effectively facilitate data transfer between modules.
Flow control path 1322 and unidirectional data paths 1320, 1324 can be used to send flow control indicators among modules 1312', 1314' and 1316'. For example, if module 1312' of the first stage 1340 is sending data to module 1314' of the second stage 1342 and the amount of data in the buffer of module 1314' exceeds a threshold, module 1314' of the second stage 1342 can send a flow control indicator to module 1316' of the third stage 1344 via the unidirectional data path 1324 between them. This flow control indicator triggers module 1316' of the third stage 1344 to send a flow control indicator to module 1312' of the first stage 1340 via flow control path 1322. The flow control indicator sent from module 1316' of the third stage 1344 to module 1312' of the first stage 1340 triggers module 1312' to stop sending data to module 1314' of the second stage 1342. Similarly, a flow control indicator sent from module 1314' of the second stage 1342 to module 1312' of the first stage 1340, via module 1316' of the third stage 1344, can request that module 1312' of the first stage 1340 send data to module 1314' of the second stage 1342 (i.e., resume sending data).
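The stop/resume exchange among modules 1312', 1314' and 1316' can be sketched as a toy Python model. The class names, the "stop"/"resume" indicator strings and the 8-cell buffer threshold are hypothetical. The point it illustrates is that the second-stage module never signals the first-stage module directly: its indicator travels over data path 1324 to the paired third-stage module, which relays it over flow control path 1322.

```python
class FirstStage:
    """Stands in for module 1312'."""

    def __init__(self):
        self.may_send = True

    def on_flow_control(self, indicator):
        # Stop or resume sending data to the second stage.
        self.may_send = indicator == "resume"


class ThirdStage:
    """Stands in for module 1316', paired with a first-stage module."""

    def __init__(self, paired_first):
        self.paired_first = paired_first  # reachable over flow control path 1322

    def on_flow_control(self, indicator):
        # Relay the indicator to the paired first-stage module.
        self.paired_first.on_flow_control(indicator)


class SecondStage:
    """Stands in for module 1314'."""

    def __init__(self, threshold, downstream_third):
        self.threshold = threshold
        self.queued = 0
        self.third = downstream_third     # reachable over data path 1324
        self.stopped_upstream = False

    def receive(self, n_cells):
        self.queued += n_cells
        if self.queued > self.threshold and not self.stopped_upstream:
            self.third.on_flow_control("stop")
            self.stopped_upstream = True

    def drain(self, n_cells):
        self.queued = max(0, self.queued - n_cells)
        if self.queued <= self.threshold and self.stopped_upstream:
            self.third.on_flow_control("resume")
            self.stopped_upstream = False


first = FirstStage()
second = SecondStage(threshold=8, downstream_third=ThirdStage(first))
second.receive(10)
print(first.may_send)  # False
second.drain(5)
print(first.may_send)  # True
```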
Placing two stages of the switching fabric in the same chip package, with a bidirectional flow control path between them on the chip, minimizes the connections needed between separate chip packages, which are bulky and/or require a large volume. In addition, because the two stages in the same package have a bidirectional flow control path between them, the data paths between chip packages can be unidirectional while flow control communication between sending and receiving modules is still provided. More details related to bidirectional flow control paths in switching fabrics are described in co-pending U.S. Patent Application No. 12/345490, entitled "Flow Control in a Switch Fabric" and filed on December 29, 2008, which is incorporated herein by reference in its entirety.
As described with reference to Figures 13 and 14, a module in a stage of the switching fabric can include a buffer module. More details related to buffer modules that can be included in, for example, a stage of a switching fabric are described with reference to Figure 15.
Figure 15 is a schematic diagram illustrating a buffer module 1500, according to an embodiment. As shown in Figure 15, data signals S0 to SM are received at buffer module 1500 on an input side 1580 of buffer module 1500 (e.g., via input ports 1562 of buffer module 1500). After being processed at buffer module 1500, data signals S0 to SM are sent from buffer module 1500 on an output side 1585 of buffer module 1500 (e.g., via output ports 1564 of buffer module 1500). Each of data signals S0 to SM can define a channel (also referred to as a data channel). Data signals S0 to SM can be collectively referred to as data signals 1560. Although input side 1580 and output side 1585 are shown on different physical sides of buffer module 1500, they are defined logically and do not preclude various physical configurations of buffer module 1500. For example, one or more input ports 1562 and/or one or more output ports 1564 can be physically located on any side (and/or the same side) of buffer module 1500.
Buffer module 1500 can be configured to process data signals 1560 such that the processing latency of data signals 1560 through buffer module 1500 is relatively small and substantially constant. Thus, the bit rate of data signals 1560 can remain substantially unchanged as they are processed through buffer module 1500. For example, the processing latency of data signal S2 through buffer module 1500 can be a substantially constant number of clock cycles (e.g., a single clock cycle or several clock cycles). Accordingly, data signal S2 can be offset in time by a number of clock cycles, and the bit rate of data signal S2 sent to input side 1580 of buffer module 1500 can be substantially the same as the bit rate of data signal S2 sent from output side 1585 of buffer module 1500.
Buffer module 1500 can be configured to modify the bit rate of one or more data signals 1560 in response to one or more portions of a flow control signal 1570. For example, buffer module 1500 can be configured to delay data signal S2 received at buffer module 1500 in response to a portion of flow control signal 1570 indicating that data signal S2 should be delayed for a specified period of time. In particular, buffer module 1500 can be configured to store (e.g., hold) one or more portions of data signal S2 until buffer module 1500 receives an indicator (e.g., a portion of flow control signal 1570) that data signal S2 should no longer be delayed. As a result, the bit rate of data signal S2 sent to input side 1580 of buffer module 1500 can differ (e.g., differ substantially) from the bit rate of data signal S2 sent from output side 1585 of buffer module 1500.
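The two behaviours of a single channel such as S2 (constant latency in normal operation, holding while flow control signal 1570 requests a delay) can be combined in one toy model. The class, the tick-based clock and the boolean delay flag are hypothetical simplifications, not the internal structure of buffer module 1500.

```python
from collections import deque


class BufferChannel:
    """Hypothetical model of one data channel (e.g., S2) through a buffer module."""

    def __init__(self, latency):
        self.pipe = deque([None] * latency)  # fixed-latency stage, one slot per clock cycle
        self.held = deque()                  # storage used while the channel is delayed
        self.delayed = False

    def on_flow_control(self, delay):
        # A portion of the flow control signal says whether to delay this channel.
        self.delayed = delay

    def tick(self, word=None):
        """Advance one clock cycle; return the word emitted this cycle, or None."""
        self.pipe.append(word)
        ready = self.pipe.popleft()
        if ready is not None:
            self.held.append(ready)
        if self.delayed or not self.held:
            return None
        return self.held.popleft()


ch = BufferChannel(latency=1)
out = [ch.tick("a"), ch.tick("b")]
ch.on_flow_control(delay=True)
out += [ch.tick("c"), ch.tick()]
ch.on_flow_control(delay=False)
out += [ch.tick(), ch.tick()]
print(out)  # [None, 'a', None, None, 'b', 'c']
```

While no delay is in effect and nothing is held, each word leaves exactly `latency` ticks after it enters, so input and output rates match; during the delay, words accumulate in storage and nothing is emitted, which is what makes the output bit rate differ from the input bit rate.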
In certain embodiments, processing at buffer module 1500 can be performed on segments of, for example, variable-sized cells using memory banks. For example, in certain embodiments, the segments of a cell can be processed by different memory banks (e.g., static random access memory (SRAM) banks) included in buffer module 1500 during a distribution process. The memory banks can collectively define a shared memory buffer. In certain embodiments, the segments of a data signal can be assigned to the memory banks in a predetermined manner (e.g., in a predetermined pattern according to a predetermined algorithm) during the distribution process. For example, in certain embodiments, the leading segments of data signals 1560 can be processed at portions of buffer module 1500 (e.g., particular memory banks) that are different from the portions that process the trailing segments. In some embodiments, the segments of data signals 1560 can be processed in a particular order; for example, each segment of a data signal 1560 can be processed based on its respective position within a cell. After the segments of a cell have been processed through the shared memory buffer, they can be ordered and sent from buffer module 1500 during a reassembly process. In certain embodiments, for example, a read multiplexing module of buffer module 1500 can be configured to reassemble the segments associated with a data signal 1560 and send (e.g., transmit) the data signal 1560 from buffer module 1500. The reassembly process can be defined based on the predetermined method used to distribute the segments to the memory banks of buffer module 1500. For example, the read multiplexing module can be configured to first read the leading segments associated with a cell from the leading memory banks in a round-robin fashion (because the segments were written in a round-robin fashion), and then read the trailing segments associated with the cell from the trailing memory banks in a round-robin fashion. Consequently, few, if any, control signals need to be sent between the write multiplexing module and the read multiplexing module. More details related to segment processing (e.g., segment distribution and/or segment reassembly) are described in co-pending U.S. Patent Application No. 12/415517, entitled "Methods and Apparatus Related to Shared Memory Buffer for Variable-Sized Cells" and filed on March 31, 2009, which is incorporated herein by reference in its entirety.
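A round-robin distribution and reassembly scheme of the kind described above can be sketched with a handful of FIFO "banks". This is an illustrative Python sketch under stated assumptions: the 4-byte segment size, the split into two leading and three trailing banks, and carrying the segment count in the leading segment are all hypothetical choices, and Python deques stand in for SRAM banks. Because the read side replays the same round-robin order as the write side, it needs no per-segment addresses from the writer.

```python
from collections import deque

SEGMENT_SIZE = 4  # bytes per segment (illustrative)


class SharedMemoryBuffer:
    """Leading segments go to leading banks and trailing segments to trailing
    banks, each in round-robin order; the reader replays the same order."""

    def __init__(self, n_leading_banks=2, n_trailing_banks=3):
        self.leading = [deque() for _ in range(n_leading_banks)]
        self.trailing = [deque() for _ in range(n_trailing_banks)]
        self.w_lead = self.w_trail = 0  # write-side round-robin pointers
        self.r_lead = self.r_trail = 0  # read-side round-robin pointers

    def write_cell(self, cell):
        segs = [cell[i:i + SEGMENT_SIZE] for i in range(0, len(cell), SEGMENT_SIZE)] or [b""]
        # Assumption: the leading segment carries the segment count so the
        # reader knows how many trailing segments belong to this cell.
        self.leading[self.w_lead].append((len(segs), segs[0]))
        self.w_lead = (self.w_lead + 1) % len(self.leading)
        for seg in segs[1:]:
            self.trailing[self.w_trail].append(seg)
            self.w_trail = (self.w_trail + 1) % len(self.trailing)

    def read_cell(self):
        count, first_seg = self.leading[self.r_lead].popleft()
        self.r_lead = (self.r_lead + 1) % len(self.leading)
        parts = [first_seg]
        for _ in range(count - 1):
            parts.append(self.trailing[self.r_trail].popleft())
            self.r_trail = (self.r_trail + 1) % len(self.trailing)
        return b"".join(parts)


buf = SharedMemoryBuffer()
cells = [b"hello world", b"abc", b"0123456789abcdef"]
for c in cells:
    buf.write_cell(c)
print([buf.read_cell() for _ in cells] == cells)  # True
```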
Figure 16 A are, according to one embodiment, to be configured as the coordinating transmissions of switching fabric 1600 via exchcange core 1690
The entrance scheduler module 1620 of cell group and the schematic block diagram of outlet scheduler module 1630.Coordinate to include for example via exchange
The scheduled transmission cell group of structure 1600, tracking are related to request and/or response of transmission cell group etc..Entrance scheduler module 1620
The entrance side and outlet scheduler module 1630 of switching fabric 1600, which can be included in, can be included in going out for switching fabric 1600
Mouth side.Switching fabric 1600 can include entrance level 1602, intergrade 1604, and export-grade 1606.In certain embodiments, exchange
Structure 1600 can be based on Clos (clo this) network architecture (for example, clog-free Clos networks, proper clog-free Clos
Network, Benes (David Barnes) network) it is defined, and switching fabric 1600 can include datum plane and control plane.In some realities
Apply in example, switching fabric 1600 can be the core of data center's (not shown), it can include network or device interconnecting.
As shown in Figure 16 A, input rank IQ1 to IQK (being collectively referred to as entry queue 1610) can be located at switching fabric
1600 entrance side.Entry queue 1610 can be associated with the entrance level 1602 of switching fabric 1600.In certain embodiments, enter
Mouth queue 1610 can be included in line card (line card).In certain embodiments, entry queue 1610, which can be located at, exchanges knot
Outside structure 1600 and/or outside exchcange core 1690.Each entry queue 1610 can be FIFO (FIFO) type team
Row.Although to show, but in certain embodiments, each entry queue IQ1 to IQK can be with input/output end port (example
Such as, 10Gb/s ports) related (for example, unique related).In certain embodiments, each entry queue IQ1 to IQK can have
Enough sizes are to implement Congestion Control Solution, and such as request authorizes Congestion Control Solution.For example, input rank IQK-1 can have
There are enough sizes to hold cell (or cell group), until request authorizes congestion scheme for cell (or cell group) quilt
Perform.
As shown in Figure 16 A, output port P1 to PL (being collectively referred to as output port 1640) can be located at switching fabric 1600
Outlet side.Output port 1640 can be related to the output stage 1606 of switching fabric 1600.In certain embodiments, output end
Mouth 1640 can be referred to as destination port.
In certain embodiments, input queues 1610 can be included in one or more input line cards (not shown) located outside ingress stage 1602 of switching fabric 1600. In certain embodiments, output ports 1640 can be included in one or more output line cards (not shown) located outside egress stage 1606 of switching fabric 1600. In certain embodiments, one or more input queues 1610 and/or one or more output ports 1640 can be included in one or more stages (e.g., ingress stage 1602) of switching fabric 1600. In certain embodiments, input scheduling module 1620 can be included in one or more input line cards and/or output scheduling module 1630 can be included in one or more output line cards. In certain embodiments, each line card associated with switch core 1690 (e.g., an output line card or an input line card) can include one or more scheduling modules (e.g., an output scheduling module and/or an input scheduling module).
In certain embodiments, input queues 1610 and/or output ports 1640 can be included in one or more gateway devices (not shown) located between switching fabric 1600 and one or more peripheral processing devices (not shown). The one or more gateway devices, switching fabric 1600, and/or the peripheral processing devices can collectively define at least a portion of a data center (not shown). In certain embodiments, one or more of the gateway devices can be edge devices at an edge portion of switch core 1690. In certain embodiments, switching fabric 1600 and the peripheral processing devices can be configured to process data based on different protocols. For example, the peripheral processing devices can include one or more host devices (e.g., host devices configured to execute one or more virtual resources, or web servers) configured to communicate based on an Ethernet protocol, while switching fabric 1600 can be a cell-based fabric. In other words, the one or more gateway devices can give devices configured to communicate via one protocol access to switching fabric 1600, which can be configured to communicate via another protocol. In certain embodiments, the one or more gateway devices can be referred to as access switches or network devices. In certain embodiments, the one or more gateway devices can be configured to function as routers, network hub devices, and/or network bridge devices.
In this embodiment, for example, input scheduling module 1620 can be configured to define cell group GA queued at input queue IQ1 and cell group GC queued at input queue IQK-1. Cell group GA is queued at the front of input queue IQ1, and cell group GB is queued in input queue IQ1 behind cell group GA. Because input queue IQ1 is a FIFO queue, cell group GB cannot be sent via switching fabric 1600 until cell group GA has been sent from input queue IQ1. Cell group GC is queued at the front of input queue IQK-1.
In certain embodiments, portions of input queues 1610 can be mapped to (e.g., assigned to) one or more output ports 1640. For example, input queues IQ1 to IQK-1 can be mapped to output port P1, such that all of the cells 310 queued at input queues IQ1 to IQK-1 will be scheduled by input scheduling module 1620 for transmission via switching fabric 1600 to output port P1. Similarly, input queue IQK can be mapped to output port P2. The mapping can be stored in a memory (e.g., memory 1622) as, for example, a lookup table that input scheduling module 1620 can access when scheduling (e.g., requesting) the transmission of cell groups.
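The queue-to-port mapping can be pictured as a simple lookup table. In this hypothetical sketch, K = 5 is chosen only for illustration, so input queues IQ1 to IQ4 (that is, IQ1 to IQK-1) map to P1 and IQ5 (IQK) maps to P2; the dictionary stands in for the table held in memory 1622.

```python
# Hypothetical lookup table (standing in for memory 1622): input queue -> output port.
K = 5
QUEUE_TO_PORT = {f"IQ{i}": "P1" for i in range(1, K)}  # IQ1 .. IQK-1 -> P1
QUEUE_TO_PORT[f"IQ{K}"] = "P2"                          # IQK -> P2


def destination_port(queue_id):
    """Return the output port to which cells from `queue_id` are scheduled."""
    return QUEUE_TO_PORT[queue_id]


print(destination_port("IQ1"), destination_port("IQ4"), destination_port("IQ5"))  # P1 P1 P2
```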
In certain embodiments, one or more input queues 1610 can be associated with a priority value (also referred to as a transmission priority value). Input scheduling module 1620 can be configured to schedule the transmission of cells from input queues 1610 based on the priority values. For example, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1, input scheduling module 1620 can be configured to request the transmission of cell group GC to output port P1 before requesting the transmission of cell group GA to output port P1. Priority values can be defined based on a class of service (e.g., a quality of service (QoS)). For example, in certain embodiments, different types of network traffic can be associated with different classes of service (and thus different priorities). For example, storage traffic (e.g., read and write traffic), inter-processor communication, media signaling, session-layer signaling, and so forth can each be associated with at least one class of service. In certain embodiments, the priority values can be based on, for example, the IEEE 802.1Qbb protocol, which defines a priority-based flow control strategy.
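Ordering requests by priority value can be sketched as a sort over the cell groups at the heads of the input queues. The priority numbers below are hypothetical (the embodiment states only that IQK-1 can have a higher priority value than IQ1), and treating a larger number as a higher priority is an assumption.

```python
# Hypothetical priority values: a larger number means a higher transmission priority.
PRIORITY = {"IQ1": 1, "IQK-1": 3}


def request_order(head_groups):
    """Return head-of-queue cell groups in the order their transmission should be requested.

    `head_groups` maps an input queue id to the cell group at the front of that queue.
    Queues with equal priority are ordered by queue id so the result is deterministic.
    """
    ranked = sorted(head_groups.items(), key=lambda item: (-PRIORITY[item[0]], item[0]))
    return [group for _queue, group in ranked]


print(request_order({"IQ1": "GA", "IQK-1": "GC"}))  # ['GC', 'GA']
```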
In certain embodiments, one or more input queues 1610 and/or one or more output ports 1640 can be paused. In certain embodiments, one or more input queues 1610 and/or one or more output ports 1640 can be paused so that cells are not lost. For example, if output port P1 is temporarily unavailable, the transmission of cells from input queue IQ1 and/or input queue IQK-1 can be paused so that cells are not lost at output port P1 because output port P1 is temporarily unavailable. In certain embodiments, one or more input queues 1610 can be associated with a priority value. For example, if output port P1 is congested, the transmission of cells from input queue IQ1 to output port P1 can be paused while the transmission of cells from input queue IQK-1 to output port P1 continues, because input queue IQK-1 can be associated with a higher priority value than input queue IQ1.
Input scheduling module 1620 can be configured to exchange signals with (e.g., send signals to and receive signals from) output scheduling module 1630 to coordinate the transmission of cell group GA via switching fabric 1600 to output port P1, and to coordinate the transmission of cell group GC via switching fabric 1600 to output port P1. Because cell group GA is to be sent to output port P1, output port P1 can be referred to as the destination port of cell group GA. Similarly, output port P1 can be referred to as the destination port of cell group GB. As shown in Figure 16A, cell group GA can be sent via transmission path 4112, which is different from transmission path 4114, via which cell group GC is sent.
Cell group GA and cell group GB are defined by input scheduling module 1620 based on cells 4110 queued at input queue IQ1. In particular, cell group GA can be defined based on each cell in input queue IQ1 that has a common destination port and a specific position in input queue IQ1. Similarly, cell group GC can be defined based on each cell in input queue IQK-1 that has a common destination port and a specific position in input queue IQK-1. Although not shown, in certain embodiments, cells 4110, for example, can include content (e.g., packets) received at switch core 1690 from one or more peripheral processing devices (e.g., personal computers, servers, routers, personal digital assistants (PDAs)) via one or more networks (e.g., a local area network (LAN), a wide area network (WAN), a virtual network) that can be wired and/or wireless. More details related to defining cell groups, such as cell group GA, cell group GB, and/or cell group GC, are discussed with reference to Figures 17 and 18.
Figure 16 B are to be shown to be related to the signaling process figure of the signaling of cell group GA transmission according to one embodiment.Such as Figure 16 B institutes
Show, the time increases in the downstream direction.After cell group GA has been defined (as shown in fig. 16), input scheduling module
1620 can be configured as transmission request with scheduling cells group GA to transmit via switching fabric 1600;The request is asked as transmission
22 displays.Transmission request 22 can be defined as the destination port to cell group GA, i.e. output port P1 sends cell group GA's
Request.In certain embodiments, cell group GA destination port, which can also be referred to as transmitting, asks 22 target (to be also known as mesh
Mark destination port).In certain embodiments, transmission request 22 can be included via specific transmission path (such as in Figure 16 A
Shown transmission path 4112) send cell group GA request by switching fabric 1600, or in special time.Input scheduling mould
Block 1620 can be configured as transmission request 22 input scheduling module 1620 be defined it is rear to output scheduling module
1630 send transmission request 22.
In certain embodiments, transmission request 22 can be queued on the ingress side of switching fabric 1600 before being sent to the egress side of switching fabric 1600. In certain embodiments, transmission request 22 can be queued until input scheduling module 1620 triggers the sending of transmission request 22 to the egress side of switching fabric 1600. In certain embodiments, input scheduling module 1620 can be configured to hold (or trigger the holding of) transmission request 22 in, for example, an input transmission request queue (not shown) because the volume of transmission requests being sent from the ingress side of switching fabric 1600 is above a threshold. The threshold can be defined based on the latency of transmission through switching fabric 1600.
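Holding transmission requests on the ingress side can be sketched as a cap on the number of outstanding requests. The cap of 2 and the request labels are hypothetical, and interpreting the threshold as a count of in-flight requests is an assumption; the embodiment states only that the threshold can be based on the transmission latency through switching fabric 1600.

```python
from collections import deque


class RequestSender:
    """Hypothetical: hold transmission requests on the ingress side while the
    number of requests already in flight is at or above `threshold`."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.in_flight = 0
        self.pending = deque()  # stands in for the input transmission request queue
        self.sent = []

    def submit(self, request):
        self.pending.append(request)
        self._pump()

    def on_response(self):
        # A response (grant or denial) retires one in-flight request.
        self.in_flight -= 1
        self._pump()

    def _pump(self):
        while self.pending and self.in_flight < self.threshold:
            self.sent.append(self.pending.popleft())
            self.in_flight += 1


s = RequestSender(threshold=2)
for r in ["req-GA", "req-GC", "req-X"]:
    s.submit(r)
print(s.sent, list(s.pending))  # ['req-GA', 'req-GC'] ['req-X']
s.on_response()
print(s.sent)                   # ['req-GA', 'req-GC', 'req-X']
```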
In certain embodiments, transmission request 22 can be queued at an output queue (not shown) on the egress side of switching fabric 1600. In certain embodiments, the output queue can be included in or outside switching fabric 1600, or in a line card (not shown) outside switch core 1690. Although not shown, in certain embodiments, transmission request 22 can be queued at an output queue, or a portion of an output queue, associated with a specific input queue (e.g., input queue IQ1). In certain embodiments, each output port 1640 can be associated with output queues that are associated with (e.g., correspond to) the priority values of input queues 1610. For example, output port P1 can be associated with an output queue (or portion of an output queue) associated with input queue IQ1 (which has a specific priority value) and with an output queue (or portion of an output queue) associated with input queue IQK (which has a specific priority value). Thus, transmission request 22, which is associated with cell group GA queued at input queue IQ1, can be queued at the output queue associated with input queue IQ1. In other words, transmission request 22 can be queued at an output queue (on the egress side of switching fabric 1600) associated with the priority value of at least one input queue 1610. Similarly, transmission request 22 can be queued at an input transmission request queue (not shown), or a portion of an input transmission request queue, associated with the priority value of at least one input queue 1610.
If output scheduling module 1630 determines that the destination port of cell group GA (i.e., output port P1 shown in Figure 16A) is available to receive cell group GA, output scheduling module 1630 can be configured to send a transmission response 24 to input scheduling module 1620. Transmission response 24 can be, for example, an authorization to send cell group GA (e.g., from input queue IQ1 shown in Figure 16A) to the destination port of cell group GA. An authorization to send a cell group can be referred to as a transmission grant. In certain embodiments, cell group GA and/or input queue IQ1 can be referred to as the target of transmission response 24. In certain embodiments, the grant to send cell group GA can be given when transmission through switching fabric 1600 can be substantially guaranteed, for example, because the destination port is available.
In response to transmission response 24, input scheduling module 1620 can be configured to send cell group GA through switching fabric 1600 from the ingress side to the egress side of switching fabric 1600. In certain embodiments, transmission response 24 can include an instruction to send cell group GA through switching fabric 1600 via a specific transmission path (such as transmission path 4112 shown in Figure 16A) or at a specific time. In certain embodiments, the instruction can be defined based on, for example, a routing policy.
As shown in Figure 16B, transmission request 22 includes a cell quantity value 30, a destination identifier (ID) 32, a queue identifier (ID) 34, and a queue sequence value (SV) 36 (which can collectively be referred to as a request tag). Cell quantity value 30 can represent the number of cells included in cell group GA. For example, in this embodiment, cell group GA includes seven (7) cells (shown in Figure 16A). Destination identifier 32 can represent the destination port of cell group GA, so that the target of transmission request 22 can be determined by output scheduling module 1630.
Cell quantity value 30 and destination identifier 32 can be used by output scheduling module 1630 to schedule the transmission of cell group GA via switching fabric 1600 to output port P1 (shown in Figure 16A). As shown in Figure 16B, in this embodiment, output scheduling module 1630 can be configured to define and send transmission response 24 because the number of cells included in cell group GA can be handled (e.g., can be received) at the destination port of cell group GA (e.g., output port P1 shown in Figure 16A).
In certain embodiments, if the number of cells included in cell group GA cannot be handled (e.g., cannot be received) at the destination port of cell group GA (e.g., output port P1 shown in Figure 16A) because the destination port is unavailable (e.g., in a down state or a congested state), output scheduling module 1630 can be configured to communicate this unavailability to input scheduling module 1620. In certain embodiments, for example, output scheduling module 1630 can be configured to deny the request (not shown) to send cell group GA via switching fabric 1600 when the destination port of cell group GA is unavailable. A denial of transmission request 22 can be referred to as a transmission denial. In certain embodiments, a transmission denial can include a response tag.
In certain embodiments, the availability or unavailability of, for example, output port P1 (shown in Figure 16A) can be determined by output scheduling module 1630 based on whether a condition is satisfied. For example, the condition can relate to exceeding a storage limit of a queue (not shown in Figure 16A) associated with output port P1, the rate of data traffic via output port P1, the number of cells already scheduled for transmission from input queues 1610 via switching fabric 1600 (shown in Figure 16A), and so forth. In certain embodiments, when output port P1 is disabled, output port P1 is unavailable to receive cells via switching fabric 1600.
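The request tag and the grant/deny decision can be sketched together. In this hypothetical Python model, the field names, the 64-cell storage limit, and the choice to echo the queue identifier and queue sequence value back as the response tag are assumptions; only the four tag fields (values 30, 32, 34 and 36) and the down-state and storage-limit conditions come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTag:
    cell_count: int   # cell quantity value 30
    destination: str  # destination identifier 32 (e.g., output port "P1")
    queue_id: str     # queue identifier 34 (e.g., input queue "IQ1")
    sequence: int     # queue sequence value 36


@dataclass
class OutputPortState:
    up: bool = True
    queued_cells: int = 0
    storage_limit: int = 64  # illustrative storage limit of the port's queue

    def can_accept(self, n_cells):
        # Unavailable if the port is down or the cells would exceed its storage limit.
        return self.up and self.queued_cells + n_cells <= self.storage_limit


def handle_request(tag, ports):
    """Return ("grant" | "deny", response tag); the response tag echoes the
    queue id and sequence value so the ingress side can match it to the request."""
    port = ports[tag.destination]
    if port.can_accept(tag.cell_count):
        port.queued_cells += tag.cell_count  # reserve room for the granted cell group
        return "grant", (tag.queue_id, tag.sequence)
    return "deny", (tag.queue_id, tag.sequence)


ports = {"P1": OutputPortState(queued_cells=60)}
print(handle_request(RequestTag(7, "P1", "IQ1", 0), ports))  # ('deny', ('IQ1', 0))
ports["P1"].queued_cells = 10
print(handle_request(RequestTag(7, "P1", "IQ1", 0), ports))  # ('grant', ('IQ1', 0))
```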
As shown in Figure 16B, queue identifier 34 and queue sequence value 36 are sent to output scheduling module 1630 in transmission request 22. Queue identifier 34 can represent and/or can be used to identify (e.g., uniquely identify) input queue IQ1 (shown in Figure 16A), where cell group GA is queued. Queue sequence value 36 can represent the position of cell group GA relative to other cell groups in input queue IQ1. For example, cell group GA can be associated with queue sequence value X, and cell group GB (queued at input queue IQ1 as shown in Figure 16A) can be associated with queue sequence value Y. Queue sequence value X can indicate that cell group GA is to be sent from input queue IQ1 before cell group GB, which is associated with queue sequence value Y.
In some embodiments, queue sequence value 36 is selected from a range of queue sequence values associated with input queue IQ1 (shown in Figure 16A). The range can be defined so that sequence values from it do not repeat for input queue IQ1 within a specified period of time. For example, the range can be defined so that queue sequence values do not repeat within at least the number of cell cycles needed by the switch core 1690 (shown in Figure 16A) to clear the cells (for example, cells 160) queued in input queue IQ1. In some embodiments, the input scheduling module 1620 can increment the queue sequence value (within the range) and associate it with each cell group defined from the cells 4110 queued at input queue IQ1.
In some embodiments, the range of queue sequence values associated with input queue IQ1 can overlap the range associated with another of the input queues 1610 (shown in Figure 16A). Accordingly, even if queue sequence value 36 comes from a non-unique range, it can be combined with (for example, included together with) queue identifier 34, which can be unique, to uniquely identify cell group GA (at least during a specified period of time). In some embodiments, queue sequence value 36 is unique within the switching fabric 1600, or is a globally unique value (GUID) such as a universally unique identifier (UUID).
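The following minimal Python sketch (not part of the disclosed embodiments) shows how a per-queue counter that wraps within a finite range can still name a cell group unambiguously once it is paired with the queue identifier. `QueueSequencer`, `QueueTag`, and `range_size` are hypothetical names. The choice of `range_size` stands in for the requirement that values not repeat while an earlier group may still be in the switch core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueTag:
    """Identifies one cell group: (queue identifier, queue sequence value)."""
    queue_id: str
    queue_seq: int


class QueueSequencer:
    """Issues queue sequence values for one input queue from a finite range.

    range_size is hypothetical; it should be large enough that a value is not
    reused while an earlier cell group with the same value may still be in
    flight through the switch core.
    """

    def __init__(self, queue_id: str, range_size: int):
        self.queue_id = queue_id
        self.range_size = range_size
        self._next = 0

    def next_tag(self) -> QueueTag:
        tag = QueueTag(self.queue_id, self._next)
        self._next = (self._next + 1) % self.range_size  # wrap within the range
        return tag


iq1 = QueueSequencer("IQ1", range_size=4)
iqk = QueueSequencer("IQK", range_size=4)            # overlapping range of values
print([iq1.next_tag().queue_seq for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
print(iqk.next_tag() == QueueTag("IQ1", 0))          # False: queue_id disambiguates
```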
In some embodiments, the input scheduling module 1620 can be configured to wait before defining a transmission request (not shown) associated with cell group GB. For example, the input scheduling module 1620 can be configured to wait until transmission request 22 has been sent, or until a response to transmission request 22 (for example, transmission response 24 or a transmission denial) has been received, before defining the transmission request associated with cell group GB.
As shown in Figure 16B, the output scheduling module 1630 can be configured to include queue identifier 34 and queue sequence value 36 (which together can be referred to as a response tag) in transmission response 24. Because they are included in transmission response 24, the response can be associated with cell group GA when it is received at the input scheduling module 1620. In particular, queue identifier 34 and queue sequence value 36 together identify cell group GA as authorized for transmission via the switching fabric 1600.
In some embodiments, the output scheduling module 1630 can be configured to delay sending transmission response 24, which corresponds to transmission request 22. In some embodiments, the output scheduling module 1630 can be configured to delay the response if, for example, the destination port of cell group GA (that is, output port P1 shown in Figure 16A) is unavailable (for example, temporarily unavailable). In some embodiments, the output scheduling module 1630 can be configured to send transmission response 24 in response to output port P1 changing from an unavailable state to an available state.
In some embodiments, the output scheduling module 1630 can be configured to delay sending transmission response 24 because the destination port of cell group GA (that is, output port P1 shown in Figure 16A) is receiving data from another input queue 1610. For example, output port P1 can be unavailable to receive data from input queue IQ1 because it is receiving a different cell group (not shown) from, for example, input queue IQK (shown in Figure 16A). In some embodiments, based on priority values associated with input queues IQ1 and IQK, cell groups from input queue IQ1 can have a higher priority value than cell groups from input queue IQK. The output scheduling module 1630 can be configured to delay sending transmission response 24 for a period of time calculated based on, for example, the size of the different cell group being received at output port P1. For example, the output scheduling module 1630 can be configured to delay sending transmission response 24, which targets cell group GA, by the time expected for processing of the different cell group at output port P1 to complete. In other words, the output scheduling module 1630 can be configured to send the transmission response 24 targeting cell group GA based on a predicted time at which output port P1 changes from the unavailable state to the available state.
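The calculation described above can be sketched as follows. The sketch assumes a fixed port line rate and a fixed cell size, neither of which the text specifies. `OutputPortModel`, `rate_bps`, and `response_delay` are hypothetical names. The port's predicted free time is taken as the moment the group it is currently receiving (for example, the one from IQK) finishes draining, and the response for GA is held back until then.

```python
from dataclasses import dataclass


@dataclass
class OutputPortModel:
    """Hypothetical model of output port P1 used to predict availability."""
    rate_bps: float           # assumed line rate of the port
    busy_until: float = 0.0   # predicted time the port finishes its current group

    def accept_group(self, now: float, num_cells: int, cell_bytes: int) -> float:
        """Reserve the port for a cell group; return when it becomes free again."""
        start = max(now, self.busy_until)
        self.busy_until = start + num_cells * cell_bytes * 8 / self.rate_bps
        return self.busy_until


def response_delay(port: OutputPortModel, now: float) -> float:
    """Time to hold back a transmission response for a group that must wait."""
    return max(0.0, port.busy_until - now)


p1 = OutputPortModel(rate_bps=10e9)
p1.accept_group(now=0.0, num_cells=125, cell_bytes=64)  # group from IQK
print(response_delay(p1, now=0.0))  # about 6.4e-06 s before GA's response is sent
```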
In some embodiments, the output scheduling module 1630 can be configured to delay sending transmission response 24 because at least a portion of the transmission path over which cell group GA is to be sent (such as transmission path 4112 shown in Figure 16A) is unavailable (for example, congested). The output scheduling module 1630 can be configured to delay sending transmission response 24 until that portion of the transmission path is no longer congested, or based on a predicted time at which it will no longer be congested.
As shown in Figure 16B, cell group GA can be sent to its destination port based on (for example, in response to) transmission response 24. In some embodiments, cell group GA can be sent based on one or more instructions included in transmission response 24. For example, in some embodiments, cell group GA can be sent via transmission path 4112 (shown in Figure 16A) based on an instruction included in transmission response 24, or based on one or more rules for transmitting cell groups via the switching fabric 1600 (for example, rules for transmitting cell groups via a rearrangeable switching fabric). Although not shown, in some embodiments, after cell group GA is received at output port P1 (shown in Figure 16A), content from the cell group (for example, data packets) can be sent to one or more network entities (for example, personal computers, servers, routers, PDAs) via one or more networks (for example, a LAN, a WAN, a virtual network), which can be wired and/or wireless.
Referring again to Figure 16 A, in certain embodiments, cell group GA sent via transmission path 4112 and compared to
The relatively small output queue (not shown) of such as input rank 1610 is received.In certain embodiments, output queue (or output
A part for queue) can be relevant with priority valve.Priority valve can be associated with one or more input ranks 1610.Output is adjusted
Degree module 1630 can be configured as extracting cell group GA from output queue and can be configured as sending cell group to output port P1
GA。
In some embodiments, when cell group GA is sent to the output side of the switching fabric 1600, a response identifier included in cell group GA can be retrieved by the input scheduling module 1620 and sent together with cell group GA to output port P1. The response identifier can be defined at the output scheduling module 1630 and included in transmission response 24. In some embodiments, if cell group GA is queued in an output queue (not shown) associated with its destination port, the response identifier can be used to retrieve cell group GA at that destination port, so that cell group GA can be sent from the switching fabric 1600 via its destination port. The response identifier can be associated with a position in the output queue that the output scheduling module 1630 has reserved for queuing cell group GA.
In certain embodiments, when transmission request (such as transmission request Figure 16 B shown in associated with cell group
22) when being defined, the cell group queued up in input rank 1610 can be moved to memory 1622.For example, in input rank IQK
The cell group GD of queuing can be defined in response to the transmission request associated with cell group GD and be moved to memory 1622.
In some embodiments, cell group GD can be adjusted in the transmission request associated with cell group GD from input scheduling module 1620 to output
Degree module 1630 is moved to memory 1622 before sending.Cell group GD can be stored in memory 1622, until cell
The outlet side of group GD from the lateral switching fabric 1600 of input of switching fabric 1600 is sent.In certain embodiments, cell group energy
Memory 1622 is moved to, so as to reduce the congestion (for example, the end of a thread (HOL) blocks) at input rank IQK.
In some embodiments, the input scheduling module 1620 can be configured to retrieve a cell group stored in memory 1622 based on a queue identifier and/or queue sequence value associated with the cell group. In some embodiments, the location of the cell group in memory 1622 can be determined based on a lookup table and/or an index value. The cell group can be retrieved before it is sent from the input side of the switching fabric 1600 to the output side. For example, cell group GD can be associated with a queue identifier and/or queue sequence value, and the location at which cell group GD is stored in memory 1622 can be associated with that queue identifier and/or queue sequence value. The transmission request defined at the input scheduling module 1620 and sent to the output scheduling module 1630 can include the queue identifier and/or queue sequence value, and so can the transmission response received from the output scheduling module 1630. In response to the transmission response, the input scheduling module 1620 can be configured to retrieve cell group GD from the location in memory 1622 indicated by the queue identifier and/or queue sequence value, and can trigger transmission of cell group GD.
In some embodiments, the number of cells included in a cell group can be defined based on the amount of space available in memory 1622. For example, when cell group GD is defined, the input scheduling module 1620 can be configured to determine the number of cells to include in cell group GD based on the amount of storage space available in memory 1622. In some embodiments, if the amount of storage space available in memory 1622 increases, the number of cells included in cell group GD can increase. In some embodiments, the input scheduling module 1620 can increase the number of cells included in cell group GD before or after cell group GD is moved to memory 1622 for storage.
In some embodiments, the number of cells included in a cell group can be defined based on the latency of transmission through, for example, the switching fabric 1600. In particular, the input scheduling module 1620 can be configured to define the size of a cell group, in view of the latency associated with the switching fabric 1600, so as to promote the flow of traffic through the switching fabric 1600. For example, the input scheduling module 1620 can be configured to close a cell group (that is, fix its size) because the cell group has reached a threshold size defined based on the latency of the switching fabric 1600. In some embodiments, because the latency through the switching fabric 1600 is short, the input scheduling module 1620 can be configured to send the packets in a cell group immediately, rather than waiting for additional packets to define a larger cell group.
In some embodiments, the input scheduling module 1620 is configured to limit the number of transmission requests sent from the input side of the switching fabric 1600 to the output side. In some embodiments, the limit can be defined based on a policy stored at the input scheduling module 1620. In some embodiments, the limit can be defined based on priority values associated with one or more of the input queues 1610. For example, the input scheduling module 1620 can be configured to allow (based on a threshold limit) more transmission requests associated with input queue IQ1 than with input queue IQK, because input queue IQ1 has a higher priority value than input queue IQK.
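A per-queue cap on outstanding requests is one simple way to realize such a limit. The Python sketch below assumes that, and treats a transmission response or denial as freeing a slot. `RequestLimiter` and the cap values are hypothetical illustrations of a stored policy, with the higher-priority queue IQ1 given the larger cap.

```python
from typing import Dict


class RequestLimiter:
    """Caps outstanding transmission requests per input queue.

    The per-queue limits are hypothetical policy values; a higher-priority queue
    (IQ1 here) is simply given a larger cap than a lower-priority one (IQK).
    """

    def __init__(self, limits: Dict[str, int]):
        self.limits = dict(limits)
        self.outstanding = {queue: 0 for queue in limits}

    def try_send_request(self, queue: str) -> bool:
        if self.outstanding[queue] >= self.limits[queue]:
            return False  # hold the request until a reply frees a slot
        self.outstanding[queue] += 1
        return True

    def on_reply(self, queue: str) -> None:
        """A transmission response or denial came back for this queue."""
        self.outstanding[queue] = max(0, self.outstanding[queue] - 1)


lim = RequestLimiter({"IQ1": 3, "IQK": 1})
print([lim.try_send_request("IQ1") for _ in range(4)])  # [True, True, True, False]
print([lim.try_send_request("IQK") for _ in range(2)])  # [True, False]
```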
In some embodiments, one or more portions of the input scheduling module 1620 and/or the output scheduling module 1630 can be hardware-based modules (for example, a DSP or an FPGA) and/or software-based modules (for example, computer code modules, or sets of processor-readable instructions that can be executed on a processor). In some embodiments, one or more functions associated with the input scheduling module 1620 and/or the output scheduling module 1630 can be included in different modules and/or combined into one or more modules. For example, cell group GA can be defined at a first submodule of the input scheduling module 1620, and transmission request 22 (shown in Figure 16B) can be defined at a second submodule of the input scheduling module 1620.
In some embodiments, the switching fabric 1600 has more or fewer stages than shown in Figure 16A. In some embodiments, the switching fabric 1600 can be a reconfigurable (for example, rearrangeable) switching fabric and/or a time-division multiplexed switching fabric. In some embodiments, the switching fabric 1600 can be defined based on a Clos network architecture (for example, a strict-sense non-blocking Clos network or a Benes network).
Figure 17 is a schematic block diagram showing two cell groups queued at an input queue 1720 on the input side of a switching fabric 1700, according to one embodiment. The cell groups are defined by an input scheduling module 1740 on the input side of the switching fabric 1700. The switching fabric 1700 can be, for example, associated with and/or included in a switch core such as the one shown in Figure 16A. The input queue 1720 is also on the input side of the switching fabric 1700. In some embodiments, the input queue 1720 can be included in an input line card (not shown) associated with the switching fabric 1700. Although not shown, in some embodiments, one or more of the cell groups can include many cells (for example, 25 cells, 10 cells, 100 cells) or only one cell.
As shown in Figure 17, input queue 1720 includes cells 1 through T, which can be collectively referred to as queued cells 1710. Input queue 1720 is a first-in, first-out (FIFO) queue, with cell 1 at the front end 1724 (or transmit end) of the queue and cell T at the rear end 1722 (or entry end). As shown in Figure 17, the queued cells 1710 at input queue 1720 include a first cell group 1712 and a second cell group 1716. In some embodiments, each of the queued cells 1710 has the same length (for example, 32 bits or 64 bits). In some embodiments, two or more of the queued cells 1710 have different lengths.
Each of the queued cells 1710 has content queued for transmission to one of four output ports 1770 (output port E, output port F, output port G, or output port H), as indicated by the output port label (for example, the letter "E" or "F") in each cell. The output port 1770 to which a cell is to be sent can be referred to as its destination port. Each of the queued cells 1710 can be sent to its respective destination port via the switching fabric 1700. In some embodiments, the input scheduling module 1740 can be configured to determine the destination port of each queued cell 1710 based on a lookup table (LUT) such as a routing table. In some embodiments, the destination port of each queued cell 1710 can be determined based on the destination of the content (for example, data) included in the cell. In some embodiments, one or more of the output ports 1770 can be associated with an output queue, in which cells can be queued until they are sent via the output port 1770.
The first cell group 1712 and the second cell group 1716 can be defined by the input scheduling module 1740 based on the destination ports of the queued cells 1710. As shown in Figure 17, every cell included in the first cell group 1712 has the same destination port (output port E), as indicated by the output port label "E". Similarly, every cell included in the second cell group 1716 has the same destination port (output port F), as indicated by the output port label "F".
Cell groups (for example, the first cell group 1712) are defined based on destination port because each cell group is sent via the switching fabric 1700 as a group. For example, if cell 1 were included in the first cell group 1712, the first cell group 1712 could not be sent to a single destination port, because cell 1 has a different destination port (output port "F") from cells 2 through 7 (output port "E"). Such a group therefore could not be sent via the switching fabric 1700 as a group.
Cell groups are defined as contiguous blocks of cells because each cell group is sent via the switching fabric 1700 as a group and because input queue 1720 is a FIFO queue. For example, cell 12 and cells 2 through 7 cannot be defined as a single cell group, because cell 12 cannot be sent together with the block formed by cells 2 through 7. Cells 8 through 11 are intervening cells. They must be sent from input queue 1720 after cells 2 through 7 and before cell 12. In some embodiments, if input queue 1720 is not a FIFO queue, one or more of the queued cells 1710 can be sent out of order, and a cell group can span intervening cells.
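The two rules above (one destination port per group, and contiguity in the FIFO) can be sketched in Python with `itertools.groupby`, which only merges adjacent items with equal keys. The queue contents below are illustrative rather than taken from Figure 17. `define_cell_groups` and the optional `max_cells` cap are hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Cell = Tuple[int, str]  # (cell number, destination port), front of the FIFO first


def define_cell_groups(fifo: List[Cell],
                       max_cells: Optional[int] = None) -> List[Tuple[str, List[int]]]:
    """Split a FIFO into contiguous runs that share one destination port.

    A cell never joins a group across an intervening cell bound elsewhere;
    max_cells is an optional (hypothetical) cap on group size.
    """
    groups = []
    for port, run in groupby(fifo, key=lambda cell: cell[1]):
        ids = [cell_id for cell_id, _ in run]
        step = max_cells or len(ids)
        for start in range(0, len(ids), step):
            groups.append((port, ids[start:start + step]))
    return groups


queue = [(1, "F"), (2, "E"), (3, "E"), (4, "G"), (5, "E")]
print(define_cell_groups(queue))
# [('F', [1]), ('E', [2, 3]), ('G', [4]), ('E', [5])]
```

Cell 5 cannot join cells 2 and 3 even though it shares port E, because cell 4 intervenes.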
Although not shown, each of the queued cells 1710 can have a sequence value, which can be referred to as a cell sequence value. A cell sequence value can represent, for example, the order of cell 2 relative to cell 3. Cell sequence values can be used, for example, to reorder the cells at one or more of the output ports 1770 before the content associated with the cells is sent from the output ports 1770. For example, in some embodiments, cell group 1712 can be received at an output queue (not shown) associated with output port E and reordered there based on the cell sequence values. In some embodiments, the output queue can be relatively small compared with input queue 1720 (for example, a shallow output queue).
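Reordering by cell sequence value can be sketched with a small min-heap that releases cells only when the next expected value is present. The Python sketch assumes sequence values are unique consecutive integers; the text does not specify their format. `ReorderBuffer` is a hypothetical name. The same approach would apply to the data sequence values of packets.

```python
import heapq
from typing import List, Tuple


class ReorderBuffer:
    """Releases cells of one cell group in cell-sequence-value order.

    Assumes sequence values are unique integers starting at first_seq.
    """

    def __init__(self, first_seq: int = 0):
        self.expected = first_seq
        self._heap: List[Tuple[int, str]] = []

    def push(self, seq: int, cell: str) -> List[str]:
        heapq.heappush(self._heap, (seq, cell))
        released = []
        while self._heap and self._heap[0][0] == self.expected:
            released.append(heapq.heappop(self._heap)[1])
            self.expected += 1
        return released


rb = ReorderBuffer(first_seq=2)
print(rb.push(3, "cell 3"))  # []  (cell 2 has not arrived yet)
print(rb.push(2, "cell 2"))  # ['cell 2', 'cell 3']
```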
In addition, the data (for example, packets) included in the cells can also have sequence values, referred to as data sequence values. For example, a data sequence value can represent the order of a first packet relative to a second packet. Data sequence values can be used to reorder packets at, for example, one or more of the output ports 1770 before the packets are sent from the output ports 1770.
Figure 18 is a schematic block diagram showing two cell groups queued at an input queue 1820 on the input side of a switching fabric 1800, according to another embodiment. The cell groups are defined by an input scheduling module 1840 on the input side of the switching fabric 1800. The switching fabric 1800 can be, for example, associated with and/or included in a switch core such as the one shown in Figure 16A. Input queue 1820 is also on the input side of the switching fabric 1800. In some embodiments, input queue 1820 can be included in an input line card (not shown) associated with the switching fabric 1800. Although not shown, in some embodiments, one or more of the cell groups can include only one cell.
As shown in Figure 18, input queue 1820 includes cells 1 through Z, which are collectively referred to as queued cells 1810. Input queue 1820 is a FIFO queue, with cell 1 at the front end 1824 (or transmit end) of the queue and cell Z at the rear end 1822 (or entry end). As shown in Figure 18, the queued cells 1810 at input queue 1820 include a first cell group 1812 and a second cell group 1816. In some embodiments, each of the queued cells 1810 has the same length (for example, 32 bits or 64 bits). In some embodiments, two or more of the queued cells 1810 have different lengths. In this embodiment, input queue 1820 is mapped to output port F2, so all of the cells 1810 are scheduled by the input scheduling module 1840 for transmission to output port F2 via the switching fabric 1800.
Each of the queued cells 1810 has content associated with one or more packets (for example, Ethernet packets). The packets are represented by the letters "Q" through "Y". For example, as shown in Figure 18, packet R is divided into three different cells: cell 2, cell 3 and cell 4.
Cell groups (for example, the first cell group 1812) are defined so that portions of one packet are not associated with different cell groups. In other words, cell groups are defined so that each packet, in its entirety, is associated with a single cell group. The boundaries of the cell groups are defined based on the boundaries of the packets queued at input queue 1820, so that no packet is split across different cell groups. Splitting a packet across different cell groups could cause undesirable results, such as buffering on the output side of the switching fabric 1800. For example, suppose a first portion of packet T (for example, cell 6) were included in the first cell group 1812 and a second portion of packet T (for example, cell 7) were included in the second cell group 1816. The first portion of packet T would then have to be buffered in at least part of one or more output queues (not shown) on the output side of the switching fabric 1800. It would stay there until the second portion of packet T reached the output side, so that the whole of packet T could be sent from the switching fabric 1800 via output port E2.
In some embodiments, the packets included in the queued cells 1810 can also have sequence values, referred to as data sequence values. A data sequence value can represent, for example, the order of packet R relative to packet S. Data sequence values can be used to reorder the packets at, for example, one or more output ports 1870 before the packets are sent from the output ports 1870.
Figure 19 is a flowchart of a method for scheduling transmission of a cell group via a switching fabric, according to one embodiment. As shown in Figure 19, at 1900, an indicator is received that cells are queued at an input queue for transmission via the switching fabric. In some embodiments, the switching fabric can be based on a Clos architecture and can have multiple stages. In some embodiments, the switching fabric can be associated with (for example, included within) a switch core. In some embodiments, the indicator can be received when a new cell is received at the input queue, or when a cell is ready (or about to be ready) to be sent via the switching fabric.
At 1910, a cell group having a common destination is defined from the cells queued at the input queue. The destination of each cell in the cell group is determined based on a lookup table. In some embodiments, the destination is determined based on a policy and/or a packet classification algorithm. In some embodiments, the common destination can be a common destination port associated with a portion of the switching fabric.
At 1920, a request tag is associated with the cell group. The request tag can include, for example, one or more of a cell quantity value, a destination identifier, a queue identifier, a queue sequence value, and so on. The request tag can be associated with the cell group before the cell group is sent to the input side of the switching fabric.
At 1930, a transmission request that includes the request tag is sent to an output scheduling module. In some embodiments, the transmission request includes a request to transmit at a specific time or via a specific transmission path. In some embodiments, the transmission request can be sent after the cell group has been stored in a memory associated with an input stage of the switching fabric. In some embodiments, the cell group can be moved to the memory to reduce the likelihood of congestion at the input queue. In other words, the cell group can be moved to the memory so that other cells queued behind it can be prepared for transmission from the input queue without waiting for the cell group to be sent. In some embodiments, the transmission request can be a request to send to a specific output port (for example, a specific destination port).
If, at 1940, transmission via the switching fabric is not authorized in response to the transmission request, then at 1950 a transmission denial that includes a response tag is sent to the input scheduling module. In some embodiments, the transmission request can be denied because the switching fabric is congested, the destination port is unavailable, and so on. In some embodiments, the transmission request can be denied for a specified period of time. In some embodiments, the response tag can include one or more identifiers that can be used to associate the transmission denial with the cell group.
If transmission via the switching fabric is authorized at 1940, then at 1960 a transmission response that includes a response tag is sent to the input scheduling module. In some embodiments, the transmission response can be a transmission authorization. In some embodiments, the transmission response can be sent after the destination of the cell group is ready (or about to be ready) to receive the cell group.
At 1970, the cell group is retrieved based on the response tag. If the cell group has been moved to memory, it can be retrieved from memory. If the cell group is still queued at the input queue, it can be retrieved from the input queue. The cell group can be retrieved based on the queue identifier and/or queue sequence value included in the response tag. The queue identifier and/or queue sequence value can come from the request tag.
At 1980, the cell group can be sent via the switching fabric according to instructions included in the transmission response. In some embodiments, the cell group can be sent at a specific time and/or via a specific transmission path. In some embodiments, the cell group can be sent via the switching fabric to a destination such as an output port. In some embodiments, after being sent via the switching fabric, the cell group can be queued at an output queue associated with the destination of the cell group (for example, its destination port).
Figure 20 is a signaling flow diagram showing the processing of request sequence values associated with transmission requests, according to one embodiment. As shown in Figure 20, transmission request 52 is sent from an input scheduling module 2020 on the input side of a switching fabric to an output scheduling module 2030 on the output side of the switching fabric. Transmission request 56 is sent from the input scheduling module 2020 to the output scheduling module 2030 after transmission request 52. As shown in Figure 20, transmission request 54 is sent from the input scheduling module 2020 but is not received by the output scheduling module 2030. Transmission requests 52, 54 and 56 are each associated with the same input queue IQ1, as indicated by their respective queue identifiers, and with the same destination port EP1, as indicated by their respective destination identifiers. Transmission requests 52, 54 and 56 can be collectively referred to as transmission requests 58. As shown in Figure 20, time increases in the downward direction.
As shown in Figure 20, each transmission request 58 can include a request sequence value (SV). The request sequence value can represent the order of a transmission request relative to other transmission requests. In this embodiment, the request sequence values come from a range of request sequence values associated with destination port EP1 and increase in numerical order as whole integers. In some embodiments, request sequence values can be, for example, strings, and can progress in a different order (for example, reverse numerical order). Transmission request 52 includes request sequence value 5200, transmission request 54 includes request sequence value 5201, and transmission request 56 includes request sequence value 5202. In this embodiment, request sequence value 5200 indicates that transmission request 52 was defined and sent before transmission request 54, which has request sequence value 5201.
The output scheduling module 2030 can determine, based on the request sequence values, that transmission of a transmission request from the input scheduling module 2020 has failed. In particular, the output scheduling module 2030 can determine that the transmission request associated with request sequence value 5201 was not received before transmission request 56, which is associated with request sequence value 5202, was received. In some embodiments, when the period between receipt of transmission request 52 and receipt of transmission request 56 (shown as period 2040) exceeds a threshold period, the output scheduling module 2030 can take action with respect to the lost transmission request 54. In some embodiments, the output scheduling module 2030 can request that the input scheduling module 2020 resend transmission request 54. The output scheduling module 2030 can include the missing request sequence value in that request, so that the input scheduling module 2020 can identify that transmission request 54 was not received. In some embodiments, the output scheduling module 2030 can deny the request to transmit the cell group that is included in transmission request 56. In some embodiments, the output scheduling module 2030 can be configured to process and/or respond to transmission requests (such as transmission requests 58) based on queue sequence values, in a manner substantially similar to that described above for request sequence values.
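The gap detection described for Figure 20 is sketched below in Python, assuming integer request sequence values that increase by one. `RequestSequenceTracker` and `threshold` are hypothetical names. A missing value is recorded as soon as a later value arrives. It is reported as lost only once the gap has been open longer than the threshold, measured from receipt of the last request before the gap (request 52). A late arrival closes the gap.

```python
from typing import Dict, List, Optional


class RequestSequenceTracker:
    """Output-side detection of lost transmission requests (one source, one port).

    threshold is a hypothetical timeout; a gap is reported only after it has
    been open longer than threshold, measured from receipt of the last request
    seen before the gap (request 52 in Figure 20).
    """

    def __init__(self, first_seq: int, threshold: float):
        self.next_expected = first_seq
        self.threshold = threshold
        self.last_rx: Optional[float] = None
        self.gaps: Dict[int, float] = {}  # missing sequence value -> gap start

    def receive(self, seq: int, now: float) -> None:
        if seq in self.gaps:
            del self.gaps[seq]  # a late request fills its gap
        elif seq >= self.next_expected:
            start = self.last_rx if self.last_rx is not None else now
            for missing in range(self.next_expected, seq):
                self.gaps[missing] = start
            self.next_expected = seq + 1
        self.last_rx = now

    def lost(self, now: float) -> List[int]:
        """Sequence values to ask the input scheduling module to resend."""
        return sorted(s for s, t in self.gaps.items() if now - t > self.threshold)


t = RequestSequenceTracker(first_seq=5200, threshold=1.0)
t.receive(5200, now=0.0)  # request 52
t.receive(5202, now=0.4)  # request 56; 5201 (request 54) is missing
print(t.lost(now=0.4))    # []  (gap not yet older than threshold)
print(t.lost(now=1.5))    # [5201]
```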
Figure 21 is a signaling flow diagram showing response sequence values associated with transmission responses, according to one embodiment. As shown in Figure 21, transmission response 62 is sent from an output scheduling module 2130 on the output side of a switching fabric to an input scheduling module 2120 on the input side of the switching fabric. Transmission response 66 is sent from the output scheduling module 2130 to the input scheduling module 2120 after transmission response 62. As shown in Figure 21, transmission response 64 is sent from the output scheduling module 2130 but is not received by the input scheduling module 2120. Transmission responses 62, 64 and 66 are associated with the same input queue IQ2, as indicated by their respective queue identifiers. Transmission responses 62, 64 and 66 can be collectively referred to as transmission responses 68. As shown in Figure 21, time increases in the downward direction.
As shown in Figure 21, each transmission response 68 can include a response sequence value (SV). The response sequence value can represent the order of a transmission response relative to other transmission responses. In this embodiment, the response sequence values come from a range of response sequence values associated with input queue IQ2 and increase in numerical order as whole integers. In some embodiments, response sequence values can be, for example, strings, and can progress in a different order (for example, reverse numerical order). Transmission response 62 includes response sequence value 5300, transmission response 64 includes response sequence value 5301, and transmission response 66 includes response sequence value 5302. In this embodiment, response sequence value 5300 indicates that transmission response 62 was defined and sent before transmission response 64, which has response sequence value 5301.
The input scheduling module 2120 can determine, based on the response sequence values, that transmission of a transmission response from the output scheduling module 2130 has failed. Specifically, the input scheduling module 2120 can determine that the transmission response associated with response sequence value 5301 was not received before transmission response 66, which is associated with response sequence value 5302, was received. In some embodiments, when the time period between receipt of transmission response 62 and receipt of transmission response 66 (shown as time period 2140) exceeds a threshold time period, the input scheduling module 2120 can perform an action with respect to the lost transmission response 64. In some embodiments, the input scheduling module 2120 can request that the output scheduling module 2130 retransmit transmission response 64. The input scheduling module 2120 can include the missing response sequence value in that request, so that the output scheduling module 2130 can identify that transmission response 64 was not received. In some embodiments, when the transmission response associated with a transmission request is not received within a specified time period, the input scheduling module 2120 can drop the group of cells.
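The loss-detection behavior described above can be sketched in Python. This is a minimal, non-authoritative sketch under stated assumptions: the class name ResponseSequenceTracker, the time units, and the choice to report missing values only once the gap between consecutive receptions exceeds the threshold are illustrative and are not taken from the embodiment.

```python
class ResponseSequenceTracker:
    """Hypothetical per-input-queue tracker of response sequence values (SVs).

    Assumes SVs increase by one per transmission response, as in the
    integer example of Figure 21 (5300, 5301, 5302).
    """

    def __init__(self, first_sv, threshold):
        self.expected_sv = first_sv   # next SV the input scheduling module expects
        self.threshold = threshold    # threshold time period (units are an assumption)
        self.last_rx = None           # receive time of the previous response
        self.pending = set()          # SVs detected as missing but not yet acted on

    def on_response(self, sv, now):
        """Record one received response and return the SVs judged lost."""
        self.pending.update(range(self.expected_sv, sv))  # gap before this SV, if any
        self.pending.discard(sv)                          # a late response closes its own gap
        elapsed = 0.0 if self.last_rx is None else now - self.last_rx
        self.last_rx = now
        self.expected_sv = max(self.expected_sv, sv + 1)
        if self.pending and elapsed > self.threshold:
            lost = sorted(self.pending)
            self.pending.clear()
            return lost   # e.g. request retransmission of these, or drop the cell group
        return []


tracker = ResponseSequenceTracker(5300, threshold=0.5)
print(tracker.on_response(5300, now=0.0))  # []
print(tracker.on_response(5302, now=1.0))  # [5301]
```

Keeping the pending set separate from the threshold check lets a late response (for example, 5301 arriving just after 5302) clear its own gap before any retransmission is requested.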
Figure 22 is a schematic block diagram illustrating multiple stages of flow-controllable queues, according to an embodiment. As shown in Figure 22, the transmit side of first-stage queues 2210 and the transmit side of second-stage queues 2220 are included in a source entity 2230 on the transmit side of a physical link 2200. The receive side of first-stage queues 2210 and the receive side of second-stage queues 2220 are included in a destination entity 2240 on the receive side of physical link 2200. Source entity 2230 and/or destination entity 2240 can be any type of computing device (for example, part of a switch core or a peripheral processing device) configured to receive and/or send data via physical link 2200. In some embodiments, source entity 2230 and/or destination entity 2240 can be associated with a data center.
As shown in Figure 22, first-stage queues 2210 include transmit queues A1 to A4 on the transmit side of physical link 2200 (collectively referred to as first-stage transmit queues 2234) and receive queues D1 to D4 on the receive side of physical link 2200 (collectively referred to as first-stage receive queues 2244). Second-stage queues 2220 include transmit queues B1 and B2 on the transmit side of physical link 2200 (collectively referred to as second-stage transmit queues 2232) and receive queues C1 and C2 on the receive side of physical link 2200 (collectively referred to as second-stage receive queues 2242).
Data flow via physical link 2200 can be controlled (for example, modified or paused) based on flow control signaling associated with flow control loops between source entity 2230 and destination entity 2240. For example, data sent from source entity 2230 on the transmit side of physical link 2200 can be received at destination entity 2240 on the receive side of physical link 2200. When destination entity 2240 is unavailable to receive data from source entity 2230 via physical link 2200, a flow control signal can be defined at destination entity 2240 and/or sent from destination entity 2240 to source entity 2230. The flow control signal can be configured to trigger source entity 2230 to modify the flow of data from source entity 2230 to destination entity 2240.
For example, if receive queue D2 is unavailable to handle data sent from transmit queue A1, destination entity 2240 can be configured to send a flow control signal associated with a flow control loop to source entity 2230. The flow control signal can be configured to trigger a pause in data transmission from transmit queue A1 to receive queue D2 via a transmission path that includes at least a portion of second-stage queues 2220 and physical link 2200. In some embodiments, receive queue D2 may be unavailable, for example, when it is too full to receive data. In some embodiments, receive queue D2 can change from an available state to an unavailable state (for example, a congestion state) in response to data previously received from transmit queue A1. In some embodiments, transmit queue A1 can be referred to as the target of the flow control signal. Transmit queue A1 can be identified based on a queue identifier, included in the flow control signal, that is associated with transmit queue A1. In some embodiments, the flow control signal can be referred to as a feedback signal.
In this embodiment, a flow control loop is associated with physical link 2200 (referred to as the physical link control loop), a flow control loop is associated with first-stage queues 2210 (referred to as the first-stage control loop), and a flow control loop is associated with second-stage queues 2220 (referred to as the second-stage control loop). Specifically, the physical link control loop is associated with a transmission path that includes physical link 2200 but does not include first-stage queues 2210 or second-stage queues 2220. Data flow via physical link 2200 can be turned on and off based on flow control signaling associated with the physical link control loop.
The first-stage control loop can be based on flow control signals that are defined based on data transmitted from at least one transmit queue 2234 in first-stage queues 2210 and on the availability (for example, an indicator of availability) of at least one receive queue 2244 in first-stage queues 2210. Accordingly, the first-stage control loop can be referred to as being associated with first-stage queues 2210. The first-stage control loop can be associated with a transmission path that includes at least a portion of physical link 2200, at least a portion of second-stage queues 2220, and first-stage queues 2210. Flow control signaling associated with the first-stage control loop can trigger control of the data flow from the transmit queues 2234 associated with first-stage queues 2210.
The second-stage control loop can be associated with a transmission path that includes physical link 2200 and at least a portion of second-stage queues 2220 but does not include first-stage queues 2210. The second-stage control loop can be based on flow control signals that are defined based on data transmitted from at least one transmit queue 2232 in second-stage queues 2220 and on the availability (for example, an indicator of availability) of at least one receive queue 2242 in second-stage queues 2220. Accordingly, the second-stage control loop can be referred to as being associated with second-stage queues 2220. Flow control signaling associated with the second-stage control loop can trigger control of the data flow from the transmit queues 2232 associated with second-stage queues 2220.
In this embodiment, the flow control loop associated with second-stage queues 2220 is a priority-based flow control loop. Specifically, each transmit queue from second-stage transmit queues 2232 is paired with a receive queue from second-stage receive queues 2242, and each queue pair is associated with a class of service (also referred to as a service level or quality of service). In this embodiment, second-stage transmit queue B1 and second-stage receive queue C1 define a queue pair associated with class of service X. Second-stage transmit queue B2 and second-stage receive queue C2 define a queue pair associated with class of service Y. In some embodiments, different types of network traffic can be associated with different classes of service (that is, different priorities). For example, storage traffic (for example, read and write traffic), inter-processor communication, media signaling, session-layer signaling, and so forth can each be associated with at least one class of service. In some embodiments, the second-stage control loop can be based on, for example, the Institute of Electrical and Electronics Engineers (IEEE) 802.1Qbb protocol, which defines a priority-based flow control policy.
Data traffic via transmission path 74, shown in Figure 22, can be controlled using at least one control loop. Transmission path 74 includes first-stage transmit queue A2, second-stage transmit queue B1, physical link 2200, second-stage receive queue C1, and first-stage receive queue D3. However, when the data flow through a queue at one stage of transmission path 74 is changed based on the flow control loop associated with that stage, the data flow at another stage of transmission path 74 can be affected. Flow control at one stage can affect the data flow at another stage because the queues in source entity 2230 (for example, transmit queues 2232 and transmit queues 2234) and the queues in destination entity 2240 (for example, receive queues 2242 and receive queues 2244) are arranged hierarchically. In other words, flow control based on one flow control loop can affect data flows that are governed by a different flow control loop.
For example, the data flow from first-stage transmit queue A2 to first-stage receive queue D3 via transmission path 74 can be modified based on one or more control loops: the first-stage control loop, the second-stage control loop, and/or the physical link control loop. A pause of the data flow to first-stage receive queue D3 can be triggered when first-stage receive queue D3 changes from an available state to an unavailable state (for example, a congestion state).
If the data flow to first-stage receive queue D3 is associated with class of service X, the data flow via second-stage transmit queue B1 and second-stage receive queue C1 (which define the queue pair associated with class of service X) can be paused based on flow control signaling associated with the second-stage control loop, which is a priority-based control loop. However, pausing data transmission via the queue pair associated with class of service X also pauses data transmission from every transmit queue that feeds second-stage transmit queue B1. Specifically, pausing data transmission via the queue pair associated with class of service X can pause not only data transmission from first-stage transmit queue A2 but also data transmission from first-stage transmit queue A1. In other words, the data flow from first-stage transmit queue A1 is affected indirectly, as collateral. In some embodiments, the data received at transmit queue A1 and the data received at transmit queue A2 can be associated with the same class of service X, but can come from, for example, different (for example, independent) network devices (not shown), such as peripheral processing devices, which can be associated with different classes of service.
The data flow to first-stage receive queue D3 can also be paused, based on flow control signaling associated with the first-stage control loop, by pausing data transmission specifically from first-stage transmit queue A2. When data transmission from first-stage transmit queue A2 is paused directly, data transmission from first-stage transmit queue A1 can continue uninterrupted. In other words, the flow from first-stage transmit queue A2 can be controlled directly based on a flow control signal associated with the first-stage control loop, without pausing data transmission from other first-stage transmit queues such as first-stage transmit queue A1.
The data flow to first-stage receive queue D3 can also be controlled by pausing data transmission via physical link 2200 based on flow control signaling associated with the physical link control loop. However, pausing data transmission via physical link 2200 pauses all data transmission via physical link 2200.
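To illustrate how a pause at each stage has a different reach, the following Python sketch models only the transmit side of Figure 22 as a mapping from first-stage to second-stage queues. It is a simplified illustration under stated assumptions: the names FEEDS, can_send and blocked_by are hypothetical, only the A1/A2 to B1 relationship comes from the description (A3/A4 to B2 is assumed), and a pause is modeled as an on/off flag rather than as real signaling.

```python
# Hypothetical model of the Figure 22 transmit side. The A1/A2 -> B1 mapping is
# described in the text; the A3/A4 -> B2 mapping is an assumption.
FEEDS = {"A1": "B1", "A2": "B1", "A3": "B2", "A4": "B2"}
LINK = "link 2200"


def can_send(first_stage, paused):
    """A first-stage transmit queue can send only if nothing on its path is paused."""
    path = {first_stage, FEEDS[first_stage], LINK}
    return not (path & paused)


def blocked_by(pause_point):
    """Return the first-stage transmit queues stopped by pausing one point."""
    return sorted(q for q in FEEDS if not can_send(q, {pause_point}))


print(blocked_by("A2"))   # first-stage loop: ['A2']
print(blocked_by("B1"))   # second-stage (class of service X) loop: ['A1', 'A2']
print(blocked_by(LINK))   # physical link loop: ['A1', 'A2', 'A3', 'A4']
```

The output mirrors the three cases in the text: a first-stage pause stops only A2, a class-of-service pause at B1 also stops A1, and a link pause stops everything.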
The queues on the transmit side of physical link 2200 can be referred to as transmit queues 2236, and the queues on the receive side of the physical link can be referred to as receive queues 2246. In some embodiments, transmit queues 2236 can also be referred to as source queues, and receive queues 2246 can be referred to as destination queues. Although not shown, in some embodiments, one or more transmit queues 2236 can be included in one or more interface cards associated with source entity 2230, and one or more receive queues 2246 can be included in one or more interface cards associated with destination entity 2240.
When source entity 2230 sends data via physical link 2200, source entity 2230 can be referred to as a transmitter located on the transmit side of physical link 2200. Destination entity 2240 can be configured to receive the data and can be referred to as a receiver located on the receive side of physical link 2200. Although not shown, in some embodiments, source entity 2230 (and its associated elements, such as transmit queues 2236) can be configured to function as a destination entity (for example, a receiver), and destination entity 2240 (and its associated elements, such as receive queues 2246) can be configured to function as a source entity (for example, a transmitter). In addition, physical link 2200 can function as a bidirectional link.
In some embodiments, physical link 2200 can be a tangible link, such as an optical link (for example, a fiber-optic cable or a plastic optical fiber cable), a cable link (for example, a copper-based wire), a twisted-pair link (for example, a Category 5 cable), and so forth. In some embodiments, physical link 2200 can be a wireless link. Data transmission via physical link 2200 can be defined based on protocols such as an Ethernet protocol, a wireless protocol, a Fibre Channel protocol, a Fibre Channel over Ethernet protocol, an InfiniBand-related protocol, and/or the like.
In some embodiments, the second-stage control loop can be referred to as nested within the first-stage control loop, because the second-stage queues 2220 associated with the second-stage control loop are located within the first-stage queues 2210 associated with the first-stage control loop. Similarly, the physical link control loop can be referred to as nested within the second-stage control loop. In some embodiments, the second-stage control loop can be referred to as an inner control loop, and the first-stage control loop can be referred to as an outer control loop.
Figure 23 is a schematic block diagram illustrating multiple stages of flow-controllable queues, according to an embodiment. As shown in Figure 23, the transmit side of first-stage queues 2310 and the transmit side of second-stage queues 2320 are included in a source entity 2330 on the transmit side of a physical link 2300. The receive side of first-stage queues 2310 and the receive side of second-stage queues 2320 are included in a destination entity 2340 on the receive side of physical link 2300. The queues on the transmit side of physical link 2300 can be collectively referred to as transmit queues 2336, and the queues on the receive side of the physical link can be collectively referred to as receive queues 2346. Although not shown, in some embodiments, source entity 2330 can be configured to function as a destination entity, and destination entity 2340 can be configured to function as a source entity (for example, a transmitter). In addition, physical link 2300 can function as a bidirectional link.
As shown in Figure 23, source entity 2330 communicates with destination entity 2340 via physical link 2300. Source entity 2330 has a queue QP1 configured to buffer data (if necessary) before the data is sent via physical link 2300, and destination entity 2340 has a queue QP2 configured to buffer data received via physical link 2300 (if necessary) before the data is distributed within destination entity 2340. In some embodiments, data flow via physical link 2300 can be handled without buffering in queue QP1 and queue QP2.
Each of transmit queues QA1 to QAN included in first-stage queues 2310 can be referred to as a first-stage transmit queue, and they can be collectively referred to as transmit queues 2334 (or queues 2334). Each of transmit queues QB1 to QBM included in second-stage queues 2320 can be referred to as a second-stage transmit queue, and they can be collectively referred to as transmit queues 2332 (or queues 2332). Each of receive queues QD1 to QDR included in first-stage queues 2310 can be referred to as a first-stage receive queue, and they can be collectively referred to as receive queues 2344 (or queues 2344). Each of receive queues QC1 to QCM included in second-stage queues 2320 can be referred to as a second-stage receive queue, and they can be collectively referred to as receive queues 2342 (or queues 2342).
As shown in Figure 23, each queue from second-stage queues 2320 is located within a transmission path between physical link 2300 and at least one queue from first-stage queues 2310. For example, part of a transmission path can be defined by first-stage receive queue QD4, second-stage receive queue QC1, and physical link 2300. Second-stage receive queue QC1 is located in the transmission path between first-stage receive queue QD4 and physical link 2300.
In this embodiment, a physical link control loop is associated with physical link 2300, a first-stage control loop is associated with first-stage queues 2310, and a second-stage control loop is associated with second-stage queues 2320. In some embodiments, the second-stage control loop can be a priority-based control loop. In some embodiments, the physical link control loop includes physical link 2300, queue QP1, and queue QP2.
Flow control signals can be defined and/or sent between a source control module 2370 at source entity 2330 and a destination control module 2380 at destination entity 2340. In some embodiments, source control module 2370 can be referred to as a source flow control module, and destination control module 2380 can be referred to as a destination flow control module. For example, destination control module 2380 can be configured to send a flow control signal to source control module 2370 when one or more receive queues 2346 at destination entity 2340 (for example, receive queue QD2) are unavailable to receive data. The flow control signal can be configured to trigger source control module 2370 to, for example, pause the data flow from one or more transmit queues 2336 to the one or more receive queues 2346.
Before data is sent, source control module 2370 associates a queue identifier with the data queued at a queue from transmit queues 2336. The queue identifier can represent and/or be used to identify the transmit queue at which the data was queued. For example, when a data packet is queued at first-stage transmit queue QA4, a queue identifier that uniquely identifies first-stage transmit queue QA4 can be added to the data packet or included in a field of the data packet (for example, a header, trailer, or payload). In some embodiments, the queue identifier can be associated with the data at source control module 2370, or the association can be triggered by source control module 2370. In some embodiments, the queue identifier can be associated with the data only just before the data is sent, or after the data has been sent from one of transmit queues 2336.
A queue identifier can be associated with data sent from the transmit side of physical link 2300 to the receive side of physical link 2300 so that the source of the data (for example, the source queue) can be identified. Accordingly, a flow control signal can be defined, based on the queue identifier, to pause transmission from one or more transmit queues 2336. For example, the queue identifier associated with first-stage transmit queue QAN can be included in a data packet sent from first-stage transmit queue QAN to first-stage receive queue QD3. If, after receiving that data packet, first-stage receive queue QD3 cannot receive another data packet from first-stage transmit queue QAN, a flow control signal requesting that first-stage transmit queue QAN pause transmission of additional data packets to first-stage receive queue QD3 can be defined based on the queue identifier associated with first-stage transmit queue QAN. The queue identifier can be parsed from the data packet by destination control module 2380 and used by destination control module 2380 to define the flow control signal.
In some embodiments, data transmission from several transmit queues 2336 (for example, first-stage transmit queues 2334) to first-stage receive queue QDR can be paused in response to first-stage receive queue QDR changing from an available state to an unavailable state. Each of those transmit queues 2336 can be identified in the flow control signal by its respective queue identifier.
In some embodiments, one or more transmit queues 2336 and/or one or more receive queues 2346 can be virtual queues (for example, logically defined groups of queues). Accordingly, a queue identifier can be associated with (for example, can represent) a virtual queue. In some embodiments, a queue identifier can be associated with a queue from the set of queues that defines a virtual queue. In some embodiments, each queue identifier from the set of queue identifiers associated with physical link 2300 can be unique. For example, each transmit queue 2336 associated with physical link 2300 (for example, associated with a hop) can be associated with a unique queue identifier.
In some embodiments, source control module 2370 can be configured to associate queue identifiers only with a particular subset of transmit queues 2336 and/or only with a subset of the data queued at one of transmit queues 2336. For example, if data is sent from first-stage transmit queue QA2 to first-stage receive queue QD1 without a queue identifier, a flow control signal requesting a pause of data transmission from first-stage transmit queue QA2 cannot be defined, because the source of the data is unknown. Accordingly, a transmit queue from transmit queues 2336 can be exempted from flow control by not associating a queue identifier with (for example, by omitting it from) the data sent from that transmit queue.
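The tagging, parsing, and exemption steps described above can be sketched in Python as follows. This is an illustrative sketch only: the Packet class, the queue_id field, the EXEMPT set, and the dictionary used to represent a flow control signal are all hypothetical, since the embodiment does not define a packet format or signal encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    queue_id: Optional[str] = None   # hypothetical field (e.g. in a header) holding the source queue identifier


# Hypothetical policy: data from these transmit queues is sent untagged,
# which exempts them from targeted flow control.
EXEMPT = {"QA2"}


def tag(packet: Packet, source_queue: str) -> Packet:
    """Source control module side: attach the identifier of the queue the packet was queued at."""
    if source_queue not in EXEMPT:
        packet.queue_id = source_queue
    return packet


def define_pause_signal(packet: Packet, receive_queue: str) -> Optional[dict]:
    """Destination control module side: parse the identifier and target its source queue."""
    if packet.queue_id is None:
        return None   # source unknown, so no targeted signal can be defined
    return {"target": packet.queue_id, "receive_queue": receive_queue, "action": "pause"}


print(define_pause_signal(tag(Packet(b"data"), "QAN"), "QD3"))
print(define_pause_signal(tag(Packet(b"data"), "QA2"), "QD1"))  # None: QA2 is exempt
```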
In some embodiments, the unavailability of one or more receive queues 2346 at destination entity 2340 can be defined based on a condition being satisfied. The condition can relate to a storage limit of a queue, a queue access rate, the flow rate of data into the queue, and so forth. For example, a flow control signal can be defined at destination control module 2380 in response to a change in the state of one or more receive queues 2346, such as second-stage receive queue QC2 changing from an available state to an unavailable state (for example, a congestion state) because a threshold storage limit has been exceeded. While in the unavailable state, second-stage receive queue QC2 is unavailable to receive data because, for example, it is considered too full (as indicated by the threshold storage limit being exceeded). In some embodiments, one or more receive queues 2346 can be in the unavailable state while disabled. In some embodiments, when a receive queue from receive queues 2346 is unavailable to receive data, a flow control signal requesting a pause of data transmission to that receive queue can be defined. In some embodiments, the state of one or more receive queues 2346 can be changed (by destination control module 2380) from an available state to a congestion state in response to a particular subset of receive queues 2346 (for example, the receive queues at a particular stage) being in the congestion state.
In some embodiments, a flow control signal can be defined at destination control module 2380 to indicate that one of receive queues 2346 has changed from an unavailable state to an available state. For example, destination control module 2380 can initially be configured to define and send a first flow control signal to source control module 2370 in response to first-stage receive queue QD3 changing from an available state to an unavailable state. First-stage receive queue QD3 can change from the available state to the unavailable state in response to data sent from first-stage transmit queue QA2. Accordingly, the target of the first flow control signal can be first-stage transmit queue QA2 (as indicated by its queue identifier). When first-stage receive queue QD3 changes back from the unavailable state to the available state, destination control module 2380 can be configured to define and send to source control module 2370 a second flow control signal indicating the change back to the available state. In some embodiments, source control module 2370 can be configured to trigger, in response to the second flow control signal, data transmission from one or more transmit queues 2336 to first-stage receive queue QD3.
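A destination-side monitor that produces the first (unavailable) and second (available) flow control signals described above might look like the following Python sketch. It is a minimal sketch under stated assumptions: the class ReceiveQueueMonitor and its fields are hypothetical, the threshold storage limit is counted in packets, and the lower resume level used to return to the available state is an added assumption, since the description does not say when a queue becomes available again.

```python
class ReceiveQueueMonitor:
    """Hypothetical destination-side monitor for one receive queue.

    The queue becomes unavailable when its depth exceeds a threshold storage
    limit. The lower resume level used to return to the available state is an
    assumption added to avoid toggling on every packet.
    """

    def __init__(self, name, limit, resume_level):
        self.name = name
        self.limit = limit
        self.resume_level = resume_level
        self.depth = 0
        self.available = True
        self.last_source = None   # queue identifier of the most recent sender

    def _check(self):
        """Return a flow control signal on a state change, else None."""
        if self.available and self.depth > self.limit:
            self.available = False
            return {"target": self.last_source, "queue": self.name, "state": "unavailable"}
        if not self.available and self.depth <= self.resume_level:
            self.available = True
            return {"target": self.last_source, "queue": self.name, "state": "available"}
        return None

    def enqueue(self, source_queue_id, n=1):
        self.depth += n
        self.last_source = source_queue_id
        return self._check()

    def dequeue(self, n=1):
        self.depth = max(0, self.depth - n)
        return self._check()


qd3 = ReceiveQueueMonitor("QD3", limit=4, resume_level=2)
signals = [qd3.enqueue("QA2") for _ in range(5)]
print(signals[-1])       # first signal: QD3 unavailable, targeting QA2
print(qd3.dequeue(3))    # second signal: QD3 available again
```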
In some embodiments, a flow control signal can have one or more parameter values that source control module 2370 uses to modify transmission from one of transmit queues 2336 (identified in the flow control signal by its queue identifier). For example, the flow control signal can include a parameter value that triggers source control module 2370 to pause transmission from one of transmit queues 2336 for a specified period of time (for example, 10 milliseconds (ms)). In other words, the flow control signal can include a pause-period parameter value. In some embodiments, the pause period can be indefinite. In some embodiments, the flow control signal can define a request to send data from one or more transmit queues 2336 at a specified rate (for example, a specified number of frames per second or a specified number of bits per second).
In some embodiments, a flow control signal (for example, the pause period in the flow control signal) can be defined based on a flow control algorithm. The pause period can be defined based on the period of time for which a receive queue from receive queues 2346 (for example, first-stage receive queue QD4) is in the unavailable state. In some embodiments, the pause period can be defined based on the number of first-stage receive queues 2344 that are in the unavailable state. For example, in some embodiments, the pause period increases when more than a particular number of first-stage receive queues 2344 are in the congestion state. In some embodiments, such a determination can be made at destination control module 2380. The period of time for which a receive queue will be unavailable can be calculated (for example, projected) by destination control module 2380 based on, for example, the rate at which data flows out of the receive queue (for example, a historical flow rate or a previous flow rate).
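One way a destination control module could turn these inputs into a pause period is sketched below in Python. The description names the inputs (projected unavailability from the outflow rate, and the number of congested first-stage receive queues) but not a formula, so the drain-time calculation, the congestion_count threshold, and the doubling factor here are hypothetical choices for illustration only.

```python
from typing import Optional


def pause_period_ms(excess_bytes: float,
                    outflow_bytes_per_ms: float,
                    congested_queues: int,
                    congestion_count: int,
                    scale: float = 2.0) -> Optional[float]:
    """Hypothetical pause-period rule for a destination control module.

    excess_bytes: data above the queue's threshold storage limit.
    outflow_bytes_per_ms: observed (e.g. historical) outflow rate of the receive queue.
    congested_queues: number of first-stage receive queues currently congested.
    congestion_count / scale: assumed rule that lengthens the pause when more
    than congestion_count queues are congested at once.
    """
    if outflow_bytes_per_ms <= 0:
        return None   # no measurable drain: request an indefinite pause
    period = excess_bytes / outflow_bytes_per_ms   # projected time until available again
    if congested_queues > congestion_count:
        period *= scale
    return period


print(pause_period_ms(20_000, 2_000, congested_queues=1, congestion_count=3))  # 10.0
print(pause_period_ms(20_000, 2_000, congested_queues=4, congestion_count=3))  # 20.0
```

Returning None for a zero outflow rate corresponds to the indefinite pause period mentioned in the description.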
In some embodiments, source control module 2370 can deny or alter a request to modify the data flow from one or more transmit queues 2336. For example, in some embodiments, source control module 2370 can be configured to decrease or increase the pause period. In some embodiments, instead of pausing data transmission in response to a flow control signal, source control module 2370 can be configured to modify the transmission path associated with one of transmit queues 2336. For example, if first-stage transmit queue QA2 receives a request to pause transmission based on a change in the state of first-stage receive queue QD2, source control module 2370 can be configured to trigger data transmission from first-stage transmit queue QA2 to, for example, first-stage receive queue QD3, rather than pausing transmission as requested.
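The source-side handling described in the last few paragraphs (pause for a period, pause indefinitely, send at a rate, shorten a requested pause, or redirect instead of pausing) can be sketched in Python. All names here (FlowControlSignal, SourceControl, max_pause_ms, alternates) are hypothetical; the sketch shows only the shortening case of altering a pause and uses a fixed redirect table as a stand-in for whatever path-selection logic an implementation would actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowControlSignal:
    target_queue: str                  # transmit queue named by its queue identifier
    pause_ms: Optional[float] = None   # pause period; None (with no rate) means indefinite
    rate_fps: Optional[float] = None   # alternative request: send at this many frames per second


class SourceControl:
    """Hypothetical source control module policy (names and structure are illustrative)."""

    def __init__(self, max_pause_ms=None, alternates=None):
        self.max_pause_ms = max_pause_ms    # assumed local cap used to shorten requested pauses
        self.alternates = alternates or {}  # assumed: transmit queue -> alternate receive queue
        self.state = {}

    def apply(self, sig: FlowControlSignal, now_ms: float):
        q = sig.target_queue
        if q in self.alternates:
            # Modify the transmission path instead of pausing.
            self.state[q] = ("redirect", self.alternates[q])
        elif sig.rate_fps is not None:
            self.state[q] = ("rate", sig.rate_fps)
        elif sig.pause_ms is None:
            self.state[q] = ("paused", None)   # indefinite: wait for a resume signal
        else:
            pause = sig.pause_ms
            if self.max_pause_ms is not None:
                pause = min(pause, self.max_pause_ms)
            self.state[q] = ("paused", now_ms + pause)   # time at which to resume
        return self.state[q]


src = SourceControl(alternates={"QA2": "QD3"})
print(src.apply(FlowControlSignal("QA1", pause_ms=10), now_ms=100))  # ('paused', 110)
print(src.apply(FlowControlSignal("QA2", pause_ms=10), now_ms=100))  # ('redirect', 'QD3')
```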
As shown in Figure 23, the queues within second-stage queues 2320 fan into and fan out of physical link 2300. For example, transmit queues 2332 (QB1 to QBM) on the transmit side of physical link 2300 fan into queue QP1 on the transmit side of physical link 2300. Accordingly, data queued at any of transmit queues 2332 can be sent to queue QP1 of physical link 2300. On the receive side of physical link 2300, data sent from physical link 2300 via queue QP2 can be fanned out to receive queues 2342 (that is, queues QC1 to QCM).
Likewise, as shown in Figure 23, transmit queues 2334 in first-stage queues 2310 fan into transmit queues 2332 in second-stage queues 2320. For example, data queued at any of first-stage transmit queues QA1, QA4, and QAN-2 can be sent to second-stage transmit queue QB2. On the receive side of physical link 2300, data sent from, for example, second-stage receive queue QCM can be fanned out to first-stage receive queues QDR-1 and QDR.
Because the flow control loops (for example, the first-stage control loop) are associated with different fan-in and fan-out architectures, the flow control loops have different effects on data flow via physical link 2300. For example, when data transmission from second-stage transmit queue QB1 is paused based on the second-stage control loop, data transmission from first-stage transmit queues QA1, QA2, QA3, and QAN-1 via second-stage transmit queue QB1 to one or more receive queues 2346 is also paused. In this case, pausing transmission from a downstream queue (for example, second-stage transmit queue QB1) pauses data transmission from one or more upstream queues (for example, first-stage transmit queue QA1). Conversely, if data transmission from first-stage transmit queue QA1 along a transmission path that includes at least downstream second-stage transmit queue QB1 is paused based on the first-stage control loop, the data flow rate from second-stage transmit queue QB1 can be reduced without data transmission from second-stage transmit queue QB1 being paused entirely; for example, first-stage transmit queue QA2 can still send data via second-stage transmit queue QB1.
In some embodiments, the fan-in and fan-out architecture can differ from that shown in Figure 23. For example, in some embodiments, some queues in first-stage queues 2310 can be configured to fan into physical link 2300 while bypassing second-stage queues 2320.
Flow control signaling associated with transmit queues 2336 is handled by source control module 2370, and flow control signaling associated with receive queues 2346 is handled by destination control module 2380. Although not shown, in some embodiments, flow control signaling can be handled by one or more control modules (or control submodules) that can be separate and/or integrated into a single control module. For example, flow control signaling associated with first-stage receive queues 2344 can be handled by a control module that is separate from the control module configured to handle flow control signaling associated with second-stage receive queues 2342. Similarly, flow control signaling associated with first-stage transmit queues 2334 can be handled by a control module that is separate from the control module configured to handle flow control signaling associated with second-stage transmit queues 2332. In some embodiments, one or more portions of source control module 2370 and/or destination control module 2380 can be a hardware-based module (for example, a DSP or an FPGA) and/or a software-based module (for example, a computer code module or a set of processor-readable instructions executable on a processor).
Figure 24 is a schematic block diagram that illustrates, according to one embodiment, a destination control module 2450 configured to define a flow control signal 6428 associated with multiple receive queues. The queue stages include first-stage queues 2410 and second-stage queues 2420. As shown in Figure 24, source control module 2460 is associated with the transmit side of first-stage queues 2410, and destination control module 2450 is associated with the receive side of first-stage queues 2410. The queues on the transmit side of physical link 2400 can be collectively referred to as transmit queues 2470, and the queues on the receive side of physical link 2400 can be collectively referred to as receive queues 2480.
Destination control module 2450 is configured to send flow control signal 6428 to source control module 2460 in response to one or more receive queues in first-stage queues 2410 being unavailable to receive data from a single source queue in first-stage queues 2410. Source control module 2460 is configured to suspend, based on flow control signal 6428, data transmission from that source queue to the multiple receive queues in first-stage queues 2410.
Destination control module 2450 can define flow control signal 6428 based on information associated with each unavailable receive queue in first-stage queues 2410. Destination control module 2450 can be configured to collect the information associated with the unavailable receive queues and to define flow control signal 6428 such that potentially conflicting flow control signals (not shown) are not sent to the single source queue in first-stage queues 2410. In some embodiments, a flow control signal 6428 defined based on the collected information can be referred to as an aggregated flow control signal.
Specifically, in this embodiment, destination control module 2450 is configured to define flow control signal 6428 in response to two receive queues, receive queue 2442 and receive queue 2446, on the receive side of first-stage queues 2410 being unavailable to receive data from transmit queue 2412 on the transmit side of first-stage queues 2410. In this embodiment, receive queues 2442 and 2446 change from an available state to an unavailable state in response to packets sent from transmit queue 2412 via transmission path 6422 and transmission path 6424, respectively. As shown in Figure 24, transmission path 6422 includes transmit queue 2412, transmit queue 2422 in second-stage queues 2420, physical link 2400, receive queue 2432 in second-stage queues 2420, and receive queue 2442. Transmission path 6424 includes transmit queue 2412, transmit queue 2422, physical link 2400, receive queue 2432, and receive queue 2446.
In some embodiments, a flow control algorithm can be used to define flow control signal 6428 based on information about the unavailability of receive queue 2442 and/or information about the unavailability of receive queue 2446. For example, if destination control module 2450 determines that receive queues 2442 and 2446 will be unavailable for different time periods, destination control module 2450 can be configured to define flow control signal 6428 based on those different time periods. For instance, destination control module 2450 can use flow control signal 6428 to request that data transmission from transmit queue 2412 be suspended for a time period calculated from the different time periods (e.g., a time period equal to their average, or equal to the larger of them). In some embodiments, flow control signal 6428 can be defined based on individual suspension requests from the receive side of first-stage queues 2410 (e.g., a suspension request associated with receive queue 2442 and a suspension request associated with receive queue 2446).
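As a minimal sketch of the two combining rules named above (average or larger value), the hypothetical function below merges the suspension periods requested for one transmit queue into a single period. The function name, the `policy` parameter and the time unit are illustrative assumptions.

```python
from statistics import mean


def aggregate_pause_period(requested_periods, policy="max"):
    """Combine the suspension periods that several unavailable receive queues
    requested for one transmit queue into a single period (hypothetical helper).

    policy="max" uses the largest requested period; policy="mean" uses their average.
    """
    if not requested_periods:
        raise ValueError("at least one suspension request is required")
    if policy == "max":
        return max(requested_periods)
    if policy == "mean":
        return mean(requested_periods)
    raise ValueError(f"unknown policy: {policy!r}")
```

For requests of 200 and 600 time units from receive queues 2442 and 2446, the "max" rule yields 600 and the "mean" rule yields 400.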
In some embodiments, flow control signal 6428 can be defined based on a maximum or minimum allowable time period. In some embodiments, flow control signal 6428 can be calculated based on the aggregate data flow rate from, for example, transmit queue 2412. That is, the suspension time period can be adjusted based on the aggregate data flow rate from transmit queue 2412. In some embodiments, the suspension time period can be increased if the data flow rate from transmit queue 2412 is above a threshold, and decreased if the data flow rate from transmit queue 2412 is below the threshold.
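The threshold rule can be sketched as follows. The text does not say by how much the period changes, so the fixed additive `step` and the clamp at zero are assumptions made for illustration only.

```python
def adjust_pause_period(base_period, aggregate_rate, threshold, step):
    """Hypothetical threshold rule: lengthen the suspension period when the
    transmit queue's aggregate flow rate exceeds the threshold, and shorten it
    (never below zero) when the rate is below the threshold."""
    if aggregate_rate > threshold:
        return base_period + step
    if aggregate_rate < threshold:
        return max(0, base_period - step)
    return base_period  # rate exactly at the threshold: leave the period unchanged
```

A multiplicative adjustment, or one bounded by the maximum and minimum allowable periods mentioned above, would fit the description equally well.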
In some embodiments, the flow control algorithm can be configured to wait a specified period of time before defining and/or sending flow control signal 6428. The waiting period can be defined so that multiple suspension requests related to transmit queue 2412, received at different times within the waiting period, can be used to define flow control signal 6428. In some embodiments, the waiting period is triggered in response to receipt of at least one suspension request related to transmit queue 2412.
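A minimal sketch of such a waiting period is shown below, assuming a logical clock. The first request for a transmit queue opens the window, later requests are collected, and the window is flushed into one signal once it closes. The class name and the choice of the longest requested period as the combining rule are hypothetical.

```python
from collections import defaultdict


class PauseRequestCollector:
    """Hypothetical sketch: merge the suspension requests for a transmit queue
    that arrive within a waiting window into a single flow control signal."""

    def __init__(self, window):
        self.window = window
        self.deadline = {}                 # transmit queue -> time its window closes
        self.requests = defaultdict(list)  # transmit queue -> requested periods

    def add_request(self, now, tx_queue, pause_period):
        # Only the first request for a queue starts (triggers) the waiting period.
        self.deadline.setdefault(tx_queue, now + self.window)
        self.requests[tx_queue].append(pause_period)

    def flush(self, now):
        """Return {tx_queue: pause_period} for every window that has closed."""
        signals = {}
        for tx_queue, deadline in list(self.deadline.items()):
            if now >= deadline:
                # Assumed combining rule: use the longest requested period.
                signals[tx_queue] = max(self.requests.pop(tx_queue))
                del self.deadline[tx_queue]
        return signals
```

Two requests for transmit queue 2412 that arrive at t=0 and t=3 within a 5-unit window produce one signal at t=5 instead of two separate signals.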
In some embodiments, the flow control algorithm can define flow control signal 6428 based on a priority value associated with each receive queue in first-stage queues 2410. For example, if receive queue 2442 has a higher priority value than receive queue 2446, destination control module 2450 can be configured to define flow control signal 6428 based on information associated with receive queue 2442 rather than with receive queue 2446. For example, flow control signal 6428 can be defined based on the suspension time period associated with receive queue 2442 rather than the suspension time period associated with receive queue 2446, because receive queue 2442 has the higher priority value.
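The priority rule reduces to selecting the request from the highest-priority receive queue. In the sketch below, the `PauseRequest` record and the convention that a larger number means higher priority are assumptions. Ties go to the first request in the list.

```python
from typing import NamedTuple


class PauseRequest(NamedTuple):
    receive_queue: str
    priority: int       # assumed convention: larger value = higher priority
    pause_period: int


def pause_period_by_priority(requests):
    """Return the suspension period requested by the highest-priority receive queue."""
    if not requests:
        raise ValueError("no suspension requests")
    winner = max(requests, key=lambda r: r.priority)
    return winner.pause_period
```

With receive queue 2442 at priority 7 requesting 300 units and receive queue 2446 at priority 2 requesting 900, the signal carries 300.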
In some embodiments, the flow control algorithm can define flow control signal 6428 based on an attribute associated with each receive queue in first-stage queues 2410. For example, flow control signal 6428 can be defined based on receive queue 2442 and/or receive queue 2446 being a particular type of queue (e.g., a last-in first-out (LIFO) queue or a first-in first-out (FIFO) queue). In some embodiments, flow control signal 6428 can be defined based on receive queue 2442 and/or receive queue 2446 being configured to receive a particular type of data (e.g., a control data/signal queue or a media data/signal queue).
Although not shown, one or more control modules associated with one queue stage (e.g., first-stage queues 2410) can be configured to send information to a different control module associated with a different queue stage, where the information is used to define a flow control signal. For example, a suspension request associated with receive queue 2442 and a suspension request associated with receive queue 2446 can be defined at destination control module 2450. The suspension requests can then be sent to a destination control module (not shown) associated with the receive side of second-stage queues 2420, where a flow control signal (not shown) can be defined based on the suspension requests and a flow control algorithm.
Flow control signal 6428 can be defined based on a flow control loop associated with first-stage queues 2410 (e.g., a first-stage control loop). One or more flow control signals (not shown) can also be defined based on a flow control loop associated with second-stage queues 2420 and/or a flow control loop associated with physical link 2400.
Data transmission associated with transmit queues in first-stage queues 2410 other than transmit queue 2412 is substantially not restricted by flow control signal 6428, because data flow to receive queues 2442 and 2446 is controlled based on the first-stage flow control loop. For example, even when data transmission from transmit queue 2412 is suspended, transmit queue 2414 can continue to send data via transmit queue 2422. Specifically, transmit queue 2414 can be configured to send data to receive queue 2448 via transmission path 6426, which includes transmit queue 2422, even though data transmission from transmit queue 2412 via transmit queue 2422 has been suspended. In some embodiments, transmit queue 2422 can be configured to continue sending data from, for example, transmit queue 2416 to receive queue 2442, even though data transmission from transmit queue 2412 via transmission path 6422 has been suspended based on flow control signal 6428.
Conversely, if data transmission to receive queues 2442 and 2446 is suspended by controlling the data flow via transmit queue 2422 with a flow control signal (not shown) based on the second-stage control loop, then data transmission via transmit queue 2422 from transmit queue 2414 and transmit queue 2416 (in addition to data transmission from transmit queue 2412) will also be restricted. Data transmission from transmit queue 2422 can be suspended, for example, because transmit queue 2422 is associated with a particular class of service and the congesting data at receive queues 2442 and 2446 is associated with that class of service.
One or more parameter values defined within flow control signal 6428 can be stored in memory 2452 of destination control module 2450. In some embodiments, the parameter values can be stored in memory 2452 after they are defined and/or when flow control signal 6428 is sent to source control module 2460. The parameter values defined in flow control signal 6428 can be used to track the state of, for example, transmit queue 2412. For example, an entry in memory 2452 can indicate that transmit queue 2412 is in a suspended state (e.g., a non-transmitting state). The entry can be defined based on the suspension-time-period parameter value defined in flow control signal 6428. When the suspension time period expires, the entry can be updated to indicate that the state of transmit queue 2412 has changed to, for example, an active state (e.g., a transmitting state). Although not shown, in some embodiments one or more parameter values can be stored in a memory outside destination control module 2450 (e.g., a remote memory).
In some embodiments, destination control module 2450 can use one or more parameter values stored in memory 2452 (e.g., state information defined based on the parameter values) to determine whether an additional flow control signal (not shown) should be defined. In some embodiments, destination control module 2450 can use one or more parameter values to define one or more additional flow control signals.
For example, if receive queue 2442 changes from an available state to an unavailable state (e.g., a congested state) in response to a first packet received from transmit queue 2412, a request to suspend data transmission from transmit queue 2412 can be sent via flow control signal 6428. Flow control signal 6428 can use a queue identifier to indicate that transmit queue 2412 is the target of the request, and it can specify a suspension time period. When flow control signal 6428 is sent to source control module 2460, the suspension time period and the queue identifier associated with transmit queue 2412 can be stored in memory 2452 of destination control module 2450. After flow control signal 6428 has been sent, receive queue 2444 can change from an available state to a congested state in response to a second packet received from transmit queue 2412 (this transmission path is not shown in Figure 24). The second packet may have been sent from transmit queue 2412 before data transmission from transmit queue 2412 was suspended based on flow control signal 6428. In response to the state change of receive queue 2444, destination control module 2450 can access the information stored in memory 2452 and determine that an additional flow control signal targeting transmit queue 2412 should not be defined and sent to source control module 2460, because flow control signal 6428 has already been sent.
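The bookkeeping in memory 2452 can be sketched as a table that maps transmit-queue identifiers to the time their suspension ends. A second congestion event that arrives while the suspension is still in force produces no new signal. The class and method names, and the use of a logical clock, are hypothetical.

```python
class DestinationFlowState:
    """Hypothetical sketch of the suspension state a destination control
    module could keep in memory, keyed by transmit-queue identifier."""

    def __init__(self):
        self.paused_until = {}  # transmit queue ID -> time its suspension ends

    def on_receive_queue_congested(self, now, tx_queue, pause_period):
        """Return a (tx_queue, pause_period) flow control signal to send, or
        None if a signal already suspends tx_queue."""
        if self.paused_until.get(tx_queue, now) > now:
            return None  # suspension still in force: suppress a redundant signal
        # No active suspension (or it has expired): record and emit a new one.
        self.paused_until[tx_queue] = now + pause_period
        return (tx_queue, pause_period)
```

In the example above, the first packet (congesting receive queue 2442) yields a signal. The second packet (congesting receive queue 2444) arrives during the suspension and yields none.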
In some embodiments, source control module 2460 can be configured to suspend transmission from transmit queue 2412 based on the most recent flow control signal parameter values. For example, after flow control signal 6428 targeting transmit queue 2412 has been sent to source control module 2460, a later flow control signal (not shown) targeting transmit queue 2412 can be received at source control module 2460. Source control module 2460 can be configured to apply one or more parameter values associated with the later flow control signal rather than the parameter values associated with flow control signal 6428. In some embodiments, the later flow control signal can cause transmit queue 2412 to remain in the suspended state for a longer or shorter time period than indicated in flow control signal 6428.
In some embodiments, source control module 2460 can selectively apply the one or more parameter values associated with the later flow control signal when the priority value associated with those parameter values is higher (or lower) than the priority value associated with the one or more parameter values of flow control signal 6428. In some embodiments, each priority value can be defined at destination control module 2450, and each priority value can be defined based on a priority value associated with one or more of receive queues 2480.
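The two source-side policies (the most recent signal wins, or a lower-priority later signal is ignored) can be sketched as follows. The `FlowControlSignal` fields, the `prefer_priority` flag and the logical clock are assumptions. The patent does not define a signal format at this point.

```python
from dataclasses import dataclass


@dataclass
class FlowControlSignal:
    tx_queue: str
    pause_period: int
    priority: int = 0  # hypothetical field; priorities may be set at the destination


class SourceSuspension:
    """Hypothetical sketch of how a source control module could apply successive signals."""

    def __init__(self, prefer_priority=False):
        self.prefer_priority = prefer_priority
        self.active = {}  # tx_queue -> (resume_time, priority)

    def apply(self, now, sig):
        current = self.active.get(sig.tx_queue)
        if (self.prefer_priority and current is not None
                and now < current[0] and sig.priority < current[1]):
            return  # keep the higher-priority suspension already in force
        # Otherwise the most recent signal wins; it may lengthen or shorten the pause.
        self.active[sig.tx_queue] = (now + sig.pause_period, sig.priority)

    def is_suspended(self, now, tx_queue):
        entry = self.active.get(tx_queue)
        return entry is not None and now < entry[0]
```

Under the default policy, a later, shorter signal for transmit queue 2412 shortens the pause. With `prefer_priority=True`, a later, lower-priority signal is ignored while the earlier suspension is still active.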
In some embodiments, flow control signal 6428 and the later flow control signal (both targeting transmit queue 2412) can be defined in response to the same receive queue in receive queues 2480 being unavailable. For example, the later flow control signal can include an updated parameter value defined by destination control module 2450 because receive queue 2442 is remaining unavailable for a longer time period than previously calculated. In some embodiments, flow control signal 6428 targeting transmit queue 2412 can be defined in response to one of receive queues 2480 changing state (e.g., from an available state to an unavailable state), and the later flow control signal targeting transmit queue 2412 can be defined in response to another of receive queues 2480 changing state (e.g., from an available state to an unavailable state).
In some embodiments, multiple flow control signals can be defined at destination control module 2450 to suspend transmission from multiple transmit queues in first-stage queues 2410. In some embodiments, the multiple transmit queues can send data to a single receive queue, such as receive queue 2444. In some embodiments, a history of the flow control signals sent to the multiple transmit queues in first-stage queues 2410 can be stored in memory 2452 of destination control module 2450. In some embodiments, a later flow control signal associated with the single receive queue can be calculated based on that history.
In some embodiments, suspension time periods associated with multiple transmit queues can be grouped together and included in a flow control packet. For example, the suspension time period associated with transmit queue 2412 and the suspension time period associated with transmit queue 2414 can be included in a single flow control packet. Flow control packets are described in more detail with reference to Figure 25.
Figure 25 is a schematic diagram that illustrates a flow control packet according to one embodiment. The flow control packet includes a header 2510, a trailer 2520, and a payload 2530. The payload contains suspension-time-period parameter values (shown in column 2512) for several transmit queues, which are represented by queue identifiers (IDs) (shown in column 2514). As shown in Figure 25, the transmit queues represented by queue IDs 1 through V (i.e., queue ID1 through queue IDV) are each associated with one of suspension-time-period parameter values 1 through V (i.e., suspension time period 1 through suspension time period V). The suspension-time-period parameter values 2512 indicate the time periods during which the transmit queues represented by queue IDs 2514 should be suspended from (e.g., prohibited from) sending data.
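The header, payload and trailer layout can be made concrete with a byte-level encoder and decoder. This layout is entirely hypothetical, because the patent does not specify field widths: a 1-byte version and 2-byte entry count as the header, one (2-byte queue ID, 4-byte suspension period) pair per transmit queue as the payload, and a CRC-32 over header and payload as the trailer.

```python
import struct
import zlib

HEADER = struct.Struct("!BH")   # hypothetical: version (1 byte), entry count (2 bytes)
ENTRY = struct.Struct("!HI")    # hypothetical: queue ID (2 bytes), suspension period (4 bytes)
TRAILER = struct.Struct("!I")   # hypothetical: CRC-32 over header + payload


def encode_flow_control_packet(entries, version=1):
    """entries: list of (queue_id, suspension_period) pairs, one per transmit queue."""
    body = HEADER.pack(version, len(entries))
    body += b"".join(ENTRY.pack(qid, period) for qid, period in entries)
    return body + TRAILER.pack(zlib.crc32(body))


def decode_flow_control_packet(packet):
    """Return (version, [(queue_id, suspension_period), ...]) after checking the trailer."""
    body = packet[:-TRAILER.size]
    (crc,) = TRAILER.unpack(packet[-TRAILER.size:])
    if zlib.crc32(body) != crc:
        raise ValueError("trailer checksum mismatch")
    version, count = HEADER.unpack_from(body, 0)
    entries = [ENTRY.unpack_from(body, HEADER.size + i * ENTRY.size) for i in range(count)]
    return version, entries
```

A packet carrying two entries occupies 3 + 2 × 6 + 4 = 19 bytes in this layout. Any corrupted byte in the header or payload is caught by the trailer check.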
In some embodiments, a flow control packet can be defined at a destination control module, such as destination control module 2450 shown in Figure 24. In some embodiments, the destination control module can be configured to define flow control packets at regular time intervals; for example, it can be configured to define one flow control packet every 10 ms. In some embodiments, the destination control module can be configured to define flow control packets at random times, when a suspension-time-period parameter value has been calculated, and/or when a specified number of suspension-time-period parameter values have been calculated. In some embodiments, the destination control module can determine that at least a portion of a flow control packet should not be defined and/or sent, based on, for example, one or more parameter values and/or state information accessible to the destination control module.
Although not shown, in some embodiments multiple queue IDs can be associated with a single suspension-time-period parameter value. In some embodiments, at least one queue ID can be associated with a parameter value other than a suspension-time-period parameter value. For example, a queue ID can be associated with a flow rate parameter value, which can indicate the flow rate (e.g., the maximum flow rate) at which the transmit queue represented by that queue ID should send data. In some embodiments, a flow control packet can have one or more fields configured to indicate whether a particular receive queue is available to receive data.
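The text does not say how a transmit queue should honor a maximum flow rate parameter. A token bucket is one common way to do so and is shown here as a swapped-in technique, not the patent's method. The class and its parameters are hypothetical: `rate` is in bytes per second, and `burst` caps how much can be sent at once.

```python
class TokenBucket:
    """Token-bucket limiter (illustrative technique, not specified by the text)
    that caps a transmit queue at `rate` bytes/second with bursts up to `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous refill, in seconds

    def try_send(self, now, nbytes):
        # Refill tokens for the elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # sending now would exceed the allowed flow rate
```

When a flow control packet carries a new flow rate parameter value for a queue, the source could update `rate` for that queue's bucket.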
A flow control packet can be sent from a destination control module to a source control module (e.g., source control module 2460 shown in Figure 24) via a flow control signal (e.g., flow control signal 6428 shown in Figure 24). In some embodiments, the flow control packet can be defined based on a Layer 2 protocol (e.g., Layer 2 of the OSI model). In other words, the flow control packet can be defined and used at Layer 2 of a network system. In some embodiments, the flow control packet can be sent between devices associated with Layer 2 (e.g., MAC devices).
Referring again to Figure 25, one or more parameter values (e.g., state information defined based on the parameter values) associated with flow control signal 6428 can be stored in memory 2562 of source control module 2560. In some embodiments, the parameter values can be stored in memory 2562 when flow control signal 6428 is received at source control module 2560. The parameter values defined in flow control signal 6428 can be used to track the state of one or more receive queues 2580 (e.g., receive queue 2542). For example, an entry in memory 2562 can indicate that receive queue 2542 is unavailable to receive data. The entry can be defined based on the suspension-time-period parameter value defined in flow control signal 6428 and can be associated with an identifier (e.g., a queue identifier) of receive queue 2542. When the suspension time period expires, the entry can be updated to indicate that the state of receive queue 2542 has changed to, for example, an active state. Although not shown, in some embodiments one or more parameter values can be stored in a memory outside source control module 2560 (e.g., a remote memory).
In some embodiments, source control module 2560 can use one or more parameter values (and/or state information) stored in memory 2562 to determine whether data should be sent to one or more receive queues 2580. For example, source control module 2560 can be configured to send data from transmit queue 2516 to receive queue 2544 rather than receive queue 2542, based on state information about receive queue 2544 and receive queue 2542.
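The source-side table can be sketched as a mapping from receive-queue identifiers to the time their suspension expires. Expired entries count as available again, and the first available candidate is chosen. The class name, the candidate ordering and the logical clock are assumptions made for illustration.

```python
class ReceiveQueueAvailability:
    """Hypothetical sketch of receive-queue state kept in a source control module's memory."""

    def __init__(self):
        self.unavailable_until = {}  # receive queue ID -> time its suspension expires

    def on_flow_control_signal(self, now, rx_queue, pause_period):
        self.unavailable_until[rx_queue] = now + pause_period

    def is_available(self, now, rx_queue):
        # An entry whose suspension period has elapsed counts as available (active) again.
        return now >= self.unavailable_until.get(rx_queue, now)

    def pick(self, now, candidates):
        """Return the first currently available receive queue, or None if all are suspended."""
        return next((q for q in candidates if self.is_available(now, q)), None)
```

While receive queue 2542 is marked unavailable, transmit queue 2516's data is steered to receive queue 2544. Once the period expires, receive queue 2542 is eligible again.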
In some embodiments, source control module 2560 can analyze data transmission patterns to determine whether data should be sent from one or more source queues 2570 to one or more receive queues 2580. For example, based on parameter values stored in memory 2562 of source control module 2560, source control module 2560 can determine that transmit queue 2514 is sending a relatively high volume of data to receive queue 2546. Based on this determination, source control module 2560 can trigger queue 2516 to send data to receive queue 2548 rather than receive queue 2546, because receive queue 2546 is receiving a high volume of data from transmit queue 2514. By analyzing the transmission patterns associated with transmit queues 2570, the onset of congestion at one or more receive queues 2580 can be substantially avoided.
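One simple form of transmission-pattern analysis is to total the bytes recently sent to each receive queue and steer new traffic to the least-loaded candidate. The function below is a hypothetical sketch. The recent-window byte counts are assumed to be collected elsewhere.

```python
from collections import Counter


def least_loaded_receive_queue(sent_bytes, candidates):
    """sent_bytes: Counter keyed by (transmit_queue, receive_queue) holding the bytes
    observed in a recent window. Return the candidate receive queue that received
    the fewest bytes in total (hypothetical selection rule)."""
    load = Counter()
    for (_tx, rx), n in sent_bytes.items():
        load[rx] += n
    return min(candidates, key=lambda rx: load[rx])
```

If transmit queue 2514 has recently sent several megabytes to receive queue 2546, new data from queue 2516 is steered to receive queue 2548.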
In some embodiments, source control module 2560 can analyze parameter values (and/or state information) stored in memory 2562 to determine whether data should be sent to one or more receive queues 2580. By analyzing the stored parameter values (and/or state information), the onset of congestion at one or more receive queues 2580 can be substantially avoided. For example, source control module 2560 can trigger data to be sent to receive queue 2540 rather than receive queue 2542, based on the historical availability of receive queue 2540 compared with that of receive queue 2542 (e.g., better or worse). In some embodiments, source control module 2560 can send data to receive queue 2542 rather than receive queue 2544 based on, for example, the historical performance of receive queue 2542 with respect to data burst patterns, compared with that of receive queue 2544. In some embodiments, the analysis of parameter values related to one or more receive queues 2580 can be based on a specific time window, a specific type of network processing (e.g., inter-processor communication), a specific class of service, and so on.
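Historical availability can be summarized in many ways. The sketch below uses an exponentially weighted moving average of availability observations per receive queue, which is an assumed scoring method rather than one stated in the text, and then prefers the candidate with the best score. The class name and `alpha` are hypothetical.

```python
class AvailabilityHistory:
    """Exponentially weighted availability score per receive queue (assumed scoring method)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to the newest observation
        self.score = {}      # receive queue ID -> smoothed availability in [0, 1]

    def observe(self, rx_queue, available):
        prev = self.score.get(rx_queue, 1.0)  # unseen queues start as fully available
        sample = 1.0 if available else 0.0
        self.score[rx_queue] = (1 - self.alpha) * prev + self.alpha * sample

    def best(self, candidates):
        """Return the candidate receive queue with the highest historical availability."""
        return max(candidates, key=lambda q: self.score.get(q, 1.0))
```

After receive queue 2542 is repeatedly observed as unavailable while receive queue 2540 stays available, `best` prefers receive queue 2540.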
In some embodiments, destination control module 2550 can send status information about receive queues 2580 (e.g., current state information), which source control module 2560 can use to determine whether data should be sent from one or more source queues 2570. For example, source control module 2560 can trigger queue 2514 to send data to queue 2544 rather than queue 2546, because destination control module 2550 indicates that queue 2544 has more available capacity than queue 2546. In some embodiments, any combination of current state information, transmission pattern analysis, and historical data analysis can be used to substantially prevent congestion at one or more receive queues 2580, or to reduce the likelihood of its onset.
In some embodiments, flow control signal 6428 can be sent from destination control module 2550 to source control module 2560 via an out-of-band transmission path. For example, flow control signal 6428 can be sent via a link dedicated to flow control signaling. In some embodiments, flow control signal 6428 can be sent via queues associated with second-stage queues 2520, queues associated with first-stage queues 2510, and/or physical link 2500.
Some embodiments described herein are related to computer readable medium (being also known as processor readable medium)
Computer stores product, computer readable medium have have thereon instruction by performing the executable operation of various computers or based on
Calculation machine code.Medium and computer code (being also known as code) can be designed to and build for a specific purpose those
Medium and computer code.The example of computer readable medium includes, but is not limited to:Such as hard disk, floppy disk and tape
Magnetic storage media;Such as compact disk/Digital video disc (CD/DVD), compression compact disc-ROM (CD-ROM) and entirely
Cease the optical storage medium of device;The magnetic-light storage medium of such as CD;Carrier signal processing module;And be specifically configured to
Store and configuration processor code hardware unit, such as ASIC, programmable logic device (PLD), and read-only storage (ROM) and
Ram set.
Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions such as those produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments may be implemented using Java, C++, or other programming languages (e.g., object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, not limitation, and that various changes in form and details may be made. Any portion of the apparatus and/or methods described herein may be combined in any combination, except mutually exclusive combinations. The embodiments described herein can include various combinations and/or sub-combinations of the functions, components and/or features of the different embodiments described.
Claims (47)
1. A communication apparatus, comprising:
a switch core defining a single logical entity and having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports,
the switch core configured to provide non-blocking connectivity at line rate between a first peripheral processing device disposed at a first chassis and a second peripheral processing device disposed at a second chassis, the switch core configured to receive a first packet associated with the first peripheral processing device, the switch core configured to send, sequentially and based on cells associated with the first packet, a second packet to the second peripheral processing device and a third packet to a third peripheral processing device, the multi-stage switch fabric configured to send the cells from an input port of the plurality of input ports to an output port of the plurality of output ports.
2. The communication apparatus of claim 1, wherein the plurality of peripheral processing devices includes at least one peripheral processing device having virtualized resources and at least one peripheral processing device without virtualized resources.
3. The communication apparatus of claim 1, wherein the number of the plurality of input ports and the plurality of output ports is greater than 1000, and each input port of the plurality of input ports and each output port of the plurality of output ports is configured to operate at a rate of not less than 10 Gb/s.
4. The communication apparatus of claim 1, wherein:
the first peripheral processing device and the second peripheral processing device are each one of a storage node device, a compute node device, a service node device, or a router.
5. The communication apparatus of claim 1, wherein the switch core is configured to provide non-blocking connectivity at line rate between the second peripheral processing device and the third peripheral processing device.
6. The communication apparatus of claim 5, wherein:
the first peripheral processing device and the third peripheral processing device are each one of a storage node device, a compute node device, a service node device, or a router; and
the second peripheral processing device is at least one of a firewall device, an intrusion detection device, or a load balancing device.
7. A communication apparatus, comprising:
a switch core having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports,
the switch core configured to provide, at line rate, connectivity from each peripheral processing device of the plurality of peripheral processing devices to each remaining peripheral processing device of the plurality of peripheral processing devices, such that each output port of the plurality of output ports is equally accessible to each peripheral processing device of the plurality of peripheral processing devices via an input port of the plurality of input ports, the number of the plurality of input ports and the plurality of output ports being greater than 1000, and each input port of the plurality of input ports and each output port of the plurality of output ports being configured to operate at a rate of not less than 10 Gb/s.
8. The communication apparatus of claim 7, wherein the plurality of peripheral processing devices includes at least one peripheral processing device coupled to the switch core via an Ethernet connection and at least one peripheral processing device coupled to the switch core via a non-Ethernet connection.
9. The communication apparatus of claim 7, wherein the plurality of peripheral processing devices includes at least one peripheral processing device that uses Layer 3 routing and at least one peripheral processing device that is a Layer 4 to Layer 7 device.
10. A communication apparatus, comprising:
a switch core defining a single logical entity and having a multi-stage switch fabric, the multi-stage switch fabric having a plurality of stages physically distributed across a plurality of chassis, the plurality of stages collectively having a plurality of input ports and a plurality of output ports, the switch core configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports,
the switch core configured to allow a plurality of cells associated with a packet to enter an input port of the plurality of input ports when lossless transmission of the plurality of cells through the multi-stage switch fabric can be substantially guaranteed.
11. The communication apparatus of claim 10, wherein the plurality of peripheral processing devices includes a first peripheral processing device configured to communicate using the Fibre Channel protocol and a second peripheral processing device configured to communicate using the Fibre Channel over Ethernet protocol.
12. The communication device as claimed in claim 10, wherein the multi-stage switch fabric is configured as a deterministic network.
13. The communication device as claimed in claim 10, wherein the multi-stage switch fabric is configured as a deterministic network such that the multi-stage switch fabric admits the packet into the input port when the plurality of cells can be sent to an output port in the plurality of output ports within a predetermined time.
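The deterministic admission test of claims 13 and 24 can be illustrated with a worst-case delay bound: admit a packet only if its cells, queued behind the cells already waiting for the same output port, can all be serialized within the predetermined time. This is a minimal sketch under stated assumptions; the fixed cell size, the single output line rate, and the class name `DeadlineAdmission` are hypothetical, and real fabrics would also account for per-stage latency.

```python
from dataclasses import dataclass


@dataclass
class DeadlineAdmission:
    """Hypothetical deterministic admission test: admit a packet only if all
    of its cells, behind the cells already queued for the same output port,
    can be transmitted within a fixed deadline."""
    cell_bits: int        # bits per cell on the wire (assumed fixed)
    port_rate_bps: float  # output port line rate
    deadline_s: float     # the predetermined time

    def cell_time_s(self) -> float:
        # Serialization time of one cell at the output port.
        return self.cell_bits / self.port_rate_bps

    def admit(self, queued_cells: int, new_cells: int) -> bool:
        worst_case_s = (queued_cells + new_cells) * self.cell_time_s()
        return worst_case_s <= self.deadline_s
```

For example, with 512-bit cells on a 10 Gb/s port (51.2 ns per cell) and a 10 µs deadline, 150 queued cells plus a 24-cell packet fit (about 8.9 µs), but 180 queued cells plus 24 do not (about 10.4 µs).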
14. The communication device as claimed in claim 10, wherein:
the switch core is configured to send a plurality of cells associated with the packet from the input port to a first output port and a second output port in the plurality of output ports without performing packet-drop processing at at least one stage of the plurality of stages of the multi-stage switch fabric.
15. The communication device as claimed in claim 10, wherein:
the switch core is coupled, via the plurality of input ports and the plurality of output ports, to a plurality of edge devices of the multi-stage switch fabric, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device in the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
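Claim 15 has each edge device define a plurality of cells from a packet, and the abstract describes a second edge device that reassembles the packet from those cells. A minimal segmentation-and-reassembly sketch follows. The cell layout (packet ID, sequence number, last-cell flag, 64-byte payload) is an assumption for illustration; the patent does not specify a cell format here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CELL_PAYLOAD = 64  # assumed payload size per cell, in bytes


@dataclass(frozen=True)
class Cell:
    packet_id: int
    seq: int
    last: bool
    payload: bytes


def segment(packet_id: int, packet: bytes, payload_size: int = CELL_PAYLOAD) -> List[Cell]:
    """Ingress edge device: split a packet into fixed-size cells."""
    chunks = [packet[i:i + payload_size] for i in range(0, len(packet), payload_size)] or [b""]
    return [Cell(packet_id, n, n == len(chunks) - 1, chunk) for n, chunk in enumerate(chunks)]


class Reassembler:
    """Egress edge device: rebuild packets from cells, tolerating reordering."""

    def __init__(self) -> None:
        self._parts: Dict[int, Dict[int, bytes]] = {}
        self._expected: Dict[int, int] = {}

    def accept(self, cell: Cell) -> Optional[bytes]:
        parts = self._parts.setdefault(cell.packet_id, {})
        parts[cell.seq] = cell.payload
        if cell.last:
            # The last cell tells us how many cells the packet has.
            self._expected[cell.packet_id] = cell.seq + 1
        total = self._expected.get(cell.packet_id)
        if total is not None and len(parts) == total:
            del self._parts[cell.packet_id], self._expected[cell.packet_id]
            return b"".join(parts[i] for i in range(total))
        return None  # packet still incomplete
```

A 200-byte packet becomes four cells (three full 64-byte cells and one 8-byte cell), and the reassembler returns the original bytes once all four have arrived, in any order.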
16. The communication device as claimed in claim 10, wherein:
the switch core is configured to send a plurality of cells associated with the packet from the input port to an output port in the plurality of output ports via the plurality of stages of the multi-stage switch fabric, without performing packet-drop processing at at least one stage of the plurality of stages.
17. A communication device, comprising:
a switch core that defines a single logical entity and has a multi-stage switch fabric, the multi-stage switch fabric having a plurality of stages physically distributed across a plurality of chassis and having a plurality of input ports and a plurality of output ports, the switch core being configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports,
the switch core being configured to receive a packet from an input port in the plurality of input ports, and the switch core being configured to send a plurality of cells associated with the packet from the input port to an output port in the plurality of output ports via the plurality of stages, without performing packet-drop processing at at least one stage of the plurality of stages of the multi-stage switch fabric.
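Claims 14, 16, 17 and 26 recite forwarding cells through the stages without performing packet-drop processing. One standard way to get that behavior is hop-by-hop backpressure: a cell advances only when the next stage has buffer space, and the ingress holds cells rather than discarding them. The sketch below assumes that technique; it is not stated in the claims, and the class name `LosslessPipeline`, the uniform per-stage capacity, and the one-cell-per-step service model are hypothetical simplifications.

```python
from collections import deque
from typing import Deque, List


class LosslessPipeline:
    """Hypothetical model of cells crossing the stages of a multi-stage fabric.
    A cell moves forward only when the next stage has buffer space, so a full
    stage exerts backpressure instead of dropping cells."""

    def __init__(self, num_stages: int, capacity: int) -> None:
        self.capacity = capacity
        self.stages: List[Deque[int]] = [deque() for _ in range(num_stages)]
        self.delivered: List[int] = []

    def offer(self, cell: int) -> bool:
        """Try to inject a cell at the first stage; False means 'hold and retry'."""
        if len(self.stages[0]) >= self.capacity:
            return False
        self.stages[0].append(cell)
        return True

    def step(self, egress_ready: bool = True) -> None:
        """Advance at most one cell per stage, last stage first, so slots
        freed downstream become available upstream in the same step."""
        last = self.stages[-1]
        if egress_ready and last:
            self.delivered.append(last.popleft())
        for i in range(len(self.stages) - 2, -1, -1):
            src, dst = self.stages[i], self.stages[i + 1]
            if src and len(dst) < self.capacity:
                dst.append(src.popleft())
```

When the egress stalls, the stages fill and `offer` starts returning `False`; once the egress resumes, every held cell is delivered in order and none is lost.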
18. The communication device as claimed in claim 17, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port in the plurality of input ports only when lossless transmission of the plurality of cells associated with the packet within the switch fabric can be substantially guaranteed.
19. The communication device as claimed in claim 17, wherein:
the output port is a first output port, and
the switch core is configured to send the plurality of cells associated with the packet from the input port to the first output port and to a second output port in the plurality of output ports.
20. The communication device as claimed in claim 17, wherein:
the switch core is coupled, via the plurality of input ports and the plurality of output ports, to a plurality of edge devices of the multi-stage switch fabric, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device in the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
21. A communication device, comprising:
a switch core that defines a single logical entity and has a multi-stage switch fabric configured as a deterministic network, the multi-stage switch fabric having a plurality of input ports and a plurality of output ports, the switch core being configured to be coupled to a plurality of peripheral processing devices via the plurality of input ports and the plurality of output ports,
the switch core being configured to receive a packet from an input port in the plurality of input ports, and the switch core being configured to send a plurality of cells associated with the packet from the input port to an output port in the plurality of output ports.
22. The communication device as claimed in claim 21, wherein the multi-stage switch fabric is physically distributed across a plurality of chassis.
23. The communication device as claimed in claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port in the plurality of input ports only when lossless transmission of the plurality of cells associated with the packet within the multi-stage switch fabric can be substantially guaranteed.
24. The communication device as claimed in claim 21, wherein the multi-stage switch fabric is configured as a deterministic network such that the packet is admitted at the input port in the plurality of input ports when the plurality of cells associated with the packet can be sent to an output port in the plurality of output ports within a predetermined time.
25. The communication device as claimed in claim 21, wherein:
the switch core is coupled, via the plurality of input ports and the plurality of output ports, to a plurality of edge devices of the multi-stage switch fabric, the plurality of edge devices being coupled to the plurality of peripheral processing devices, and each edge device in the plurality of edge devices being configured to receive the packet and to define the plurality of cells based on the packet.
26. The communication device as claimed in claim 21, wherein:
the switch core is configured to send the plurality of cells associated with the packet from the input port to the output port via a plurality of stages of the multi-stage switch fabric, without performing packet-drop processing at at least one stage of the plurality of stages.
27. A communication device, comprising:
a switch core having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric having a plurality of input buffers and a plurality of output ports, the switch core being configured to be coupled to a plurality of edge devices; and
a controller implemented in hardware that requires no software during operation and requires software during configuration and monitoring, the controller being coupled to the plurality of input buffers and the plurality of output ports, and the controller being configured to send a flow control signal to an input buffer in the plurality of input buffers when congestion at an output port in the plurality of output ports is anticipated and before the congestion occurs within the switch core.
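Claim 27 has the controller signal an input buffer when congestion at an output port is anticipated, before it actually occurs. A common way to anticipate congestion is a high watermark set below the queue's capacity, with a lower watermark for resuming (hysteresis). The sketch below assumes that scheme; the watermark values, the XOFF/XON signal names, and the class name `WatermarkController` are hypothetical, and the hardware/software split recited in the claim is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WatermarkController:
    """Hypothetical end-to-end flow controller. It signals the input buffers
    when an output queue crosses a high watermark that sits below the queue's
    capacity, i.e. when congestion is anticipated, before the queue overflows."""
    capacity: int
    high: int
    low: int
    paused: Dict[int, bool] = field(default_factory=dict)  # output port -> XOFF sent?
    signals: List[Tuple[str, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.low < self.high < self.capacity:
            raise ValueError("require 0 <= low < high < capacity")

    def observe(self, port: int, depth: int) -> None:
        """Update state from an output-queue depth sample and emit XOFF/XON."""
        if not self.paused.get(port) and depth >= self.high:
            self.paused[port] = True
            self.signals.append(("XOFF", port))  # input buffers hold cells for this port
        elif self.paused.get(port) and depth <= self.low:
            self.paused[port] = False
            self.signals.append(("XON", port))  # input buffers may resume sending
```

Because `high` is strictly below `capacity`, the XOFF is always issued while the output queue still has room to absorb cells already in flight; the gap between `high` and `low` keeps the controller from toggling on every sample.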
28. The communication device as claimed in claim 27, wherein the controller is configured to perform end-to-end flow control for the input buffer and the output port independently of intra-fabric flow control for the multi-stage switch fabric of the switch core.
29. The communication device as claimed in claim 27, wherein the controller is configured to perform end-to-end flow control for the input buffer and the output port independently of flow control for the plurality of edge devices.
30. The communication device as claimed in claim 27, further comprising:
a plurality of peripheral processing devices configured to be coupled to the plurality of edge devices,
the controller being configured to perform end-to-end flow control for the input buffer and the output port independently of flow control for the plurality of edge devices.
31. The communication device as claimed in claim 27, wherein the controller is configured to perform end-to-end flow control such that a cell is buffered at the input buffer for a period of time before being sent to the output port, the period of time being associated with the end-to-end flow control.
32. The communication device as claimed in claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer, independently of the buffering of cell segments at a stage of the multi-stage switch fabric and independently of the buffering of packets at an edge device in the plurality of edge devices.
33. The communication device as claimed in claim 27, wherein the controller is configured to perform end-to-end flow control on cells buffered at the input buffer, independently of a flow control mechanism associated with Ethernet.
34. A communication device, comprising:
a switch core having a multi-stage switch fabric physically distributed across a plurality of chassis, the multi-stage switch fabric being configured to receive a plurality of cells associated with a packet and being configured to switch a plurality of cell segments based on the plurality of cells;
an edge device in a plurality of edge devices coupled to the switch core, the edge device being configured to receive the packet and to send the plurality of cells to the multi-stage switch fabric; and
a controller coupled to the multi-stage switch fabric, the controller being configured to perform flow control on the plurality of cells independently of flow control for the plurality of edge devices and independently of intra-fabric flow control for the multi-stage switch fabric.
35. The communication device as claimed in claim 34, wherein:
the controller is implemented in hardware, requires no software during operation, and requires software during configuration and monitoring.
36. The communication device as claimed in claim 34, wherein:
the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports, and
the controller is configured to send a flow control signal to an input buffer in the plurality of input buffers when congestion at an output port in the plurality of output ports is anticipated and before the congestion occurs within the switch core.
37. The communication device as claimed in claim 34, wherein:
the multi-stage switch fabric has a plurality of input buffers and a plurality of output ports, and
the controller is configured to perform end-to-end flow control on cells buffered at an input buffer in the plurality of input buffers, independently of a flow control mechanism associated with Ethernet.
38. A communication device, comprising:
a switch core having a multi-stage switch fabric;
a first plurality of peripheral processing devices coupled to the multi-stage switch fabric by a plurality of connections having a protocol, each peripheral processing device in the first plurality of peripheral processing devices being a storage node having virtualized resources, and the virtualized resources of the first plurality of peripheral processing devices collectively defining a virtualized storage resource interconnected by the switch core; and
a second plurality of peripheral processing devices coupled to the multi-stage switch fabric by a plurality of connections having a protocol, each peripheral processing device in the second plurality of peripheral processing devices being a compute node having virtualized resources, and the virtualized resources of the second plurality of peripheral processing devices collectively defining a virtualized compute resource interconnected by the switch core.
39. The communication device as claimed in claim 38, wherein:
each peripheral processing device in the first plurality of peripheral processing devices has virtualized resources, and each peripheral processing device in the first plurality of peripheral processing devices is configured such that its virtualized resources can be substituted by virtualized resources from the remaining peripheral processing devices in the first plurality of peripheral processing devices; and
each peripheral processing device in the second plurality of peripheral processing devices has virtualized resources, and each peripheral processing device in the second plurality of peripheral processing devices is configured such that its virtualized resources can be substituted by virtualized resources from the remaining peripheral processing devices in the second plurality of peripheral processing devices.
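Claim 39 requires that any node's virtualized resources can be substituted by those of the remaining nodes in the same pool. The sketch below illustrates one way that could look: when a node fails, every workload it hosted is re-placed on the surviving nodes. The "most free capacity" placement policy, the integer resource units, and the class name `VirtualPool` are illustrative assumptions, not the patent's method; the sketch also does not roll back if re-placement fails partway through.

```python
from typing import Dict


class VirtualPool:
    """Hypothetical pool of peripheral processing devices (storage or compute
    nodes). Workloads are placed on nodes; if a node fails, its workloads are
    re-placed on the remaining nodes, so the rest of the pool substitutes for
    the failed node's virtualized resources."""

    def __init__(self, capacity: Dict[str, int]) -> None:
        self.capacity = dict(capacity)       # node -> resource units
        self.placement: Dict[str, str] = {}  # workload -> node
        self.demand: Dict[str, int] = {}     # workload -> resource units

    def free(self, node: str) -> int:
        used = sum(self.demand[w] for w, n in self.placement.items() if n == node)
        return self.capacity[node] - used

    def place(self, workload: str, units: int) -> str:
        # Assumed policy: pick the node with the most free capacity.
        node = max(self.capacity, key=self.free)
        if self.free(node) < units:
            raise RuntimeError(f"no node can host {workload}")
        self.placement[workload] = node
        self.demand[workload] = units
        return node

    def fail(self, node: str) -> None:
        """Remove a node and re-place every workload it was hosting."""
        orphans = [w for w, n in self.placement.items() if n == node]
        del self.capacity[node]
        for w in orphans:
            units = self.demand.pop(w)
            del self.placement[w]
            self.place(w, units)
```

The same class can model either pool in claim 38: storage volumes on storage nodes, or virtual machines on compute nodes.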
40. The communication device as claimed in claim 38, wherein:
the first plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol; and
the second plurality of peripheral processing devices is associated with a packet-based communication protocol and with a security protocol.
41. A communication device, comprising:
a switch core having a multi-stage switch fabric, the switch core being configured to be logically partitioned into a first virtual switch core and a second virtual switch core; and
a plurality of peripheral processing devices coupled to the multi-stage switch fabric, the plurality of peripheral processing devices having a first subset of peripheral processing devices operatively coupled to the first virtual switch core and a second subset of peripheral processing devices operatively coupled to the second virtual switch core.
42. The communication device as claimed in claim 41, wherein:
the switch core is configured such that the first virtual switch core and the second virtual switch core are administratively managed independently of each other.
43. The communication device as claimed in claim 41, wherein:
the switch core is configured such that the first virtual switch core has a bandwidth independent of the bandwidth of the second virtual switch core.
44. The communication device as claimed in claim 41, wherein:
the switch core is configured such that the first virtual switch core has a bandwidth and administrative management independent of the bandwidth and administrative management of the second virtual switch core.
45. The communication device as claimed in claim 41, wherein:
the switch core is configured such that the first virtual switch core operates using a Layer 2 protocol, and the second virtual switch core operates using a Layer 2 protocol and a Layer 3 protocol.
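Claims 41 to 45 describe logically partitioning one switch core into virtual switch cores with their own device membership, bandwidth, and protocol layers. The sketch below models that bookkeeping: resizing one partition never changes another's share, and each peripheral device attaches to exactly one partition. The shared total-bandwidth cap, the protocol labels "L2"/"L3", and the names `VirtualSwitchCore` and `PartitionedSwitchCore` are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class VirtualSwitchCore:
    name: str
    bandwidth_gbps: float       # share reserved for this partition
    protocols: FrozenSet[str]   # e.g. {"L2"} or {"L2", "L3"}
    members: Set[str] = field(default_factory=set)  # attached peripheral devices


class PartitionedSwitchCore:
    """Hypothetical logical partitioning of one physical switch core into
    virtual switch cores, each with its own bandwidth budget, protocol set,
    and peripheral-device membership."""

    def __init__(self, total_bandwidth_gbps: float) -> None:
        self.total = total_bandwidth_gbps
        self.cores: Dict[str, VirtualSwitchCore] = {}

    def create(self, name: str, bandwidth_gbps: float, protocols: Set[str]) -> VirtualSwitchCore:
        reserved = sum(c.bandwidth_gbps for c in self.cores.values())
        if reserved + bandwidth_gbps > self.total:
            raise ValueError("bandwidth oversubscribed")
        core = VirtualSwitchCore(name, bandwidth_gbps, frozenset(protocols))
        self.cores[name] = core
        return core

    def attach(self, device: str, core_name: str) -> None:
        # Each peripheral device belongs to exactly one virtual switch core.
        if any(device in c.members for c in self.cores.values()):
            raise ValueError(f"{device} already attached")
        self.cores[core_name].members.add(device)

    def set_bandwidth(self, name: str, bandwidth_gbps: float) -> None:
        """Resize one partition without touching any other partition's share."""
        others = sum(c.bandwidth_gbps for n, c in self.cores.items() if n != name)
        if others + bandwidth_gbps > self.total:
            raise ValueError("bandwidth oversubscribed")
        self.cores[name].bandwidth_gbps = bandwidth_gbps
```

For example, a Layer 2-only partition can be grown from 400 to 450 Gb/s while a Layer 2/Layer 3 partition keeps its 500 Gb/s unchanged.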
46. The communication device as claimed in claim 41, wherein:
the first subset of peripheral processing devices has virtualized resources, and the second subset of peripheral processing devices has virtualized resources.
47. The communication device as claimed in claim 41, wherein:
the first subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node appliance, and a router, and includes a peripheral processing device that is a remaining one of the compute node, the storage node, the service node appliance, and the router; and
the second subset of peripheral processing devices includes a peripheral processing device that is one of a compute node, a storage node, a service node appliance, and a router, and includes a peripheral processing device that is a remaining one of the compute node, the storage node, the service node appliance, and the router.
Applications Claiming Priority (25)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US9620908P | 2008-09-11 | 2008-09-11 | |
US61/096,209 | 2008-09-11 | ||
US9851608P | 2008-09-19 | 2008-09-19 | |
US61/098,516 | 2008-09-19 | ||
US12/242,230 | 2008-09-30 | ||
US12/242,224 US8154996B2 (en) | 2008-09-11 | 2008-09-30 | Methods and apparatus for flow control associated with multi-staged queues |
US12/242,224 | 2008-09-30 | ||
US12/242,230 US8218442B2 (en) | 2008-09-11 | 2008-09-30 | Methods and apparatus for flow-controllable multi-staged queues |
US12/343,728 US8325749B2 (en) | 2008-12-24 | 2008-12-24 | Methods and apparatus for transmission of groups of cells via a switch fabric |
US12/343,728 | 2008-12-24 | ||
US12/345,502 | 2008-12-29 | ||
US12/345,502 US8804711B2 (en) | 2008-12-29 | 2008-12-29 | Methods and apparatus related to a modular switch architecture |
US12/345,500 US8804710B2 (en) | 2008-12-29 | 2008-12-29 | System architecture for a scalable and distributed multi-stage switch fabric |
US12/345,500 | 2008-12-29 | ||
US12/495,337 | 2009-06-30 | ||
US12/495,364 US9847953B2 (en) | 2008-09-11 | 2009-06-30 | Methods and apparatus related to virtualization of data center resources |
US12/495,361 US8755396B2 (en) | 2008-09-11 | 2009-06-30 | Methods and apparatus related to flow control within a data center switch fabric |
US12/495,337 US8730954B2 (en) | 2008-09-11 | 2009-06-30 | Methods and apparatus related to any-to-any connectivity within a data center |
US12/495,358 | 2009-06-30 | ||
US12/495,344 US20100061367A1 (en) | 2008-09-11 | 2009-06-30 | Methods and apparatus related to lossless operation within a data center |
US12/495,344 | 2009-06-30 | ||
US12/495,364 | 2009-06-30 | ||
US12/495,358 US8335213B2 (en) | 2008-09-11 | 2009-06-30 | Methods and apparatus related to low latency within a data center |
US12/495,361 | 2009-06-30 | ||
CN200910246898.XA CN101917331B (en) | 2008-09-11 | 2009-09-11 | Systems, methods, and apparatus for a data centre |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910246898.XA Division CN101917331B (en) | 2008-09-11 | 2009-09-11 | Systems, methods, and apparatus for a data centre |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103916326A (en) | 2014-07-09 |
CN103916326B (en) | 2017-10-31 |
Family ID: 43324725
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910246898.XA Active CN101917331B (en) | 2008-09-11 | 2009-09-11 | Systems, methods, and apparatus for a data centre |
CN201410138824.5A Active CN103916326B (en) | 2008-09-11 | 2009-09-11 | System, method and equipment for data center |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN101917331B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102420775A (en) * | 2012-01-10 | 2012-04-18 | Xidian University | Routing method for module-expansion-based data center network topology system |
US9094308B2 (en) | 2012-06-06 | 2015-07-28 | Juniper Networks, Inc. | Finding latency through a physical network in a virtualized network |
US8750288B2 (en) * | 2012-06-06 | 2014-06-10 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
CN103023803B (en) * | 2012-12-12 | 2015-05-20 | Huazhong University of Science and Technology | Method and system for optimizing virtual links of fiber channel over Ethernet |
CN104871145A (en) * | 2012-12-20 | 2015-08-26 | Marvell World Trade Ltd. | Memory sharing in network device |
US9419892B2 (en) * | 2013-09-30 | 2016-08-16 | Juniper Networks, Inc. | Methods and apparatus for implementing connectivity between edge devices via a switch fabric |
US9787559B1 (en) | 2014-03-28 | 2017-10-10 | Juniper Networks, Inc. | End-to-end monitoring of overlay networks providing virtualized network services |
CN105099939A (en) * | 2014-04-23 | 2015-11-25 | Hitachi, Ltd. | Method and device for implementing flow control among different data centers |
CN105577575B (en) * | 2014-10-22 | 2019-09-17 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Link control method and device |
CN107104871B (en) * | 2016-02-22 | 2021-11-19 | ZTE Corporation | Subnet intercommunication method and device |
CN105827544B (en) * | 2016-03-14 | 2019-01-22 | FiberHome Telecommunication Technologies Co., Ltd. | Congestion control method and device for multi-stage CLOS system |
CN107276908B (en) * | 2016-04-07 | 2021-06-11 | Shenzhen ZTE Microelectronics Technology Co., Ltd. | Routing information processing method and packet switching equipment |
US10243840B2 (en) * | 2017-03-01 | 2019-03-26 | Juniper Networks, Inc. | Network interface card switching for virtual networks |
CN113099488B (en) * | 2019-12-23 | 2024-04-09 | China Mobile Group Shaanxi Co., Ltd. | Method, device, computing equipment and computer storage medium for solving network congestion |
US11323312B1 (en) | 2020-11-25 | 2022-05-03 | Juniper Networks, Inc. | Software-defined network monitoring and fault localization |
CN113595935A (en) * | 2021-07-20 | 2021-11-02 | Ruijie Networks Co., Ltd. | Data center switch architecture and data center |
CN113961628B (en) * | 2021-12-20 | 2022-03-22 | Guangzhou Tengjia Automation Instrument Co., Ltd. | Distributed data analysis control system |
CN115225589A (en) * | 2022-07-17 | 2022-10-21 | Yide (Guangzhou) Technology Co., Ltd. | CrossPoint switching method based on virtual packet switching |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5457682A (en) * | 1993-05-05 | 1995-10-10 | At&T Ipm Corp. | Apparatus and method for supporting a line group apparatus remote from a line unit |
US5945922A (en) * | 1996-09-06 | 1999-08-31 | Lucent Technologies Inc. | Widesense nonblocking switching networks |
CN1084579C (en) * | 1997-03-27 | 2002-05-08 | Shanghai Bell Telephone Equipment Manufacturing Co., Ltd. | S12 switch timing supply method and system thereof |
JP2001313660A (en) * | 2000-02-21 | 2001-11-09 | Nippon Telegr & Teleph Corp <Ntt> | Wavelength multiplexed optical network |
US7420969B2 (en) * | 2000-11-29 | 2008-09-02 | Rmi Corporation | Network switch with a parallel shared memory |
US6567576B2 (en) * | 2001-02-05 | 2003-05-20 | Jds Uniphase Inc. | Optical switch matrix with failure protection |
CN101132286B (en) * | 2006-08-21 | 2012-10-03 | Juniper Networks, Inc. | Multi-chassis router with multiplexed optical interconnects |
Also Published As
Publication number | Publication date |
---|---|
CN101917331B (en) | 2014-05-07 |
CN103916326A (en) | 2014-07-09 |
CN101917331A (en) | 2010-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||