CN103444133A - Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect - Google Patents


Info

Publication number
CN103444133A
CN103444133A (application numbers CN2011800553292A, CN201180055329A)
Authority
CN
China
Prior art keywords
computing device
server
fabric switch
node
switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011800553292A
Other languages
Chinese (zh)
Inventor
M.B. Davis
D.J. Borland
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Silicon Valley Bank Inc
III Holdings 2 LLC
Original Assignee
Calxeda Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 13/234,054 (granted as US9876735B2)
Application filed by Calxeda Inc
Priority to CN201610113343.8A (granted as CN105743819B)
Publication of CN103444133A
Legal status: Pending

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00: Packet switching elements
    • H04L 49/10: Packet switching elements characterised by the switching fabric construction
    • H04L 49/101: using crossbar or matrix
    • H04L 49/15: Interconnection of switching modules
    • H04L 49/40: Constructional details, e.g. power supply, mechanical construction or backplane

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Small-Scale Networks (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses performance and power optimized computer system architectures and methods leveraging a power optimized tree fabric interconnect. In one embodiment, low-power server clusters are built from a fabric of tiled building blocks. In other embodiments, storage solutions and cooling solutions are implemented. In another embodiment, the fabric is used to switch non-Ethernet packets, and to switch multiple protocols for network processors and other devices.

Description

Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
Priority claim / related applications
This application claims the benefit under 35 U.S.C. 120 of U.S. patent application Ser. No. 12/794,996, filed June 7, 2010 and entitled "System and Method for High-Performance, Low-Power Data Center Interconnect Fabric," which is incorporated herein by reference in its entirety. This application also claims the benefit under 35 U.S.C. 119(e) and 120 of U.S. Provisional Patent Application Ser. No. 61/383,585, filed September 16, 2010 and entitled "Performance and Power Optimized Computer System Architectures and Methods Leveraging Power Optimized Tree Fabric Interconnect," which is incorporated herein by reference in its entirety.
Background
Figures 1 and 2 show conventional data center network aggregation as currently known. Figure 1 shows a diagram of a typical network data center architecture 100 in which top-of-rack switches 101a-n sit atop racks 102a-n, the racks 102a-n are populated with blade servers 107a-n interspersed with local routers 103a-f and 105a-b, and additional rack units 108a-n contain additional servers 104e-k and routers 106a-g. Figure 2 shows an exemplary physical view 110 of a system with peripheral servers 111a-bn arranged around edge router systems 112a-h, which in turn are arranged around a centrally located core switching system 113. Typically, such an aggregation 110 has 1Gb Ethernet from the rack servers to their top-of-rack switches, and often 10Gb Ethernet ports to the edge and core routers.
Brief description of the drawings
Figures 1 and 2 show typical data center network aggregation;
Figure 3 shows a network aggregation using servers according to one embodiment;
Figure 4 shows a data center in a rack according to one embodiment;
Figure 5 shows a high-level topology of a network system with a switching fabric;
Figure 6 shows a server board forming multiple server nodes interconnected with the described point-to-point interconnect;
Figures 6a-6c show another example of a fabric topology;
Figure 7 shows an example of a passive backplane connected to one or more node boards and two aggregation boards;
Figure 8 shows an example of extending the fabric across shelves and linking shelves across server racks;
Figure 9a shows an exemplary server 700 with a disk form factor;
Figures 9b and 9c show exemplary arrays of disk-server combinations using the storage server 1-node SATA board according to one embodiment;
Figure 9d shows a standard 3.5-inch drive;
Figure 9e shows an implementation of multiple server nodes in the form factor of a standard 3.5-inch disk drive;
Figure 10 shows implementations of servers deeply integrated with storage;
Figure 11 shows an implementation that leverages existing 3.5-inch JBOD storage boxes to densely package storage devices and servers;
Figure 12 shows an implementation using server nodes in the same form factor as 2.5-inch drives as an example;
Figure 13 shows a rack chimney cooling implementation;
Figure 13a shows an exemplary illustration of the thermal convection used in the chimney rack cooling shown in Figure 13;
Figure 14 shows server nodes placed diagonally relative to one another to minimize mutual heating across server nodes;
Figure 15 shows an exemplary 16-node system according to one embodiment, in which heat waves rise from the printed circuit boards;
Figure 16 shows a higher-density variant of the 16-node system, with nodes similarly arranged to minimize mutual heating across nodes;
Figure 17 shows the internal architecture of a server node fabric switch;
Figure 18 shows a server node including a PCIe controller connected to the internal CPU bus fabric;
Figure 18a shows a system with multiple protocol bridges using the fabric switch;
Figure 19 shows the integration of the server fabric with a network processor;
Figure 20 shows the fabric switch with an FPGA providing services such as IP Virtual Server (IPVS);
Figure 21 shows how OpenFlow flow processing is built into the Calxeda fabric;
Figure 22 shows an example of integrating the power optimized fabric switch with an existing processor via PCIe; and
Figure 23 shows an example of integrating the power optimized fabric switch with an existing processor via Ethernet.
Detailed description
Performance and power optimized computer system architectures and methods leveraging a power optimized tree fabric interconnect are disclosed. One embodiment builds low-power server clusters from a fabric of tiled building blocks; other embodiments implement storage solutions or cooling solutions. Another embodiment uses the fabric to switch non-Ethernet packets.
Co-pending patent application 12/794,996 describes an architecture supporting a power optimized server communication fabric that routes using a tree or graph topology supporting multiple links per node, where each link in the topology is designated as an up, down, or lateral link. The system uses a segmented MAC architecture, which may re-purpose MAC IP for both an internal MAC and an external MAC, and uses what would normally be the physical signaling for the MAC to feed into the switch. The Calxeda XAUI system interconnect reduces rack power, reduces rack wiring, and shrinks rack size. It removes the need for expensive, high-power Ethernet switches and for high-power Ethernet PHYs on the individual servers. It dramatically reduces cabling (a major source of complexity, cost, and failures). It also enables a heterogeneous server mix inside the rack, supporting any device that uses Ethernet, SATA, or PCIe. In this architecture, power savings come mainly from two architectural aspects: 1) minimizing the Ethernet PHYs across the fabric, replacing them with point-to-point XAUI interconnects between nodes, and 2) dynamically adjusting the XAUI width and speed of the links based on load.
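As an aside on how up/down routing in such a tree can work, the following is a minimal sketch in C. It assumes, purely for illustration, that a switch at level L with index i covers a contiguous range of leaf indices; the actual node ID encoding and routing tables of the Calxeda fabric are not specified here.

```c
#include <stdint.h>

typedef enum { LINK_UP, LINK_DOWN, LINK_LATERAL } link_dir_t;

/* Route toward a destination leaf: go down if the leaf lies in this
 * switch's subtree, otherwise go up toward the tree root (where the
 * Ethernet escapes sit).  Lateral links, where present, provide the
 * redundant same-level paths used for adaptive routing. */
link_dir_t route_direction(unsigned my_level, unsigned my_index,
                           unsigned dst_leaf)
{
    unsigned lo = my_index << my_level;    /* first leaf we cover   */
    unsigned hi = lo + (1u << my_level);   /* one past the last one */

    if (dst_leaf >= lo && dst_leaf < hi)
        return LINK_DOWN;                  /* destination below us  */
    return LINK_UP;                        /* forward toward root   */
}
```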
Figure 3 shows a network aggregation 200. This network supports 10Gb/sec Ethernet communication 201 (thick lines) between an aggregation router 202 and three racks 203a-c. In rack 203a, the Calxeda interconnect fabric provides multiple high-speed 10Gb paths, represented by thick lines, between the servers 206a-d on a shelf within the rack. The embedded switches in the servers 206a-d can replace a top-of-rack switch, saving a significant amount of power and cost while still providing a 10Gb Ethernet port to the aggregation router. The Calxeda switching fabric can integrate traditional Ethernet (1Gb or 10Gb) into the Calxeda XAUI fabric, and the Calxeda servers can act as a top-of-rack switch for third-party Ethernet-connected servers.
The middle rack 203b shows another scenario, in which Calxeda servers 206e, f can be integrated into an existing data center rack that contains a top-of-rack switch 208a. In this case, the IT group can continue to have their other servers connected via 1Gb Ethernet up to the existing top-of-rack switch. The internal Calxeda servers can be connected via the 10Gb XAUI fabric, and they can integrate up to the existing top-of-rack switch with either 1Gb or 10Gb Ethernet interconnects. The rack 203c on the right is the traditional way data center racks are deployed today. The thin red lines represent 1Gb Ethernet. So the current deployment of data center racks is traditionally 1Gb Ethernet up to the top-of-rack switch 308b, and then 10Gb (thick red line 201) out from the top-of-rack switch to the aggregation router. Note that all servers are shown in an unknown quantity; for clarity and brevity they are shown here in finite quantities. Also, no additional routers are needed with the augmenting Calxeda servers, as they operate their own XAUI switching fabric, discussed below.
Figure 4 shows an overview of an exemplary "data center in a rack" 400 according to one embodiment. It has 10Gb Ethernet PHYs 401a-n and 1Gb private Ethernet PHYs 402. Large computers (power servers) 403a-n support search, data mining, indexing, Hadoop (a Java software framework), MapReduce (a software framework introduced by Google to support distributed computing on large data sets on clusters of computers), cloud applications, etc. Computers (servers) 404a-n with local flash and/or solid-state disks (SSDs) support search, MySQL, CDN, software-as-a-service (SaaS), cloud applications, etc. A single, large, slow fan 405 augments the convection cooling of the vertically mounted servers above it. The data center 400 features an array 406 of hard disks, e.g. in a Just a Bunch of Disks (JBOD) configuration, and, optionally, Calxeda servers in disk form factor (the green boxes in arrays 406 and 407), optionally acting as disk controllers. Hard disk servers or Calxeda disk servers can be used for web servers, user applications, cloud applications, etc. Also shown are an array 407 of storage servers and legacy servers 408a, b (any size, any vendor) with standard Ethernet interfaces for legacy applications.
Figure 5 shows a high-level topology 500 of the network system described in co-pending patent application 12/794,996, illustrating the XAUI-connected SoC nodes joined by the switching fabric. The 10Gb Ethernet ports Eth0 501a and Eth1 501b come from the top of the tree. The ovals 502a-n are Calxeda nodes comprising both computational processors and an embedded switch. Each node has five XAUI links connected to its internal switch. The switching layer uses all five XAUI links for switching. Level 0 leaf nodes 502d, e (i.e., N0n nodes, or Nxy where x = level and y = item number) use only one XAUI link to attach to the interconnect, leaving four high-speed ports that can be used as XAUI, 10Gb Ethernet, PCIe, SATA, etc. for attachment to I/O. The vast majority of trees and fat trees have active nodes only as leaf nodes, with the other nodes being pure switching nodes. That approach makes routing much more straightforward. Topology 500 has the flexibility to permit every node to be either a combined computing and switching node, or a switching-only node. Most tree-type implementations have I/O on the leaf nodes, but topology 500 lets I/O be on any node. In general, placing the Ethernet at the top of the tree minimizes the average number of hops to the Ethernet.
Building power optimized server fabric boards with tiled building blocks
Figure 6 shows a server board forming multiple server nodes interconnected with the described point-to-point interconnect. The server board has the following characteristics:
- Each oval in the diagram is a standalone server node comprising processors, memory, I/O, and the fabric switch.
- The fabric switch has the ability to independently and dynamically modify the width (number of lanes) and speed of each link.
- The 14-node board example shows two Ethernet escapes from the fabric. These Ethernet escapes would usually be routed to a standard Ethernet switch or router, and can be standard 1Gb or 10Gb Ethernet.
- The 14-node example topology is a butterfly fat tree, which provides redundant paths to allow adaptive routing around failed links and around localized hot spots.
- The 3-node aggregator boards allow large server fabrics to be composed using only two board tiles.
- A second aggregator is added for redundancy.
- I/O:
- PCIe connectors carrying the Smooth-Stone fabric
- Optional Ethernet support (off, 1, 2, 5, 10, or 20 Gb/s)
- Ethernet bandwidth decisions are based on the needs of the application
- The nodes on the aggregator board can be pure switching nodes, or full computational nodes that include switching.
- The board's I/O can be PCIe connectors supporting 2 x4 XAUI (2 Smooth-Stone fabric links) and/or optional Ethernet support (off, 1, 2, 10, or 20 Gb/s).
- The example fabric topology of the 14-node example minimizes the number of links crossing boards, minimizing connectors (size and number) and associated cost, while still retaining the Ethernet escapes and multi-path redundancy.
- Two aggregator boards can be used to achieve path redundancy when scaling the fabric.
- Power savings can be achieved with static link configuration:
- The lowest-level nodes in the figure (labeled leaf nodes) can run at 1 Gb/sec.
- The first-level switching nodes in the figure (labeled layer 1 switches) have 3 Gb/sec of incoming bandwidth from the leaf nodes. This permits a static link configuration of 2.5 or 5 Gb/sec between the layer 1 and layer 2 switches.
- The links fanning out from the layer 2 switching layer can run at 10 Gb/sec.
- In this topology, because most of the nodes are leaf nodes, most of the links run at the slower rate (1 Gb/sec in this example), minimizing networking power consumption.
- Ethernet escapes can be pulled from any node in the fabric, allowing the fabric designer to trade off the bandwidth required for Ethernet escapes against the number of top-of-rack switch ports consumed and the cost and power associated with the Ethernet ports.
- Power savings can be further optimized via dynamic, link-usage-driven link configuration. In this example, each link and associated port of the fabric switch contains bandwidth counters with configurable threshold events that allow link width and speed to be reconfigured up and down based on dynamic link usage (see the sketch following this list).
- Because in many common server use cases Ethernet traffic is predominantly node to external Ethernet rather than node to node, the proposed tree-oriented fabrics, and in particular the butterfly fat tree example, minimize the number of hops across the fabric to the Ethernet, thereby minimizing latency. This allows large low-latency fabrics to Ethernet to be created while using switches with a relatively small number of switching ports (5 in this example).
- Server 209a in Figure 2 illustrates another novel use of the defined server fabric for system integration. In this case, to leverage the performance and power management of the server fabric, and to minimize port usage on the top-of-rack switch, an existing server is heterogeneously integrated into the defined server fabric, so that Ethernet traffic from the existing server can be gateway'ed into the fabric, allowing communication with nodes in the fabric, and allowing 209a's Ethernet traffic to be carried through the fabric to the uplink Ethernet port 201.
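As a concrete illustration of the dynamic, usage-driven link configuration described above, the following is a minimal sketch in C. The structure fields, counter window, and threshold mechanics are assumptions for illustration only; the disclosure specifies only that per-port bandwidth counters with configurable threshold events drive link width and speed up and down.

```c
#include <stdint.h>

struct fabric_link {
    uint32_t bytes_this_window;   /* bandwidth counter, reset per window  */
    uint32_t hi_threshold;        /* widen/speed up above this            */
    uint32_t lo_threshold;        /* narrow/slow down below this          */
    uint8_t  lanes;               /* current XAUI lane count (1, 2, or 4) */
    uint8_t  speed_gbps;          /* current per-link signaling rate      */
};

/* Invoked on each counter-window threshold event. */
void link_threshold_event(struct fabric_link *l)
{
    if (l->bytes_this_window > l->hi_threshold) {
        if (l->lanes < 4)
            l->lanes *= 2;            /* widen the link under load        */
        else if (l->speed_gbps < 10)
            l->speed_gbps = 10;       /* then step up the signaling rate  */
    } else if (l->bytes_this_window < l->lo_threshold) {
        if (l->lanes > 1)
            l->lanes /= 2;            /* narrow to save SerDes/PHY power  */
        else if (l->speed_gbps > 1)
            l->speed_gbps = 1;
    }
    l->bytes_this_window = 0;         /* start the next sampling window   */
}
```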
Fig. 6 a-6c illustrates another example as the structural topology of the 48 node structure topologys that are comprised of 12 cards, and wherein each card comprises 4 nodes that are connected in system board.This topology provides some redundant links, but there is no huge redundancy.Topology has 4 ethernet gateway and moves back mouth, and these ethernet gateway move back mouthful each can be 1Gb or 10Gb, but not all these ethernet gateway all need to be used or to be connected.In the example shown, 8 fabric link are taken out of (are brought off) four node cards, and in one example, PCIe x16 connector is used for the card release of 4 fabric link bands.
Summary / overview of building power optimized server fabric boards with tiled building blocks
1. Server tree fabrics allowing an arbitrary number of Ethernet escapes across the server interconnect fabric, minimizing the number of Ethernet PHYs and saving the power and cost associated with the Ethernet PHYs, the associated cabling, and the consumed ports on the top-of-rack Ethernet switch/router.
2. Switching nodes can be pure switching nodes that save power by turning off the computational subsystem, or full computational subsystems that include fabric switching. Referring to Figure 17, one implementation uses multiple power domains to decouple the computational subsystem (box 905) from the management processor (box 906) and the fabric switch (the remainder of the boxes). This allows the SOC to be configured with the computational subsystem (box 905) powered off, while maintaining the management processing in box 906 and the hardware packet switching and routing done by the fabric switch.
3. Butterfly fat tree topology server fabrics provide a minimal number of on-board links (saving power and cost) and a minimal number of links crossing boards (saving power and cost), while allowing redundant link paths both on-board and across boards.
4. The proposed substrate and aggregator boards allow scalable, fault-resilient server fabrics to be composed from only two building-block boards.
5. Tree-oriented server fabrics, and variants such as the example butterfly fat tree, allow static link widths and speeds to be specified, bounded by the aggregate bandwidth of a node's children, permitting easy link configuration while minimizing interconnect power.
6. Power savings can be further optimized via dynamic, usage-driven link configuration. In this example, each link and associated port of the fabric switch contains bandwidth counters with configurable threshold events that allow link width and speed to be reconfigured up and down based on dynamic link usage.
7. Because in many common server use cases Ethernet traffic is predominantly node to external Ethernet rather than node to node, the proposed tree-oriented fabrics, and especially the butterfly fat tree example, minimize the hops across the fabric to the Ethernet, thereby minimizing latency. This allows large low-latency fabrics to Ethernet to be created while using switches with a relatively small number of switching ports (5 in this example).
8. Allows heterogeneous integration of existing Ethernet-attached servers onto the defined server communication fabric, carrying their Ethernet traffic through it.
Building power optimized server shelves and racks with tiled building blocks
Shelves and racks can now be composed from these board "tiles" of fabric-connected server nodes. Figure 7 shows an example of how a passive backplane can connect eight of the 14-node boards and two aggregation boards into an example shelf composed of 236 server nodes. For example, each board can be 8.7" tall plus mechanicals (under 10.75" for 6U), with interleaved fins for density, so that 16 boards fit in a 19-inch-wide rack. The backplane can be simple and inexpensive, carrying PCIe connectors and routing, where the routing consists of the XAUI signals (blue and green) plus power, which is very simple and requires no other wiring. Ethernet connections are shown at the 8-board aggregation points.
Figure 8 shows an example of extending the fabric across shelves and linking shelves across server racks. Ethernet escapes can be pulled from any node in the fabric; in this example they are pulled from the passive backplanes that connect the multi-node blades.
Summary / overview of building power optimized server shelves and racks with tiled building blocks
1. Use PCIe connectors to bring the Ethernet escapes and the board-external XAUI links off the boards, linking boards into point-to-point server fabrics; the PCIe signaling is not used, but the physical connector carries board power and the XAUI signals, while redundant communication paths are maintained for fail-over and hot-spot reduction.
2. Complete XAUI point-to-point server interconnect fabrics formed with passive backplanes.
3. Ethernet escapes from the fabric across racks at every level of the tree, not just at the top of the tree.
4. Ethernet escapes across the fabric can be dynamically enabled and disabled to match bandwidth and power usage.
5. Node-to-node traffic, including system management traffic, stays on the fabric across racks, and need never cross the top-of-rack Ethernet switches.
Storage
Fig. 9 a illustrates the exemplary server with disk topography size 700 according to an embodiment, typically such as 2.3 inches of the standards with SCSI or SATA drive or 3.5 inches hard disk drives (HDD).Server board 701 is assemblied in the infrastructure identical with disc driver 702 in current magnetic disk machine stand.Server 701 is full servers, wherein has server S oC on DDR, chip, optional flash memory, local power supply management, connects (1-16 to the SATA of disk ... be subject to the restriction of connector size).Its output can be the structure (XAUI) of Ethernet or Calxeda, wherein has two XAUI outputs for troubleshooting.Alternatively, it can use PCIe to replace SATA (SSD or need other thing of PCIe), wherein has 1 to 4 node with the relative storage demand of EQUILIBRIUM CALCULATION FOR PROCESS.Such server can carry out RAID realization and the application of LAMP stack server.On each disk, use Calxeda ServerNode that full LAMP stack server and a plurality of SATA interface of the DDR3 with 4GB will be provided.Alternatively, if necessary, can add the Section Point of the DDR of 8GB.
Fig. 9 b and Fig. 9 c illustrate respectively the exemplary array 710 and 720 according to the disk of the use of embodiment storage server 1-as above node SATA plate-server combination 700a-n.Eliminate the needs to large-scale Ethernet switch by standard or proprietary certain express network or the connection of interconnection, thereby saved power, reduce costs, reduce heat and reduce area.Each plate 701 is less than the height of disk and the degree of depth.Array can arrange with the disk replaced and plate, and as shown in Fig. 7 b, or a plate can be a plurality of disk work, for example adopts the layout of disk, disk, plate, disk, disk, as shown in Figure 7 c.Therefore, rated output is so that mode and disk ratio mate neatly.The connectedness of plate 701a-n can be based on each node, and wherein SATA is for linking disk and a plurality of SATA links a plurality of disks.It also can be based on node to node, wherein as previously described in the structure configuration and in application 61/256723, there are two XAUI in order to redundancy in each node.Node connects by the XAUI structure.Such connection is tree or fat tree topology, and node is to node to node to node, and wherein certainty, irrelevant (oblivious) or self adaptation route are carried out Mobile data towards correct direction.Alternatively, can use complete proprietary interconnection, to forward other processing unit to.Some ports can forward Ethernet output or any other I/O conduit to.Each node can directly forward " box " inner Ethernet or XAUI to and then arrive PHY or XAUI to PHY to XAUI polymerizer (switch).Perhaps can use above any combination.In other situation, SATA connects available PCI e and substitutes, so that with the SSD with PCIe connection.Some SSD enter the disk topography size together with PCIe or SATA.Perhaps can mix PCIe and SATA.Ethernet that can be outer with box replaces AXUI for system interconnection.In some cases, for example, but Application standard SATA connector, but in other cases, can make the higher density connector had by the proprietary wiring of proprietary base plate.
In another case, the server functionality can go inside a disk drive, adding a disk to provide a full server in a single disk drive form factor. For example, a ServerNode can be placed on a board inside the disk. This approach can be realized with XAUI or Ethernet connectivity. In such cases, the server-on-a-chip approach known to the inventors can serve as the disk controller plus server. Figure 9d illustrates this concept. Figure 9d shows a standard 3.5-inch drive, item 9d0. It has an integrated circuit card 9d1 that controls the disk drive. A large amount of space, labeled 9d2, goes unused in the drive, and a Calxeda low-power small server node PCB can be formed to fit within this unused space in the disk drive.
Figure 9e shows an implementation putting multiple server nodes into the form factor of a standard 3.5-inch disk drive. In this case, the connectors from the server PCB to the backplane produce the XAUI-based server fabric interconnect, providing both the network and the inter-server communication fabric, plus 4 SATA ports for connection to adjacent SATA drives.
Figure 10 shows implementations for deeply integrating servers with storage. The server node (101) is a complete low-power server integrating computational cores, DRAM, integrated I/O, and the fabric switch. In this example, server node 101 is shown in the same form factor as a standard 2.5-inch disk drive (102). (103) shows these server nodes and disk drives combined as one-to-one pairs, where each server node has its own local storage. (104) shows a server node controlling four disk drives. System (105) shows these storage servers combined via the unified server fabric, in this example then pulling four 10Gb/sec Ethernet escapes from the fabric for connection to an Ethernet switch or router.
Figure 11 shows a specific implementation of this dense packing of storage and servers, leveraging existing 3.5-inch JBOD (Just a Bunch of Disks) storage boxes. In this case the JBOD mechanicals, including the disk housing, are unchanged, and one-to-one pairings of storage nodes with disk drives are shown within the unmodified JBOD box. This illustrates a concept in which the server nodes are pluggable modules inserted into a base motherboard that contains the fabric links. In this illustration, 23 3.5-inch disks (shown as rectangles in the logical view) are placed in the standard JBOD box, and 31 server nodes (shown as ovals/circles in the logical view) are contained within the JBOD box, controlling the 23 disks and exposing two 10Gb/sec Ethernet links (shown as dark wide lines in the logical view). This tightly integrated server/storage concept simply takes an off-the-shelf storage JBOD box and adds 31 server nodes, communicating over the power optimized fabric, within the same form factor. This maps well to applications that favor local storage.
Figure 12 shows a related concept, using the fact that server nodes can be built in the same form factor as 2.5-inch drives. In this case they are integrated into a 2.5-inch JBOD holding 46 disks, illustrating 64 server nodes integrated in the same form factor as the JBOD storage. In this example, two 10Gb Ethernet links and a 1Gb/sec management Ethernet link are pulled from the fabric.
Summary / overview of storage
1. Use PCIe connectors to bring the Ethernet escapes and the board-external XAUI links off the boards, linking boards into point-to-point server fabrics; the PCIe signaling is not used, but the physical connector carries board power and the XAUI signals, while redundant communication paths are maintained for fail-over and load balancing.
2. Converting existing JBOD storage systems into very high-density compute servers, integrated with tightly paired local storage via the power and performance optimized server fabric, by enabling small-form-factor, low-power-configuration server nodes paired with the disks using the defined server fabric, thereby creating new high-performance compute server and storage server solutions without affecting the physical and mechanical design of the JBOD storage system.
3. A method of packaging complete servers within the form factor of hard disk drives, for use in high-density computing systems, such that some drives can be replaced with additional servers.
4. As in claim 3, wherein the servers are connected to a network via an additional switching fabric.
5. As in claim 3, wherein the backplane in the enclosure housing the drives is replaced with a backplane suitable for creating at least one switched interconnect.
6. A method of integrating a low-power server PCB into the empty space within a standard 3.5-inch disk drive, for use in high-density storage systems, providing integrated compute capability within the disk drive.
Rack cooling for integrated low-power servers
One aspect of driving toward low-power computer server solutions is the management of temperature, cooling, and air movement through the rack and across the boards. Minimization of fans is one aspect of lowering the total cost of ownership (TCO) of low-power servers. Fans add cost and complexity, reduce reliability because of their moving parts, consume significant power, and produce significant noise. Reducing and removing fans can provide substantial benefits in reliability, TCO, and power consumption.
Figure 13 shows a novel implementation supporting rack chimney cooling, where the chimney cools either the whole rack or only a section of the rack. An important aspect of the chimney rack concept is the single fan: rising natural convection is used with the help of one fan. One large, slow fan can cool an entire rack. It can be placed at the bottom of the rack, or below the rack, under the vertically arranged convection-cooled subset of the rack. As cooling air arrives at the bottom, the fan pushes it up through the chimney and out the top. Because all the boards are vertical, there is no horizontal blockage. Although the fan is shown at the bottom of the rack in this example, it could be anywhere in the system. Note that a "traditional" arrangement, with vents and fan cooling below and the top left open as a vertical stack, may suffer horizontal blockages; the vertical, bottom-cooled approach described here can work for small systems. The fan can be variable speed, governed by temperature.
Figure 13a shows an exemplary illustration of the novel thermal convection principles 500 used in the chimney rack concept. Modules are placed in an angled alignment so that the heat flows 501a-n rising from the heat-dissipating Double Data Rate (DDR) memory chips 503a-n on the printed circuit board (PCB) 502 do not impinge on other chips, so the heat-dissipating chips do not shadow or heat one another. In this example, the DDR chips are placed diagonally rather than stacked vertically above one another, since vertically stacked chips tend to heat each other. In addition, the DDR chips are placed above, rather than below, large computational chips 504a such as ASICs, SOCs, or processors, because below they would tend to heat the SOC. The flash chips 506 (the coolest chips) are placed below the SOC. Likewise, as discussed below, nodes are not stacked vertically. Figure 14 extends this concept, showing how server nodes can be placed diagonally relative to one another so that mutual heating across server nodes is minimized.
Figure 15 shows an exemplary 16-node system according to one embodiment, in which heat waves rise from the printed circuit boards. For a typical 16-node system, the individual nodes are arranged so that the heat from each unit rises without heating the units above it. The overall chassis is typically longer, less tall, and less dense. Additionally, rather than mounting the PCBs diagonally as shown, the PCBs can be squarely aligned and rectangular, with the components aligned diagonally so that mutual heating is minimized. PCBs in different rows can have complementary layouts, or be staggered accordingly, to reduce mutual heating. Similarly, Figure 16 shows a higher-density variant of the 16-node system, with the nodes arranged similarly so that mutual heating across nodes is minimized.
An additional cooling concept for racks of low-power servers is to create a rising air current with an aerodynamic air pressure differential, without fans. One technique for doing so is to build a sealed rack with an extended exhaust stack. The stack is made tall enough (20-30 feet or more) that the rising air column creates a sufficient air pressure differential. This provides completely passive air movement and cooling for a rack of low-power servers.
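For intuition only (this relation is standard building physics, not taken from the patent), the draft available from such a stack grows linearly with its height and with the temperature difference between the warm air inside and the ambient air outside:

```latex
\Delta p \;=\; g\,h\,(\rho_{\text{out}} - \rho_{\text{in}})
        \;\approx\; g\,h\,\rho_{\text{out}}\!\left(1 - \frac{T_{\text{out}}}{T_{\text{in}}}\right)
```

where g is gravitational acceleration, h is the stack height, and ρ and T are the densities and absolute temperatures of the air outside and inside the stack. This is why a 20-30 foot stack can develop a usefully larger pressure differential than the rack alone.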
Summary / overview of cooling rack-mounted low-power servers
1. A method, for use in high-density computing systems, of placing heat-dissipating components on vertically mounted boards,
wherein no heat-dissipating component is placed directly above or below another heat-dissipating component.
2. As in claim 1, wherein the components are placed substantially diagonally across the mounting board.
3. As in claim 1, wherein the components are placed substantially diagonally across the mounting board in intersecting lines.
4. As in claims 1, 2, and 3, wherein the mounting board is a printed circuit board.
Server fabric switching of non-Ethernet packets
As described in co-pending patent application 12/794,996, Figure 17 shows the internal architecture of a server node fabric switch. Figure 17 is a block diagram of an exemplary switch 900 according to one aspect of the system and method disclosed herein. It has four areas of interest 910a-d. Area 910a corresponds to Ethernet packets between the CPUs and the internal MACs. Area 910b corresponds to Ethernet frames at the Ethernet physical interface of the internal MACs; it contains the preamble, start of frame, and inter-frame gap fields. Area 910c corresponds to Ethernet frames at the Ethernet physical interface of the external MACs; it contains the preamble, start of frame, and inter-frame gap fields. Area 910d corresponds to Ethernet packets between the routing header processor 901 and the outside MACs 904. This segmented MAC architecture is asymmetric. The internal MACs have the Ethernet physical signaling interface into the routing header processor, and the external MACs have the Ethernet packet interface into the routing header processor. Thus the MAC IP is re-used for both the internal MACs and the external MACs, and what would normally be the physical signaling for the MAC is used to feed into the switch. The MAC configuration is such that the operating system device drivers of the A9 cores 905 manage and control internal Eth0 MAC 902 and internal Eth1 MAC 903. The device driver of management processor 906 manages and controls internal Eth2 MAC 907. Outside Eth MAC 904 is not controlled by a device driver; MAC 904 is configured in promiscuous mode to pass all frames without any filtering, for network monitoring. Initialization of this MAC is coordinated between the hardware instantiation of the MAC and any other necessary management processor initialization. The external Eth MAC 904 registers are visible in both the A9 905 and management processor 906 address maps. Interrupts for the external Eth MAC 904 can be routed to either the A9 or the management processor.
The key to the node is that the routing header processor 910d adds a fabric routing header to each packet received from a MAC and headed to the switch, and removes the fabric routing header from each packet received from the switch and headed to a MAC. The fabric switch itself routes only on the node ID and other information contained in the fabric routing header, and performs no inspection of the original packet.
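A minimal sketch of that prepend/strip contract is shown below in C. The header layout and field names are invented for illustration; the disclosure specifies only that the switch routes on the node ID and other routing-header fields and never inspects the payload.

```c
#include <stdint.h>
#include <string.h>

struct route_header {
    uint16_t dst_node;    /* destination fabric node ID       */
    uint16_t src_node;    /* source fabric node ID            */
    uint16_t flags;       /* e.g. management vs. data traffic */
};

/* MAC -> switch: wrap the opaque packet in a routing frame. */
size_t add_route_header(uint16_t dst, uint16_t src,
                        const uint8_t *pkt, size_t len, uint8_t *frame)
{
    struct route_header hdr = { dst, src, 0 };
    memcpy(frame, &hdr, sizeof hdr);
    memcpy(frame + sizeof hdr, pkt, len);      /* payload untouched */
    return sizeof hdr + len;
}

/* switch -> MAC: strip the routing header, return the payload. */
size_t strip_route_header(const uint8_t *frame, size_t len,
                          const uint8_t **pkt)
{
    *pkt = frame + sizeof(struct route_header);
    return len - sizeof(struct route_header);
}
```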
Distributed PCIe fabric
Figure 18 shows a server node including a PCIe controller connected to the internal CPU bus fabric. This allows a novel PCIe switching fabric to be created, leveraging the high-performance, power optimized server fabric to create a scalable, high-performance, power optimized PCIe fabric.
The technique works as follows:
- The PCIe controller 902 is connected to a mux 902a that allows the PCIe controller to be connected either directly to the external PCIe PHY or to the PCIe routing header processor 910c. When mux 902a is configured to pass the PCIe traffic to the local PCIe PHY, this is equivalent to a standard local PCIe connection. When mux 902a is configured to direct the PCIe traffic to the PCIe routing header processor 910c, this enables the novel distributed PCIe fabric switching mechanism.
- The PCIe routing header processor 910c uses the routing information embedded in the packet (address, ID, or implicit) to create a fabric routing header that maps the PCIe packet to the destination fabric node's PCIe controller (a sketch of such an address-to-node lookup follows this list).
- This provides an advantage similar to a distributed PCIe fabric: the server fabric provides the networking.
- PCIe transactions originating from the processor cores (905) can be routed to the local PCIe PHY (via the mux bypass or via the switch), routed to any other node on the fabric, or routed directly to the internal PCIe controller (902) or the external PCIe controller/PHY (904).
- Likewise, incoming PCIe transactions enter the external PCIe controller (904), get tagged with a fabric routing header by the PCIe routing header processor (910), and the fabric then transports the PCIe packet to its final destination.
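The address-to-node mapping step can be sketched as follows in C, with hypothetical per-node address windows that management software would program; the actual logic of the PCIe routing header processor is not detailed in the disclosure.

```c
#include <stdint.h>
#include <stdbool.h>

struct addr_window {
    uint64_t base, limit;   /* physical address window           */
    uint16_t node_id;       /* fabric node that owns the window  */
};

#define NWINDOWS 16
static struct addr_window windows[NWINDOWS];   /* set up by management SW */

/* Returns true and sets *node when the address hits a remote window;
 * false means "pass through the mux to the local PCIe PHY". */
bool pcie_addr_to_node(uint64_t addr, uint16_t *node)
{
    for (int i = 0; i < NWINDOWS; i++) {
        if (addr >= windows[i].base && addr <= windows[i].limit) {
            *node = windows[i].node_id;
            return true;
        }
    }
    return false;
}
```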
Distributed bus protocol fabric
Figure 18a shows an additional extension, illustrating that multiple protocol bridges can take advantage of the fact that the fabric switch routes on the routing header rather than directly on the underlying packet payload (e.g. a layer 2 Ethernet frame). Three protocol bridges are shown in this illustration: Ethernet, PCIe, and a bus protocol bridge.
The role of the bus protocol bridge is to take a processor or internal SOC fabric protocol, packetize it, add the Calxeda fabric routing header, and then route it across the Calxeda fabric.
As one feasible example, consider an SOC bus protocol such as AMBA AXI, HyperTransport, or QPI (QuickPath Interconnect).
Consider the following data flow (a sketch of the originating bridge follows this list):
- A processor on the internal SOC bus fabric issues a memory load (or store) request.
- The physical address target of the memory operation has been mapped to a remote node on the fabric.
- The bus transaction passes through the bus protocol bridge, which:
- packetizes the bus transaction;
- maps the physical address of the memory transaction to the remote node, using that node ID when building the routing header; and
- builds the routing frame, consisting of the routing header containing the remote node ID, with the packetized bus transaction as the payload.
- The bus transaction routing frame passes through the fabric switch, across the fabric, and is received by the destination node's fabric switch.
- The destination node's bus protocol bridge de-packetizes the bus transaction, issues the bus transaction onto the target SOC fabric, completes the memory load, and returns the result by the same steps, with the result returned to the originating node.
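The originating side of that flow can be sketched as follows in C; the transaction and frame structures and the addr_to_node/fabric_send helpers are hypothetical stand-ins, since the disclosure does not define the bridge's wire format.

```c
#include <stdint.h>

struct bus_txn {
    uint8_t  is_write;
    uint64_t phys_addr;
    uint64_t data;            /* write data, or load result on return */
};

struct routing_frame {
    uint16_t       dst_node;  /* from the physical-address -> node map */
    uint16_t       src_node;
    struct bus_txn payload;   /* packetized bus transaction            */
};

extern uint16_t addr_to_node(uint64_t phys_addr);       /* assumed */
extern void fabric_send(const struct routing_frame *);  /* assumed */

/* Packetize a remote memory load and hand it to the fabric switch. */
void bridge_remote_load(uint16_t self_node, uint64_t phys_addr)
{
    struct routing_frame f = {
        .dst_node = addr_to_node(phys_addr),
        .src_node = self_node,
        .payload  = { .is_write = 0, .phys_addr = phys_addr, .data = 0 },
    };
    fabric_send(&f);   /* the destination bridge de-packetizes, performs
                        * the load, and returns the result the same way */
}
```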
Network processor and server fabric integration
Figure 19 shows an illustration of the integration of the server fabric with a network processor (911). There are several use cases for integrating the server fabric with network processors, including:
- The network processor can serve as a network packet processing accelerator for the local processors (905) and for any other processor on the fabric.
- Use cases can be network-processor centric, where incoming packets from the external Ethernet are targeted at the network processor, and the network processor offloads control plane processing to the larger processing cores (905).
- The server fabric can serve as the communication fabric between network processors.
To enable these novel use cases, the network processor is assigned a MAC address. In the switch fabric shown in Figure 19, no routing header processor is attached to ports 1-4, so agents directly connected to ports 1-4 need to inject packets with the fabric switch header already prepended to the payload. The network processor integrates with the fabric switch in its design as follows (see the sketch after this list):
- Outgoing packets from the network processor are tagged with the fabric switch header, which encodes the destination node ID derived from the destination MAC.
- Incoming packets from the fabric switch to the network processor have the fabric switch header removed before Ethernet packet processing.
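The egress tagging an agent on such a raw port must perform can be sketched as below in C; the MAC-to-node table and the two-byte header are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

struct mac_entry { uint8_t mac[6]; uint16_t node_id; };

extern const struct mac_entry mac_table[];   /* assumed lookup table */
extern const int mac_table_len;

/* Prepend a fabric switch header encoding the destination node ID
 * derived from the frame's destination MAC (its first 6 bytes). */
int tag_egress(const uint8_t *eth_frame, size_t len, uint8_t *out)
{
    for (int i = 0; i < mac_table_len; i++) {
        if (memcmp(eth_frame, mac_table[i].mac, 6) == 0) {
            uint16_t hdr = mac_table[i].node_id;
            memcpy(out, &hdr, sizeof hdr);            /* fabric header  */
            memcpy(out + sizeof hdr, eth_frame, len); /* opaque payload */
            return (int)(sizeof hdr + len);
        }
    }
    return -1;   /* unknown MAC: drop or flood per fabric policy */
}
```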
External device and server fabric integration
Figure 19 also illustrates the integration of the server fabric with arbitrary external devices (912). An external device here means any processor, DSP, GPU, I/O, or other communication or processing device that needs a device-to-device communication fabric. A typical use case would be a large processing system composed of DSPs or GPU processors that need an interconnect fabric between them.
The fabric switch routes packets based on the fabric routing header and does no packet inspection of the packet payload. No assumption is made that the packet payload is formatted as an Ethernet frame; it is treated entirely as an opaque payload.
This allows external devices (e.g. DSP and GPU processors) to attach to the fabric switch and use the scalable, high-performance, power optimized communication fabric by:
- adding a routing frame header, containing the packet's destination node ID, to any packet payload sent to the fabric switch; and
- stripping the routing frame header from packets received from the fabric switch.
Load balancing
Considering the fabric topology shown in Figure 5, each of the nodes in the fabric exposes at least one MAC address and IP address, with external Ethernet connectivity provided through the gateway nodes shown as 501a and 501b.
Exposing these fine-grained MAC and IP addresses works well for large-scale web operations that use hardware load balancers, because it presents the load balancer with a simple list of MAC/IP addresses to operate against, while the internal structure of the fabric remains invisible to the load balancer.
Smaller data centers, however, can be overwhelmed by the potentially large number of new MAC/IP addresses that high-density low-power servers can introduce. It is advantageous to be able to offer load balancing options that insulate the external data center infrastructure from having to deal individually with a large number of IP addresses at tiers such as web serving.
Consider Figure 20, where a port on the fabric switch has been attached to an FPGA providing services such as IP Virtual Server (IPVS). This IP virtualization can be done at a range of network levels, including layer 4 (transport) and layer 7 (application). In many cases it is advantageous to do the load balancing at layer 7 for data center tiers such as web serving, so that HTTP session state can be kept locally by a specific web server node. The IPVS FPGA would only be attached on the gateway nodes (nodes 501a and 501b in Figure 5).
In this example, when the fabric shown in Figure 5 is augmented with the IPVS FPGAs on the gateway nodes, a single IP address can be exposed per gateway node. The IPVS FPGA load balances incoming requests (e.g. HTTP requests) across the nodes within the fabric. For layer 4 load balancing, the IPVS FPGA can operate statelessly, using algorithms including round-robin across the nodes, or instantiating a maximum number of requests per node before moving to the next node. For layer 7 load balancing, the IPVS FPGA needs to keep state, so that application sessions can be pinned to a specific node.
The resulting flow becomes (a sketch of the layer 4 case follows this list):
- An incoming request (e.g. an HTTP request) comes into the gateway node (port 0) in Figure 20.
- The fabric switch routing table has been configured to direct incoming traffic from port 0 to the IPVS FPGA port on the fabric switch.
- The IPVS FPGA rewrites the routing header to target a specific node in the fabric, and forwards the resulting packet to the destination node.
- The destination node processes the request, and typically sends the result directly out through the gateway node.
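The layer 4 case can be sketched as follows in C; the worker list and header layout are assumptions for illustration, and layer 7 balancing would add session state as noted above.

```c
#include <stdint.h>

struct route_header { uint16_t dst_node; uint16_t src_node; };

static const uint16_t workers[] = { 2, 3, 4, 5, 6, 7 };
#define NWORKERS (sizeof workers / sizeof workers[0])

/* Stateless round-robin: retarget each incoming request's routing
 * header at the next worker node in the fabric. */
void ipvs_l4_balance(struct route_header *hdr)
{
    static unsigned next;                /* round-robin cursor */
    hdr->dst_node = workers[next];
    next = (next + 1) % NWORKERS;
}
```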
OpenFlow / software-defined networking enabled fabric
OpenFlow is a communications protocol that gives access to the forwarding plane of a switch or router over the network. OpenFlow allows the path of network packets through the network of switches to be determined by software running on separate servers. This separation of control from forwarding allows more sophisticated traffic management than is feasible today using ACLs and routing protocols. OpenFlow is considered an implementation of the general approach of software-defined networking.
Figure 21 shows how OpenFlow (or, more generally, software-defined networking (SDN)) flow processing can be built into the Calxeda fabric. Each of the gateway nodes would instantiate an OpenFlow-enabled FPGA on a port of the gateway node's fabric switch. The OpenFlow FPGA needs a path to a control plane processor; this can be done with a separate networking port on the OpenFlow FPGA, or simply by talking to the control plane processor through another port off the fabric switch.
The resulting flow becomes (a sketch of the match/action step follows this list):
- An incoming request comes into the gateway node (port 0) in Figure 20.
- The fabric switch routing table has been configured to direct incoming traffic from port 0 to the OpenFlow/SDN FPGA port on the fabric switch.
- The OpenFlow/SDN FPGA implements standard OpenFlow processing, optionally including contacting the control plane processor when necessary. The OpenFlow/SDN FPGA rewrites the routing header to target a specific node in the fabric (via its MAC address), and forwards the resulting packet to the destination node.
- The destination node processes the request, and sends the result back through the OpenFlow FPGA, which applies any outgoing flow processing.
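The FPGA's match/action step can be sketched as follows in C; the single match field and the punt hook are simplifications for illustration of the standard OpenFlow table-miss behavior.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

struct flow_entry {
    uint8_t  match_mac[6];   /* match field: destination MAC */
    uint16_t out_node;       /* action: fabric destination   */
    bool     valid;
};

#define FLOW_TABLE_SIZE 256
static struct flow_entry flow_table[FLOW_TABLE_SIZE];

extern void punt_to_control_plane(const uint8_t *frame, size_t len);

/* On a table hit, rewrite toward the cached node; on a miss, punt to
 * the control plane processor, which decides and installs an entry. */
bool sdn_process(const uint8_t *frame, size_t len, uint16_t *dst_node)
{
    for (int i = 0; i < FLOW_TABLE_SIZE; i++) {
        if (flow_table[i].valid &&
            memcmp(frame, flow_table[i].match_mac, 6) == 0) {
            *dst_node = flow_table[i].out_node;
            return true;
        }
    }
    punt_to_control_plane(frame, len);
    return false;
}
```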
Integrating the power optimized fabric with standard processors via PCIe
The power optimized server fabric shown in Figure 5 and described previously provides compelling advantages for existing standard processors, and can be integrated with existing processors as a chip integration solution. Standard desktop and server processors usually support PCIe interfaces, either directly or via a chipset. Figure 22 shows an example of integrating the power optimized fabric switch with an existing processor via PCIe. Item 22a shows a standard processor supporting one or more PCIe interfaces, either directly or via a chipset. Item 22b shows the disclosed fabric switch, with its integrated Ethernet MAC controllers, given an integrated PCIe interface. Item 22b would typically be implemented as an FPGA or ASIC integrating the PCIe-attached fabric switch.
In this disclosure, the nodes shown in Figure 5 can be a heterogeneous combination of power optimized server SOCs with integrated fabric switches, and standard processors connected via PCIe, as disclosed, to a PCIe interface module containing the Ethernet MACs and the fabric switch.
Integrating the power optimized fabric with standard processors via Ethernet
The power optimized server fabric shown in Figure 5 and described previously provides compelling advantages for existing standard processors, and can be integrated with existing processors as a chip integration solution. Standard desktop and server processors usually provide Ethernet interfaces via a chipset, or potentially integrated on the SOC. Figure 23 shows an example of integrating the power optimized fabric switch with an existing processor via Ethernet. Item 23a shows a standard processor supporting an Ethernet interface, either on the SOC or via a chipset. Item 23b shows the disclosed fabric switch without the integrated internal Ethernet MAC controllers. Item 23b would typically be implemented as an FPGA or ASIC integrating the fabric switch.
In this disclosure, the nodes shown in Figure 5 can be a heterogeneous combination of power optimized server SOCs with integrated fabric switches, and standard Ethernet-connected processors integrated, as disclosed, with the fabric switch implemented as an FPGA or ASIC.
While the foregoing has been with reference to particular embodiments of the invention, it will be appreciated by those skilled in the art that changes to these embodiments may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.

Claims (47)

1. A computing device, comprising:
a plurality of server nodes interconnected with one another, wherein each server node comprises a processor, memory, input/output circuitry, and a fabric switch;
the fabric switches interconnecting the plurality of server nodes by a plurality of fabric links; and
one or more Ethernet escapes from the fabric switches that form the power optimized server fabric.
2. The computing device of claim 1, wherein the plurality of server nodes are part of a server board.
3. The computing device of claim 2, further comprising an aggregation of a set of boards, wherein each board has one or more server nodes, each server node comprising a processor, memory, and input/output circuitry, and the set of boards and the fabric switches are interconnected to yield a larger server.
4. The computing device of claim 1, wherein the server nodes further comprise one or more fabric switches that switch on a routing header attached to a layer 2 Ethernet packet.
5. The computing device of claim 1, wherein the fabric switch has a plurality of server node links, and wherein a speed of each server node link is set to optimize power.
6. The computing device of claim 1, wherein the fabric switch has a plurality of server node links, and wherein a speed of each server node link is dynamically adjustable to optimize power.
7. The computing device of claim 6, wherein the speed of each server node link is dynamically adjustable based on one of an instantaneous usage of the server node link and an average usage of the server node link.
8. The computing device of claim 3, wherein one or more fabric links and one or more Ethernet escapes are connected to the set of boards with PCIe connectors.
9. The computing device of claim 1, further comprising a passive backplane that provides a point-to-point server interconnect fabric.
10. The computing device of claim 1, wherein the plurality of server nodes form a tree having one or more levels, and wherein the Ethernet escapes are at any level of the tree.
11. The computing device of claim 1, wherein each Ethernet escape is one of enabled and disabled to match bandwidth and power usage.
12. The computing device of claim 1, wherein data between the server nodes passes through the fabric switches rather than the Ethernet escapes.
13. The computing device of claim 4, wherein each server node has its computational components turned off to reduce power.
14. The computing device of claim 2, further comprising a plurality of server boards that form one of a shelf and a backplane.
15. The computing device of claim 14, further comprising a plurality of shelves that form a rack.
16. A computing device, comprising:
a storage device having a physical form factor; and
a server node, wherein the server node comprises a processor, memory, input/output circuitry, a switch fabric and one or more SATA interfaces for the storage device, the server node having the same physical form factor as the storage device.
17. The computing device of claim 16, further comprising an array of storage devices and an array of server nodes interconnected to each other.
18. The computing device of claim 16, wherein the server node is within the storage device.
19. The computing device of claim 16, wherein the server node is connected to the storage device, and the storage device is local storage for the server node.
20. The computing device of claim 16, further comprising a plurality of storage devices, wherein each storage device is connected to one of the SATA interfaces so that the server node controls the plurality of storage devices.
21. The computing device of claim 16, further comprising a plurality of server nodes and a plurality of storage devices, wherein the plurality of server nodes switch the fabric and control the plurality of storage devices.
22. The computing device of claim 16, further comprising one or more Ethernet escapes and links, wherein each Ethernet escape and link has a PCIe connector.
23. A method for producing a high-density computing system, the method comprising:
providing a server node having a processor, memory, input/output circuitry, a switch fabric and one or more SATA interfaces; and
packaging the server node within the form factor of a hard disk drive.
24. The method of claim 23, wherein the switch fabric connects the server node to a network.
25. The method of claim 23, further comprising replacing a backplane of the hard disk drive with a backplane adapted to create at least one switched interconnect channel.
26. A method for producing a high-density computing system, the method comprising:
providing a standard form factor disk drive; and
integrating into the standard form factor disk drive a server node having a processor, memory, input/output circuitry, a switch fabric and one or more SATA interfaces, thereby providing integrated computing capability within the standard form factor disk drive.
27. A computing device, comprising:
a circuit board;
one or more dynamic RAM chips mounted on the circuit board;
one or more computing chips mounted on the circuit board;
one or more flash memory chips mounted on the circuit board;
wherein the circuit board is mounted vertically so that the one or more flash memory chips are below the one or more computing chips and the one or more dynamic RAM chips are above the one or more computing chips; and
a chimney cooler for the vertically mounted circuit board.
28. The computing device of claim 27, further comprising a plurality of vertically oriented circuit boards cooled by chimney cooling.
29. The computing device of claim 27, wherein the chimney cooler is a fan at the bottom of the circuit board that cools the circuit board.
30. The computing device of claim 27, wherein the chimney cooler is a forced-air source and a duct.
31. The computing device of claim 27, wherein the one or more dynamic RAM chips on the vertically mounted circuit board are not directly above the one or more computing chips.
32. The computing device of claim 27, wherein the circuit board is a printed circuit board (PCB).
33. The computing device of claim 27, wherein the one or more dynamic RAM chips, one or more computing chips and one or more flash memory chips are mounted obliquely on the circuit board.
34. A computing device, comprising:
one or more processors;
a bus fabric connected to the one or more processors;
a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; and
one or more routing header processors, wherein each routing header processor is used to route a particular traffic stream so that the fabric switch handles different traffic streams.
35. The computing device of claim 34, wherein the different traffic streams comprise server traffic, storage traffic and networking traffic.
36. The computing device of claim 34, further comprising one or more Ethernet MAC controllers connected to the bus fabric, wherein the fabric switch is connected to the one or more Ethernet MAC controllers and outputs data from the computing device to one or more ports, and wherein the one or more routing header processors are PCIe header processors for routing PCIe data across the fabric switch.
37. The computing device of claim 36, further comprising a PCIe controller connected to the bus fabric and a PCIe routing header processor connected to the PCIe controller, the PCIe controller being connectable to a PCIe PHY.
38. The computing device of claim 34, further comprising a network processor connected to the switch fabric.
39. The computing device of claim 36, further comprising an external device connected to at least one of the ports.
40. A computing device, comprising:
one or more processors;
a bus fabric connected to the one or more processors;
a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports;
a bus protocol bridge connected between the bus fabric and the switch fabric; and
one or more routing header processors, wherein each routing header processor routes a particular traffic stream so that the fabric switch handles different traffic streams.
41. A method for switching different traffic streams, comprising:
providing one or more processors and a bus fabric connected to the one or more processors;
providing a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; and
switching particular traffic streams using one or more routing header processors so that the fabric switch handles different traffic streams.
42. The method of claim 41, wherein the different traffic streams comprise server traffic, storage traffic and networking traffic.
43. The method of claim 41, further comprising routing PCIe data across the fabric switch.
44. A method for load balancing using a switch fabric, comprising:
providing a server node having one or more processors, a bus fabric connected to the one or more processors, a fabric switch connected to the bus fabric that outputs data from the computing device to one or more ports, and an IP virtual server connected to the fabric switch;
receiving an incoming request;
routing the incoming request to the IP virtual server connected to the fabric switch;
using the IP virtual server connected to the fabric switch to generate a routing header to a particular node of the fabric;
forwarding the incoming request to the particular node; and
processing the incoming request with the particular node to provide load balancing.
45. A method for processing using a switch fabric, comprising:
providing a server node having one or more processors, a bus fabric connected to the one or more processors, a fabric switch connected to the bus fabric that outputs data from the computing device to one or more ports, and an OpenFlow device connected to the fabric switch;
receiving an incoming request;
routing the incoming request to the OpenFlow device connected to the fabric switch;
using the OpenFlow device to generate a routing header to a particular node of the fabric;
forwarding the incoming request to the particular node;
processing the incoming request with the particular node to provide load balancing; and
sending the processed incoming request back to the OpenFlow device.
46. A computing device, comprising:
one or more processors;
a bus fabric connected to the one or more processors;
a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports;
a PCIe interface connected to the bus fabric; and
an external processor, the external processor being connected to the computing device using the PCIe interface.
47. A computing device, comprising:
a fabric switch that outputs data from the computing device to one or more ports;
an Ethernet port connected to the fabric switch; and
an external processor connected to the computing device using an Ethernet interface.
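The C sketches below illustrate, informally, three of the mechanisms recited in the claims above. Each is a sketch under stated assumptions, not an implementation from the disclosure. First, the dynamic link-speed policy of claims 6 and 7 can be pictured as a small control loop; the thresholds, smoothing factor and set of available link speeds below are invented for the example.

```c
/*
 * Sketch of claims 6-7: step a fabric link's speed up or down from a
 * smoothed utilization. Thresholds and speed steps are assumptions.
 */
#include <stdio.h>

enum link_speed { SPEED_1G = 1, SPEED_5G = 5, SPEED_10G = 10 };

struct fabric_link {
    enum link_speed speed;
    double avg_util;            /* exponentially weighted average, 0..1 */
};

static enum link_speed step_up(enum link_speed s)
{
    return (s == SPEED_1G) ? SPEED_5G : SPEED_10G;
}

static enum link_speed step_down(enum link_speed s)
{
    return (s == SPEED_10G) ? SPEED_5G : SPEED_1G;
}

/* Called periodically with the link's instantaneous utilization. */
void adjust_link_speed(struct fabric_link *l, double inst_util)
{
    const double alpha = 0.5;                /* smoothing factor */
    l->avg_util = alpha * inst_util + (1.0 - alpha) * l->avg_util;

    if (l->avg_util > 0.80)
        l->speed = step_up(l->speed);        /* busy: add bandwidth */
    else if (l->avg_util < 0.20)
        l->speed = step_down(l->speed);      /* idle: save power    */
}

int main(void)
{
    struct fabric_link link = { SPEED_10G, 0.0 };
    const double samples[] = { 0.05, 0.02, 0.01, 0.95, 0.98, 0.99, 0.99 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        adjust_link_speed(&link, samples[i]);
        printf("avg=%.2f speed=%dG\n", link.avg_util, (int)link.speed);
    }
    return 0;
}
```

Using the average rather than the instantaneous sample keeps the link from thrashing between speeds on short bursts, which matches the claim's alternative of instantaneous or average utilization.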
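Second, claims 34-35 and 40-42 recite per-protocol routing-header processors feeding one fabric switch, so that server, storage and networking traffic share the same fabric. A minimal sketch, assuming an invented traffic-class tag in the header:

```c
/*
 * Sketch of claims 34-35 / 40-42: each routing-header processor stamps
 * its traffic class so one fabric can carry mixed protocols. Tag
 * values and struct layout are assumptions for illustration.
 */
#include <stdint.h>
#include <stdio.h>

enum traffic_class { TC_ETHERNET = 0, TC_STORAGE = 1, TC_PCIE = 2 };

struct fabric_pkt {
    uint8_t  traffic_class;   /* which header processor stamped it */
    uint16_t dst_node;
};

/* Each routing-header processor handles exactly one traffic class. */
void stamp_header(struct fabric_pkt *p, enum traffic_class tc, uint16_t dst)
{
    p->traffic_class = (uint8_t)tc;
    p->dst_node = dst;
}

/* The fabric demultiplexes on the traffic class at the far end. */
const char *classify(const struct fabric_pkt *p)
{
    switch (p->traffic_class) {
    case TC_ETHERNET: return "ethernet";
    case TC_STORAGE:  return "storage";
    case TC_PCIE:     return "pcie";
    default:          return "unknown";
    }
}

int main(void)
{
    struct fabric_pkt p;
    stamp_header(&p, TC_PCIE, 3);
    printf("node %u receives %s traffic\n", p.dst_node, classify(&p));
    return 0;
}
```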
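Third, the load-balancing flow of claim 44 can likewise be sketched: an IP-virtual-server front end picks a worker node and stamps a routing header so the fabric switch delivers the request there. The round-robin selection and the header fields below are assumptions; the claim does not mandate a particular selection policy.

```c
/*
 * Sketch of claim 44: an IP virtual server generates a routing header
 * toward a chosen fabric node. Round-robin selection is an assumption.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_WORKER_NODES 4

struct route_header { uint16_t dst_node; uint16_t src_node; };

struct request { uint32_t flow_id; struct route_header hdr; };

/* IP virtual server: choose the next worker and generate the header. */
void ipvs_dispatch(struct request *req, uint16_t self_node)
{
    static uint16_t next;                       /* round-robin cursor */
    req->hdr.dst_node = next++ % NUM_WORKER_NODES;
    req->hdr.src_node = self_node;
}

int main(void)
{
    for (uint32_t i = 0; i < 6; i++) {
        struct request req = { .flow_id = i };
        ipvs_dispatch(&req, /* self_node = */ 0);
        printf("flow %u -> node %u\n", req.flow_id, req.hdr.dst_node);
    }
    return 0;
}
```

The OpenFlow variant of claim 45 differs mainly in that the forwarding decision comes from an OpenFlow device and the processed request is returned to it, rather than leaving through the worker node.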
CN2011800553292A 2010-09-16 2011-09-16 Performance and power optimized computer system architecture and leveraging power optimized tree fabric interconnecting Pending CN103444133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610113343.8A CN105743819B (en) 2010-09-16 2011-09-16 Computing device

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US38358510P 2010-09-16 2010-09-16
US61/383,585 2010-09-16
US13/234,054 2011-09-15
US13/234,054 US9876735B2 (en) 2009-10-30 2011-09-15 Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
PCT/US2011/051996 WO2012037494A1 (en) 2010-09-16 2011-09-16 Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201610113343.8A Division CN105743819B (en) 2010-09-16 2011-09-16 Computing device

Publications (1)

Publication Number Publication Date
CN103444133A true CN103444133A (en) 2013-12-11

Family

ID=45831990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011800553292A Pending CN103444133A (en) 2010-09-16 2011-09-16 Performance and power optimized computer system architecture and leveraging power optimized tree fabric interconnecting

Country Status (4)

Country Link
CN (1) CN103444133A (en)
DE (1) DE112011103123B4 (en)
GB (1) GB2497493B (en)
WO (1) WO2012037494A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107104895A * 2016-02-23 2017-08-29 Mellanox Technologies TLV Ltd. Unicast forwarding of adaptive routing advertisements
CN111488302A * 2019-01-28 2020-08-04 Quanta Computer Inc. Computing system with elastic configuration, computer-implemented method and storage medium
CN112703462A * 2018-06-28 2021-04-23 Twitter, Inc. Method and system for maintaining storage device fault tolerance in a composable infrastructure

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8782654B2 (en) 2004-03-13 2014-07-15 Adaptive Computing Enterprises, Inc. Co-allocating a reservation spanning different compute resources types
US8413155B2 (en) 2004-03-13 2013-04-02 Adaptive Computing Enterprises, Inc. System and method for a self-optimizing reservation in time of compute resources
US20070266388A1 (en) 2004-06-18 2007-11-15 Cluster Resources, Inc. System and method for providing advanced reservations in a compute environment
US8176490B1 (en) 2004-08-20 2012-05-08 Adaptive Computing Enterprises, Inc. System and method of interfacing a workload manager and scheduler with an identity manager
WO2006053093A2 (en) 2004-11-08 2006-05-18 Cluster Resources, Inc. System and method of providing system jobs within a compute environment
US8863143B2 (en) 2006-03-16 2014-10-14 Adaptive Computing Enterprises, Inc. System and method for managing a hybrid compute environment
WO2006107531A2 (en) 2005-03-16 2006-10-12 Cluster Resources, Inc. Simple integration of an on-demand compute environment
US9231886B2 (en) 2005-03-16 2016-01-05 Adaptive Computing Enterprises, Inc. Simple integration of an on-demand compute environment
CA2603577A1 (en) 2005-04-07 2006-10-12 Cluster Resources, Inc. On-demand access to compute resources
US8041773B2 (en) 2007-09-24 2011-10-18 The Research Foundation Of State University Of New York Automatic clustering for self-organizing grids
US9054990B2 (en) 2009-10-30 2015-06-09 Iii Holdings 2, Llc System and method for data center security enhancements leveraging server SOCs or server fabrics
US20130107444A1 (en) 2011-10-28 2013-05-02 Calxeda, Inc. System and method for flexible storage and networking provisioning in large scalable processor installations
US9465771B2 (en) 2009-09-24 2016-10-11 Iii Holdings 2, Llc Server on a chip and node cards comprising one or more of same
US9077654B2 (en) 2009-10-30 2015-07-07 Iii Holdings 2, Llc System and method for data center security enhancements leveraging managed server SOCs
US9876735B2 (en) 2009-10-30 2018-01-23 Iii Holdings 2, Llc Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
US9069929B2 (en) 2011-10-31 2015-06-30 Iii Holdings 2, Llc Arbitrating usage of serial port in node card of scalable and modular servers
US8599863B2 (en) 2009-10-30 2013-12-03 Calxeda, Inc. System and method for using a multi-protocol fabric module across a distributed server interconnect fabric
US20110103391A1 (en) 2009-10-30 2011-05-05 Smooth-Stone, Inc. C/O Barry Evans System and method for high-performance, low-power data center interconnect fabric
US9680770B2 (en) 2009-10-30 2017-06-13 Iii Holdings 2, Llc System and method for using a multi-protocol fabric module across a distributed server interconnect fabric
US11720290B2 (en) 2009-10-30 2023-08-08 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US9648102B1 (en) 2012-12-27 2017-05-09 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
US9311269B2 (en) 2009-10-30 2016-04-12 Iii Holdings 2, Llc Network proxy for high-performance, low-power data center interconnect fabric
US10877695B2 (en) 2009-10-30 2020-12-29 Iii Holdings 2, Llc Memcached server functionality in a cluster of data processing nodes
CN104022953B * 2013-02-28 2018-02-09 New H3C Technologies Co., Ltd. OpenFlow-based message forwarding method and device
EP2997482A4 (en) * 2013-05-16 2016-11-16 Hewlett Packard Development Co Multi-mode agent
US11575594B2 (en) 2020-09-10 2023-02-07 Mellanox Technologies, Ltd. Deadlock-free rerouting for resolving local link failures using detour paths
US11411911B2 (en) 2020-10-26 2022-08-09 Mellanox Technologies, Ltd. Routing across multiple subnetworks using address mapping
US11870682B2 (en) 2021-06-22 2024-01-09 Mellanox Technologies, Ltd. Deadlock-free local rerouting for handling multiple local link failures in hierarchical network topologies
US11765103B2 (en) 2021-12-01 2023-09-19 Mellanox Technologies, Ltd. Large-scale network with high port utilization

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7447147B2 (en) * 2003-02-28 2008-11-04 Cisco Technology, Inc. Ethernet switch with configurable alarms
CN101359431A (en) * 2008-01-14 2009-02-04 珠海天瑞电力科技有限公司 PLC network television teaching system
US20090166065A1 (en) * 2008-01-02 2009-07-02 Clayton James E Thin multi-chip flex module
US7616646B1 (en) * 2000-12-12 2009-11-10 Cisco Technology, Inc. Intraserver tag-switched distributed packet processing for network access servers
US20100008038A1 (en) * 2008-05-15 2010-01-14 Giovanni Coglitore Apparatus and Method for Reliable and Efficient Computing Based on Separating Computing Modules From Components With Moving Parts

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7161901B2 (en) 2001-05-07 2007-01-09 Vitesse Semiconductor Corporation Automatic load balancing in switch fabrics
US7917658B2 (en) 2003-01-21 2011-03-29 Emulex Design And Manufacturing Corporation Switching apparatus and method for link initialization in a shared I/O environment
US7688578B2 (en) 2007-07-19 2010-03-30 Hewlett-Packard Development Company, L.P. Modular high-density computer system
US8918488B2 (en) 2009-02-04 2014-12-23 Citrix Systems, Inc. Methods and systems for automated management of virtual resources in a cloud computing environment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7616646B1 (en) * 2000-12-12 2009-11-10 Cisco Technology, Inc. Intraserver tag-switched distributed packet processing for network access servers
US7447147B2 (en) * 2003-02-28 2008-11-04 Cisco Technology, Inc. Ethernet switch with configurable alarms
US20090166065A1 (en) * 2008-01-02 2009-07-02 Clayton James E Thin multi-chip flex module
US7796399B2 (en) * 2008-01-02 2010-09-14 Microelectronics Assembly Technologies, Inc. Thin multi-chip flex module
CN101359431A (en) * 2008-01-14 2009-02-04 珠海天瑞电力科技有限公司 PLC network television teaching system
US20100008038A1 (en) * 2008-05-15 2010-01-14 Giovanni Coglitore Apparatus and Method for Reliable and Efficient Computing Based on Separating Computing Modules From Components With Moving Parts

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107104895A * 2016-02-23 2017-08-29 Mellanox Technologies TLV Ltd. Unicast forwarding of adaptive routing advertisements
CN107104895B * 2016-02-23 2021-08-13 Mellanox Technologies TLV Ltd. Unicast forwarding of adaptive routing advertisements
CN112703462A * 2018-06-28 2021-04-23 Twitter, Inc. Method and system for maintaining storage device fault tolerance in a composable infrastructure
CN111488302A * 2019-01-28 2020-08-04 Quanta Computer Inc. Computing system with elastic configuration, computer-implemented method and storage medium
CN111488302B * 2019-01-28 2022-03-29 Quanta Computer Inc. Computing system with elastic configuration, computer-implemented method and storage medium

Also Published As

Publication number Publication date
GB201306075D0 (en) 2013-05-22
DE112011103123T5 (en) 2013-12-05
DE112011103123B4 (en) 2023-08-10
GB2497493B (en) 2017-12-27
WO2012037494A1 (en) 2012-03-22
GB2497493A (en) 2013-06-12

Similar Documents

Publication Publication Date Title
CN103444133A (en) Performance and power optimized computer system architecture and leveraging power optimized tree fabric interconnecting
US9876735B2 (en) Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
TWI534629B (en) Data transmission method and data transmission system
US8335884B2 (en) Multi-processor architecture implementing a serial switch and method of operating same
CN104657317B (en) Server
CN103546299A 50 Gb/s Ethernet using serializer/deserializer lanes
CN202535384U (en) Network equipment expansion connection and virtual machine interconnection optimization system based on PCIe bus
CN105227496A Cluster switch, network, and method for transmitting data on the network
CN201282471Y (en) Cluster type server application device
CN103136141A High-speed interconnection method among multiple controllers
WO2019072115A1 (en) Server system having optimized cooling, and installation method
CN102103471B (en) Data transmission method and system
CN104580527B Multi-I/O high-density multi-node server system design method for cloud server applications
US20200077535A1 (en) Removable i/o expansion device for data center storage rack
Maniotis et al. How data center networks can improve through co-packaged optics
CN105743819A (en) Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect
CN203241890U (en) Multi-unit server based on ATCA board card interfaces
CN103002600B Multi-port 10-Gigabit interface device based on a 1U design
CN204539696U (en) Blade server and network equipment cabinet
CN208969660U SRIO switch board with an OpenVPX structure
CN102932213B Server system design method based on the coexistence and on-demand switching of InfiniBand and 10-Gigabit Ethernet
CN204965277U Blade server supporting network load balancing switching
CN107370681A Modular router
Wang Survey of recent research issues in data center networking
Shao et al. OeIM: An Optoelectronic Interconnection Middleware for the Exascale Computer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1192385

Country of ref document: HK

ASS Succession or assignment of patent right

Owner name: III HOLDINGS NO. 2 LLC

Free format text: FORMER OWNER: SILICON VALLEY BANK

Effective date: 20140918

Owner name: SILICON VALLEY BANK

Free format text: FORMER OWNER: CALXEDA INC.

Effective date: 20140918

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20140918

Address after: Delaware

Applicant after: III HOLDINGS 2, LLC

Address before: California, USA

Applicant before: Silicon Valley Bank

Effective date of registration: 20140918

Address after: California, USA

Applicant after: Silicon Valley Bank

Address before: Texas, USA

Applicant before: Calxeda, Inc.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131211

WD01 Invention patent application deemed withdrawn after publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1192385

Country of ref document: HK