TWI540862B - Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect - Google Patents
- Publication number
- TWI540862B (application TW 100133390 A)
- Authority
- TW
- Taiwan
- Prior art keywords
- organization
- server
- computing device
- switch
- node
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/40—Constructional details, e.g. power supply, mechanical construction or backplane
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/10—Packet switching elements characterised by the switching fabric construction
- H04L49/101—Packet switching elements characterised by the switching fabric construction using crossbar or matrix
Description
This application claims priority under the patent statutes to U.S. Patent Application Serial No. 12/794,996, filed June 7, 2010, and entitled "System and Method for High-Performance, Low-Power Data Center Interconnect Fabric," the entire contents of which are incorporated herein by reference. This application also claims the benefit of U.S. Provisional Patent Application No. 61/383,585, filed September 16, 2010, and entitled "Performance and Power Optimized Computer System Architectures and Methods Leveraging Power Optimized Tree Fabric Interconnect," the entire contents of which are incorporated herein by reference.
The present invention relates to performance- and power-optimized computer system architectures and methods that leverage a power-optimized tree fabric interconnect.
Figures 1 and 2 illustrate traditional data center network aggregation as currently known. Figure 1 is a simplified diagram of a typical data center network architecture 100, in which top-level switches 101a-n sit at the top of racks 102a-n that are filled with blade servers 107a-n interspersed with local routers 103a-f. Additional rack units 105a-b and 108a-n contain additional servers 104e-k and routers 106a-g. Figure 2 shows an exemplary physical view 110 of a system with peripheral servers 111a-bn arranged around edge router systems 112a-h, which in turn are placed around a centrally located core switching system 113. Typically, such an aggregation 110 uses 1-Gb Ethernet from the rack servers to their top-of-rack switches, and often 10-Gb Ethernet ports to the edge and core routers.
According to one embodiment of the invention, a computing device is provided that comprises: a plurality of server nodes, each server node including a processor, a memory, input/output circuitry and a fabric switch interconnected with one another; a fabric switch interconnecting the plurality of server nodes through a plurality of fabric links; and one or more Ethernet escapes from the fabric switch, the one or more Ethernet escapes forming a power-optimized server fabric.
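The claimed arrangement, server nodes joined by fabric links with one or more Ethernet escapes, can be modeled as a small graph to check that every node can reach an escape port. The sketch below is illustrative only; the node count and link layout are invented for the example and are not the patented topology.

```python
# Illustrative model of a server fabric: nodes joined by fabric links, with
# Ethernet escapes on designated nodes. The topology here is invented for
# the example; it only demonstrates that every node can reach an escape.
from collections import deque

fabric_links = {            # adjacency list: node -> linked nodes
    "N0": ["N1", "N2"],
    "N1": ["N0", "N3"],
    "N2": ["N0", "N3"],
    "N3": ["N1", "N2"],
}
ethernet_escapes = {"N3"}   # nodes carrying an Ethernet escape port

def hops_to_escape(start):
    """BFS distance from a node to the nearest Ethernet escape."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in ethernet_escapes:
            return dist
        for nxt in fabric_links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None             # unreachable: fabric is mis-wired

for n in fabric_links:
    print(n, hops_to_escape(n))
```

A fabric designer can use the same kind of reachability check to verify that adding or removing an escape port leaves no node stranded.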
According to another embodiment of the invention, a computing device is provided that comprises: a storage device having a form factor; and a server node that includes a processor, a memory, input/output circuitry, a switch fabric and one or more SATA interfaces for the storage device, the server node having the same form factor as the storage device.
According to yet another embodiment of the invention, a method for producing a high-density computing system is provided, comprising the steps of: providing a server node having a processor, a memory, input/output circuitry, a switch fabric and one or more SATA interfaces; and packaging the server node into the form factor of a hard disk drive.
According to a further embodiment of the invention, a method for producing a high-density computing system is provided, comprising the steps of: providing a standard-form-factor disk drive; and integrating into the standard-form-factor disk drive a server node having a processor, a memory, input/output circuitry, a switch fabric and one or more SATA interfaces, such that integrated computing capability is provided within the standard-form-factor disk drive.
According to another embodiment of the invention, a computing device is provided that comprises: a circuit board; one or more dynamic memory chips mounted on the circuit board; one or more computing chips mounted on the circuit board; one or more flash memory chips mounted on the circuit board, wherein the circuit board is mounted vertically so that the one or more flash memory chips are below the one or more computing chips and the one or more dynamic memory chips are above the one or more computing chips; and a chimney cooler for the vertically mounted circuit board.
According to yet another embodiment of the invention, a computing device is provided that comprises: one or more processors; a bus fabric connected to the one or more processors; a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; and one or more routing header processors, wherein each routing header processor is used to route a particular transport stream so that the fabric switch can handle different transport streams.
According to a further embodiment of the invention, a computing device is provided that comprises: one or more processors; a bus fabric connected to the one or more processors; a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; a bus protocol bridge connected between the bus fabric and the switch fabric; and one or more routing header processors, wherein each routing header processor is used to route a particular transport stream so that the fabric switch can handle different transport streams.
According to another embodiment of the invention, a method for switching different transport streams is provided, comprising the steps of: providing one or more processors and a bus fabric connected to the one or more processors; providing a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; and using one or more routing header processors to switch a particular transport stream so that the fabric switch can handle different transport streams.
According to yet another embodiment of the invention, a method of load balancing using a switch fabric is provided, comprising the steps of: providing a server node having one or more processors, a bus fabric connected to the one or more processors, a fabric switch connected to the bus fabric that outputs data from the computing device to one or more ports, and an IP virtual server connected to the fabric switch; receiving an incoming request; routing the incoming request to the IP virtual server connected to the fabric switch; using the IP virtual server to generate a routing header for a particular node of the fabric; forwarding the incoming request to the particular node; and processing the incoming request at the particular node to provide load balancing.
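The load-balancing steps above can be sketched in a few lines. The round-robin node-selection policy and the routing-header layout below are illustrative assumptions, not the actual IP virtual server or fabric-switch implementation; any scheduling policy could be substituted.

```python
# Minimal sketch of the fabric load-balancing flow: a virtual server
# receives a request, builds a routing header naming a fabric node, and
# the request is forwarded to that node for processing.
import itertools

class FabricIPVS:
    """Virtual server that maps incoming requests onto fabric nodes."""
    def __init__(self, node_ids):
        self._nodes = itertools.cycle(node_ids)   # simple round-robin policy

    def build_routing_header(self, request):
        return {"dest_node": next(self._nodes), "payload": request}

def handle(request, ipvs, nodes):
    header = ipvs.build_routing_header(request)       # IPVS picks a node
    return nodes[header["dest_node"]](header["payload"])  # node processes it

# Two hypothetical nodes, each just tagging the request it served.
nodes = {0: lambda r: f"node0:{r}", 1: lambda r: f"node1:{r}"}
ipvs = FabricIPVS([0, 1])
print(handle("GET /", ipvs, nodes))   # successive requests alternate
print(handle("GET /", ipvs, nodes))   # across the two nodes
```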
According to a further embodiment of the invention, a method of processing using a switch fabric is provided, comprising the steps of: providing a server node having one or more processors, a bus fabric connected to the one or more processors, a fabric switch connected to the bus fabric that outputs data from the computing device to one or more ports, and an OpenFlow device connected to the fabric switch; receiving an incoming request; routing the incoming request to the OpenFlow device connected to the fabric switch; using the OpenFlow device to generate a routing header for a particular node of the fabric; forwarding the incoming request to the particular node; processing the incoming request at the particular node to provide load balancing; and returning the processed request to the OpenFlow device.
According to another embodiment of the invention, a computing device is provided that comprises: one or more processors; a bus fabric connected to the one or more processors; a fabric switch connected to the bus fabric, the fabric switch outputting data from the computing device to one or more ports; a PCIe interface connected to the bus fabric; and an external processor connected to the computing device through the PCIe interface.
According to yet another embodiment of the invention, a computing device is provided that comprises: a fabric switch that outputs data from the computing device to one or more ports; an Ethernet port connected to the fabric switch; and an external processor connected to the computing device through an Ethernet interface.
Figures 1 and 2 illustrate typical data center network aggregation;
Figure 3 illustrates network aggregation using servers according to one embodiment;
Figure 4 illustrates a data center in a rack according to one embodiment;
Figure 5 illustrates the high-level topology of a network system with a switch fabric;
Figure 6 illustrates a server board composed of multiple server nodes interconnected with the point-to-point interconnect fabric;
Figures 6a-6c illustrate further examples of fabric topologies;
Figure 7 illustrates an example of a passive backplane connected to one or more node boards and two aggregation boards;
Figure 8 illustrates an example of extending the fabric across shelves and linking shelves across server racks;
Figure 9a illustrates an exemplary server 700 with a disk form factor;
Figures 9b and 9c illustrate exemplary arrays of disk/server combinations using the storage server 1-node SATA board, according to one embodiment;
Figure 9d illustrates a standard 3.5-inch drive;
Figure 9e illustrates an implementation of multiple server nodes in a standard 3.5-inch disk drive form factor;
Figure 10 illustrates an implementation of a server deeply integrated with storage;
Figure 11 illustrates an implementation of dense packing of storage and servers leveraging an existing 3.5-inch JBOD storage box;
Figure 12 illustrates an implementation of a server node instantiated in the same form factor as a 2.5-inch drive;
Figure 13 illustrates an implementation of rack chimney cooling;
Figure 13a provides an exemplary illustration of the thermal convection used in the chimney rack cooling shown in Figure 13;
Figure 14 illustrates server nodes placed diagonally relative to one another to minimize self-heating across server nodes;
Figure 15 illustrates an exemplary 16-node system according to one embodiment, in which heat rises away from the printed circuit board;
Figure 16 illustrates a higher-density variant of the 16-node system, with nodes similarly arranged to minimize self-heating across nodes;
Figure 17 illustrates the internal architecture of the server node fabric switch;
Figure 18 illustrates a server node that includes a PCIe controller connected to the internal CPU bus fabric;
Figure 18a illustrates a system with multiple protocol bridges using the fabric switch;
Figure 19 illustrates the integration of the server fabric with a network processor;
Figure 20 illustrates the fabric switch and an FPGA providing services such as Internet Protocol Virtual Server (IPVS);
Figure 21 illustrates a method of building OpenFlow flow processing onto the Calxeda fabric;
Figure 22 illustrates an example of the power-optimized fabric switch integrated with an existing processor via PCIe; and
Figure 23 illustrates an example of the power-optimized fabric switch integrated with an existing processor via Ethernet.
The present invention discloses performance- and power-optimized computer system architectures and methods that leverage a power-optimized tree fabric interconnect. One embodiment uses tile building blocks to construct low-power server clusters that leverage the fabric, another implements storage or cooling solutions, and another uses the fabric to switch other kinds of traffic.
Co-pending patent application No. 12/794,996 describes the architecture of a power-optimized server communication fabric that supports routing using a tree or graph topology in which each node supports multiple links, with each link designated as an up link, a down link or a lateral link. The system uses a segmented MAC architecture, which may repurpose MAC IP for internal and external MACs, and a method of leveraging the mechanism that would normally carry the MAC's physical signaling to instead feed the switch. The Calxeda XAUI system interconnect reduces power, wiring and rack size. No high-power, expensive Ethernet switches or high-power Ethernet physical layers (PHYs) are needed on the individual servers, which dramatically reduces cabling (a significant source of complexity, cost and failure). It also enables heterogeneous server mixing within a rack, supporting any device that uses Ethernet, SATA or PCIe. In this architecture, the power savings come primarily from two architectural aspects: (1) minimizing the Ethernet PHYs across the fabric by replacing them with point-to-point XAUI interconnects between nodes, and (2) the ability to dynamically adjust the XAUI link width and speed based on load.
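The second power-saving mechanism, dynamic adjustment of XAUI link width and speed, can be sketched as a policy that picks the narrowest, slowest link configuration that still covers the offered load. The lane widths, per-lane rates and headroom factor below are invented for the illustration and are not Calxeda's actual parameters.

```python
# Hypothetical sketch of a dynamic link-width/speed policy: choose the
# lowest-bandwidth (and so, roughly, lowest-power) configuration whose
# capacity still exceeds the offered load plus some headroom.
XAUI_LANES = (1, 2, 4)                  # assumed selectable lane counts
LANE_SPEEDS_GBPS = (1.25, 2.5, 3.125)   # assumed per-lane signaling rates

def select_link_config(offered_load_gbps, headroom=1.2):
    """Pick the lowest-capacity (lanes, speed) pair that carries the load."""
    candidates = sorted(
        ((lanes, speed) for lanes in XAUI_LANES for speed in LANE_SPEEDS_GBPS),
        key=lambda c: c[0] * c[1],      # lower aggregate bandwidth first
    )
    for lanes, speed in candidates:
        if lanes * speed >= offered_load_gbps * headroom:
            return lanes, speed
    return candidates[-1]               # saturate at the widest/fastest config

print(select_link_config(0.8))   # light load: narrow, slow link
print(select_link_config(9.0))   # heavy load: full-width, full-speed link
```

In the fabric described above, such a decision would be driven per link by the bandwidth counters in the fabric switch rather than by an explicit load argument.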
Figure 3 illustrates a network aggregation 200. The network supports 10-Gb/sec Ethernet communication 201 (thick lines) between an aggregation router 202 and three racks 203a-c. In rack 203a, the Calxeda interconnect fabric provides multiple high-speed 10-Gb paths (shown as thick lines) between the servers 206a-d on the shelves within the rack. The embedded switch in servers 206a-d can replace a top-of-rack switch, saving substantial power and cost while still providing a 10-Gb Ethernet port to the aggregation router. The Calxeda switch fabric can integrate traditional Ethernet (1 Gb or 10 Gb) into the Calxeda XAUI fabric, and Calxeda servers can act as a top-of-rack switch for third-party Ethernet-connected servers.
The middle rack 203b illustrates another case, in which Calxeda servers 206e, 206f can be integrated into an existing data center rack that contains a top-of-rack switch 208a. In this case, the IT group can continue to connect its other servers to the existing top-of-rack switch via 1-Gb Ethernet. The Calxeda internal servers are connected via the Calxeda 10-Gb XAUI fabric, and can be integrated up to the existing top-of-rack switch with a 1-Gb or 10-Gb Ethernet interconnect. The rack 203c on the right shows the current, conventional way of deploying a data center rack: thin red lines represent 1-Gb Ethernet running up to the top-of-rack switch 208b, and 10 Gb (thick red line 201) is output from the top-of-rack switch to the aggregation router. Note that all servers exist in unspecified numbers; they are illustrated here in limited numbers for clarity and conciseness. Also, with the enhanced Calxeda servers no additional routers are needed, because those servers operate their own XAUI switch fabric, as described below.
Figure 4 shows an overview of an exemplary "data center in a rack" 400 according to one embodiment. The data center in a rack 400 has 10-Gb Ethernet PHYs 401a-n and a 1-Gb private Ethernet PHY 402. Large computers (power servers) 403a-n support search; data mining; indexing; Hadoop, a Java software framework; MapReduce, a software framework introduced by Google for distributed computing on large data sets on clusters of computers; cloud applications; etc. Computers (servers) 404a-n with local flash and/or solid-state disks (SSDs) support search, MySQL, CDN, software-as-a-service (SaaS), cloud applications, etc. A single large, low-speed fan 405 augments the convection cooling of the servers mounted vertically above it. The data center in a rack 400 has an array 406 of hard disks, e.g., in a Just a Bunch of Disks (JBOD) configuration, and, optionally, Calxeda servers in disk form factor (the green boxes in arrays 406 and 407), which optionally act as disk controllers. Hard disk servers or Calxeda disk servers may be used for web servers, user applications, cloud applications, etc. Also shown are an array 407 of storage servers and historic (legacy) servers 408a, 408b (any size, any vendor) with standard Ethernet interfaces for legacy applications.
Figure 5 shows the high-level topology 500 of the network system described in co-pending patent application No. 12/794,996, illustrating XAUI-connected SoC nodes joined by the switch fabric. The 10-Gb Ethernet ports Eth0 501a and Eth1 501b come from the top of the tree. Ovals 502a-n are Calxeda nodes that comprise both computational processors and an embedded switch. Each node has five XAUI links connected to its internal switch. The switching layers use all five XAUI links for switching. Level 0 leaf nodes 502d, 502e (i.e., N0n nodes, or Nxy, where x = level and y = item number) use only one XAUI link to attach to the interconnect fabric, leaving four high-speed ports that can be used as XAUI, 10-Gb Ethernet, PCIe, SATA, etc., for attachment to I/O. The vast majority of trees and fat trees have active nodes only as leaf nodes, with the other nodes being pure switching nodes; this approach makes routing much more straightforward. Topology 500 has the flexibility to permit every node to be a combined computation and switch node, or simply a switch node. Most tree-type implementations have I/O on the leaf nodes, but topology 500 allows I/O on any node. In general, placing the Ethernet at the top of the tree minimizes the average number of hops to the Ethernet.
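As a rough illustration of why placing the Ethernet ports at the top of the tree keeps the average hop count small, the mean node-to-root distance of a complete tree can be computed directly. The fan-out and depth below are assumptions made for the sketch, not parameters of the patented fabric.

```python
# Back-of-the-envelope check: average hops from every node of a complete
# tree to its root, where the root carries the Ethernet escape. Most nodes
# are leaves, yet the average distance to the root stays small.

def avg_hops_to_root(fanout, depth):
    """Average hops from every node of a complete tree to its root."""
    total_nodes = total_hops = 0
    for level in range(depth + 1):      # level 0 is the root
        nodes = fanout ** level
        total_nodes += nodes
        total_hops += nodes * level     # each node is `level` hops from root
    return total_hops / total_nodes

# An assumed 3-level tree with fan-out 4: 85 nodes, 64 of them leaves.
print(avg_hops_to_root(4, 3))
```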
第6圖圖示伺服器板,該伺服器板組成多個伺服器節點,該多個伺服器節點與所述點對點互連結構互連。伺服器板具有:Figure 6 illustrates a server board that constitutes a plurality of server nodes that are interconnected with the point-to-point interconnect structure. The server board has:
●此圖表中之卵形中之每一者皆為獨立伺服器節點,該獨立伺服器節點包括處理器、記憶體、I/O及組織交換器。• Each of the ovals in this graph is a standalone server node that includes a processor, memory, I/O, and organization switch.
●組織交換器具有獨立地動態修改用於每一鏈路之每一路徑之寬度(路徑之數目)及速度的能力。• The organization switch has the ability to dynamically modify the width (number of paths) and speed of each path for each link independently.
●14節點板實例圖示出自組織的兩個乙太網路逸出口。將通常使此等乙太網路逸出口路由至標準乙太網路交換器或路由器。此等乙太網路逸出口可為標準1 Gb或10 Gb乙太網路。The 14-node board example illustrates two self-organizing Ethernet outlets. These Ethernet egress ports will typically be routed to a standard Ethernet switch or router. These Ethernet outlets can be standard 1 Gb or 10 Gb Ethernet.
●14節點實例拓撲為蝶形粗樹,該蝶形粗樹提供冗餘路徑以允許適應性路由至故障周圍之路線及定位熱點周圍之路線。The 14-node instance topology is a butterfly-shaped thick tree that provides redundant paths to allow adaptive routing to routes around faults and to locate routes around hotspots.
●3節點聚合器板允許大型伺服器組織僅由兩個板瓦片組成。The 3-node aggregator board allows large server organizations to consist of only two tile tiles.
■出於冗餘,添加第二聚合器■ Adding a second aggregator for redundancy
■輸入/輸出:■Input / Output:
●用於滑石(smooth-stone)組織之PCIe連接器● PCIe connector for smooth-stone organization
●任擇乙太網路支援(關閉、1 Gb、2 Gb、5 Gb、10 Gb或20 Gb)●Optional Ethernet support (off, 1 Gb, 2 Gb, 5 Gb, 10 Gb or 20 Gb)
■基於應用程式所需頻寬之乙太網路決策■Ethernet decision based on the bandwidth required by the application
●聚合器板上之節點可僅為交換節點,或為包括交換之全計算節點。The nodes on the aggregator board can be only switching nodes, or full computing nodes including switches.
●板輸入/輸出可為PCIe連接器及/或任擇乙太網路支援(關閉、1 Gb、2 Gb、10 Gb或20 Gb),該PCIe連接器支援兩個x4 XAUI(2個滑石組織鏈路)。● Board input/output can be PCIe connector and / or optional Ethernet support (off, 1 Gb, 2 Gb, 10 Gb or 20 Gb), the PCIe connector supports two x4 XAUI (2 talc organizations) link).
●類似14節點實例之示例性組織拓撲最小化橫跨板之鏈路之數目,以最小化連接器(大小及數目)及關聯成本,同時仍保持乙太網路逸出口及多路徑冗餘。An exemplary organizational topology like a 14-node instance minimizes the number of links across the board to minimize connector (size and number) and associated costs while still maintaining Ethernet egress and multipath redundancy.
●當延伸組織時兩個聚合器板可用來達成路徑冗餘。• Two aggregator boards can be used to achieve path redundancy when extending the tissue.
● Power savings can be achieved using static link configuration:
○ The bottom-layer nodes in the figure (denoted leaf nodes) can run at 1 Gb/sec.
○ The first layer of switching nodes in the figure (denoted layer 1 switches) then has 3 Gb/sec of aggregate input bandwidth from the leaf nodes. This allows a static link configuration of 2.5 Gb/sec or 5 Gb/sec between the layer 1 and layer 2 switches.
○ The links leaving the layer 2 switching layer can then run at 10 Gb/sec.
○ In this topology, since most nodes are leaf nodes, most links run at the slowest rate (1 Gb/sec in this example), minimizing networking power consumption.
○ Ethernet escapes can be pulled out at any node in the fabric, allowing the fabric designer to trade off the required Ethernet egress bandwidth, the number of ports consumed on the top-of-rack switch, and the cost and power associated with those Ethernet ports.
● Power savings can be further optimized via utilization-driven dynamic link configuration. In this example, each link, and the associated port of the fabric switch, contains bandwidth counters with configurable threshold events that allow the uplink width and speed, and the downlink width and speed, to be reconfigured based on dynamic link utilization.
● Since in many common server use cases Ethernet traffic is predominantly node-to-external-Ethernet rather than node-to-node, the proposed tree fabric structure (and in particular the butterfly fat tree example) minimizes the number of hops across the fabric to Ethernet, thereby minimizing latency. This allows large, low-latency fabrics to be built out to Ethernet while using switches with a relatively small number of switch ports (5 in this example).
● The integration of server 209a in Figure 2 illustrates another novel system use of the defined server fabric. In this case, to leverage the performance and power management of the server fabric, and to minimize port consumption on the top-of-rack switch, the figure illustrates the heterogeneous integration of an existing server onto the defined server fabric, such that Ethernet traffic from the existing server can enter the fabric through a gateway, communicate with nodes within the fabric, and have 209a's Ethernet traffic carried across the fabric to the uplink Ethernet port 201.
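The static link sizing rule in the list above (leaves at 1 Gb/sec, each uplink provisioned from the aggregate bandwidth of the node's children) can be sketched as a small calculation. This is an illustrative sketch only, not the patent's implementation; the supported rate menu and the function names are assumptions.

```python
# Hypothetical sketch of static link sizing: each uplink is provisioned to
# the smallest supported rate that covers the aggregate bandwidth of the
# node's children, mirroring the 1 -> 2.5/5 -> 10 Gb/sec example above.
SUPPORTED_RATES_GBPS = [1.0, 2.5, 5.0, 10.0]  # assumed rate menu

def uplink_rate(child_rates_gbps):
    """Pick the smallest supported rate >= the children's aggregate bandwidth."""
    demand = sum(child_rates_gbps)
    for rate in SUPPORTED_RATES_GBPS:
        if rate >= demand:
            return rate
    return SUPPORTED_RATES_GBPS[-1]  # saturate at the fastest link

# Three 1 Gb/sec leaf links feed a layer 1 switch: 3 Gb/sec of demand, so the
# uplink is statically configured at 5 Gb/sec (2.5 Gb/sec would oversubscribe).
layer1_up = uplink_rate([1.0, 1.0, 1.0])
# Two layer 1 uplinks feed a layer 2 switch, giving the 10 Gb/sec top links.
layer2_up = uplink_rate([layer1_up, layer1_up])
print(layer1_up, layer2_up)
```

Note that the patent also allows a 2.5 Gb/sec layer 1 uplink, i.e., deliberate oversubscription; the sketch above takes the conservative choice.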
Figures 6a-6c illustrate another example fabric topology: a forty-eight node fabric composed of 12 cards plugged into a system board, where each card contains 4 nodes. This topology provides some redundant links, but no significant redundancy. The topology has four Ethernet gateway escapes, each of which can be 1 Gb or 10 Gb, although not all of these Ethernet gateways need to be used or connected. In the example shown, eight fabric links are brought off each four-node card, and in one example a PCIe x16 connector is used to bring 4 fabric links off the card.
1. The server tree fabric allows an arbitrary number of Ethernet escapes across the server interconnect fabric, minimizing the number of Ethernet PHYs used, thereby saving the power and cost associated with the Ethernet PHYs, the associated cables, and the ports consumed on the top-of-rack Ethernet switch/router.
2. Switching nodes can be pure switching nodes that save power by powering off the compute subsystem, or they can serve as full compute subsystems that include fabric switching. Referring to Figure 17, in one implementation multiple power domains are used to separate the compute subsystem (block 905) from the management processor (block 906) and the fabric switch (the remaining blocks). This allows the SOC to be configured to power off the compute subsystem (block 905) while keeping the management processing in block 906 alive, with hardware packet switching and routing performed by the fabric switch.
3. The butterfly fat tree topology server fabric provides the minimum number of links within a board (saving power and cost) and the minimum number of links across boards (saving power and cost), while still allowing redundant link paths both within and across boards.
4. The proposed backplane and aggregator boards allow a scalable, failure-resilient server fabric to be built from only two board building blocks.
5. Tree-oriented server fabrics, and variants like the exemplary butterfly fat tree, allow static link width and speed specifications that can be determined from the aggregate bandwidth of a node's children, allowing easy link configuration while minimizing interconnect power.
6. Power savings can be further optimized via utilization-driven dynamic link configuration. In this example, each link, and the associated port of the fabric switch, contains bandwidth counters with configurable threshold events that allow the uplink width and speed, and the downlink width and speed, to be reconfigured based on dynamic link utilization.
7. Since in many common server use cases Ethernet traffic is predominantly node-to-external-Ethernet rather than node-to-node, the proposed tree fabric structure (and in particular the butterfly fat tree example) minimizes the number of hops across the fabric to Ethernet, thereby minimizing latency. This allows large, low-latency fabrics to be built out to Ethernet while using switches with a relatively small number of switch ports (5 in this example).
8. Heterogeneous servers can be integrated into the fabric, carrying Ethernet traffic from existing servers into and through the defined server communication fabric.
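The bandwidth-counter mechanism of item 6 can be sketched in software as follows. This is a hedged illustration only: the width/speed steps, the thresholds, and all names are invented for the example, and a real fabric switch would implement this logic in hardware per port.

```python
# Hypothetical sketch of utilization-driven dynamic link configuration:
# each port keeps a bandwidth counter, and crossing a configurable
# threshold event raises or lowers the link's width and speed.
class FabricPort:
    # (lanes, Gb/sec per lane) steps, lowest power first -- assumed menu
    STEPS = [(1, 1.0), (2, 2.5), (4, 2.5)]

    def __init__(self, up_threshold=0.8, down_threshold=0.3):
        self.step = 0                      # start in the lowest-power state
        self.up_threshold = up_threshold   # utilization that triggers upsizing
        self.down_threshold = down_threshold
        self.bytes_seen = 0                # the port's bandwidth counter

    def capacity_gbps(self):
        lanes, per_lane = self.STEPS[self.step]
        return lanes * per_lane

    def on_interval(self, bytes_in_interval, interval_s):
        """Called once per sampling interval; reconfigure on threshold events."""
        self.bytes_seen += bytes_in_interval
        gbps = bytes_in_interval * 8 / 1e9 / interval_s
        utilization = gbps / self.capacity_gbps()
        if utilization > self.up_threshold and self.step < len(self.STEPS) - 1:
            self.step += 1                 # widen/speed up the link
        elif utilization < self.down_threshold and self.step > 0:
            self.step -= 1                 # narrow/slow down to save power
        return self.capacity_gbps()
```

For example, ~0.9 Gb/sec of traffic on a 1 Gb/sec link crosses the 80% threshold and widens the link; a near-idle interval afterwards drops it back to the 1 Gb/sec leaf rate.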
These board "tiles" can now be composed to construct fabric-connected shelves and racks of server nodes. Figure 7 illustrates an example of how a passive backplane can connect eight 14-node boards and two aggregation boards to form a shelf of 236 server nodes. Each board can be, for example, 8.7" tall (mechanical height < 10.75" for 6U), with interleaved heat sinks for density, and 16 boards fit a 19" wide rack. The backplane can be simple and inexpensive, with just PCIe connectors and routing, where the routing can be very simple XAUI signals (blue and green) plus power, with no cabling. The Ethernet connections are shown at the 8-board aggregation points.
Figure 8 illustrates an example of extending the fabric across shelves, linking shelves across a server rack. Ethernet escapes can be pulled out at any node in the fabric; in this example, they are pulled from the passive interconnect backplane that connects the multi-node blades.
1. PCIe connectors are used to bring the Ethernet escapes and XAUI links off the board to tie boards together into a point-to-point server fabric; PCIe signaling is not used, but the physical connector carries board power and the XAUI signals, while maintaining redundant communication paths for failover and hot-spot reduction.
2. A XAUI point-to-point server interconnect fabric formed with a completely passive backplane.
3. Ethernet escapes across the fabric, spanning the rack at every level of the tree rather than only at the top of the tree.
4. Ethernet escapes across the fabric can be dynamically enabled and disabled to match bandwidth needs and to optimize power use.
5. Node-to-node traffic, including system management traffic, stays on the fabric spanning the rack and never passes through the top-of-rack Ethernet switch.
Figure 9a illustrates an exemplary server 700 with a disk form factor according to one embodiment, typically that of a standard 2.5" or 3.5" hard disk drive (HDD) with a SCSI or SATA interface. The server board 701 fits the same infrastructure as disk drive 702 in a current disk rack. Server 701 is a full server with DDR, a server-on-a-chip SoC, optional flash memory, local power management, and SATA connections to disks (1-16, limited by connector size). The output of server 701 can be Ethernet or the Calxeda fabric (XAUI), with two XAUI outputs for failover. Optionally, server 701 can use PCIe instead of SATA (for SSDs or other devices that require PCIe), with 1 to 4 nodes to balance compute and storage requirements. This server can host RAID implementations and LAMP-stack server applications. Using a Calxeda ServerNode™ on each disk provides a full LAMP-stack server with 4 GB of DDR3 and multiple SATA interfaces. Optionally, a second node can be added for 8 GB of DDR when needed.
Figures 9b and 9c illustrate exemplary arrays 710 and 720 of disk-server combinations 700a-n, respectively, according to one embodiment, using the 1-node SATA storage-server board discussed above. Connecting them with a high-speed network or interconnect (standard or proprietary) eliminates the need for a large Ethernet switch, saving power, cost, heat, and area. Each board 701 is smaller than the height and depth of a disk. The array can be arranged with alternating disks and boards, as shown in Figure 9b, or one board can serve multiple disks, for example in a disk-disk-board-disk-disk arrangement as shown in Figure 9c. The ratio of compute power to disks can thus be matched flexibly. The connectivity of boards 701a-n can be per node, with one SATA link hooking up one disk and multiple SATA links hooking up multiple disks. The connectivity of boards 701a-n can also be node-to-node, with the two XAUI links of the fabric configuration in each node (as described previously and in application No. 61/256,723) used for redundancy. Nodes are connected via the XAUI fabric. These connections can have a tree or fat tree topology, i.e., node-node-node-node, where deterministic, oblivious, or adaptive routing moves data in the correct direction. Alternatively, a fully proprietary interconnect can be used, reaching other processing units. Some ports can go to Ethernet outputs or any other I/O pipeline. Each node can go directly to Ethernet (within the "box"), or XAUI to a XAUI aggregator (switch) and then to a PHY, or XAUI directly to a PHY; any combination of the above can be used. In other cases, SSDs with PCIe connections can be used, replacing the SATA connections with PCIe. Some SSDs use PCIe or SATA within the disk form factor, and PCIe and SATA can be mixed. Ethernet out of the box can be used for the system interconnect instead of XAUI. In some cases standard SATA connectors can be used, while in others a higher-density connector with dedicated wiring through a dedicated backplane can be fabricated.
In another case, the server functionality can be placed inside the disk drive, providing a full server plus disk in a single disk-drive form factor. For example, a ServerNode™ can be placed on a board inside the disk. This approach can be implemented with XAUI or Ethernet connectivity. In this case, the server-on-a-chip approach known to the inventors can be used as the disk controller plus server. Figure 9d illustrates this concept. A standard 3.5" drive (item 9d0) is shown in Figure 9d. The 3.5" drive has an integrated circuit card 9d1 that controls the disk drive. A large amount of space inside the drive (marked 9d2) is unused, and a Calxeda low-power, small server node PCB can be formed to fit within this unused space inside the disk drive.
Figure 9e illustrates an implementation that places multiple server nodes in a standard 3.5" disk drive form factor. In this case, the connector from the server PCB to the backplane brings out the XAUI-based server fabric interconnect, providing the networking and server-to-server communication fabric, as well as 4 SATA ports for connecting to adjacent SATA drives.
Figure 10 illustrates an implementation for deep integration of servers and storage. The server node (101) represents a complete low-power server that integrates compute cores, DRAM, integrated I/O, and a fabric switch. In this example, server node 101 is shown with the same form factor as a standard 2.5" disk drive (102). (103) illustrates combining these server nodes and disk drives in a one-to-one pairing, where each server node has its own local storage. (104) illustrates a server node controlling 4 disk drives. The system (105) illustrates combining these storage servers via the unified server fabric, and then, in this example, pulling four 10-Gb/sec Ethernet escapes out of the fabric to connect to an Ethernet switch or router.
Figure 11 illustrates a concrete realization of this dense packing of storage and servers by showing efficient reuse of an existing 3.5" JBOD (just a bunch of disks) storage box. In this case, the JBOD mechanicals, including the disk enclosure, are unchanged, but storage nodes are shown paired one-to-one with the disk drives inside the unmodified JBOD box. This illustrates the concept of server nodes as pluggable modules that plug into an underlying motherboard containing the fabric links. In this illustration, the standard JBOD box holds 23 3.5" disks (shown as rectangles in the logical view), and the figure shows 31 server nodes inside the JBOD box controlling the 23 disks (shown as ovals/circles in the logical view), exposing two 10 Gb/sec Ethernet links (shown as dark wide lines in the logical view). This tightly integrated server/storage concept takes an off-the-shelf JBOD storage box and adds 31 server nodes, communicating via the power-optimized fabric, in the same form factor. This maps extremely well to applications that prefer local storage.
Figure 12 illustrates a related concept that exploits the fact that a server node can be instantiated in the same form factor as a 2.5" drive. In this case, server nodes are integrated into a 2.5" JBOD holding 46 disks. This concept shows 64 server nodes integrated into the same form factor as the JBOD storage. In this example, two 10 Gb Ethernet links and a 1 Gb/sec management Ethernet link are pulled out of the fabric.
1. PCIe connectors are used to bring the Ethernet escapes and XAUI links off the board to tie boards together into a point-to-point server fabric; PCIe signaling is not used, but the physical connector carries board power and the XAUI signals, while maintaining redundant communication paths for failure recovery and load balancing.
2. Transforming existing JBOD storage systems with the defined server fabric by pairing small-form-factor, low-power, fabric-enabled server nodes with disks, providing extremely high-density compute servers tightly paired with local storage and integrated via the power- and performance-optimized server fabric, creating new high-performance compute server and storage server solutions without affecting the physical and mechanical design of the JBOD storage system.
3. A method of packaging a complete server in the form factor of a hard disk drive, for use in high-density computing systems, for the purpose of replacing some drives with additional servers.
4. The method of claim 3, wherein the servers are connected to the network via an additional switching fabric.
5. The method of claim 3, wherein the backplane in the enclosure of the fixed drives is replaced with a backplane adapted to establish at least one internal switching path.
6. A method of integrating a low-power server PCB into the empty space inside a standard 3.5" disk drive, for use in high-density storage systems, providing integrated compute capability within the disk drive.
One aspect of driving toward a low-power computer server solution is the management of heat, cooling, and the movement of air through the rack and across the boards. Minimizing fans is one aspect of reducing the total cost of ownership (TCO) of low-power servers. Fans add cost and complexity, reduce reliability because of their moving parts, consume substantial power, and generate considerable noise. Reducing and removing fans can provide significant benefits in reliability, TCO, and power consumption.
Figure 13 illustrates a novel implementation of rack chimney cooling that supports chimney cooling through the entire rack or through only a section of the rack. An important aspect is the single fan in the chimney rack concept, which uses natural upward convection with the help of one fan. A large fan cooling the entire rack can run at low speed. The large fan can be positioned at the bottom, or within the rack below a vertically mounted, convection-cooled subset of the rack. As cold air enters at the bottom, the fan pushes it up through the chimney and out the top. Because all boards are vertical, there is no horizontal obstruction. Although the fan is shown at the bottom of the rack in this example, it can be anywhere in the system; that is, the system can have a horizontally obstructed, "conventionally" cooled section below the vents and fan, with a vertical chimney exiting the top. This vertical bottom-up cooling approach can work in small systems. The fan can be variable speed, varying with temperature.
Figure 13a provides an exemplary illustration of the novel heat convection principle 500 used in the chimney rack concept. Placing components aligned at an angle lets the heat plumes 501a-n rise past the hot double data rate (DDR) memory chips 503a-n on the printed circuit board 502, so that the hot chips do not sit in one another's thermal wake or heat one another. In this example, the DDR chips are placed diagonally with respect to one another rather than stacked vertically, because DDR chips readily heat each other. Further, the DDR chips are placed above, rather than below, the large compute chip 504a (such as an ASIC, SOC, or processor), because the DDR chips would otherwise heat the SOC. The coolest chip (flash chip 506) is placed below the SOC. Likewise, nodes are not stacked vertically, as described below. Figure 14 extends this concept to illustrate how server nodes are placed diagonally with respect to one another to minimize self-heating across server nodes.
Figure 15 illustrates an exemplary 16-node system according to one embodiment, with heat plumes rising from the printed circuit boards. For a typical 16-node system, the individual units are arranged so that the heat rising from each unit does not heat the unit above it. The overall enclosure will typically be longer, less tall, and less dense. Alternatively, rather than mounting the PCBs diagonally as shown, the PCBs can be square-aligned and rectangular, with the components placed in a diagonal alignment to minimize mutual heating. PCBs in different rows can have complementary layouts or can be staggered, again reducing mutual heating. Similarly, Figure 16 illustrates a higher-density variant of the 16-node system in which the nodes are arranged in a similar manner to minimize self-heating across nodes.
An additional cooling concept for racks of low-power servers is to use an aerostatic pressure differential to create an upward airflow without any fans. The technique for doing this is to build a sealed rack with an extended vertical exhaust stack for the air. This stack must be tall enough (roughly 20-30 feet or more) to establish a sufficient pressure differential to produce the upward airflow. This provides a completely passive air-movement and cooling system for racks of low-power servers.
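The pressure differential behind this fan-free design is the standard stack (chimney) effect, which can be estimated with the draft equation. The sketch below is illustrative only; the stack height and temperatures are assumed values, not figures from the patent.

```python
# Rough stack-effect (chimney draft) estimate for the fan-free rack concept:
# dP = g * h * (rho_outside - rho_inside), with air density approximated by
# the ideal gas law at constant pressure: rho ~ rho0 * T0 / T.
G = 9.81          # gravitational acceleration, m/s^2
RHO0 = 1.293      # air density at 273.15 K, kg/m^3
T0 = 273.15       # reference temperature, K

def air_density(temp_c):
    """Air density (kg/m^3) at a given Celsius temperature."""
    return RHO0 * T0 / (T0 + temp_c)

def draft_pa(stack_height_m, inside_c, outside_c):
    """Pressure differential (Pa) driving upward airflow through the stack."""
    return G * stack_height_m * (air_density(outside_c) - air_density(inside_c))

# Assumed example: a 25 ft (~7.62 m) stack, 40 C exhaust air, 20 C ambient.
print(round(draft_pa(7.62, 40.0, 20.0), 2), "Pa")
```

The result is only a few pascals, which is why the patent calls for a tall (20-30 ft or more) sealed stack: the draft grows linearly with stack height and with the indoor/outdoor temperature difference.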
1. A method of placing heat-generating components on a vertically oriented mounting board, for use in high-density computing systems, wherein no heat-generating component is placed directly above or below another heat-generating component.
2. The method of claim 1, wherein the components are arranged in a substantially diagonal arrangement across the mounting board.
3. The method of claim 1, wherein the components are arranged in several substantially crossing diagonal arrangements across the mounting board.
4. The method of claims 1, 2, and 3, wherein the mounting board is a printed circuit board.
As described in co-pending patent application No. 12/794,996, Figure 17 illustrates the internal architecture of the server node fabric switch. Figure 17 shows a block diagram of an exemplary switch 900 according to one aspect of the systems and methods disclosed herein. The switch 900 has four areas of interest, 910a-d. Area 910a corresponds to Ethernet packets between the CPUs and the internal MACs. Area 910b corresponds to Ethernet frames at the Ethernet physical interface of the internal MACs; it contains the preamble, start of frame, and inter-frame gap fields. Area 910c corresponds to Ethernet frames at the Ethernet physical interface of the external MAC; it likewise contains the preamble, start of frame, and inter-frame gap fields. Area 910d corresponds to Ethernet packets between the routing header 901 processor and the external MAC 904. This segmented MAC architecture is asymmetric: the internal MACs present an Ethernet physical signaling interface into the routing header processor, while the external MAC presents an Ethernet packet interface into the routing header processor. Thus the MAC IP is repurposed for both the internal and external MACs, effectively reusing what would normally be the MAC's physical signaling as the mechanism for feeding into the switch. The MAC configuration is such that the operating system device drivers of the A9 cores 905 manage and control the internal Eth0 MAC 902 and the internal Eth1 MAC 903, while the device driver of the management processor 906 manages and controls the internal Eth2 MAC 907. The external Eth MAC 904 is not controlled by a device driver; MAC 904 is configured in promiscuous mode to pass all frames without any filtering, for network monitoring. Its initialization is coordinated between the hardware instantiation of the MAC and any other necessary management processor initialization. The external Eth MAC 904 registers are visible in both the A9 905 and management processor 906 address maps, and the interrupt of the external Eth MAC 904 can be routed to either the A9 or the management processor.
It is important to note that when a packet is received from a MAC heading into the switch, the routing header processor 910d adds a fabric routing header to the packet, and when a packet is received from the switch heading to a MAC, the routing header processor 910d removes the fabric routing header. The fabric switch itself routes only on the node ID and the other information contained in the fabric routing header; it does not perform packet inspection of the original packet.
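The add/strip behavior of the routing header processor can be sketched as follows. This is an illustrative sketch only: the patent specifies that routing uses the node ID and header information rather than the payload, but the header field layout below is invented for the example.

```python
# Illustrative sketch of the routing header processor (910d): a fabric
# routing header is prepended on the way into the switch and stripped on
# the way out; the switch routes on the header, never on the frame.
import struct

# Assumed header layout: dest node ID, source node ID, flags (5 bytes).
HDR = struct.Struct("!HHB")

def add_routing_header(frame: bytes, dest_node: int, src_node: int) -> bytes:
    """MAC -> switch direction: prepend the fabric routing header."""
    return HDR.pack(dest_node, src_node, 0) + frame

def strip_routing_header(tagged: bytes) -> tuple:
    """Switch -> MAC direction: remove the header, return (dest, frame)."""
    dest, _src, _flags = HDR.unpack_from(tagged)
    return dest, tagged[HDR.size:]

def switch_route(tagged: bytes, port_by_node: dict) -> int:
    """The switch inspects only the header's node ID to pick an egress port;
    the encapsulated Ethernet frame stays opaque."""
    dest = HDR.unpack_from(tagged)[0]
    return port_by_node[dest]
```

For example, a frame entering from an internal MAC bound for node 7 is tagged, forwarded hop by hop on the node ID alone, and untagged at the egress MAC.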
第18圖圖示伺服器節點,該伺服器節點包括PCIe控制器,該PCIe控制器連接至內部CPU匯流排組織。此允許建立新穎PCIe交換組織,該新穎PCIe交換組織有效利用高效能、功率最佳化伺服器組織,以建立可擴充、高效能、功率最佳化PCIe組織。Figure 18 illustrates a server node that includes a PCIe controller that is connected to an internal CPU bus organization. This allows for the creation of a novel PCIe switching organization that effectively utilizes high-performance, power-optimized server organization to build a scalable, high-performance, power-optimized PCIe organization.
技術如下:The technology is as follows:
● The PCIe controller 902 connects to a multiplexer (Mux) 902a that allows the PCIe controller to connect either directly to the external PCIe physical layer or to the PCIe routing header processor 910c. When the multiplexer 902a is configured to direct PCIe traffic to the local PCIe physical layer, this is equivalent to a standard local PCIe connection. When the multiplexer 902a is configured to direct PCIe traffic to the PCIe routing header processor 910c, this enables the novel PCIe distributed fabric switch mechanism.
● The PCIe routing header processor 910c uses the routing information embedded within the packet (address, ID, or implicit) to construct a fabric routing header that maps the PCIe packet's route to the destination fabric node's PCIe controller.
● This provides a distributed PCIe fabric with advantages similar to those the server fabric provides for networking.
● PCIe transactions originating from the processor cores (905) can be routed to the local PCIe physical layer (via the multiplexer bypass or through the switch), to any other node on the fabric, directly to the internal PCIe controller (902), or to the external PCIe controller/physical layer (904).
● Likewise, an incoming PCIe transaction enters the external PCIe controller (904), is tagged with a fabric routing header by the PCIe routing header processor (910), and the fabric then transports the PCIe packet to its final destination.
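The routing choices in the bullets above can be condensed into a simple decision sketch. The boolean mux flag and the node-ID comparison are illustrative assumptions modeling what multiplexer 902a and routing header processor 910c decide in hardware:

```python
LOCAL_PHY = "local-pcie-phy"
INTERNAL = "internal-pcie-controller"
FABRIC = "fabric"

def route_pcie(mux_to_fabric: bool, dest_node_id: int, local_node_id: int) -> str:
    """Pick a destination for a PCIe transaction, per the options above.
    All names are illustrative; real hardware makes this choice inside
    the multiplexer 902a and routing header processor 910c."""
    if not mux_to_fabric:
        return LOCAL_PHY   # standard local PCIe connection (mux bypass)
    if dest_node_id == local_node_id:
        return INTERNAL    # terminates at this node's PCIe controller 902
    return FABRIC          # tagged with a routing header and switched
```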
Figure 18a illustrates a further extension showing that multiple protocol bridges can leverage the fabric switch to route on the routing header rather than directly on the underlying packet payload (e.g., a Layer 2 Ethernet frame). In this illustration, three protocol bridges are shown: Ethernet, PCIe, and a bus protocol bridge.
The role of the bus protocol bridge is to take a processor or internal SOC fabric protocol, packetize it, add the Calxeda fabric routing header, and then route it across the Calxeda fabric.
As a tangible example, consider a bus protocol within the SOC such as AMBA AXI, HyperTransport, or QuickPath Interconnect (QPI).
Consider the following data flow:
● A processor on the internal SOC bus fabric issues a memory load (or store) request.
● The physical address target for the memory operation has been mapped to a remote node on the fabric.
● The bus transaction passes through the bus protocol bridge:
○ The bus transaction is packetized.
○ The physical address for the memory transaction is mapped to a remote node, and that node ID is used when constructing the routing header.
○ The bus protocol bridge constructs a routing frame consisting of a routing header carrying the remote node ID, with the packetized bus transaction as the payload.
● The bus transaction routing frame passes through the fabric switch, traverses the fabric, and is received by the frame switch of the target node.
● The target node's bus protocol bridge unpacks the packetized bus transaction, issues it into the target SOC fabric, completes the memory load, and returns the result via the same steps, with the result flowing back to the originating node.
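The bridge's address-to-node mapping and encapsulation steps above can be sketched as follows. The address map, range values, and frame layout are invented for illustration and are not the actual bridge implementation:

```python
import struct

# Illustrative physical-address map the bus protocol bridge might consult:
# (base, limit, node_id) ranges.  Values are invented for this sketch.
ADDRESS_MAP = [
    (0x0000_0000, 0x3FFF_FFFF, 0),   # local node
    (0x4000_0000, 0x7FFF_FFFF, 5),   # memory homed on remote node 5
]

def node_for_address(addr: int) -> int:
    """Map the physical address of a memory transaction to a node ID."""
    for base, limit, node_id in ADDRESS_MAP:
        if base <= addr <= limit:
            return node_id
    raise ValueError(f"unmapped physical address {addr:#x}")

def build_routing_frame(addr: int, is_store: bool, data: bytes = b"") -> bytes:
    """Packetize the bus transaction and prepend a routing header that
    carries the remote node ID (hypothetical 16-bit header layout)."""
    node_id = node_for_address(addr)
    header = struct.pack("!H", node_id)
    payload = struct.pack("!BQ", 1 if is_store else 0, addr) + data
    return header + payload
```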
Figure 19 illustrates the integration of the server fabric with a network processor (911). There are several use cases for server fabric and network processor integration, including:
● The network processor can act as a network packet processing accelerator for the local processor (905) and any other processor on the fabric.
● A network-processor-centric design is possible, in which incoming packets from the external Ethernet are targeted to the network processor, with control plane processing offloaded from the network processor to the larger processor cores (905).
● The server fabric can act as the communication fabric between network processors.
To enable these novel use cases, the network processor is assigned a MAC address. In the switch architecture shown in Figure 19, there are no routing header processors attached to ports 1-4. Therefore, agents connected directly to ports 1-4 need to inject packets that carry a fabric switch header prepended to the payload packet. Network processors add fabric switch integration to their designs by:
● Tagging output packets from the network processor with a fabric switch header that encodes the destination node ID derived from the destination MAC.
● Removing the fabric switch header from input packets arriving from the fabric switch before Ethernet packet processing.
Figure 19 also illustrates the integration of the server fabric with an arbitrary foreign device (912). By foreign device we mean any processor, DSP, GPU, I/O, or other communication or processing device that requires an inter-device communication fabric. A typical use case would be a large processing system composed of DSP or GPU processors that require an interconnect fabric among them.
The fabric switch routes packets based on the fabric routing header and does not perform packet inspection of the packet payload. The packet payload carries no assumption of being formatted as an Ethernet frame and is treated as a completely opaque payload.
This allows a foreign device (e.g., a DSP or GPU processor) to attach to the fabric switch and leverage the scalable, high-performance, power-optimized communication fabric by:
● Adding a routing frame header containing the packet's destination node ID to any packet payload sent to the frame switch.
● Stripping the routing frame header when receiving packets from the frame switch.
When considering a fabric topology such as that shown in Figure 5, each of the nodes in the fabric exposes at least one MAC address and IP address to provide external Ethernet connectivity via the gateway nodes shown at 501a and 501b.
Exposing these fine-grained MAC and IP addresses is advantageous for large-scale network operations that use hardware load balancers, because it presents the load balancer with a flat list of MAC/IP addresses to operate against, with the internal structure of the fabric invisible to the load balancer.
However, smaller data centers can potentially be overwhelmed by the large number of new MAC/IP addresses that high-density, low-power servers can present. It is advantageous to be able to provide options for load balancing that spare the external data center infrastructure from having to individually handle the large number of IP addresses for tiers such as web serving.
Consider Figure 20, in which we have taken a port on the fabric switch and added an FPGA that provides a service such as IP Virtual Server (IPVS). This IP virtualization can be done at network layers including Layer 4 (transport) and Layer 7 (application). In many cases it is advantageous to load balance at Layer 7 for data center tiers such as web serving, so that HTTP session state can be maintained locally by a specific web server node. The IPVS FPGA is attached only to the gateway nodes (nodes 501a and 501b in Figure 5).
In this example, the fabric shown in Figure 5, when augmented with IPVS FPGAs on the gateway nodes, can expose a single IP address per gateway node. The IPVS FPGA then load balances incoming requests (e.g., HTTP requests) across the nodes within the fabric. For Layer 4 load balancing, the IPVS FPGA can operate statelessly, using algorithms that include round-robin across the nodes or sending up to a maximum number of requests to each node before moving to the next. For Layer 7 load balancing, the IPVS FPGA needs to maintain state so that application sessions can be targeted to a specific node.
The resulting flow becomes:
● An incoming request (e.g., an HTTP request) enters the gateway node (port 0) in Figure 20.
● The fabric switch routing table has been configured to direct incoming traffic from port 0 to the IPVS FPGA port on the fabric switch.
● The IPVS FPGA rewrites the routing header to target a specific node within the fabric and forwards the resulting packet to the target node.
● The target node processes the request and sends the result out the gateway node normally.
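The stateless Layer 4 mode and the stateful Layer 7 mode described above can be contrasted in a short sketch. The class names and the session key are illustrative assumptions, not the IPVS FPGA's actual implementation:

```python
import itertools

class L4RoundRobin:
    """Stateless Layer 4 balancing: simply cycle through fabric nodes."""
    def __init__(self, node_ids):
        self._cycle = itertools.cycle(node_ids)

    def pick(self, _request) -> int:
        return next(self._cycle)

class L7Sticky:
    """Layer 7 balancing: remember which node owns each HTTP session so
    session state can stay local to one web server node."""
    def __init__(self, node_ids):
        self._fallback = L4RoundRobin(node_ids)
        self._sessions = {}  # session id -> node id

    def pick(self, session_id) -> int:
        if session_id not in self._sessions:
            self._sessions[session_id] = self._fallback.pick(None)
        return self._sessions[session_id]
```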
OpenFlow is a communications protocol that provides access over the network to the forwarding plane of a switch or router. OpenFlow allows the path of network packets through the network of switches to be determined by software running on a separate server. This separation of control from forwarding allows more sophisticated traffic management than is feasible today using ACLs and routing protocols. OpenFlow is considered an implementation of the general approach of software-defined networking.
Figure 21 illustrates a method of building OpenFlow (or, more generally, software-defined networking (SDN)) flow processing into the Calxeda fabric. Each of the gateway nodes instantiates an OpenFlow-enabled FPGA on a port of its fabric switch. The OpenFlow FPGA needs an out-of-band path to the control plane processor; this can be provided by a separate network port on the OpenFlow FPGA, or simply by requiring another port off the fabric switch to talk to the control plane processor.
The resulting flow becomes:
● An incoming request enters the gateway node (port 0) in Figure 20.
● The fabric switch routing table has been configured to direct incoming traffic from port 0 to the OpenFlow/SDN FPGA port on the fabric switch.
● The OpenFlow/SDN FPGA implements standard OpenFlow processing, including optionally contacting the control plane processor when necessary. It rewrites the routing header to target a specific node within the fabric (by MAC address) and forwards the resulting packet to the target node.
● The target node processes the request and sends the result back to the OpenFlow FPGA, where any egress flow processing is performed.
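A minimal sketch of the match-and-rewrite step above, assuming an exact-match flow table keyed on destination MAC; real OpenFlow matching is far richer, and the table contents here are invented. A miss models the "optionally contact the control plane processor" case:

```python
# Illustrative exact-match flow table: destination MAC -> fabric node ID.
FLOW_TABLE = {
    "02:00:00:00:00:0a": 10,
    "02:00:00:00:00:0b": 11,
}

def process_flow(dest_mac: str, routing_header: dict) -> dict:
    """Rewrite the routing header to target a fabric node, or punt the
    packet toward the control plane processor on a table miss."""
    node = FLOW_TABLE.get(dest_mac)
    if node is None:
        # Would be sent over the out-of-band path to the control plane.
        return {**routing_header, "action": "punt-to-controller"}
    return {**routing_header, "dest_node_id": node, "action": "forward"}
```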
The power-optimized server fabric illustrated in Figure 5 and described previously provides compelling advantages to existing standard processors, and it can be integrated with existing processors as an integrated chip solution. Standard desktop and server processors typically support PCIe interfaces either directly or via an integrated chipset. Figure 22 illustrates an example of the power-optimized fabric switch integrated with an existing processor via PCIe. Item 22a shows a standard processor that supports one or more PCIe interfaces, either directly or via an integrated chipset. Item 22b shows the disclosed fabric switch with integrated Ethernet MAC controllers, into which a PCIe interface has been integrated. Item 22b would typically be realized as an FPGA or ASIC implementation of the PCIe-integrated fabric switch.
In this disclosure, the nodes shown in Figure 5 can be a heterogeneous combination of power-optimized server SOCs with integrated fabric switches, together with this disclosed integration of PCIe-connected standard processors with a PCIe interface module containing the Ethernet MACs and fabric switch.
Likewise, standard desktop and server processors typically support Ethernet interfaces via an integrated chip, or potentially provided within an SOC. Figure 23 illustrates an example of the power-optimized fabric switch integrated with an existing processor via Ethernet. Item 23a shows a standard processor that supports an Ethernet interface, within an SOC or via an integrated chip. Item 23b shows the disclosed fabric switch without the integrated internal Ethernet MAC controllers. Item 23b would typically be realized as an FPGA or ASIC implementation of the integrated fabric switch.
In this disclosure, the nodes shown in Figure 5 can be a heterogeneous combination of power-optimized server SOCs with integrated fabric switches, together with this disclosed integration of Ethernet-connected standard processors with the integrated fabric switch implemented in an FPGA or ASIC.
Although the foregoing has been described with reference to particular embodiments of the invention, those skilled in the art will appreciate that changes may be made to these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.
9d0, 22a-23b...items
9d1...integrated circuit card
9d2...bulk space within the drive
100...typical network data center architecture
101, 104...server nodes
101a-n...top-level switches
102...standard 2.5-inch disk drive
102a-n, 203a-c...racks
103a-f...local routers
104e-k...additional servers
105...system
105a-b...rack units
106a-g...routers
107a-n...blade servers
108a-n...additional rack units
110...exemplary physical view/aggregation
111a-bn...peripheral servers
112a-h...edge router systems
113...core switching system
200...network aggregation
201...10-Gb/sec Ethernet communication/thick red lines/uplink Ethernet ports
202...aggregation router
206a-d, 209a...servers
208a...top-of-rack switch
400...data center
401a-n...10-Gb Ethernet physical layers
402...1-Gb private Ethernet physical layer
403a-n...large computers (power servers)
404a-n...computers (servers)
405...single large low-speed fan
406, 407, 710, 720...arrays
408a, b...history servers
500...high-level topology/topology/heat convection
501a-n...heat flows
501a...node/10 Gb Ethernet port Eth0
501b...node/10 Gb Ethernet port Eth1
502...printed circuit board
502a-n...ovals
502d, e...ovals/level 0 leaf nodes
503a-n...heat-dissipating double data rate (DDR) memory chips
504a...large computing chip
506...flash chip
700...exemplary server
701...server board/server/board
701a-n...boards
702...disk drive
900...switch
901...routing header
902...internal Eth0 MAC/PCIe controller/internal PCIe controller
902a...multiplexer
903...internal Eth1 MAC
904...external MAC/external Eth MAC/MAC/external PCIe controller/physical layer
905...blocks/A9 cores/A9/processor cores/local processor
906...block/management processor
907...internal Eth2 MAC
910a-d...relevant areas
911...network processor
912...foreign device
Figures 1 and 2 illustrate typical data center network aggregation;
Figure 3 illustrates network aggregation using servers in accordance with one embodiment;
Figure 4 illustrates a data center in a rack in accordance with one embodiment;
Figure 5 illustrates a high-level topology of a network system with a switching fabric;
Figure 6 illustrates a server board comprising multiple server nodes interconnected with the described point-to-point interconnect;
Figures 6a-6c illustrate another example of a fabric topology;
Figure 7 illustrates an example of a passive backplane connected to one or more node boards and two aggregation boards;
Figure 8 illustrates an example of extending the fabric across shelves and linking shelves across server racks;
Figure 9a illustrates an exemplary server 700 with a disk form factor;
Figures 9b and 9c illustrate an exemplary array of disk-server combinations using a storage server 1-node SATA board in accordance with one embodiment;
Figure 9d illustrates a standard 3.5-inch drive;
Figure 9e illustrates an implementation of multiple server nodes within a standard 3.5-inch disk drive form factor;
Figure 10 illustrates an implementation of a server deeply integrated with storage;
Figure 11 illustrates an implementation of dense packing of storage and servers leveraging an existing 3.5-inch JBOD storage box;
Figure 12 illustrates an implementation of a server node instantiated in the same form factor as a 2.5-inch drive;
Figure 13 illustrates an implementation of rack chimney cooling;
Figure 13a illustrates an exemplary illustration of the heat convection used in the chimney rack cooling shown in Figure 13;
Figure 14 illustrates server nodes placed diagonally relative to one another to minimize self-heating across server nodes;
Figure 15 illustrates an exemplary 16-node system in accordance with one embodiment, in which heat waves rise from the printed circuit board;
Figure 16 illustrates a higher-density variant of the 16-node system with nodes similarly arranged to minimize self-heating across nodes;
Figure 17 illustrates the internal architecture of the server node fabric switch;
Figure 18 illustrates a server node including a PCIe controller connected to the internal CPU bus fabric;
Figure 18a illustrates a system with multiple protocol bridges using the fabric switch;
Figure 19 illustrates the integration of the server fabric with a network processor;
Figure 20 illustrates the fabric switch and an FPGA providing a service such as Internet Protocol Virtual Server (IPVS);
Figure 21 illustrates a method of building OpenFlow flow processing into the Calxeda fabric;
Figure 22 illustrates an example of the power-optimized fabric switch integrated with an existing processor via PCIe; and
Figure 23 illustrates an example of the power-optimized fabric switch integrated with an existing processor via Ethernet.
Claims (46)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US38358510P | 2010-09-16 | 2010-09-16 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201230724A TW201230724A (en) | 2012-07-16 |
TWI540862B true TWI540862B (en) | 2016-07-01 |
Family
ID=46934225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW100133390A TWI540862B (en) | 2010-09-16 | 2011-09-16 | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105743819B (en) |
TW (1) | TWI540862B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104347998A (en) * | 2013-08-07 | 2015-02-11 | 日本航空电子工业株式会社 | Connector |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727631B (en) * | 2019-09-12 | 2023-08-08 | 无锡江南计算技术研究所 | H-shaped assembling method based on orthogonal and non-orthogonal heterogeneous interconnection of double middle plates |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7752385B2 (en) * | 2006-09-07 | 2010-07-06 | International Business Machines Corporation | Flexible disk storage enclosure |
US7761738B2 (en) * | 2006-09-07 | 2010-07-20 | International Business Machines Corporation | Establishing communications across virtual enclosure boundaries |
US20090166065A1 (en) * | 2008-01-02 | 2009-07-02 | Clayton James E | Thin multi-chip flex module |
US20100008038A1 (en) * | 2008-05-15 | 2010-01-14 | Giovanni Coglitore | Apparatus and Method for Reliable and Efficient Computing Based on Separating Computing Modules From Components With Moving Parts |
2011
- 2011-09-16 CN CN201610113343.8A patent/CN105743819B/en not_active Expired - Fee Related
- 2011-09-16 TW TW100133390A patent/TWI540862B/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
CN105743819B (en) | 2020-06-26 |
TW201230724A (en) | 2012-07-16 |
CN105743819A (en) | 2016-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9876735B2 (en) | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect | |
WO2012037494A1 (en) | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect | |
KR101516216B1 (en) | System and method for high-performance, low-power data center interconnect fabric | |
EP3063903B1 (en) | Method and system for load balancing at a data network | |
TWI534629B (en) | Data transmission method and data transmission system | |
TWI543566B (en) | Data center network system based on software-defined network and packet forwarding method, address resolution method, routing controller thereof | |
US9292460B2 (en) | Versatile lane configuration using a PCIe PIE-8 interface | |
US9300574B2 (en) | Link aggregation emulation for virtual NICs in a cluster server | |
US9264346B2 (en) | Resilient duplicate link aggregation emulation | |
US9141171B2 (en) | Network routing protocol power saving method for network elements | |
US8335884B2 (en) | Multi-processor architecture implementing a serial switch and method of operating same | |
US8982734B2 (en) | Methods, apparatus, and systems for routing information flows in networks using spanning trees and network switching element resources | |
EP3531633B1 (en) | Technologies for load balancing a network | |
TWI540862B (en) | Performance and power optimized computer system architectures and methods leveraging power optimized tree fabric interconnect | |
Qian et al. | Alibaba hpn: A data center network for large language model training | |
WO2020050975A1 (en) | Removable i/o expansion device for data center storage rack | |
CN203241890U (en) | Multi-unit server based on ATCA board card interfaces | |
US11362904B2 (en) | Technologies for network discovery | |
Baidu et al. | A Novel Networking Box System Architecture and Design for Data Center Energy Efficiency | |
Feng et al. | Analysis of internet data center virtualization deployment technology | |
Fang et al. | Network Equipment Selection Scheme of University Informatization Construction | |
Dobson | The Role of PCI Express® in Wired Communications Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |