CN117631775A - Multi-GPU server based on 2U chassis - Google Patents

Multi-GPU server based on 2U chassis Download PDF

Info

Publication number
CN117631775A
CN117631775A (Application CN202311402460.2A)
Authority
CN
China
Prior art keywords
gpu
board
chassis
cpu
water
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311402460.2A
Other languages
Chinese (zh)
Inventor
林子健
余世茂
张海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311402460.2A
Publication of CN117631775A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00: Details not covered by groups G06F 3/00 - G06F 13/00 and G06F 21/00
    • G06F 1/16: Constructional details or arrangements
    • G06F 1/18: Packaging or power distribution
    • G06F 1/181: Enclosures
    • G06F 1/183: Internal mounting support structures, e.g. for printed circuit boards, internal connecting means
    • G06F 1/20: Cooling means
    • G06F 2200/00: Indexing scheme relating to G06F 1/04 - G06F 1/32
    • G06F 2200/20: Indexing scheme relating to G06F 1/20
    • G06F 2200/201: Cooling arrangements using cooling fluid
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Power Engineering (AREA)
  • Cooling Or The Like Of Electrical Apparatus (AREA)

Abstract

The application provides a multi-GPU server based on a 2U chassis, which comprises a chassis body, and an interaction node, a GPU node, a heat dissipation module, a hard disk, a power supply board, a CPU board, a GPU board and a switch board which are located in the chassis body. The CPU board is provided with at least one CPU chip, and the switch board is provided with at least one switch chip. The heat dissipation module comprises at least a water cooling module, which is arranged above at least one of the CPU board, the GPU board and the switch board and is used for reducing the temperature of the chips on the corresponding board. The height of the chassis body is 2U, and the height of the water cooling module is 10 mm. The server provided by the application solves the problem that efficient heat dissipation cannot be achieved when a plurality of GPUs are integrated in a server of 2U height.

Description

Multi-GPU server based on 2U chassis
Technical Field
The application relates to the technical field of server architecture, in particular to a multi-GPU server based on a 2U chassis.
Background
With major breakthroughs in AI applications, industries across the board have increased their investment in artificial intelligence, and demand for computing power is growing explosively, so servers carrying a large number of GPUs are increasingly favored by the market. Multi-GPU servers are already widely used in fields such as data development, machine learning, deep learning and graphics rendering, where they perform excellently.
Server form factors are typically designed around the standard dimensions of racks or cabinets, so that servers fit and can be installed in standard racks or cabinets. A standard 2U server uses the vertical space of a rack or cabinet more effectively, allowing data centers to plan and manage their server architecture better, improve the utilization of overall computing resources and meet ever-growing computing demands.
As the computing demands placed on multi-GPU servers grow, server power consumption keeps rising, so this type of server usually requires an efficient heat sink. Traditional air cooling can no longer keep up with the growing heat dissipation requirements: an air-cooled heat sink mounted on a GPU is too tall and occupies too much space, making it hard to fit into a 2U server. Traditional liquid cooling is usually immersion cooling, which generally requires a dedicated cabinet and liquid cooling environment, increases space occupation and maintenance cost, is still difficult to apply to a 2U server, and yields low overall space utilization.
Disclosure of Invention
The embodiment of the application provides a multi-GPU server based on a 2U chassis, aiming to solve the heat dissipation challenges and space limitations that arise in the prior art when a plurality of GPUs are integrated in a server of 2U height.
The application provides a multi-GPU server based on a 2U chassis, which comprises a chassis body, and an interaction node, a GPU node, a heat dissipation module, a hard disk, a power supply board, a CPU board, a GPU board and a switch board which are located in the chassis body; the CPU board is provided with at least one CPU chip, and the switch board is provided with at least one switch chip; wherein,
the heat dissipation module comprises at least a water cooling module, the water cooling module is arranged above at least one of the CPU board, the GPU board and the switch board and is used for reducing the temperature of the chips on the corresponding board;
the height of the chassis body is 2U, and the height of the water cooling module is 10 mm.
Optionally, the water cooling module comprises a CPU cold plate, a GPU cold plate and a switch cold plate; the CPU cold plate is arranged on the CPU plate, the GPU cold plate is arranged on the GPU plate, and the switch cold plate is arranged on the switch plate.
Optionally, the CPU cold plate, the GPU cold plate and the switch cold plate each comprise a heat dissipation plate; a plurality of water channels are formed in the heat dissipation plate, and a water inlet pipeline and a water outlet pipeline communicating with the water channels are arranged on the heat dissipation plate; the water inlet pipeline is used for communicating with a cooling medium input device, and the water outlet pipeline is used for communicating with a cooling medium output device; wherein,
the cooling medium input device feeds cooling medium into each water channel through the water inlet pipeline, the cooling medium exchanges heat with the chips on the corresponding board in contact with the heat dissipation plate, and the cooling medium after heat exchange is discharged from the water channels through the water outlet pipeline, so as to reduce the temperature of the chips;
there is at least one water inlet pipeline and at least one water outlet pipeline, and the water inlet pipelines and the water outlet pipelines communicate with different water channels correspondingly.
Optionally, the water outlet pipeline of the CPU cold plate is communicated with the water inlet pipeline of the switch cold plate.
Optionally, the chassis body includes a partition board, the partition board divides a front end area of the chassis body into an upper layer space and a lower layer space, the upper layer space is provided with the hard disk, the power panel and the CPU board, the lower layer space is provided with the GPU node, and an end area of the chassis body is provided with the interaction node;
wherein, the GPU node is provided with the GPU board; the switch board is arranged on the interaction node;
the height of the GPU node is 1U, and the height of the interaction node is close to the height of the case body.
Optionally, the heat dissipation module further includes a fan module, where the fan module is configured to dissipate heat inside the chassis body; the fan module is located in the upper space.
Optionally, the GPU node is slidably connected with the chassis body, and the interaction node is slidably connected with the chassis body.
Optionally, a plurality of PCIE cards and a bracket are installed on the interaction node; a first part of the PCIE cards are arranged along the height direction of the interaction node, and the remaining PCIE cards are laid on the bracket perpendicular to the height direction of the interaction node;
the number of PCIE cards in the first part is larger than the number of PCIE cards in the remaining part.
Optionally, the bracket is provided with an OCP network card module.
Optionally, the interaction node is in high-density connection with the GPU node.
Optionally, the interaction node is electrically connected to the BUSBAR.
Compared with the prior art, the application has the following advantages:
The embodiment of the application provides a multi-GPU server based on a 2U chassis, which comprises a chassis body, and an interaction node, a GPU node, a heat dissipation module, a hard disk, a power supply board, a CPU board, a GPU board and a switch board which are located in the chassis body; the CPU board is provided with at least one CPU chip, and the switch board is provided with at least one switch chip; the heat dissipation module comprises at least a water cooling module, which is arranged above at least one of the CPU board, the GPU board and the switch board and is used for reducing the temperature of the chips on the corresponding board; the height of the chassis body is 2U, and the height of the water cooling module is 10 mm.
By adopting the technical scheme of the application, a plurality of GPUs and a water-cooled heat dissipation module are carried in a chassis body of 2U height, providing powerful parallel computing capability suitable for scenarios with heavy computing demand such as AI and cloud computing. The water cooling module circulates water or another cooling medium in closed pipelines to carry away the heat generated by the heating elements inside the server; compared with traditional air cooling it achieves a better cooling effect and suits chips with higher power consumption. While the efficient heat dissipation requirements of multiple GPUs and multiple CPUs are met, the height of the water cooling module is only 10 mm, so the whole machine conforms to the standard 2U server height, which greatly saves space, improves the utilization of the space inside the chassis and makes the structure more compact.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer, so that they can be implemented according to the content of the specification, and to make the above and other objects, features and advantages of the present application more readily apparent, the present application is described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
Fig. 1 is an exploded view of a chassis body and nodes of a 2U chassis-based multi-GPU server provided in an embodiment of the present application;
fig. 2 is an exploded view of the overall structure of a 2U chassis-based multi-GPU server according to an embodiment of the present application;
fig. 3 is a schematic overall structure of a chassis body according to an embodiment of the present disclosure;
fig. 4 is a schematic overall structure of an interaction node according to an embodiment of the present application;
fig. 5 is a schematic overall structure of a GPU node according to an embodiment of the present application;
fig. 6 is a schematic diagram of the overall structure of a CPU cold plate according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the overall structure of a switch cold plate according to an embodiment of the present application.
Reference numerals:
1. A chassis body; 2. an interaction node; 3. a GPU node; 4. a fan module; 5. a hard disk; 6. a power supply board; 7. a CPU board; 8. a GPU board; 9. a Switch board; 10. a CPU cold plate; 11. a GPU cold plate; 12. a Switch cold plate; 1-1, a chassis upper cover; 1-2, a partition board; 2-1, a PCIE card; 2-2, a PCIE bracket; 2-3, a handle; 2-4, a cross beam; 2-5, crocodile clips; 3-1, a node handle; 3-2, a slide rail; 3-3, a node upper cover; 10-1, a CPU cold plate front quick connector; 10-2, a CPU cold plate rear quick connector; 12-1, a Switch cold plate quick connector.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that racks and cabinets come in different sizes and specifications. Standard rack or cabinet dimensions generally follow international standards: height is typically expressed in U, while width and depth are given in millimeters. Common racks or cabinets are 42U, 47U, 37U, 32U, 20U, 12U, 6U and so on. Racks or cabinets of various models are typically fitted with servers of different sizes and specifications, and the server design must conform to the standard rack dimensions; typically the server width is fixed at 19 inches (48.26 cm) to fit standard racks and cabinets. Server height is measured in U, where 1U equals 4.445 cm (1.75 inches); common server heights are 1U, 2U, 4U, 8U and the like, and servers of different heights offer different performance and expandability.
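As a quick illustration of the rack-unit arithmetic above (a minimal sketch; the helper names are illustrative and not part of the application), the following converts U heights to millimetres and counts how many 2U servers fit in a 42U rack:

    # Rack-unit arithmetic sketch: 1U = 1.75 inches = 44.45 mm, 19-inch rack width.
    RACK_UNIT_MM = 44.45
    RACK_WIDTH_MM = 482.6

    def u_to_mm(units: float) -> float:
        """Convert a height in rack units (U) to millimetres."""
        return units * RACK_UNIT_MM

    def servers_per_rack(rack_u: int, server_u: int) -> int:
        """How many servers of server_u height fit into a rack of rack_u height."""
        return rack_u // server_u

    if __name__ == "__main__":
        print(f"2U server height: {u_to_mm(2):.1f} mm")                    # 88.9 mm
        print(f"A 42U rack holds {servers_per_rack(42, 2)} x 2U servers")  # 21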
A 2U server can provide more hard disk slots, expansion card slots, memory slots, power supply modules and the like to meet growing computing demands, and can also accommodate cooling fans and heat sinks, lowering the operating temperature of the server to a certain extent. At the same time, the 2U height is relatively low, so more servers can be installed in a limited space, improving the density and efficiency of the data center. A 2U server therefore balances volume against performance without occupying excessive space or sacrificing too many functions, and is the mainstream choice in the current server market.
To meet the rapidly growing demands of big data, artificial intelligence, cloud computing and similar fields, server hardware carrying more GPUs (Graphics Processing Units) has become a mainstream market requirement. As computing performance improves, server power consumption keeps climbing, and the higher the power consumption, the higher the required heat sink performance, so such servers generally have to be configured with high-performance heat sinks. There is currently a 4U multi-GPU server chassis on the market that mounts 10 external GPUs in the form of full-height PCIE (Peripheral Component Interconnect Express) cards at the rear of the chassis. That server uses air cooling; its bandwidth and signal quality are not as good as those of onboard GPUs, and the structural constraints of full-height cards push the chassis height to 4U, occupying a large amount of space. Another server structure on the market supporting 8 GPUs adopts a modular design comprising a chassis, a power supply module, a storage module, an interaction module, a heat dissipation module, a GPU module and an expansion module. Its air-cooled heat dissipation scheme requires taller heat sinks on the GPUs and more fans, which take up most of the space inside the server and again push the overall height to 4U.
In general, then, GPU servers are often 4U or taller because air-cooled heat sinks are too tall. If only the 2U specification is considered, the required heat dissipation cannot be met when multiple GPUs are carried; conversely, simply satisfying efficient heat dissipation for multiple GPUs requires mounting many tall air-cooled heat sinks or water-cooled heat sinks with complex cooling environments, so the whole machine can no longer meet the standard 2U server height. A 2U-height server capable of carrying multiple GPUs is therefore still missing from the market, and for application scenarios with heavy computing demand such as high-performance computing, artificial intelligence and cloud computing, integrating multiple GPUs in a 2U-height server faces huge heat dissipation challenges and space limitations.
In view of this, referring to fig. 1-7, fig. 1 is an exploded view between a chassis body and a node of the 2U chassis-based multi-GPU server shown in the present application; FIG. 2 is an exploded view of the overall structure of the 2U chassis-based multi-GPU server shown in the present application; fig. 3 is a schematic diagram of the overall structure of the chassis body shown in the present application; fig. 4 is a schematic diagram of the overall structure of the interaction node shown in the present application; FIG. 5 is a schematic diagram of the overall structure of the GPU node shown in the present application; FIG. 6 is a schematic view of the overall structure of the CPU cold plate shown in the present application; FIG. 7 is a schematic diagram of the overall structure of the switch cold plate of the present application.
The embodiment of the application provides a multi-GPU server based on a 2U chassis, comprising a chassis body 1, and an interaction node 2, a GPU node 3, a heat dissipation module, a hard disk 5, a power supply board 6, a CPU (Central Processing Unit) board 7, a GPU board 8 and a Switch board 9 which are located in the chassis body 1; a plurality of GPU chips are arranged on the GPU board 8, at least one CPU chip is arranged on the CPU board 7, and at least one Switch chip is arranged on the Switch board 9. The heat dissipation module comprises at least a water cooling module, which is arranged above at least one of the CPU board 7, the GPU board 8 and the Switch board 9 and is used for reducing the temperature of the chips on the corresponding board. The height of the chassis body 1 is 2U, and the height of the water cooling module is 10 mm.
Specifically, the server contains numerous electronic components; the interaction node 2, the GPU node 3, the heat dissipation module, the hard disk 5, the power supply board 6, the CPU board 7, the GPU board 8 and the Switch board 9 together form a complete server of 2U height. The GPU node 3 is an independent computing unit inside the server chassis, while the interaction node 2 is responsible for communication with external networks and storage devices. The chassis body 1 houses the main modules of the server and protects them from the external environment. The interaction node 2 and the GPU node 3 can handle different computing tasks: the interaction node 2 coordinates and manages data exchange and communication between components, for example running the operating system and applications such as databases, web servers and file servers, while the GPU node 3 generally contains multiple GPUs and provides high-performance parallel computing capability for artificial intelligence, graphics rendering, scientific simulation, deep learning and other applications requiring high-performance graphics processing.
The interaction node 2 may also act as a controller for the GPU node 3, communicating with it over a PCIe bus or another high-speed interconnect. The interaction node 2 may provide a plurality of PCIe slots for inserting PCIE cards 2-1 to extend various functions, such as network adapters, graphics cards and storage controllers. In some embodiments, the interaction node 2 may be connected to the GPU node 3 through a high-density connector. A high-density connector arranges many electrical terminals compactly on a small interface, allowing efficient high-speed communication and data transfer between components. Connecting the two nodes through a high-density connector reduces the mating force and saves space, meeting the installation constraints inside the 2U-height chassis body 1. High-density connectors come in different types and specifications, such as LFH (Low Force Helix), D-Sub and M8/M12 connectors.
Therefore, the server provided by the embodiment of the application adopts a multi-node architecture, can be assembled node by node and is more convenient. The GPU node 3 can be maintained directly at the cabinet, making GPU maintenance faster.
The hard disk 5 stores data and files, and the power supply board 6 provides power to the server. The CPU board 7 may be a motherboard or board card capable of mounting a plurality of CPUs, and may carry memory modules used to store data and instructions while the server operates. The GPU board 8 is a motherboard or board card carrying a plurality of GPUs, and the Switch board 9 is a motherboard or board card carrying a plurality of Switch chips. Onboard GPUs integrate several GPUs directly on the board; compared with discrete graphics cards this saves space, cost and power consumption.
As a specific explanation of this embodiment, the heat dissipation module is the structure that removes heat from the heating elements inside the chassis body 1. Chips are the main heating elements and generate a large amount of heat during operation; without effective heat dissipation, excessive chip temperature degrades performance and can even damage hardware. The heat dissipation module provided by the embodiment of the invention uses a water cooling module, which circulates water or another liquid medium in closed pipelines to transfer the heat generated by the GPU chips, CPU chips or Switch chips to the heat dissipation module, where it is then dissipated by fans or natural convection. The water cooling module may be arranged on the CPU board 7 to cool the CPUs and memory modules on that board, on the GPU board 8 to cool the GPUs and other chips on that board, or on the Switch board 9 to cool the Switch chips on that board.
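The liquid-cooling principle described above can be sketched with a simple coolant energy balance, Q = m_dot * c_p * dT (a rough illustrative estimate; the chip power and temperature rise below are assumed example values, not figures from the application):

    # Coolant energy-balance sketch: Q = m_dot * c_p * dT.
    # The example chip power and allowed temperature rise are assumptions for
    # illustration only; they are not taken from the application.
    CP_WATER = 4186.0  # J/(kg*K), specific heat of water

    def required_flow_lpm(heat_w: float, delta_t_k: float, cp: float = CP_WATER) -> float:
        """Water flow (litres per minute) needed to absorb heat_w watts with a
        coolant temperature rise of delta_t_k kelvin (water density ~1 kg/L)."""
        mass_flow_kg_s = heat_w / (cp * delta_t_k)
        return mass_flow_kg_s * 60.0

    if __name__ == "__main__":
        # Hypothetical example: 8 GPUs at 400 W each, 10 K coolant temperature rise.
        print(f"{required_flow_lpm(8 * 400.0, 10.0):.1f} L/min")  # about 4.6 L/min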
Preferably, the water cooling module comprises a CPU cold plate 10, a GPU cold plate 11 and a Switch cold plate 12; the CPU cold plate 10 is mounted on the CPU board 7, the GPU cold plate 11 on the GPU board 8, and the Switch cold plate 12 on the Switch board 9. Cold plates are thus installed on the CPU board 7, the GPU board 8 and the Switch board 9 at the same time to dissipate heat from the heating elements on each core board.
In this embodiment, the CPU board 7 may carry at least one CPU chip slot, a power supply interface and interfaces to other components, with a CPU chip plugged into each CPU chip slot to handle the computing tasks and control instructions of the server. The GPU board 8 may carry at least one GPU chip slot, a power supply interface and interfaces to other components, with a GPU chip plugged into each GPU chip slot to handle the graphics computation of the server and accelerate computing tasks. The Switch board 9 may carry at least one Switch chip for controlling communication and connection between the different components inside the server.
In a specific example, within the chassis body 1 of 2U height, the combination of the above modules allows the application to carry 8 onboard GPUs, 2 CPUs and 10 PCIE expansion cards. The configuration of 8 GPUs and 10 PCIE cards 2-1 gives the server strong computing power, making it suitable for computation-heavy scenarios such as deep learning and video processing; the CPU, GPU and Switch chips and the memory modules are all cooled by the water cooling module, which cools better than traditional air cooling and suits chips with higher power consumption. This solves the problem that traditional multi-GPU servers are often 4U or taller because GPU air-cooled heat sinks are too tall.
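To make the module composition above easier to follow, here is a minimal, purely illustrative data-model sketch of the described configuration (the class and field names are assumptions of this summary, not terms from the application):

    # Illustrative data model of the 2U multi-GPU server configuration described
    # above. Names and the simple check are assumptions for clarity only.
    from dataclasses import dataclass

    @dataclass
    class ServerConfig:
        chassis_height_u: int = 2
        cold_plate_height_mm: float = 10.0
        onboard_gpus: int = 8       # on the GPU board (GPU node 3)
        cpus: int = 2               # on the CPU board 7
        pcie_cards: int = 10        # on the interaction node 2
        cold_plate_cooled: tuple = ("CPU", "GPU", "Switch", "memory")
        fan_assisted: tuple = ("PCIE cards", "other board components")

        def cold_plate_fits(self) -> bool:
            """Sanity check: the 10 mm cold plate stays well inside the 2U envelope."""
            return self.cold_plate_height_mm < self.chassis_height_u * 44.45

    if __name__ == "__main__":
        print(ServerConfig().cold_plate_fits())  # True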
It should be emphasized that the water cooling module adopted by the invention can be designed to a height of 10 mm, greatly saving space: the whole machine keeps the standard 2U height, fits a 19-inch standard cabinet, improves the utilization of space inside the cabinet and makes the structure more compact. A multi-GPU server provides more parallel computing capability and higher processing performance for deep learning, big data and other high-performance computing tasks, and mounting it in a 2U server provides more computing resources in a limited physical space, saving cabinet space and power consumption and reducing operation and maintenance cost and environmental impact. For the growing high-performance computing demand, especially in the field of artificial intelligence, the structural design of a 2U-height server carrying 8 GPUs in the embodiment of the invention meets the main market demand, and a server of this structural type offers research institutions, enterprises and scientific institutions a more efficient and compact solution for their growing computing needs.
In some embodiments, the heat dissipation module further comprises the fan module 4. The fan module 4 comprises a plurality of fans that provide strong airflow, assist the water cooling module, carry away heat from the other electronic components on the core boards and exhaust it from the chassis, ensuring the overall heat dissipation of the server. More specifically, the water cooling module works on the liquid-cooling principle: it spreads the heat generated by the chips to the surroundings mainly through heat conduction and convection. When the circulating liquid has conducted the large amount of chip heat to the housing of the water cooling module, the active airflow generated by the fans exchanges heat with the water cooling module by forced convection, lowering the operating temperature of the chips.
In this way, the server in the embodiment of the invention uses a combined air-cooling and liquid-cooling approach: the high-power chips on the boards, namely the CPU, GPU and Switch chips, are cooled by the CPU cold plate 10, the GPU cold plate 11 and the Switch cold plate 12 respectively, while the fans cool the PCIE cards 2-1 and the other board components. Because the PCIE cards 2-1 and the other components generate little heat, only a few smaller fans are needed to meet their heat dissipation requirements. The embodiment of the invention thus achieves efficient heat dissipation in a limited space, meets the heat dissipation requirements of high-performance chips while saving chassis space and power consumption, and solves the heat dissipation problem of a multi-GPU server at 2U height.
In some possible technical solutions, the CPU cold plate 10, the GPU cold plate 11 and the Switch cold plate 12 each comprise a heat dissipation plate; a plurality of water channels are formed in the heat dissipation plate, and a water inlet pipeline and a water outlet pipeline communicating with the water channels are arranged on the heat dissipation plate; the water inlet pipeline is used for communicating with a cooling medium input device, and the water outlet pipeline is used for communicating with a cooling medium output device. The cooling medium input device feeds cooling medium into each water channel through the water inlet pipeline, the cooling medium exchanges heat with the chips on the corresponding board in contact with the heat dissipation plate, and the cooling medium after heat exchange is discharged from the water channels through the water outlet pipeline, so as to reduce the temperature of the chips. There is at least one water inlet pipeline and at least one water outlet pipeline, and the water inlet pipelines and the water outlet pipelines communicate with different water channels correspondingly.
Specifically, the CPU cold plate 10, the GPU cold plate 11 and the Switch cold plate 12 all work on the same principle, but the shapes and structures of their heat dissipation plates may differ; they can be determined by the size and shape of the chip, and the thickness of the cold plate by the chip height and the heat dissipation requirement. For clarity and conciseness, the embodiment of the present invention takes the CPU cold plate 10 as an example, and the other cold plates can be understood by reference to it.
Specifically, the CPU cold plate 10 comprises a heat dissipation plate, which may be rectangular, square, circular or of another shape. The heat dissipation plate can be made of a metal with high thermal conductivity, its bottom surface fits tightly against the CPU below it, and a plurality of water channels are formed inside it in which the cooling medium flows. The water channels guide the cooling medium close to the chip that needs cooling; the medium absorbs the heat and carries it away, the heat is transferred to the channel walls, and it is finally dissipated by the active airflow generated by the fans or by the surrounding air. The heat dissipation plate has a water inlet and a water outlet: the water inlet communicates with the water channels and the water inlet pipeline, which is connected to the cooling medium input device so that cold water entering at the inlet absorbs heat in the channels, while the water outlet communicates with the water channels and the water outlet pipeline so that the heated water after heat exchange leaves the channels, is cooled, and is then recirculated.
It will be appreciated that the configuration of the water channels depends on the shape of the heat dissipation plate. For example, the heat dissipation plate can comprise a base plate and a plurality of fins arranged on the base plate, with the cavities inside the base plate and the fins forming the corresponding water channels. The heat dissipation plate can therefore be made thin and low, with its height kept at 10 mm, while the surface area exchanging heat with the chip and the flow of heat-absorbing cooling medium both remain high; the cooling medium transfers the absorbed heat to the fins, which present a large heat-exchange area to the air, further achieving efficient heat dissipation while meeting the heat dissipation requirements of a 2U multi-GPU server.
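The benefit of the finned design can be sketched with the basic convection relation Q = h * A * dT (the heat transfer coefficient, areas and temperature difference below are assumed example values, not data from the application):

    # Convection sketch: heat rejected to the air scales with exposed area.
    # h, the areas and dT are illustrative assumptions, not values from the application.
    def convective_heat_w(h_w_m2k: float, area_m2: float, delta_t_k: float) -> float:
        """Heat transferred to the air by convection, in watts (Q = h * A * dT)."""
        return h_w_m2k * area_m2 * delta_t_k

    if __name__ == "__main__":
        h = 50.0            # W/(m^2*K), forced convection from the fan airflow (assumed)
        dt = 30.0           # K, surface-to-air temperature difference (assumed)
        bare_plate = 0.02   # m^2, plate area without fins (assumed)
        with_fins = 0.10    # m^2, plate plus fin area (assumed)
        print(convective_heat_w(h, bare_plate, dt))  # 30.0 W
        print(convective_heat_w(h, with_fins, dt))   # 150.0 W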
In some embodiments, it may be preferable to use two water inlet pipelines connected to two water inlets of the heat dissipation plate and two water outlet pipelines connected to two water outlets, so that the cooling medium enters the plurality of water channels faster and is distributed evenly among them, and after heat exchange flows out of the channels quickly for the next circulation.
It can be appreciated that the routing of the water inlet and outlet pipelines can be optimized according to the internal layout of the server and the heat dissipation requirements, ensuring that the cooling medium is distributed fully and evenly over the heat dissipation plate. For example, the inlet and outlet pipelines may be flexible hoses that can be bent inside the server, and their length is usually kept as short as possible.
In some embodiments, connection interfaces are provided at the junctions of the water inlet and outlet pipelines, and the water pipes are connected to the water inlet and outlet on the cold plate through these interfaces. The water inlet, water outlet, inlet pipeline, outlet pipeline and connecting joints of the heat dissipation plate can be joined with connectors and seals to ensure smooth water flow without leakage.
In a preferred embodiment, the heat dissipation plate covers the CPU board 7, so that the memory banks on both sides of the CPU can be more fully covered, the heat dissipation coverage is improved, and the heat dissipation efficiency of the whole CPU board 7 area is ensured.
In one possible solution, the plurality of water channels communicate with one another, and at least one water guide pipe communicating different water channels is arranged on the heat dissipation plate, with the outlet of the water guide pipe close to the position of the chip on the heat dissipation plate. In this scheme the inlet of the water guide pipe can communicate with one of the water channels while its outlet is placed according to the chip position once the heat dissipation plate is installed, which makes it easier to distribute the flow of cooling medium and guide more of it to the part of the plate above the chip, so that the heat generated by the chip is absorbed and carried away quickly. The heat dissipation plate can thus optimize heat dissipation against the heat distribution of the CPU board 7 and avoid local hot spots on the board. The inlet of the water guide pipe can be connected to the water channel closest to the chip after installation, shortening the water guide pipe.
As a further improvement of this embodiment, the water outlet line of the CPU cold plate 10 communicates with the water inlet line of the Switch cold plate 12. In this embodiment, the outlets of the two water outlet lines of the CPU cold plate 10 carry a CPU cold plate rear quick connector 10-2, the inlets of its two water inlet lines carry a CPU cold plate front quick connector 10-1, and the inlets of the two water inlet lines of the Switch cold plate 12 carry a Switch cold plate quick connector 12-1; the CPU cold plate rear quick connector 10-2 mates with the Switch cold plate quick connector 12-1, so the two cold plates share one waterway. In the server of this embodiment, the CPU chips and the Switch chips generate less heat than the GPU chips, so the CPU cold plate 10 cooling the CPU chips and the Switch cold plate 12 cooling the Switch chips can form one liquid-cooling circulation loop, reducing the number and length of waterways and optimizing the spatial layout of the server.
The CPU cold plate 10 and the Switch cold plate 12 are detachably connected in the chassis through the corresponding quick connectors, so the two cold plates can be installed separately while still sharing the same waterway circulation, reducing the difficulty of installation and removal.
It is worth mentioning that, with 8 GPUs mounted on the GPU board 8, a large amount of heat is generated during operation, so giving the GPUs their own independent waterway allows the heat generated by the GPUs to be controlled and handled better and avoids the GPU cooling being degraded by cooling medium whose temperature has already risen through earlier heat exchange.
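The series-versus-separate waterway trade-off can be sketched numerically: coolant passing through cold plates in series accumulates a temperature rise of Q / (m_dot * c_p) at each stage, which is why the hot GPU loop is kept apart from the CPU-plus-Switch loop. The chip powers, flow rate and inlet temperature below are assumed example values, not figures from the application:

    # Series waterway sketch: coolant temperature rises by Q / (m_dot * c_p) at
    # each cold plate it passes through. All numeric values are illustrative.
    CP_WATER = 4186.0  # J/(kg*K)

    def outlet_temps(inlet_c: float, flow_kg_s: float, stage_heat_w: list) -> list:
        """Coolant temperature after each cold plate in a series loop."""
        temps, t = [], inlet_c
        for q in stage_heat_w:
            t += q / (flow_kg_s * CP_WATER)
            temps.append(round(t, 2))
        return temps

    if __name__ == "__main__":
        # Assumed loop: CPU cold plate (~600 W) feeding the Switch cold plate (~300 W),
        # 0.05 kg/s flow, 35 degC coolant inlet.
        print(outlet_temps(35.0, 0.05, [600.0, 300.0]))  # [37.87, 39.3]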
In addition, to make it easier to connect the water outlet line of the CPU cold plate 10 to the water inlet line of the Switch cold plate 12, the CPU cold plate 10 may be arranged directly in front of the Switch cold plate 12, which simplifies the pipeline routing. Specifically, the technical scheme adopted by the invention can be as follows:
the chassis body 1 comprises a partition board 1-2, the partition board 1-2 divides the front end area of the chassis body 1 into an upper space and a lower space, the upper space is provided with the hard disk 5, the power panel 6, the CPU board 7 and the fan module 4, the lower space is provided with the GPU node 3, and the tail end area of the chassis body 1 is provided with the interaction node 2; wherein, the GPU node 3 is provided with the GPU board 8; the Switch board 9 is arranged on the interaction node 2; the height of the GPU node 3 is 1U, and the height of the interaction node 2 is close to the height of the chassis body 1.
In this embodiment, the interaction node 2 is installed at the rear of the chassis body 1 with a height close to that of the chassis body 1, so that the whole forms a server of 2U height, while the GPU node 3 occupies a 1U-high space at the front of the chassis, in the lower layer of the chassis body 1. A partition board 1-2 at mid-height of the chassis body 1 divides the front of the server into an upper layer and a lower layer: the upper layer holds the fan module 4, the hard disk 5, the power supply board 6 and the CPU board 7, all mounted on the partition board 1-2, and the lower layer holds the GPU node 3. This upper-and-lower structure of the chassis body 1 clearly separates the functions and layouts of different components, makes the use of the space inside the server more reasonable and helps optimize the overall heat dissipation design and layout. Meanwhile, the CPU cold plate 10 sits above the CPU board 7, in the upper space at the front of the chassis body 1, and the Switch cold plate 12 sits above the Switch board 9 at the rear of the chassis body 1; the two are at the same height, so the Switch cold plate quick connector 12-1 on the Switch cold plate 12 can be fixed on the cross beam 2-4 and the CPU cold plate rear quick connector 10-2 plugs into the Switch cold plate quick connector 12-1 behind it.
Specifically, the fan module 4, the hard disk 5, the power supply board 6 and the CPU board 7 can be installed in the upper space of the chassis body 1 as follows:
First the fan module 4 is installed: the fans are fixed on the partition board 1-2 and the fan power cables are connected; the fan module 4 is usually located toward the front end. Then the hard disk 5 is installed by inserting it into the hard disk bracket on the partition board 1-2 and connecting the SATA data cable and power cable; the hard disk 5 sits at the very front. Next the power supply board 6 is installed, fixed on the partition board 1-2 and connected to the power and switch cables; it may sit at the rear of the partition board 1-2. Finally the CPU board 7 is installed, fixed on the partition board 1-2 and connected to the CPU power cable; it may be mounted in the middle of the partition board 1-2, between the fan module 4 and the power supply board 6. After installation the chassis upper cover 1-1 can be closed to form a server of 2U height.
In a further embodiment, as shown in fig. 5, the GPU node 3 can slide in the lower space by means of the slide rail 3-2 and the handle 2-3. Specifically, the node handle 3-1 of the GPU node 3 lets an operator push and pull the whole node conveniently, saving effort and providing a stop. The slide rail 3-2 engages the side wall of the chassis body 1 through rollers, so the node slides easily inside the chassis. The GPU node 3 is provided with a node upper cover 3-3 that protects the devices inside the node and strengthens its structure. The interaction node 2 is installed behind the GPU node 3 and the partition board 1-2 and can likewise slide in the rear space of the chassis body 1 by means of the handle 2-3 for installation and removal. The handles 2-3 make maintenance and operation of the corresponding nodes more convenient, helping to reduce maintenance cost and improve work efficiency.
In some embodiments, a plurality of PCIE cards 2-1 and a bracket 2-2 are installed on the interaction node 2; a first part of the PCIE cards 2-1 are arranged along the height direction of the interaction node 2, and the remaining PCIE cards 2-1 are laid on the bracket 2-2 perpendicular to the height direction of the interaction node 2, the number of PCIE cards 2-1 in the first part being larger than in the remaining part. Further, an OCP network card module is arranged on the bracket 2-2.
In this embodiment, as shown in fig. 4, 10 PCIE cards 2-1 may be installed on the interaction node 2, comprising 8 PCIE cards 2-1 inserted vertically and two PCIE cards 2-1 placed on the PCIE bracket 2-2. The 10 PCIE cards 2-1 can support different functions and devices. In addition, an OCP network card module can be arranged on the PCIE bracket 2-2, giving a high level of integration. Equipping the interaction node 2 with multiple PCIE cards 2-1, including the vertical PCIE cards 2-1, the PCIE cards 2-1 laid flat on the bracket 2-2 and the integrated OCP network card module, gives the server greater expandability and flexibility and lets it support more external devices and expansion cards. Integrating the PCIE cards 2-1 and the OCP network card module in the interaction node 2 makes full use of the space of the 2U chassis, improves the overall space utilization of the server, meets high-density installation requirements and allows the server to provide more computing resources in a limited space, achieving the high performance and efficiency of the multi-GPU server.
In an alternative embodiment, the OCP module on the PCIE bracket 2-2 may be replaced with a full-height PCIE card 2-1 depending on the usage scenario. The full-height PCIE card 2-1 can be laid flat on the bracket 2-2 and still meet the installation requirements of a 2U server.
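Why full-height cards are laid flat rather than inserted vertically can be checked against nominal form-factor heights (a quick sketch; the card heights are the commonly quoted nominal PCIe form-factor values, not dimensions stated in the application):

    # Compare nominal PCIe card heights against the 2U chassis envelope.
    # Card heights are commonly quoted nominal form-factor values, not
    # dimensions from the application.
    RACK_UNIT_MM = 44.45
    CHASSIS_2U_MM = 2 * RACK_UNIT_MM   # 88.9 mm external height
    FULL_HEIGHT_CARD_MM = 111.15       # nominal full-height PCIe card
    LOW_PROFILE_CARD_MM = 68.90        # nominal low-profile PCIe card

    def fits_vertically(card_mm: float, chassis_mm: float = CHASSIS_2U_MM) -> bool:
        """Can the card stand upright inside the chassis envelope? (This ignores
        board and cover thickness, so the real margin is even smaller.)"""
        return card_mm < chassis_mm

    if __name__ == "__main__":
        print(fits_vertically(LOW_PROFILE_CARD_MM))   # True: can be inserted vertically
        print(fits_vertically(FULL_HEIGHT_CARD_MM))   # False: must be laid flat on the bracket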
In combination with the above embodiments, in a further preferred scheme, the rear end of the server may use BUSBAR centralized power supply: the interaction node 2 powers the whole machine by mating its own crocodile clips 2-5 with the busbar on the cabinet. BUSBAR centralized power supply saves space inside the chassis and leaves more mounting positions and airflow for other components. Mating the crocodile clips 2-5 on the interaction node 2 with the BUSBAR on the cabinet simplifies power connection and management, removes unnecessary power components and cables, makes the overall structure of the server more compact and easier to maintain and manage, and eliminates the PSU (Power Supply Unit) space, further improving the space utilization inside the chassis body 1 and making the structure more compact.
It should be noted that, for clarity and conciseness, descriptions of well-known functions and structures of the interaction node 2, the GPU node 3, the fan module 4 and other related electronic devices are omitted; where a description is incomplete, reference may be made to the prior art.
In summary, the multi-GPU server based on a 2U chassis designed in this application can be equipped with 8 onboard GPUs, 2 CPUs and 10 PCIE expansion cards; the GPU, CPU and Switch chips and the memory modules are all cooled by cold plates, the PCIE cards 2-1 are cooled with fan assistance, the server height is the standard 2U, and the rear end uses BUSBAR centralized power supply, saving PSU space and further improving space utilization inside the chassis. The result is a server with strong computing power, ample heat dissipation and a compact structure, suitable for computation-heavy application scenarios such as deep learning and video processing.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.
It should also be noted that, in the present document, the terms "upper", "lower", "left", "right", "inner", "outer", etc. indicate an orientation or a positional relationship based on that shown in the drawings, and are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Moreover, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions, or order, and without necessarily being construed as indicating or implying any relative importance. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal.
The multi-GPU server based on a 2U chassis provided by the present application has been described in detail above; specific examples are used herein to illustrate the principles and embodiments of the application, and the description of the above examples is only meant to aid understanding of the application. Meanwhile, those of ordinary skill in the art may, in light of the ideas of the application, make changes to the specific implementation and the scope of application; in view of the above, the content of this specification should not be construed as limiting the application.

Claims (11)

1. The multi-GPU server based on the 2U chassis is characterized by comprising a chassis body, and interaction nodes, GPU nodes, a heat dissipation module, a hard disk, a power supply board, a CPU board, a GPU board and a switch board which are positioned in the chassis body; the CPU board is provided with at least one CPU chip, and the switch board is provided with at least one switch chip; wherein,
the heat dissipation module at least comprises a water cooling module, wherein the water cooling module is arranged above at least one of the CPU board, the GPU board and the switch board and is used for reducing the temperature of a chip on a corresponding board;
the height of the case body is 2U, and the height of the water cooling module is 10mm.
2. The 2U chassis-based multi-GPU server of claim 1, wherein the water cooling module comprises a CPU cold plate, a GPU cold plate, and a switch cold plate; the CPU cold plate is arranged on the CPU plate, the GPU cold plate is arranged on the GPU plate, and the switch cold plate is arranged on the switch plate.
3. The multi-GPU server based on the 2U chassis according to claim 2, wherein the CPU cold plate, the GPU cold plate and the switch cold plate each comprise a heat dissipation plate, a plurality of water channels are formed in the heat dissipation plate, a water inlet pipeline and a water outlet pipeline communicating with the water channels are arranged on the heat dissipation plate, the water inlet pipeline is used for communicating with a cooling medium input device, and the water outlet pipeline is used for communicating with a cooling medium output device; wherein,
the cooling medium input device inputs cooling medium into each water channel through the water inlet pipeline, the cooling medium exchanges heat with the chips on the corresponding plates contacted with the heat dissipation plates, and the cooling medium after heat exchange is discharged out of the water channels through the water outlet pipeline so as to reduce the temperature of the chips;
the water inlet pipeline is provided with at least one water outlet pipeline, the water outlet pipeline is provided with at least one water outlet pipeline, and the water inlet pipelines and the water outlet pipelines are correspondingly communicated with different water channels.
4. A 2U chassis based multiple GPU server according to claim 3, wherein the outlet pipeline of the CPU cold plate is in communication with the inlet pipeline of the switch cold plate.
5. The 2U chassis-based multi-GPU server of claim 1, wherein the chassis body comprises a partition plate, the partition plate divides a front end region of the chassis body into an upper space and a lower space, the upper space is provided with the hard disk, the power board and the CPU board, the lower space is provided with the GPU node, and an end region of the chassis body is provided with the interaction node;
wherein, the GPU node is provided with the GPU board; the switch board is arranged on the interaction node;
the height of the GPU node is 1U, and the height of the interaction node is close to the height of the case body.
6. The 2U chassis-based multi-GPU server of claim 5, wherein the heat dissipation module further comprises a fan module for dissipating heat from the interior of the chassis body; the fan module is located in the upper space.
7. The 2U chassis-based multi-GPU server of claim 1, wherein the GPU node is slidably connected to the chassis body and the interaction node is slidably connected to the chassis body.
8. The 2U chassis-based multi-GPU server of claim 5, wherein a plurality of PCIE cards and a bracket are installed on the interaction node, a first part of the PCIE cards are arranged along the height direction of the interaction node, and the remaining PCIE cards are laid on the bracket perpendicular to the height direction of the interaction node;
the number of PCIE cards in the first part is larger than that of PCIE cards in the remaining part.
9. The 2U chassis-based multi-GPU server of claim 8, wherein the bracket is provided with an OCP network card module.
10. The 2U chassis-based multi-GPU server of claim 1, wherein the interaction node is in high density connection with the GPU node.
11. The 2U chassis-based multi-GPU server of claim 1, wherein the interaction node is electrically connected to the BUSBAR.
CN202311402460.2A (priority date 2023-10-26, filing date 2023-10-26) Multi-GPU server based on 2U chassis, Pending, published as CN117631775A (en)

Priority Applications (1)

Application Number: CN202311402460.2A; Priority/Filing Date: 2023-10-26; Title: Multi-GPU server based on 2U chassis; Publication: CN117631775A (en)

Applications Claiming Priority (1)

Application Number: CN202311402460.2A; Priority/Filing Date: 2023-10-26; Title: Multi-GPU server based on 2U chassis; Publication: CN117631775A (en)

Publications (1)

CN117631775A, published 2024-03-01

Family

ID=90034649

Family Applications (1)

Application CN202311402460.2A: Multi-GPU server based on 2U chassis, published as CN117631775A (en), status Pending

Country Status (1)

Country Link
CN (1) CN117631775A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination