CN210742934U - Multi-GPU (graphics processing Unit) interconnection device - Google Patents

Multi-GPU (graphics processing unit) interconnection device

Info

Publication number
CN210742934U
CN210742934U (application number CN201921240118.6U)
Authority
CN
China
Prior art keywords
board
gpu
single board
gpus
design
Prior art date
Legal status
Active
Application number
CN201921240118.6U
Other languages
Chinese (zh)
Inventor
余华国
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201921240118.6U
Application granted
Publication of CN210742934U
Legal status: Active (current)

Landscapes

  • Combinations Of Printed Boards (AREA)

Abstract

The application discloses a multi-GPU interconnection apparatus. The apparatus includes a first board and a second board interconnected through a board-to-board connector. The number of GPUs is 4 or 6, the GPUs are evenly distributed between the first board and the second board, and at least 4 transmit ends of any GPU on the first board are connected to receive ends of GPUs on the second board. The apparatus improves the flexibility of multi-GPU interconnection design, enlarges the design space, and reduces the board size, which facilitates the design of an AI server.

Description

Multi-GPU (graphics processing unit) interconnection device
Technical Field
The present application relates to the field of server design technologies, and in particular, to a multi-GPU interconnection apparatus.
Background
With the development of big data analysis and artificial intelligence, Internet enterprises place ever-higher performance requirements on servers. Traditional servers that rely on CPU computation alone can no longer meet application demands, which has led to AI servers in which GPU modules are mounted under the CPU system. In an AI server, the CPU offloads simple, repetitive calculations to the GPU modules in order to handle the processing of massive data. However, the processing capacity of a single GPU is limited; to improve data processing efficiency, multiple GPU modules must process data simultaneously, which creates a demand for interconnecting the GPUs over an NVLINK bus. How to interconnect the GPUs is therefore an important technical problem in the AI server.
At present, the typical structure for interconnecting multiple GPUs in a server over an NVLINK bus is to interconnect all of the GPUs within the same board. Taking the SXM2-packaged 4-NVLINK-bus reference board as an example, the reference board uses the NVLINK bus to place 4 GPUs on the same board, and the board measures roughly 400 mm long by 150 mm wide.
However, in this structure for interconnecting multiple SXM2 GPUs in a server over an NVLINK bus, all GPUs sit on the same board, so the various interconnection forms required by different services must all be implemented within that one board. The interconnection channels among the GPUs therefore become complex, the flexibility of the GPU interconnection design is low, and the design space is small. Moreover, the existing NVLINK bus topology makes the board large; in AI server design in particular, a board designed with the existing structure is too large, and multi-GPU interconnection cannot be accommodated in the limited space of an AI server. In other words, the existing structure is not conducive to AI server design.
Summary of the Utility Model
The present application provides a multi-GPU interconnection apparatus to solve the problems of the prior-art NVLINK bus topology: low design flexibility, small design space, and excessive board size.
To solve the above technical problem, the embodiments of the present application disclose the following technical solution:
An apparatus for multi-GPU interconnection, the apparatus comprising: a first board and a second board, wherein the first board and the second board are interconnected through a board-to-board connector, the number of GPUs is 4 or 6, the GPUs are evenly distributed between the first board and the second board, and at least 4 transmit ends of any GPU on the first board are connected to receive ends of GPUs on the second board.
Optionally, the board grades of the first board and the second board are both M6 grade or higher.
Optionally, the total NVLINK trace length between a GPU on the first board and a GPU on the second board is less than or equal to 9.5 inches.
Optionally, the total number of vias in a single link between any GPU on the first board and any GPU on the second board is less than or equal to 4, and the via stub in the single link is less than or equal to 10 mil.
Optionally, the board-to-board connector is an Amphenol ExaMEZZ series connector.
Optionally, the first board is a server motherboard, an LC card, or a line card, and the second board is a server motherboard, an LC card, or a line card.
The technical solution provided by the embodiments of the present application can have the following beneficial effects:
The present application provides a multi-GPU interconnection apparatus that mainly comprises a first board, a second board, and a board-to-board connector joining the two boards. The number of GPUs is 4 or 6, the GPUs are evenly distributed between the first board and the second board, and at least 4 transmit ends of any GPU on the first board are connected to the receive ends of different GPUs on the second board. With this stacked-board topology, the apparatus distributes the GPUs evenly across two boards, which makes the NVLINK bus topology more flexible to design and enlarges the design space. In addition, because the GPU layout is shared by two boards that are stacked through the board-to-board connector, the lateral footprint shrinks and the planar size of each board is greatly reduced, so the apparatus can be used in an AI server and facilitates AI server design.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be obvious to those skilled in the art that other drawings can be derived from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of the multi-GPU interconnection apparatus according to an embodiment of the present application;
FIG. 2 is a side view of 4 GPUs interconnected across boards in an embodiment of the present application;
FIG. 3 is a side view of 6 GPUs interconnected across boards in an embodiment of the present application;
FIG. 4 is a schematic topology diagram of 4 GPUs interconnected across boards in an embodiment of the present application;
FIG. 5 is a schematic topology diagram of 6 GPUs interconnected across boards in an embodiment of the present application;
FIG. 6 is a diagram of the active test results for a prototype with 4 GPUs interconnected across boards.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For a better understanding of the present application, embodiments of the present application are explained in detail below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the multi-GPU interconnection apparatus according to an embodiment of the present application. As shown in FIG. 1, the multi-GPU interconnection apparatus in this embodiment mainly includes three parts: a first board, a second board, and a board-to-board connector, with the first board and the second board interconnected through the board-to-board connector. The apparatus is mainly used for cross-board interconnection of SXM2-type GPUs; the number of GPUs is 4 or 6, and these 4 or 6 GPUs are evenly distributed between the first board and the second board. The number of GPUs is even so that they can be distributed uniformly across the two boards. Each GPU has 6 RX receive ends and 6 TX transmit ends, and at least 4 transmit ends of any GPU on one board are connected to receive ends of GPUs on the other board.
In this embodiment, the GPUs are split evenly between the two boards, and the two boards are stacked by means of the board-to-board connector to realize the interconnection topology. This greatly reduces the planar size of each board, increases the flexibility of the GPU interconnection and of the overall design, and makes the apparatus well suited to servers with tight space constraints, such as AI servers.
In this embodiment, the first board and the second board are each a server motherboard, an LC card, or a line card. That is, the stacked-board design may be applied to two server motherboards, to two LC cards, or to two line cards.
In this embodiment, the board-to-board connector is an Amphenol ExaMEZZ series connector. Of course, a higher-performance connector with lower loss and better signal transmission than the Amphenol ExaMEZZ series may also be used. The connector has a characteristic impedance of 90 ohms and low power loss, its impedance is well matched to the boards, and its resonance point is suitable, which guarantees signal transmission quality and thus improves the signal transmission efficiency of the multi-GPU cross-board interconnection. The board-to-board connector joins the first board and the second board, thereby enabling communication between GPUs on different boards. FIG. 2 is a side view of 4 GPUs interconnected across boards in this embodiment, and FIG. 3 is a side view of 6 GPUs interconnected across boards in this embodiment. The BTB in FIGs. 2 and 3 is the board-to-board connector.
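As a rough illustration of the impedance-matching point above, the following minimal sketch (Python) computes the reflection at the step between a board trace and the 90-ohm connector. The 85-ohm trace impedance is borrowed from the via-impedance target given later in step A52 and is only an assumption for this example; the patent does not state the exact trace impedance here.

import math

def reflection_coefficient(z_from: float, z_to: float) -> float:
    """Voltage reflection coefficient at an impedance step."""
    return (z_to - z_from) / (z_to + z_from)

# Assumed 85-ohm board trace meeting the 90-ohm connector stated in the text.
gamma = reflection_coefficient(85.0, 90.0)
return_loss_db = -20 * math.log10(abs(gamma))
print(f"reflection coefficient {gamma:.3f}, return loss {return_loss_db:.1f} dB")

A reflection coefficient below 0.03 (return loss above roughly 30 dB) at this step is consistent with the statement that the connector impedance is well matched to the board.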
The stackup design of the first board and the second board in this embodiment is shown in Table 1 below:
TABLE 1 Board stackup design (the stackup table is an image in the original publication and is not reproduced here)
In this embodiment, a stacked-board design is adopted: the GPUs are distributed uniformly across the first board and the second board, and the two boards are interconnected through the board-to-board connector. The NVLINK bus design therefore changes from placing N GPUs on one board to placing N/2 GPUs on each board, which makes the design more flexible and enlarges the design space. Because two boards share the GPU layout and are stacked through the board-to-board connector, the lateral footprint shrinks and the size of each board is reduced. Taking the 4-NVLINK-bus case as an example, with the prior-art single-board layout the LC card measures 400 mm long by 150 mm wide, while with the design of this embodiment the LC card measures roughly 300 mm long by 160 mm wide. The lateral dimension of the board is thus greatly reduced, so the board can be used in an AI server and facilitates AI server design.
In addition, at least 4 transmit ends of any GPU on the first board are connected to receive ends of GPUs on the second board; that is, at least 4 transmit ends of any GPU on the first board are routed to different GPUs on the second board and connected to the receive ends of those different GPUs.
In this embodiment, the topology of 4 GPUs interconnected across boards is shown in FIG. 4, and the topology of 6 GPUs interconnected across boards is shown in FIG. 5. In FIGs. 4 and 5, board A and board B represent the two boards, the interconnections among GPUs on the same board are omitted, and the cross-board GPU connections are drawn with different line types. The board size can be adjusted flexibly to fit the chassis, but the cross-board multi-GPU topology remains unchanged, namely: at least 4 transmit ends of any GPU on the first board are connected to receive ends of GPUs on the second board. This topology allows the GPUs to be interconnected across boards with short NVLINK buses and gives a more reasonable layout of the NVLINK buses between GPUs on different boards, which reduces signal loss and improves the transmission efficiency of the cross-board interconnection signals.
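Because FIGs. 4 and 5 are not reproduced here, the sketch below uses a hypothetical lane assignment that merely satisfies the stated rule; it is not the actual topology of the figures. It models 4 GPUs (two per board, 6 NVLINK ports each) as a small Python structure and checks that at least 4 of every GPU's links cross the board-to-board connector.

# Illustrative only: the link counts below are assumptions, not the lane map of FIG. 4.
BOARD = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

# Each entry: GPU -> list of (peer GPU, number of NVLINK links used).
LINKS = {
    "A1": [("B1", 2), ("B2", 2), ("A2", 2)],
    "A2": [("B1", 2), ("B2", 2), ("A1", 2)],
    "B1": [("A1", 2), ("A2", 2), ("B2", 2)],
    "B2": [("A1", 2), ("A2", 2), ("B1", 2)],
}

def cross_board_links(gpu: str) -> int:
    """Count the NVLINK links of one GPU that go to the other board."""
    return sum(n for peer, n in LINKS[gpu] if BOARD[peer] != BOARD[gpu])

for gpu in LINKS:
    total = sum(n for _, n in LINKS[gpu])
    n_cross = cross_board_links(gpu)
    assert total <= 6, f"{gpu}: a GPU has only 6 NVLINK ports"
    assert n_cross >= 4, f"{gpu}: rule requires at least 4 cross-board links"
    print(f"{gpu}: {n_cross} of {total} links cross the board-to-board connector")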
In addition, in this embodiment the board grades of the first board and the second board are both M6 grade or higher; that is, the first board and the second board must use M6-grade or better material. Boards of this grade introduce little loss during signal transmission, which helps guarantee the transmission quality of the NVLINK signals and thereby improves the data transmission efficiency of the cross-board GPU interconnection.
In this embodiment, the total NVLINK trace length between a GPU on the first board and a GPU on the second board is less than or equal to 9.5 inches; that is, the total NVLINK trace length between GPUs on different boards does not exceed 9.5 inches. This trace length allows the GPUs to be interconnected across boards with short NVLINK buses, reduces the loss during signal transmission, and helps guarantee the quality of NVLINK signal transmission.
Further, in this embodiment, the total number of vias in a single link between any GPU on the first board and any GPU on the second board is less than or equal to 4, and the via stub in the single link is less than or equal to 10 mil.
It should be noted that the total number of vias in a single link in the apparatus of this embodiment is less than or equal to 4, where the vias in the single link include the vias under the board-to-board connector. The via stub in a single link is less than or equal to 10 mil; the via stub can be kept within 10 mil by a back-drilling process. By constraining the number of vias and the via stub of each link between two interconnected GPUs on the two boards, the embodiment avoids having too many vias or overly long via stubs, which helps reduce the number of impedance-mismatch points on the signal transmission channel, reduces reflections, improves the quality of the NVLINK signals, and thereby improves the signal transmission quality of the multi-GPU interconnection.
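The routing rules of this embodiment can be expressed as a simple design-rule check. In the sketch below, the limits (9.5-inch trace length, 4 vias, 10-mil stub) are the ones stated above, while the data-class fields and the example link are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class NvlinkLink:
    name: str
    trace_length_inch: float   # total NVLINK trace length between the two GPUs
    via_count: int             # vias in the link, including those under the connector
    max_via_stub_mil: float    # longest remaining stub after back-drilling

def check(link: NvlinkLink) -> list:
    """Return the list of violated routing rules (empty if the link passes)."""
    violations = []
    if link.trace_length_inch > 9.5:
        violations.append(f"trace {link.trace_length_inch} in > 9.5 in")
    if link.via_count > 4:
        violations.append(f"{link.via_count} vias > 4")
    if link.max_via_stub_mil > 10.0:
        violations.append(f"stub {link.max_via_stub_mil} mil > 10 mil")
    return violations

# Hypothetical link used only to exercise the check.
link = NvlinkLink("GPU_A1->GPU_B2", trace_length_inch=8.7, via_count=4, max_via_stub_mil=8.0)
print(check(link) or "link meets the stated routing rules")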
To further verify the feasibility of the apparatus, the topology of the multi-GPU interconnection apparatus in this embodiment is passively simulated against the NVLINK 2.0 passive standard. That is, passive simulation is performed on the cross-board multi-GPU interconnection topology; in this embodiment, the passive simulation may be carried out with ADS software.
Specifically, the method for passively simulating the topology of the multi-GPU interconnection apparatus comprises the following steps:
a1: and establishing a transmission line model and a via hole model and collecting a buckle plate connector model.
The transmission-line model is built with ADS software from the stackup design file, the relative dielectric constant of the board material, and the loss tangent of the board material. The via model is built with a three-dimensional electromagnetic field solver. The board-to-board connector model can be obtained directly from the connector's product specification.
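To illustrate why the relative dielectric constant and loss tangent feed the transmission-line model, the sketch below applies the common first-order approximation for dielectric loss, roughly 2.3 · f[GHz] · tanδ · √Dk dB per inch. The Dk/Df values and the Nyquist frequency are assumptions chosen to be representative of an M6-class laminate and ~25 GT/s NRZ signaling; none of these numbers appear in the patent.

import math

# Assumed laminate properties and Nyquist frequency (not from the patent).
dk, df = 3.5, 0.004          # relative dielectric constant and loss tangent
f_ghz = 12.9                 # ~25 GT/s NRZ Nyquist frequency

alpha_d = 2.3 * f_ghz * df * math.sqrt(dk)   # dB per inch, dielectric loss only
total = alpha_d * 9.5                        # at the 9.5-inch trace-length limit
print(f"~{alpha_d:.2f} dB/in dielectric loss, ~{total:.1f} dB over 9.5 in")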
A2: and building an NVLINK interface simulation link according to the transmission line model, the via hole model and the buckle connector model.
Once the transmission-line model, the via model, and the board-to-board connector model are available, they are cascaded in the ADS software to build the NVLINK interface simulation link.
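The actual cascade is built in ADS from the extracted models; the sketch below is only a simplified stand-in for the same idea. It chains ABCD matrices for two board trace segments, two vias approximated as shunt capacitances, and the connector approximated as a short 90-ohm line section, then converts the cascade to |S21|. Every element value is an illustrative assumption, not a value from the patent's models.

import numpy as np

Z0 = 85.0                      # assumed reference/trace impedance, ohms
f = 12.9e9                     # assumed Nyquist frequency, Hz
w = 2 * np.pi * f

def line(z0, length_in, loss_db_per_in, delay_ps_per_in=170.0):
    """ABCD matrix of a lossy transmission-line segment (assumed parameters)."""
    alpha = loss_db_per_in * length_in / 8.686           # attenuation in nepers
    beta = w * delay_ps_per_in * 1e-12 * length_in       # phase in radians
    g = alpha + 1j * beta
    return np.array([[np.cosh(g), z0 * np.sinh(g)],
                     [np.sinh(g) / z0, np.cosh(g)]])

def shunt_c(c_farads):
    """ABCD matrix of a via approximated as a shunt capacitance."""
    return np.array([[1, 0], [1j * w * c_farads, 1]])

# board A trace -> via -> connector section -> via -> board B trace
chain = [line(85, 4.0, 0.5), shunt_c(0.3e-12),
         line(90, 0.3, 0.4), shunt_c(0.3e-12), line(85, 4.0, 0.5)]
abcd = np.linalg.multi_dot(chain)

A, B, C, D = abcd.ravel()
s21 = 2 / (A + B / Z0 + C * Z0 + D)          # ABCD -> S21 with equal terminations
print(f"|S21| at {f / 1e9:.1f} GHz: {20 * np.log10(abs(s21)):.1f} dB")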
In this embodiment, the GPU prototype follows the same via rules as described above: at most 4 vias per link, including the vias under the board-to-board connector, and a via stub of at most 10 mil, controlled by a back-drilling process. Designing with these via-count and stub constraints avoids excessive vias and overly long stubs, reduces the number of impedance-mismatch points on the signal transmission channel, reduces reflections, and thereby improves the NVLINK signal quality and the signal transmission quality of the multi-GPU interconnection.
A3: and extracting S parameters from the NVLINK interface simulation link.
A4: and substituting the S parameter into an nvisim tool, and carrying out passive link simulation on the whole link to obtain a passive simulation result.
A5: and when the passive simulation result is unqualified, carrying out passive optimization on the laminated design of the single plates and the parameters of the via holes.
Specifically, the following passive optimization measures are adopted:
a51: shortening the line length on the layout wiring of the veneer laminated design, so that the line length is less than or equal to 9.5 inch;
a52: optimizing the impedance at the through hole by using a three-dimensional electromagnetic field solving tool to ensure that the impedance is 85 +/-10% Ohm;
a53: the length of the via hole is less than or equal to 50mil by adjusting the routing layer, and the stub of the via hole is less than or equal to 10mil by using the back drilling process.
Namely, the routing layer is adjusted to ensure that the length of the effective via hole is not more than 50mil and the stub of the via hole is not more than 10 mil. Until the result of the topology passive simulation is qualified.
The whole link in this embodiment includes only the GPUs on the two boards that are connected through the board-to-board connector. S-parameter measurement is one of the basic tools of RF and high-speed signal design: it treats the measured component as a black box and describes the behavior of electronic components at different frequencies. In the passive simulation, extracting the S-parameters of the simulation link makes it possible to judge how strongly the cross-board multi-GPU topology attenuates and reflects the input energy, so the quality of the simulated link can be analyzed more accurately and fed back into the board stackup design in a timely and effective way, which helps improve design efficiency.
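A minimal sketch of this attenuation/reflection judgment, using scikit-rf in place of the ADS/nvisim flow described above: it reads a hypothetical Touchstone file for one link and reports insertion loss and return loss at an assumed Nyquist frequency against an assumed pass limit. The file name, the frequency, and the 12 dB limit are placeholders, not the NVLINK 2.0 passive mask.

import numpy as np
import skrf as rf

# 'gpu_link.s2p' is a hypothetical Touchstone file for one single-ended link.
link = rf.Network("gpu_link.s2p")
f_nyq = 12.9e9                              # assumed ~25 GT/s NRZ Nyquist frequency
idx = int(np.argmin(np.abs(link.f - f_nyq)))

il_db = -link.s_db[idx, 1, 0]               # insertion loss at Nyquist
rl_db = -link.s_db[idx, 0, 0]               # return loss at Nyquist
print(f"IL {il_db:.1f} dB, RL {rl_db:.1f} dB at {link.f[idx] / 1e9:.1f} GHz")
print("attenuation OK" if il_db < 12.0 else "attenuation too high (assumed 12 dB limit)")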
To further verify the feasibility of the topology in the apparatus, once the passive simulation result passes, a PCB is fabricated, populated, and assembled according to the topology to obtain a GPU prototype, and active test verification is performed on the GPU prototype according to the NVLINK bus active test standard.
In this embodiment, the method for active test verification of the GPU prototype is the same as in the prior art, namely NVIDIA's active test method for the SXM2-packaged 4-NVLINK-bus reference board, and is not described in detail here. Active test verification determines whether the multi-NVLINK-bus topology meets the standard at the physical layer, and thus whether the product is ready to ship, which further improves design efficiency and product quality.
In this embodiment, FIG. 6 shows the active test results for a GPU prototype with 4 GPUs interconnected across boards. In the figure, Data Eye Results are the data eye-diagram test results, Y EOM is the eye-height margin, and X EOM is the eye-width margin.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. An apparatus for multi-GPU interconnection, the apparatus comprising: a first board and a second board, wherein the first board and the second board are interconnected through a board-to-board connector, the number of GPUs is 4 or 6, the GPUs are evenly distributed between the first board and the second board, and at least 4 transmit ends of any GPU on the first board are connected to receive ends of GPUs on the second board.
2. The apparatus of claim 1, wherein the board grades of the first board and the second board are both M6 grade or higher.
3. The apparatus of claim 1, wherein the total NVLINK trace length between a GPU on the first board and a GPU on the second board is no greater than 9.5 inches.
4. The apparatus of claim 1, wherein the total number of vias in a single link between any GPU on the first board and any GPU on the second board is less than or equal to 4, and the via stub in the single link is less than or equal to 10 mil.
5. The apparatus of claim 1, wherein the board-to-board connector is an Amphenol ExaMEZZ series connector.
6. The apparatus of any one of claims 1-5, wherein the first board is a server motherboard, an LC card, or a line card, and the second board is a server motherboard, an LC card, or a line card.
CN201921240118.6U 2019-08-02 2019-08-02 Multi-GPU (graphics processing Unit) interconnection device Active CN210742934U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201921240118.6U CN210742934U (en) 2019-08-02 2019-08-02 Multi-GPU (graphics processing Unit) interconnection device

Publications (1)

Publication Number Publication Date
CN210742934U true CN210742934U (en) 2020-06-12

Family

ID=71008119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201921240118.6U Active CN210742934U (en) 2019-08-02 2019-08-02 Multi-GPU (graphics processing Unit) interconnection device

Country Status (1)

Country Link
CN (1) CN210742934U (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492291A (en) * 2022-04-06 2022-05-13 飞腾信息技术有限公司 Design method and device of high-speed serial link, electronic equipment and storage medium
CN114492291B (en) * 2022-04-06 2022-07-15 飞腾信息技术有限公司 Method and device for designing high-speed serial link, electronic equipment and storage medium

Legal Events

Date Code Title Description
GR01 Patent grant