CN116471190A - Method for evaluating server topology architecture and electronic device - Google Patents

Method for evaluating server topology architecture and electronic device

Publication number
CN116471190A
Authority
CN
China
Prior art keywords
topology
server
evaluating
connection
equipment
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310378242.3A
Other languages
Chinese (zh)
Inventor
龙善敏
蔡炎松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanhu Research Institute Of Electronic Technology Of China
Original Assignee
Nanhu Research Institute Of Electronic Technology Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nanhu Research Institute Of Electronic Technology Of China
Priority to CN202310378242.3A
Publication of CN116471190A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • H04L67/303 Terminal profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for evaluating a server topology architecture and an electronic device. The method for evaluating a server topology architecture comprises the following steps: acquiring a first topology architecture, wherein the first topology architecture comprises a first device and a second device; generating a topology configuration file of the first topology architecture so that the first device and the second device establish a communication connection; and simulating the transfer and processing of a message from the first device to the second device, and acquiring server topology architecture evaluation parameters of the first topology architecture.

Description

Method for evaluating server topology architecture and electronic device
Technical Field
The present invention relates to the field of server technologies, and in particular, to a method and an electronic device for evaluating a server topology architecture.
Background
With the development of artificial intelligence technology, the demand for computing power keeps increasing. For example, very large unstructured data sets of speech and images, such as the image data set ImageNet, require model training on single-machine multi-card or multi-machine multi-card servers, and how to raise the computing power of such servers is an urgent concern.
Currently, mainstream deep learning frameworks already encapsulate a multi-GPU distributed data parallel training function (DistributedDataParallel, DDP), and the requirements for communication between GPU cards are increasing. In recent years, high-performance computing cards of many kinds have emerged one after another, and the demand for heterogeneous fusion computing that combines a CPU with high-performance computing cards such as GPU cards, FPGAs, and brain-like computing cards is growing. To meet the low-latency, high-bandwidth, multi-card-type inter-card communication requirements of high-performance heterogeneous fusion computing, single-machine multi-card and multi-machine multi-card server topologies are becoming more and more complex.
With such complex topologies, it becomes increasingly difficult to guarantee that a server topology performs well and meets the computing demand, and doing so requires considerable knowledge of server technology. However, purchases of high-performance servers are typically led by university teachers and scientific researchers rather than server specialists, so a buyer typically needs several months from raising the requirement to settling the specification and configuration of the server. During those months, the buyer has to consult the server manufacturer's pre-sales staff several times, select a suitable high-performance computing card, CPU, and network card according to research and development requirements and budget, determine the topology architecture of the server, and finally determine the configuration specification of the server, which is a highly specialized process.
Therefore, an easy-to-use tool is needed to evaluate the merits of a server topology architecture and determine whether the server topology meets the computing power requirement, thereby improving server selection efficiency and purchase compliance.
Disclosure of Invention
The invention aims to provide a method and an electronic device for evaluating a server topology architecture that can serve as an easy-to-use tool for evaluating the merits of a server topology architecture and determining whether the server topology meets the computing power requirement, thereby improving server selection efficiency and purchase compliance.
According to an aspect of the present invention, at least one embodiment provides a method for evaluating a server topology architecture, comprising: acquiring a first topology architecture, wherein the first topology architecture comprises a first device and a second device; generating a topology configuration file of the first topology architecture so that the first device and the second device establish a communication connection; and simulating the transfer and processing of a message from the first device to the second device, and acquiring server topology architecture evaluation parameters of the first topology architecture.
According to another aspect of the present invention, at least one embodiment also provides an electronic device for evaluating a server topology architecture, comprising: a processor adapted to implement instructions; and a memory adapted to store a plurality of instructions adapted to be loaded by the processor and to perform the above-described method for evaluating a server topology.
According to another aspect of the present invention, at least one embodiment also provides a system for evaluating a server topology architecture, comprising the above-described electronic device for evaluating a server topology architecture.
According to another aspect of the present invention, at least one embodiment also provides a computer-readable non-volatile storage medium storing computer program instructions that, when executed by the computer, perform the above-described method for evaluating a server topology according to the present invention.
According to embodiments of the invention, when the buyer's service scenario changes, cards can be quickly plugged in on the system for evaluating a server topology architecture to construct a new topology, and the latency of the new topology is evaluated through a simulation test. If the overall latency is substantially lower than that of the original topology, the adjustment meets the latency goal; if the overall latency is greater than or roughly equal to that of the original topology, there is no need to open the chassis and re-seat cards. Unnecessary and time-consuming hardware adjustments are thereby reduced, and server selection efficiency and purchase compliance are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a single server topology according to an embodiment of the invention;
FIG. 2 is a diagram of a NUMA server topology according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an application environment for evaluating server topology according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an electronic device hardware environment for evaluating a server topology according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an electronic device software architecture for evaluating a server topology according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a topology design page in accordance with an embodiment of the present invention;
FIG. 7 is a schematic illustration of a simulation demonstration in accordance with an embodiment of the present invention;
FIG. 8 is a flow chart of a method for evaluating a server topology according to an embodiment of the invention;
FIG. 9 is a schematic diagram of a topology traffic simulation according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A high-performance server is a high-value, technically complex, and highly scenario-dependent device. In a single server topology architecture, a computing card such as a GPU is generally regarded as an external device, and the PCIe protocol, defined by PCI-SIG, standardizes the communication between all external devices and the CPU. A typical PCIe system configuration is shown in FIG. 1. External devices are connected to the CPU and to PCIe Switches through PCIe slots. Communication among devices under the same PCIe Switch is switched by that PCIe Switch, while communication among devices directly connected to the CPU is switched by the CPU; different connection modes have different communication latency and bandwidth.
In FIG. 1, the Root Complex (RC) is a collection of resources such as the interrupt controller, power management controller, memory controller, and error detection and reporting logic, and is essentially integrated in the CPU. The RC includes an internal bus (Host Bridge), which represents bus number 0 in the overall PCIe tree structure. The RC initiates external transaction requests on behalf of the processor, sends and receives packets at its ports, and transfers received packets to memory.
On the basis of a single server topology architecture, the current mainstream server generally adopts a NUMA (Non-uniform memory access) architecture, as shown in FIG. 2. The NUMA architecture divides the CPU into different groups (called nodes), each node having its own independent memory access address, memory, and directly connected PCIe switches and PCIe slots. If node0 accesses the memory under node1, communication is required via the UPI connection between the nodes. Both bandwidth and latency of the cross-CPU communication are worse than the communication performance under the same CPU.
It can be seen that NUMA, PCIe Switches, add-on cards, and so on together constitute a server topology architecture. A server topology architecture can hold several GPU cards, and because server topologies differ, cards in different physical positions communicate over different links, with large differences in performance, as shown in Table 1:
Table 1: Six communication modes of NVIDIA cards and GPUs
From the foregoing, it is clear that determining a high-performance server topology requires considerable knowledge of server technology. Because different artificial intelligence algorithm models place different demands on indexes such as computing power, CPU-to-card communication latency, and card-to-card communication latency, an easy-to-use tool is urgently needed to evaluate the merits of a server topology architecture and determine whether the server topology meets the computing power requirement, thereby improving server selection efficiency and purchase compliance.
Based on the above, the invention provides a system for evaluating a server topology architecture. The system presets training data for large models such as Bert-large and GPT. During simulation, the preset data is called up, data of a scale matching the model data is sent with the first device as the starting point and the second device as the end point, and a path auto-generation algorithm based on the server topology architecture simulates the transfer of a message from one device to the other together with the processing performed by each device. The process is displayed as an animation on the software interface so that the user can see the transmission path of the traffic directly. When the traffic exceeds the bandwidth of the path or the processing capacity of a device, a communication bottleneck mark is displayed during the traffic simulation, so the system can be used as auxiliary software for server selection.
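The bottleneck rule described above (traffic larger than the path bandwidth triggers a bottleneck mark) can be illustrated with a minimal sketch. The `Link` class, the `find_bottlenecks` function, the device names, and the Gbps unit are all hypothetical and are not taken from the patent; device processing capacity is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One hop on a simulated path (hypothetical structure)."""
    src: str
    dst: str
    bandwidth_gbps: float  # assumed unit, not specified in the source


def find_bottlenecks(path_links, traffic_gbps):
    """Return the links whose bandwidth is smaller than the simulated traffic."""
    return [link for link in path_links if traffic_gbps > link.bandwidth_gbps]


# Illustrative path with made-up bandwidth figures.
path = [
    Link("NIC0", "PCIeSwitch0", 25.0),
    Link("PCIeSwitch0", "GPU0", 32.0),
]

for link in find_bottlenecks(path, traffic_gbps=30.0):
    print(f"bottleneck: {link.src} -> {link.dst}")
```

A user interface could attach the bottleneck mark to each returned link while the animation plays.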
The system for evaluating a server topology architecture can be used to select a server topology architecture and to determine the model and number of high-performance computing cards needed to support model training and inference efficiently. It offers a visual display of the server topology and of the simulation process, is simple and easy to use, and improves server selection efficiency and purchase compliance. As shown in FIG. 3, the system for evaluating a server topology architecture includes a hardware environment and a network environment, the hardware environment including an electronic device 100 for evaluating a server topology architecture and a server 200.
Here, the electronic device 100 for evaluating a server topology, as shown in fig. 4, includes: a processor 402; and a memory 404 configured to store computer program instructions adapted to be loaded by the processor and to perform the method for evaluating a server topology developed by the present invention (described in more detail below). The processor 402 may be any suitable processor, for example, implemented as a central processing unit, a microprocessor, an embedded processor, etc., and may be implemented in an X86, ARM, etc. architecture. The memory 404 may be any of a variety of suitable memory devices, such as non-volatile memory devices, including but not limited to magnetic memory devices, semiconductor memory devices, optical memory devices, etc., and may be arranged as a single memory device, an array of memory devices, or a distributed memory device, as embodiments of the present invention are not limited in this regard.
The software environment of the electronic device 100 for evaluating a server topology architecture is shown in FIG. 5 and comprises four modules: a topology design module, a component management module, a template management module, and a simulation module. The topology design module lets the user connect and combine device components graphically on a topology design page to build a component topology, as shown in FIG. 6; the module automatically records the graphical connection relations and converts them into a configuration file, and it can also restore the server on-board component topology map from a configuration file. The template management module lets the user save a self-edited topology as a template and add, delete, and search templates; it also supports exporting and importing templates to make template sharing easy. The component management module lets the user add new devices, modify the parameters of existing devices, and delete devices the user has no use for. The simulation module, as shown in FIG. 7, lets the user simulate traffic flow: the user designates the starting device (i.e., the first device), the target device (i.e., the second device), and the traffic size, and then starts the traffic simulation. The simulation determines the traffic path, identifies path bottlenecks, and shows the communication process as an animation; single-data-source traffic simulation is supported, and bottleneck evaluation is performed automatically.
It will be appreciated by those of ordinary skill in the art that the above-described hardware structure and software architecture of the electronic device 100 for evaluating a server topology architecture are merely illustrative and do not limit the structure or architecture of the device. For example, the electronic device 100 may include more or fewer components than shown in FIG. 4, such as a transmission device for receiving or transmitting data via a network.
The electronic device 100 for evaluating a server topology architecture may be connected to the server 200 through a network. The network includes wired networks and local area networks. The electronic device 100 may operate the server 200 through corresponding instructions so that data can be read, changed, added, and so on. There may be one or more servers 200, and the server 200 may include a plurality of processing nodes that appear externally as a single server. Optionally, the electronic device 100 may also send the acquired data to the server 200 so that the server 200 performs the method for evaluating a server topology architecture according to the present invention.
Based on the above operating environment, at least one embodiment of the present invention provides a method for evaluating a server topology architecture. The method may be loaded and executed by the processor 402 of the electronic device 100 and may at least serve as an auxiliary tool for evaluating the merits of a server topology architecture and determining whether the server topology meets the computing power requirement, thereby helping the buyer improve server selection efficiency and purchase compliance. FIG. 8 shows a flowchart of the method. It should be noted that the steps shown in the flowchart may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. The method may include the following steps:
Step S802: acquire a first topology architecture, wherein the first topology architecture comprises a first device and a second device;
Step S804: generate a topology configuration file of the first topology architecture so that the first device and the second device establish a communication connection;
Step S806: simulate the transfer and processing of a message from the first device to the second device, and acquire server topology architecture evaluation parameters of the first topology architecture.
In this way, an auxiliary tool is provided that lets a user construct the high-performance server topology the user requires, determine the model and number of high-performance computing cards, and determine the slot positions of those cards. The auxiliary tool also provides a simulation function that dynamically simulates the communication relations between computing cards and determines, through simulation, the bandwidth and latency between computing cards and the bottlenecks on a communication link, thereby improving the buyer's server selection efficiency and purchase compliance.
In step S802, a first topology architecture is acquired. The first topology architecture includes a plurality of components and their connection relations, and the plurality of components include a first device serving as the simulation start node and a second device serving as the end node. For example, a user opens the auxiliary tool software, enters the topology design page for evaluating a server topology architecture, starts editing the required high-performance server topology, selects the models and number of components, and determines the slot positions of the components; the system then acquires the first topology architecture edited by the user from the work area of the topology design page.
Through analysis and abstraction, the inventors identified the devices in a high-performance server topology that affect performance and can be configured. They fall mainly into four kinds: CPU (Central Processing Unit), memory, external computing cards [GPU (graphics processing unit) cards, FPGAs (Field Programmable Gate Arrays), brain-like computing cards], and external network cards (Ethernet cards, InfiniBand network cards). The devices are connected to one another by connection devices such as PCIe Switches or NVSwitches; several PCIe Switches may be connected in series, or CPUs may be connected in parallel. Slots are provided at the ends of a connection device; once the specification of the connection device is determined, the number of slots is determined, and external computing cards and external network cards are inserted into the slots. The invention therefore divides the components used in evaluating a server topology architecture into five types: CPU, memory, external computing card, external network card, and connection device. Each type of device comes in various specifications, and after the connection devices (PCIe Switch, NVSwitch, and CPU) are selected, the specification and number of PCIe slots can be determined. PCIe slots typically come in several specifications such as PCIe x16 and PCIe x4, and high-performance cards typically use PCIe x16.
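As a minimal sketch of the five component types described above, the following models them as an enumeration and derives the available slot count from the selected connection devices. The names `ComponentType`, `ComponentSpec`, and `total_slots`, the model names, and the slot numbers are all hypothetical illustrations, not the patent's data model.

```python
from dataclasses import dataclass
from enum import Enum


class ComponentType(Enum):
    """The five component types named in the text."""
    CPU = "cpu"
    MEMORY = "memory"
    COMPUTE_CARD = "compute_card"
    NETWORK_CARD = "network_card"
    CONNECTION_DEVICE = "connection_device"


@dataclass(frozen=True)
class ComponentSpec:
    model: str            # hypothetical model name
    type: ComponentType
    slots: int = 0        # assumption: only connection devices expose slots here


def total_slots(specs):
    """Sum the slots contributed by the selected connection devices."""
    return sum(
        spec.slots
        for spec in specs
        if spec.type is ComponentType.CONNECTION_DEVICE
    )


# Illustrative catalog with made-up slot counts.
catalog = [
    ComponentSpec("ExampleCPU", ComponentType.CPU),
    ComponentSpec("ExampleSwitch", ComponentType.CONNECTION_DEVICE, slots=4),
    ComponentSpec("ExampleGPU", ComponentType.COMPUTE_CARD),
]
print(total_slots(catalog))
```

The text also counts the CPU among the devices that determine slots; a fuller model would let a CPU spec carry a slot count as well.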
Once the invention has abstracted the various devices in a server topology into components, the user can use the auxiliary tool to combine components and build the server on-board topology in software, component by component. For example, the user constructs the required high-performance server topology on the topology design page (a graphical interface), determines the models and number of high-performance computing cards, and determines their slot positions. As shown in FIG. 6, the topology design page includes four parts: a component area, a work area, a parameter area, and a function navigation area. The component area displays the five selectable component types: CPU, memory, (external) computing card, (external) network card, and connection device. Clicking a component type pops up a device selection box that lists all selectable device models; after clicking a device model, the user can drag the device into the work area. The work area displays the server topology being edited, and the user can drag components to adjust their positions on the canvas. The parameter area is used to configure device parameters. The function navigation area lets the user switch from the current interface to the template management and component management interfaces, and it supports triggering a simulation based on the current topology.
Before designing a topology on the topology design page, the user can also import a topology template. A server manufacturer can design templates based on the server models it sells and publish them to users; the user can import the topology templates of candidate server models and adjust the configuration starting from the imported template, which improves working efficiency. The invention provides a large number of topology templates of servers on sale, so that users can grasp the on-board topologies of typical servers directly; it helps the buyer pick up the key knowledge of high-performance servers quickly in a graphical way and determine through simulation whether a given server model meets the buyer's needs.
In step S804, a topology configuration file of the first topology architecture is generated so that the first device and the second device establish a communication connection. For example, a topology configuration file is generated based on the plurality of components and their connection relations. The topology configuration file includes the type, position, connection, and parameter information of each component, where the connection information includes the device IDs and port IDs at the two ends of a connection line and the parameter information includes bandwidth and latency. A simulation is then triggered based on the topology configuration file so that the first device and the second device establish a communication connection. Example:
S1: While the user edits the first topology architecture in the work area of the topology design page, each component (including the first device serving as the simulation start node and the second device serving as the end node) is dragged into the work area, and the system automatically assigns the component a globally unique ID using a UUID algorithm. The user's selection and dragging determine the component type and the position coordinates of the component on the canvas; the system automatically records this component information and organizes it in JSON format.
Component ID:
S2: The user can use the mouse to draw a line between two different devices in the work area of the topology design page, thereby configuring the connection relation between the devices. For each such operation, the system automatically adds a connection configuration. The connection configuration records the device IDs and connection port IDs at both ends of the line; by default, a device has four ports (up, down, left, right).
Component ID:
S3: The user can click a specific device on the topology design page to pop up the parameter area on the right side of the interface, where the parameters of the device can be configured. Each device component has default device parameters, and the system allows the user to adjust parameters within the legal range of the device; the most relevant parameters are bandwidth and latency. After the user selects the parameters, the system automatically generates the parameter configuration information.
Component ID:
S4: The system thus automatically generates four kinds of information for each component: type, position, connection, and parameter. The information of all components together forms a complete topology configuration file.
S5: The system simulates the first topology architecture edited by the user so that the communication connection between the first device and the second device is formally established. The system also presets training data for large models such as Bert-large and GPT, and the preset data can be called up during simulation.
In step S806, the transfer and processing of a message from the first device to the second device is simulated, and server topology architecture evaluation parameters of the first topology architecture are acquired. For example, a connection array of the first device is obtained, where the connection array includes each third device directly connected to the first device and the corresponding delay; a reachable device list of the first device is expanded based on the connection array, where the reachable device list includes a communication path, an access mark, and a probe mark; and the reachable device list is traversed to find the communication path with the least latency to the second device. That is, the invention can simulate the traffic characteristics between the first device and the second device: the user designates two devices that communicate with each other, with the first device (such as an external network card) as the starting point and the second device (such as an external computing card) as the end point.
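Finding the least-latency path can be illustrated with the standard Dijkstra algorithm over a delay-weighted graph. This is a swapped-in textbook technique used as a sketch, not necessarily the patent's own procedure; the function name `min_latency_path`, the graph layout, the device names, and the delay values are hypothetical.

```python
import heapq


def min_latency_path(graph, start, target):
    """graph maps a device to [(neighbor, delay), ...].

    Returns (total_delay, path) for the least-latency path, or None if the
    target cannot be reached.
    """
    queue = [(0, start, [start])]
    settled = {}
    while queue:
        delay, device, path = heapq.heappop(queue)
        if device in settled:
            continue  # a shorter route to this device was already settled
        settled[device] = delay
        if device == target:
            return delay, path
        for neighbor, link_delay in graph.get(device, []):
            if neighbor not in settled:
                heapq.heappush(
                    queue, (delay + link_delay, neighbor, path + [neighbor])
                )
    return None


# Illustrative topology with made-up delays.
graph = {
    "NIC0": [("PCIeSwitch0", 3)],
    "PCIeSwitch0": [("NIC0", 3), ("GPU0", 3), ("CPU0", 2)],
    "CPU0": [("PCIeSwitch0", 2), ("GPU0", 10)],
    "GPU0": [("PCIeSwitch0", 3), ("CPU0", 10)],
}
result = min_latency_path(graph, "NIC0", "GPU0")
print(result)  # (6, ['NIC0', 'PCIeSwitch0', 'GPU0'])
```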
Optionally, as shown in FIG. 9, acquiring the connection array of the first device may include the following. First, each third device directly connected to the first device is marked; that is, each device is marked with its device type, using an Endpoint mark for terminal devices and a Network mark for network devices. As a basic principle, terminal devices generate traffic while network devices do not generate traffic autonomously, so the two ends of a communication are two terminal devices. The simulation first automatically finds the communication path between the two devices based on the connection relations and obtains the delay between the first device and each third device directly connected to it; the system pre-stores the connection information between the first device and those third devices, and parameters such as the delay and bandwidth of a third device can be obtained from its type. The connection delay is then used as the distance to determine the distance information of each direct connection between the first device and a third device, which yields the connection array of the first device. Taking FIG. 7 as an example, the connection array ConnectDeviceArray of the device GPU0 is [(PCIeSwitch0, 3)], where PCIeSwitch0 is the device directly connected to GPU0 and 3 is the delay between GPU0 and PCIeSwitch0.
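A minimal sketch of building such a connection array from stored link records follows. The function name `build_connect_device_array` and the `(device_a, device_b, delay)` link layout are assumptions; only the GPU0 to PCIeSwitch0 delay of 3 comes from the example in the text, and the other links are made up.

```python
def build_connect_device_array(device, links):
    """Collect (neighbor, delay) pairs for every link that touches `device`.

    `links` is an iterable of (device_a, device_b, delay) tuples, a
    hypothetical stand-in for the pre-stored connection information.
    """
    array = []
    for a, b, delay in links:
        if a == device:
            array.append((b, delay))
        elif b == device:
            array.append((a, delay))
    return array


links = [
    ("GPU0", "PCIeSwitch0", 3),   # delay taken from the example in the text
    ("GPU1", "PCIeSwitch0", 3),   # assumed
    ("PCIeSwitch0", "CPU0", 2),   # assumed
]
print(build_connect_device_array("GPU0", links))  # [('PCIeSwitch0', 3)]
```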
Among the five component types available for topology selection, from a communication perspective the memory, the computing card, the network card and the CPU are terminal devices and can carry the Endpoint mark. The CPU and the connection device are network devices: they connect different terminal devices and forward traffic between them, so they can carry the Network mark. Because CPUs are increasingly highly integrated, a CPU is both a terminal device and a network device and can carry both the Endpoint mark and the Network mark at the same time.
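One possible way to represent these marks, given here as a hedged sketch rather than the implementation used by the invention, is a bit-flag type in which the CPU holds both marks. The names `DeviceMark`, `MARK_BY_TYPE` and the component type strings are hypothetical.

```python
from enum import Flag, auto


class DeviceMark(Flag):
    """Device type marks; a device may carry one or both."""
    ENDPOINT = auto()  # terminal device: generates traffic
    NETWORK = auto()   # network device: forwards traffic


# Hypothetical mapping from the five component types to their marks.
MARK_BY_TYPE = {
    "memory": DeviceMark.ENDPOINT,
    "computing_card": DeviceMark.ENDPOINT,
    "network_card": DeviceMark.ENDPOINT,
    "connection_device": DeviceMark.NETWORK,
    "cpu": DeviceMark.ENDPOINT | DeviceMark.NETWORK,
}


def can_generate_traffic(component_type):
    """True if the component may be an end of a communication."""
    return bool(MARK_BY_TYPE[component_type] & DeviceMark.ENDPOINT)


def can_forward(component_type):
    """True if the component may relay traffic between other devices."""
    return bool(MARK_BY_TYPE[component_type] & DeviceMark.NETWORK)


print(can_generate_traffic("cpu"), can_forward("cpu"))  # True True
```

Using flags keeps the dual role of the CPU explicit without a separate component type.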
Optionally, the third device includes a plurality of fourth devices, and expanding the reachable device list of the first device based on the connection array may include: performing sub-tree access on the plurality of fourth devices directly connected to the first device, and adding an access mark and an ascertaining mark to each record in the reachable device list of the first device, where the access mark is either unvisit or visited and the ascertaining mark is either notfound or found; and acquiring all communication paths of the first device and adding them to the reachable device list of the first device.
That is, the search spreads outward from the starting node: beginning at the first device, the reachable device list CanVisitList of the first device is expanded continuously. The initial CanVisitList is built from the ConnectDeviceArray of the first device; compared with a ConnectDeviceArray entry, each CanVisitList record additionally holds a communication path, an access mark and an ascertaining mark. Continuing with fig. 7, GPU0 is selected as the starting node (the first device) and GPU1 as the target node (the second device), and the initial CanVisitList is [(PCIeSwitch0, 3, [GPU0, PCIeSwitch0], unvisit, notfound)]. Here the communication path from GPU0 to PCIeSwitch0 is [GPU0, PCIeSwitch0]. The access mark is unvisit (not visited), which indicates that sub-tree access has not yet been performed on this node; once all nodes connected to the device have been accessed, the mark is set to visited. Because a terminal device has no forwarding capability, its record is marked visited by default when it is added to the CanVisitList. The ascertaining mark is notfound (not ascertained), which indicates that the shortest path between GPU0 and PCIeSwitch0 has not yet been ascertained; once it has been, the mark is set to found (ascertained).
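The step from a connection array to the initial CanVisitList can be sketched as below. This is a minimal illustration under stated assumptions: the function name `initial_can_visit_list`, the `forwards` predicate and the record layout (device, distance, path, access mark, ascertaining mark) are chosen for this sketch and are not taken from the original.

```python
def initial_can_visit_list(start, connect_array, forwards):
    """Turn ConnectDeviceArray entries into CanVisitList records.

    Each record is (device, distance, path, access_mark, ascertaining_mark).
    `forwards(device)` is an assumed predicate that is True for devices
    carrying the Network mark.
    """
    records = []
    for neighbor, delay in connect_array:
        # A terminal device cannot forward traffic, so there is nothing to
        # expand behind it and it is marked visited from the outset.
        access = "unvisit" if forwards(neighbor) else "visited"
        records.append((neighbor, delay, [start, neighbor], access, "notfound"))
    return records


# Assumed set of forwarding devices for the GPU0 example.
NETWORK_DEVICES = {"PCIeSwitch0"}

print(initial_can_visit_list("GPU0", [("PCIeSwitch0", 3)],
                             lambda d: d in NETWORK_DEVICES))
# [('PCIeSwitch0', 3, ['GPU0', 'PCIeSwitch0'], 'unvisit', 'notfound')]
```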
Optionally, traversing the reachable device list may include: repeatedly selecting notfound and unvisit nodes from the reachable device list and traversing them to find the communication path with the minimum delay to the second device; and generating the server topology evaluation parameters for the first topology. Continuing with fig. 7, the not-yet-ascertained node with the smallest distance is selected from the current CanVisitList and becomes a path-ascertained node, so the CanVisitList becomes [(PCIeSwitch0, 3, [GPU0, PCIeSwitch0], unvisit, found)]. If the access mark of that node is unvisit, each node connected to it is traversed in turn and a new distance is calculated for each. The distance of the node currently being traversed is the sum of the distance from the starting point to the just-ascertained node and the distance from the just-ascertained node to the current node: distance(start, current) = distance(start, just_ascertained) + distance(just_ascertained, current). If the newly traversed node is not yet in the CanVisitList, it is added. If it is already in the CanVisitList and its ascertaining mark is found, processing continues with the next node. If its ascertaining mark is notfound and the new distance is smaller than the recorded distance, the recorded path is replaced with the new path; otherwise, the new distance is not smaller than the recorded distance, and processing continues with the next node. After the traversal, the access mark of the just-ascertained node is changed to visited. The CanVisitList then becomes [(PCIeSwitch0, 3, [GPU0, PCIeSwitch0], visited, found), (IB0, 7, [GPU0, PCIeSwitch0, IB0], visited, notfound), (CPU0, 6, [GPU0, PCIeSwitch0, CPU0], unvisit, notfound)].
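The selection and update rules above, repeated until the target is ascertained or no notfound node remains, might be sketched as follows. This is a simplified illustration, not the patented implementation: the function name `find_min_delay_path`, the dictionary-based CanVisitList and the example topology are assumptions. Only the delays around PCIeSwitch0 follow the fig. 7 values quoted in the text; the remaining links are invented for the sketch, so the resulting path is not claimed to match fig. 7.

```python
def find_min_delay_path(start, target, links, network_devices):
    """Return (total_delay, path) from start to target, or None if unreachable."""
    # Undirected adjacency map: device -> [(neighbor, delay), ...].
    adjacency = {}
    for a, b, delay in links:
        adjacency.setdefault(a, []).append((b, delay))
        adjacency.setdefault(b, []).append((a, delay))

    # CanVisitList keyed by device: [distance, path, access_mark, ascertaining_mark].
    can_visit = {}

    def offer(device, distance, path):
        """Add a record, or replace a notfound record if the new distance is smaller."""
        record = can_visit.get(device)
        if record is None:
            # Terminal devices cannot forward, so they start out as visited.
            access = "unvisit" if device in network_devices else "visited"
            can_visit[device] = [distance, path, access, "notfound"]
        elif record[3] == "notfound" and distance < record[0]:
            record[0], record[1] = distance, path

    for neighbor, delay in adjacency.get(start, []):
        offer(neighbor, delay, [start, neighbor])

    while True:
        pending = [(rec[0], dev) for dev, rec in can_visit.items()
                   if rec[3] == "notfound"]
        if not pending:
            return None  # every node is ascertained: no path exists
        _, device = min(pending)  # notfound node with the smallest distance
        record = can_visit[device]
        record[3] = "found"
        if device == target:
            return record[0], record[1]
        if record[2] == "unvisit":
            for neighbor, delay in adjacency[device]:
                if neighbor != start:
                    offer(neighbor, record[0] + delay, record[1] + [neighbor])
            record[2] = "visited"


# Hypothetical topology; links beyond PCIeSwitch0 are assumed for illustration.
LINKS = [
    ("GPU0", "PCIeSwitch0", 3), ("PCIeSwitch0", "IB0", 4),
    ("PCIeSwitch0", "CPU0", 3), ("CPU0", "CPU1", 2),
    ("CPU1", "PCIeSwitch1", 3), ("PCIeSwitch1", "GPU1", 3),
]
NETWORK = {"PCIeSwitch0", "PCIeSwitch1", "CPU0", "CPU1"}

print(find_min_delay_path("GPU0", "GPU1", LINKS, NETWORK))
# (14, ['GPU0', 'PCIeSwitch0', 'CPU0', 'CPU1', 'PCIeSwitch1', 'GPU1'])
```

The difference from a plain Dijkstra search is the access mark: a node that is not a network device is never expanded, so a path cannot pass through a terminal device such as IB0.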
The procedure then enters a loop: the not-yet-ascertained node with the smallest distance is repeatedly selected from the CanVisitList and changed to a path-ascertained node, and if the access mark of the just-ascertained node is unvisit, that node is traversed. If the just-ascertained node is the target node, the path from the starting device to the end device has been ascertained and the traversal ends. If all nodes have become ascertained nodes without the target being reached, there is no path between the two devices, and the traversal also ends. In this way the shortest communication path between any two devices can be obtained by simulation; for example, the communication path between GPU0 and GPU1 is [GPU0, PCIeSwitch0, CPU1, PCIeSwitch1, GPU1]. When traffic is simulated, the user inputs the traffic size to be simulated, and each node on the communication path is traversed and its bandwidth parameter is read. If the traffic size is greater than a node's bandwidth, that node is identified as a bottleneck; a communication path may contain several bottlenecks. The communication bandwidth of the whole path is the minimum of the node bandwidths on the path.
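The bandwidth check along a found path can be illustrated with a short sketch. The function name `evaluate_path`, the bandwidth figures and the traffic size are hypothetical values chosen for the example; only the path itself is the one given in the text.

```python
def evaluate_path(path, bandwidth, traffic):
    """Return (path_bandwidth, bottlenecks) for a simulated traffic size.

    A node is a bottleneck when the traffic exceeds its bandwidth; the
    bandwidth of the whole path is the minimum node bandwidth on it.
    """
    bottlenecks = [device for device in path if traffic > bandwidth[device]]
    path_bandwidth = min(bandwidth[device] for device in path)
    return path_bandwidth, bottlenecks


# Path from the text; bandwidth values (same unit as traffic) are assumed.
PATH = ["GPU0", "PCIeSwitch0", "CPU1", "PCIeSwitch1", "GPU1"]
BANDWIDTH = {"GPU0": 32, "PCIeSwitch0": 64, "CPU1": 24,
             "PCIeSwitch1": 64, "GPU1": 32}

print(evaluate_path(PATH, BANDWIDTH, traffic=30))  # (24, ['CPU1'])
```

With a traffic size of 40 in the same units, GPU0, CPU1 and GPU1 would all be reported, which shows how one path can contain several bottlenecks.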
The invention can be described as an algorithm that combines node attributes to find the shortest path between two designated terminal devices; it is a variant of Dijkstra's algorithm adapted to this scenario. Through a graphical configuration method, the invention gives users a simulation and debugging tool for understanding server technology, and it uses the communication path generation algorithm to obtain key indicators of a server topology quickly, helping users select a server model quickly. The invention helps users configure the server topology intuitively, identify the bottlenecks of a target topology through simulation, and carry out rapid trial and error, which improves both the efficiency of server selection and the fit of the selected server.
With this method, when the user's service scenario changes, cards can be inserted quickly in the simulation system to construct a new topology, and the delay of the new topology can be evaluated through a simulation test. If the overall delay is substantially lower than that of the original topology, the adjustment meets expectations; if the overall delay is greater than or roughly equal to that of the original topology, the physical work of opening the chassis and inserting or removing cards need not be performed, which avoids unnecessary and time-consuming hardware adjustments. The system also helps server vendors explain and demonstrate configurations to customers efficiently, so that the server specification can be determined quickly.
Optionally, at least one embodiment of the present invention further provides a computer-readable non-volatile storage medium storing computer program instructions that, when executed by a computer, perform the method for evaluating a server topology architecture provided by the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method for evaluating a server topology, comprising:
acquiring a first topology architecture, wherein the first topology architecture comprises a first device and a second device;
generating a topology configuration file of the first topology architecture so that the first device and the second device establish a communication connection;
and simulating the transfer of a message from the first device to the second device, and acquiring server topology architecture evaluation parameters of the first topology architecture.
2. The method of claim 1, wherein a topology design page comprises a component area and a working area, the component area being used for selecting components, the components comprising a CPU, a memory, a connection device, an external network card and an external computing card, and the working area being used for displaying the server topology architecture being edited, and wherein acquiring the first topology architecture comprises:
acquiring the first topology architecture from the working area, wherein the first topology architecture comprises a plurality of components and their connection relations, the plurality of components including the first device and the second device.
3. The method of claim 2, wherein generating a topology profile of the first topology architecture comprises:
generating a topology configuration file based on the components and the connection relation thereof, wherein the topology configuration file comprises type, position, connection, parameter information of each component, the connection comprises device IDs and port IDs at two ends of a connecting line, and the parameter comprises bandwidth and time delay;
triggering a simulation based on the topology profile to cause the first device and the second device to establish a communication connection.
4. The method of claim 1, wherein simulating the transfer process of the message from the first device to the second device comprises:
acquiring a connection array of the first device, wherein the connection array comprises a third device directly connected with the first device and its delay;
expanding a reachable device list of the first device based on the connection array, wherein the reachable device list comprises a communication path, an access mark and an ascertaining mark;
traversing the reachable device list to find a communication path with minimum delay to the second device.
5. The method of claim 4, wherein the third device comprises a plurality of fourth devices, wherein expanding the reachable devices list of the first device based on the connection array comprises:
performing sub-tree access on the plurality of fourth devices directly connected with the first device, and adding access marks and ascertaining marks to the reachable device list of the first device, wherein the access marks are unvisit and visited, and the ascertaining marks are notfound and found;
acquiring all communication paths of the first device, and adding the communication paths to the reachable device list of the first device.
6. The method of claim 4, wherein traversing the reachable devices list comprises:
repeatedly selecting notfound and unvisit nodes from the reachable device list and traversing them to find the communication path with minimum delay to the second device;
generating the server topology evaluation parameters for the first topology architecture.
7. The method of claim 1, wherein the first device is an external network card and the second device is an external computing card.
8. An electronic device for evaluating a server topology, comprising:
a processor adapted to execute instructions; and a memory adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to perform the method for evaluating a server topology according to any one of claims 1-7.
9. A system for evaluating a server topology, comprising: the electronic device for evaluating a server topology of claim 8.
10. A computer-readable non-transitory storage medium storing computer program instructions that, when executed by a computer, perform the method for evaluating a server topology according to any one of claims 1-7.
CN202310378242.3A 2023-04-06 2023-04-06 Method for evaluating server topology architecture and electronic device Pending CN116471190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310378242.3A CN116471190A (en) 2023-04-06 2023-04-06 Method for evaluating server topology architecture and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310378242.3A CN116471190A (en) 2023-04-06 2023-04-06 Method for evaluating server topology architecture and electronic device

Publications (1)

Publication Number Publication Date
CN116471190A true CN116471190A (en) 2023-07-21

Family

ID=87174603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310378242.3A Pending CN116471190A (en) 2023-04-06 2023-04-06 Method for evaluating server topology architecture and electronic device

Country Status (1)

Country Link
CN (1) CN116471190A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination