CN117955897A - Data communication method, device, server and storage medium - Google Patents


Info

Publication number
CN117955897A
Authority
CN
China
Prior art keywords
server
connection
internal components
servers
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410347498.2A
Other languages
Chinese (zh)
Other versions
CN117955897B (en)
Inventor
郑上闽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua San Industrial Internet Co ltd
Original Assignee
Xinhua San Industrial Internet Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhua San Industrial Internet Co ltd
Priority to CN202410347498.2A
Publication of CN117955897A
Application granted
Publication of CN117955897B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L45/00 Routing or path finding of packets in data switching networks
            • H04L45/02 Topology update or discovery
              • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
            • H04L45/12 Shortest path evaluation
            • H04L45/14 Routing performance; Theoretical aspects
          • H04L49/00 Packet switching elements
            • H04L49/10 Packet switching elements characterised by the switching fabric construction
              • H04L49/102 Packet switching elements characterised by the switching fabric construction using shared medium, e.g. bus or ring
            • H04L49/15 Interconnection of switching modules
            • H04L49/35 Switches specially adapted for specific applications
              • H04L49/356 Switches specially adapted for specific applications for storage area networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of server clusters and discloses a data communication method, a device, a server and a storage medium. The method comprises: acquiring a first connection relation and a second connection relation of each server in a server cluster, where the first connection relation comprises the internal connection modes among all internal components of the server, and the second connection relation comprises the correspondence between the server's network adapter and the network device in the server cluster that the adapter accesses; merging the first and second connection relations of the plurality of servers into a global connection relation; and determining the distances between internal components according to their connection modes, converting the global connection relation into a global distance relation, and planning the shortest communication path that traverses every acceleration component. The invention can construct a global shortest communication path and execute training tasks along it, improving the communication efficiency of the server cluster.

Description

Data communication method, device, server and storage medium
Technical Field
The present invention relates to the field of server clusters, and in particular, to a data communication method, a device, a server, and a storage medium.
Background
As artificial intelligence models and their training data grow, the server clusters used to train them grow as well. Each server contains multiple acceleration components that participate in training, and each of them can act as a training node. An acceleration component may be, for example, a GPU (Graphics Processing Unit) or a TPU (Tensor Processing Unit).
In large-cluster training, the nodes participating in training exchange a large volume of data to synchronize parameters. In current training processes, training nodes spread across multiple servers are hard to use efficiently, so communication efficiency is low.
Disclosure of Invention
In view of the above, the present invention provides a data communication method, apparatus, server and storage medium, so as to solve the problem of low communication efficiency in the existing training process.
In a first aspect, the present invention provides a data communication method, applied to a target server, including:
acquiring a first connection relation and a second connection relation of each server in a server cluster, wherein the first connection relation comprises the internal connection modes among all internal components of the server, the second connection relation comprises the correspondence between the server's network adapter and the network device in the server cluster that the adapter accesses, and the internal components include a network adapter and an acceleration component;
merging the first connection relations and second connection relations of the plurality of servers into a global connection relation, the global connection relation comprising the connection modes among all internal components of the plurality of servers;
determining the distance between internal components according to the connection mode between them, and converting the global connection relation into a global distance relation comprising the distances between the internal components of the plurality of servers;
planning, with a path planning algorithm and according to the global distance relation, the shortest communication path that traverses every acceleration component;
and controlling each server in the server cluster to perform data communication according to the shortest communication path.
In some alternative embodiments, the first connection relationship of the server is generated by:
defining a global number for every internal component in the server, the global number comprising a server number and a component number;
determining, according to the topology information of the server's internal components, an internal connection mode between any two internal components whose connection efficiency meets a preset requirement;
and recording the internal connection modes among all internal components of the server according to their global numbers, thereby generating the first connection relation.
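The generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `GlobalId` type, the `topology` input format, the connection-mode names, and the rule that the most efficient available mode is the one that "meets the preset requirement" are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GlobalId:
    """Hypothetical global number: server number plus component number."""
    server: int
    component: int


def build_first_relation(server_no, components, topology, preferred_modes):
    """Record one internal connection mode for every connected pair of components.

    components: iterable of local component numbers (GPUs and NICs).
    topology: dict mapping frozenset({a, b}) to the connection modes that the
        server's topology information reports between components a and b.
    preferred_modes: modes ranked from most to least efficient; the first
        available one is treated as meeting the preset requirement.
    """
    relation = {}
    for a, b in combinations(sorted(components), 2):
        available = topology.get(frozenset((a, b)), [])
        chosen = next((m for m in preferred_modes if m in available), None)
        if chosen is not None:
            relation[(GlobalId(server_no, a), GlobalId(server_no, b))] = chosen
    return relation


# Usage: two GPUs (0, 1) and one NIC (8) in server 0; mode names are illustrative.
topology = {
    frozenset((0, 1)): ["NVSwitch", "PCIe"],
    frozenset((0, 8)): ["PCIe"],
    frozenset((1, 8)): ["PCIe", "CPU"],
}
first_relation = build_first_relation(0, [0, 1, 8], topology, ["NVSwitch", "PCIe", "CPU"])
print(first_relation[(GlobalId(0, 0), GlobalId(0, 1))])  # NVSwitch
```

Keying the relation by the global number rather than the local component number is what allows relations from different servers to be merged later without collisions.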
In some alternative embodiments, the method further comprises:
when the state of an internal component of the server changes, updating the topology information of the server's internal components and re-determining, for any two internal components, an internal connection mode whose connection efficiency meets the preset requirement.
In some optional embodiments, merging the first connection relations and second connection relations of the plurality of servers into a global connection relation includes:
determining a cross-server connection mode between the internal components of two servers according to the second connection relations of the two servers;
and combining the internal connection modes among the internal components within each server with the cross-server connection modes between the internal components of the two servers to form the global connection relation.
In some optional embodiments, combining the internal connection modes within each server with the cross-server connection modes between the internal components of two servers to form the global connection relation includes:
binding each acceleration component of a server to its corresponding network adapter in that server to form a virtual node;
determining the internal connection modes between the virtual nodes of each server and the cross-server connection modes between the virtual nodes of two servers;
and combining the connection relations among the virtual nodes of the plurality of servers to form the global connection relation.
In some optional embodiments, determining the internal connection modes between the virtual nodes of a server and the cross-server connection modes between the virtual nodes of two servers includes:
for a first virtual node and a second virtual node belonging to the same server, taking the internal connection mode between the acceleration component of the first virtual node and the acceleration component of the second virtual node as the internal connection mode between the two virtual nodes;
and for a third virtual node and a fourth virtual node belonging to different servers, taking the cross-server connection mode between the network adapter of the third virtual node and the network adapter of the fourth virtual node as the cross-server connection mode between the two virtual nodes.
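A minimal Python sketch of the virtual-node rule. The helper names `bind_virtual_nodes` and `virtual_link`, the `(server, component)` tuple ids, and the assumption that the GPU-to-NIC correspondence is supplied as input are illustrative; the text does not specify how the corresponding network adapter is chosen.

```python
def bind_virtual_nodes(gpu_to_nic):
    """Bind each acceleration component to its corresponding NIC.

    gpu_to_nic maps a GPU id to a NIC id; ids are (server, component) tuples.
    Returns virtual nodes as (gpu, nic) pairs.
    """
    return sorted(gpu_to_nic.items())


def virtual_link(vn_a, vn_b, internal_mode, cross_mode):
    """Connection mode between two virtual nodes.

    internal_mode(x, y): internal mode between two components of one server.
    cross_mode(x, y): cross-server mode between two NICs of different servers.
    """
    (gpu_a, nic_a), (gpu_b, nic_b) = vn_a, vn_b
    if gpu_a[0] == gpu_b[0]:
        # Same server: use the mode between the two acceleration components.
        return internal_mode(gpu_a, gpu_b)
    # Different servers: use the mode between the two network adapters.
    return cross_mode(nic_a, nic_b)


# Usage with two servers, each having two GPUs that share one NIC (numbered 8).
nodes = bind_virtual_nodes({(0, 0): (0, 8), (0, 1): (0, 8), (1, 0): (1, 8), (1, 1): (1, 8)})
same = virtual_link(nodes[0], nodes[1], lambda x, y: "NVSwitch", lambda x, y: "same-leaf")
cross = virtual_link(nodes[1], nodes[2], lambda x, y: "NVSwitch", lambda x, y: "same-leaf")
print(same, cross)  # NVSwitch same-leaf
```

Because each NIC is folded into a virtual node, path planning only has to order the GPUs, which is how the binding reduces the number of nodes in the calculation.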
In some optional embodiments, determining the cross-server connection mode between the internal components of two servers according to the second connection relations of the two servers includes:
determining, according to the second connection relations of the two servers, the hop count between the internal components of the two servers, and determining the cross-server connection mode corresponding to that hop count based on a preset matching relation between hop counts and cross-server connection modes.
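The hop-count matching can be sketched as follows, assuming a two-tier leaf-spine network like the one in fig. 2 and counting the switches traversed as hops. The `HOPS_TO_MODE` table, its mode labels, and the placement of a server on "leaf2" are hypothetical.

```python
# Hypothetical preset matching between hop count and cross-server connection mode.
HOPS_TO_MODE = {1: "same-leaf", 3: "cross-leaf"}


def switch_hops(leaf_a, leaf_b):
    """Switches traversed between two NICs in a two-tier leaf-spine network.

    NICs on the same leaf meet at that leaf (1 switch); otherwise traffic
    goes leaf -> spine -> leaf (3 switches).
    """
    return 1 if leaf_a == leaf_b else 3


def cross_server_mode(second_relation, nic_a, nic_b):
    """second_relation maps a NIC id to the leaf switch it is attached to."""
    hops = switch_hops(second_relation[nic_a], second_relation[nic_b])
    return HOPS_TO_MODE[hops]


second_relation = {(0, 8): "leaf1", (1, 8): "leaf1", (2, 8): "leaf2"}
print(cross_server_mode(second_relation, (0, 8), (1, 8)))  # same-leaf
print(cross_server_mode(second_relation, (0, 8), (2, 8)))  # cross-leaf
```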
In some alternative embodiments, determining the distance between internal components according to the connection mode between them includes:
determining the distance corresponding to the connection mode between the internal components according to a preset correspondence between connection modes and distances;
or determining the latency and bandwidth between the internal components according to the connection mode between them, and determining the distance between the internal components from that latency and bandwidth, wherein the distance is positively correlated with the latency and negatively correlated with the bandwidth.
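Both options can be illustrated as follows. The mode names and values in `PRESET_DISTANCE`, the 1 MiB message size, and the latency-plus-transfer-time formula (a common alpha-beta cost model) are assumptions; the text only requires that distance rise with latency and fall with bandwidth.

```python
# Hypothetical preset correspondence between connection mode and distance.
PRESET_DISTANCE = {"NVSwitch": 1, "PCIe": 2, "same-leaf": 4, "cross-leaf": 8}


def distance_from_mode(mode):
    """Option 1: look the distance up in a preset table."""
    return PRESET_DISTANCE[mode]


def distance_from_link(latency_us, bandwidth_gbps, message_bytes=1 << 20):
    """Option 2: estimated time (microseconds) to move one message over the link.

    latency + size / bandwidth grows with latency and shrinks as bandwidth
    grows, matching the required correlations. 1 Gbit/s = 1e3 bits per microsecond.
    """
    transfer_us = message_bytes * 8 / (bandwidth_gbps * 1e3)
    return latency_us + transfer_us


print(distance_from_mode("cross-leaf"))  # 8
```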
In some alternative embodiments, the method further comprises:
determining communication priorities among the internal components and adjusting the distances between internal components according to those priorities, wherein, for the same connection mode, the higher the communication priority, the smaller the distance between the internal components.
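One possible adjustment rule, shown only as a sketch: a linear scaling in which the `weight` parameter and the priority range are hypothetical. Any monotonically decreasing rule would satisfy the requirement stated above.

```python
def adjust_for_priority(distance, priority, max_priority, weight=0.5):
    """Shrink a distance as communication priority rises.

    priority ranges from 0 (lowest) to max_priority (highest); with
    0 <= weight < 1 the result stays positive and, for the same connection
    mode, a higher priority always yields a smaller distance.
    """
    if not 0 <= priority <= max_priority:
        raise ValueError("priority out of range")
    return distance * (1 - weight * priority / max_priority)


# Same connection mode (base distance 4), different priorities.
print(adjust_for_priority(4, 0, 3), adjust_for_priority(4, 3, 3))  # 4.0 2.0
```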
In a second aspect, the present invention provides a data communication apparatus comprising:
the acquisition module is used for acquiring a first connection relation and a second connection relation of each server in the server cluster, wherein the first connection relation comprises the internal connection modes among all internal components of the server, the second connection relation comprises the correspondence between the server's network adapter and the network device in the server cluster that the adapter accesses, and the internal components include a network adapter and an acceleration component;
the merging module is used for merging the first connection relations and second connection relations of the plurality of servers into a global connection relation, the global connection relation comprising the connection modes among all internal components of the plurality of servers;
the conversion module is used for determining the distance between internal components according to the connection mode between them and converting the global connection relation into a global distance relation comprising the distances between the internal components of the plurality of servers;
the planning module is used for planning, with a path planning algorithm and according to the global distance relation, the shortest communication path that traverses every acceleration component;
and the control module is used for controlling each server in the server cluster to perform data communication according to the shortest communication path.
In a third aspect, the present invention provides a server comprising a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the data communication method of the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data communication method of the first aspect or any of its corresponding embodiments.
In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the data communication method of the first aspect or any of its corresponding embodiments.
According to the invention, the first and second connection relations of the servers in the server cluster are merged into a global connection relation that represents how internal components are connected both within and across servers. A global shortest communication path can therefore be constructed and training tasks executed along it, which improves the communication efficiency of the server cluster and, in artificial intelligence training scenarios, the training efficiency.
By collecting global topology information, the distance between communication nodes can be defined and the globally optimal path, namely the shortest communication path, can be calculated from those distances, improving communication efficiency in artificial intelligence training scenarios. Binding each acceleration component to a network adapter reduces the number of nodes involved in the calculation, which improves computational efficiency. In addition, the distances between nodes can be set and adjusted flexibly according to specific requirements, so that actual needs can be met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings that are required to be used in the description of the embodiments or the related art will be briefly described, and it is apparent that the drawings in the description below are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a schematic diagram of a configuration of a ring communication model;
FIG. 2 is a schematic diagram of a topology of a server cluster;
FIG. 3 is a schematic diagram of a communication path of the server cluster shown in FIG. 2;
FIG. 4 is a flow chart of a data communication method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a communication path of the server cluster of FIG. 2, in accordance with an embodiment of the invention;
FIG. 6 is a flow chart of another data communication method according to an embodiment of the invention;
FIG. 7 is a diagram of a coding scheme for global numbering according to an embodiment of the invention;
FIG. 8 is a schematic diagram of a connection inside a server according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a topology within a server according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a topology of a server cluster according to an embodiment of the invention;
FIG. 11 is a schematic diagram of a cross-server connection in accordance with an embodiment of the invention;
FIG. 12 is a schematic diagram of a global connection relationship according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of a shortest communication path in accordance with an embodiment of the present invention;
FIG. 14 is a schematic diagram of another shortest communication path in accordance with an embodiment of the present invention;
FIG. 15 is a block diagram of a data communication apparatus according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To synchronize parameters between acceleration components, a number of communication models have been studied; one relatively widely used model is the ring communication model, in which the training nodes participating in the computation form a logical ring, and each training node receives (Receive) data from the training node on its "left" and sends (Send) the computed data to the training node on its "right".
For example, taking a GPU as the acceleration component, a ring communication model formed by five GPUs is shown schematically in fig. 1.
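The receive-from-left, send-to-right pattern of the ring model is commonly used to implement ring all-reduce. The sketch below simulates a sum all-reduce in a single Python process; it illustrates that general technique (reduce-scatter followed by all-gather) rather than any specific library, and each node's buffer is simplified to one number per chunk.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over a logical ring of len(buffers) nodes.

    buffers[i] holds node i's data split into one value per chunk; the number
    of chunks must equal the number of nodes. In every step, node i sends one
    chunk to its right neighbor (i + 1) and receives one from its left (i - 1).
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each buffer needs exactly one chunk per node")
    data = [list(b) for b in buffers]
    # Reduce-scatter: after n - 1 steps node i owns the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        values = [data[i][c] for i, c in sends]  # snapshot: all sends happen at once
        for (i, c), v in zip(sends, values):
            data[(i + 1) % n][c] += v
    # All-gather: pass the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        values = [data[i][c] for i, c in sends]
        for (i, c), v in zip(sends, values):
            data[(i + 1) % n][c] = v
    return data


# Five GPUs, as in the ring of fig. 1.
result = ring_allreduce([[10 * g + c for c in range(5)] for g in range(5)])
print(result[0])  # [100, 105, 110, 115, 120]
```

Every step moves data only between ring neighbors, so the cost of each step is set by the slowest neighbor link; this is why the order of nodes in the ring matters.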
Although the current communication model can traverse all training nodes and achieve parameter synchronization, the path it uses when traversing training nodes across multiple servers is usually not optimal, which introduces additional communication latency and lowers communication efficiency.
Fig. 2 shows a schematic topology of a server cluster. As shown in fig. 2, the server cluster adopts a Spine-Leaf network topology and includes four servers: server 0, server 1, server 2 and server 3. Each server is a computing device capable of training and includes eight GPUs, GPU 0 to GPU 7, each of which can serve as a training node. Within a server, the GPUs are directly interconnected through a high-speed interconnect switch (LinkSwitch), which increases the data transfer rate between them; for example, the interconnect switch may be NVSwitch.
Existing communication model algorithms generally consider only the topological relation of the acceleration components within a server, not the network topology between servers. If the network topology between servers is not considered together with the internal topology of each server, the resulting communication path may be as shown in fig. 3, where the solid black line represents the communication path.
As shown in fig. 3, this communication path is too long and is not the shortest, which introduces additional communication latency. It also introduces unnecessary cross-leaf traffic, which makes uneven hash-based path distribution between the leaf and spine layers more likely and can cause unnecessary congestion and even packet loss.
According to the data communication method provided by the embodiment of the invention, the topological relation between the acceleration components in the server and the topological relation between the servers are comprehensively considered, and the global shortest communication path is constructed, so that the communication efficiency of the server cluster can be improved.
According to an embodiment of the present invention, a data communication method embodiment is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The embodiment provides a data communication method which can be applied to a target server; the target server may be a server in a server cluster, or may be a separate server, such as a planning server. Fig. 4 is a flowchart of a data communication method according to an embodiment of the present invention, as shown in fig. 4, including the following steps.
Step S401, obtaining a first connection relation and a second connection relation of each server in a server cluster; the first connection relation comprises the internal connection modes among all internal components of the server, and the second connection relation comprises the correspondence between the server's network adapter and the network device in the server cluster that the adapter accesses; the internal components include a network adapter and an acceleration component.
In this embodiment, the server cluster includes a plurality of servers, each of which includes a network adapter and an acceleration component. The network adapter implements communication between servers and may be a network interface controller (NIC), a host bus adapter (HBA) or the like; the acceleration component may specifically be an acceleration training card for model training, such as a GPU or a TPU. For ease of description, network adapters and acceleration components are collectively referred to as internal components.
For a given server in the server cluster, the connection modes among its internal components can be determined, a connection mode being one through which the internal components can communicate; for convenience of description, a connection mode between internal components belonging to the same server is called an "internal connection mode". By determining the internal connection mode between every two internal components of the same server, a connection relation representing the internal connection modes among all internal components of the server, namely the first connection relation, is generated.
The server cluster also includes other network devices, which each server accesses through its network adapter. The correspondence between the server's network adapter and the network devices in the server cluster, that is, which network device the server's network adapter is connected to, can therefore be determined, and a connection relation representing this correspondence, namely the second connection relation, is generated.
Taking the server cluster shown in fig. 2 as an example, the server cluster includes four servers: server 0, server 1, server 2, server 3; each server comprises eight acceleration components, and the acceleration components are GPUs, namely GPU0 to GPU7. An internal connection manner between any two GPUs can be determined, thereby forming a first connection relationship.
The network devices in the server cluster shown in fig. 2 include leaf 1, leaf 2, spine 1, spine 2, and the like; each server may communicate with a corresponding network device in the server cluster through a network adapter, e.g., server 0 and server 1 may communicate with leaf 1; the network adapter in the server is not shown in fig. 2. It may be determined with which network device the network adapter of the server is communicatively connected, thereby forming a corresponding second connection relationship.
For each server in the server cluster, after the first connection relation and the second connection relation are generated, the first connection relation and the second connection relation can be sent to the target server, so that the target server can acquire and collect the first connection relation and the second connection relation of each server in the server cluster.
It can be appreciated that if the target server is an independent server, it needs to obtain the first and second connection relations of all servers in the server cluster. Alternatively, if the target server is a server within the cluster, it may generate its own first and second connection relations and collect those of the other servers.
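One way to picture the collection step: each server serializes its two relations and the target server merges them into relations keyed by global ids. The JSON layout, the field names, and the `make_report`/`collect` helpers are hypothetical; the text does not specify a message format or transport.

```python
import json


def make_report(server_no, internal_modes, nic_to_device):
    """Serialize one server's relations (component numbers are local)."""
    return json.dumps({
        "server": server_no,
        "first_relation": [[a, b, mode] for (a, b), mode in sorted(internal_modes.items())],
        "second_relation": nic_to_device,  # JSON turns the int keys into strings
    })


def collect(reports):
    """Merge reports into relations keyed by (server, component) global ids."""
    first, second = {}, {}
    for raw in reports:
        report = json.loads(raw)
        s = report["server"]
        for a, b, mode in report["first_relation"]:
            first[((s, a), (s, b))] = mode
        for nic, device in report["second_relation"].items():
            second[(s, int(nic))] = device
    return first, second


reports = [
    make_report(0, {(0, 1): "NVSwitch", (0, 8): "PCIe"}, {8: "leaf1"}),
    make_report(1, {(0, 1): "NVSwitch"}, {8: "leaf1"}),
]
first, second = collect(reports)
print(second)  # {(0, 8): 'leaf1', (1, 8): 'leaf1'}
```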
Step S402, combining the first connection relation and the second connection relation of a plurality of servers into a global connection relation; the global connection relationship includes a connection manner between respective internal components of the plurality of servers.
The first and second connection relations of a single server represent only the internal connection modes of that server's internal components and which network device or devices it accesses. In this embodiment, the target server merges the acquired first and second connection relations of the plurality of servers so that the connection modes among the internal components of all servers can be determined: in addition to the internal connection modes within each server given by the first connection relations, the connection modes between internal components belonging to different servers can be determined from the second connection relations. For convenience of description, a connection mode between internal components belonging to different servers is called a cross-server connection mode.
In this embodiment, the connection modes between the internal components of the plurality of servers are combined into one connection relationship, that is, a global connection relationship.
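The merge can be sketched as follows, with global ids written as `(server, component)` tuples. The `cross_mode` callback stands in for whatever rule derives a cross-server mode from two network devices (for example, the hop-count matching in the alternative embodiments above); it, the helper name `merge_global`, and the mode labels are assumptions.

```python
from itertools import combinations


def merge_global(first_relation, second_relation, cross_mode):
    """Merge internal and cross-server connection modes into one relation.

    first_relation: {(id_a, id_b): mode} for component pairs in the same server.
    second_relation: {nic_id: network_device} for every server's NICs.
    cross_mode(dev_a, dev_b): cross-server mode between NICs on those devices.
    Ids are (server, component) tuples.
    """
    global_relation = dict(first_relation)
    for nic_a, nic_b in combinations(sorted(second_relation), 2):
        if nic_a[0] != nic_b[0]:  # only pairs that belong to different servers
            global_relation[(nic_a, nic_b)] = cross_mode(
                second_relation[nic_a], second_relation[nic_b]
            )
    return global_relation


first = {((0, 0), (0, 8)): "PCIe", ((1, 0), (1, 8)): "PCIe"}
second = {(0, 8): "leaf1", (1, 8): "leaf1", (2, 8): "leaf2"}
merged = merge_global(first, second, lambda a, b: "same-leaf" if a == b else "cross-leaf")
print(merged[((0, 8), (2, 8))])  # cross-leaf
```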
Step S403, determining the distance between the internal components according to the connection manner between the internal components, and converting the global connection relationship into a global distance relationship including the distances between the internal components of the plurality of servers.
In this embodiment, different connection manners may be adopted between different internal components, so that there is a difference in the distance between the internal components. For example, two internal components may be directly interconnected, or may be connected by other devices (e.g., switches, network devices, etc.), with different connection modes corresponding to different distances, and thus, the corresponding distance sizes may be determined based on the connection modes between the internal components.
For each connection mode in the global connection relationship, a corresponding distance can be determined, so that the global connection relationship can be converted into a global distance relationship, and the global distance relationship can represent the distance between the internal components of the plurality of servers.
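A sketch of the conversion into a distance matrix, assuming the mode-to-distance table is given (its contents here are hypothetical) and marking pairs with no recorded connection as infinitely far.

```python
import math


def to_distance_matrix(global_relation, mode_distance):
    """Convert {(id_a, id_b): mode} into a symmetric distance matrix.

    Pairs with no recorded connection get math.inf. If several modes are
    recorded for a pair, the shortest distance is kept.
    """
    nodes = sorted({n for pair in global_relation for n in pair})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    dist = [[0.0 if i == j else math.inf for j in range(size)] for i in range(size)]
    for (a, b), mode in global_relation.items():
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = min(dist[i][j], mode_distance[mode])
    return nodes, dist


nodes, dist = to_distance_matrix(
    {((0, 0), (0, 1)): "NVSwitch", ((0, 1), (1, 0)): "same-leaf"},
    {"NVSwitch": 1, "same-leaf": 4},
)
print(nodes)  # [(0, 0), (0, 1), (1, 0)]
```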
Step S404, a certain path planning algorithm is adopted to plan out the shortest communication path traversing each acceleration component according to the global distance relation.
In this embodiment, once the distances between internal components are determined, the shortest communication path traversing every acceleration component in the server cluster can be determined with a path planning algorithm. Specifically, finding the shortest communication path is a classic traveling salesman problem (TSP), and the shortest path traversing all acceleration components can be found with path planning algorithms such as exhaustive enumeration, nearest neighbor, branch and bound, dynamic programming, genetic algorithms or simulated annealing. This embodiment does not limit which path planning algorithm is used.
Step S405, controlling each server in the server cluster to perform data communication according to the shortest communication path.
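Assuming the path is closed into a ring, as in the ring communication model, the problem is the closed-tour form of the TSP. The sketch below uses the Held-Karp dynamic-programming algorithm, one of the approaches listed above; it is exact but takes O(n^2 * 2^n) time, so for clusters with many acceleration components a heuristic such as nearest neighbor or simulated annealing would be more practical. The function name `held_karp_ring` is hypothetical.

```python
from itertools import combinations


def held_karp_ring(dist):
    """Exact shortest cycle visiting every node once, starting at node 0.

    dist is a square matrix of pairwise distances. Returns (cost, order),
    where order lists the nodes and the ring closes from order[-1] to order[0].
    """
    n = len(dist)
    if n == 1:
        return 0, [0]
    # best[(mask, j)] = (cost, prev): cheapest path from node 0 through the nodes
    # in mask (a bit set over 1..n-1) that ends at j, and the node before j.
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = 0
            for k in subset:
                mask |= 1 << k
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist[k][j], k) for k in subset if k != j
                )
    full = (1 << n) - 2  # every node except 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Walk the stored predecessors back to node 0.
    order, mask, j = [], full, last
    while j != 0:
        order.append(j)
        _, prev = best[(mask, j)]
        mask &= ~(1 << j)
        j = prev
    order.append(0)
    return cost, order[::-1]


square = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]
cost, order = held_karp_ring(square)
print(cost)  # 4
```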
After the shortest communication path is determined, each server in the server cluster is controlled to perform data communication along it; for example, the corresponding acceleration components in each server can be controlled to execute training tasks in sequence, finally completing model training. Because the global shortest communication path is determined from the global distance relation and training is performed along it, communication efficiency can be improved effectively. Taking the server cluster of fig. 2 as an example, the shortest communication path determined by the method of this embodiment may be as shown in fig. 5; comparing the paths in fig. 3 and fig. 5 shows that the path determined in this embodiment effectively reduces cross-leaf communication and thus improves communication efficiency.
Because servers communicate through network adapters, traversing the acceleration components of different servers requires passing through at least one network adapter to achieve cross-server communication. The shortest communication path therefore needs to pass through at least one network adapter of each server in addition to all acceleration components; this network adapter may be preset or selected based on the actual situation, which is not limited in this embodiment.
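Once the path is known, each node needs to know which neighbor it receives from and which it sends to. A minimal sketch, assuming the path is used as a closed ring (receive from the left neighbor, send to the right neighbor, as in the ring model); `ring_neighbors` and the node labels are hypothetical, and the text does not specify how the assignment is delivered to the servers.

```python
def ring_neighbors(order):
    """Map each node in a ring order to its (left, right) neighbors."""
    n = len(order)
    # order[i - 1] wraps to the last node when i == 0.
    return {node: (order[i - 1], order[(i + 1) % n]) for i, node in enumerate(order)}


plan = ring_neighbors(["s0-gpu0", "s0-gpu1", "s1-gpu0"])
print(plan["s0-gpu0"])  # ('s1-gpu0', 's0-gpu1')
```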
According to the data communication method provided by the embodiment, the first connection relation and the second connection relation of the plurality of servers in the server cluster are combined into the global connection relation capable of representing the connection mode of the internal components in the servers and among the servers, so that the global shortest communication path can be constructed, the training task is executed based on the shortest communication path, the communication efficiency of the server cluster can be improved, and the training efficiency can be improved in an artificial intelligence training scene.
The embodiment provides a data communication method which can be applied to a target server; the target server may be a server in a server cluster, or may be a separate server. Fig. 6 is a flowchart of a data communication method according to an embodiment of the present invention, and as shown in fig. 6, the method flow includes the following steps.
Step S601: obtain a first connection relation and a second connection relation of each server in a server cluster. The first connection relation includes the internal connection modes among all internal components of the server, and the second connection relation includes the correspondence between the network adapters of the server and the network devices through which they access the server cluster. The internal components include network adapters and acceleration components.
For details, refer to step S401 of the embodiment shown in FIG. 4; they are not repeated here.
In some optional embodiments, each server in the server cluster generates its own first and second connection relations and sends them to the target server. Generating the first connection relation on a server may include the following steps A1 to A3.
Step A1: define global numbers for all internal components in the server; a global number includes a server number and a component number.
In this embodiment, to distinguish the internal components of different servers in the server cluster, each internal component is given a globally unique number, the global number, in one-to-one correspondence with the internal components. So that each server can assign the global numbers of its own internal components independently, the global number is composed of a server number and a component number.
The server number is a globally unified server number and may start from 0.
The component number is the internal number of a component within its server. Because a server contains several types of components, such as GPUs and network adapters, components of all types can be numbered sequentially as elements of a single set to obtain their internal numbers.
Alternatively, components of the same type can be numbered separately. In this case a field representing the component type is required, indicating whether the internal component is a GPU, a TPU, a network adapter, and so on; for example, the component type of a GPU may be defined as 0 and that of a network adapter as 1. Different component types are denoted by different numbers, and the component number is then the internal sequence number of the component among components of the same type, starting from 0.
For example, the global number may be represented by a 16-bit binary number divided into three fields: server number, component type, and component number. The width of each field can be chosen according to the server cluster size and the server topology, but must be globally unified. For example, the server number occupies 10 bits (bits 15:6), the component type 2 bits (bits 5:4), and the component number 4 bits (bits 3:0). A server cluster can then contain at most 1024 servers, and if each server is configured with 8 GPUs, the entire cluster can support 8×1024 GPUs.
Taking the above encoding as an example, with a GPU component type of 0, the encoding of the GPU with component number 3 on server 1 is shown in FIG. 7.
For convenience, this embodiment writes the global number of an internal component as [component type][server number].[component number]. For example, the global number of the GPU of server 1 shown in FIG. 7 can be written as GPU1.3.
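The 16-bit layout described above can be sketched in Python as follows. The sketch assumes the example field widths (server number in bits 15:6, component type in bits 5:4, component number in bits 3:0) and the example type codes 0 = GPU and 1 = network adapter (written NIC); the helper names `encode`, `decode`, and `label` are hypothetical.

```python
from typing import Tuple

# Example field widths: server [15:6], component type [5:4], component [3:0].
SERVER_BITS, TYPE_BITS, COMP_BITS = 10, 2, 4
TYPE_NAMES = {0: "GPU", 1: "NIC"}  # example type codes from the text


def encode(server: int, comp_type: int, comp: int) -> int:
    """Pack the three fields into one 16-bit global number."""
    if not (0 <= server < 1 << SERVER_BITS
            and 0 <= comp_type < 1 << TYPE_BITS
            and 0 <= comp < 1 << COMP_BITS):
        raise ValueError("field out of range")
    return (server << (TYPE_BITS + COMP_BITS)) | (comp_type << COMP_BITS) | comp


def decode(number: int) -> Tuple[int, int, int]:
    """Unpack a global number into (server, component type, component)."""
    comp = number & ((1 << COMP_BITS) - 1)
    comp_type = (number >> COMP_BITS) & ((1 << TYPE_BITS) - 1)
    server = number >> (TYPE_BITS + COMP_BITS)
    return server, comp_type, comp


def label(number: int) -> str:
    """Render a global number in the [type][server].[component] notation."""
    server, comp_type, comp = decode(number)
    return f"{TYPE_NAMES.get(comp_type, 'TYPE' + str(comp_type))}{server}.{comp}"


print(hex(encode(1, 0, 3)), label(encode(1, 0, 3)))  # 0x43 GPU1.3
```

With these widths, GPU1.3 encodes to 0x0043 (decimal 67), and server numbers 0 to 1023 fit in the 10-bit field.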
Step A2: according to the topology information of the internal components of the server, determine, for any two internal components, the internal connection mode whose connection efficiency meets a preset requirement.
Step A3: record the internal connection modes among all internal components of the server by global number to generate the first connection relation.
In this embodiment, for a given server, the topology information of its internal components represents the interconnection paths between them, that is, how the internal components are communicatively connected. Based on this topology information, one or more connection modes that allow two internal components to communicate can be determined, together with the connection efficiency of each, and the connection mode with the highest efficiency is taken as the internal connection mode between the two components.
Specifically, a server includes processors (CPUs), root complexes (RCs), switches, and so on; a switch may specifically be a PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) switch. Depending on how two internal components are connected and which path the data traverses, there are several modes of interconnection between them. FIG. 8 shows six internal connection modes within a server, ① to ⑥, ordered from highest to lowest connection efficiency.
Connection mode ①: LSW (Local Switch), high-speed interconnect links. As shown in FIG. 8, two GPUs are interconnected at high speed through an interconnect switch, which gives a high data transmission rate and the highest connection efficiency.
Connection mode ②: PIS (PCIe In Switch), interconnection under the same PCIe switch.
Connection mode ③: PXS (PCIe eXtend Switch), interconnection across PCIe switches that sit under the same upstream PCIe switch.
Connection mode ④: PRC (PCIe in RC), interconnection across PCIe switches under the same root complex, i.e., through a common root complex.
Connection mode ⑤: SNU (Single NUMA), interconnection across root complexes within the same NUMA (Non-Uniform Memory Access) node.
Connection mode ⑥: MNU (Multi NUMA), interconnection between different NUMA nodes, possibly through inter-processor high-speed links such as QPI/UPI (QuickPath Interconnect/Ultra Path Interconnect).
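Because modes ① to ⑥ are totally ordered by connection efficiency, step A2 can be sketched as taking the minimum over the candidate modes found for a pair of components. This minimal Python sketch assumes the candidate modes have already been derived from the topology information (that discovery step is not shown) and reads "meets the preset requirement" as "highest efficiency", as described above; the names `InternalMode` and `best_internal_mode` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class InternalMode(IntEnum):
    """Internal connection modes ① to ⑥; a smaller value means higher efficiency."""
    LSW = 1  # high-speed interconnect switch between the components
    PIS = 2  # both under the same PCIe switch
    PXS = 3  # across PCIe switches below a common upstream PCIe switch
    PRC = 4  # across PCIe switches below the same root complex
    SNU = 5  # across root complexes within one NUMA node
    MNU = 6  # across NUMA nodes, e.g. over QPI/UPI


def best_internal_mode(candidates: Iterable[InternalMode]) -> InternalMode:
    """Return the most efficient mode by which two components can communicate."""
    modes = list(candidates)
    if not modes:
        raise ValueError("components have no communication path")
    return min(modes)


print(best_internal_mode([InternalMode.PRC, InternalMode.LSW]).name)  # LSW
```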
In this embodiment, each server collects the topology information of its local internal components and determines the internal connection mode between any two of them, forming the first connection relation of the server; the internal components are identified by their global numbers when the internal connection modes are recorded.
The internal connection modes among the local internal components can be stored in matrix form, for example as a two-dimensional table; that is, the first connection relation takes the form of a two-dimensional table.
For example, the topology of a server shown in FIG. 9 includes four GPUs and four network adapters. In this embodiment the network adapter is, illustratively, a network interface card (NIC); it will be appreciated that other types of network adapters may be used instead. If the server number is 1, the global numbers of the four GPUs and four NICs are GPU1.0, GPU1.1, GPU1.2, GPU1.3, NIC1.0, NIC1.1, NIC1.2, and NIC1.3, and the first connection relation generated by the server is shown in Table 1 below.
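Storing the first connection relation as a two-dimensional table keyed by global number can be sketched as below. The sketch assumes the pairwise internal mode is supplied by a classifier function; `example_mode` is a purely hypothetical classifier standing in for the topology analysis, and its GPU-NIC and NIC-NIC values are illustrative, not the contents of Table 1.

```python
from itertools import combinations
from typing import Callable, Dict, List

# First connection relation: component -> {other component -> internal mode}.
FirstRelation = Dict[str, Dict[str, str]]


def build_first_relation(components: List[str],
                         mode_of: Callable[[str, str], str]) -> FirstRelation:
    """Record the internal connection mode between every pair of local components."""
    table: FirstRelation = {c: {} for c in components}
    for a, b in combinations(components, 2):
        mode = mode_of(a, b)
        table[a][b] = mode  # the relation is symmetric, so fill both cells
        table[b][a] = mode
    return table


def example_mode(a: str, b: str) -> str:
    """Hypothetical classifier for a FIG. 9-like server (not Table 1's data)."""
    if a.startswith("GPU") and b.startswith("GPU"):
        return "LSW"
    if a[3:] == b[3:]:
        return "PIS"  # assumption: GPU k and NIC k share a PCIe switch
    return "PRC"  # assumption: everything else meets at the root complex


components = [f"GPU1.{k}" for k in range(4)] + [f"NIC1.{k}" for k in range(4)]
table = build_first_relation(components, example_mode)
print(table["GPU1.0"]["GPU1.1"])  # LSW
```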
TABLE 1
Optionally, after a server starts, it can collect its connection relations according to the physical topology, update the first connection relation in real time as the states of its internal components change, and send the updated first connection relation to the target server. Specifically, after step A2 ("according to the topology information of the internal components of the server, determine the internal connection mode whose connection efficiency between any two internal components meets the preset requirement"), the method may further include: when the state of an internal component of the server changes, updating the topology information of the internal components and re-determining the internal connection mode whose connection efficiency between any two internal components meets the preset requirement.
In this embodiment, when the state of an internal component changes, for example when it fails, is removed, or is newly added, the internal connection modes between internal components are updated automatically, and so is the first connection relation.
For example, the first connection relation of the server shown in FIG. 9 is given in Table 1 above; if NIC1.0 of the server fails, NIC1.0 is deleted from the first connection relation, and the updated first connection relation is shown in Table 2 below.
TABLE 2
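The update on a state change can be sketched as deleting the affected row and column (as in the NIC1.0 failure example) or adding a row and column for a newly inserted component. This assumes the dictionary-of-dictionaries layout for the first connection relation and a caller-supplied `mode_of` classifier; both, like the function names, are illustrative choices rather than part of the described method.

```python
from typing import Callable, Dict

FirstRelation = Dict[str, Dict[str, str]]


def remove_component(table: FirstRelation, failed: str) -> None:
    """Drop a failed or removed component's row and column in place."""
    table.pop(failed, None)
    for row in table.values():
        row.pop(failed, None)


def add_component(table: FirstRelation, new: str,
                  mode_of: Callable[[str, str], str]) -> None:
    """Insert a newly added component, determining its mode to every existing one."""
    others = [c for c in table if c != new]
    table[new] = {}
    for other in others:
        mode = mode_of(new, other)
        table[new][other] = mode
        table[other][new] = mode
```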
In addition, each server generates a corresponding second connection relation. Specifically, each server collects the connection relations between its network adapters and the network devices, i.e., determines through which network device each network adapter accesses the server cluster, and generates the second connection relation from them.
For example, if the server cluster uses a leaf-spine network architecture, each server can determine which leaf node each of its NICs is attached to, e.g., through LLDP (Link Layer Discovery Protocol), and generate the second connection relation accordingly. If the server cluster is structured as shown in FIG. 10, the second connection relation of server 0 is shown in Table 3 below.
TABLE 3
Similarly, the second connection relation of server 1 is shown in Table 4 below.
TABLE 4
Step S602: merge the first and second connection relations of the multiple servers into a global connection relation; the global connection relation includes the connection modes between the internal components of the multiple servers.
Specifically, step S602 ("merge the first and second connection relations of the multiple servers into a global connection relation") includes the following steps S6021 and S6022.
Step S6021, determining a cross-server connection mode between the internal components of the two servers according to the second connection relation of the two servers.
In this embodiment, the second connection relation of each server indicates the network device to which each network adapter is attached. Whether the network adapters of two servers are attached to the same network device determines the connection mode between them, i.e., the cross-server connection mode. Moreover, because acceleration components of different servers must connect across servers through network adapters, the connection mode between acceleration components of the two servers, and between an acceleration component of one server and a network adapter of the other, can be determined as well.
Optionally, step S6021 includes: determining the hop count between internal components of the two servers according to their second connection relations, and determining the cross-server connection mode corresponding to that hop count based on a preset matching relation between hop counts and cross-server connection modes.
In this embodiment, internal components connected across servers must pass through one or more network devices in the cluster, so different cross-server connection modes can be distinguished by the number of network hops traversed, i.e., the hop count between the internal components.
Different cross-server connection modes are preset for different hop counts between internal components of different servers, forming a preset matching relation between hop count and cross-server connection mode; the hop count between two internal components then determines the corresponding cross-server connection mode simply and quickly.
Still taking the server cluster shown in FIG. 2 as an example, different cross-server connection modes are defined according to the number of network hops between servers. As shown in FIG. 11, there are two such modes: connection mode ⑦ and connection mode ⑧.
Connection mode ⑦: NET2, two network hops, e.g., server 0 - leaf 1 - server 1.
Connection mode ⑧: NET4, four network hops, e.g., server 0 - leaf 1 - spine 1 - leaf 2 - server 2.
For cross-server connections involving an acceleration component, a mode with more hops can be defined, or alternatively connection mode ⑨, which denotes that no direct communication is possible.
Connection mode ⑨: UR (Un-diRect), the acceleration components can communicate only through other components, not directly. For example, if two GPUs in different servers can communicate only through network adapters, the cross-server connection mode between them may be set to ⑨.
In this embodiment, after obtaining the second connection relations of two servers, the target server determines the hop count between any two internal components in different servers and, from it, the corresponding cross-server connection mode.
Taking the server cluster shown in FIG. 10 as an example, NIC0 of server 0 and NIC0 of server 1 are interconnected through leaf 1 with a hop count of 2, so their connection mode is NET2; NIC0 of server 0 and NIC2 of server 1 can be interconnected only through leaf 1, spine 1, and leaf 2 with a hop count of 4, so their connection mode is NET4. The cross-server connection modes between the internal components of server 0 and server 1 are shown in Table 5 below.
TABLE 5
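For a two-tier leaf-spine fabric like the one in FIG. 10, the hop count between two network adapters follows directly from the leaf each one is attached to, and the preset matching relation maps that count to NET2 or NET4. This minimal sketch assumes every leaf connects to every spine, so any two leaves are one spine apart; the NIC-to-leaf mapping is the one implied by Table 7 (components 0 and 1 on leaf 1, components 2 and 3 on leaf 2, on both servers), and the helper names are hypothetical.

```python
from typing import Dict

# Second connection relations of servers 0 and 1: network adapter -> leaf switch.
SECOND: Dict[str, str] = {
    "NIC0.0": "leaf1", "NIC0.1": "leaf1", "NIC0.2": "leaf2", "NIC0.3": "leaf2",
    "NIC1.0": "leaf1", "NIC1.1": "leaf1", "NIC1.2": "leaf2", "NIC1.3": "leaf2",
}

# Preset matching relation between hop count and cross-server connection mode.
HOPS_TO_MODE: Dict[int, str] = {2: "NET2", 4: "NET4"}


def hop_count(nic_a: str, nic_b: str, second: Dict[str, str]) -> int:
    """Same leaf: server-leaf-server (2 hops); otherwise leaf-spine-leaf (4 hops)."""
    return 2 if second[nic_a] == second[nic_b] else 4


def cross_server_mode(nic_a: str, nic_b: str, second: Dict[str, str]) -> str:
    """Map the hop count between two adapters to its preset connection mode."""
    return HOPS_TO_MODE[hop_count(nic_a, nic_b, second)]


print(cross_server_mode("NIC0.0", "NIC1.2", SECOND))  # NET4
```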
Step S6022: merge the internal connection modes between the internal components of each server and the cross-server connection modes between the internal components of any two servers to form the global connection relation.
Specifically, an N×N two-dimensional matrix can be established, where N (a positive integer) is the total number of internal components in the server cluster and each row and each column corresponds to one internal component. The element $n_{ij}$ (i and j positive integers) represents the connection mode between the internal component of row i and that of column j, which may be either an internal connection mode or a cross-server connection mode. After the internal connection modes within each server and the cross-server connection modes between any two internal components of different servers have been determined, the corresponding elements of the matrix are assigned accordingly, and the assigned matrix represents the global connection relation.
For example, if the server cluster is structured as shown in FIG. 10 and servers 0 and 1 are both configured as shown in FIG. 9, merging the first and second connection relations of the two servers (based on Tables 1 and 5 above) yields the global connection relation of the server cluster shown in FIG. 12.
In some optional embodiments, traversing acceleration components across servers must pass through the corresponding network adapters, so many nodes take part in the computation. In a server cluster, an acceleration component transmits data across servers through its corresponding network adapter; this embodiment therefore binds each acceleration component and its corresponding network adapter together and treats them as one node, reducing the number of nodes participating in the computation and thus the computational workload.
Specifically, step S6022 ("merge the internal connection modes and the cross-server connection modes to form the global connection relation") may include the following steps B1 to B3.
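Filling the N×N matrix can be sketched as follows. The sketch assumes each server's first connection relation is a dictionary of dictionaries keyed by global-number labels and that a caller-supplied `cross_mode(a, b)` (for instance derived from hop counts) gives the cross-server mode; the diagonal is left empty (`None`), and `merge_global` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional, Tuple

FirstRelation = Dict[str, Dict[str, str]]


def merge_global(firsts: Dict[int, FirstRelation],
                 cross_mode: Callable[[str, str], str]
                 ) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Build the N x N global connection matrix; element [i][j] is n_ij."""
    # Row/column order: each server's components, in the order servers appear in `firsts`.
    owner = {c: s for s, table in firsts.items() for c in table}
    names = list(owner)
    n = len(names)
    matrix: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i == j:
                continue  # a component has no connection mode to itself
            if owner[a] == owner[b]:
                matrix[i][j] = firsts[owner[a]][a][b]  # internal connection mode
            else:
                matrix[i][j] = cross_mode(a, b)  # cross-server connection mode
    return names, matrix
```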
Step B1: bind each acceleration component of the server with its corresponding network adapter in the server to form a virtual node.
In this embodiment, an acceleration component in a server generally communicates across servers through a specific network adapter, so the two can be bound and combined into a single node, i.e., a virtual node.
The binding can be specified manually or established automatically when certain rules are satisfied. For example, acceleration components and network adapters can be bound according to the PCIe internal topology of the server; or, under the same root complex (RC), since the network adapter can send and receive data for the acceleration component, components with the same component number can be bound, i.e., the acceleration component and the network adapter with the same component number are combined into one virtual node.
It will be understood that acceleration components correspond one-to-one with virtual nodes, so there are as many virtual nodes as acceleration components. Different acceleration components may be bound to different network adapters, or some acceleration components may share the same network adapter; this embodiment does not limit this.
Step B2: determine the internal connection modes between the virtual nodes of each server and the cross-server connection modes between the virtual nodes of any two servers.
Step B3: merge the connection relations among the virtual nodes of the servers to form the global connection relation.
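Binding by equal component number, the rule used in the examples of this embodiment, can be sketched as follows. It assumes labels of the form GPU<server>.<component> and NIC<server>.<component> and that every GPU has a NIC with the same number; other binding rules (by PCIe topology, or several GPUs sharing one NIC) are not covered, and `bind_virtual_nodes` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

LABEL = re.compile(r"^(GPU|NIC)(\d+)\.(\d+)$")  # [type][server].[component]


def bind_virtual_nodes(components: List[str]) -> Dict[str, Tuple[str, str]]:
    """Bind each GPU to the NIC with the same server and component number."""
    nics: Dict[Tuple[str, str], str] = {}
    gpus: List[Tuple[str, str, str]] = []
    for c in components:
        m = LABEL.match(c)
        if m is None:
            raise ValueError(f"unrecognised label {c!r}")
        kind, server, comp = m.groups()
        if kind == "NIC":
            nics[(server, comp)] = c
        else:
            gpus.append((c, server, comp))
    nodes: Dict[str, Tuple[str, str]] = {}
    for gpu, server, comp in gpus:
        nic = nics.get((server, comp))
        if nic is None:
            raise ValueError(f"no NIC with the same number as {gpu}")
        nodes[f"{gpu}-{nic}"] = (gpu, nic)  # virtual node name, as in Table 6
    return nodes
```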
In this embodiment, when the global connection relationship is formed by merging, the connection manner between the virtual nodes is determined by using the virtual nodes as a unit. Similar to the connection manner between internal components, the connection manner between virtual nodes is also divided into an internal connection manner and a cross-server connection manner. Based on the first connection relation and the second connection relation of each server, the internal connection mode between two virtual nodes belonging to the same server can be determined, and the cross-server connection mode between two virtual nodes belonging to different servers can also be determined; the connection relations between the virtual nodes are combined, so that a global connection relation can be formed.
Optionally, step B2 includes the following steps B21 and B22.
Step B21: for a first virtual node and a second virtual node belonging to the same server, take the internal connection mode between the acceleration component of the first virtual node and the acceleration component of the second virtual node as the internal connection mode between the two virtual nodes.
Step B22: for a third virtual node and a fourth virtual node belonging to different servers, take the cross-server connection mode between the network adapter of the third virtual node and the network adapter of the fourth virtual node as the cross-server connection mode between the two virtual nodes.
In this embodiment, for convenience of description, two virtual nodes belonging to the same server are referred to as a first virtual node and a second virtual node, respectively; similarly, two virtual nodes belonging to different servers are referred to as a third virtual node and a fourth virtual node, respectively, i.e., the third virtual node is a virtual node of a certain server and the fourth virtual node is a virtual node of another server.
For a first and a second virtual node belonging to the same server, communication between them does not require a network adapter, so their internal connection mode can be determined from the internal connection mode between their acceleration components; specifically, the latter is used as the internal connection mode between the two virtual nodes.
For a third and a fourth virtual node belonging to different servers, communication between them is cross-server and must pass through their network adapters, so the cross-server connection mode between the network adapter of the third virtual node and that of the fourth virtual node is used as the cross-server connection mode between the two virtual nodes.
Taking the server shown in FIG. 9 as an example, if the traffic of GPU1.0 of server 1 is always sent and received through NIC1.0, GPU1.0 and NIC1.0 can be bound into one virtual node, GPU1.0-NIC1.0. For convenience of description, this embodiment binds components with the same component number, and the internal connection relations between the resulting virtual nodes are derived from the internal connection relations of their GPUs.
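Rules B21 and B22 can be sketched as a single function that picks the source of a virtual node pair's mode. The sketch assumes virtual nodes are (GPU label, NIC label) pairs, that the GPUs' internal modes are available as a dictionary of dictionaries, and that cross-server modes come from a caller-supplied function; all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Node = Tuple[str, str]  # (acceleration component, bound network adapter)


def server_of(label: str) -> str:
    """Server number of a label such as 'GPU1.3' (the digits before the dot)."""
    return label[3:].split(".")[0]


def virtual_node_mode(a: Node, b: Node,
                      internal: Dict[str, Dict[str, str]],
                      cross: Callable[[str, str], str]) -> str:
    """Connection mode between two virtual nodes per steps B21 and B22."""
    gpu_a, nic_a = a
    gpu_b, nic_b = b
    if server_of(gpu_a) == server_of(gpu_b):
        return internal[gpu_a][gpu_b]  # B21: the accelerators' internal mode
    return cross(nic_a, nic_b)  # B22: the network adapters' cross-server mode
```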
For the first virtual node GPU1.0-NIC1.0 and the second virtual node GPU1.1-NIC1.1 belonging to the same server, the acceleration component of the first virtual node is GPU1.0, and the acceleration component of the second virtual node is GPU1.1, based on the above table 1, it can be known that the internal connection mode between the two is LSW, and therefore, the internal connection mode between the first virtual node GPU1.0-NIC1.0 and the second virtual node GPU1.1-NIC1.1 can also be determined as LSW.
For server 1, the internal connection relations between virtual nodes are shown in Table 6 below.
TABLE 6
               GPU1.0-NIC1.0  GPU1.1-NIC1.1  GPU1.2-NIC1.2  GPU1.3-NIC1.3
GPU1.0-NIC1.0  -              LSW            LSW            LSW
GPU1.1-NIC1.1  LSW            -              LSW            LSW
GPU1.2-NIC1.2  LSW            LSW            -              LSW
GPU1.3-NIC1.3  LSW            LSW            LSW            -
Similarly, for virtual nodes of different servers, the manner of cross-server connection between the two may also be determined.
For example, for server 1 shown in FIG. 10, Table 4 above shows that virtual nodes GPU1.0-NIC1.0 and GPU1.1-NIC1.1 both correspond to leaf 1, and virtual nodes GPU1.2-NIC1.2 and GPU1.3-NIC1.3 both correspond to leaf 2. On this basis, the cross-server connection modes between the virtual nodes of server 0 and server 1 are shown in Table 7 below.
TABLE 7
GPU1.0-NIC1.0 GPU1.1-NIC1.1 GPU1.2-NIC1.2 GPU1.3-NIC1.3
GPU0.0-NIC0.0 NET2 NET2 NET4 NET4
GPU0.1-NIC0.1 NET2 NET2 NET4 NET4
GPU0.2-NIC0.2 NET4 NET4 NET2 NET2
GPU0.3-NIC0.3 NET4 NET4 NET2 NET2
Combining Tables 6 and 7 above, the global connection relation formed by merging the connection relations between virtual nodes (both internal and cross-server) is shown in Table 8 below.
TABLE 8
               GPU0.0-NIC0.0  GPU0.1-NIC0.1  GPU0.2-NIC0.2  GPU0.3-NIC0.3  GPU1.0-NIC1.0  GPU1.1-NIC1.1  GPU1.2-NIC1.2  GPU1.3-NIC1.3
GPU0.0-NIC0.0  -              LSW            LSW            LSW            NET2           NET2           NET4           NET4
GPU0.1-NIC0.1  LSW            -              LSW            LSW            NET2           NET2           NET4           NET4
GPU0.2-NIC0.2  LSW            LSW            -              LSW            NET4           NET4           NET2           NET2
GPU0.3-NIC0.3  LSW            LSW            LSW            -              NET4           NET4           NET2           NET2
GPU1.0-NIC1.0  NET2           NET2           NET4           NET4           -              LSW            LSW            LSW
GPU1.1-NIC1.1  NET2           NET2           NET4           NET4           LSW            -              LSW            LSW
GPU1.2-NIC1.2  NET4           NET4           NET2           NET2           LSW            LSW            -              LSW
GPU1.3-NIC1.3  NET4           NET4           NET2           NET2           LSW            LSW            LSW            -
Compared with the global connection relation shown in FIG. 12, binding each acceleration component to a network adapter (NIC) halves the number of nodes participating in the computation, so the global connection relation matrix shrinks to 1/4 of its original size.
Step S603: determine the distance between internal components according to the connection mode between them, and convert the global connection relation into a global distance relation containing the distances between the internal components of the multiple servers.
For details, refer to step S403 of the embodiment shown in FIG. 4; they are not repeated here.
In some optional embodiments, determining the distance between internal components according to their connection mode in step S603 may include the following step C1 or step C2.
Step C1: determine the distance corresponding to the connection mode between internal components according to a preset correspondence between connection modes and distances.
In this embodiment, the distance corresponding to each connection mode, that is, the correspondence between connection modes and distances, can be preset; once the connection mode between internal components is known, the corresponding distance is looked up from this correspondence.
For example, for connection modes ① to ⑨ above, a distance value can be defined for each mode based on empirical values; the preset correspondence between connection modes and distances is shown in Table 9 below.
TABLE 9
If the acceleration component and the network adapter are bound, the distance between the virtual nodes can be determined based on the corresponding relation between the preset connection mode and the distance, so that a global distance relation is formed.
For example, using the correspondence in Table 9 above, the global connection relation in Table 8 above is converted into the global distance relation shown in Table 10 below.
TABLE 10
               GPU0.0-NIC0.0  GPU0.1-NIC0.1  GPU0.2-NIC0.2  GPU0.3-NIC0.3  GPU1.0-NIC1.0  GPU1.1-NIC1.1  GPU1.2-NIC1.2  GPU1.3-NIC1.3
GPU0.0-NIC0.0  -              1              1              1              1024           1024           2048           2048
GPU0.1-NIC0.1  1              -              1              1              1024           1024           2048           2048
GPU0.2-NIC0.2  1              1              -              1              2048           2048           1024           1024
GPU0.3-NIC0.3  1              1              1              -              2048           2048           1024           1024
GPU1.0-NIC1.0  1024           1024           2048           2048           -              1              1              1
GPU1.1-NIC1.1  1024           1024           2048           2048           1              -              1              1
GPU1.2-NIC1.2  2048           2048           1024           1024           1              1              -              1
GPU1.3-NIC1.3  2048           2048           1024           1024           1              1              1              -
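Step C1 amounts to a table lookup. Table 9 itself is not reproduced here, so this sketch includes only the three values that can be read back from Tables 8 and 10 (LSW → 1, NET2 → 1024, NET4 → 2048); any other mode raises a `KeyError` instead of guessing a value. Mapping the empty diagonal to 0 is an assumption made for convenience, and `to_distance_matrix` is a hypothetical name.

```python
from typing import List, Optional

# Distances recoverable from Tables 8 and 10; the other modes of Table 9 are omitted.
MODE_DISTANCE = {"LSW": 1, "NET2": 1024, "NET4": 2048}


def to_distance_matrix(modes: List[List[Optional[str]]]) -> List[List[int]]:
    """Convert a global connection matrix into a global distance matrix."""
    return [[0 if mode is None else MODE_DISTANCE[mode] for mode in row]
            for row in modes]
```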
Step C2: determine the delay and bandwidth between internal components according to their connection mode, and determine the distance between them from the delay and bandwidth; the distance is positively correlated with the delay and negatively correlated with the bandwidth.
In this embodiment, the distance can also be calculated from bandwidth and delay: once the delay and bandwidth between two internal components are known, the distance between them is computed so that it increases with the delay and decreases with the bandwidth.
For example, an adjustment function of the delay and an adjustment function of the bandwidth can be defined, with the distance proportional to the adjustment function of the delay and inversely proportional to the adjustment function of the bandwidth. Denoting them f(t) and g(B), the relationship between the distance Distance, the delay t, and the bandwidth B may be:
$$\mathrm{Distance} = \frac{f(t)}{g(B)}$$
where f(t) is the adjustment function of the delay t, through which t can be adjusted according to actual conditions before the calculation (for example, f(t) = … can generally be used), and g(B) is the adjustment function of the bandwidth B, through which B can likewise be adjusted according to actual conditions (for example, when the bandwidth B is not a bottleneck between the two components, g(B) = … can be used).
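Because the example forms of the adjustment functions are not preserved, this sketch leaves them as parameters and defaults both to the identity (f(t) = t, g(B) = B), an assumption made only for illustration. Any increasing f and positive, increasing g keep the required behaviour: distance rises with delay and falls with bandwidth. `link_distance` is a hypothetical name, and units are whatever the caller uses consistently.

```python
from typing import Callable


def link_distance(delay: float, bandwidth: float,
                  f: Callable[[float], float] = lambda t: t,
                  g: Callable[[float], float] = lambda b: b) -> float:
    """Distance = f(t) / g(B); the identity defaults are illustrative assumptions."""
    denominator = g(bandwidth)
    if denominator <= 0:
        raise ValueError("g(B) must be positive")
    return f(delay) / denominator
```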
In this embodiment, after the distance between every two internal components has been determined, the global connection relation can likewise be converted into a global distance relation similar to Table 10, which is not described in detail here.
Step S604: use a path planning algorithm to plan, according to the global distance relation, the shortest communication path that traverses every acceleration component.
The global shortest communication path, obtained from the global topological distance relation, is the planned path that traverses every acceleration component while minimizing the total "distance" traversed; information passing along this path takes the least time.
If the acceleration component is bound to the network adapter, the shortest communication path planned according to the global distance relationship needs to traverse all the virtual nodes.
For example, for the server cluster shown in FIG. 9 and FIG. 10, with the global distance relation of Table 10 above, the shortest communication path finally planned is [GPU1.0-NIC1.0, GPU0.1-NIC0.1, GPU0.2-NIC0.2, GPU0.3-NIC0.3, GPU0.0-NIC0.0, GPU1.1-NIC1.1, GPU1.2-NIC1.2, GPU1.3-NIC1.3, GPU1.0-NIC1.0], with a total length of 2054, as shown in FIG. 13.
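The total length quoted for a planned path can be checked by summing the distances between consecutive nodes of the closed path. This sketch rebuilds the entries of Table 10 from the pattern visible in the table (1 within a server, 1024 between virtual nodes on the same leaf, 2048 across leaves, with components 0-1 on leaf 1 and 2-3 on leaf 2); that pattern is read off the table, and the function names are hypothetical.

```python
from typing import List, Tuple


def parse(node: str) -> Tuple[int, int]:
    """(server, component) from a virtual node label such as 'GPU0.2-NIC0.2'."""
    server, comp = node.split("-")[0][3:].split(".")
    return int(server), int(comp)


def table10_distance(a: str, b: str) -> int:
    """Distance between two virtual nodes, following the pattern of Table 10."""
    if a == b:
        return 0
    (sa, ka), (sb, kb) = parse(a), parse(b)
    if sa == sb:
        return 1  # LSW inside one server
    # Components 0-1 sit on leaf 1 and 2-3 on leaf 2 in the FIG. 10 example.
    return 1024 if ka // 2 == kb // 2 else 2048  # NET2 vs NET4


def path_length(path: List[str]) -> int:
    """Sum of distances between consecutive nodes of a closed path."""
    return sum(table10_distance(a, b) for a, b in zip(path, path[1:]))


PLANNED = ["GPU1.0-NIC1.0", "GPU0.1-NIC0.1", "GPU0.2-NIC0.2", "GPU0.3-NIC0.3",
           "GPU0.0-NIC0.0", "GPU1.1-NIC1.1", "GPU1.2-NIC1.2", "GPU1.3-NIC1.3",
           "GPU1.0-NIC1.0"]
print(path_length(PLANNED))  # 2054
```

Only the two cross-server hops (GPU1.0-NIC1.0 to GPU0.1-NIC0.1 and GPU0.0-NIC0.0 to GPU1.1-NIC1.1) cost 1024 each; the six hops inside a server cost 1 each.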
Optionally, some communication paths in the server cluster may have specific requirements, and the global distance relation can be adjusted accordingly so that the finally determined shortest communication path reflects them. Specifically, after step S603 ("determine the distance between the internal components according to the connection mode between them"), the method may further include: determining communication priorities among the internal components and adjusting the distances between them according to those priorities; for the same connection mode, the higher the communication priority, the smaller the distance between the internal components.
In this embodiment, two internal components can be regarded as an internal component pair. If several pairs share the same connection mode, the communication priority of each pair, i.e., between its internal components, is determined, and pairs with higher communication priority are assigned smaller distances, so the distances can be adjusted flexibly.
For example, in a cross-server communication path, communication between the same component numbers may be used preferentially, so that a higher communication priority may be set between two internal components (or virtual nodes) having the same component numbers.
Still taking the structure shown in fig. 9 and 10 as an example, the communication priority between the virtual nodes GPU0.0-NIC0.0 and GPU1.0-NIC1.0 is higher than the communication priority between GPU0.0-NIC0.0 and GPU1.1-NIC 1.1. In this case, different communication priorities may be accommodated by adjusting the "distance" between them, i.e., defining a shorter distance between GPU0.0-NIC0.0 and GPU1.0-NIC 1.0.
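A minimal sketch of the priority adjustment, assuming virtual node names of the form `GPU<server>.<component>-NIC<server>.<component>` as in the examples, and a hypothetical rule that halves the distance once per priority level. The helper names are illustrative only; the sketch gives same-component-number pairs on different servers a higher priority, as described above.

```python
from itertools import combinations


def parse_virtual_node(name):
    """Split a name like "GPU0.1-NIC0.1" into (server number, component number).

    Parsing the name this way is an assumption based on the examples in the text.
    """
    server, component = name.split("-")[0][len("GPU"):].split(".")
    return int(server), int(component)


def same_number_priorities(nodes, level=1):
    """Give cross-server pairs with the same component number a higher priority."""
    priorities = {}
    for a, b in combinations(nodes, 2):
        (sa, ca), (sb, cb) = parse_virtual_node(a), parse_virtual_node(b)
        if sa != sb and ca == cb:
            priorities[frozenset((a, b))] = level
    return priorities


def adjust_for_priority(dist, nodes, priorities):
    """Return a copy of dist in which higher-priority pairs get smaller distances.

    Halving the distance once per priority level (a right shift) is a
    hypothetical rule chosen for illustration; distances never drop below 1.
    """
    index = {name: i for i, name in enumerate(nodes)}
    adjusted = [row[:] for row in dist]
    for pair, level in priorities.items():
        a, b = sorted(pair)
        i, j = index[a], index[b]
        adjusted[i][j] = adjusted[j][i] = max(1, dist[i][j] >> level)
    return adjusted
```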
For example, assuming that the distance between GPU0.0-NIC0.0 and GPU1.0-NIC1.0 is 512, the global distance relationship shown in table 10 is adjusted, and the new global distance relationship is shown in table 11 below.
Table 11
Node           GPU0.0-NIC0.0  GPU0.1-NIC0.1  GPU0.2-NIC0.2  GPU0.3-NIC0.3  GPU1.0-NIC1.0  GPU1.1-NIC1.1  GPU1.2-NIC1.2  GPU1.3-NIC1.3
GPU0.0-NIC0.0  -              1              1              1              512            1024           2048           2048
GPU0.1-NIC0.1  1              -              1              1              1024           512            2048           2048
GPU0.2-NIC0.2  1              1              -              1              2048           2048           512            1024
GPU0.3-NIC0.3  1              1              1              -              2048           2048           1024           512
GPU1.0-NIC1.0  512            1024           2048           2048           -              1              1              1
GPU1.1-NIC1.1  1024           512            2048           2048           1              -              1              1
GPU1.2-NIC1.2  2048           2048           512            1024           1              1              -              1
GPU1.3-NIC1.3  2048           2048           1024           512            1              1              1              -
The shortest communication path planned from Table 11 above is [GPU0.3-NIC0.3, GPU1.3-NIC1.3, GPU1.0-NIC1.0, GPU1.1-NIC1.1, GPU1.2-NIC1.2, GPU0.2-NIC0.2, GPU0.1-NIC0.1, GPU0.0-NIC0.0, GPU0.3-NIC0.3], and its total length is 1030; see Fig. 14.
In step S605, each server in the server cluster is controlled to perform data communication according to the shortest communication path.
For details, refer to step S405 of the embodiment shown in Fig. 4; they are not repeated here.
With the data communication method provided by this embodiment, the distances between communication nodes are defined from collected global topology information, and, based on these distances, the globally optimal path (the shortest communication path) is computed with a traveling salesman problem (TSP) solving method, which improves communication efficiency in artificial intelligence training scenarios. Binding each acceleration component to a network adapter reduces the number of nodes participating in the computation and thus improves computational efficiency. In addition, the distances between nodes can be set and adjusted flexibly according to specific requirements, so that actual needs can be met.
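The figures in this example can be checked mechanically. The sketch below encodes Table 11 (using 0 for a node's distance to itself), sums the distances along the planned path, and enumerates every cycle to confirm that none is shorter. The names `TABLE_11`, `cycle_length` and `brute_force_shortest` are introduced here for illustration only.

```python
from itertools import permutations

# Node order matches the rows and columns of Table 11.
NODES = [f"GPU{s}.{c}-NIC{s}.{c}" for s in range(2) for c in range(4)]

# Table 11, with 0 on the diagonal (a node's distance to itself).
TABLE_11 = [
    [0, 1, 1, 1, 512, 1024, 2048, 2048],
    [1, 0, 1, 1, 1024, 512, 2048, 2048],
    [1, 1, 0, 1, 2048, 2048, 512, 1024],
    [1, 1, 1, 0, 2048, 2048, 1024, 512],
    [512, 1024, 2048, 2048, 0, 1, 1, 1],
    [1024, 512, 2048, 2048, 1, 0, 1, 1],
    [2048, 2048, 512, 1024, 1, 1, 0, 1],
    [2048, 2048, 1024, 512, 1, 1, 1, 0],
]


def cycle_length(path, dist, nodes=NODES):
    """Sum the distances between consecutive nodes of a closed path."""
    index = {name: i for i, name in enumerate(nodes)}
    return sum(dist[index[a]][index[b]] for a, b in zip(path, path[1:]))


def brute_force_shortest(dist):
    """Check every cycle that starts and ends at node 0 (7! = 5040 for 8 nodes)."""
    best_len, best_order = None, None
    for perm in permutations(range(1, len(dist))):
        order = (0, *perm, 0)
        length = sum(dist[a][b] for a, b in zip(order, order[1:]))
        if best_len is None or length < best_len:
            best_len, best_order = length, order
    return best_len, best_order


planned = ["GPU0.3-NIC0.3", "GPU1.3-NIC1.3", "GPU1.0-NIC1.0", "GPU1.1-NIC1.1",
           "GPU1.2-NIC1.2", "GPU0.2-NIC0.2", "GPU0.1-NIC0.1", "GPU0.0-NIC0.0",
           "GPU0.3-NIC0.3"]
```

Exhaustive enumeration is only feasible for tiny instances. Here the result also follows from a simple bound: any cycle through both servers crosses between them at least twice, each crossing costs at least 512, and the remaining six hops cost at least 1 each, so no cycle is shorter than 2 × 512 + 6 = 1030.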
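Step S405 is not reproduced in this section, so how the servers consume the path is not shown here. One common pattern, assumed purely for illustration, is a ring in which each node sends to its successor and receives from its predecessor along the closed path, as in ring-based collective communication. The hypothetical helper below derives those neighbors from a planned path.

```python
def ring_neighbors(cycle):
    """Map each node of a closed path [n0, n1, ..., n0] to (predecessor, successor)."""
    if len(cycle) < 2 or cycle[0] != cycle[-1]:
        raise ValueError("expected a closed path whose first and last nodes match")
    ring = cycle[:-1]  # drop the repeated start node
    k = len(ring)
    return {node: (ring[(i - 1) % k], ring[(i + 1) % k]) for i, node in enumerate(ring)}
```

Applied to the path planned from Table 11, GPU0.3-NIC0.3 would receive from GPU0.0-NIC0.0 and send to GPU1.3-NIC1.3.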
This embodiment further provides a data communication apparatus, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This embodiment provides a data communication apparatus applied to a target server. As shown in Fig. 15, the apparatus includes:
An obtaining module 1501, configured to obtain a first connection relation and a second connection relation of each server in the server cluster, where the first connection relation comprises the internal connection modes among the internal components of the server, the second connection relation comprises the correspondence between the network adapters of the server and the network devices of the server cluster to which they are connected, and the internal components include a network adapter and an acceleration component;
A merging module 1502, configured to merge the first connection relations and the second connection relations of the plurality of servers into a global connection relation, where the global connection relation comprises the connection modes among the internal components of the plurality of servers;
A conversion module 1503, configured to determine the distance between internal components according to the connection mode between them, and to convert the global connection relation into a global distance relation comprising the distances between the internal components of the plurality of servers;
A planning module 1504, configured to plan, using a path planning algorithm and according to the global distance relation, the shortest communication path that traverses each acceleration component; and
A control module 1505, configured to control each server in the server cluster to perform data communication according to the shortest communication path.
In some alternative embodiments, the first connection relationship of the server is generated by:
Defining a global number for each internal component in the server, the global number comprising a server number and a component number;
Determining, according to the topology information of the internal components of the server, the internal connection mode between any two internal components whose connection efficiency meets a preset requirement; and
Recording, by global number, the internal connection modes among all internal components of the server to generate the first connection relation.
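A minimal sketch of these three steps, assuming the server's topology information is available as a list of candidate link modes with a numeric efficiency score for each component pair. The `GlobalId` type, the `links` format and the threshold `min_efficiency` are hypothetical stand-ins for the global number and the "preset requirement"; the sketch keeps the most efficient qualifying mode for each pair.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GlobalId:
    """Hypothetical global number: server number plus component number."""
    server: int     # server number
    kind: str       # component type, e.g. "GPU" or "NIC"
    component: int  # component number within the server

    def __str__(self):
        return f"{self.kind}{self.server}.{self.component}"


def first_connection_relation(server, components, links, min_efficiency):
    """Record, for every pair of internal components, the best qualifying link mode.

    components: list of (kind, component_number) tuples on this server.
    links: dict frozenset({(kind, n), (kind, m)}) -> list of (mode, efficiency),
           standing in for the server's internal topology information.
    Pairs with no mode reaching min_efficiency are left out.
    """
    relation = {}
    for a, b in combinations(components, 2):
        candidates = [
            (eff, mode)
            for mode, eff in links.get(frozenset((a, b)), [])
            if eff >= min_efficiency
        ]
        if candidates:
            _, mode = max(candidates)
            relation[frozenset((GlobalId(server, *a), GlobalId(server, *b)))] = mode
    return relation
```

Re-running this function on refreshed topology information is one way to handle the state-change case described next.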
In some alternative embodiments, the obtaining module 1501 is further configured to:
When the state of an internal component of the server changes, update the topology information of the internal components of the server, and re-determine, for any two internal components, the internal connection mode whose connection efficiency meets the preset requirement.
In some optional embodiments, the merging module 1502 merges the first connection relationship and the second connection relationship of the plurality of servers into a global connection relationship, including:
Determining a cross-server connection mode between internal components of two servers according to a second connection relation of the two servers;
Combining the internal connection modes between the internal components of each server with the cross-server connection modes between the internal components of the two servers to form a global connection relation.
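The sketch below shows one way the merge could look, under simplifying assumptions: the second connection relation is modeled as a map from each network adapter to the switch it attaches to, and cross-server modes are derived only for adapter pairs, labeled `same_switch` or `via_fabric`. These labels and the function name `merge_relations` are illustrative, not from the source.

```python
from itertools import combinations


def merge_relations(first_relations, second_relations):
    """Merge per-server relations into one global connection relation.

    first_relations: dict server -> {frozenset({a, b}): internal connection mode}
    second_relations: dict server -> {nic_name: switch_name}
    Cross-server modes are derived here only for network adapter pairs:
    "same_switch" if both adapters attach to the same switch, else "via_fabric".
    """
    global_relation = {}
    # Internal connection modes are copied unchanged.
    for relation in first_relations.values():
        global_relation.update(relation)
    # Cross-server connection modes for every pair of servers.
    for s1, s2 in combinations(sorted(second_relations), 2):
        for nic1, sw1 in second_relations[s1].items():
            for nic2, sw2 in second_relations[s2].items():
                mode = "same_switch" if sw1 == sw2 else "via_fabric"
                global_relation[frozenset((nic1, nic2))] = mode
    return global_relation
```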
In some optional embodiments, the merging module 1502 merges internal connection manners between internal components of the servers and cross-server connection manners between internal components of two servers to form a global connection relationship, including:
Binding an acceleration component of the server with a corresponding network adapter in the server to form a virtual node;
determining an internal connection mode between each virtual node of the server and a cross-server connection mode between the virtual nodes of the two servers;
And combining the connection relations among the virtual nodes of the servers to form a global connection relation.
In some alternative embodiments, the merging module 1502 determines an internal connection between virtual nodes of the servers and a cross-server connection between virtual nodes of two servers, including:
For a first virtual node and a second virtual node belonging to the same server, taking an internal connection mode between an acceleration component of the first virtual node and an acceleration component of the second virtual node as an internal connection mode between the first virtual node and the second virtual node;
For a third virtual node and a fourth virtual node that belong to different servers, taking the cross-server connection mode between the network adapter of the third virtual node and the network adapter of the fourth virtual node as the cross-server connection mode between the third virtual node and the fourth virtual node.
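The two rules above can be sketched as follows. The `bindings` format (each virtual node mapped to its server, acceleration component and network adapter), the function name and the mode labels used in the test are assumptions made for illustration.

```python
from itertools import combinations


def virtual_node_relation(bindings, global_relation):
    """Derive the connection mode between every pair of virtual nodes.

    bindings: dict virtual_node -> (server, accel_name, nic_name)
    global_relation: dict frozenset({a, b}) -> mode, covering accelerators and NICs
    """
    result = {}
    for v1, v2 in combinations(sorted(bindings), 2):
        s1, accel1, nic1 = bindings[v1]
        s2, accel2, nic2 = bindings[v2]
        if s1 == s2:
            # Same server: use the internal mode between the acceleration components.
            key = frozenset((accel1, accel2))
        else:
            # Different servers: use the cross-server mode between the network adapters.
            key = frozenset((nic1, nic2))
        result[frozenset((v1, v2))] = global_relation[key]
    return result
```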
In some optional embodiments, the merging module 1502 determines, according to a second connection relationship between two servers, a cross-server connection manner between internal components of the two servers, including:
Determining, according to the second connection relations of the two servers, the hop count between the internal components of the two servers, and determining the cross-server connection mode corresponding to that hop count based on a preset matching relation between hop counts and cross-server connection modes.
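One way to obtain the hop count is a breadth-first search over a graph of network adapters and switches, followed by a lookup in a preset table. The graph format, counting hops as links traversed, and the `HOPS_TO_MODE` entries below are all hypothetical.

```python
from collections import deque

# Illustrative preset matching between hop count and cross-server connection mode.
HOPS_TO_MODE = {2: "same_switch", 4: "one_spine_hop"}


def hop_count(links, src, dst):
    """Breadth-first search over an undirected graph given as a list of edges."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst} is unreachable from {src}")


def cross_server_mode(links, nic_a, nic_b, table=HOPS_TO_MODE, default="multi_hop"):
    """Look up the connection mode that the preset table assigns to the hop count."""
    return table.get(hop_count(links, nic_a, nic_b), default)
```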
In some alternative embodiments, the conversion module 1503 determines the distance between the internal components according to the connection between the internal components, including:
Determining a distance corresponding to the connection mode between the internal components according to a preset corresponding relation between the connection mode and the distance;
Or determining the delay and bandwidth between the internal components according to the connection mode between them, and determining the distance between the internal components that corresponds to that delay and bandwidth, where the distance is positively correlated with the delay and negatively correlated with the bandwidth.
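For the second option, one simple model that satisfies both correlations is to estimate the time needed to move a fixed-size message over the link: latency plus message size divided by bandwidth. This latency-plus-transfer-time model, the default 1 MiB message size and the function name are assumptions, not the patent's formula.

```python
def distance_from_link(latency_us, bandwidth_gbps, message_bytes=1 << 20, unit_us=1.0):
    """Estimate a distance as the time to move one message over the link.

    Distance grows with latency and shrinks as bandwidth grows, matching the
    required positive and negative correlations. Result is at least 1.
    """
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    # 1 Gbit/s carries 1e3 bits per microsecond.
    transfer_us = message_bytes * 8 / (bandwidth_gbps * 1e3)
    return max(1, round((latency_us + transfer_us) / unit_us))
```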
In some alternative embodiments, the conversion module 1503 is further configured to:
Determining communication priorities among the internal components, and adjusting the distances among the internal components according to the communication priorities; wherein, under the condition that the connection modes are the same, the higher the communication priority is, the smaller the distance between the internal components is.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The data communication apparatus in this embodiment is presented in the form of functional units, where a unit may be an ASIC (Application Specific Integrated Circuit), a processor and memory executing one or more software or firmware programs, and/or another device that can provide the functions described above.
An embodiment of the present invention further provides a server that includes the data communication apparatus shown in Fig. 15.
Referring to fig. 16, fig. 16 is a schematic structural diagram of a server according to an alternative embodiment of the present invention, as shown in fig. 16, the server includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the server, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display apparatus coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple servers may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 16.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the methods of the above embodiments.
The memory 20 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required by a function; the data storage area may store data created through use of the server, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The server also includes a communication interface 30 for the server to communicate with other devices or communication networks.
An embodiment of the present invention further provides a computer-readable storage medium. The methods according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network and stored in a local storage medium, so that the methods described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of the kinds of memory described above. It will be appreciated that a computer, processor, microprocessor, controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (13)

1. A data communication method, applied to a target server, the method comprising:
acquiring a first connection relation and a second connection relation of each server in a server cluster, wherein the first connection relation comprises the internal connection modes among the internal components of the server, the second connection relation comprises the correspondence between the network adapters of the server and the network devices of the server cluster to which they are connected, and the internal components include a network adapter and an acceleration component;
combining the first connection relations and the second connection relations of the plurality of servers into a global connection relation, wherein the global connection relation comprises the connection modes among the internal components of the plurality of servers;
determining the distance between internal components according to the connection mode between them, and converting the global connection relation into a global distance relation comprising the distances between the internal components of the plurality of servers;
planning, using a path planning algorithm and according to the global distance relation, the shortest communication path that traverses each acceleration component; and
controlling each server in the server cluster to perform data communication according to the shortest communication path.
2. The method of claim 1, wherein the first connection relationship of the server is generated by:
defining a global number for each internal component in the server, the global number comprising a server number and a component number;
determining, according to the topology information of the internal components of the server, the internal connection mode between any two internal components whose connection efficiency meets a preset requirement; and
recording, by global number, the internal connection modes among all internal components of the server to generate the first connection relation.
3. The method according to claim 2, wherein the method further comprises:
when the state of an internal component of the server changes, updating the topology information of the internal components of the server, and re-determining, for any two internal components, the internal connection mode whose connection efficiency meets the preset requirement.
4. The method of claim 1, wherein the merging of the first connection relations and the second connection relations of the plurality of servers into a global connection relation comprises:
Determining a cross-server connection mode between internal components of two servers according to a second connection relation of the two servers;
and combining the internal connection modes between the internal components of each server with the cross-server connection modes between the internal components of the two servers to form a global connection relation.
5. The method of claim 4, wherein the combining of the internal connection modes between the internal components of the servers with the cross-server connection modes between the internal components of the two servers to form the global connection relation comprises:
Binding an acceleration component of the server with a corresponding network adapter in the server to form a virtual node;
determining an internal connection mode between each virtual node of the server and a cross-server connection mode between the virtual nodes of the two servers;
And combining the connection relations among the virtual nodes of the servers to form a global connection relation.
6. The method of claim 5, wherein determining the internal connection between the virtual nodes of the server and the cross-server connection between the virtual nodes of the two servers comprises:
For a first virtual node and a second virtual node belonging to the same server, taking an internal connection mode between an acceleration component of the first virtual node and an acceleration component of the second virtual node as an internal connection mode between the first virtual node and the second virtual node;
and, for a third virtual node and a fourth virtual node that belong to different servers, taking the cross-server connection mode between the network adapter of the third virtual node and the network adapter of the fourth virtual node as the cross-server connection mode between the third virtual node and the fourth virtual node.
7. The method according to claim 4, wherein determining a cross-server connection between internal components of the two servers according to the second connection relationship of the two servers comprises:
determining, according to the second connection relations of the two servers, the hop count between the internal components of the two servers, and determining the cross-server connection mode corresponding to that hop count based on a preset matching relation between hop counts and cross-server connection modes.
8. The method of claim 1, wherein determining the distance between the internal components based on the manner of connection between the internal components comprises:
Determining a distance corresponding to the connection mode between the internal components according to a preset corresponding relation between the connection mode and the distance;
or determining the delay and bandwidth between the internal components according to the connection mode between them, and determining the distance between the internal components that corresponds to that delay and bandwidth, wherein the distance is positively correlated with the delay and negatively correlated with the bandwidth.
9. The method according to claim 1, wherein the method further comprises:
Determining communication priorities among the internal components, and adjusting the distances among the internal components according to the communication priorities; wherein, under the condition that the connection modes are the same, the higher the communication priority is, the smaller the distance between the internal components is.
10. A data communication apparatus for application to a target server, the apparatus comprising:
an acquisition module, configured to acquire a first connection relation and a second connection relation of each server in the server cluster, wherein the first connection relation comprises the internal connection modes among the internal components of the server, the second connection relation comprises the correspondence between the network adapters of the server and the network devices of the server cluster to which they are connected, and the internal components include a network adapter and an acceleration component;
a merging module, configured to merge the first connection relations and the second connection relations of the plurality of servers into a global connection relation, wherein the global connection relation comprises the connection modes among the internal components of the plurality of servers;
a conversion module, configured to determine the distance between internal components according to the connection mode between them, and to convert the global connection relation into a global distance relation comprising the distances between the internal components of the plurality of servers;
a planning module, configured to plan, using a path planning algorithm and according to the global distance relation, the shortest communication path that traverses each acceleration component; and
a control module, configured to control each server in the server cluster to perform data communication according to the shortest communication path.
11. A server, comprising:
A memory and a processor in communication with each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the data communication method of any of claims 1 to 9.
12. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data communication method of any of claims 1 to 9.
13. A computer program product comprising computer instructions for causing a computer to perform the data communication method of any one of claims 1 to 9.
CN202410347498.2A 2024-03-26 2024-03-26 Data communication method, device, server and storage medium Active CN117955897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410347498.2A CN117955897B (en) 2024-03-26 2024-03-26 Data communication method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410347498.2A CN117955897B (en) 2024-03-26 2024-03-26 Data communication method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN117955897A true CN117955897A (en) 2024-04-30
CN117955897B CN117955897B (en) 2024-06-18

Family

ID=90792452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410347498.2A Active CN117955897B (en) 2024-03-26 2024-03-26 Data communication method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN117955897B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118487990A (en) * 2024-07-16 2024-08-13 苏州元脑智能科技有限公司 Server cluster flow control method and device and server cluster

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150003465A1 (en) * 2008-11-12 2015-01-01 Teloip Inc System, apparatus and method for providing improved performance of aggregated/bonded network connections with multiprotocol label switching
US20160080255A1 (en) * 2014-09-17 2016-03-17 Netapp, Inc. Method and system for setting up routing in a clustered storage system
CN106922211A (en) * 2014-09-17 2017-07-04 特洛伊普公司 System, apparatus and method for providing the performance for improving polymerization/binding network connection with multiprotocol label switching
CN117579550A (en) * 2023-11-20 2024-02-20 南开大学 Routing method based on programmable switch and container state information unloading

Also Published As

Publication number Publication date
CN117955897B (en) 2024-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant