WO2023040447A1

WO2023040447A1 - Bus system, communication method, and related device

Info

Publication number: WO2023040447A1
Application number: PCT/CN2022/105758
Authority: WO
Inventors: 刘君龙
Original assignee: 华为技术有限公司
Priority date: 2021-09-14
Filing date: 2022-07-14
Publication date: 2023-03-23
Also published as: CN115811446A

Abstract

Embodiments of the present application disclose a bus system, a communication method, and a related device. The system is a graph structure composed of multiple master nodes, multiple switches, and multiple slave nodes. Any multipath bus subsystem in the graph structure comprises a first master node, a first slave node, and N switches. The N switches comprise N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node. Any first switch is adjacent to any third switch or is connected to it by means of one or more second switches. The first master node is used for sending an enumeration message to the first slave node multiple times and determining multiple routing paths between the first master node and the first slave node. Each routing path passes, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches. The embodiments of the present application make it possible to support multipath access to a device on a bus.

Description

A bus system, communication method and related equipment

This application claims priority to Chinese patent application No. 202111083972.8, filed with the China Patent Office on September 14, 2021 and entitled "A bus system, communication method, and related equipment", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of computer technology, in particular to a bus system, a communication method and related equipment.

Background

With the rapid development of information technology, data-intensive and computing-intensive application scenarios such as artificial intelligence (AI), autonomous driving, and cloud computing are becoming increasingly common, and computing systems as a whole are becoming increasingly complex. Computing devices such as graphics processing units (GPUs) and tensor processing units (TPUs) will be widely integrated and applied.

As a result, the requirements on interconnection buses will keep rising, and interconnection buses that offer high bandwidth, low latency, low energy consumption, and easy implementation are becoming increasingly important. Future bus interconnection will take the form of an unordered, point-to-point graph structure: a node on the bus may be reachable over multiple paths (multipathing), a node on the bus may act as a central processing unit (CPU) node that manages a given device, and multiple hosts (multi-host) may exist on the bus. Current industry interconnects are moving in this direction. For example, Compute Express Link (CXL) 2.0 supports a multi-host function, and the future evolution of CXL will place more emphasis on point-to-point computing and multi-path node interconnection. From the perspective of industry trends, support for multi-host and multi-path in bus device management is therefore inevitable.

However, Peripheral Component Interconnect Express (PCIe), the most widely used interconnection bus in the industry today, is a strictly top-down tree-topology bus that does not natively support multi-host or multipathing. CXL 2.0 requires a special CXL switch and a specific interconnection topology (for example, a Type 3 CXL device connected to a CXL switch) to realize multi-host. In addition, future CXL versions (such as CXL 3.0) may support multipathing, but they would likewise require this special CXL switch and specific interconnection topology.

Therefore, most existing technologies "deceive" multiple hosts by means of special intermediate connection devices and specific interconnection topologies, so that each host believes it has exclusive use of the node devices. They cannot truly support multi-host and multi-path, and they greatly increase the latency, cost, and topology constraints of device interconnection.

Summary of the Invention

Embodiments of the present application provide a bus system, a communication method, and related equipment, which can support multi-path access to devices on the bus.

In a first aspect, an embodiment of the present application provides a bus system. The system is a graph structure formed by a plurality of master nodes, a plurality of switches, and a plurality of slave nodes connected through a bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches, and N1, N2, and N3 are all positive integers less than or equal to N. The first master node is configured to send an enumeration message to the first slave node multiple times and, based on at least one switch, determine a plurality of routing paths between the first master node and the first slave node, the at least one switch being the switch or switches through which each enumeration message passes on its way from the first master node to the first slave node. Each of the plurality of routing paths passes, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches, where S is a natural number less than or equal to N3.

A conventional PCIe bus interconnection is a tree topology with a strict top-down hierarchy. A node at each layer can be connected to only one node in the layer above, although it may be connected to multiple nodes in the layer below. As a result, there is only one top-down path from the host on the bus to any node device (such as a GPU, a TPU, or a network card). This greatly limits the scalability of the topology, which cannot serve increasingly complex computing systems as the volume of computing data grows. In the embodiments of this application, the top-down tree structure of the existing PCIe bus is abandoned, and hosts (that is, master nodes), node devices (that is, slave nodes), and switches are connected to form a flat graph structure in which any node (a master node, a slave node such as a GPU or a TPU, or a switch) may be connected to any other node. In this graph structure, a plurality of switches serve as intermediate devices between a master node and a slave node and can establish multiple physical links from a master node (for example, the first master node) to a slave node (for example, the first slave node). While the master node discovers all devices in the topology by sending enumeration messages, each enumeration message it sends may reach a slave node through one of these physical links. The master node can then record the routing path from the master node to the slave node based on which port of the master node the enumeration message departed from, which ports of which switches it passed through, and so on.

Further, because the present application establishes multiple physical links between a master node and a slave node through multiple switches (such as the first, second, and third switches), multiple enumeration messages sent by the host may reach the same slave node through different paths. By sending enumeration messages multiple times, the master node can therefore record multiple different routing paths between itself and the slave node. Compared with the tree structure of the prior art, the bus interconnection topology of the embodiments of the present application is a graph structure that supports multipath access to node devices (multipathing); that is, a node device can be reached over multiple paths. This makes bus interconnection easier to build and enlarges the design space, so that the system is more scalable and has greater capacity and can meet users' increasingly complex and large computing needs.
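
The multi-path discovery described above can be illustrated with a minimal sketch. It is not the application's implementation: the topology in `links`, the node names, and the function `enumerate_paths` are hypothetical, each enumeration message is modelled as one loop-free walk through the switches, and port-level detail is omitted.

```python
from typing import Dict, List

# Hypothetical topology (illustrative names only): master M1, first switches
# SW1a/SW1b, second switch SW2, third switches SW3a/SW3b, slave S1.
links: Dict[str, List[str]] = {
    "M1": ["SW1a", "SW1b"],
    "SW1a": ["SW3a", "SW2"],   # SW1a is adjacent to SW3a and reaches SW3b via SW2
    "SW1b": ["SW2"],
    "SW2": ["SW3a", "SW3b"],
    "SW3a": ["S1"],
    "SW3b": ["S1"],
    "S1": [],
}


def enumerate_paths(master: str, slave: str) -> List[List[str]]:
    """Record every loop-free routing path from master to slave."""
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        if node == slave:
            paths.append(trail)          # one enumeration message arrived
            return
        for nxt in links.get(node, []):
            if nxt not in trail:         # never revisit a node on the same path
                walk(nxt, trail + [nxt])

    walk(master, [master])
    return paths


routes = enumerate_paths("M1", "S1")
for route in routes:
    print(" -> ".join(route))
```

In this assumed topology the master records five distinct routing paths to the same slave, including one that uses no second switch at all (S = 0).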

It should be noted that the master node in the embodiment of the present application may be a master processing chip, and the slave node may be a slave processing chip, a memory, or a dedicated hardware processing unit, and the like. Specifically, the master node may be a host computer including multiple central processing units, and the slave node may be a computing unit such as a GPU or a TPU, or a storage device such as a solid-state disk, and so on. The master node can access the corresponding slave nodes through the multiple routing paths determined above, and then call the computing resources or data resources in the slave nodes to perform a series of calculation processing and so on.

In addition, the master node described in the embodiments of this application may also be called a host, and the slave node may also be called a node device. Correspondingly, the management master node may also be called a management host, the management master node information register may also be called a management host information register, the management master node signature may also be called a management host signature, and so on. This is not repeated below.

In some possible implementations, the first master node is further configured to query, based on the sent enumeration message, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assign a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration. The first slave node is configured to save the device number assigned by the first master node into the routing status register and set the visible bit to 1, where a visible bit of 1 indicates that the first slave node has been discovered by enumeration.

In the embodiments of this application, as mentioned above, because a node device is reachable over multiple paths, the host may enumerate the same node device over several paths during enumeration. When the node device is discovered by the host for the first time via a first routing path, the visible bit in its routing status register can be set to 1 to indicate that it has been discovered by enumeration. When the host later sends an enumeration message to the node device via another routing path, it can determine from the visible bit being 1 that the node device has already been discovered, so it does not need to assign a device number again and only needs to record the new routing path taken by that enumeration message. Recording the discovery state in this way enables enumeration over multiple paths, avoids repeated enumeration and infinite loops, and allows enumeration to complete efficiently and accurately.
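
The role of the visible bit can be sketched as follows. This is a simplified model under stated assumptions: the class names `RoutingStatusRegister`, `SlaveNode`, and `MasterNode`, the method `on_enum_arrival`, and the sequential device-numbering scheme are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RoutingStatusRegister:
    visible: int = 0                      # 0: not yet discovered, 1: discovered
    device_number: Optional[int] = None


@dataclass
class SlaveNode:
    reg: RoutingStatusRegister = field(default_factory=RoutingStatusRegister)


@dataclass
class MasterNode:
    next_device_number: int = 1           # assumed: numbers are handed out in order
    routes: List[List[str]] = field(default_factory=list)

    def on_enum_arrival(self, slave: SlaveNode, path: List[str]) -> None:
        """Handle one enumeration message that reached `slave` via `path`."""
        if slave.reg.visible == 0:
            # First discovery: assign a device number and mark the node visible.
            slave.reg.device_number = self.next_device_number
            slave.reg.visible = 1
            self.next_device_number += 1
        # In every case the routing path itself is recorded.
        self.routes.append(path)


master, gpu = MasterNode(), SlaveNode()
master.on_enum_arrival(gpu, ["M1", "SW1a", "SW3a", "S1"])
master.on_enum_arrival(gpu, ["M1", "SW1b", "SW2", "SW3b", "S1"])
```

The second arrival finds the visible bit already set, so only the new path is recorded and the device number is left unchanged.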

In some possible implementations, the first master node is further configured to send a first configuration message to the first slave node so as to obtain management authority over the first slave node, the first configuration message carrying the master node password and the master node number of the first master node. The first slave node is further configured to receive the first configuration message and, based on the first configuration message, set the signature bit in the management master node information register of the first slave node to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes in the system cannot obtain management authority over the first slave node. The first slave node is further configured to save the master node password and the master node number of the first master node in the management master node information register.

In this embodiment of the application, by sending the first configuration message to the node device, the host can set the signature bit in the management host information register of the node device to 1 and write its own host password (that is, the master node password) and host number (that is, the master node number) into the node device, thereby obtaining management authority over the node device. It should be noted that once the signature bit is set to 1, that is, once a host has obtained management authority, other hosts cannot obtain management authority over the node device. This ensures that the node device has a unique management host (that is, management master node) and keeps the device management plane unambiguous. In some possible embodiments, a host can obtain management authority over multiple different node devices, and the host password used for each node device can be different. The host password serves as an important credential for later verifying the identity of the management host, thereby ensuring the security of the device management plane. Management authority includes, but is not limited to, the host arbitrating resource contention between node devices, handling abnormal error reports from node devices, and managing and configuring the basic characteristics of node devices (such as the maximum supported packet length). In some possible embodiments, a mechanism that separates the management of a node device from its use is also defined. As mentioned above, a node device can be managed by only one host but can be accessed and used by multiple hosts (including, for example, the use of its data resources and computing resources), which is not specifically limited in this embodiment of the present application.

In some possible implementations, the first master node is further configured to send a second configuration message to the first slave node so as to cancel its management authority over the first slave node, the second configuration message carrying the master node password and the master node number of the first master node. The first slave node is further configured to receive the second configuration message and, if the master node password and the master node number of the first master node carried in the second configuration message are consistent with the master node password and the master node number stored in the management master node information register, set the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.

In this embodiment of the application, after the host has obtained management authority over a node device, it can also send the second configuration message to reset the signature bit in the management host information register (that is, the management master node information register) of the node device to 0, thereby cancelling its management authority over the node device. This improves the flexibility of the device management mechanism and meets users' actual needs. Moreover, when the node device receives the second configuration message, it verifies the host password and host number carried in the message and allows the host to cancel the management authority only if they match the stored values. In other words, the embodiments of the present application also define a mechanism by which the node device verifies its management host, ensuring that the management host of the node device cannot be impersonated and further ensuring clear and reliable device management.
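
The claim and release handshake can be sketched as below. This is an illustration only: the names `ManagementRegister`, `NodeDevice`, `handle_first_config`, and `handle_second_config`, the integer encoding of passwords, and the boolean return values are assumptions, and leaving the stored credentials in place after release is a simplification not stated in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementRegister:
    signature: int = 0                    # 1: a management host is set
    host_number: Optional[int] = None
    host_password: Optional[int] = None


class NodeDevice:
    def __init__(self) -> None:
        self.mgmt = ManagementRegister()

    def handle_first_config(self, host_number: int, host_password: int) -> bool:
        """First configuration message: try to become the management host."""
        if self.mgmt.signature == 1:
            return False                  # already managed; other hosts are refused
        self.mgmt = ManagementRegister(1, host_number, host_password)
        return True

    def handle_second_config(self, host_number: int, host_password: int) -> bool:
        """Second configuration message: release authority if credentials match."""
        m = self.mgmt
        if m.signature == 1 and (m.host_number, m.host_password) == (
            host_number,
            host_password,
        ):
            m.signature = 0               # the device no longer has a management host
            return True
        return False                      # wrong number or password: request ignored


dev = NodeDevice()
claimed = dev.handle_first_config(host_number=1, host_password=0xABCD)
second_claim = dev.handle_first_config(host_number=2, host_password=0x1234)
forged_release = dev.handle_second_config(host_number=1, host_password=0xFFFF)
real_release = dev.handle_second_config(host_number=1, host_password=0xABCD)
```

Only the host whose number and password match the stored values can clear the signature bit, after which another host may claim the device.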

In some possible implementations, the first master node is further configured to, after obtaining management authority over the first slave node, send an in-position message to the first slave node in response to a query message sent by the first slave node, or send the in-position message to the first slave node at a first time interval.

In the embodiments of the present application, an in-position (presence) confirmation process for the management host is also defined to ensure the robustness of the bus system. After the host obtains management authority over the node device, it can send an in-position message to the node device in response to a query message from the node device, or it can actively send an in-position message to the node device at a certain time interval. Either way, the node device is informed that its management host is currently operating normally, which ensures that the presence of the management host is known in real time. Moreover, the in-position confirmation exchanges between the host and the node device can generally be very infrequent, so their impact on bus bandwidth is almost negligible.

In some possible implementations, the first slave node is further configured to set the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the management authority of the first master node over the first slave node. The preset condition includes: after the first master node obtains management authority over the first slave node, the first slave node does not receive the in-position message sent by the first master node within a preset time; or, after the first slave node has sent a query message to the first master node K times, it has not received any in-position message sent by the first master node, where K is an integer greater than or equal to 1.

In the embodiments of this application, when the node device has not received an in-position message from its management host (for example, the first master node) for a long time, or has sent query messages to its management host multiple times without receiving a response, it considers the management host to be in an abnormal state and cancels that host's management authority so that a new management host can be obtained later, thereby ensuring reliable operation of the entire bus system.
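
The two preset conditions can be sketched as a small monitor kept by the node device. The class `PresenceMonitor`, its method names, and the use of plain floating-point timestamps are hypothetical; the application does not specify how time is measured or how queries are counted.

```python
from dataclasses import dataclass


@dataclass
class PresenceMonitor:
    timeout: float            # preset time allowed without an in-position message
    max_queries: int          # K: unanswered queries tolerated before giving up
    signature: int = 1        # assumed start: a management host has been set
    last_seen: float = 0.0    # time of the last in-position message
    unanswered: int = 0

    def on_in_position(self, now: float) -> None:
        """An in-position message arrived from the management host."""
        self.last_seen = now
        self.unanswered = 0

    def on_query_sent(self) -> None:
        """The node device sent one more query message."""
        self.unanswered += 1

    def check(self, now: float) -> int:
        """Clear the signature bit if either preset condition holds."""
        timed_out = now - self.last_seen > self.timeout
        ignored = self.unanswered >= self.max_queries
        if self.signature == 1 and (timed_out or ignored):
            self.signature = 0
        return self.signature


by_time = PresenceMonitor(timeout=10.0, max_queries=3)
by_time.on_in_position(now=5.0)
still_managed = by_time.check(now=12.0)     # 7 time units of silence: within limit
after_timeout = by_time.check(now=16.0)     # 11 time units of silence: too long

by_queries = PresenceMonitor(timeout=100.0, max_queries=2)
by_queries.on_query_sent()
by_queries.on_query_sent()
after_k_queries = by_queries.check(now=1.0)
```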

In some possible implementations, the first slave node is further configured to send a broadcast message to at least one master node in the system, the broadcast message indicating that the first slave node currently has no management master node. The at least one master node is configured to receive the broadcast message and, based on the broadcast message, send the first configuration message to the first slave node so as to obtain management authority over the first slave node.

In the embodiments of this application, when the node device determines that its current management host (such as the first master node) is abnormal and cancels that host's management authority, it can also send a broadcast message to multiple hosts in the bus system to notify them that management authority over the node device is available. The node device can thus acquire a new management host, which ensures reliable operation of the entire bus system.
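
A minimal sketch of this hand-over follows. The application does not say how competing hosts are arbitrated, so the sketch assumes that first configuration messages are processed in arrival order and that the first one wins; `Device`, `first_config`, and `broadcast_no_manager` are hypothetical names.

```python
from typing import List, Optional, Tuple


class Device:
    def __init__(self) -> None:
        self.signature = 0                          # 0: no management host
        self.manager: Optional[Tuple[int, int]] = None

    def first_config(self, host_number: int, password: int) -> bool:
        """Accept a first configuration message only while unmanaged."""
        if self.signature == 1:
            return False
        self.signature = 1
        self.manager = (host_number, password)
        return True


def broadcast_no_manager(device: Device, hosts: List[Tuple[int, int]]) -> Optional[int]:
    """Tell every host the device is unmanaged; return the winning host number.

    Assumption: hosts reply in list order, so the earliest reply wins.
    """
    winner: Optional[int] = None
    for host_number, password in hosts:
        if device.first_config(host_number, password):
            winner = host_number
    return winner


orphan = Device()
new_manager = broadcast_no_manager(orphan, [(2, 0x11), (3, 0x22)])
```

Because the signature bit is set by the first accepted message, later replies are refused and the device again has exactly one management host.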

In some possible implementations, the multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, wherein any fourth switch is adjacent to any third switch or is connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The second master node is configured to send an enumeration message to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node, the at least one switch being the switch or switches through which each enumeration message passes on its way from the second master node to the first slave node. Each of the multiple routing paths passes, in sequence, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.

In the embodiments of this application, a node device (such as the first slave node) can also be discovered by multiple different hosts. That is, other hosts (such as the second master node) can also be connected to the node device via multiple switches and can send enumeration messages to it multiple times, thereby recording multiple routing paths to the node device. This greatly expands the range of use of node devices and meets each host's need for a large number of node devices: any host can access node devices under other hosts and can therefore call on more computing resources to perform more complex computing. In addition, as mentioned above, although a node device can be used by multiple hosts, it can be managed by only one host. This single-host management mechanism keeps management unambiguous.

In some possible implementations, the multipath bus subsystem includes a first node domain and a second node domain. The first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The second master node is specifically configured to: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node.

In some possible implementations, the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain, and a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch. If some or all of the first slave nodes in the first node domain have not been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state. If all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends enumeration messages to the first slave node multiple times through the second port and the first port and, based on at least one switch, determines multiple routing paths between the second master node and the first slave node.

In the embodiments of this application, the bus system can be divided into multiple node domains. A host first completes the enumeration and discovery of node devices in its own node domain and can then enumerate and discover node devices in other node domains and record multiple routing paths to them. In short, a host's enumeration and discovery process starts from the node domain to which it belongs, and until the enumeration and discovery of node devices in the local node domain is complete, the local node domain may have no data-flow connection to node devices in any other node domain. This supports multi-host device enumeration and discovery across node domains (that is, across networks or across different system management software) and ensures that different enumeration software does not assign multiple device numbers to the same node device or assign the same device number to different node devices.
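
The gating of the cross-domain link can be sketched as a simple predicate. The representation of a node domain as a mapping from device name to assigned device number (or `None`), the device names, and the function names are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical view of a node domain: device name -> assigned number, or None
# if the domain's own master node has not numbered that device yet.
Domain = Dict[str, Optional[int]]


def domain_enumerated(domain: Domain) -> bool:
    """True once every slave node in the domain has a device number."""
    return all(number is not None for number in domain.values())


def cross_domain_link_open(first: Domain, second: Domain) -> bool:
    """The link between the two cross-domain switch ports opens only when
    both domains have finished their local enumeration."""
    return domain_enumerated(first) and domain_enumerated(second)


first_domain: Domain = {"gpu0": 1, "ssd0": None}   # ssd0 not yet numbered
second_domain: Domain = {"tpu0": 1}

link_before = cross_domain_link_open(first_domain, second_domain)
first_domain["ssd0"] = 2                           # local enumeration completes
link_after = cross_domain_link_open(first_domain, second_domain)
```

Only after `link_after` becomes true would the second master node begin sending enumeration messages across the two ports.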

In some possible implementations, the second master node is a remote master node connected to the first master node over a network via a switch. The second master node is specifically configured to access, through the network connection, the first slave node in the first node domain so as to invoke computing resources in the first slave node or read data stored in the first slave node.

In some possible implementations, the first master node is a central processing unit (CPU) in a first terminal. The first master node is further configured to invoke, through a network connection, computing resources of at least one second slave node or read data stored in the at least one second slave node. The second slave node is a graphics processing unit (GPU), a solid-state disk, an accelerator, a network card, or a tensor processing unit (TPU) in a second terminal. Optionally, the first terminal and the second terminal may be smartphones, tablet computers, desktop computers, computers, servers, and so on, which is not specifically limited in this embodiment of the present application.

In the embodiments of this application, based on the graph structure, a node device on the bus can be accessed and used by multiple hosts. Further, a node device can be accessed and used not only by nearby hosts connected by wire but also remotely by other hosts connected wirelessly, so that a host can call on remote computing resources and stored data. This goes further toward meeting a host's demand for large amounts of computing resources when performing complex computing. The network connection greatly enhances the scalability of the entire bus system and further overcomes the limitations of the existing PCIe bus connection.

In some possible implementations, the slave node is any one of a graphics processing unit, a solid-state disk, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), or a switch.

In this embodiment of the application, the node device may be a graphics processing unit, a solid-state disk, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), a switch, and so on, which is not specifically limited in this embodiment of the present application. Because the graph structure in the embodiments of the present application is more scalable and offers a larger design space, all of the above node devices can be added to the structure, so that a host on the bus can access the corresponding node devices according to actual needs and use their computing resources or data resources, thereby meeting increasingly complex and large computing needs.

In some possible implementation manners, the master node includes one or more central processing units (CPUs).

In the embodiments of the present application, the host may be a computing system with one or more central processing units. In some possible embodiments, the host may also include a main memory, a cache, an internal interconnection bus, an input/output (I/O) interface, and the like, which is not specifically limited in this embodiment of the present application. Because the graph structure in the embodiments of the present application is more scalable and offers a larger design space, the host can use computing resources (such as the computing units in a GPU) to perform computing processing, so as to meet increasingly complex and large computing needs.

In a second aspect, an embodiment of the present application provides a communication method applied to a bus system. The bus system is a graph structure formed by a plurality of master nodes, a plurality of switches, and a plurality of slave nodes connected through a bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches, and N1, N2, and N3 are all positive integers less than or equal to N. The method includes: sending, by the first master node, an enumeration message to the first slave node multiple times and, based on at least one switch, determining a plurality of routing paths between the first master node and the first slave node, the at least one switch being the switch or switches through which each enumeration message passes on its way from the first master node to the first slave node. Each routing path passes, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches, where S is a natural number less than or equal to N3.

In some possible implementations, the method further includes: querying, by the first master node and based on the sent enumeration message, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assigning a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration; and saving, by the first slave node, the device number assigned by the first master node into the routing status register and setting the visible bit to 1, where a visible bit of 1 indicates that the first slave node has been discovered by enumeration.

In some possible implementations, the method further includes: sending, by the first master node, a first configuration message to the first slave node so as to obtain management authority over the first slave node, the first configuration message carrying the master node password and the master node number of the first master node; receiving, by the first slave node, the first configuration message and, based on the first configuration message, setting the signature bit in the management master node information register of the first slave node to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes cannot obtain management authority over the first slave node; and saving, by the first slave node, the master node password and the master node number of the first master node in the management master node information register.

In some possible implementations, the method further includes: sending, by the first master node, a second configuration message to the first slave node so as to cancel its management authority over the first slave node, the second configuration message carrying the master node password and the master node number of the first master node; and receiving, by the first slave node, the second configuration message and, if the master node password and the master node number of the first master node carried in the second configuration message are consistent with the master node password and the master node number stored in the management master node information register, setting the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.

In some possible implementations, the method further includes: after the first master node obtains management authority over the first slave node, sending, by the first master node, a presence message to the first slave node in response to a query message sent by the first slave node; or sending, by the first master node, the presence message to the first slave node at a first time interval.

In some possible implementations, the method further includes: setting, by the first slave node, the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the management authority of the first master node over the first slave node. The preset condition includes: after the first master node obtains management authority over the first slave node, the first slave node does not receive a presence message from the first master node within a preset time; or the first slave node has sent a query message to the first master node K times without receiving a presence message from the first master node, where K is an integer greater than or equal to 1.
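The two preset conditions can be modeled on the slave side with a small sketch. Time is passed in explicitly so the example is deterministic. The class name, the method names, and the simplification that a query counts as unanswered as soon as it is recorded are assumptions for illustration, not details taken from the application.

```python
class SlavePresenceMonitor:
    """Hypothetical slave-side bookkeeping for the two preset conditions."""

    def __init__(self, timeout: float, max_queries: int, now: float) -> None:
        self.timeout = timeout            # the "preset time"
        self.max_queries = max_queries    # K, an integer >= 1
        self.signature = 1                # management authority has just been granted
        self.last_presence = now
        self.unanswered_queries = 0

    def on_presence_message(self, now: float) -> None:
        # A presence message from the management master resets both conditions.
        self.last_presence = now
        self.unanswered_queries = 0

    def on_query_unanswered(self) -> None:
        # Record one query message that received no presence message in reply.
        self.unanswered_queries += 1

    def check(self, now: float) -> int:
        """Clear the signature bit if either preset condition holds; return the bit."""
        timed_out = now - self.last_presence > self.timeout
        too_many_queries = self.unanswered_queries >= self.max_queries
        if timed_out or too_many_queries:
            self.signature = 0
        return self.signature


monitor = SlavePresenceMonitor(timeout=5.0, max_queries=3, now=0.0)
monitor.on_presence_message(now=4.0)
still_owned = monitor.check(now=8.0)   # 4.0 s since the last presence message
released = monitor.check(now=9.5)      # 5.5 s of silence exceeds the preset time

query_monitor = SlavePresenceMonitor(timeout=100.0, max_queries=2, now=0.0)
query_monitor.on_query_unanswered()
after_one = query_monitor.check(now=1.0)
query_monitor.on_query_unanswered()
after_two = query_monitor.check(now=2.0)  # K = 2 unanswered queries reached
```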

In some possible implementations, the method further includes: sending, by the first slave node, a broadcast message to at least one master node in the system, where the broadcast message indicates that the first slave node currently has no management master node; and receiving, by the at least one master node, the broadcast message and, based on the broadcast message, sending the first configuration message to the first slave node so as to obtain management authority over the first slave node.
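A possible reading of this broadcast flow is sketched below: a slave node without a management master node notifies the master nodes, each of which answers with a first configuration message, and the first one to arrive takes ownership. The class names, the sequential delivery order, and the direct method calls standing in for bus messages are all assumptions for illustration.

```python
class Slave:
    def __init__(self) -> None:
        self.signature = 0     # 0: no management master node
        self.manager = None    # (master number, master password) once owned

    def broadcast_no_manager(self, masters) -> None:
        """Tell every reachable master that this slave has no management master node."""
        for master in masters:  # assumed delivery order: list order
            master.on_broadcast(self)

    def on_first_config(self, master_number: int, master_password: int) -> bool:
        if self.signature == 1:
            return False       # another master already obtained authority
        self.signature = 1
        self.manager = (master_number, master_password)
        return True


class Master:
    def __init__(self, number: int, password: int) -> None:
        self.number = number
        self.password = password
        self.managed = []

    def on_broadcast(self, slave: Slave) -> None:
        # React to the broadcast by sending the first configuration message.
        if slave.on_first_config(self.number, self.password):
            self.managed.append(slave)


masters = [Master(1, 0x11), Master(2, 0x22)]
slave = Slave()
slave.broadcast_no_manager(masters)
```

In this run master 1 receives the broadcast first and becomes the management master node, so master 2's configuration message is rejected.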

In some possible implementations, the multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, where any fourth switch is adjacent to any third switch or is connected to any third switch through one or more second switches, and N4 is a positive integer less than or equal to N. The method further includes: sending, by the second master node, an enumeration message to the first slave node multiple times, and determining multiple routing paths between the second master node and the first slave node based on the at least one switch through which each enumeration message passes from the second master node to the first slave node, where each of the multiple routing paths passes in sequence through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.

In some possible implementations, the multipath bus subsystem includes a first node domain and a second node domain, where the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The sending, by the second master node, of an enumeration message to the first slave node multiple times and the determining, based on at least one switch, of multiple routing paths between the second master node and the first slave node include: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending, by the second master node, the enumeration message to the first slave node multiple times, and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node.
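The ordering constraint in this implementation (cross-domain enumeration starts only after every node domain has finished its own enumeration) can be expressed as a simple gate. The sketch below assumes that a visible bit of 1 is a usable proxy for "a device number has been assigned"; the slave names and the dictionary representation of a node domain are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-domain view: slave name -> visible bit (1 once a number is assigned).
NodeDomain = Dict[str, int]


def domain_fully_enumerated(domain: NodeDomain) -> bool:
    """True when every slave node in the domain has been assigned a device number."""
    return all(visible == 1 for visible in domain.values())


def may_enumerate_across_domains(domains: List[NodeDomain]) -> bool:
    """Cross-domain enumeration is allowed only after every domain is done."""
    return all(domain_fully_enumerated(d) for d in domains)


first_domain = {"slave200a": 1, "slave200b": 1}
second_domain = {"slave200d": 1, "slave200e": 0}  # one slave still unassigned

blocked = may_enumerate_across_domains([first_domain, second_domain])
second_domain["slave200e"] = 1
allowed = may_enumerate_across_domains([first_domain, second_domain])
```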

In some possible implementations, the second master node is a remote master node connected to the first master node over a network through a switch, and the method further includes: accessing, by the second master node, the first slave node in the first node domain so as to invoke computing resources in the first slave node or read data stored in the first slave node.

In a third aspect, an embodiment of the present application provides a master node. The master node includes a processor configured to support the master node in performing the corresponding functions in any one of the communication methods provided in the second aspect. The master node may also include a memory, which is coupled to the processor and stores the program instructions and data necessary for the master node. The master node may also include a communication interface through which the master node communicates with other devices or a communication network.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the flow of the communication method described in any one of the implementations of the second aspect is implemented.

In a fifth aspect, an embodiment of the present application provides a computer program. The computer program includes instructions, and when the computer program is executed by a computer, the computer can execute the flow of the communication method described in any one of the implementations of the second aspect.

In a sixth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the processor is configured to call and run instructions from the communication interface, and when the processor executes the instructions, the chip executes the flow of the communication method described in the second aspect.

In a seventh aspect, an embodiment of the present application provides a chip system. The chip system includes the bus system described in any one of the implementations of the first aspect and is used to implement the functions involved in the flow of the communication method described in any one of the implementations of the second aspect. In a possible design, the chip system further includes a memory used for storing the program instructions and data necessary for the communication method. The chip system may consist of chips, or may include chips and other discrete devices.

Description of drawings

FIG. 1 is a schematic diagram of an interconnection structure based on CXL2.0.

FIG. 2 is a schematic structural diagram of a bus system provided by an embodiment of the present application.

FIG. 3 is a schematic structural diagram of a multipath bus subsystem provided by an embodiment of the present application.

FIG. 4a is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.

FIG. 4b is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.

FIG. 4c is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.

FIG. 4d is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of another bus system provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of a management host signature flow provided by an embodiment of the present application.

FIG. 7 is a schematic flow diagram of a host canceling management authority provided by an embodiment of the present application.

FIG. 8a is a schematic diagram of a management host presence confirmation process provided by an embodiment of the present application.

FIG. 8b is a schematic diagram of another management host presence confirmation process provided by an embodiment of the present application.

FIG. 9 is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.

FIGS. 10a-10c are a group of schematic structural diagrams of multipath, multi-master bus systems provided by an embodiment of the present application.

FIGS. 11a-11b are a group of schematic diagrams of an enumeration process with multiple paths and multiple hosts provided by an embodiment of the present application.

FIG. 12 is a schematic flowchart of a communication method provided by an embodiment of the present application.

Detailed description

The embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

The terms "first" and "second" in the description, claims, and drawings of the present application are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes other steps or units inherent in such a process, method, product, or device. It should be noted that when an element is referred to as being "coupled" or "connected" to one or more other elements, it may be directly connected to the other element or elements, or indirectly connected to the other element or elements.

It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein may be combined with other embodiments.

The terms "component", "module", "system", and the like are used in this specification to refer to a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a processor and the processor may be components. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. A component may communicate through local and/or remote processes, for example based on a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).

First of all, some terms used in this application are explained to facilitate the understanding of those skilled in the art.

(1) Topology. The topology of a computer network borrows from topology the method of studying relationships between points and lines independently of size and shape: the computers and communication devices in the network are abstracted as points, the transmission media are abstracted as lines, and the geometry formed by these points and lines is the topology of the computer network. The main topologies of computer networks include bus, ring, tree, star, hybrid, and mesh topologies. For example, PCI adopts a bus topology and PCIe adopts a tree topology. Furthermore, a host in the topology can use an algorithm to search the entire network topology and to enumerate and discover all devices connected in the network. Enumeration discovery simply means traversing the devices physically connected under each port, reading each device's configuration space, assigning a corresponding device number to the device, and recording the routing path along which the device was enumerated. In the end, the host knows the number of devices in the entire topology, the connection relationships between the devices, and so on. Evidently, if a device number has been allocated to a device, this may indicate that the device has been discovered by enumeration. In some possible implementations of the present application, the topology may be a flattened (flat) graph structure, in which there may be multiple paths from a host on the bus (that is, a master node) to a node device (that is, a slave node), and a node device may also be accessed by multiple hosts on the bus; this is not described in detail here.

(2) Switch. A switch is used to implement bus interconnection and routing and includes multiple ports, each of which may correspond to a physical link that connects to a master node, a slave node, or another switch in the bus system. In the embodiments of the present application, multiple switches are connected to one another to construct multiple physical links between a master node and a slave node, thereby realizing multipath access from the master node to the slave node. In some possible embodiments, some ports of a switch may also provide network card functions for remote network connection, and some special ports may be used to control the enumeration and discovery process of a master node within the node domain where it is located, thereby avoiding interference between the enumeration processes of multiple master nodes; this is not elaborated here.

(3) Graph structure. A graph structure is a nonlinear structure that is more complex than a tree structure. In a tree structure, nodes have a branching hierarchical relationship: a node on a given layer can be related to only one node on the previous layer, but may be related to multiple nodes on the next layer. In a graph structure, any node may have one or more predecessors and successors, which breaks the strict top-down hierarchy of the tree structure. In other words, any two nodes in a graph structure may be related, so the adjacency relationships between nodes can be arbitrary.

To facilitate understanding of the embodiments of the present application, the technical problems to be solved by the present application are first analyzed in more detail. The prior art includes various bus interconnection schemes; the relatively common CXL2.0 scheme is described below as an example.

Please refer to FIG. 1, which is a schematic diagram of a CXL2.0-based interconnection structure. As shown in FIG. 1, the structure may include a CXL switch, a bus manager (fabric manager, FM), multiple hosts, and multiple devices; specifically, it may include host 0 and host 1, as well as type 3 device 0, type 3 device 1, and type 3 device 2. As shown in FIG. 1, the hosts and devices must participate in the interconnection through the CXL switch; that is, a host cannot be directly connected to a device but is indirectly connected to it through the CXL switch, forming a top-down PCIe tree structure.

Further, as shown in FIG. 1, the CXL switch includes a bus manager endpoint device (fabric manager endpoint, FM EP) connected to the FM and two virtual CXL switches (VCS), namely VCS-0 and VCS-1. Each VCS may include multiple virtual PCI-to-PCI bridges (vPPB); as shown in FIG. 1, VCS-0 may include vPPB-01, vPPB-02, and vPPB-03, and VCS-1 may include vPPB-11, vPPB-12, and vPPB-13. In addition, the CXL switch also includes multiple physical PCI-to-PCI bridges (PPB), such as PPB-0, PPB-1, and PPB-2.

It should be noted that in the CXL2.0 solution, the CXL switch first needs to be initialized by the FM, and the downstream ports (DP) of the CXL switch are not bound to a virtual CXL switch but belong only to the FM. The FM can initialize the CXL switch through vendor-defined mechanisms and bind each vPPB of the CXL switch to a physical PPB in advance. Multiple vPPBs can be bound to the same PPB; as shown in FIG. 1, both vPPB-03 and vPPB-12 can be bound to PPB-1.

After the CXL switch is initialized, host 0 and host 1 can, following the standard PCIe enumeration process, enumerate the CXL switch and the devices connected behind it (such as type 3 device 0, type 3 device 1, and type 3 device 2) and configure the switch's address window, bus number window, and so on. Therefore, each of host 0 and host 1 sees a complete PCIe tree of its own. The CXL switch must store the routing configuration established during these enumeration processes so that subsequent downstream data flows can be accurately routed to the actual physical downstream ports; similarly, upstream data flows must be routed correctly to the real destination host. The design of a CXL switch must therefore comply with these requirements of the CXL protocol, which makes a CXL switch more complex and more expensive than an ordinary switch.

In summary, the CXL2.0 solution shown in Figure 1 has the following disadvantages:

(1) A specially designed CXL switch must be used as an intermediate interconnection device between hosts and devices. Such a switch increases interconnection latency and, to some extent, restricts the topology of the entire interconnection, adding complexity and cost to the interconnection design.

(2) Node devices still need to be designed with special functions, such as type 3 devices that support the multi-logical device (MLD) function.

(3) The initialization process of the entire interconnection topology is subject to relatively strict requirements: the bus manager must take part in initializing the interconnection in advance (for example, initializing the CXL switch and the type 3 devices), and devices must recognize these special management messages, so the process is complex and cumbersome.

Therefore, to address the fact that current bus interconnection technology does not meet actual needs, the technical problem to be solved by this application includes the following: breaking the top-down limitation of the existing PCIe tree structure and realizing, on the basis of existing conventional switches, an interconnection topology with a graph structure in which a node can be reached over multiple paths and accessed by multiple hosts, so that the interconnection bus protocol natively supports multiple paths and multiple hosts at low cost.

Please refer to FIG. 2, which is a schematic structural diagram of a bus system provided by an embodiment of the present application. The technical solutions of the embodiments of the present application may be implemented in the system structure shown in FIG. 2 or in a similar system structure. As shown in FIG. 2, the bus system 10 may include multiple master nodes, multiple switches, and multiple slave nodes; specifically, it may include a master node 100a, a master node 100b, a master node 100c, and so on; a switch 300a, a switch 300b, a switch 300c, and so on; and a slave node 200a, a slave node 200b, and a slave node 200c. The master nodes, switches, and slave nodes may be connected by a bus (for example, a network on chip, or any other possible bus, such as an AMBA bus) to form a graph structure; in other words, the topology of the computer network composed of the master nodes, switches, and slave nodes is a graph structure rather than a conventional tree structure. It should be noted that any master node, switch, or slave node can serve as a node in the graph structure, and any two nodes in the graph structure may be related. For example, the master node 100a may be adjacent to the switch 300a and the switch 300b respectively (adjacent meaning directly connected, that is, there is no other device on the physical link between the master node 100a and the switch 300a); the switch 300a, the switch 300b, and the switch 300c may be adjacent to one another; and the switch 300a may also be adjacent to the slave node 200a and the slave node 200b. Moreover, when the slave node 200a has multiple ports, the slave node 200a may also be adjacent to the master node 100a, the master node 100c, and so on. The embodiments of the present application do not specifically limit this.
It should be understood that, in view of the growing computing demands of application scenarios such as AI and autonomous driving, the embodiments of the present application break through the limitation of the traditional tree structure in the existing PCIe bus interconnection topology and adopt a graph structure. The connections between master nodes, switches, and slave nodes are more flexible and the entire topology is more scalable, so that a large number of dedicated computing devices (such as GPUs and TPUs) can be continuously added to the graph structure of the bus as slave nodes. Any master node in the graph structure can access and use any slave node in the graph structure through the multiple paths formed by the connected switches, which maximizes the master node's use of the various computing and data resources.

To sum up, the master node 100a, the master node 100b, and the master node 100c may each include one or more CPUs and, optionally, a main memory, a cache, an internal interconnection bus, an IO interface, and so on; the embodiments of the present application do not specifically limit this. Optionally, a master node can also be regarded as a computing system having the above components. The switch 300a, the switch 300b, and the switch 300c may each include multiple ports, and in other possible embodiments a switch may also provide corresponding congestion control and quality of service (QoS) functions. The slave node 200a, the slave node 200b, the slave node 200c, and so on may be a general-purpose GPU, a TPU, or another processing unit (XPU); they may also be a storage device such as a solid state drive (SSD), an accelerator with a specific computing function, a smart network card, or even a switch (such as a network switch). The embodiments of the present application do not specifically limit this.

Further, please refer to FIG. 3, which is a schematic structural diagram of a multipath bus subsystem provided by an embodiment of the present application. The bus system 10 with the above graph structure may include one or more multipath bus subsystems, and any one of the multipath bus subsystems may include a master node, a slave node, and switches. For example, as shown in FIG. 3, the multipath bus subsystem 10a may include a master node 100a (that is, the first master node), a slave node 200a (that is, the first slave node), and N switches, where N is an integer greater than or equal to 1. The N switches may include N1 first switches adjacent to the first master node 100a (such as the first switch 11 and the first switch 12 in FIG. 3), N3 third switches adjacent to the first slave node 200a (such as the third switch 31 and the third switch 32 in FIG. 3), and N2 second switches (such as the second switch 21, the second switch 22, and the second switch 23 in FIG. 3).

Any one of the first switches may be adjacent to any one of the third switches, or may be connected to it through one or more of the N2 second switches. For example, the first switch 11 may be adjacent to the third switch 31. As another example, the first switch 11 may be connected to the third switch 31 through the second switch 21, in which case the second switch 21 is adjacent to the first switch 11 and to the third switch 31. As yet another example, the first switch 11 may be connected to the third switch 31 through the second switch 22 and the second switch 23 in turn, in which case the second switch 22 is adjacent to the first switch 11 and to the second switch 23, and the second switch 23 is also adjacent to the third switch 31. The embodiments of the present application do not specifically limit this. N1, N2, and N3 may all be positive integers less than or equal to N.

Specifically, the master node 100a may send an enumeration message to the slave node 200a multiple times and, based on the switches that each enumeration message passes through on its way from the master node 100a to the slave node 200a, determine multiple routing paths between the master node 100a and the slave node 200a. Each routing path may pass in sequence through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches. In this way, multipath access to a slave node is realized; that is, a master node can access the same slave node through multiple routing paths. S is a natural number less than or equal to N2; that is, S may be equal to 0, in which case the routing path between the master node 100a and the slave node 200a passes only through first and third switches. In some possible embodiments, the routing path between the master node 100a and the slave node 200a may pass only through one or more of the N1 first switches, or only through one or more of the N3 third switches, and so on; the embodiments of the present application do not specifically limit this.
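The roles of first, second, and third switches, and the value of S for a given routing path, can be illustrated with a short sketch. The adjacency below is an assumed example that combines the three connection cases described for the first switch 11 and the third switch 31 (directly adjacent, through the second switch 21, and through the second switches 22 and 23 in turn). The function names and node labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_switches(
    adjacency: Dict[str, Set[str]], master: str, slave: str
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split switches into first (adjacent to the master), third (adjacent to the
    slave), and second (adjacent to neither). A switch may be both first and third."""
    switches = set(adjacency) - {master, slave}
    first = {s for s in switches if master in adjacency[s]}
    third = {s for s in switches if slave in adjacency[s]}
    second = switches - first - third
    return first, second, third


def second_switch_count(path: List[str], second: Set[str]) -> int:
    """S for one routing path: the number of second switches it traverses."""
    return sum(1 for hop in path if hop in second)


# Assumed example topology (not a figure from the application).
ADJACENCY = {
    "master": {"sw11"},
    "sw11": {"master", "sw21", "sw22", "sw31"},
    "sw21": {"sw11", "sw31"},
    "sw22": {"sw11", "sw23"},
    "sw23": {"sw22", "sw31"},
    "sw31": {"sw11", "sw21", "sw23", "slave"},
    "slave": {"sw31"},
}

first, second, third = classify_switches(ADJACENCY, "master", "slave")
s_values = [
    second_switch_count(["master", "sw11", "sw31", "slave"], second),
    second_switch_count(["master", "sw11", "sw21", "sw31", "slave"], second),
    second_switch_count(["master", "sw11", "sw22", "sw23", "sw31", "slave"], second),
]
```

For these three paths S takes the values 0, 1, and 2, matching the cases in which the first switch is adjacent to the third switch, connected through one second switch, or connected through two second switches.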

It should be noted that the embodiments of the present application aim to implement multipath access to slave nodes based on the graph structure and do not limit the specific connections between master nodes, slave nodes, and switches. The technical solutions provided by the embodiments of the present application are described in detail below through several possible connection examples. The connections in this application may include, but are not limited to, the following examples.

Optionally, please refer to FIG. 4a, which is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 4a, the multipath bus subsystem 10a may specifically include a master node 100a, a slave node 200a, a first switch 11, a first switch 12, a second switch 21, a third switch 31, and a third switch 32. The first switch 11 and the first switch 12 are adjacent to the master node 100a; the third switch 31 and the third switch 32 are adjacent to the slave node 200a; the first switch 11 and the third switch 32 are connected through the second switch 21; the first switch 11 is also adjacent to the third switch 31; and the third switch 31 is also adjacent to the master node 100a.

Based on this, after sending enumeration messages multiple times from its own ports through enumeration software, the master node 100a can discover and record multiple routing paths between itself and the slave node 200a. Subsequently, the master node 100a can access the slave node 200a through any one of the routing paths to use its computing resources, manage its functional configuration, and so on. As shown in FIG. 4a, the multiple routing paths may include: (1) master node 100a → first switch 11 → second switch 21 → third switch 32 → slave node 200a; (2) master node 100a → first switch 11 → third switch 31 → slave node 200a; (3) master node 100a → third switch 31 → slave node 200a, in which case the third switch 31 can also be regarded as a first switch as described above; (4) master node 100a → first switch 12 → third switch 32 → slave node 200a.

Optionally, please refer to FIG. 4b, which is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 4b, the multipath bus subsystem 10a may specifically include a master node 100a, a slave node 200a, a first switch 11, a first switch 12, a third switch 31, and a third switch 32. The first switch 11 and the first switch 12 are adjacent to the master node 100a; the third switch 31 and the third switch 32 are adjacent to the slave node 200a; the first switch 11 is also adjacent to the first switch 12 and to the third switch 31; and the third switch 32 is also adjacent to the first switch 12 and to the third switch 31.

Based on this, after sending enumeration messages multiple times from its own ports through enumeration software, the master node 100a can discover and record multiple routing paths between itself and the slave node 200a, which may include: (1) master node 100a → first switch 11 → third switch 31 → slave node 200a; (2) master node 100a → first switch 11 → third switch 31 → third switch 32 → slave node 200a; (3) master node 100a → first switch 11 → first switch 12 → third switch 32 → slave node 200a; (4) master node 100a → first switch 11 → first switch 12 → third switch 32 → third switch 31 → slave node 200a; (5) master node 100a → first switch 12 → third switch 32 → slave node 200a; (6) master node 100a → first switch 12 → third switch 32 → third switch 31 → slave node 200a; (7) master node 100a → first switch 12 → first switch 11 → third switch 31 → slave node 200a; (8) master node 100a → first switch 12 → first switch 11 → third switch 31 → third switch 32 → slave node 200a.
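The eight routing paths listed for FIG. 4b can be reproduced with a simple loop-free path search. The sketch below is only an offline illustration using depth-first search over an adjacency table read from the path list above; the application itself discovers paths by sending enumeration messages, and the node labels and function name here are hypothetical.

```python
from typing import Dict, List

# Adjacency of the FIG. 4b example as read from the listed paths; labels are illustrative.
TOPOLOGY: Dict[str, List[str]] = {
    "master100a": ["sw11", "sw12"],
    "sw11": ["master100a", "sw12", "sw31"],
    "sw12": ["master100a", "sw11", "sw32"],
    "sw31": ["sw11", "sw32", "slave200a"],
    "sw32": ["sw12", "sw31", "slave200a"],
    "slave200a": ["sw31", "sw32"],
}


def find_paths(graph: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Return every loop-free path from src to dst using depth-first search."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == dst:
            paths.append(path)
            return
        for neighbor in graph[node]:
            if neighbor not in path:  # never revisit a node on the current path
                walk(neighbor, path + [neighbor])

    walk(src, [src])
    return paths


paths = find_paths(TOPOLOGY, "master100a", "slave200a")
for p in paths:
    print(" -> ".join(p))
```

Forbidding revisits on the current path is what keeps the search finite in a graph with cycles, such as the ring formed by the four switches here.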

Optionally, please refer to FIG. 4c, which is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 4c, the multipath bus subsystem 10a may specifically include a master node 100a, a slave node 200a, a first switch 11, and a first switch 12. The first switch 11 and the first switch 12 are each adjacent to both the master node 100a and the slave node 200a, and the first switch 11 is also adjacent to the first switch 12.

Based on this, after sending enumeration messages multiple times from its own ports through enumeration software, the master node 100a can discover and record multiple routing paths between itself and the slave node 200a, which may include: (1) master node 100a → first switch 11 → slave node 200a; (2) master node 100a → first switch 11 → first switch 12 → slave node 200a; (3) master node 100a → first switch 12 → slave node 200a; (4) master node 100a → first switch 12 → first switch 11 → slave node 200a.

Optionally, please refer to FIG. 4d, which is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 4d, the multipath bus subsystem 10a may specifically include a master node 100a, a slave node 200a, and a first switch 11. The first switch 11 is adjacent to both the master node 100a and the slave node 200a, and the slave node 200a is also directly adjacent to the master node 100a.

Based on this, after sending the enumeration message multiple times from its own port through the enumeration software, the master node 100a can discover and record multiple routing paths between itself and the slave node 200a, which may include: (1) master node 100a → first switch 11 → slave node 200a; (2) master node 100a → slave node 200a.
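As an illustration of the path lists above, the following minimal sketch lists every loop-free path through the FIG. 4c and FIG. 4d topologies. It is not the enumeration procedure of the embodiments, which discovers paths by sending enumeration messages; here a depth-first search over a known adjacency list is used only to check the counts. The function name `simple_paths` and the node labels are assumptions made for this example.

```python
def simple_paths(links, src, dst, path=None):
    """Yield every loop-free path from src to dst by depth-first search."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in links.get(src, []):
        if nxt not in path:  # never revisit a node already on this path
            yield from simple_paths(links, nxt, dst, path)


# Hypothetical adjacency lists for FIG. 4c and FIG. 4d (labels are illustrative).
FIG_4C = {
    "master 100a": ["switch 11", "switch 12"],
    "switch 11": ["slave 200a", "switch 12"],
    "switch 12": ["slave 200a", "switch 11"],
}
FIG_4D = {
    "master 100a": ["switch 11", "slave 200a"],
    "switch 11": ["slave 200a"],
}

paths_4c = list(simple_paths(FIG_4C, "master 100a", "slave 200a"))
paths_4d = list(simple_paths(FIG_4D, "master 100a", "slave 200a"))
```

With these adjacency lists, `paths_4c` holds the four paths listed for FIG. 4c and `paths_4d` the two paths listed for FIG. 4d.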

In summary, it should be noted that "first", "second" and "third" in the description of the switches in the embodiments of the present application do not refer to particular switches, but are used to describe the connection status between a switch and the master node and the slave node. For example, as shown in FIG. 4a, the third switch 31 is adjacent to both the master node 100a and the slave node 200a, so based on the foregoing discussion it can also be referred to as a first switch; for another example, as shown in FIG. 4c, the first switch 11 and the first switch 12 are each adjacent to both the master node 100a and the slave node 200a, so based on the foregoing discussion they can also be called third switches, and so on. In general, because a switch has many ports, its adjacency relationships in the graph structure can be relatively arbitrary and complicated. For example, other ports of the second switch 21 may also be adjacent to the slave node 200a and the like, which is not specifically limited in this embodiment of the present application.

Further, based on the above descriptions of various possible multipath bus subsystems, the enumeration discovery process under multipath will be described in detail below.

Optionally, based on the enumeration message sent to the slave node 200a, the master node 100a can query the visible bit (visited bit) in the routing status register of the slave node 200a. If the visible bit is 0, the master node 100a can assign a corresponding device number to the slave node 200a; here, a visible bit of 0 indicates that the slave node 200a has not yet been enumerated and discovered.

Correspondingly, the slave node 200a can save the device number assigned by the master node 100a into the routing status register and set the visible bit to 1; here, a visible bit of 1 indicates that the slave node 200a has been enumerated and discovered. In other words, when the master node 100a enumerates the slave node 200a via a certain routing path for the first time, the visible bit that was originally 0 is set to 1 and a device number is assigned to the slave node. After the master node 100a subsequently sends an enumeration message to the slave node 200a via other routing paths, it can determine from the visible bit being 1 that the slave node 200a has already been enumerated and discovered, so there is no need to assign a device number again; it only needs to record the new routing path over which this enumeration message was sent. In this way, enumeration under multiple paths is realized by recording the historical discovery state, which avoids repeated enumeration and infinite loops under multiple paths, so that enumeration completes efficiently and accurately.

Optionally, corresponding enumeration software may be installed and run on the master node 100a, and the enumeration software may be used to enumerate and discover devices in the entire interconnection bus topology. In addition, each slave node in the embodiments of the present application has its own configuration space, like a conventional PCIe device, allowing software to access the configuration space to obtain device-related declaration information, manage the device, and so on. For example, please refer to FIG. 5, which is a schematic structural diagram of another bus system provided by an embodiment of the present application. As shown in FIG. 5, the bus system includes a master node 100a, a switch A, a switch B, a device C and a device D, where device C and device D can be the above-mentioned slave nodes 200a, 200b or 200c, etc.; specifically, each can be a GPU, TPU, SSD, accelerator, etc. As shown in FIG. 5, the master node 100a includes two ports: port A1 is connected to switch A, and port A2 is connected to switch B. Switch A is connected to switch B and device C, and switch B is also connected to device C and device D. Evidently, the bus system shown in FIG. 5 may include two multipath bus subsystems: one including master node 100a, switch A, switch B and device C, and one including master node 100a, switch A, switch B and device D, which will not be described in detail here.

Taking Figure 5 as an example below, the enumeration discovery process under multipath is further elaborated in detail. The enumeration discovery process may specifically include the following steps:

Step 1. The enumeration software running on the master node 100a can start from port A1 of the master node 100a and search according to the breadth-first search algorithm for graphs. Optionally, the enumeration software can check whether the physical link corresponding to the current port A1 has been established and whether data transmission can be performed. If the enumeration software determines that the physical link corresponding to port A1 has been established and data transmission can be performed, an enumeration message (also called an enumeration access) can be sent from port A1 to the device directly connected to port A1 over the physical link, which is switch A in FIG. 5.

Optionally, before the enumeration message is sent, the basic input/output system (BIOS) can report the local bus interface information to the enumeration software in the master node 100a through a specific interface based on an agreed description, for example, how many local bus ports there are, their corresponding access addresses, and other information such as how many ports switch A, switch B, device C and device D in FIG. 5 each have.

Step 2. Switch A receives the enumeration report message sent by the master node 100a, and returns a response to the enumeration report message. Based on the response, the enumeration software can determine that switch A is a legally existing device and is a switch device.

Step 3. The enumeration software performs the management host signature (that is, the management master node signature) on switch A according to the management host signature process (that is, the management master node signature process). Optionally, as shown in FIG. 5, the signature process may include setting the signature bit in the management master node information register (that is, the management host information register) in configuration space A of switch A to 1, and writing the master node number of the master node 100a (that is, the management host number (component identity document, CID) in FIG. 5) and the master node password used for this signature (that is, the host key) into the management master node information register. The master node 100a thereby obtains the management authority of switch A, that is, the master node 100a is determined to be the management master node of switch A. Optionally, for Step 3, reference may be made to the description in the following embodiment corresponding to FIG. 6, and details are not repeated here.

Step 4. The enumeration software continues to send an enumeration message through port A1 to read the routing status register in configuration space A of switch A, and checks whether the visible bit of the routing status register of switch A is 0.

Step 5. Switch A returns that the visible bit of its routing status register is 0, so the enumeration software determines that switch A has not yet been enumerated and discovered. As shown in FIG. 5, the enumeration software can then assign the device number cid1 to switch A; switch A can write the device number (that is, cid1) directly into the corresponding CID value bit field of the routing status register, and at the same time set the visible bit, which was originally 0, to 1.

Step 6. As mentioned above, since the enumeration software knows that switch A is a switch device, it can also read the relevant port status registers of switch A to learn on which ports of switch A a physical link has been established.

Step 7. The enumeration software further sends an enumeration message to device C from a port of switch A on which a physical link has been established, and confirms from the response returned by device C that device C is a legally existing device.

Step 8. Via the path port A1→switch A→device C, and with reference to the process of Step 3 to Step 5, the enumeration software performs the management master node signature on device C and assigns device C a device number that has not been assigned to any other device (cid2 is taken as an example in FIG. 5).

Step 9. Via the path port A1→switch A→switch B, and with reference to the process of Step 7 to Step 8, the enumeration software completes the enumeration and discovery of switch B, performs the management master node signature on switch B, and assigns switch B a corresponding device number (cid3 is taken as an example in FIG. 5).

Step 10. Since switch B is also a switch device, the enumeration software, with reference to the process of Step 6 to Step 8, accesses device C via the path port A1→switch A→switch B→device C. At this point, the enumeration software finds that the visible bit of the routing status register of device C has already been set (that is, set to 1) and that the management master node signature of device C has been completed (that is, the signature bit has been set to 1). The enumeration software therefore does not assign a device number to device C again, and only records the new routing path between the master node 100a and device C.

Step 11. With reference to the process of Step 7 to Step 8, the enumeration software completes the enumeration and discovery of device D via the path port A1→switch A→switch B→device D, performs the management master node signature on device D, and assigns device D a corresponding device number (cid4 is taken as an example in FIG. 5).

Step 12. The enumeration software starts from port A2 of the master node 100a and, via the paths port A2→switch B, port A2→switch B→device C, port A2→switch B→device D, port A2→switch B→switch A, and port A2→switch B→switch A→device C, enumerates switch B, device C, device D, switch A, and device C respectively. Based on the above steps, the enumeration software has already completed the enumeration and discovery of these devices through port A1. Therefore, these devices are not enumerated and discovered again in this pass, that is, no device number is assigned to them a second time, and only the new routing paths between the master node 100a and switch A, switch B, device C, and device D are recorded.

Step 13. At this point, the master node 100a has completed the discovery and enumeration of all devices and of the topology in the bus system shown in FIG. 5 through the corresponding enumeration software.
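The steps above can be summarized in a minimal sketch, assuming the FIG. 5 topology is already known as a table of links. Real enumeration software would instead read port status and routing status registers over the bus. The names `MASTER_PORTS`, `SWITCH_LINKS` and `enumerate_topology` are hypothetical, and signing (Step 3) is omitted. A device number is assigned only on first discovery, while every path reaching a device is recorded.

```python
from collections import deque

# Hypothetical model of FIG. 5: master ports and, per switch, its linked-up neighbors.
MASTER_PORTS = {"A1": "switch A", "A2": "switch B"}
SWITCH_LINKS = {
    "switch A": ["device C", "switch B"],
    "switch B": ["device C", "device D", "switch A"],
}


def enumerate_topology(master_ports, switch_links):
    cids = {}   # device -> device number; presence models "visible bit == 1"
    paths = {}  # device -> list of routing paths recorded so far
    for port, first_hop in master_ports.items():
        queue = deque([(port, first_hop)])          # breadth-first from this port
        while queue:
            path = queue.popleft()
            dev = path[-1]
            if dev not in cids:                     # visible bit is 0: first discovery
                cids[dev] = f"cid{len(cids) + 1}"
            paths.setdefault(dev, []).append(path)  # always record the routing path
            for nxt in switch_links.get(dev, []):   # only switches forward further
                if nxt not in path:                 # never loop back along a path
                    queue.append(path + (nxt,))
    return cids, paths


cids, paths = enumerate_topology(MASTER_PORTS, SWITCH_LINKS)
```

With this link order the sketch assigns cid1 to switch A, cid2 to device C, cid3 to switch B and cid4 to device D, and records four paths to device C and two to device D, in line with Step 1 to Step 13.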

As mentioned above, when the master node enumerates and discovers a device (which may be a switch or a slave node), it can perform the management host signature on the device to obtain management authority over it. The management authority includes but is not limited to arbitrating resource competition among devices, handling device exceptions, managing basic characteristics of devices (such as the maximum supported packet length), and managing device functions, for example, whether a certain function (such as a function related to a physical link) is allowed to be used, and so on, which is not specifically limited in this embodiment of the present application.

Optionally, please refer to FIG. 6 . FIG. 6 is a schematic diagram of a management host signature flow provided by an embodiment of the present application. As shown in FIG. 6, the management host signing process may include the following steps.

S11. The master node sends a configuration message to the device, trying to obtain the management authority of the device. The master node is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a, or switch A, device C, etc. in FIG. 5. The configuration message (for example, the first configuration message) is intended to set the signature bit in the management master node information register of the configuration space to 1, and can also carry the master node number and master node password of the master node. A signature bit of 1 indicates that the device has already completed the signature, that is, it already has a management master node, and other master nodes can no longer obtain the management authority of the device. This ensures that the device has a unique management master node, which keeps the manageability of the entire topology clear. It should be noted that one master node can obtain management authority over multiple devices. In addition, when the master node obtains management authority over different devices, the master node passwords carried in the configuration messages it sends can be different.

S12. The device receives the configuration message sent by the master node, and determines that the configuration message is used to write the signature bit in its management master node information register to be 1.

S13. The device checks whether the signature bit in the management master node information register is 0; if yes, S14 is executed, otherwise S15 is executed. A signature bit of 0 indicates that the device has not yet completed the signature, that is, it does not have a management master node, and the master nodes in the system have the opportunity to obtain the management authority of the device.

S14. The device sets the signature bit in the management master node information register to 1, and saves the master node password and master node number carried in the configuration message to the management master node information register. At this point, the master node has completed the management master node signature on the device and obtained the management authority of the device. Optionally, the device can guarantee that the master node password saved in this signature cannot be read or modified by other, non-management master nodes. Optionally, for every subsequent operation on the device by a master node, the device first verifies whether the master node password and master node number carried by the operation are consistent with those saved by the device; if they are not, the device may not respond, thus ensuring the security of device management.

S15. The device verifies the master node that sent the configuration message and judges whether it is the management master node of the device; if yes, S16 is executed, otherwise S17 is executed. If the ID information of the requester (or source) of the current access (that is, the master node number carried by the configuration message) is consistent with the master node number saved in the management master node information register of the device, and the master node password carried in the configuration message is consistent with the master node password saved in the management master node information register, it can be determined that the master node sending the configuration message is the management master node of the device. Correspondingly, if any one or more of the following holds: the master node number is inconsistent, the master node password is inconsistent, or no master node password is carried, it can be determined that the master node sending the configuration message is not the management master node of the device.

S16. The device does not modify the signature bit in the information register of the management master node, that is, keeps the signature bit as 1, and returns a configuration success response.

S17. The device does not modify the signature bit in the management master node information register, that is, keeps the signature bit as 1, and returns a configuration failure response. Optionally, the device may also directly discard the configuration message without responding.
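The device-side decision in S12 to S17 can be sketched as follows. This is a minimal model under stated assumptions, not the register layout of the embodiments: the class names, field names and the string return values are hypothetical, and every configuration message is assumed to carry both a master node number and a master node password.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementMasterInfoRegister:
    signed: int = 0                 # signature bit
    cid: Optional[str] = None       # master node number of the management master node
    host_key: Optional[str] = None  # master node password saved at signing time


class Device:
    def __init__(self):
        self.reg = ManagementMasterInfoRegister()

    def _is_manager(self, cid: str, host_key: str) -> bool:
        # S15: both the master node number and the password must match.
        return cid == self.reg.cid and host_key == self.reg.host_key

    def handle_sign(self, cid: str, host_key: str) -> str:
        """Handle a configuration message that tries to set the signature bit to 1."""
        if self.reg.signed == 0:             # S13: no management master node yet
            self.reg = ManagementMasterInfoRegister(1, cid, host_key)  # S14
            return "success"
        if self._is_manager(cid, host_key):  # S15 -> S16: bit stays 1
            return "success"
        return "failure"                     # S17: bit stays 1


dev = Device()
first = dev.handle_sign("host-a", "key-a")   # first signer wins
second = dev.handle_sign("host-b", "key-b")  # later signer is rejected
```

Here `first` is a success and `second` a failure, which reflects the rule that whichever master node signs first becomes the management master node.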

To sum up, the embodiments of this application define a mechanism in which whichever master node signs successfully first becomes the management master node, so that the management ownership of each device is clear even when the system includes multiple master nodes (such as master node 100a, master node 100b, and master node 100c shown in FIG. 2). At the same time, the device can ensure that the master node password configured when the management master node signs cannot be read or modified by other, non-management master nodes, and a device that has completed the management master node signature performs management master node verification on every subsequent management operation, which ensures the security of device management to a great extent.

Optionally, after the master node obtains the management authority over the device, it can actively cancel that management authority. For example, when the master node receives requests from other master nodes to obtain management authority over the device, or when the master node detects that it has an abnormal fault, the management master node can actively cancel its management authority over the device so that the device can subsequently have a new, normally working management master node, ensuring the management efficiency of the device, and so on. Please refer to FIG. 7. FIG. 7 is a schematic flowchart of a host canceling management authority provided by an embodiment of the present application. As shown in FIG. 7, the process of canceling the management authority may include the following steps.

S21. The master node sends a configuration message to the device, trying to cancel its management authority over the device. The master node is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a, or switch A, device C, etc. in FIG. 5. The configuration message (for example, the second configuration message) is intended to set the signature bit in the management master node information register of the configuration space to 0, and can also carry the master node number and master node password of the master node.

S22. The device receives the configuration message sent by the master node, and determines that the configuration message is used to write the signature bit in its management master node information register to 0.

S23. The device verifies the master node that sent the configuration message and judges whether it is the management master node of the device; if yes, S24 is executed, otherwise S25 is executed.

S24. The device sets the signature bit in the information register of the management master node to 0, and returns a response that the configuration is successful. It should be understood that if the master node is the management master node of the device, it obviously indicates that the device already has a management master node, and the signature bit in the management master node information register of the device is generally 1.

S25. The device does not modify the signature bit in the management master node information register, and returns a configuration failure response. Optionally, the device may also directly discard the configuration message without responding. It should be understood that if the master node is not the management master node of the device, the signature bit in the management master node information register of the device may at this time be either 1 or 0.
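The cancellation flow S21 to S25 can be sketched in the same style. This is a minimal model with hypothetical names (`Device`, `handle_release`, the string return values). It only clears the signature bit in S24, because the text does not state that the saved number and password are erased at that step.

```python
class Device:
    def __init__(self, signed=0, cid=None, host_key=None):
        # Contents of the management master node information register (illustrative).
        self.signed, self.cid, self.host_key = signed, cid, host_key

    def handle_release(self, cid, host_key):
        """Handle a configuration message that tries to set the signature bit to 0."""
        # S23: the sender must be the current management master node.
        is_manager = (
            self.signed == 1 and cid == self.cid and host_key == self.host_key
        )
        if is_manager:
            self.signed = 0   # S24: give up the management authority
            return "success"
        return "failure"      # S25: register left unchanged


dev = Device(signed=1, cid="host-a", host_key="key-a")
by_other = dev.handle_release("host-b", "key-b")  # not the manager
by_owner = dev.handle_release("host-a", "key-a")  # the manager releases
```

After the owner's release the signature bit is 0, so a later signing attempt by any master node would again follow the S13 to S14 branch of FIG. 6.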

It should be noted that, the embodiment of the present application does not specifically limit the execution sequence of the steps shown in FIG. 6 and FIG. 7 .

Furthermore, after the master node obtains the management authority over the device, in order to ensure that the management master node is present in real time and to ensure the robustness of the entire system (or bus topology), the embodiments of this application also define a presence confirmation mechanism for the management master node.

Optionally, please refer to FIG. 8a. FIG. 8a is a schematic diagram of a management host presence confirmation process provided by an embodiment of the present application. The presence confirmation process can be applied to the system structures shown in FIG. 2 to FIG. 5 above. The device involved in FIG. 8a may be the above-mentioned slave node 200a (that is, the first slave node), or switch A, switch B, device C, or device D in FIG. 5, and the master node involved is the management master node of that device.

As shown in FIG. 8a, a timer 1 can be configured inside the device. When a master node obtains the management authority of the device (for example, when the signature bit of the device is set to 1), timer 1 can start timing. When the value of timer 1 equals X (for example, 200 ms, 3 s, 5 s or 8 s), the device can send a query message to its management master node to try to obtain the presence status of its management master node. That is, the device may actively send a query message to its management master node at a preset time interval (or preset frequency), and correspondingly its management master node receives the query message. Optionally, when the device sends the query message, timer 2 inside the device can start timing. If the value of timer 2 reaches Y (for example, 300 ms, 1 s, 5 s or 7 s) and the device still has not received the presence message sent by its management master node, one timeout is recorded by the counter in the device. As shown in FIG. 8a, when the number of timeouts reaches K (K is an integer greater than or equal to 1, for example 3, 5 or 7), that is, the device has sent K query messages to its management master node without receiving a presence message in response to any of them, the device can consider its management master node to be abnormal and not present, and can reset the signature bit in its management master node information register to 0. Further, it can also clear the master node number and master node password saved in its management master node information register, thereby canceling the management authority of its original management master node and returning to a state without a management master node.
Correspondingly, as shown in FIG. 8a, if the device receives the presence message sent by its management master node during this period, it can determine that the management master node is normally present, and the device can clear the current values of timer 2 and the counter, that is, reset timer 2 and the counter.
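The counting logic of FIG. 8a can be sketched without real timers. In the minimal model below, each element of `replies` stands for one query sent when timer 1 reaches X, and is `True` if a presence message arrived before timer 2 reached Y. The function name `presence_check` and this boolean encoding are assumptions made for illustration.

```python
def presence_check(replies, k):
    """Return the 1-based query number at which the device revokes its
    management master node, or None if that never happens."""
    timeouts = 0                  # the device's timeout counter
    for query_no, replied_in_time in enumerate(replies, start=1):
        if replied_in_time:
            timeouts = 0          # presence message received: reset the counter
        else:
            timeouts += 1         # timer 2 reached Y with no presence message
            if timeouts >= k:
                return query_no   # clear signature bit, number and password
    return None


still_managed = presence_check([True, False, False, True, False], k=3)
revoked_at = presence_check([False, True, False, False, False], k=3)
```

Because the counter is reset whenever a presence message arrives, only K timeouts in a row lead to revocation: `still_managed` is `None`, while `revoked_at` is 5.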

Optionally, please refer to FIG. 8b. FIG. 8b is a schematic diagram of another management host presence confirmation process provided by an embodiment of the present application. The presence confirmation process can be applied to the system structures shown in FIG. 2 to FIG. 5 above. The device involved in FIG. 8b may be the above-mentioned slave node 200a (that is, the first slave node), or switch A, switch B, device C, or device D in FIG. 5, and the master node involved is the management master node of that device.

As shown in FIG. 8b, a timer 3 may be configured inside the master node, and when the master node obtains the management authority of the device, timer 3 may start timing. When the value of timer 3 equals W (for example, 200 ms, 3 s or 10 s), the master node can send a presence message to the device it manages; that is, the management master node can actively send presence messages to the devices it manages at a preset time interval (for example, the first time interval). Correspondingly, the device can receive the presence message sent by the master node and judge whether the message comes from its management master node. If so, the device can determine that its management master node is normally present; if not, it can discard the message without responding.

Optionally, a corresponding timer can also be configured in the device, and this timer can also start timing when the master node obtains the management authority of the device. When the timer in the device reaches a preset time (for example, W) and the device still has not received a presence message actively sent by its management master node, the device can consider its management master node to be abnormal and not present, and can reset the signature bit in its management master node information register to 0. Further, it can also clear the master node number and master node password stored in its management master node information register, thereby canceling the management authority of its original management master node. Correspondingly, if the device receives a presence message sent by its management master node during this period, it can determine that the management master node is normally present, and the device can reset the locally maintained timer.

Optionally, the device can also be configured with a corresponding timer and counter. Similarly, the timer can start timing when the master node obtains the management authority of the device. When the timer in the device reaches the preset time (for example, W) and the device still has not received a presence message actively sent by its management master node, one timeout is recorded by the counter in the device. When the number of timeouts reaches a preset value (for example, 5, 7 or 10), that is, when the management master node of the device has not actively sent a presence message for a long time, the device can consider its management master node to be abnormal and not present, and can reset the signature bit in its management master node information register to 0. Further, it can also clear the master node number and master node password saved in its management master node information register, thereby canceling the management authority of its original management master node. Correspondingly, if the device successfully receives a presence message actively sent by its management master node during this period, it can determine that the management master node is normally present, and the device can also reset the locally maintained timer and counter.
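The device-side timer and counter for FIG. 8b can be sketched as a small simulation over message arrival times. This is a minimal sketch under assumptions: time starts at 0 when management authority is obtained, `heartbeat_times` lists when presence messages from the management master node arrive, and the names `watchdog`, `max_timeouts` and `until` are hypothetical. Setting `max_timeouts=1` corresponds to the timer-only variant.

```python
def watchdog(heartbeat_times, w, max_timeouts, until):
    """Return the time at which the device revokes its management master
    node, or None if it is still managed at time `until`."""
    pending = sorted(heartbeat_times)
    deadline = w      # device timer started at t = 0
    timeouts = 0      # device timeout counter
    i = 0
    while deadline <= until:
        if i < len(pending) and pending[i] < deadline:
            timeouts = 0                 # presence message: reset timer and counter
            deadline = pending[i] + w
            i += 1
        else:
            timeouts += 1                # timer reached W with no presence message
            if timeouts >= max_timeouts:
                return deadline          # clear signature bit, number and password
            deadline += w                # restart the timer
    return None


healthy = watchdog(range(1, 20), w=2, max_timeouts=3, until=20)
lost = watchdog([1, 2, 3], w=2, max_timeouts=3, until=20)
```

With presence messages every time unit the device never times out, so `healthy` is `None`; once the messages stop after t = 3, three windows of length W elapse and `lost` is 9.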

Optionally, the values of X, Y, K, and W mentioned above can all be configured by the management master node, and more appropriate values can be selected according to actual needs.

Further, as mentioned above, in order to maintain the robustness of the entire bus topology and ensure that the device has a management master node that is normally present, the device can also send a broadcast message to at least one master node in the system (such as the master node 100a, the master node 100b, and the master node 100c in the bus system 10 shown in FIG. 2). The broadcast message may be used to indicate that the device currently does not have a management master node. Optionally, the at least one master node may also include the original management master node of the device. Correspondingly, the at least one master node receives the broadcast message and, based on it, can perform the management master node signature on the device to try to obtain the management authority of the device and become its new management master node. It should be understood that, according to the above discussion, whichever of the at least one master node completes the signature first obtains the management authority of the device. For the specific signature process, refer to the description of the embodiment corresponding to FIG. 6 above, which is not repeated here.

It should be noted that, in general, the presence confirmation interaction between the management master node and its devices shown in FIG. 8a and FIG. 8b above can occur at a very low frequency, so its impact on bus bandwidth is almost negligible. It therefore ensures that the management master node is present in real time, and hence the robustness of the bus topology, without affecting the data transmission and computing efficiency of the entire bus.

Optionally, please refer to FIG. 9 . FIG. 9 is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 9, the multipath bus subsystem 10a may also include a master node 100b (ie, a second master node), and N4 fourth switches adjacent to the master node 100b (for example, the fourth switch 41 in FIG. 9 , the fourth switch 42, etc.). Wherein, any fourth switch may be adjacent to any third switch, or connected through one or more second switches; wherein, N4 is a positive integer less than or equal to N. For example, the fourth switch 41 may be adjacent to the third switch 31; for another example, the fourth switch 41 may be connected to the third switch 31 through the second switch 22, and so on, which is not specifically limited in this embodiment of the present application.

Correspondingly, the master node 100b may also send the enumeration message to the slave node 200a multiple times, and determine multiple routing paths between the master node 100b and the slave node 200a based on the switches that each message passes through on its way from the master node 100b to the slave node 200a. Each routing path may pass, in sequence, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches. In this way, multipath access from multiple master nodes to one slave node is further realized; that is, a slave node can be enumerated and discovered by multiple master nodes, and can subsequently be accessed by those master nodes over their respective routing paths so that they can use its resources, and so on. Correspondingly, a routing path between the master node 100b and the slave node 200a may pass only through fourth switches and third switches and, in some possible embodiments, may pass only through one or more of the N4 fourth switches, or only through one or more of the N3 third switches, etc., which is not specifically limited in this embodiment of the present application.

Optionally, for the process of enumerating and discovering the slave node 200a by the master node 100b through multiple routing paths, reference may be made to the corresponding description in FIG. 5 above, and details are not repeated here.

Optionally, the master node 100a and the slave node 200a may belong to a first node domain, and the master node 100b may belong to a second node domain. The second node domain further includes one or more second slave nodes (such as the slave node 200b in FIG. 2). Optionally, the first node domain may also include multiple switches connected to the master node 100a (such as the N1 first switches), and the second node domain may also include multiple switches connected to the master node 100b (for example, the N4 fourth switches), which is not specifically limited in this embodiment of the present application. It should be noted that different node domains may belong to different sub-networks, or the system management software of different node domains may be different (for example, the enumeration software mentioned above is different, the operating system (OS) is different, etc.).

Optionally, the master node 100b sends an enumeration message to devices in other node domains (for example, the slave node 200a) only after it has assigned device numbers to the devices in the second node domain to which it belongs (which may include slave nodes and switches), that is, after it has completed enumeration and discovery in its own node domain. This completes the enumeration and discovery of multiple master nodes across node domains.

As mentioned above, when the bus system contains multiple node domains and a corresponding number of master nodes, the enumeration and discovery process of each master node starts from the node domain to which it currently belongs. Until the devices in that domain have been enumerated and discovered, the current node domain may not establish a data flow connection with any device in another node domain. In this way, device enumeration and discovery by multiple master nodes across node domains (that is, across networks or across different system management software) can be supported, and conflicts in which different enumeration software assigns multiple device numbers to the same device, or assigns the same device number to different devices, are avoided.
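The conflict that this ordering avoids can be illustrated with a minimal sketch built on the visible bit that the claims describe for the routing status register. The class and method names (RoutingStatusRegister, MasterNode, try_assign) are hypothetical, and the numbering scheme is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingStatusRegister:
    """Hypothetical model of a slave node's routing status register."""
    visible: int = 0                     # 0: not yet enumerated, 1: enumerated
    device_number: Optional[int] = None  # number stored by the slave node


class MasterNode:
    """Hypothetical master node that hands out device numbers in order."""

    def __init__(self, name: str, first_number: int) -> None:
        self.name = name
        self.next_number = first_number

    def try_assign(self, reg: RoutingStatusRegister) -> bool:
        """Assign a device number only if the visible bit is 0."""
        if reg.visible == 1:
            return False  # already enumerated: do not assign a second number
        reg.device_number = self.next_number
        reg.visible = 1   # the slave node records that it has been discovered
        self.next_number += 1
        return True


# The master node of the device's own domain enumerates first.
slave_200a = RoutingStatusRegister()
master_100a = MasterNode("master_100a", first_number=1)
master_100b = MasterNode("master_100b", first_number=1)

first = master_100a.try_assign(slave_200a)   # in-domain enumeration
second = master_100b.try_assign(slave_200a)  # later cross-domain enumeration
print(first, second, slave_200a.device_number)
```

Because the in-domain master node runs first and sets the visible bit, the later cross-domain enumeration in this sketch finds the device already numbered and leaves the number unchanged.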

Further, based on the above concept that different master nodes may belong to different node domains, when the aforementioned device (such as the slave node 200a) cancels the management authority of its original management master node (such as the master node 100a) and sends a broadcast message to at least one master node in the system, it may first send a first-level broadcast message to the master node in the node domain to which it belongs (for example, the master node 100a), and then send a second-level broadcast message (that is, a broadcast message across node domains) to master nodes in other node domains (for example, the master node 100b and the master node 100c), and so on.

Optionally, please refer to FIG. 10a-FIG. 10c. FIG. 10a-FIG. 10c are schematic structural diagrams of a group of multi-path multi-master bus systems provided by an embodiment of the present application.

As shown in Figure 10a, the bus system may include multiple hosts (such as master node 100a and master node 100b), multiple switches (such as switch 1, switch 2, and switch 3 in Figure 10a), and multiple slave nodes (for example, the accelerators, SSDs, smart network cards, GPUs, and TPUs in Figure 10a). The master node 100a may include a port A1 and a port A2, and the master node 100b may include a port B1 and a port B2. As shown in Figure 10a, compared with the traditional tree structure, the embodiment of the present application implements a flattened graph structure in which multiple switches can be arranged in a matrix and connected vertically and horizontally through the bus. This structure can support multi-path and multi-master access to any slave node. The access can cover, for example, initial enumeration and discovery, subsequent resource usage, and function management.

Optionally, the master node 100a, the switch 1, switch 2, switch 3, and switch 4 connected under the master node 100a, and the XPU, accelerator, SSD, and smart network card connected to the switch 1, switch 3, and switch 4 may belong to the first node domain; the master node 100b, the switch 5, switch 6, switch 7, and switch 8 connected under the master node 100b, and the TPU, SSD, GPU, and accelerator connected to the switch 6, switch 7, and switch 8 may belong to the second node domain.

Optionally, in this embodiment of the present application, the enumeration processes of multiple master nodes (that is, multi-host) do not interfere with each other. To ensure that a master node does not establish a data flow connection with any device in another node domain before it completes enumeration and discovery in its own node domain (that is, it does not enumerate and discover devices in other node domains), and that devices in its own node domain are not discovered by master nodes in other node domains during that time, the embodiment of this application also provides a switch with a special port, as shown in Figure 10a. The special port is used for adjacency with a switch in another node domain that also has a special port. For example, switch 2, switch 4, switch 5, and switch 7 in FIG. 10a have special ports, which are marked in black. As shown in Figure 10a, under the default setting (or while the master node has not yet completed enumeration and discovery in its own node domain), when the system enumeration software scans a port of this type, it does not continue enumeration and discovery through that port. For example, when the master node 100a enumerates from port A2 to the special port marked in black in switch 2, it does not enumerate and discover the device connected under that port (that is, switch 5), but continues to enumerate and discover devices in the first node domain through other ports. Likewise, when the master node 100b enumerates from port B1 to the special port marked in black in switch 5, it does not enumerate and discover the device connected under that port (that is, switch 2), but continues to enumerate and discover devices in the second node domain through other ports.
Correspondingly, after the master node 100a and the master node 100b complete the enumeration and discovery of all devices in their respective node domains (for example, all devices in their respective node domains have been assigned device numbers), the special ports of switch 2 and switch 5 can be "opened", and the master node 100a and the master node 100b can then perform cross-node-domain enumeration and discovery through these special ports.
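The special-port behavior described above can be sketched as a breadth-first enumeration that skips cross-domain links until they are opened. The topology, the device names and the enumerate_from function are hypothetical and only loosely follow Figure 10a. The embodiment indicates special ports through a register group in the switch configuration space, whereas this sketch simply models them as a set of links.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


def enumerate_from(
    adj: Dict[str, List[str]],
    special_links: Set[FrozenSet[str]],
    master: str,
    cross_domain_open: bool,
) -> List[str]:
    """Breadth-first discovery that stops at closed special ports."""
    discovered = [master]
    queue = deque([master])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            link = frozenset((node, neighbor))
            if link in special_links and not cross_domain_open:
                continue  # special port still closed: do not look behind it
            if neighbor not in discovered:
                discovered.append(neighbor)
                queue.append(neighbor)
    return discovered


# Hypothetical two-domain topology joined by one special link.
ADJ = {
    "master_100a": ["switch_1", "switch_2"],
    "switch_1": ["master_100a", "ssd_a"],
    "switch_2": ["master_100a", "switch_5"],
    "ssd_a": ["switch_1"],
    "switch_5": ["switch_2", "switch_6"],
    "switch_6": ["switch_5", "gpu_b"],
    "gpu_b": ["switch_6"],
}
SPECIAL = {frozenset(("switch_2", "switch_5"))}

in_domain = enumerate_from(ADJ, SPECIAL, "master_100a", cross_domain_open=False)
cross_domain = enumerate_from(ADJ, SPECIAL, "master_100a", cross_domain_open=True)
print(in_domain)
print(cross_domain)
```

The first call models the default state, in which only the devices of the first node domain are found. The second call models the state after the special port has been opened, in which the devices behind switch_5 become discoverable.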

For example, please refer to FIG. 11a-FIG. 11b. FIG. 11a-FIG. 11b are schematic diagrams of an enumeration process under a group of multi-path multi-hosts provided by the embodiment of the present application. As shown in FIG. 11a, the master node 100a and the master node 100b have different operating systems, namely OS 1 and OS 2, respectively. Among them, the master node 100a includes CPU-A0 and CPU-A1, and the corresponding port A1, port A2, port A3 and port A4; the master node 100b includes CPU-B0 and CPU-B1, and the corresponding port B1, port B2, port B3 and port B4.

As shown in Figure 11a, at this point both the master node 100a and the master node 100b have performed part of the enumeration and discovery, and some devices (such as the GPU and SSD in the first node domain, and the accelerator in the second node domain) have not yet been assigned device numbers. The first node domain, where the master node 100a is located, includes a switch E with device number cid4, whose port A is a special cross-node-domain port. After the enumeration software of the master node 100a (such as OS1) discovers port A of switch E through enumeration, it does not discover or enumerate the topology behind port A, because other ports that have working physical links and are not cross-node-domain ports have not yet been fully enumerated. Similarly, the second node domain, where the master node 100b is located, includes a switch F with device number cid2, whose port B is a special cross-node-domain port. After the enumeration software of the master node 100b (such as OS2) discovers port B of switch F through enumeration, it does not discover or enumerate the topology behind port B, because other ports that have working physical links and are not cross-node-domain ports have not yet been fully enumerated.

As shown in FIG. 11b, at this point the master node 100a and the master node 100b have each completed enumeration and discovery of the bus topology and all devices in their respective node domains, and all devices have been assigned device numbers. Therefore, the system software of the master node 100a and the master node 100b can respectively turn on the data flow switch of the special port A of switch E and of the special port B of switch F, so as to enumerate and discover devices in node domains other than their own. It should be noted that, at this point, the devices in each node domain have completed the signature of their management master node, so the master node in each node domain has obtained the management master node authority over the devices in that node domain. For example, the master node 100a is the management master node of the TPU, GPU, and SSD in the first node domain, and the master node 100b is the management master node of the GPU and accelerator in the second node domain. Therefore, the master node 100a only has the right to use the devices in the second node domain on the data plane and does not have management master node authority over them on the management plane. Correspondingly, the master node 100b only has the right to use the devices in the first node domain on the data plane and does not have management master node authority over them on the management plane.
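The split between management-plane authority and data-plane usage can be sketched as follows. The sketch combines the signature bit, master node number and master node password described in the claims with a hypothetical discovered_by set. The class name, method names and the way data-plane access is tracked are assumptions for illustration, not the register layout of the embodiment.

```python
from typing import Optional, Set, Tuple


class SlaveDevice:
    """Hypothetical model of a slave node's management master node state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.signature_bit = 0                         # 1: has a management master
        self.manager: Optional[Tuple[int, str]] = None  # (number, password)
        self.discovered_by: Set[int] = set()            # masters that enumerated it

    def sign(self, number: int, password: str) -> bool:
        """First configuration message: take management authority if free."""
        if self.signature_bit == 1:
            return False  # another master node already manages this device
        self.manager = (number, password)
        self.signature_bit = 1
        self.discovered_by.add(number)
        return True

    def release(self, number: int, password: str) -> bool:
        """Second configuration message: only the signer may release."""
        if self.signature_bit == 1 and self.manager == (number, password):
            self.manager = None
            self.signature_bit = 0
            return True
        return False

    def may_manage(self, number: int) -> bool:
        return self.signature_bit == 1 and self.manager[0] == number

    def may_use(self, number: int) -> bool:
        """Data-plane use is open to any master that discovered the device."""
        return number in self.discovered_by


gpu = SlaveDevice("gpu_domain_1")
gpu.sign(0xA, "pw-a")        # master node 100a signs during in-domain enumeration
gpu.discovered_by.add(0xB)   # master node 100b discovers it across domains later
print(gpu.may_manage(0xA), gpu.may_manage(0xB), gpu.may_use(0xB))
```

In this sketch the master node numbered 0xB can use the device but cannot sign or release it, which mirrors the data-plane-only right described above.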

Optionally, after the system software in each node domain has completed bus topology discovery and enumeration of all devices in its node domain, the discovery and enumeration of the devices in the second node domain by the master node 100a can also be carried out directly by having the system software in the second node domain (such as OS2) report to the system software on the master node 100a (such as OS1) based on an agreed description structure and interface, so that the system software on the master node 100a directly obtains the topology information of all devices in the node domain where the master node 100b is located. Correspondingly, the master node 100b can discover and enumerate the devices in the first node domain, where the master node 100a is located, in the same way, and details are not repeated here.

As mentioned above, with the special port provided by the embodiment of the present application, each master node can independently stop discovering and enumerating the devices and topology behind this port when its enumeration reaches the port, and no longer continues the breadth-first or depth-first search from this port. Only after each master node has completed enumeration and discovery in its own node domain does its system software turn on the data path switch of the special port in its domain and enumerate across master nodes and across node domains.

As shown in Figure 10b, the bus system may also include a master node 100a, a master node 100b, a plurality of switches, and a plurality of slave nodes, and details are not repeated here.

Optionally, as shown in Figure 10b, the switch 2 under the master node 100a and the switch 5 under the master node 100b each include a port capable of network access, or a port with a network card function (such as the ports marked in gray in Figure 10b), so that the master node 100a and the master node 100b can be connected over the network through the switch 2 and the switch 5; that is, the master node 100b can be a remote device of the master node 100a. In this way, a device can be accessed and used not only by a master node connected by wire at the near end, but also by other master nodes connected remotely and wirelessly, so that a host can call remote computing resources and stored data. This helps meet the host's demand for large amounts of computing resources when performing complex computing tasks, enhances the scalability of the entire bus system, and goes beyond the limitations of existing PCIe bus connections.

As shown in Figure 10c, the bus system may include a master node 100a, a master node 100b, a master node 100c, a plurality of switches, and a plurality of slave nodes, wherein the bus system may include switches with special ports or switches for network connections. Optionally, for the introduction of FIG. 10c, reference may be made to the descriptions of the embodiments corresponding to FIG. 10a and FIG. 10b above, and details are not repeated here. As shown in FIG. 10c, the bus system can be a typical graph-based bus system provided by the embodiment of the present application, which offers greater scalability, larger capacity, and a larger design space for bus interconnection.

Optionally, the special ports shown in FIGS. 10a-10c and ports with network card functions can be directly indicated by the relevant function register group in the configuration space of the switch device (that is, the switch), so as to inform the system enumeration software.

It should be noted that FIG. 10a-FIG. 10c are only exemplary illustrations, and do not specifically limit the bus system of the multi-path multi-master node in the embodiment of the present application.

In summary, the embodiment of the present application provides a bus system based on the concept of a graph structure, which breaks the limitation of the top-down tree structure of the existing PCIe bus interconnection topology and realizes an interconnection topology with a flattened graph structure. In this way, there is no need for additional special bus interconnection apparatuses or devices (such as the CXL switch and type 3 device in Figure 1 above) or for a complicated bus topology discovery and enumeration process, so that the interconnection bus protocol can natively support multi-host and multipath at low cost. Based on this, the embodiment of the present application further provides a management host signature process, a management host presence confirmation process, a multi-host enumeration process across node domains, and so on, so that, while multi-host and multipath are realized, the manageability, safety, and reliability of the entire multi-host, multi-path bus system (or the entire bus topology with a graph structure) are further guaranteed.

In addition, it is emphasized again that the embodiment of the present application aims, based on the concept of a graph structure, to break the tree structure with a strict top-down hierarchical relationship in the prior art so as to realize the above-mentioned multi-host and multipath access; the specific connections between the nodes are not specifically limited. In the graph structure of the embodiment of the present application, not all nodes need to be adjacent through a switch; a node may also be directly adjacent to multiple hosts, so that multi-host and multipath access can be realized more conveniently, and so on.

It should be noted that, in addition to being used for a network on chip, the technical solution provided by the embodiment of the present application can also be applied to ultra-large-scale network interconnection, such as the Internet, and to the interconnection and device management of large-scale servers or computing nodes in data centers, which is not specifically limited in this embodiment of the present application.

Please refer to FIG. 12. FIG. 12 is a schematic flowchart of a communication method provided by an embodiment of the present application. The communication method can be applied to a bus system (for example, the bus system shown in Fig. 2, Fig. 5, or Fig. 10a-Fig. 10c), and the bus system can include multiple master nodes, multiple switches, and multiple slave nodes; the multiple master nodes, multiple switches, and multiple slave nodes can form a graph structure through the bus. Any multipath bus subsystem in the graph structure (for example, the multipath bus subsystem 10a shown in Fig. 3 and Fig. 4a-Fig. 4d) may include a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; any first switch is adjacent to any third switch, or is connected to it through one or more second switches; N1, N2, and N3 are all positive integers less than or equal to N. The communication method may include the following steps S401 and S402.

Step S401: send enumeration messages to the first slave node multiple times through the first master node.

Step S402: based on at least one switch, determine a plurality of routing paths between the first master node and the first slave node; the at least one switch is the switch through which each enumeration message passes from the first master node to the first slave node; wherein each routing path passes, at least in sequence, through one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.

Optionally, for the communication method, reference may be made to the descriptions of the above-mentioned embodiments corresponding to FIG. 2-FIG. 11b , and details are not repeated here.

Optionally, each method procedure in the communication method described in the embodiments of the present application may be implemented in software, in hardware, or in a combination thereof. A hardware implementation may include a logic circuit, an arithmetic circuit, an analog circuit, and so on. A software implementation may include program instructions, which may be regarded as a software product that is stored in a memory and can be executed by a processor to implement the related functions.

An embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium can store a program, and when the program is executed by a processor, the processor can execute some or all of the steps of any one of the methods described in the above method embodiments.

The embodiment of the present application also provides a computer program, the computer program includes instructions, and when the computer program is executed by a multi-core processor, the processor can perform some or all of the steps described in any one of the above method embodiments.

In the foregoing embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments. It should be noted that for the foregoing method embodiments, for the sake of simple description, they are expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described action sequence. Depending on the application, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification belong to preferred embodiments, and the actions and modules involved are not necessarily required by this application.

In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the above units is only a logical function division; in actual implementation, there may be other division methods, for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, server, or network device, and specifically a processor in the computer device) to execute all or part of the steps of the above-mentioned methods in the various embodiments of the present application. The aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, read-only memory (ROM), double data rate synchronous dynamic random access memory (DDR), flash memory (flash), random access memory (RAM), or other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, and are not intended to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the various embodiments of the application.

Claims

A bus system, characterized in that the system is a graph structure formed through a bus by a plurality of master nodes, a plurality of switches, and a plurality of slave nodes; any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; wherein any first switch is adjacent to any third switch, or is connected to it through one or more second switches; N1, N2, and N3 are all positive integers less than or equal to N;

The first master node is configured to send an enumeration message to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the first master node and the first slave node; the at least one switch is the switch through which each enumeration message passes from the first master node to the first slave node; wherein each routing path in the multiple routing paths passes, at least in sequence, through one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
The system according to claim 1, characterized in that,

The first master node is further configured to query, based on the enumeration message sent, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assign a corresponding device number to the first slave node; wherein a visible bit of 0 indicates that the first slave node has not currently been enumerated and discovered;

The first slave node is configured to save the device number assigned by the first master node to the routing status register, and set the visible bit to 1; wherein a visible bit of 1 indicates that the first slave node has currently been enumerated and discovered.
The system according to any one of claims 1-2, characterized in that,

The first master node is further configured to send a first configuration message to the first slave node to obtain management authority for the first slave node; the first configuration message carries the master node password and master node number of the first master node;

The first slave node is further configured to receive the first configuration message, and set the signature bit in the management master node information register of the first slave node to 1 based on the first configuration message; wherein a signature bit of 1 indicates that the first slave node currently has a management master node, and other master nodes in the system cannot obtain management authority for the first slave node;

The first slave node is further configured to save the master node password and master node number of the first master node in the management master node information register.
The system according to claim 3, characterized in that,

The first master node is further configured to send a second configuration message to the first slave node to cancel the management authority for the first slave node; the second configuration message carries the master node password and master node number of the first master node;

The first slave node is further configured to receive the second configuration message and, if the master node password and master node number of the first master node carried in the second configuration message are consistent with the master node password and master node number saved in the management master node information register, set the signature bit in the management master node information register to 0; wherein a signature bit of 0 indicates that the first slave node currently does not have a management master node.
The system according to claim 3, characterized in that,

The first master node is further configured to, after obtaining the management authority for the first slave node, send a presence message to the first slave node in response to a query message sent by the first slave node; or, send the presence message to the first slave node at a first time interval.
The system according to claim 5, characterized in that,

The first slave node is further configured to set the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the first master node's management authority for the first slave node; wherein,

The preset condition includes: after the first master node obtains the management authority for the first slave node, the first slave node does not receive the presence message sent by the first master node within a preset time, or the first slave node has not received the presence message sent by the first master node after sending query messages to the first master node K times; K is an integer greater than or equal to 1.
The system according to claim 6, characterized in that,

The first slave node is further configured to send a broadcast message to at least one master node in the system; the broadcast message is used to indicate that the first slave node currently does not own a management master node;

The at least one master node is configured to receive the broadcast message, and send the first configuration message to the first slave node based on the broadcast message, so as to obtain management authority for the first slave node.
The system according to any one of claims 1-7, wherein the multipath bus subsystem further comprises a second master node; the N switches further comprise N4 fourth switches adjacent to the second master node; wherein any fourth switch is adjacent to any third switch, or is connected to it through one or more second switches; N4 is a positive integer less than or equal to N;

The second master node is configured to send an enumeration message to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node; the at least one switch is the switch through which each enumeration message passes from the second master node to the first slave node; wherein each routing path in the multiple routing paths passes, at least in sequence, through one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
The system according to claim 8, wherein the multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and a plurality of the first slave nodes, and the second node domain includes the second master node and a plurality of second slave nodes; the second master node is specifically configured to:

After all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, send an enumeration message to the first slave node multiple times, and determine multiple routing paths between the second master node and the first slave node based on at least one switch.
The system according to claim 9, wherein the N switches further include a first cross-node domain switch belonging to the first node domain, and a second cross-node domain switch belonging to the second node domain; The first port of the first cross-node domain switch is connected to the second port of the second cross-node domain switch;

If there is a first slave node in the first node domain that has not been assigned a device number by the first master node, and/or there is a second slave node in the second node domain that has not been assigned a device number by the second master node, the data link between the first port and the second port is in a closed state;

If all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends the enumeration message to the first slave node multiple times through the second port and the first port, and determines multiple routing paths between the second master node and the first slave node based on at least one switch.
The system according to any one of claims 9-10, wherein the second master node is a remote master node connected to the first master node via a switch; the second master node is specifically configured to: access the first slave node in the first node domain through a network connection, so as to invoke computing resources in the first slave node or read stored data in the first slave node.
The system according to any one of claims 1 to 11, wherein the first master node is a central processing unit (CPU) in the first terminal;

The first master node is further configured to call the computing resources of at least one second slave node, or read the stored data in the at least one second slave node, through a network connection; the second slave node is a graphics processing unit (GPU), a solid state drive, an accelerator, a network card, or a tensor processing unit (TPU).
The system according to any one of claims 1-12, wherein the slave node is any one of a GPU, a solid state drive, an accelerator, a network card, a TPU, an embedded neural network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), or a switch.
The system according to any one of claims 1-13, wherein the master node includes one or more central processing units (CPUs).
A communication method, characterized in that it is applied to a bus system, the bus system being a graph structure formed by a plurality of master nodes, a plurality of switches, and a plurality of slave nodes connected through a bus; any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; wherein any one of the first switches is adjacent to any one of the third switches, or is connected to it through one or more second switches; N1, N2, and N3 are all positive integers less than or equal to N; the method includes:

Sending enumeration messages to the first slave node multiple times through the first master node, and determining multiple routing paths between the first master node and the first slave node based on at least one switch; the at least one switch is the switch through which each enumeration message passes from the first master node to the first slave node; wherein each routing path passes, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
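As an illustration only, the path determination described in this claim can be pictured as listing every route through a layered graph. The following Python sketch uses a depth-first search in place of actual enumeration messages. The function name `find_routing_paths`, the directed adjacency map, and the node labels (`M1`, `A1`, `B1`, `C1`, `S1`, and so on) are assumptions made for the example and do not appear in the application.

```python
from typing import Dict, List


def find_routing_paths(adj: Dict[str, List[str]], master: str, slave: str) -> List[List[str]]:
    """Return every simple path from master to slave (illustrative DFS)."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == slave:
            paths.append(path)
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                walk(nxt, path + [nxt])

    walk(master, [master])
    return paths


# Hypothetical topology, links listed in the master-to-slave direction only:
# first switches A1 and A2, second switch B1, third switches C1 and C2.
topology: Dict[str, List[str]] = {
    "M1": ["A1", "A2"],
    "A1": ["B1", "C1"],  # A1 is also directly adjacent to C1 (S = 0)
    "A2": ["B1"],
    "B1": ["C1", "C2"],
    "C1": ["S1"],
    "C2": ["S1"],
}

paths = find_routing_paths(topology, "M1", "S1")
for p in paths:
    print(" -> ".join(p))
```

In this assumed topology the route through `A1` and `C1` uses no second switch, which corresponds to the case where a first switch is directly adjacent to a third switch.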
The method according to claim 15, further comprising:

Querying, through the first master node and based on the enumeration messages sent multiple times, the visible bit in the routing status register of the first slave node, and, if the visible bit is 0, allocating a corresponding device number to the first slave node; wherein the visible bit being 0 indicates that the first slave node has not currently been discovered by enumeration;

Saving, through the first slave node, the device number assigned by the first master node in the routing status register, and setting the visible bit to 1; wherein the visible bit being 1 indicates that the first slave node has currently been discovered by enumeration.
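The visible-bit check can be sketched as follows. This is a minimal Python model, assuming the routing status register holds just a visible bit and a device number. The class names and the sequential numbering scheme are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingStatusRegister:
    visible: int = 0                       # 0: not yet discovered by enumeration
    device_number: Optional[int] = None


class SlaveNode:
    def __init__(self) -> None:
        self.routing_status = RoutingStatusRegister()

    def accept_device_number(self, number: int) -> None:
        # Slave side: save the assigned number, then set the visible bit to 1.
        self.routing_status.device_number = number
        self.routing_status.visible = 1


class MasterNode:
    def __init__(self) -> None:
        self.next_number = 1               # assumed sequential numbering

    def enumerate_slave(self, slave: SlaveNode) -> bool:
        """Allocate a device number only if the slave is not yet visible."""
        if slave.routing_status.visible == 0:
            slave.accept_device_number(self.next_number)
            self.next_number += 1
            return True
        return False                       # reached again; number is unchanged


master, slave = MasterNode(), SlaveNode()
first = master.enumerate_slave(slave)      # first enumeration message arrives
second = master.enumerate_slave(slave)     # a later message over another path
```

Because the master checks the visible bit before allocating, a slave node reached several times over different routing paths keeps a single device number in this model.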
The method according to any one of claims 15-16, wherein the method further comprises:

Sending a first configuration message to the first slave node through the first master node, so as to obtain management authority over the first slave node; the first configuration message carries the master node password and the master node number of the first master node;

Receiving the first configuration message through the first slave node, and, based on the first configuration message, setting the signature bit in the management master node information register of the first slave node to 1; wherein the signature bit being 1 indicates that the first slave node currently has a management master node, and other master nodes in the system cannot obtain management authority over the first slave node;

Save the master node password and master node number of the first master node into the management master node information register through the first slave node.
The method according to claim 17, further comprising:

Sending a second configuration message to the first slave node through the first master node, so as to cancel the management authority over the first slave node; the second configuration message carries the master node password and the master node number of the first master node;

Receiving the second configuration message through the first slave node, and, if the master node password and the master node number of the first master node carried in the second configuration message are consistent with the master node password and the master node number saved in the management master node information register, setting the signature bit in the management master node information register to 0; wherein the signature bit being 0 indicates that the first slave node currently does not have a management master node.
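The acquisition and release of management authority in the two preceding claims can be modelled as a small state machine on the slave side. The Python sketch below assumes that the management master node information register holds a signature bit, a password, and a master node number. The class and method names (`on_first_config`, `on_second_config`) are hypothetical, and the plain-string password comparison only stands in for whatever check a real device would perform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementMasterRegister:
    signature: int = 0                     # 1: a management master node exists
    password: Optional[str] = None
    master_number: Optional[int] = None


class SlaveNode:
    def __init__(self) -> None:
        self.mgmt = ManagementMasterRegister()

    def on_first_config(self, password: str, master_number: int) -> bool:
        """Grant management authority unless another master already holds it."""
        if self.mgmt.signature == 1:
            return False
        self.mgmt = ManagementMasterRegister(1, password, master_number)
        return True

    def on_second_config(self, password: str, master_number: int) -> bool:
        """Release authority only if both credentials match the saved ones."""
        if (self.mgmt.signature == 1
                and password == self.mgmt.password
                and master_number == self.mgmt.master_number):
            self.mgmt.signature = 0
            return True
        return False


slave = SlaveNode()
got_a = slave.on_first_config("pw-a", 1)        # master 1 acquires authority
got_b = slave.on_first_config("pw-b", 2)        # master 2 is refused
bad_release = slave.on_second_config("pw-b", 2) # wrong credentials, ignored
good_release = slave.on_second_config("pw-a", 1)
got_b_later = slave.on_first_config("pw-b", 2)  # now master 2 can acquire
```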
The method according to claim 17, further comprising:

After the management authority over the first slave node is obtained, sending, through the first master node, an in-position message to the first slave node in response to a query message sent by the first slave node; or sending the in-position message to the first slave node at a first time interval.
The method according to claim 19, further comprising:

Setting, through the first slave node and if a preset condition is met, the signature bit in the management master node information register to 0, so as to cancel the management authority of the first master node over the first slave node; wherein,

The preset condition includes: after the first master node obtains the management authority over the first slave node, the first slave node does not receive the in-position message sent by the first master node within a preset time, or the first slave node has not received the in-position message sent by the first master node after sending query messages to the first master node K times; K is an integer greater than or equal to 1.
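The preset condition amounts to a liveness check on the management master node. The Python sketch below evaluates both alternatives named in the claim, using caller-supplied timestamps instead of a real clock so that the behaviour is deterministic. The class `ManagedSlave`, its method names, and the numeric values are assumptions made for illustration only.

```python
class ManagedSlave:
    def __init__(self, timeout: float, max_queries: int) -> None:
        self.timeout = timeout             # the "preset time"
        self.max_queries = max_queries     # K, an integer >= 1
        self.signature = 1                 # authority already granted
        self.last_seen = 0.0               # time of the last in-position message
        self.unanswered = 0                # queries sent without a reply

    def on_in_position(self, now: float) -> None:
        self.last_seen = now
        self.unanswered = 0

    def on_query_sent(self) -> None:
        self.unanswered += 1

    def check(self, now: float) -> int:
        """Clear the signature bit if either alternative condition holds."""
        timed_out = now - self.last_seen > self.timeout
        ignored = self.unanswered >= self.max_queries
        if timed_out or ignored:
            self.signature = 0
        return self.signature


slave = ManagedSlave(timeout=5.0, max_queries=3)
slave.on_in_position(now=2.0)
alive = slave.check(now=6.0)     # 4.0 elapsed, within the preset time
expired = slave.check(now=8.0)   # 6.0 elapsed, authority is cancelled

other = ManagedSlave(timeout=100.0, max_queries=2)
other.on_query_sent()
other.on_query_sent()
dropped = other.check(now=1.0)   # K = 2 queries went unanswered
```

The claim presents the two conditions as alternatives, so a device might implement only one of them. The sketch checks both purely to show each case.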
The method according to claim 20, further comprising:

Sending a broadcast message to at least one master node in the system through the first slave node; the broadcast message is used to indicate that the first slave node currently does not own a management master node;

The broadcast message is received by the at least one master node, and the first configuration message is sent to the first slave node based on the broadcast message, so as to obtain management authority for the first slave node.
The method according to any one of claims 15-21, wherein the multipath bus subsystem further comprises a second master node; the N switches further comprise N4 fourth switches adjacent to the second master node; wherein any one of the fourth switches is adjacent to any one of the third switches, or is connected to it through one or more second switches; N4 is a positive integer less than or equal to N; the method also includes:

Sending enumeration messages to the first slave node multiple times through the second master node, and determining multiple routing paths between the second master node and the first slave node based on at least one switch; the at least one switch is the switch through which each enumeration message passes from the second master node to the first slave node; wherein each of the multiple routing paths passes, in sequence, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
The method according to claim 22, wherein the multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and a plurality of first slave nodes, and the second node domain includes the second master node and a plurality of second slave nodes; the sending of enumeration messages to the first slave node multiple times through the second master node, and the determining of multiple routing paths between the second master node and the first slave node based on at least one switch, includes:

After all first slave nodes in the first node domain are assigned device numbers by the first master node and all second slave nodes in the second node domain are assigned device numbers by the second master node, sending enumeration messages to the first slave node multiple times through the second master node, and determining multiple routing paths between the second master node and the first slave node based on at least one switch.
The method according to claim 23, wherein the N switches further include a first cross-node domain switch belonging to the first node domain, and a second cross-node domain switch belonging to the second node domain; The first port of the first cross-node domain switch is connected to the second port of the second cross-node domain switch;

If some or all of the first slave nodes in the first node domain have not been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state;

If all first slave nodes in the first node domain are assigned device numbers by the first master node, and all second slave nodes in the second node domain are assigned device numbers by the second master node, then the data link between the first port and the second port is in an open state, so that the second master node sends the enumeration message to the first slave node multiple times through the second port and the first port, and determines multiple routing paths between the second master node and the first slave node based on at least one switch.
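The open and closed states of the cross-domain data link reduce to one predicate over both node domains. The following Python sketch assumes each domain is represented as a mapping from a slave node name to its device number, with `None` meaning that no number has been assigned yet. The function name and the example node names are hypothetical.

```python
from typing import Dict, Optional


def cross_domain_link_open(first_domain: Dict[str, Optional[int]],
                           second_domain: Dict[str, Optional[int]]) -> bool:
    """Return True only when every slave in both domains has a device number."""

    def all_numbered(domain: Dict[str, Optional[int]]) -> bool:
        return all(number is not None for number in domain.values())

    return all_numbered(first_domain) and all_numbered(second_domain)


domain_1 = {"gpu0": 1, "ssd0": 2}
domain_2 = {"nic0": 1, "tpu0": None}      # tpu0 is not yet enumerated

before = cross_domain_link_open(domain_1, domain_2)   # link stays closed
domain_2["tpu0"] = 2                                   # second master finishes
after = cross_domain_link_open(domain_1, domain_2)    # link may open
```

Keeping the link closed until both domains are fully numbered means that, in this model, the second master node's enumeration messages cannot reach first slave nodes while local enumeration is still in progress.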
The method according to any one of claims 23-24, wherein the second master node is a remote master node connected to the first master node through a switch; the method further includes: accessing, through the second master node, the first slave node in the first node domain through a network connection, so as to invoke computing resources in the first slave node or read stored data in the first slave node.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a computer or a processor, the method according to any one of claims 15-25 is implemented.
A computer program, characterized in that the computer program includes instructions, and when the computer program is executed by a computer or a processor, the computer or the processor executes the method according to claims 15-25.