WO2023040447A1 - Bus system, communication method, and related device - Google Patents

Bus system, communication method, and related device

Info

Publication number
WO2023040447A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
master node
slave
master
switch
Application number
PCT/CN2022/105758
Inventor
刘君龙
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2023040447A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/16 Multipoint routing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of computer technology, and in particular to a bus system, a communication method, and a related device.
  • AI: artificial intelligence
  • cloud computing
  • TPU: tensor processing unit
  • Interconnection buses that offer high bandwidth, low latency, low energy consumption, and easy implementation are becoming increasingly important.
  • Future bus interconnection will be an unordered, point-to-point graph structure: a node on the bus may be reachable over multiple paths (multipathing), and a node on the bus may act as a central processing unit (CPU) node that manages a certain device, so there may be multiple hosts (multi-host) on the bus.
  • CXL: Compute Express Link
  • PCIe: Peripheral Component Interconnect Express
  • In CXL 2.0, a special CXL switch and a specific interconnection topology (for example, a Type 3 CXL device connected to a CXL switch) are required to realize multi-host.
  • Future CXL versions (such as CXL 3.0) may support multipathing, but they would still require this special CXL switch and specific interconnection topology to do so.
  • Embodiments of the present application provide a bus system, a communication method, and related equipment, which can support multi-path access to devices on the bus.
  • The embodiment of the present application provides a bus system. The system is a graph structure composed of a plurality of master nodes, a plurality of switches, and a plurality of slave nodes connected through the bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches, and N1, N2, and N3 are all positive integers less than or equal to N. The first master node is used to send enumeration messages to the first slave node multiple times and, based on at least one switch, determine a plurality of routing paths between the first master node and the first slave node; the at least one switch is the switch through which the enumeration message passes from the first master node to the first slave node.
  • The conventional PCIe bus interconnection is a tree topology with a strict top-down hierarchy. A node on each layer can be connected to only one node in the layer above, but may be connected to multiple nodes in the layer below, so there is only one top-down path from the host on the bus to any node device (such as a GPU, a TPU, or a network card). This greatly limits the scalability of the topology, which cannot meet the needs of increasingly complex computing systems as the volume of computing data keeps growing.
  • In the embodiment of the present application, the top-down tree structure of the existing PCIe bus is broken: the hosts (that is, master nodes), node devices (that is, slave nodes), and switches are connected to form a flat graph structure, in which any node (a master node, a slave node such as a GPU or a TPU, or a switch) may be connected to any other node.
  • A plurality of switches serve as intermediate devices connecting the master node and the slave node, and can form multiple physical links from a master node (for example, the first master node) to a slave node (for example, the first slave node).
  • An enumeration message may reach a slave node through a certain physical link.
  • The master node can record the routing path from the master node to the slave node based on which port of the master node the enumeration message departs from on that physical link, which switch ports it passes through, and so on.
  • Multiple enumeration messages sent by the host may reach the same slave node through multiple different paths.
  • The master node can therefore send enumeration messages multiple times to record multiple different routing paths between the master node and the slave node.
  • The bus interconnection topology implemented in the embodiment of the present application is a graph structure that supports multipath access to node devices (multipathing), that is, there can be multiple accessible paths between a host and a node device. This makes the bus interconnection easier to build and enlarges the design space, so that the system is more scalable and has greater capacity, meeting the increasingly complex and large computing needs of users.
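The multi-path discovery described above can be illustrated with a minimal Python sketch. It is not the enumeration procedure of the application: instead of sending enumeration messages hop by hop, it runs a depth-first search over a hypothetical adjacency map, and it records node names rather than port numbers. The topology, the names `M1`, `SW1` to `SW4`, and `S1`, and the function `enumerate_routes` are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def enumerate_routes(adjacency: Dict[str, List[str]], master: str,
                     slave: str, switches: Set[str]) -> List[List[str]]:
    """Return every loop-free path from master to slave whose
    intermediate hops are all switches (illustrative only)."""
    routes: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == slave:
            routes.append(path)
            return
        for nxt in adjacency.get(node, []):
            if nxt in path:
                continue  # never revisit a node: avoids infinite loops
            if nxt != slave and nxt not in switches:
                continue  # only switches may forward the message
            walk(nxt, path + [nxt])

    walk(master, [master])
    return routes


# Hypothetical graph: one master, four switches, one slave.
adjacency = {
    "M1": ["SW1", "SW2"],
    "SW1": ["M1", "SW3", "SW4"],
    "SW2": ["M1", "SW3"],
    "SW3": ["SW1", "SW2", "S1"],
    "SW4": ["SW1", "S1"],
    "S1": ["SW3", "SW4"],
}
routes = enumerate_routes(adjacency, "M1", "S1", {"SW1", "SW2", "SW3", "SW4"})
for route in routes:
    print(" -> ".join(route))
```

In a tree topology the same search would return exactly one path per slave; the graph structure is what makes several routes available.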
  • The master node in the embodiment of the present application may be a master processing chip, and the slave node may be a slave processing chip, a memory, a dedicated hardware processing unit, or the like.
  • The master node may be a host computer including multiple central processing units, and the slave node may be a computing unit such as a GPU or a TPU, or a storage device such as a solid-state disk.
  • The master node can access the corresponding slave nodes through the multiple routing paths determined above, and then call the computing resources or data resources in the slave nodes to perform calculation processing.
  • The master node described in the embodiment of this application can also be called a host, and the slave node can also be called a node device.
  • Correspondingly, the management master node can also be called the management host, the management master node information register can also be called the management host information register, the management master node signature can also be called the management host signature, and so on; this is not explained again below.
  • The first master node is further configured to query, based on the sent enumeration message, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, allocate a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration. The first slave node is used to save the device number assigned by the first master node in the routing status register and set the visible bit to 1, where a visible bit of 1 indicates that the first slave node has been discovered by enumeration.
  • A node device can be enumerated by the host through multiple paths. When the node device is enumerated for the first time, the visible bit in its routing status register can be set to 1 to indicate that the node device has been discovered by enumeration.
  • When the host later sends an enumeration message to the node device via another routing path, it can determine from the visible bit being 1 that the node device has already been discovered, so it does not need to assign a device number again and only needs to record the new routing path taken by this enumeration message.
  • In this way, enumeration under multiple paths is realized by recording the historical discovery state, which avoids repeated enumeration and infinite loops under multiple paths and completes the enumeration efficiently and accurately.
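The visible-bit handling above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the register layout of the application: the classes `SlaveNode` and `MasterNode`, the method names, and the choice to number devices from 1 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SlaveNode:
    """Hypothetical slave node holding a routing status register."""
    name: str
    visible: int = 0                     # 0: not yet discovered, 1: discovered
    device_number: Optional[int] = None  # written on first discovery

    def store_device_number(self, number: int) -> None:
        # The slave saves the assigned number and sets its visible bit.
        self.device_number = number
        self.visible = 1


@dataclass
class MasterNode:
    """Hypothetical master node that enumerates slaves over several paths."""
    name: str
    next_device_number: int = 1
    routes: Dict[str, List[Tuple[str, ...]]] = field(default_factory=dict)

    def enumerate_slave(self, slave: SlaveNode, path: Tuple[str, ...]) -> None:
        if slave.visible == 0:
            # First discovery: allocate a device number exactly once.
            slave.store_device_number(self.next_device_number)
            self.next_device_number += 1
        # In every case, record the path if it has not been seen before.
        known = self.routes.setdefault(slave.name, [])
        if path not in known:
            known.append(path)


master = MasterNode("M1")
gpu = SlaveNode("S1")
master.enumerate_slave(gpu, ("M1", "SW1", "SW3", "S1"))
master.enumerate_slave(gpu, ("M1", "SW2", "SW3", "S1"))
print(gpu.device_number, len(master.routes["S1"]))
```

The second call finds the visible bit already set, so it only adds a route and leaves the device number untouched.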
  • The first master node is further configured to send a first configuration message to the first slave node to obtain management authority over the first slave node; the first configuration message carries the master node password and the master node number of the first master node. The first slave node is also used to receive the first configuration message and, based on it, set the signature bit in the management master node information register of the first slave node to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes in the system cannot obtain management authority over the first slave node. The first slave node is further configured to save the master node password and master node number of the first master node in the management master node information register.
  • By sending the first configuration message to the node device, the host can set the signature bit in the management host information register of the node device to 1 and write its own host password (that is, the master node password) and host number (that is, the master node number) into the node device, thereby obtaining management authority over the node device.
  • Once the signature bit is set to 1, that is, once a certain host has obtained the management authority, other hosts cannot obtain management authority over the node device. This ensures that the node device has a unique management host (that is, the management master node) and keeps the device management plane clear.
  • A host can obtain management authority over multiple different node devices, and the host password used for each node device can be different. The host password can serve as an important credential for later verifying the identity of the management host, thereby ensuring the security of the device management plane.
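A minimal sketch of the claim step described above, assuming a register with three fields (signature bit, master node number, master node password). The class and method names are hypothetical, and the message is modeled as a plain method call rather than a bus packet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementMasterInfoRegister:
    """Hypothetical layout of the management master node information register."""
    signature: int = 0                     # 1: a management master exists
    master_number: Optional[int] = None
    master_password: Optional[str] = None


class SlaveNode:
    def __init__(self) -> None:
        self.mgmt = ManagementMasterInfoRegister()

    def on_first_configuration(self, master_number: int,
                               master_password: str) -> bool:
        """Handle a first configuration message; True if the claim succeeds."""
        if self.mgmt.signature == 1:
            return False  # already managed: later claims are refused
        self.mgmt = ManagementMasterInfoRegister(1, master_number,
                                                 master_password)
        return True


slave = SlaveNode()
first_claim = slave.on_first_configuration(master_number=1,
                                           master_password="pw-for-S1")
second_claim = slave.on_first_configuration(master_number=2,
                                            master_password="other")
print(first_claim, second_claim, slave.mgmt.master_number)
```

Because the password is stored per node device, a host could pass a different `master_password` to each device it claims.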
  • The management authority includes, but is not limited to, the host arbitrating resource competition between node devices, handling abnormal error reports from node devices, and managing and configuring the basic characteristics of node devices (such as the maximum supported packet length).
  • A mechanism for separating the management of node devices from their use is also defined. As mentioned above, a node device can be managed by only one host but can be accessed and used by multiple hosts (for example, for its data resources and computing resources), which is not specifically limited in this embodiment of the present application.
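The separation of management from use can be sketched as follows. This is an illustrative model only: the class `NodeDevice`, its methods, the default packet length of 256, and the stored payload are assumptions, not details from the application.

```python
class NodeDevice:
    """Hypothetical node device: usable by any host, configurable by one."""

    def __init__(self, manager_number: int, manager_password: str) -> None:
        self.manager_number = manager_number
        self.manager_password = manager_password
        self.max_packet_length = 256          # hypothetical default value
        self.data = {"block0": b"payload"}    # hypothetical data resource

    def read(self, host_number: int, key: str) -> bytes:
        # Use: any host may access the data resources.
        return self.data[key]

    def configure_max_packet_length(self, host_number: int,
                                    host_password: str, value: int) -> bool:
        # Management: only the management host may change basic settings.
        if (host_number, host_password) != (self.manager_number,
                                            self.manager_password):
            return False
        self.max_packet_length = value
        return True


device = NodeDevice(manager_number=1, manager_password="pw-for-S1")
read_by_other_host = device.read(host_number=2, key="block0")
config_by_other_host = device.configure_max_packet_length(2, "guess", 512)
config_by_manager = device.configure_max_packet_length(1, "pw-for-S1", 512)
print(read_by_other_host, config_by_other_host, config_by_manager)
```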
  • The first master node is further configured to send a second configuration message to the first slave node to cancel its management authority over the first slave node; the second configuration message carries the master node password and the master node number of the first master node. The first slave node is also used to receive the second configuration message and, if the master node password and master node number carried in the second configuration message are consistent with the master node password and master node number stored in the management master node information register, set the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.
  • After the host obtains management authority over a certain node device, it can also send the second configuration message to reset the signature bit in the management host information register of the node device (that is, the management master node information register) to 0, thereby canceling its management authority over the node device. This improves the flexibility of the device management mechanism and meets the actual needs of users.
  • When the node device receives the second configuration message, it needs to verify the host password and host number carried in the message, and it allows only its management host to cancel the management authority. In other words, the embodiment of the present application also defines a verification mechanism for the management host of the node device, which ensures that the management host cannot be counterfeited and further ensures clear and reliable device management.
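The verified release step above can be sketched like this. The class and method names are hypothetical, and plain equality stands in for whatever comparison a real device would perform; whether the stored password and number are cleared on release is not stated in the text, so this sketch leaves them in place.

```python
from typing import Optional


class SlaveNode:
    """Hypothetical slave node with a claim and a verified release step."""

    def __init__(self) -> None:
        self.signature = 0
        self.master_number: Optional[int] = None
        self.master_password: Optional[str] = None

    def on_first_configuration(self, number: int, password: str) -> bool:
        if self.signature == 1:
            return False
        self.signature = 1
        self.master_number, self.master_password = number, password
        return True

    def on_second_configuration(self, number: int, password: str) -> bool:
        """Release management only if number and password both match."""
        if self.signature == 0:
            return False  # nothing to release
        if (number, password) != (self.master_number, self.master_password):
            return False  # a counterfeit manager is rejected
        self.signature = 0
        return True


slave = SlaveNode()
slave.on_first_configuration(1, "pw-for-S1")
forged_release = slave.on_second_configuration(2, "pw-for-S1")
wrong_password = slave.on_second_configuration(1, "guess")
valid_release = slave.on_second_configuration(1, "pw-for-S1")
reclaimed = slave.on_first_configuration(2, "new-pw")
print(forged_release, wrong_password, valid_release, reclaimed)
```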
  • The first master node is further configured to, after obtaining management authority over the first slave node, send an in-position message to the first slave node in response to a query message sent by the first slave node, or send the in-position message to the first slave node at a first time interval.
  • A presence confirmation process for the management host is also defined to ensure the robustness of the bus system.
  • After the host obtains management authority over the node device, it can send an in-position message to the node device in response to a query message from the node device, or actively send an in-position message to the node device at a certain time interval, thereby notifying the node device that its management host is currently operating normally and ensuring the real-time presence of the management host.
  • The frequency of this presence confirmation interaction between the host and the node device can be very low, so its impact on the bus bandwidth is almost negligible.
  • The first slave node is further configured to set the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the management authority of the first master node over the first slave node. The preset condition includes: after the first master node obtains management authority over the first slave node, the first slave node does not receive the in-position message sent by the first master node within a preset time; or, after the first slave node sends K query messages to the first master node, it receives no in-position message from the first master node. K is an integer greater than or equal to 1.
  • When the node device has not received an in-position message from its management host (for example, the first master node) for a long time, or has sent query messages to its management host many times without getting a response, it considers the management host to be in an abnormal state and cancels its management authority, so that a new management host can be obtained later and the reliable operation of the entire bus system is ensured.
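The two preset conditions can be sketched as a small state machine on the slave side. This is a hedged illustration: the class `PresenceMonitor`, its parameter names, and the use of caller-supplied timestamps (instead of a real clock) are assumptions chosen to keep the example deterministic.

```python
class PresenceMonitor:
    """Hypothetical slave-side check on whether the management host is alive."""

    def __init__(self, claimed_at: float, timeout_s: float,
                 max_queries: int) -> None:
        self.signature = 1              # a host has just obtained management
        self.last_presence = claimed_at
        self.timeout_s = timeout_s      # the preset time
        self.max_queries = max_queries  # K, an integer >= 1
        self.unanswered_queries = 0

    def on_in_position_message(self, now: float) -> None:
        self.last_presence = now
        self.unanswered_queries = 0

    def on_query_sent(self) -> None:
        self.unanswered_queries += 1

    def check(self, now: float) -> int:
        """Clear the signature bit if either preset condition holds."""
        timed_out = (now - self.last_presence) > self.timeout_s
        ignored = self.unanswered_queries >= self.max_queries
        if self.signature == 1 and (timed_out or ignored):
            self.signature = 0
        return self.signature


by_timeout = PresenceMonitor(claimed_at=0.0, timeout_s=10.0, max_queries=3)
by_timeout.on_in_position_message(now=5.0)
still_managed = by_timeout.check(now=12.0)  # 7 s since the last message
after_silence = by_timeout.check(now=16.0)  # 11 s since the last message

by_queries = PresenceMonitor(claimed_at=0.0, timeout_s=10.0, max_queries=3)
for _ in range(3):
    by_queries.on_query_sent()              # K queries, none answered
after_queries = by_queries.check(now=1.0)
print(still_managed, after_silence, after_queries)
```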
  • The first slave node is further configured to send a broadcast message to at least one master node in the system; the broadcast message is used to indicate that the first slave node currently has no management master node. The at least one master node is configured to receive the broadcast message and, based on it, send the first configuration message to the first slave node so as to obtain management authority over the first slave node.
  • When the node device judges that the current management host (such as the first master node) is abnormal and cancels its management authority, it can also send a broadcast message to multiple hosts in the bus system to notify them that they may obtain management authority over the node device. The node device can thus have a new management host, which ensures the reliable operation of the entire bus system.
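The broadcast-and-reclaim step can be sketched as follows. The text does not say how competing claims are ordered, so this sketch assumes that the first configuration message to arrive wins and models arrival order as list order; all class and method names are hypothetical.

```python
from typing import List, Optional


class SlaveNode:
    """Hypothetical slave that announces it has no management master."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.signature = 0
        self.manager: Optional[int] = None

    def on_first_configuration(self, master_number: int) -> bool:
        if self.signature == 1:
            return False
        self.signature, self.manager = 1, master_number
        return True

    def broadcast_no_manager(self, masters: List["MasterNode"]) -> None:
        # Assumption: masters react in list order, so the first one wins.
        for master in masters:
            master.on_broadcast(self)


class MasterNode:
    def __init__(self, number: int) -> None:
        self.number = number
        self.managed: List[str] = []

    def on_broadcast(self, slave: SlaveNode) -> None:
        # React to the broadcast by sending a first configuration message.
        if slave.on_first_configuration(self.number):
            self.managed.append(slave.name)


slave = SlaveNode("S1")
m2, m3 = MasterNode(2), MasterNode(3)
slave.broadcast_no_manager([m2, m3])
print(slave.manager, m2.managed, m3.managed)
```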
  • The multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, wherein any fourth switch is adjacent to any third switch or is connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The second master node is used to send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node; the at least one switch is a switch through which the enumeration message passes from the second master node to the first slave node. Each of the multiple routing paths passes in sequence through one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
  • A node device (such as the first slave node) can also be discovered by multiple different hosts. That is, other hosts (such as the second master node) can also be connected to the node device via multiple switches and send enumeration messages to it multiple times, thereby recording multiple routing paths to the node device. This greatly expands the scope of use of node devices and meets each host's need for a large number of node devices: any host can access node devices under other hosts and can therefore call on more computing resources to perform more complex calculation processing.
  • Although a node device can be used by multiple hosts, it can be managed by only one host. This single-host management mechanism keeps management clear.
  • The multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The second master node is specifically configured to: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node.
  • The N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain; a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch. If some or all of the first slave nodes in the first node domain have not been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state. If all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node can send enumeration messages to the first slave node multiple times through the second port and the first port.
  • The bus system can be divided into multiple node domains.
  • After completing the enumeration and discovery of node devices in its own node domain, the host can go on to enumerate and discover node devices in other node domains, and then record multiple routing paths to those node devices.
  • The enumeration and discovery process of the host starts from the node domain to which it currently belongs. Before the enumeration and discovery of the node devices in the local node domain is completed, the current node domain may have no data stream connection with node devices in any other node domain.
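The gating of the cross-domain link can be sketched as a simple predicate. Each domain is modeled here as a hypothetical mapping from slave name to assigned device number (`None` meaning not yet assigned); the function names and the string states are assumptions for illustration.

```python
from typing import Dict, Optional


def domain_enumerated(device_numbers: Dict[str, Optional[int]]) -> bool:
    """True when every slave in the domain has been given a device number."""
    return all(number is not None for number in device_numbers.values())


def cross_domain_link_state(first_domain: Dict[str, Optional[int]],
                            second_domain: Dict[str, Optional[int]]) -> str:
    """The link between the two cross-domain ports opens only after both
    domains have finished their local enumeration."""
    if domain_enumerated(first_domain) and domain_enumerated(second_domain):
        return "open"
    return "closed"


first_domain = {"S1": 1, "S2": 2}
second_domain = {"S3": 1, "S4": None}  # S4 not enumerated yet
state_before = cross_domain_link_state(first_domain, second_domain)

second_domain["S4"] = 2                # local enumeration completes
state_after = cross_domain_link_state(first_domain, second_domain)
print(state_before, state_after)
```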
  • The second master node is a remote master node connected to the first master node over a network through a switch; the second master node is specifically configured to access, through the network connection, the first slave node in the first node domain, so as to invoke computing resources in the first slave node or read stored data in the first slave node.
  • The first master node is the central processing unit (CPU) in a first terminal; the first master node is also used to invoke, through a network connection, computing resources of at least one second slave node or read stored data in the at least one second slave node; the second slave node is a graphics processing unit (GPU), a solid-state disk, an accelerator, a network card, or a tensor processing unit (TPU) in a second terminal.
  • The first terminal and the second terminal may be smart phones, tablet computers, desktop computers, servers, and the like, which are not specifically limited in this embodiment of the present application.
  • A node device on the bus can be accessed and used by multiple hosts. Further, a node device can be accessed and used not only by nearby hosts connected by wire, but also remotely by other hosts connected wirelessly, so that a host can call remote computing resources and stored data, which better satisfies the host's demand for large computing resources when performing complex computing processing. The network connection greatly enhances the scalability of the entire bus system and further breaks through the limitations of the existing PCIe bus connection.
  • The slave node is any one of a graphics processing unit, a solid-state disk, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), or a switch.
  • The node device may be a graphics processing unit, a solid-state disk, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), a switch, or the like, which is not specifically limited in this embodiment of the present application.
  • NPU: neural-network processor
  • DSP: digital signal processor
  • ISP: image signal processor
  • The various node devices above can be added to the graph structure, so that a host on the bus can access the corresponding node devices according to actual needs and use their computing resources or data resources, thereby meeting increasingly complex and large computing needs.
  • the master node includes one or more central processing units (CPUs).
  • The host may be a computing system with one or more central processing units.
  • The host may also include a main memory, a cache memory (cache), an internal interconnection bus, an input/output (IO) interface, and the like, which are not specifically limited in this embodiment of the present application.
  • The host can use the computing resources of node devices (such as the computing units in a GPU) to perform computing processing, so as to meet increasingly complex and large computing needs.
  • The embodiment of the present application provides a communication method applied to a bus system. The bus system is a graph structure composed of a plurality of master nodes, a plurality of switches, and a plurality of slave nodes connected through the bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches, and N1, N2, and N3 are all positive integers less than or equal to N. The method includes: sending, by the first master node, an enumeration message to the first slave node multiple times and, based on at least one switch, determining a plurality of routing paths between the first master node and the first slave node; the at least one switch is the switch through which each enumeration message passes from the first master node to the first slave node.
  • The method further includes: querying, by the first master node and based on the sent enumeration message, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assigning a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration; and saving, by the first slave node, the device number allocated by the first master node in the routing status register and setting the visible bit to 1, where a visible bit of 1 indicates that the first slave node has been discovered by enumeration.
  • The method further includes: sending, by the first master node, a first configuration message to the first slave node to obtain management authority over the first slave node, where the first configuration message carries the master node password and the master node number of the first master node; receiving, by the first slave node, the first configuration message and, based on it, setting the signature bit in the management master node information register of the first slave node to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes cannot obtain management authority over the first slave node; and saving, by the first slave node, the master node password and the master node number of the first master node in the management master node information register.
  • The method further includes: sending, by the first master node, a second configuration message to the first slave node to cancel its management authority over the first slave node, where the second configuration message carries the master node password and the master node number of the first master node; and receiving, by the first slave node, the second configuration message and, if the master node password and master node number carried in the second configuration message are consistent with the master node password and master node number stored in the management master node information register, setting the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.
  • The method further includes: after obtaining management authority over the first slave node, sending, by the first master node, an in-position message to the first slave node in response to a query message sent by the first slave node; or sending, by the first master node, the in-position message to the first slave node at a first time interval.
  • The method further includes: setting, by the first slave node, the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the management authority of the first master node over the first slave node. The preset condition includes: after the first master node obtains management authority over the first slave node, the first slave node does not receive the in-position message sent by the first master node within a preset time; or, after the first slave node sends K query messages to the first master node, it receives no in-position message from the first master node. K is an integer greater than or equal to 1.
  • The method further includes: sending, by the first slave node, a broadcast message to at least one master node in the system, where the broadcast message is used to indicate that the first slave node currently has no management master node; and receiving, by the at least one master node, the broadcast message and, based on it, sending the first configuration message to the first slave node so as to obtain management authority over the first slave node.
  • The multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, wherein any fourth switch is adjacent to any third switch or is connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The method also includes: sending, by the second master node, enumeration messages to the first slave node multiple times and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node; the at least one switch is a switch through which the enumeration message passes from the second master node to the first slave node. Each of the multiple routing paths passes in sequence through one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
  • The multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. Sending, by the second master node, enumeration messages to the first slave node multiple times and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node includes: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending, by the second master node, enumeration messages to the first slave node multiple times and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node.
  • the N switches further include a first cross-node domain switch belonging to the first node domain, and a second cross-node domain switch belonging to the second node domain; a first port of the first cross-node domain switch is connected to a second port of the second cross-node domain switch; if some or all of the first slave nodes in the first node domain have not been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state; if all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends more data to the first slave node through the second port and the first port An
  • the second master node is a remote master node that is network-connected to the first master node through a switch; the method further includes: using the second master node to access The first slave node in the first node domain is used to invoke computing resources in the first slave node or read stored data in the first slave node.
  • an embodiment of the present application provides a master node, where the master node includes a processor configured to support the master node in performing the corresponding functions in any one of the communication methods provided in the second aspect.
  • the master node may also include a memory, which is used to be coupled with the processor, and stores necessary program instructions and data of the master node.
  • the master node may also include a communication interface for the master node to communicate with other devices or a communication network.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the flow of the communication method described in any one of the above-mentioned second aspects is implemented.
  • an embodiment of the present application provides a computer program, the computer program includes instructions, and when the computer program is executed by a computer, the computer can execute the process of the communication method described in any one of the above-mentioned second aspects.
  • the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the processor is used to call and run instructions from the communication interface, and when the processor executes the instructions, the chip Execute the flow of the communication method described in the second aspect above.
  • the embodiment of the present application provides a chip system
  • the chip system includes the bus system described in any one of the above-mentioned first aspects, and is used to implement the functions involved in the flow of the communication method described in any one of the above-mentioned second aspects.
  • the chip system further includes a memory, and the memory is used for storing necessary program instructions and data of the communication method.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • Figure 1 is a schematic diagram of an interconnection structure based on CXL2.0.
  • Fig. 2 is a schematic structural diagram of a bus system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a multipath bus subsystem provided by an embodiment of the present application.
  • Fig. 4a is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • Fig. 4b is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • Fig. 4c is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • Fig. 4d is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another bus system provided by an embodiment of the present application.
  • Fig. 6 is a schematic diagram of a management host signature flow provided by an embodiment of the present application.
  • FIG. 7 is a schematic flow diagram of a host canceling management authority provided by an embodiment of the present application.
  • Fig. 8a is a schematic diagram of a management host presence confirmation process provided by an embodiment of the present application.
  • Fig. 8b is a schematic diagram of another management host presence confirmation process provided by the embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • Figs. 10a-10c are schematic structural diagrams of a set of multi-path multi-master bus systems provided by an embodiment of the present application.
  • Figs. 11a-11b are schematic diagrams of a set of enumeration processes under multiple paths and multiple hosts provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a communication method provided by an embodiment of the present application.
  • “At least one (item)” means one or more, and “multiple” means two or more.
  • “And/or” describes the association relationship of associated objects and indicates that three relationships are possible; for example, “A and/or B” can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character “/” generally indicates that the associated objects before and after it are in an “or” relationship.
  • “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • For example, at least one item (piece) of a, b or c can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, and c can each be single or multiple.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a processor and a processor may be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems via the signal).
  • Topological structure: the topology of a computer network borrows the method used in topology for studying the relationships between points and lines independently of size and shape. It abstracts the computers and communication equipment in the network into points and the transmission media into lines; the geometry composed of these points and lines is the topology of the computer network.
  • the most important topological structures of computer networks include bus topology, ring topology, tree topology, star topology, hybrid topology and mesh topology.
  • For example, PCI adopts a bus topology, PCIe adopts a tree topology, and so on.
  • hosts in the topology can use algorithms to search the entire network topology and to enumerate and discover all devices connected in the network. Enumeration discovery simply means traversing the devices physically connected under each port, reading the configuration space of each device, assigning it a corresponding device number, and recording the routing path through which the device was enumerated. Finally, the host knows the number of devices in the entire topology, the connection relationship between the devices, and so on. Hence, as mentioned above, if a device number has been allocated to a device, it may indicate that the device has been discovered by enumeration.
  • the topology structure may be a flattened (flat) graph structure, in which there may be multiple paths from a host on the bus (i.e., the master node) to a node device (i.e., the slave node), and at the same time a node device can also be accessed by multiple hosts on the bus, which will not be described in detail here.
  • a switch used to implement bus interconnection and routing, includes multiple ports, and each port may correspond to a physical link to connect a master node, a slave node or other switches in the bus system.
  • multiple switches are specifically used to construct multiple physical links between the master node and the slave nodes by connecting with each other, so as to realize multi-path access from the master node to the slave nodes.
  • some ports of the switch can also have network card functions for remote network connection, and some special ports can be used to control the enumeration and discovery process of the master node in the node domain where it is located. In this way, interference in the enumeration process of multiple master nodes is avoided, etc., and will not be elaborated here.
  • Graph structure: a graph structure is a nonlinear structure more complex than a tree structure.
  • In a tree structure, there is a branched hierarchical relationship between nodes: a node on each layer can be related to only one node in the previous layer, but may be related to multiple nodes in the next layer.
  • In a graph structure, any node can have one or more predecessors and successors, which breaks the strict top-down hierarchical relationship of the tree structure; in other words, any two nodes in the graph structure may be related, that is, the adjacency relationship between nodes can be arbitrary.
  • Figure 1 is a schematic diagram of a CXL2.0-based interconnection structure.
  • the structure may include a CXL switch, a bus manager (fabric manager, FM), multiple hosts and multiple devices. Specifically, it can include host 0, host 1, and type3 device 0, type3 device 1, and type3 device 2.
  • the multiple hosts and devices must participate in the interconnection through the CXL switch, that is, the host cannot be directly connected to the device, but indirectly connected to the device through the CXL switch.
  • the structure is a top-down PCIe tree structure.
  • the CXL switch includes a bus manager endpoint device (fabric manager endpoint, FM EP) connected to the FM, and two virtual CXL switches (virtual CXL switch, VCS), such as VCS-0 and VCS-1; each VCS can also include multiple virtual PCI-to-PCI bridges (virtual pci-to-pci bridge, vPPB): as shown in Figure 1, VCS-0 may include vPPB-01, vPPB-02, and vPPB-03, and VCS-1 may include vPPB-11, vPPB-12, and vPPB-13.
  • the CXL switch also includes multiple physical pci-to-pci bridges (pci-to-pci bridge, PPB), such as PPB-0, PPB-1, and PPB-2.
  • the CXL switch first needs to be initialized by the FM, and the downstream port (downstream port, DP) of the CXL switch is not bound to the virtual CXL switch, but only belongs to the FM.
  • FM can initialize the CXL switch through some manufacturer-defined mechanisms, and bind the vPPB of the CXL switch to a physical PPB in advance.
  • multiple vPPBs can be bound to the same PPB, as shown in Figure 1. Both vPPB-03 and vPPB-12 can be bound to PPB-1.
  • host 0 and host 1 can enumerate the CXL switch and the devices connected behind it (such as type3 device 0, type3 device 1, and type3 device 2) according to the standard PCIe enumeration process, and configure the switch's address window, bus number window, etc. Therefore, from the perspective of host 0 and host 1, each sees a complete, exclusive PCIe tree.
  • the CXL switch must store the routing configurations during these enumeration processes, and the subsequent downstream data flows can be accurately routed to the actual physical downstream ports. For the upstream data flow, similarly, it should also be able to be normally routed to the real destination host. Therefore, the design of the CXL switch must comply with these regulations of the CXL protocol. Compared with the ordinary switch, the CXL switch is more complicated and more expensive.
  • a specially designed CXL switch must be used as an intermediate interconnection device to interconnect hosts and devices.
  • this specially designed CXL switch increases the interconnection delay and, to some extent, limits the topology of the entire interconnection, increasing the complexity and cost of the interconnection design.
  • the actual technical problems to be solved in this application include the following aspects: breaking the limitation of the existing PCIe tree structure from top to bottom, based on the existing conventional switch (switch), to realize the interconnection topology of the graph structure.
  • FIG. 2 is a schematic structural diagram of a bus system provided by an embodiment of the present application.
  • the bus system 10 may include a plurality of master nodes, a plurality of switches and a plurality of slave nodes, specifically may include a master node 100a, a master node 100b and a master node 100c, etc., a switch 300a, a switch 300b and a switch 300c etc., and slave node 200a, slave node 200b, and slave node 200c.
  • the plurality of master nodes, the plurality of switches and the plurality of slave nodes can be connected by a bus (for example, a network on chip, or any other possible bus, such as an AMBA bus) to form a graph structure; in other words, the topology of the computer network composed of the multiple master nodes, multiple switches and multiple slave nodes is a graph structure rather than a conventional tree structure.
  • any master node, switch or slave node can serve as a node in the graph structure, and any two nodes in the graph structure may be related. For example, the master node 100a can be adjacent to the switch 300a and the switch 300b respectively (adjacent meaning directly connected, that is, there is no other device on the physical link between the master node 100a and the switch 300a); the switch 300a, the switch 300b and the switch 300c can be adjacent to each other; the switch 300a can also be adjacent to the slave node 200a and the slave node 200b; and, when the slave node 200a has multiple ports, the slave node 200a can even be adjacent to the master node 100a, the master node 100c, and so on, which is not specifically limited in this embodiment of the present application.
  • considering the popularity of AI, autonomous driving and other application scenarios with ever-higher computing requirements, this embodiment of the application breaks through the limitation of the traditional tree structure in the existing PCIe bus interconnection topology and adopts a graph structure.
  • as a result, the connections between master nodes, switches and slave nodes are more arbitrary and the entire topology is more scalable, so that a large number of proprietary computing devices (such as GPUs and TPUs) can be continuously added to the graph structure of the bus as slave nodes.
  • any master node in the graph structure can access and use any slave node in the graph structure through multiple paths formed by multiple switch connections, so that the master node can make the fullest use of the various computing resources and data resources.
  • the master node 100a, the master node 100b, and the master node 100c may each include one or more CPUs and, optionally, may also include a main memory, a cache memory (such as a cache), an internal interconnection bus, an IO interface, etc., which is not specifically limited in this embodiment of the present application.
  • the master node can also be considered a computing system with the above-mentioned components; the switch 300a, the switch 300b, and the switch 300c can each include multiple ports, and in other possible embodiments a switch can also have corresponding congestion control and quality of service (QoS) functions; the slave node 200a, the slave node 200b, the slave node 200c, etc. can be a general-purpose GPU, a TPU, or some other processor unit (XPU), can also be a storage device such as a solid state drive (SSD), or can be an accelerator with a specific computing function, a smart network card, or even a switch (such as a network switch), which is not specifically limited in this embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a multi-path bus subsystem provided by an embodiment of the present application.
  • the bus system 10 of the above graph structure may include one or more multi-path bus subsystems, and any one of the multi-path bus subsystems may include a master node, a slave node and a switch.
  • the multipath bus subsystem 10a may include a master node 100a (i.e. the first master node), a slave node 200a (i.e. the first slave node) and N switches, where N may be an integer greater than or equal to 1.
  • the N switches may include N1 first switches adjacent to the first master node 100a (such as the first switch 11 and the first switch 12 in FIG. 3), N3 third switches adjacent to the first slave node 200a (such as the third switch 31, the third switch 32, etc. in FIG. 3), and N2 second switches (such as the second switch 21, the second switch 22, the second switch 23, etc. in FIG. 3).
  • any one of the first switches may be adjacent to any one of the third switches, or may be connected through one or more second switches among the N2 second switches.
  • for example, the first switch 11 can be adjacent to the third switch 31; for another example, the first switch 11 can be connected to the third switch 31 through the second switch 21, in which case the second switch 21 is adjacent to the first switch 11 and the third switch 31 respectively; for yet another example, the first switch 11 can be connected to the third switch 31 through the second switch 22 and the second switch 23 in turn, in which case the second switch 22 is adjacent to the first switch 11 and the second switch 23 respectively, and the second switch 23 is also adjacent to the third switch 31, and so on, which is not specifically limited in this embodiment of the present application.
  • N1, N2, and N3 may all be positive integers less than or equal to N.
  • the master node 100a may send enumeration messages to the slave node 200a multiple times and, based on the switches through which each enumeration message passes from the master node 100a to the slave node 200a, determine multiple routing paths between the master node 100a and the slave node 200a. Each routing path may pass in sequence through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches. In this way, multi-path access to a slave node is realized; in other words, a master node can access the same slave node through multiple routing paths.
  • S is a natural number less than or equal to N2, that is, S may be equal to 0, in which case the routing path between the master node 100a and the slave node 200a passes only through first switches and third switches.
  • the routing path between the master node 100a and the slave node 200a may only pass through one or more of the N1 first switches, or only through one or more of the N3 third switches , etc., which are not specifically limited in this embodiment of the present application.
  • connection situations in this application may include but not limited to the following examples.
  • FIG. 4a is a schematic structural diagram of another multipath bus subsystem provided in the present application example.
  • the multipath bus subsystem 10 a may specifically include a master node 100 a, a slave node 200 a, a first switch 11 , a first switch 12 , a second switch 21 , a third switch 31 and a third switch 32 .
  • the first switch 11 and the first switch 12 are adjacent to the master node 100a, the third switch 31 and the third switch 32 are adjacent to the slave node 200a, the first switch 11 and the third switch 32 are connected through the second switch 21, the first switch 11 is also adjacent to the third switch 31, and the third switch 31 is also adjacent to the master node 100a.
  • after sending enumeration messages multiple times from its own ports through the enumeration software, the master node 100a can discover and record multiple routing paths between itself and the slave node 200a; subsequently, the master node 100a can access the slave node 200a through any one of these routing paths and use the computing resources in it or manage its functional configuration, and so on.
  • the multiple routing paths may include: (1) master node 100a → first switch 11 → second switch 21 → third switch 32 → slave node 200a; (2) master node 100a → first switch 11 → third switch 31 → slave node 200a; (3) master node 100a → third switch 31 → slave node 200a, in which the third switch 31 is adjacent to both the master node 100a and the slave node 200a; (4) master node 100a → first switch 12 → third switch 32 → slave node 200a.
  • FIG. 4b is a schematic structural diagram of another multipath bus subsystem provided in this application example.
  • the multipath bus subsystem 10 a may specifically include a master node 100 a, a slave node 200 a, a first switch 11 , a first switch 12 , a third switch 31 and a third switch 32 .
  • the first switch 11 and the first switch 12 are adjacent to the master node 100a, the third switch 31 and the third switch 32 are adjacent to the slave node 200a, the first switch 11 is also adjacent to the first switch 12 and the third switch 31 respectively, and the third switch 32 is also adjacent to the first switch 12 and the third switch 31 respectively.
  • the master node 100a can discover and record multiple routing paths between itself and the slave node 200a after sending enumeration messages multiple times from its own ports through the enumeration software, which may include: (1) master node 100a → first switch 11 → third switch 31 → slave node 200a; (2) master node 100a → first switch 11 → third switch 31 → third switch 32 → slave node 200a; (3) master node 100a → first switch 11 → first switch 12 → third switch 32 → slave node 200a; (4) master node 100a → first switch 11 → first switch 12 → third switch 32 → third switch 31 → slave node 200a; (5) master node 100a → first switch 12 → third switch 32 → slave node 200a; (6) master node 100a → first switch 12 → third switch 32 → third switch 31 → slave node 200a; (7) master node 100a → first switch 12 → first switch 11 → third switch 31 → slave node 200a; (8) master node 100a →
  • FIG. 4c is a schematic structural diagram of another multi-path bus subsystem provided in this application example.
  • the multipath bus subsystem 10 a may specifically include a master node 100 a, a slave node 200 a, a first switch 11 and a first switch 12 .
  • the first switch 11 and the first switch 12 are adjacent to the master node 100 a and the slave node 200 a respectively, and the first switch 11 is also adjacent to the first switch 12 .
  • the master node 100a can discover and record multiple routing paths between itself and the slave node 200a after sending enumeration messages multiple times from its own ports through the enumeration software, which may include: (1) master node 100a → first switch 11 → slave node 200a; (2) master node 100a → first switch 11 → first switch 12 → slave node 200a; (3) master node 100a → first switch 12 → slave node 200a; (4) master node 100a → first switch 12 → first switch 11 → slave node 200a.
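  • The four routing paths listed for FIG. 4c are exactly the loop-free paths from the master node to the slave node in that topology. The following is a minimal sketch of that observation, not the enumeration software itself: the node names and the `simple_paths` helper are illustrative assumptions, and a plain depth-first search stands in for sending enumeration messages.

```python
from typing import Dict, List

# Adjacency of the FIG. 4c example as described in the text.
# Node names are illustrative labels, not identifiers from the application.
FIG_4C: Dict[str, List[str]] = {
    "master_100a": ["switch_11", "switch_12"],
    "switch_11": ["master_100a", "switch_12", "slave_200a"],
    "switch_12": ["master_100a", "switch_11", "slave_200a"],
    "slave_200a": ["switch_11", "switch_12"],
}


def simple_paths(graph: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Return every loop-free path from src to dst (hypothetical helper)."""
    found: List[List[str]] = []
    stack: List[List[str]] = [[src]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for neighbor in graph[node]:
            # Never revisit a node already on this path, so no loops occur.
            if neighbor not in path:
                stack.append(path + [neighbor])
    return found


paths = simple_paths(FIG_4C, "master_100a", "slave_200a")
for p in sorted(paths):
    print(" -> ".join(p))
```

  • Excluding nodes already on the current path is what keeps the search finite in a graph that contains cycles, such as the link between the first switch 11 and the first switch 12.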
  • FIG. 4d is a schematic structural diagram of another multipath bus subsystem provided in the present application example.
  • the multipath bus subsystem 10 a may specifically include a master node 100 a, a slave node 200 a, and a first switch 11 .
  • the first switch 11 is respectively adjacent to the master node 100a and the slave node 200a, and the slave node 200a is also directly adjacent to the master node 100a.
  • the master node 100a can discover and record multiple routing paths between itself and the slave node 200a after sending enumeration messages multiple times from its own ports through the enumeration software, which may include: (1) master node 100a → first switch 11 → slave node 200a; (2) master node 100a → slave node 200a.
  • the "first”, “second” and “third” in the description of the switch in the embodiment of the present application do not refer to a certain switch, but are used to describe the difference between the switch and the master node and the slave node. Connection status.
  • the third switch 31 is respectively adjacent to the master node 100a and the slave node 200a, based on the foregoing discussion, it can also be referred to as the first switch; for another example, as shown in Figure 4c, the first switch 11.
  • the first switch 12 is adjacent to the master node 100a and the slave node 200a respectively, so based on the foregoing discussion, it can also be called the third switch, and so on.
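  • Because the labels describe adjacency rather than identity, they can be derived from the topology. The sketch below illustrates this for the FIG. 4a example under stated assumptions: the node names and the `classify_switches` helper are hypothetical, the link between the first switch 12 and the third switch 32 is inferred from routing path (4) rather than stated in the adjacency description, and a switch adjacent to neither end node is treated as a "second" switch.

```python
from typing import Dict, List, Set

# FIG. 4a adjacency as described in the text. The switch_12 - switch_32 link
# is an assumption inferred from routing path (4); names are illustrative.
FIG_4A: Dict[str, List[str]] = {
    "master_100a": ["switch_11", "switch_12", "switch_31"],
    "switch_11": ["master_100a", "switch_21", "switch_31"],
    "switch_12": ["master_100a", "switch_32"],
    "switch_21": ["switch_11", "switch_32"],
    "switch_31": ["master_100a", "switch_11", "slave_200a"],
    "switch_32": ["switch_12", "switch_21", "slave_200a"],
    "slave_200a": ["switch_31", "switch_32"],
}


def classify_switches(graph: Dict[str, List[str]], master: str, slave: str) -> Dict[str, Set[str]]:
    """Label each switch by its adjacency to the master and slave nodes."""
    roles: Dict[str, Set[str]] = {}
    for node, neighbors in graph.items():
        if node in (master, slave):
            continue  # only switches receive a role
        labels: Set[str] = set()
        if master in neighbors:
            labels.add("first")   # adjacent to the master node
        if slave in neighbors:
            labels.add("third")   # adjacent to the slave node
        if not labels:
            labels.add("second")  # assumption: adjacent to neither end node
        roles[node] = labels
    return roles


roles = classify_switches(FIG_4A, "master_100a", "slave_200a")
for name in sorted(roles):
    print(name, sorted(roles[name]))
```

  • In this sketch the switch labelled 31 carries both the "first" and the "third" label, which matches the remark above that the third switch 31 of FIG. 4a can also be referred to as a first switch.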
  • the ports of the switch are many, based on the graph structure, its adjacency relationship can be relatively arbitrary and complicated.
  • for example, other ports of the second switch 21 may also be adjacent to the slave node 200a and the like, which is not specifically limited in this embodiment of the present application.
  • the master node 100a can query the visible bit (visited bit) in the routing status register of the slave node 200a based on the enumeration message sent to the slave node 200a; if the visible bit is 0, which indicates that the slave node 200a has not yet been discovered by enumeration, the master node 100a can assign a corresponding device number to the slave node 200a.
  • the slave node 200a can save the device number assigned by the master node 100a into the routing status register and set the visible bit to 1, which indicates that the slave node 200a has been discovered by enumeration.
  • when the master node 100a enumerates the slave node 200a via a certain routing path for the first time, it can set the visible bit from 0 to 1 and assign a device number to the slave node. When the master node 100a later sends an enumeration message to the slave node 200a via other routing paths, it can determine from the visible bit being 1 that the slave node 200a has already been discovered, so there is no need to assign a device number again; it only needs to record the new routing path through which this enumeration message was sent. In this way, enumeration under multiple paths is realized by recording the historical discovery state, which avoids repeated enumeration and infinite loops and allows the enumeration to complete efficiently and accurately.
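  • The visible-bit rule can be sketched as follows. This is a simplified model under stated assumptions, not the register layout of the application: the class names, the field names `visited` and `cid`, and the sequential numbering starting at 1 are all illustrative, and the register that physically resides in the slave node is modelled here as an entry in a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RoutingStatusRegister:
    """Hypothetical model of a device's routing status register."""
    visited: int = 0            # the visible (visited) bit
    cid: Optional[int] = None   # device number, unset until first discovery


class Enumerator:
    """Hypothetical stand-in for the enumeration software on a master node."""

    def __init__(self) -> None:
        self.next_cid = 1  # assumption: device numbers are handed out in order
        self.registers: Dict[str, RoutingStatusRegister] = {}
        self.paths: Dict[str, List[Tuple[str, ...]]] = {}

    def enumerate_via(self, device: str, path: List[str]) -> Optional[int]:
        """Process one enumeration message that reached `device` over `path`."""
        reg = self.registers.setdefault(device, RoutingStatusRegister())
        if reg.visited == 0:
            # First discovery: assign a device number and set the bit to 1.
            reg.cid = self.next_cid
            self.next_cid += 1
            reg.visited = 1
        # On every arrival, including repeats, only the path is recorded.
        self.paths.setdefault(device, []).append(tuple(path))
        return reg.cid


enumerator = Enumerator()
first = enumerator.enumerate_via(
    "slave_200a", ["master_100a", "switch_11", "slave_200a"])
second = enumerator.enumerate_via(
    "slave_200a", ["master_100a", "switch_12", "slave_200a"])
print(first, second, len(enumerator.paths["slave_200a"]))
```

  • The second call reaches the same device over a different path, finds the bit already set, and therefore keeps the existing device number while adding one more routing path to the record.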
  • each slave node in the embodiment of the present application has its configuration space like a conventional PCIe device, allowing software to access its configuration space to obtain device-related declaration information and manage the device, and so on.
  • FIG. 5 is a schematic structural diagram of another bus system provided by an embodiment of the present application.
  • the bus system includes a master node 100a, a switch A, a switch B, a device C and a device D, wherein the device C and the device D can be the above-mentioned slave nodes 200a, 200b or 200c, etc., specifically, it can be GPU, TPU, SSD, accelerator, etc.
  • the master node 100a includes two ports, port A1 is connected to switch A, port A2 is connected to switch B, switch A is connected to switch B and device C respectively, and switch B is also connected to device C and device D respectively .
  • the bus system shown in FIG. 5 may include two multipath bus subsystems: one including the master node 100a, switch A, switch B and device C, and the other including the master node 100a, switch A, switch B and device D, which will not be described in detail here.
  • the enumeration discovery process may specifically include the following steps:
  • Step1 the enumeration software running on the master node 100a can start from the port A1 of the master node 100a, and search according to the breadth-first search algorithm of the graph.
  • the enumeration software can obtain whether the physical link corresponding to the current port A1 has been established and whether data transmission can be performed. If the enumeration software determines that the physical link corresponding to port A1 has been established and data transmission can be performed, an enumeration message (also called an enumeration access) can be sent from port A1 to the device directly connected to port A1 over that physical link, which is switch A in FIG. 5.
  • the basic input/output system (BIOS) can report the local bus interface situation to the enumeration software in the master node 100a through a specific interface based on an agreed description, for example, how many local bus ports there are and their corresponding access addresses, as well as other information such as how many ports switch A, switch B, device C and device D in FIG. 5 have.
  • Step2 switch A receives the enumeration message sent by the master node 100a and returns a response to it. Based on the response, the enumeration software can determine that switch A is a legally existing device and is a switch device.
  • Step3 the enumeration software performs the management host signature (ie, the management master node signature) on switch A according to the management host signature process (ie, the management master node signature process).
  • the signature process may include setting the signature bit in the management master node information register (i.e., the management host information register) in the configuration space A of switch A to 1, and writing the master node number of the master node 100a (i.e., the management host number (component identifier, CID) in FIG. 5) and the master node password at the time of signing (i.e., the host key) into the management master node information register, so that the master node 100a obtains the management authority over switch A, that is, the master node 100a is determined to be the management master node of switch A.
  • step3 reference may be made to the description in the following embodiment corresponding to FIG. 6 , and details are not repeated here.
  • Step 4: the enumeration software continues to send an enumeration report message through port A1 to read the routing status register in configuration space A of switch A and check whether the visible bit of that register is 0.
  • Step 5: switch A returns that the visible bit of its routing status register is 0, from which the enumeration software determines that switch A has not yet been discovered by enumeration.
  • The enumeration software can assign switch A the device number cid1. Switch A can write the device number (that is, cid1) directly into the CID value bit field of the routing status register and, at the same time, set the visible bit, originally 0, to 1.
  • Step 6: because the enumeration software knows that switch A is a switch device, it can also read the port status registers of switch A to learn which ports of switch A have established physical links.
  • Step 7: the enumeration software then sends an enumeration report message to device C from a port of switch A whose physical link has been established, and confirms from the response returned by device C that device C is a legally existing device.
  • Step 8: over the path port A1 → switch A → device C, and following the process of steps 3 to 5, the enumeration software performs the management master node signature on device C and assigns device C a device number that has not been assigned to any other device (cid2 in Figure 5).
  • Step 9: over the path port A1 → switch A → switch B, and following the process of steps 7 to 8, the enumeration software completes the enumeration and discovery of switch B, performs the management master node signature on switch B, and assigns it a device number (cid3 in Figure 5).
  • Step 10: because switch B is also a switch device, the enumeration software follows the process of steps 6 to 8 and accesses device C over the path port A1 → switch A → switch B → device C.
  • The enumeration software finds that the visible bit of the routing status register of device C is already set (that is, set to 1) and that the management master node signature of device C is already complete (that is, the signature bit is set to 1). The enumeration software therefore does not assign a device number to device C again, and only records the new routing path between the master node 100a and device C.
  • Step 11: following the process of steps 7 to 8, the enumeration software completes the enumeration and discovery of device D over the path port A1 → switch A → switch B → device D, performs the management master node signature on device D, and assigns device D a device number (cid4 in Figure 5).
  • Step 12: starting from port A2 of the master node 100a, the enumeration software follows the paths port A2 → switch B, port A2 → switch B → device C, port A2 → switch B → device D, port A2 → switch B → switch A, and port A2 → switch B → switch A → device C, and thereby enumerates and discovers the paths to switch B, device C, device D, switch A, and device C respectively.
  • Because the enumeration software has already completed the enumeration and discovery of these devices through port A1, each device number is assigned only once; this pass only records the new routing paths between the master node 100a and switch A, switch B, device C, and device D.
  • Step 13: the master node 100a has now completed the discovery and enumeration of all devices and the topology in the bus system shown in FIG. 5 through the corresponding enumeration software.
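  • The enumeration walk described above can be illustrated with a minimal Python sketch. It is not the enumeration software of this application: the `LINKS` table, the `enumerate_topology` function, and the depth-first traversal order are assumptions chosen so that the result mirrors the Figure 5 example (a device number is assigned on first discovery, later visits only record a new routing path). The signature step and register accesses are omitted.

```python
from collections import defaultdict

# Hypothetical adjacency table for the Figure 5 example; the neighbor order
# is chosen so that devices are discovered in the order given in the text.
LINKS = {
    "A1": ["switch A"],
    "A2": ["switch B"],
    "switch A": ["device C", "switch B"],
    "switch B": ["device C", "device D", "switch A"],
}
SWITCHES = {"switch A", "switch B"}


def enumerate_topology(master_ports):
    cids = {}                  # device -> device number (its visible bit is now 1)
    paths = defaultdict(list)  # device -> every routing path found so far

    def visit(path):
        dev = path[-1]
        if dev not in cids:    # visible bit still 0: first discovery
            cids[dev] = f"cid{len(cids) + 1}"
        paths[dev].append(" -> ".join(path))  # always record the routing path
        if dev in SWITCHES:    # only switches forward to further ports
            for nxt in LINKS[dev]:
                if nxt not in path:           # never loop back along this path
                    visit(path + [nxt])

    for port in master_ports:
        for dev in LINKS[port]:
            visit([port, dev])
    return cids, dict(paths)


cids, paths = enumerate_topology(["A1", "A2"])
for dev, cid in cids.items():
    print(dev, cid, paths[dev])
```

  • In this sketch each device receives exactly one device number, while device C ends up with four recorded routing paths, two through each master port.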
  • When the master node enumerates and discovers a device (which may be a switch or a slave node), it can perform the management host signature on the device to obtain the management authority over it.
  • The management authority includes, but is not limited to, arbitrating resource competition among devices, handling device exceptions, managing basic device characteristics (such as the maximum supported packet length), and managing device functions, for example, whether a certain function (such as a function related to a physical link) is allowed to be used. This is not specifically limited in this embodiment of the present application.
  • FIG. 6 is a schematic diagram of a management host signature flow provided by an embodiment of the present application.
  • the management host signing process may include the following steps.
  • S11: the master node sends a configuration message to the device, trying to obtain the management authority over the device.
  • The master node is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a, or switch A, device C, etc. in FIG. 5.
  • The configuration message (for example, the first configuration message) is intended to set the signature bit in the management master node information register of the configuration space to 1, and it can also carry the master node number and master node password of the master node.
  • A signature bit of 1 indicates that the device has already completed the signature, that is, it already has a management master node, and other master nodes can no longer obtain the management authority over the device. This ensures that the device has a unique management master node and that the entire topology remains clearly manageable. Note that one master node can obtain the management authority over multiple devices, and the master node passwords carried in the configuration messages it sends to different devices can differ.
  • S12: the device receives the configuration message sent by the master node and determines that the configuration message is used to write 1 to the signature bit in its management master node information register.
  • S13: the device checks whether the signature bit in the management master node information register is 0. If yes, S14 is executed; otherwise, S15 is executed.
  • A signature bit of 0 indicates that the device has not yet completed the signature, that is, it has no management master node, and the master nodes in the system have the opportunity to obtain the management authority over the device.
  • S14: the device sets the signature bit in the management master node information register to 1, and saves the master node password and master node number carried in the configuration message to the management master node information register. At this point, the master node has completed the management master node signature on the device and obtained the management authority over it. Optionally, the device can guarantee that the master node password saved at signing cannot be read or modified by other master nodes that fail verification. Optionally, for every subsequent operation on the device, the device first verifies whether the master node password and master node number carried by the operation match those it has saved; if they do not match, the requester is not the management master node of the device and the device may not respond, which ensures the security of device management.
  • S15: the device verifies the master node that sent the configuration message and judges whether it is the management master node of the device. If yes, S16 is executed; otherwise, S17 is executed. If the ID information of the requester (or source) of the current access (that is, the master node number carried in the configuration message) matches the master node number saved in the management master node information register of the device, and the master node password carried in the configuration message matches the master node password saved in that register, the device can determine that the master node sending the configuration message is its management master node.
  • Otherwise, the master node sending the configuration message is not the management master node of the device.
  • S16: the device does not modify the signature bit in the management master node information register, that is, it keeps the signature bit at 1, and returns a configuration success response.
  • S17: the device does not modify the signature bit in the management master node information register, that is, it keeps the signature bit at 1, and returns a configuration failure response.
  • Alternatively, the device may directly discard the configuration message without responding.
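  • The device-side decisions of the signature flow can be summarized in a short Python sketch. It is only an illustration under stated assumptions: `ManagementRegister`, `handle_sign_request`, and the returned strings are hypothetical names, and the text does not state which response the device returns after the first successful signature, so the sketch simply returns `"signed"` in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementRegister:
    """Hypothetical model of the management master node information register."""
    signed: int = 0             # signature bit
    cid: Optional[str] = None   # saved master node number
    key: Optional[str] = None   # saved master node password


def handle_sign_request(reg: ManagementRegister, cid: str, key: str) -> str:
    """Handle a configuration message that tries to set the signature bit to 1."""
    if reg.signed == 0:
        # No management master node yet: the first signer wins (S14).
        reg.signed, reg.cid, reg.key = 1, cid, key
        return "signed"
    if (reg.cid, reg.key) == (cid, key):
        # Repeated request from the current management master node (S16).
        return "success"
    # Any other master node is rejected (S17); a device may also drop silently.
    return "failure"


reg = ManagementRegister()
print(handle_sign_request(reg, "100a", "key-a"))  # first signer
print(handle_sign_request(reg, "100b", "key-b"))  # a competing master node
print(handle_sign_request(reg, "100a", "key-a"))  # the manager asks again
```

  • Keeping the number and password comparison in one place makes it easy to reuse the same check for every later management operation.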
  • The embodiment of this application defines a mechanism in which whichever master node signs successfully first becomes the management master node. Thus, when the system includes multiple master nodes (such as master node 100a, master node 100b, and master node 100c shown in Figure 2), the management of the device is clearer.
  • The device can ensure that the master node password configured when the management master node signs cannot be read or modified by non-management master nodes, and a device that has completed the management master node signature verifies the management master node on every subsequent management operation, which ensures the security of device management to a great extent.
  • After the master node obtains the management authority over the device, it can actively cancel that authority.
  • For example, when the master node receives a request from another master node to obtain the management authority over the device, or when the master node detects that it has an abnormal fault, the master node can actively cancel its management authority over the device so that the device can later have a new, normally working management master node.
  • FIG. 7 is a schematic flowchart of a host canceling management rights provided by an embodiment of the present application. As shown in FIG. 7 , the process of canceling the management authority may include the following steps.
  • S21: the master node sends a configuration message to the device, trying to cancel its management authority over the device.
  • The master node is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a, or switch A, device C, etc. in FIG. 5.
  • The configuration message (for example, the second configuration message) is intended to set the signature bit in the management master node information register of the configuration space to 0, and it can also carry the master node number and master node password of the master node.
  • S22: the device receives the configuration message sent by the master node and determines that the configuration message is used to write 0 to the signature bit in its management master node information register.
  • S23: the device verifies the master node that sent the configuration message and judges whether it is the management master node of the device. If yes, S24 is executed; otherwise, S25 is executed.
  • S24: the device sets the signature bit in the management master node information register to 0 and returns a configuration success response. It should be understood that if the master node is the management master node of the device, the device already has a management master node, and the signature bit in its management master node information register is generally 1.
  • S25: the device does not modify the signature bit in the management master node information register and returns a configuration failure response.
  • Alternatively, the device may directly discard the configuration message without responding. It should be understood that if the master node is not the management master node of the device, the signature bit in the management master node information register of the device may be 1 or 0.
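  • A minimal Python sketch of the cancellation flow follows. The dictionary layout and the name `handle_release_request` are hypothetical, and requiring the signature bit to be 1 before a requester counts as the management master node is an assumption of this sketch rather than something the text states.

```python
def handle_release_request(reg: dict, cid: str, key: str) -> str:
    """Handle a configuration message that tries to set the signature bit to 0."""
    is_manager = (
        reg["signed"] == 1                       # assumption: must currently be signed
        and (reg["cid"], reg["key"]) == (cid, key)
    )
    if is_manager:
        reg["signed"] = 0   # S24: the device is free for a new management master node
        return "success"
    return "failure"        # S25: the signature bit is left unchanged


reg = {"signed": 1, "cid": "100a", "key": "key-a"}
print(handle_release_request(reg, "100b", "key-b"))  # not the manager
print(handle_release_request(reg, "100a", "key-a"))  # the manager releases
print(reg["signed"])
```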
  • The embodiment of this application also defines a presence confirmation mechanism for the management master node.
  • FIG. 8a is a schematic diagram of a management host presence confirmation process provided by an embodiment of the present application.
  • The management host presence confirmation process can be applied to the system structures shown in Fig. 2 to Fig. 5. The master node involved is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a (that is, the first slave node), or switch A, switch B, device C, or device D in FIG. 5.
  • A timer 1 can be configured inside the device, and timer 1 can start counting.
  • The device can send a query message to its management master node to try to obtain the presence status of that node; that is, the device may actively send query messages to its management master node at a preset time interval (or at a preset frequency), and correspondingly the management master node receives the query message.
  • Timer 2 inside the device can start counting. If the value of timer 2 reaches Y (for example, 300 ms, 1 s, 5 s, or 7 s) and the device still has not received the presence message that its management master node sends in response to the query message, a counter in the device records one timeout. As shown in Figure 8a, when the number of timeouts reaches K (K is an integer greater than or equal to 1, such as 3, 5, or 7), that is, when the device has sent K query messages to its management master node without receiving a presence message, the device can consider its management master node abnormal and not present. The device can then reset the signature bit in its management master node information register to 0 and, further, clear the master node number and master node password saved in that register, thereby canceling the management authority of its original management master node and returning to the state of having no management master node.
  • If the device does receive the presence message, it can clear the current values of timer 2 and the counter, that is, timer 2 and the counter are cleared (or reset).
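  • The polling variant of the presence check can be sketched in Python as follows. The function name `presence_poll`, the register dictionary, and the representation of each query as a boolean (whether a presence message arrived before timer 2 reached Y) are assumptions made for illustration; real timers are not modeled.

```python
def presence_poll(replies, k):
    """Simulate device-side polling of the management master node.

    replies[i] is True when a presence message arrived within Y after query i.
    Returns the management register after processing all queries.
    """
    reg = {"signed": 1, "cid": "100a", "key": "key-a"}
    timeouts = 0
    for answered in replies:
        if answered:
            timeouts = 0        # presence confirmed: reset timer 2 and the counter
            continue
        timeouts += 1           # timer 2 reached Y without a presence message
        if timeouts >= k:       # K timeouts in a row: manager treated as absent
            reg.update(signed=0, cid=None, key=None)
            break
    return reg


print(presence_poll([False, True, False, False], k=3))
print(presence_poll([True, False, False, False], k=3))
```

  • Because the counter is cleared whenever a presence message arrives, only K consecutive timeouts release the device in this sketch.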
  • FIG. 8b is a schematic diagram of another management host presence confirmation process provided by an embodiment of the present application.
  • The management host presence confirmation process can be applied to the system structures shown in Fig. 2 to Fig. 5. The master node involved is, for example, the above-mentioned master node 100a, and the device is, for example, the above-mentioned slave node 200a (that is, the first slave node), or switch A, switch B, device C, or device D in FIG. 5.
  • a timer 3 may be configured inside the master node, and when the master node obtains the management authority of the device, the timer 3 may start timing.
  • When the value of timer 3 reaches W (such as 200 ms, 3 s, or 10 s), the master node can send a presence message to the device it manages; that is, the management master node can actively send presence messages to the devices it manages at a preset time interval (for example, the first time interval).
  • The device can receive the presence message sent by the master node and judge whether the message comes from its management master node. If so, the device can determine that its management master node is normally present. If not, the device can discard the message without responding.
  • A corresponding timer can also be configured in the device, and this timer can also start counting when the master node obtains the management authority over the device.
  • When the timer in the device reaches a preset time (for example, W) and the device still has not received a presence message from its management master node, the device can consider its management master node abnormal and not present. The device can then reset the signature bit in its management master node information register to 0 and, further, clear the master node number and master node password stored in that register, thereby canceling the management authority of its original management master node.
  • If the device receives the presence message sent by its management master node during this period, it can determine that the management master node is normally present and, at the same time, clear the locally maintained timer.
  • The device can also be configured with a corresponding timer and counter.
  • The timer can start counting when the master node obtains the management authority over the device.
  • If the timer in the device reaches the preset time (for example, W) and the device still has not received the presence message actively sent by its management master node, a counter in the device records one timeout.
  • When the number of timeouts reaches a preset number, the device can consider its management master node abnormal and not present. The device can then reset the signature bit in its management master node information register to 0 and, further, clear the master node number and master node password saved in that register, thereby canceling the management authority of its original management master node.
  • If the device successfully receives the presence message actively sent by its management master node during this period, it can determine that the management master node is normally present and, at the same time, reset the locally maintained timer and counter.
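  • The timer-and-counter variant, in which the management master node sends presence messages on its own initiative, can be sketched as a simple watchdog. The function `watchdog`, its integer tick model, and the parameter names are assumptions for illustration only; the sketch restarts the timer after each timeout, which the text does not spell out.

```python
def watchdog(heartbeat_times, w, k, horizon):
    """Return the tick at which the device declares its manager absent, or None.

    heartbeat_times: ticks at which a presence message from the manager arrives.
    w: timeout window of the device-side timer, in ticks.
    k: number of timeouts after which the manager is considered absent.
    """
    beats = set(heartbeat_times)
    timer = misses = 0
    for now in range(1, horizon + 1):
        timer += 1
        if now in beats:
            timer = misses = 0   # presence confirmed: reset timer and counter
        elif timer == w:
            misses += 1          # one more timeout recorded by the counter
            timer = 0            # assumption: the timer restarts after a timeout
            if misses == k:
                return now       # signature bit would be reset to 0 here
    return None


print(watchdog([2, 4, 6], w=3, k=2, horizon=20))          # manager stops after tick 6
print(watchdog(range(2, 21, 2), w=3, k=2, horizon=20))    # manager stays present
```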
  • the values of X, Y, K, and W mentioned above can all be configured by the management master node, and more appropriate values can be selected according to actual needs.
  • When the device does not have a normally present management master node, the device can also send a broadcast message to at least one master node in the system (such as the master node 100a, the master node 100b, and the master node 100c in the bus system 10 shown in FIG. 2).
  • the broadcast message may be used to indicate that the device currently does not have a master management node.
  • the at least one master node may also include its original management master node.
  • The at least one master node receives the broadcast message and, based on it, can perform the management master node signature on the device, trying to obtain the management authority over the device and thereby become the new management master node of the device.
  • Among the at least one master node, the one that completes the signature first obtains the management authority over the device.
  • For the specific signature process, refer to the description of the embodiment corresponding to FIG. 6 above; details are not repeated here.
  • FIG. 9 is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
  • the multipath bus subsystem 10a may also include a master node 100b (ie, a second master node), and N4 fourth switches adjacent to the master node 100b (for example, the fourth switch 41 in FIG. 9 , the fourth switch 42, etc.).
  • any fourth switch may be adjacent to any third switch, or connected through one or more second switches; wherein, N4 is a positive integer less than or equal to N.
  • the fourth switch 41 may be adjacent to the third switch 31; for another example, the fourth switch 41 may be connected to the third switch 31 through the second switch 22, and so on, which is not specifically limited in this embodiment of the present application.
  • The master node 100b may also send multiple report messages to the slave node 200a, and determine multiple routing paths between the master node 100b and the slave node 200a based on the switches that each report message passes through on its way from the master node 100b to the slave node 200a.
  • each routing path may pass through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches in sequence.
  • the routing path between the master node 100b and the slave node 200a may only pass through the fourth switch and the third switch, and in some possible embodiments, may also only pass through one or more of the N4 fourth switches, Or only through one or more of the N3 third switches, etc., which is not specifically limited in this embodiment of the present application.
  • the master node 100a and the slave node 200a may belong to a first node domain, and the master node 100b may belong to a second node domain.
  • the second node domain further includes one or more second slave nodes (such as the slave node 200b in FIG. 2 , etc.).
  • The first node domain may also include multiple switches connected to the master node 100a (such as the N1 first switches), and the second node domain may also include multiple switches connected to the master node 100b (for example, the N4 fourth switches), which is not specifically limited in this embodiment of the present application.
  • Different node domains may belong to different sub-networks, or the system management software of different node domains may be different (for example, the enumeration software mentioned above is different, the operating system (OS) is different, etc.).
  • After the master node 100b has assigned device numbers to the devices in the second node domain to which it belongs (which may include slave nodes and switches), that is, after the master node 100b has completed enumeration in its own node domain, it sends enumeration report messages to devices in other node domains (for example, the slave node 200a), thereby completing multi-master enumeration and discovery across node domains.
  • The enumeration and discovery process of a master node starts from the node domain to which it currently belongs. Until the enumeration and discovery of the devices in that domain is complete, the current node domain may make no data flow connection with any device in other node domains. In this way, device enumeration and discovery by multiple master nodes across node domains (that is, across networks or under different system management software) can be supported, and conflicts are avoided in which different enumeration software assigns multiple device numbers to the same device or assigns the same device number to different devices.
  • the aforementioned device cancels the management authority of the original management master node (such as the master node 100a), so as to at least one master node in the system
  • FIG. 10a-FIG. 10c are schematic structural diagrams of a group of multi-path multi-master bus systems provided by an embodiment of the present application.
  • The bus system may include multiple hosts (such as master node 100a and master node 100b), multiple switches (such as switch 1, switch 2, switch 3, etc. in Figure 10a), and multiple slave nodes (for example, the accelerators, SSDs, smart network cards, GPUs, and TPUs in Figure 10a).
  • the master node 100a may include a port A1 and a port A2
  • the master node 100b may include a port B1 and a port B2.
  • the embodiment of the present application implements a flattened graph structure, and multiple switches can be arranged in a matrix and connected vertically and horizontally through the bus.
  • this structure can support multi-path and multi-master access to any slave node.
  • the content involved in the access can include, for example, initial enumeration discovery, subsequent resource usage, and function management.
  • The master node 100a, the switch 1, switch 2, switch 3, and switch 4 connected under the master node 100a, and the XPU, accelerator, SSD, and smart network card connected to switch 1, switch 3, and switch 4 respectively may belong to the first node domain. The master node 100b, the switch 5, switch 6, switch 7, and switch 8 connected under the master node 100b, and the TPU, SSD, GPU, and accelerator connected to switch 6, switch 7, and switch 8 respectively may belong to the second node domain.
  • To ensure that the enumeration processes of multiple master nodes do not interfere with each other, and that a master node makes no data flow connection with any device in another node domain before it completes enumeration and discovery within its own node domain (that is, it does not enumerate and discover devices in other node domains, and devices in its own node domain are not discovered by master nodes in other node domains), the embodiment of this application, as shown in Figure 10a, also provides a switch with a special port, which is used for adjacency with a switch in another node domain that also has a special port, for example, switch 2, switch 4, switch 5, and switch 7.
  • The special ports are, for example, the ports marked in black in switch 2, switch 4, switch 5, and switch 7.
  • When the system enumeration software scans a port of this type, it does not continue enumeration and discovery through that port.
  • For example, when the master node 100a enumerates from port A2 to the special port marked in black in switch 2, it does not enumerate and discover the device connected under that port (that is, switch 5), but continues to enumerate and discover devices in the first node domain through other ports. For another example, when the master node 100b enumerates from port B1 to the special port marked in black in switch 5, it does not enumerate and discover the device connected under that port (that is, switch 2), but continues to enumerate and discover devices in the second node domain through other ports.
  • If the special ports of switch 2 and switch 5 are "opened", the master node 100a and the master node 100b can perform cross-node-domain enumeration and discovery through these special ports.
  • FIG. 11a-FIG. 11b are schematic diagrams of an enumeration process under a group of multi-path multi-hosts provided by the embodiment of the present application.
  • the master node 100a and the master node 100b have different operating systems, namely OS 1 and OS 2, respectively.
  • the master node 100a includes CPU-A0 and CPU-A1, and the corresponding port A1, port A2, port A3 and port A4;
  • the master node 100b includes CPU-B0 and CPU-B1, and the corresponding port B1, port B2, port B3 and port B4.
  • Both the master node 100a and the master node 100b have performed part of the enumeration, and some devices (such as the GPUs and SSDs in the first node domain, and the accelerators in the second node domain) have not yet been allocated device IDs.
  • The first node domain, where the master node 100a is located, includes a switch E with device number cid4, whose port A is a special cross-node-domain port.
  • After the enumeration software (such as OS 1) of the master node 100a discovers port A of switch E through enumeration, it does not discover and enumerate the topology behind port A, because enumeration through the other ports that have established physical links and are not cross-node-domain ports has not yet been completed.
  • The second node domain, where the master node 100b is located, includes a switch F with device number cid2, whose port B is a special cross-node-domain port.
  • After the enumeration software (such as OS 2) of the master node 100b discovers port B of switch F through enumeration, it does not discover and enumerate the topology behind port B, because enumeration through the other ports that have established physical links and are not cross-node-domain ports has not yet been completed.
  • The master node 100a and the master node 100b have each completed enumeration and discovery of the bus topology and all devices in their respective node domains, and all devices have been assigned device numbers. Therefore, the system software of the master node 100a and the master node 100b can respectively turn on the data flow switch of special port A of switch E and special port B of switch F, so as to enumerate and discover devices in node domains other than their own.
  • The devices in each node domain have completed the management master node signature, so the master node in each node domain has obtained the management master node authority over the devices in that node domain. For example, the master node 100a is the management master node of the TPUs, GPUs, and SSDs in the first node domain, and the master node 100b is the management master node of the GPUs and accelerators in the second node domain. Therefore, the master node 100a has only data-plane usage rights over the devices in the second node domain and does not have management master node authority over them on the management plane. Correspondingly, the master node 100b has only data-plane usage rights over the devices in the first node domain and does not have management master node authority over them on the management plane.
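  • The split between data-plane usage rights and management-plane authority can be expressed as a small permission check. This Python sketch is illustrative only: the `MANAGERS` table, the device labels, and the function `is_allowed` are hypothetical, and treating every data-plane access as allowed once cross-domain enumeration is complete is a simplification.

```python
# Hypothetical record of which master node signed each device.
MANAGERS = {
    "SSD (first domain)": "100a",
    "GPU (second domain)": "100b",
}


def is_allowed(master: str, device: str, plane: str) -> bool:
    """Check whether a master node may access a device on the given plane."""
    if plane == "data":
        return True                        # simplification: usage is open to all masters
    if plane == "management":
        return MANAGERS[device] == master  # only the management master node
    raise ValueError(f"unknown plane: {plane}")


print(is_allowed("100a", "GPU (second domain)", "data"))
print(is_allowed("100a", "GPU (second domain)", "management"))
print(is_allowed("100b", "GPU (second domain)", "management"))
```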
  • The master node 100a can also obtain the enumeration results for the devices in the second node domain directly: the system software (such as OS 2) in the second node domain reports them to the system software (such as OS 1) on the master node 100a based on an agreed description structure and interface, so that the system software on the master node 100a can directly obtain the topology information of all devices in the node domain where the master node 100b is located.
  • the master node 100b discovers and enumerates the devices under the first node domain where the master node 100a is located in the same way, and details are not repeated here.
  • When the enumeration of a master node reaches such a special port, the master node can independently stop discovering and enumerating the devices and topology behind that port, and no longer continues the breadth-first or depth-first enumeration search from that port. Only after the system software of each master node has completed enumeration and discovery within its own node domain does it turn on the data path switch of the special port in its own domain and enumerate across master nodes and across node domains.
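  • The two-phase behavior at special ports can be sketched in Python. The reduced topology below (two masters, switch 2 and switch 5 joined by a special port, one slave node on each side) is a hypothetical simplification of Figure 10a, and `LINKS`, `SPECIAL`, and `discover` are illustrative names, not part of this application.

```python
# Hypothetical reduced topology: each master reaches one switch and one slave node,
# and the two switches are joined through their special cross-domain ports.
LINKS = {
    "100a": ["switch 2"],
    "switch 2": ["SSD", "switch 5"],
    "100b": ["switch 5"],
    "switch 5": ["GPU", "switch 2"],
}
SPECIAL = {("switch 2", "switch 5"), ("switch 5", "switch 2")}


def discover(master, special_open):
    """Return devices found from `master`; special ports are skipped unless opened."""
    found, seen, stack = [], {master}, [master]
    while stack:
        node = stack.pop()
        for nxt in LINKS.get(node, []):
            if (node, nxt) in SPECIAL and not special_open:
                continue                   # stop enumerating behind the special port
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                stack.append(nxt)
    return found


print(discover("100a", special_open=False))  # phase 1: own node domain only
print(discover("100a", special_open=True))   # phase 2: special port opened
```

  • Running phase 1 for both masters before opening the special ports keeps each domain's device numbering independent, which is the ordering the text describes.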
  • The bus system may also include a master node 100a, a master node 100b, a plurality of switches, and a plurality of slave nodes; details are not repeated here.
  • Switch 2 under the master node 100a and switch 5 under the master node 100b each include a port capable of accessing the Internet, or a port with a network card function (such as the ports marked in gray in Figure 10b), so that the master node 100a and the master node 100b can be connected to the network through switch 2 and switch 5; that is, the master node 100b can be a remote device of the master node 100a.
  • In this way, a device can be accessed and used not only by the master node connected to it by wire at the near end, but also by other master nodes connected remotely and wirelessly, so that a host can call on remote computing resources and store data, which better meets the host's demand for large amounts of computing resources when performing complex computing, and so on.
  • The bus system may include a master node 100a, a master node 100b, a master node 100c, a plurality of switches, a plurality of slave nodes, and so on, where the bus system may include switches with special ports or switches used for network connections.
  • the bus system can be a typical graph-based bus system provided by the embodiment of the present application, which can achieve greater scalability, larger capacity, and larger design space for bus interconnection to a great extent.
  • the special ports shown in FIGS. 10a-10c and ports with network card functions can be directly indicated by the relevant function register group in the configuration space of the switch device (that is, the switch), so as to inform the system enumeration software.
  • FIG. 10a to FIG. 10c are only exemplary illustrations and do not specifically limit the multipath, multi-master-node bus system in the embodiments of the present application.
  • the embodiment of the present application provides a bus system based on the concept of a graph structure, which breaks the limitation of the top-down tree structure of the existing PCIe bus interconnection topology and realizes an interconnection topology with a flattened graph structure.
  • additional special bus interconnection components/devices (such as the CXL switch and type 3 device in Figure 1 above)
  • the complicated bus topology discovery and enumeration process so that the interconnection bus protocol can be natively and at low cost.
  • the embodiment of the present application further provides a management host signature process, a management host presence confirmation process, a multi-host enumeration process across node domains, and so on, so that, while multi-host and multipath are realized, the clear manageability, security, and reliability of the entire multi-host, multipath bus system (that is, the entire bus topology with a graph structure) are further guaranteed.
  • the embodiment of the present application aims to break, based on the concept of a graph structure, the tree structure with a strict top-down hierarchy in the prior art, so as to realize the above-mentioned multi-host and multipathing; the relationships between the nodes and the specific connections between them are not specifically limited.
  • not all nodes need to be adjacent to a switch; a node may be directly adjacent to multiple hosts, so that it is more convenient to realize multi-host and multipathing, and so on.
  • the technical solution provided by the embodiment of the present application can also be applied to ultra-large-scale network interconnection such as the Internet, and to interconnection and device management among large-scale servers or computing nodes in data centers, which is not specifically limited in this embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a communication method provided by an embodiment of the present application.
  • the communication method can be applied to a bus system (for example, the bus system shown in Fig. 2, Fig. 5, or Fig. 10a-Fig. 10c), and the bus system can include multiple master nodes, multiple switches, and multiple slave nodes; the multiple master nodes, multiple switches, and multiple slave nodes can form a graph structure through the bus.
  • any multipath bus subsystem in the graph structure (such as the multipath bus subsystem 10a shown in Fig. 3, Fig. 4a-Fig. 4d, etc.) may include a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; any one of the first switches is adjacent to any one of the third switches, or is connected to it through one or more second switches; N1, N2, and N3 are all positive integers less than or equal to N.
  • the communication method may include the following steps S401 and S402.
  • Step S401: send enumeration messages to the first node device multiple times through the first host.
  • Step S402: based on at least one switch, determine a plurality of routing paths between the first host and the first node device; the at least one switch is the switch or switches through which each enumeration message passes from the first host to the first node device; each routing path passes in sequence through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
  • the multipath bus subsystem further includes a second master node; the N switches further include N4 fourth switches adjacent to the second master node, wherein any fourth switch is adjacent to any third switch, or is connected to it through one or more second switches; N4 is a positive integer less than or equal to N; the method also includes: sending enumeration messages to the first slave node multiple times through the second master node and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node; the at least one switch is the switch or switches through which each enumeration message passes from the second master node to the first slave node; each of the multiple routing paths passes in sequence through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
  • the multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes; sending enumeration messages to the first slave node multiple times through the second master node and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node includes: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending enumeration messages to the first slave node multiple times through the second master node and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node.
  • the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain; the first port of the first cross-node-domain switch is connected to the second port of the second cross-node-domain switch; if some or all of the first slave nodes in the first node domain have not been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state; if all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends enumeration messages to the first slave node multiple times through the second port and the first port.
  • the second master node is a remote master node that is network-connected to the first master node through a switch; the method further includes: accessing, through the second master node, the first slave node in the first node domain, so as to invoke computing resources in the first slave node or read stored data in the first slave node.
  • each method procedure in the communication method described in the embodiments of the present application may be implemented in software, in hardware, or in a combination of the two.
  • a hardware implementation may include a logic circuit, an arithmetic circuit, an analog circuit, and so on.
  • a software implementation may include program instructions, which may be regarded as a software product, which is stored in a memory and can be executed by a processor to implement related functions.
  • An embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium can store a program, and when the program is executed by a processor, the processor can execute some or all of the steps of any one of the methods described in the above method embodiments.
  • the embodiment of the present application also provides a computer program; the computer program includes instructions, and when the computer program is executed by a multi-core processor, the processor can perform some or all of the steps described in any one of the above method embodiments.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above units is only a logical function division.
  • there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, or the like, and specifically a processor in the computer device) execute all or part of the steps of the above-mentioned methods in the various embodiments of the present application.
  • the aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, read-only memory (ROM), double data rate synchronous dynamic random access memory (DDR), flash memory (flash), random access memory (RAM), and other media that can store program code.

Abstract

Disclosed in embodiments of the present application are a bus system, a communication method, and a related device. The system is a graph structure composed of multiple master nodes, multiple switches, and multiple slave nodes; any multipath bus subsystem in the graph structure comprises a first master node, a first slave node, and N switches; the N switches comprise N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; any first switch is adjacent to any third switch or connected to any third switch by means of the second switch; the first master node is used for sending an enum message to the first slave node multiple times and determining multiple routing paths between the first master node and the first slave node; and each routing path at least passes through one or more of the N1 first switches, S second switches of N2 second switches, and one or more of the N3 third switches in sequence. By using the embodiments of the present application, multipath access to a device on a bus can be supported.

Description

A bus system, communication method, and related device
This application claims priority to Chinese Patent Application No. 202111083972.8, filed with the China Patent Office on September 14, 2021 and entitled "A bus system, communication method, and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technology, and in particular to a bus system, a communication method, and a related device.
Background
With the rapid development of information technology, data-intensive and compute-intensive application scenarios such as artificial intelligence (AI), autonomous driving computing, and cloud computing are becoming increasingly common. The overall computing system will become more and more complex, and various dedicated computing devices (for example, graphics processing units (GPUs) and tensor processing units (TPUs)) will be widely integrated and applied.
As a result, the requirements on the interconnection bus will inevitably rise, and an interconnection bus with high bandwidth, low latency, low power consumption, and easy implementation becomes increasingly important. Future bus interconnection will be an unordered, point-to-point graph structure: a node on the bus may be reachable over multiple paths (multipathing), a node on the bus may be managed as a device of a central processing unit (CPU) node, and the bus may have multiple hosts (multi-host). Current industry interconnects follow this direction. For example, Compute Express Link (CXL) 2.0 supports the CXL multi-host function by defining special interconnection components (such as a CXL switch) and an interconnection management mechanism, and the future evolution of CXL will place more emphasis on point-to-point computing and multipath node interconnection. From the perspective of industry trends, therefore, support for multi-host and multipath in bus device management is inevitable.
However, Peripheral Component Interconnect Express (PCIe), the most popular interconnection bus in the industry today, has a strictly top-down tree topology and does not natively support multi-host or multipathing. CXL 2.0 requires a special CXL switch and a specific interconnection topology (for example, a type 3 CXL device connected under a CXL switch) to realize multi-host. Later evolutions of CXL (for example, CXL 3.0) may support multipathing, but will certainly also require such a special CXL switch and specific interconnection topology.
Therefore, the prior art mostly relies on special intermediate connection components and specific interconnection topologies to "deceive" multiple hosts, so that each host believes it has exclusive use of the node devices. This does not truly support multi-host and multipath, and it undoubtedly greatly increases the latency, cost, and topology constraints of device interconnection.
Summary
Embodiments of the present application provide a bus system, a communication method, and a related device, which can support multipath access to devices on a bus.
In a first aspect, an embodiment of the present application provides a bus system. The system is a graph structure formed by multiple master nodes, multiple switches, and multiple slave nodes connected through a bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, where any first switch is adjacent to any third switch or is connected to it through one or more second switches; N1, N2, and N3 are all positive integers less than or equal to N. The first master node is configured to send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the first master node and the first slave node; the at least one switch is the switch or switches through which each enumeration message passes from the first master node to the first slave node. Each of the multiple routing paths passes in sequence through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
Conventional PCIe bus interconnection is a tree topology with a strict top-down hierarchy: a node in one layer can be related to only one node in the layer above but possibly to multiple nodes in the layer below, so there is only a single top-down path from the host on the bus to any node device (such as a GPU, a TPU, or a network card). This greatly limits the scalability of the topology and cannot serve the increasingly complex computing systems needed as the volume of computing data grows. The embodiments of the present application break the top-down tree structure of the existing PCIe bus and connect hosts (that is, master nodes), node devices (that is, slave nodes), switches, and so on into a flat graph structure, in which any node (a master node, a slave node such as a GPU or TPU, a switch, and so on) may be related to any other node. In this graph structure, the switches act as intermediate devices between master nodes and slave nodes, and their interconnections provide multiple physical links from one master node (for example, the first master node) to one slave node (for example, the first slave node). While the master node sends enumeration messages to discover all devices in the topology, each enumeration message may reach a slave node over one of these physical links. The master node can record the routing path to that slave node based on which of its own ports the enumeration message left from, which port of which switch it passed through, and so on. Because multiple physical links exist between the master node and the slave node through the switches (for example, the first, second, and third switches), enumeration messages sent by the host may reach the same slave node over different paths, so by sending enumeration messages multiple times the master node can record multiple different routing paths to that slave node. Compared with the tree structure of the prior art, the bus interconnection topology of the embodiments is therefore a graph structure that supports multipath access (multipathing) to node devices; that is, one node device can be reachable over multiple paths. This makes bus interconnection simpler, enlarges the design space, and improves scalability and capacity, so as to meet users' increasingly complex and large computing demands.
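The recording of multiple routing paths can be illustrated with a minimal Python sketch. It is not the method defined by the application: the topology, the node names, and the `routing_paths` helper are hypothetical, paths are recorded as node names rather than ports, and an exhaustive depth-first walk stands in for sending enumeration messages multiple times.

```python
from typing import Dict, List

# Hypothetical topology: sw1a/sw1b are first switches (adjacent to the master),
# sw2 is a second switch, sw3a/sw3b are third switches (adjacent to the slave).
TOPOLOGY: Dict[str, List[str]] = {
    "master": ["sw1a", "sw1b"],
    "sw1a": ["master", "sw3a"],
    "sw1b": ["master", "sw2"],
    "sw2": ["sw1b", "sw3b"],
    "sw3a": ["sw1a", "slave"],
    "sw3b": ["sw2", "slave"],
    "slave": ["sw3a", "sw3b"],
}


def routing_paths(graph: Dict[str, List[str]], src: str, dst: str) -> List[List[str]]:
    """Return every loop-free path from src to dst as a list of node names."""
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        if node == dst:
            paths.append(trail)
            return
        for nxt in graph[node]:
            if nxt not in trail:  # never revisit a node, so no loops
                walk(nxt, trail + [nxt])

    walk(src, [src])
    return paths


paths = routing_paths(TOPOLOGY, "master", "slave")
for p in paths:
    print(" -> ".join(p))
```

In this assumed topology the first path uses no second switch (S = 0) and the second path uses one (S = 1), so the slave is reachable over two distinct routes.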
It should be noted that, in the embodiments of the present application, the master node may be a master processing chip, and the slave node may be a slave processing chip, a memory, a dedicated hardware processing unit, or the like. Specifically, the master node may be a host that includes multiple central processing units, and the slave node may be a computing unit such as a GPU or TPU, or a storage device such as a solid-state drive. The master node can access a slave node over the multiple routing paths determined above and then invoke computing resources or data resources in the slave node to perform a series of computations.
In addition, the master node described in the embodiments may also be called a host, and the slave node may also be called a node device. Correspondingly, the management master node may also be called the management host, the management master node information register may also be called the management host information register, the management master node signature may also be called the management host signature, and so on; this is not explained again below.
In some possible implementations, the first master node is further configured to query, based on the enumeration message sent, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assign a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration. The first slave node is configured to save the device number assigned by the first master node in the routing status register and set the visible bit to 1, where a visible bit of 1 indicates that the first slave node has been discovered by enumeration.
In the embodiments of the present application, as described above, a node device is reachable over multiple paths, so during enumeration the host can reach the same node device over several paths. When the node device is first discovered by the host over the first routing path, the visible bit in its routing status register is set to 1 to indicate that it has been discovered by enumeration. When the host later sends an enumeration message to the node device over another routing path, it determines from the visible bit being 1 that the node device has already been discovered, so it does not assign a device number again and only records the new routing path taken by that enumeration message. Recording the discovery history in this way enables enumeration under multipath, avoids repeated enumeration and infinite loops, and completes enumeration efficiently and accurately.
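The visible-bit check can be sketched as follows. This is only an illustration under stated assumptions: the class and field names (`RoutingStatusRegister`, `Master`, `on_enumeration_arrival`) are hypothetical, and the master-side assignment and slave-side register write are collapsed into one call for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RoutingStatusRegister:
    """Per-slave routing status register (hypothetical field names)."""
    visible: int = 0                      # 0: not yet discovered, 1: discovered
    device_number: Optional[int] = None


@dataclass
class Master:
    next_number: int = 1
    # device number -> every routing path recorded for that device
    routes: Dict[int, List[Tuple[str, ...]]] = field(default_factory=dict)

    def on_enumeration_arrival(self, reg: RoutingStatusRegister,
                               path: Tuple[str, ...]) -> int:
        """Handle one enumeration message that reached a slave over `path`."""
        if reg.visible == 0:
            # First discovery: assign a device number and mark the slave visible.
            reg.device_number = self.next_number
            reg.visible = 1
            self.next_number += 1
        # Whether new or already visible, record the path that was just used.
        self.routes.setdefault(reg.device_number, []).append(path)
        return reg.device_number


master = Master()
gpu = RoutingStatusRegister()
first = master.on_enumeration_arrival(gpu, ("sw1a", "sw3a"))
second = master.on_enumeration_arrival(gpu, ("sw1b", "sw2", "sw3b"))
```

Both arrivals return the same device number, and the second one only adds a route; no second number is consumed.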
In some possible implementations, the first master node is further configured to send a first configuration message to the first slave node to obtain management authority over the first slave node; the first configuration message carries the master node password and master node number of the first master node. The first slave node is further configured to receive the first configuration message and, based on it, set the signature bit in its management master node information register to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes in the system cannot obtain management authority over it. The first slave node is further configured to save the master node password and master node number of the first master node in the management master node information register.
In the embodiments of the present application, the host can send a first configuration message to the node device to set the signature bit in the node device's management host information register to 1 and write its own host password (that is, the master node password) and host number (that is, the master node number) into the node device, thereby obtaining management authority over it. Once the signature bit is 1, that is, once one host has obtained management authority, other hosts cannot obtain management authority over the node device. This ensures that the node device has a single management host (that is, management master node) for an extended period and keeps the device management plane clear. In some possible embodiments, one host can obtain management authority over multiple node devices and can give each node device a different host password; the host password serves as an important credential for later verifying the identity of the management host, which secures the device management plane. The management authority includes, but is not limited to, arbitrating resource contention between node devices, handling abnormal error reports from node devices, and managing and configuring basic characteristics of node devices (such as the maximum supported message length). Some possible embodiments also define a mechanism that separates the management of a node device from its use: as described above, a node device can be managed by only one host but can be accessed and used by multiple hosts (including use of its data resources and computing resources). The embodiments of the present application do not specifically limit this.
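The signature process described above can be sketched in Python as a slave-side register that accepts the first claim and rejects later ones. The names `ManagementHostInfoRegister` and `claim`, and the use of a string password, are assumptions made for illustration; the application does not specify a register layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementHostInfoRegister:
    """Slave-side management host information register (hypothetical layout)."""
    signature: int = 0                   # 1: a management host already exists
    host_number: Optional[int] = None
    host_password: Optional[str] = None

    def claim(self, host_number: int, host_password: str) -> bool:
        """Handle a first configuration message; True if authority is granted."""
        if self.signature == 1:
            return False                 # already managed: reject the request
        self.signature = 1
        self.host_number = host_number
        self.host_password = host_password
        return True


reg = ManagementHostInfoRegister()
granted_a = reg.claim(host_number=1, host_password="pw-for-this-device")
granted_b = reg.claim(host_number=2, host_password="another-password")
```

Only management is exclusive in this sketch; nothing here prevents host 2 from accessing and using the device, which matches the separation of management and use described in the text.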
In some possible implementations, the first master node is further configured to send a second configuration message to the first slave node to relinquish management authority over the first slave node; the second configuration message carries the master node password and master node number of the first master node. The first slave node is further configured to receive the second configuration message and, if the master node password and master node number it carries match those saved in the management master node information register, set the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.
In the embodiments of the present application, after obtaining management authority over a node device, the host can also, according to actual needs (for example, at the request of another host or because of a fault report on the current host), send a second configuration message to the node device to reset the signature bit in its management host information register (that is, the management master node information register) to 0 and thereby relinquish management authority. This makes the device management mechanism more flexible and meets actual user needs. When the node device receives the second configuration message, it must verify the host password and host number carried in the message; only if they match the host password and host number saved when the management host signature was made is the host allowed to relinquish management authority. The embodiments thus also define a verification mechanism for the management host of a node device, which ensures that the management host cannot be impersonated and further ensures clear and reliable device management.
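The verification step on release can be sketched as follows. The register class, the `release` method, and the choice to clear the stored credentials on success are assumptions for illustration; the text only requires that the signature bit be reset to 0 when the password and number match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementHostInfoRegister:
    """Slave-side register, shown here already signed by host 1 (hypothetical)."""
    signature: int = 1
    host_number: Optional[int] = 1
    host_password: Optional[str] = "pw-for-this-device"

    def release(self, host_number: int, host_password: str) -> bool:
        """Handle a second configuration message; True if authority is released."""
        if self.signature != 1:
            return False                 # nothing to release
        if (host_number, host_password) != (self.host_number, self.host_password):
            return False                 # credentials differ from the saved ones
        self.signature = 0
        # Clearing the saved credentials is an assumption of this sketch.
        self.host_number = None
        self.host_password = None
        return True


reg = ManagementHostInfoRegister()
forged = reg.release(host_number=2, host_password="guess")
genuine = reg.release(host_number=1, host_password="pw-for-this-device")
```

A request with the wrong number or password leaves the register untouched, so a host that never signed the device cannot strip its management host.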
在一些可能的实现方式中,所述第一主节点,还用于在获取到对所述第一从节点的管理权限后,响应于所述第一从节点发送的查询消息,向所述第一从节点发送在位消息;或者,按照第一时间间隔向所述第一从节点发送所述在位消息。In some possible implementations, the first master node is further configured to, after obtaining the management authority for the first slave node, respond to a query message sent by the first slave node, A slave node sends an in-position message; or, sends the in-position message to the first slave node according to a first time interval.
This embodiment of the present application also defines a presence confirmation procedure for the management host to ensure the robustness of the bus system. After a host has obtained management authority over a node device, it may send a presence message to the node device in response to a query message from that node device, or proactively send presence messages to the node device at a certain time interval. Either way, the node device is informed that its management host is currently operating normally, which ensures that the management host is present in real time. In general, presence confirmation exchanges between the host and the node device can occur at a very low frequency, so their impact on bus bandwidth is almost negligible.
In some possible implementations, the first slave node is further configured to set the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the first master node's management authority over the first slave node. The preset condition includes: after the first master node has obtained management authority over the first slave node, the first slave node does not receive the presence message from the first master node within a preset time, or the first slave node sends K query messages to the first master node without receiving the presence message from the first master node in response to any of them, where K is an integer greater than or equal to 1.
In this embodiment of the present application, when a node device has not received a presence message from its management host (for example, the first master node) for a long time, or has sent query messages to its management host several times without receiving any response, the node device may consider the management host to be in an abnormal state and cancel its management authority, so that a new management host can be acquired later. This ensures reliable operation of the entire bus system.
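The two preset conditions (no presence message within a preset time, or K unanswered queries) can be modelled as follows. This is a minimal sketch, assuming the slave node tracks time as a plain number passed in by the caller; the class `PresenceMonitor` and its method names are hypothetical.

```python
class PresenceMonitor:
    """Hypothetical slave-side tracker for the management host's presence."""

    def __init__(self, timeout, max_queries):
        self.timeout = timeout          # preset time without a presence message
        self.max_queries = max_queries  # K, an integer >= 1
        self.signature_bit = 1          # authority has already been granted
        self.last_presence = 0.0        # time of the last presence message
        self.unanswered = 0             # consecutive unanswered queries

    def on_presence(self, now):
        """A presence message arrived: refresh the timer and the query counter."""
        self.last_presence = now
        self.unanswered = 0

    def on_query_unanswered(self):
        """A query message was sent and no presence message came back."""
        self.unanswered += 1

    def check(self, now):
        """Clear the signature bit if either preset condition holds.

        Returns True while the management host still holds authority.
        """
        timed_out = now - self.last_presence > self.timeout
        ignored = self.unanswered >= self.max_queries
        if self.signature_bit == 1 and (timed_out or ignored):
            self.signature_bit = 0
        return self.signature_bit == 1
```

Passing the current time explicitly keeps the sketch deterministic; a real node device would presumably use a hardware timer instead.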
In some possible implementations, the first slave node is further configured to send a broadcast message to at least one master node in the system, where the broadcast message indicates that the first slave node currently has no management master node. The at least one master node is configured to receive the broadcast message and, based on the broadcast message, send the first configuration message to the first slave node to obtain management authority over the first slave node.
In this embodiment of the present application, after a node device has determined that its current management host (for example, the first master node) is abnormal and has cancelled that host's management authority, it may also send a broadcast message to multiple hosts in the bus system to notify them that they can obtain management authority over the node device. The node device can thus acquire a new management host, which ensures reliable operation of the entire bus system.
In some possible implementations, the multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, where any fourth switch is adjacent to any third switch or is connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The second master node is configured to send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node. The at least one switch is the switch or switches traversed by each enumeration message on its way from the second master node to the first slave node. Each of the multiple routing paths passes, in order, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
In this embodiment of the present application, a node device (for example, the first slave node) can also be discovered by several different hosts. That is, other hosts (for example, the second master node) can also send enumeration messages to the node device multiple times via the multiple switches connected to it, and thereby record multiple routing paths to the node device. This greatly extends the range of use of node devices and meets each host's need to use a large number of node devices: any host can access node devices under other hosts and can therefore call on more computing resources to perform more complex computation. In addition, as described above, although a node device can be used by multiple hosts, it can be managed by only one host, and this single-host management mechanism makes management clearer.
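To illustrate how several routing paths between a host and one node device can coexist, the following Python sketch lists every loop-free path through a small example topology. It is a stand-in, not the enumeration procedure itself: in the described system the paths are learned by sending enumeration messages repeatedly and recording the switches each message traverses, whereas here the topology is assumed to be known in advance and searched depth-first. The function `enumerate_paths` and all node names are hypothetical.

```python
def enumerate_paths(topology, switches, master, slave):
    """Return every loop-free path from master to slave as a list of names."""
    paths = []

    def walk(current, path):
        if current == slave:
            paths.append(path)
            return
        for neighbor in topology.get(current, ()):
            if neighbor in path:
                continue  # never revisit a hop, so paths stay loop-free
            if neighbor != slave and neighbor not in switches:
                continue  # only switches may forward traffic
            walk(neighbor, path + [neighbor])

    walk(master, [master])
    return paths


# Example topology (assumed): host1 reaches dev via two first switches,
# host2 reaches the same dev via a fourth switch.
switches = {"sw_a", "sw_b", "sw_c", "sw_m", "sw_x"}
topology = {
    "host1": ["sw_a", "sw_b"],
    "host2": ["sw_c"],
    "sw_a": ["host1", "sw_m"],
    "sw_b": ["host1", "sw_x"],
    "sw_c": ["host2", "sw_x"],
    "sw_m": ["sw_a", "sw_x"],
    "sw_x": ["sw_m", "sw_b", "sw_c", "dev"],
    "dev": ["sw_x"],
}
```

With this example data, `enumerate_paths(topology, switches, "host1", "dev")` yields two paths, one through the intermediate switch `sw_m` and one that goes directly from `sw_b` to `sw_x`, while `host2` reaches the same device through `sw_c`.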
In some possible implementations, the multipath bus subsystem includes a first node domain and a second node domain. The first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The second master node is specifically configured to: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, send enumeration messages to the first slave node multiple times and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node.
In some possible implementations, the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain, and a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch. If some or all of the first slave nodes in the first node domain have not yet been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not yet been assigned device numbers by the second master node, the data link between the first port and the second port is closed. If all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is open, so that the second master node can send enumeration messages to the first slave node multiple times through the second port and the first port and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node.
In this embodiment of the present application, the bus system can be divided into multiple node domains. A host enumerates and discovers node devices in other node domains, and records multiple routing paths to them, only after it has finished enumerating and discovering the node devices in its own node domain. In short, a host's enumeration and discovery procedure starts within the node domain to which it currently belongs, and until the node devices in that node domain have all been enumerated and discovered, the node domain need not have any data stream connection with node devices in other node domains. This supports multi-host device enumeration and discovery across node domains (that is, across networks or across different system management software), and ensures that no conflict arises in which different enumeration software assigns device numbers to the same node device more than once, or in which the same device number is assigned to different node devices.
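The gating rule for the cross-domain data link can be expressed as a single predicate. The following is a minimal sketch, assuming each node domain is represented as a mapping from slave node name to its assigned device number, with `None` meaning "not yet assigned"; the function name and data layout are hypothetical.

```python
def cross_domain_link_open(domain1_numbers, domain2_numbers):
    """Return True only when every slave node in both domains has a device number.

    Each argument maps a slave node name to its device number, or to None
    if the domain's master node has not assigned one yet.
    """
    domain1_done = all(n is not None for n in domain1_numbers.values())
    domain2_done = all(n is not None for n in domain2_numbers.values())
    return domain1_done and domain2_done
```

Keeping the link closed until both domains are fully numbered is what prevents two enumerators from numbering the same node device concurrently.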
In some possible implementations, the second master node is a remote master node that is connected to the first master node over a network through a switch. The second master node is specifically configured to access the first slave node in the first node domain over the network connection, so as to call computing resources in the first slave node or read data stored in the first slave node.
In some possible implementations, the first master node is a central processing unit (CPU) in a first terminal. The first master node is further configured to call computing resources of at least one second slave node, or read data stored in the at least one second slave node, over a network connection. The second slave node is a graphics processing unit (GPU), solid state drive, accelerator, network card, or tensor processing unit (TPU) in a second terminal. Optionally, the first terminal and the second terminal may each be a smartphone, tablet computer, desktop computer, other computer, server, or the like, which is not specifically limited in this embodiment of the present application.
In this embodiment of the present application, based on the graph structure, a node device on the bus can be accessed and used by multiple hosts. Furthermore, in addition to being accessed and used by nearby hosts connected by wire, a node device can also be accessed and used by other, remote hosts connected wirelessly. A host can therefore call remote computing resources and stored data, which better meets its demand for large amounts of computing resources when performing complex computation. Connecting over a network greatly enhances the scalability of the entire bus system and overcomes the limitations of existing PCIe bus connections.
In some possible implementations, the slave node is any one of a graphics processing unit, a solid state drive, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), or a switch.
In this embodiment of the present application, a node device may be a graphics processing unit, a solid state drive, an accelerator, a network card, a tensor processing unit, an embedded neural-network processor (NPU), a digital signal processor (DSP), an image signal processor (ISP), a switch, or the like, which is not specifically limited in this embodiment of the present application. Because the graph structure in this embodiment is more scalable and offers a larger design space, all of the above kinds of node device can be added to it, so that a host on the bus can access the appropriate node devices according to actual needs and use their computing resources or data resources, thereby meeting increasingly complex and large computing demands.
In some possible implementations, the master node includes one or more central processing units (CPUs).
In this embodiment of the present application, a host may be a computing system with one or more central processing units. In some possible embodiments, the host may further include main memory, a cache, an internal interconnect bus, an input/output (I/O) interface, and the like, which is not specifically limited in this embodiment of the present application. Because the graph structure in this embodiment is more scalable and offers a larger design space, a host can use the computing resources (for example, the compute units in a GPU) and data resources of each node device over multiple different routing paths, thereby meeting increasingly complex and large computing demands.
In a second aspect, an embodiment of the present application provides a communication method applied to a bus system, where the bus system is a graph structure formed by multiple master nodes, multiple switches, and multiple slave nodes connected by a bus. Any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node, where any first switch is adjacent to any third switch or is connected to it through one or more second switches, and N1, N2, and N3 are each a positive integer less than or equal to N. The method includes: sending, by the first master node, enumeration messages to the first slave node multiple times and, based on at least one switch, determining multiple routing paths between the first master node and the first slave node. The at least one switch is the switch or switches traversed by each enumeration message on its way from the first master node to the first slave node. Each routing path passes, in order, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches, where S is a natural number less than or equal to N3.
In some possible implementations, the method further includes: querying, by the first master node and based on the enumeration message sent, the visible bit in the routing status register of the first slave node and, if the visible bit is 0, assigning a corresponding device number to the first slave node, where a visible bit of 0 indicates that the first slave node has not yet been enumerated and discovered; and saving, by the first slave node, the device number assigned by the first master node in the routing status register and setting the visible bit to 1, where a visible bit of 1 indicates that the first slave node has already been enumerated and discovered.
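The visible-bit check above can be sketched as follows. This is an illustrative model only: the class `RoutingStatusRegister`, the function `enumerate_slave`, and the idea of passing the next free device number as an argument are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingStatusRegister:
    """Hypothetical model of a slave node's routing status register."""
    visible_bit: int = 0                 # 0: not yet enumerated; 1: already discovered
    device_number: Optional[int] = None  # saved once a master assigns it


def enumerate_slave(reg, next_free_number):
    """Assign a device number only if the node has not been discovered yet.

    Returns the device number the node holds after the call.
    """
    if reg.visible_bit == 0:
        reg.device_number = next_free_number  # slave saves the assigned number
        reg.visible_bit = 1                   # and marks itself as discovered
    return reg.device_number
```

A second enumeration attempt finds the visible bit already set and leaves the existing device number untouched, so a node device is numbered at most once.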
In some possible implementations, the method further includes: sending, by the first master node, a first configuration message to the first slave node to obtain management authority over the first slave node, where the first configuration message carries the master node password and master node number of the first master node; receiving, by the first slave node, the first configuration message and, based on the first configuration message, setting the signature bit in the management master node information register of the first slave node to 1, where a signature bit of 1 indicates that the first slave node currently has a management master node and that other master nodes in the system cannot obtain management authority over the first slave node; and saving, by the first slave node, the master node password and master node number of the first master node in the management master node information register.
In some possible implementations, the method further includes: sending, by the first master node, a second configuration message to the first slave node to cancel its management authority over the first slave node, where the second configuration message carries the master node password and master node number of the first master node; and receiving, by the first slave node, the second configuration message and, if the master node password and master node number carried in the second configuration message match the master node password and master node number stored in the management master node information register, setting the signature bit in the management master node information register to 0, where a signature bit of 0 indicates that the first slave node currently has no management master node.
In some possible implementations, the method further includes: after the first master node has obtained management authority over the first slave node, sending, by the first master node, a presence message to the first slave node in response to a query message sent by the first slave node, or sending the presence message to the first slave node at a first time interval.
In some possible implementations, the method further includes: setting, by the first slave node, the signature bit in the management master node information register to 0 when a preset condition is met, so as to cancel the first master node's management authority over the first slave node. The preset condition includes: after the first master node has obtained management authority over the first slave node, the first slave node does not receive the presence message from the first master node within a preset time, or the first slave node sends K query messages to the first master node without receiving the presence message from the first master node in response to any of them, where K is an integer greater than or equal to 1.
In some possible implementations, the method further includes: sending, by the first slave node, a broadcast message to at least one master node in the system, where the broadcast message indicates that the first slave node currently has no management master node; and receiving, by the at least one master node, the broadcast message and, based on the broadcast message, sending the first configuration message to the first slave node to obtain management authority over the first slave node.
In some possible implementations, the multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node, where any fourth switch is adjacent to any third switch or is connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The method further includes: sending, by the second master node, enumeration messages to the first slave node multiple times and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node. The at least one switch is the switch or switches traversed by each enumeration message on its way from the second master node to the first slave node. Each of the multiple routing paths passes, in order, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
In some possible implementations, the multipath bus subsystem includes a first node domain and a second node domain. The first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The sending, by the second master node, of enumeration messages to the first slave node multiple times and the determining, based on at least one switch, of multiple routing paths between the second master node and the first slave node includes: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending, by the second master node, enumeration messages to the first slave node multiple times and, based on at least one switch, determining multiple routing paths between the second master node and the first slave node.
In some possible implementations, the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain, and a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch. If some or all of the first slave nodes in the first node domain have not yet been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not yet been assigned device numbers by the second master node, the data link between the first port and the second port is closed. If all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is open, so that the second master node can send enumeration messages to the first slave node multiple times through the second port and the first port and, based on at least one switch, determine multiple routing paths between the second master node and the first slave node.
In some possible implementations, the second master node is a remote master node that is connected to the first master node over a network through a switch. The method further includes: accessing, by the second master node over the network connection, the first slave node in the first node domain, so as to call computing resources in the first slave node or read data stored in the first slave node.
In a third aspect, an embodiment of the present application provides a master node. The master node includes a processor configured to support the master node in performing the corresponding functions of any communication method provided in the second aspect. The master node may further include a memory that is coupled to the processor and stores the program instructions and data necessary for the master node. The master node may further include a communication interface through which the master node communicates with other devices or with a communication network.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the communication method procedure according to any implementation of the second aspect is implemented.
In a fifth aspect, an embodiment of the present application provides a computer program that includes instructions. When the computer program is executed by a computer, the computer is enabled to perform the communication method procedure according to any implementation of the second aspect.
In a sixth aspect, an embodiment of the present application provides a chip that includes a processor and a communication interface. The processor is configured to call and run instructions from the communication interface, and when the processor executes the instructions, the chip performs the communication method procedure described in the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip system that includes the bus system according to any implementation of the first aspect and is used to implement the functions involved in the communication method procedure according to any implementation of the second aspect. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the communication method. The chip system may consist of chips, or may include chips and other discrete devices.
Brief Description of the Drawings
Figure 1 is a schematic diagram of an interconnection structure based on CXL 2.0.
Figure 2 is a schematic structural diagram of a bus system provided by an embodiment of the present application.
Figure 3 is a schematic structural diagram of a multipath bus subsystem provided by an embodiment of the present application.
Figure 4a is a schematic structural diagram of another multipath bus subsystem provided by an embodiment of the present application.
Figure 4b is a schematic structural diagram of yet another multipath bus subsystem provided by an embodiment of the present application.
Figure 4c is a schematic structural diagram of yet another multipath bus subsystem provided by an embodiment of the present application.
Figure 4d is a schematic structural diagram of yet another multipath bus subsystem provided by an embodiment of the present application.
Figure 5 is a schematic structural diagram of another bus system provided by an embodiment of the present application.
Figure 6 is a schematic flowchart of a management host signature procedure provided by an embodiment of the present application.
Figure 7 is a schematic flowchart of a host cancelling management authority, provided by an embodiment of the present application.
Figure 8a is a schematic flowchart of a management host presence confirmation procedure provided by an embodiment of the present application.
Figure 8b is a schematic flowchart of another management host presence confirmation procedure provided by an embodiment of the present application.
Figure 9 is a schematic structural diagram of yet another multipath bus subsystem provided by an embodiment of the present application.
Figures 10a to 10c are schematic structural diagrams of a group of multipath, multi-host bus systems provided by an embodiment of the present application.
Figures 11a and 11b are schematic diagrams of an enumeration process in a multipath, multi-host setting, provided by an embodiment of the present application.
Figure 12 is a schematic flowchart of a communication method provided by an embodiment of the present application.
Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings in the embodiments of this application.
The terms "first", "second", and the like in the specification, the claims, and the accompanying drawings of this application are used to distinguish between different objects, not to describe a particular order. In addition, the terms "include" and "have", and any variants thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device. It should be noted that when an element is described as being "coupled" or "connected" to one or more other elements, the element may be directly connected to the other element or elements, or may be indirectly connected to them.
It should be understood that, in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a processor and the processor itself may be components. One or more components may reside within a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media that store various data structures. The components may communicate by way of local and/or remote processes, for example, according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network, such as the Internet, that interacts with other systems by way of the signal).
First, some terms used in this application are explained to facilitate understanding by those skilled in the art.
(1) Topology. The topology of a computer network borrows the method used in topology for studying point and line relationships that are independent of size and shape: each computer or communication device in the network is abstracted as a point and each transmission medium is abstracted as a line, and the geometric figure formed by these points and lines is the topology of the computer network. The main topologies of computer networks are the bus topology, ring topology, tree topology, star topology, hybrid topology, and mesh topology. For example, PCI uses a bus topology, PCIe uses a tree topology, and so on. Further, a host in the topology may use an algorithm to search the entire network topology and discover, by enumeration, all devices connected in the network. Put simply, enumeration discovery means traversing the devices physically connected under each port, reading each device's configuration space, assigning the device a corresponding device number, and recording the routing path traversed to reach the device during enumeration. In the end, the host learns the number of devices in the entire topology, the connection relationships between the devices, and so on. Accordingly, if a device has already been assigned a device number, this indicates that the device has already been discovered by enumeration. In some possible implementations of this application, the topology may be a flat graph structure in which a host on the bus (that is, a master node) can reach a node device (that is, a slave node) over multiple paths and, at the same time, a node device can be accessed by multiple hosts on the bus. Details are not described here.
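The enumeration discovery described in item (1) can be sketched as a breadth-first traversal. This is a minimal illustration only, not the procedure claimed in this application: the function name `enumerate_devices`, the link-list representation, and the node names are hypothetical, and reading each device's configuration space is omitted.

```python
from collections import deque


def enumerate_devices(links, host):
    """Breadth-first enumeration from `host` over an undirected link list.

    Returns {device: (device_number, routing_path)}, where routing_path is
    the path along which the device was first reached.
    """
    # Build the adjacency list: which devices hang off each node's ports.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    discovered = {}
    seen = {host}
    queue = deque([(host, [host])])
    next_number = 0
    while queue:
        node, path = queue.popleft()
        for peer in neighbors.get(node, []):
            if peer in seen:
                continue  # already numbered, so already enumerated
            seen.add(peer)
            peer_path = path + [peer]
            # Assign a device number and record the routing path.
            discovered[peer] = (next_number, peer_path)
            next_number += 1
            queue.append((peer, peer_path))
    return discovered


# Hypothetical topology: one host, one switch, two devices.
links = [("host", "sw0"), ("sw0", "dev0"), ("sw0", "dev1")]
table = enumerate_devices(links, "host")
```

In this sketch the host ends up knowing how many devices exist (`len(table)`) and one routing path to each of them; a device that already appears in `seen` is skipped, mirroring the statement that an assigned device number marks a device as already enumerated.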
(2) Switch. A switch implements bus interconnection and routing and includes multiple ports, each of which may correspond to a physical link for connecting to a master node, a slave node, or another switch in the bus system. In the embodiments of this application, multiple switches are connected to one another to build multiple physical links between a master node and a slave node, thereby enabling multi-path access from the master node to the slave node. In some possible embodiments, some ports of a switch may also have a network interface card function for remote network connection, and some special ports may be used to first confine a master node's enumeration discovery procedure to the node domain in which that master node is located, thereby avoiding interference between the enumeration processes of multiple master nodes. Details are not elaborated here.
(3) Graph structure. A graph structure is a nonlinear structure that is more complex than a tree structure. In a tree structure, nodes have a branching hierarchical relationship: a node on a given layer can be related to only one node on the layer above, but may be related to multiple nodes on the layer below. In a graph structure, by contrast, any node may have one or more predecessor and successor adjacent nodes, which breaks the strict top-down hierarchy of the tree structure. In other words, any two nodes in a graph structure may be related; that is, the adjacency relationships between nodes can be arbitrary.
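The difference between the two structures can be checked mechanically. The following is a small sketch under stated assumptions: the function `is_tree` and the node names are hypothetical, and the topology is modeled as an undirected link list. It treats a topology as a tree when it is connected and has exactly one fewer link than it has nodes; any extra link gives some node a second route to another node and turns the topology into a general graph.

```python
def is_tree(nodes, links):
    """Return True if the undirected topology is connected and acyclic."""
    # A tree on n nodes has exactly n - 1 links.
    if len(links) != len(nodes) - 1:
        return False
    neighbors = {n: set() for n in nodes}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Depth-first search to confirm every node is reachable.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for peer in neighbors[stack.pop()]:
            if peer not in seen:
                seen.add(peer)
                stack.append(peer)
    return len(seen) == len(nodes)


nodes = ["root", "sw", "dev0", "dev1"]
tree_links = [("root", "sw"), ("sw", "dev0"), ("sw", "dev1")]
# Adding a direct root-dev0 link gives dev0 two upstream neighbors.
graph_links = tree_links + [("root", "dev0")]

tree_result = is_tree(nodes, tree_links)
graph_result = is_tree(nodes, graph_links)
```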
First, to facilitate understanding of the embodiments of this application, the technical problem that this application specifically aims to solve is further analyzed and presented. The prior art includes multiple bus interconnection solutions; the relatively common CXL 2.0 solution is described below as an example.
Refer to FIG. 1. FIG. 1 is a schematic diagram of an interconnection structure based on CXL 2.0. As shown in FIG. 1, the structure may include a CXL switch, a fabric manager (FM), multiple hosts, and multiple devices, specifically host 0 and host 1, and type3 device 0, type3 device 1, and type3 device 2. As shown in FIG. 1, the hosts and devices must be interconnected through the CXL switch. That is, a host cannot be connected to a device directly, but only indirectly through the CXL switch. For any single host and its devices, the structure is still a top-down PCIe tree.
Further, as shown in FIG. 1, the CXL switch includes a fabric manager endpoint (FM EP) connected to the FM, and two virtual CXL switches (VCSs), for example, VCS-0 and VCS-1. Each VCS may include multiple virtual PCI-to-PCI bridges (vPPBs). As shown in FIG. 1, VCS-0 may include vPPB-01, vPPB-02, and vPPB-03, and VCS-1 may include vPPB-11, vPPB-12, and vPPB-13. In addition, the CXL switch includes multiple physical PCI-to-PCI bridges (PPBs), for example, PPB-0, PPB-1, and PPB-2.
It should be noted that, in the CXL 2.0 solution, the CXL switch must first be initialized by the FM, and the downstream ports (DPs) of the CXL switch are not bound to a virtual CXL switch but belong only to the FM. The FM may initialize the CXL switch through vendor-defined mechanisms and bind a vPPB of the CXL switch to a physical PPB in advance. Multiple vPPBs may be bound to the same PPB. For example, as shown in FIG. 1, both vPPB-03 and vPPB-12 may be bound to PPB-1.
Once the CXL switch has been initialized, host 0 and host 1 can follow the standard PCIe enumeration procedure to enumerate the CXL switch and the devices interconnected behind it (for example, type3 device 0, type3 device 1, and type3 device 2), and configure the switch's address windows, bus number windows, and so on. As a result, host 0 and host 1 each see a complete PCIe tree that appears to be exclusively their own. The CXL switch must store the routing configuration produced during these enumeration processes and must accurately route subsequent downstream data flows to the actual physical downstream ports. Likewise, upstream data flows must be routed correctly to the real destination host. The design of a CXL switch must therefore comply with these requirements of the CXL protocol, which makes a CXL switch more complex and more costly than an ordinary switch.
In summary, the CXL 2.0 solution shown in FIG. 1 has the following disadvantages:
(1) A specially designed CXL switch must be used as an intermediate device to interconnect hosts and devices. This specially designed CXL switch adds interconnection latency, constrains the overall interconnection topology to some extent, and increases the complexity and cost of the interconnection design.
(2) Node devices still require special functional design, for example, type 3 devices that support the multi logical device function.
(3) The initialization procedure of the interconnection topology is subject to relatively strict requirements: the fabric manager must participate in initializing the interconnection in advance (for example, initializing the CXL switch and initializing type 3 devices), and devices must recognize these special management messages, so the process is complex and cumbersome.
Therefore, to address the failure of current bus interconnection technologies to meet practical requirements, the technical problem that this application actually aims to solve includes the following: breaking the limitation of the existing top-down PCIe tree structure and, based on existing conventional switches, implementing an interconnection topology with a graph structure. A node in the structure can be reached over multiple paths and can be accessed by multiple hosts, so that the interconnection bus protocol natively supports multiple paths and multiple hosts at low cost.
Refer to FIG. 2. FIG. 2 is a schematic structural diagram of a bus system according to an embodiment of this application. The technical solutions of the embodiments of this application may be implemented in the system structure shown by way of example in FIG. 2 or in a similar system structure. As shown in FIG. 2, the bus system 10 may include multiple master nodes, multiple switches, and multiple slave nodes, specifically master node 100a, master node 100b, master node 100c, and so on; switch 300a, switch 300b, switch 300c, and so on; and slave node 200a, slave node 200b, and slave node 200c. The master nodes, switches, and slave nodes may be connected by a bus (for example, an on-chip bus (network on chip) or any other possible bus, such as an AMBA bus) to form a graph structure. In other words, the topology of the computer network formed by the master nodes, switches, and slave nodes is a graph structure rather than a conventional tree structure. It should be noted that any master node, switch, or slave node may serve as a node in the graph structure, and any two nodes in the graph structure may be related. For example, master node 100a may be adjacent to switch 300a and to switch 300b (adjacent means directly connected, that is, there is no other device on the physical link between master node 100a and switch 300a); switch 300a, switch 300b, and switch 300c may be adjacent to one another; switch 300a may also be adjacent to slave node 200a and slave node 200b; and, if slave node 200a has multiple ports, slave node 200a may even be adjacent to master node 100a, master node 100c, and so on. This is not specifically limited in the embodiments of this application. It should be understood that, in view of the spread of application scenarios with ever-increasing computing demands, such as AI and autonomous driving, the embodiments of this application break through the limitation of the traditional tree structure in the existing PCIe bus interconnection topology and adopt a graph structure, so that the connections among master nodes, switches, and slave nodes are more arbitrary and the topology as a whole is more scalable. In this way, a large number of dedicated computing devices (for example, GPUs and TPUs) can be continuously added to the bus as slave nodes in the graph structure. Moreover, any master node in the graph structure can access and use any slave node in the graph structure through the multiple paths formed by the connections among multiple switches, which minimizes the restrictions on a master node's use of the various computing and data resources.
In summary, each of master node 100a, master node 100b, master node 100c, and so on may include one or more CPUs and, optionally, may further include main memory, a cache, an internal interconnection bus, an IO interface, and so on. This is not specifically limited in the embodiments of this application. Optionally, a master node may also be regarded as a computing system that has the foregoing components. Switch 300a, switch 300b, switch 300c, and so on may each include multiple ports; in some other possible embodiments, a switch may also provide corresponding congestion control and quality of service (QoS) functions. Slave node 200a, slave node 200b, slave node 200c, and so on may be computing devices such as general-purpose GPUs, TPUs, or other processing units (XPUs); storage devices such as solid state drives (SSDs); accelerators with specific computing functions; smart network interface cards; or even switches (for example, network switches); and so on. This is not specifically limited in the embodiments of this application.
Further, refer to FIG. 3. FIG. 3 is a schematic structural diagram of a multi-path bus subsystem according to an embodiment of this application. The bus system 10 with the foregoing graph structure may include one or more multi-path bus subsystems, and any one of these multi-path bus subsystems may include a master node, a slave node, and switches. For example, as shown in FIG. 3, the multi-path bus subsystem 10a may include master node 100a (that is, the first master node), slave node 200a (that is, the first slave node), and N switches, where N may be an integer greater than or equal to 1. The N switches may include N1 first switches adjacent to the first master node 100a (for example, first switch 11 and first switch 12 in FIG. 3), N3 third switches adjacent to the first slave node 200a (for example, third switch 31 and third switch 32 in FIG. 3), and N2 second switches (for example, second switch 21, second switch 22, and second switch 23 in FIG. 3).
Any first switch may be adjacent to any third switch, or may be connected to it through one or more of the N2 second switches. For example, first switch 11 may be adjacent to third switch 31. As another example, first switch 11 may be connected to third switch 31 through second switch 21, in which case second switch 21 is adjacent to both first switch 11 and third switch 31. As a further example, first switch 11 may be connected to third switch 31 through second switch 22 and then second switch 23, in which case second switch 22 is adjacent to both first switch 11 and second switch 23, and second switch 23 is also adjacent to third switch 31. This is not specifically limited in the embodiments of this application. N1, N2, and N3 may each be a positive integer less than or equal to N.
Specifically, master node 100a may send enumeration messages to slave node 200a multiple times and, based on the switches that each enumeration message traverses from master node 100a to slave node 200a, determine multiple routing paths between master node 100a and slave node 200a. Each routing path may pass, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches. This enables multi-path access to a slave node; in other words, a master node can access the same slave node over multiple routing paths. S is a natural number less than or equal to N3; that is, S may be 0, in which case the routing path between master node 100a and slave node 200a may pass through only first switches and third switches. In some possible embodiments, the routing path between master node 100a and slave node 200a may also pass through only one or more of the N1 first switches, or through only one or more of the N3 third switches, and so on. This is not specifically limited in the embodiments of this application.
It should be noted that the embodiments of this application aim to implement multi-path access to slave nodes based on a graph structure, and do not specifically limit how the master nodes, slave nodes, and switches are connected. The technical solutions provided in the embodiments of this application are described in detail below using several examples of possible connections. The connections in this application may include, but are not limited to, the examples below.
Optionally, refer to FIG. 4a. FIG. 4a is a schematic structural diagram of another multi-path bus subsystem according to an embodiment of this application. As shown in FIG. 4a, the multi-path bus subsystem 10a may specifically include master node 100a, slave node 200a, first switch 11, first switch 12, second switch 21, third switch 31, and third switch 32. First switch 11 and first switch 12 are adjacent to master node 100a; third switch 31 and third switch 32 are adjacent to slave node 200a; first switch 11 is connected to third switch 32 through second switch 21; first switch 11 is also adjacent to third switch 31; and third switch 31 is also adjacent to master node 100a.
On this basis, after master node 100a uses enumeration software to send enumeration messages out of its own ports multiple times, it can discover and record multiple routing paths between itself and slave node 200a. Master node 100a can subsequently use any one of these routing paths to access slave node 200a and use its computing resources, manage its functional configuration, and so on. As shown in FIG. 4a, the multiple routing paths may include: (1) master node 100a→first switch 11→second switch 21→third switch 32→slave node 200a; (2) master node 100a→first switch 11→third switch 31→slave node 200a; (3) master node 100a→third switch 31→slave node 200a, which may be the case mentioned above; (4) master node 100a→first switch 12→third switch 32→slave node 200a.
Optionally, refer to FIG. 4b. FIG. 4b is a schematic structural diagram of yet another multi-path bus subsystem according to an embodiment of this application. As shown in FIG. 4b, the multi-path bus subsystem 10a may specifically include master node 100a, slave node 200a, first switch 11, first switch 12, third switch 31, and third switch 32. First switch 11 and first switch 12 are adjacent to master node 100a; third switch 31 and third switch 32 are adjacent to slave node 200a; first switch 11 is also adjacent to first switch 12 and to the third switch; and third switch 32 is also adjacent to first switch 12 and to third switch 31.
On this basis, after master node 100a uses enumeration software to send enumeration messages out of its own ports multiple times, it can discover and record multiple routing paths between itself and slave node 200a, which may include: (1) master node 100a→first switch 11→third switch 31→slave node 200a; (2) master node 100a→first switch 11→third switch 31→third switch 32→slave node 200a; (3) master node 100a→first switch 11→first switch 12→third switch 32→slave node 200a; (4) master node 100a→first switch 11→first switch 12→third switch 32→third switch 31→slave node 200a; (5) master node 100a→first switch 12→third switch 32→slave node 200a; (6) master node 100a→first switch 12→third switch 32→third switch 31→slave node 200a; (7) master node 100a→first switch 12→first switch 11→third switch 31→slave node 200a; (8) master node 100a→first switch 12→first switch 11→third switch 31→third switch 32→slave node 200a.
Optionally, refer to FIG. 4c. FIG. 4c is a schematic structural diagram of yet another multi-path bus subsystem according to an embodiment of this application. As shown in FIG. 4c, the multi-path bus subsystem 10a may specifically include master node 100a, slave node 200a, first switch 11, and first switch 12. First switch 11 and first switch 12 are each adjacent to both master node 100a and slave node 200a, and first switch 11 is also adjacent to first switch 12.
On this basis, after master node 100a uses enumeration software to send enumeration messages out of its own ports multiple times, it can discover and record multiple routing paths between itself and slave node 200a, which may include: (1) master node 100a→first switch 11→slave node 200a; (2) master node 100a→first switch 11→first switch 12→slave node 200a; (3) master node 100a→first switch 12→slave node 200a; (4) master node 100a→first switch 12→first switch 11→slave node 200a.
Optionally, refer to FIG. 4d. FIG. 4d is a schematic structural diagram of yet another multi-path bus subsystem according to an embodiment of this application. As shown in FIG. 4d, the multi-path bus subsystem 10a may specifically include master node 100a, slave node 200a, and first switch 11. First switch 11 is adjacent to both master node 100a and slave node 200a, and slave node 200a is also directly adjacent to master node 100a.
On this basis, after master node 100a uses enumeration software to send enumeration messages out of its own ports multiple times, it can discover and record multiple routing paths between itself and slave node 200a, which may include: (1) master node 100a→first switch 11→slave node 200a; (2) master node 100a→slave node 200a.
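The path lists given for FIG. 4b, FIG. 4c, and FIG. 4d coincide with the loop-free (simple) paths from the master node to the slave node in each topology. The following sketch reproduces them with a depth-first search. It is an illustration under stated assumptions, not the enumeration procedure of this application: the function `simple_paths` and the short labels (`M` for master node 100a, `D` for slave node 200a, `S11`, `S12`, `S31`, `S32` for the switches) are hypothetical, and the link between first switch 11 and third switch 31 in FIG. 4b is inferred from the listed paths.

```python
def simple_paths(links, src, dst):
    """Return every loop-free path from src to dst over undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    paths = []

    def walk(node, path):
        if node == dst:
            paths.append(path)
            return
        for peer in neighbors.get(node, []):
            if peer not in path:  # never revisit a node on the same path
                walk(peer, path + [peer])

    walk(src, [src])
    return paths


# Adjacency as described for each figure (labels are shorthand).
FIG_4B = [("M", "S11"), ("M", "S12"), ("S31", "D"), ("S32", "D"),
          ("S11", "S12"), ("S11", "S31"), ("S32", "S12"), ("S32", "S31")]
FIG_4C = [("M", "S11"), ("M", "S12"), ("S11", "D"), ("S12", "D"),
          ("S11", "S12")]
FIG_4D = [("M", "S11"), ("S11", "D"), ("M", "D")]

paths_4b = simple_paths(FIG_4B, "M", "D")  # 8 paths, as listed for FIG. 4b
paths_4c = simple_paths(FIG_4C, "M", "D")  # 4 paths, as listed for FIG. 4c
paths_4d = simple_paths(FIG_4D, "M", "D")  # 2 paths, as listed for FIG. 4d
```

For FIG. 4a the text lists paths that "may include" four routes, so an exhaustive search over that topology is not expected to match the listed set exactly.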
综上,需要说明是,本申请实施例中描述交换机的“第一”、“第二”和“第三”并非特指某一个交换机,而是用于表述交换机与主节点、从节点的不同连接情况。例如,如图4a所示,第三交换机31分别与主节点100a和从节点200a邻接,则基于前述论述,其也可以称之为第一交换机;又例如,如图4c所示,第一交换机11、第一交换机12均分别与主节点100a和从节点200a邻接,则基于前述论述,其也可以称之为第三交换机,等等。一般情况下,由于交换机的端口较多,则基于图形结构,其邻接关系可以较为任意、复杂,例如图4a中的第二交换机21的其他端口还可以分别与主节点100a和第三交换机31或者从节点200a等邻接,等等,本申请实施例对此不作具体限定。In summary, it needs to be explained that the "first", "second" and "third" in the description of the switch in the embodiment of the present application do not refer to a certain switch, but are used to describe the difference between the switch and the master node and the slave node. Connection status. For example, as shown in Figure 4a, the third switch 31 is respectively adjacent to the master node 100a and the slave node 200a, based on the foregoing discussion, it can also be referred to as the first switch; for another example, as shown in Figure 4c, the first switch 11. The first switch 12 is adjacent to the master node 100a and the slave node 200a respectively, so based on the foregoing discussion, it can also be called the third switch, and so on. In general, because the ports of the switch are many, based on the graph structure, its adjacency relationship can be relatively arbitrary and complicated. For example, other ports of the second switch 21 in FIG. The slave node 200a and the like are adjacent, etc., which are not specifically limited in this embodiment of the present application.
进一步地,基于上述各种可能的多路径总线子系统的描述,下面将对多路径下的枚举发现流程进行详细阐述。Further, based on the above descriptions of various possible multipath bus subsystems, the enumeration discovery process under multipath will be described in detail below.
Optionally, based on an enumeration message sent to the slave node 200a, the master node 100a may query the visited bit in the routing status register of the slave node 200a. If the visited bit is 0, the master node 100a may assign a corresponding device number to the slave node 200a. A visited bit of 0 indicates that the slave node 200a has not yet been discovered by enumeration.
Correspondingly, the slave node 200a may save the device number assigned by the master node 100a in the routing status register and set the visited bit to 1. A visited bit of 1 indicates that the slave node 200a has already been discovered by enumeration. In other words, the first time the master node 100a enumerates the slave node 200a over some routing path, the visited bit changes from 0 to 1 and a device number is assigned. When the master node 100a later sends an enumeration message to the slave node 200a over another routing path, it can determine from the visited bit being 1 that the slave node 200a has already been discovered, so it does not assign a device number again and only records the new routing path over which this enumeration message was sent. Recording the discovery history in this way makes enumeration under multipath possible, avoids repeated enumeration and infinite loops, and allows enumeration to complete efficiently and accurately.
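The visited-bit rule can be illustrated with a small Python model. This is a sketch under assumptions: the class names `RoutingStatusRegister` and `Master`, the integer device numbers, and the `enumerate_node` method are hypothetical and stand in for whatever register layout and message format an implementation actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RoutingStatusRegister:
    """Hypothetical model of a slave node's routing status register."""
    visited: int = 0            # 0: not yet discovered, 1: already discovered
    cid: Optional[int] = None   # device number assigned by the master node


@dataclass
class Master:
    next_cid: int = 1
    routes: Dict[str, List[Tuple[str, ...]]] = field(default_factory=dict)

    def enumerate_node(self, name: str, reg: RoutingStatusRegister,
                       path: List[str]) -> int:
        if reg.visited == 0:
            # First discovery: assign a device number and set the visited bit.
            reg.cid = self.next_cid
            reg.visited = 1
            self.next_cid += 1
        # In every case, record the routing path that reached the node.
        self.routes.setdefault(name, []).append(tuple(path))
        return reg.cid


master = Master()
slave_reg = RoutingStatusRegister()
first = master.enumerate_node("slave_200a", slave_reg, ["switch_11", "slave_200a"])
second = master.enumerate_node("slave_200a", slave_reg, ["switch_12", "slave_200a"])
```

After both calls the slave node holds a single device number, and the master has recorded two routing paths to it.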
Optionally, corresponding enumeration software may be installed and run on the master node 100a, and this software may be used to enumerate and discover the devices in the entire interconnect bus topology. In addition, like a conventional PCIe device, each slave node in the embodiments of this application has its own configuration space, which software can access to obtain the information the device declares about itself, to manage the device, and so on. For example, refer to FIG. 5, which is a schematic structural diagram of another bus system provided in an embodiment of this application. As shown in FIG. 5, the bus system includes a master node 100a, a switch A, a switch B, a device C, and a device D. The device C and the device D may be the foregoing slave node 200a, 200b, or 200c, for example a GPU, a TPU, an SSD, or an accelerator. As shown in FIG. 5, the master node 100a has two ports: port A1 is connected to switch A, and port A2 is connected to switch B. Switch A is connected to switch B and to device C, and switch B is also connected to device C and to device D. The bus system shown in FIG. 5 therefore contains two multipath bus subsystems: one consisting of the master node 100a, switch A, switch B, and device C, and one consisting of the master node 100a, switch A, switch B, and device D. Details are not repeated here.
Taking FIG. 5 as an example, the enumeration and discovery process under multipath is described in further detail below. The process may include the following steps:
Step 1, the enumeration software running on the master node 100a may start from port A1 of the master node 100a and search according to the breadth-first search algorithm for graphs. Optionally, the enumeration software may check whether the physical link corresponding to port A1 has been established and whether data can be transmitted over it. If the enumeration software determines that the physical link corresponding to port A1 has been established and can transmit data, it may send an enumeration message (also called an enumeration access) from port A1 to the device that is directly physically connected to port A1, namely switch A in FIG. 5.
Optionally, before the enumeration message is sent, the basic input/output system (BIOS) may, based on an agreed description, report the local bus interface information to the enumeration software on the master node 100a through a specific interface, for example how many local bus ports there are and their corresponding access addresses, such as how many ports switch A, switch B, device C, and device D in FIG. 5 have in total.
Step 2, switch A receives the enumeration message sent by the master node 100a and returns a response to it. Based on the response, the enumeration software can determine that switch A is a legitimately present device and that it is a switch device.
Step 3, the enumeration software performs a management host signature (that is, a management master node signature) on switch A according to the management host signature process (that is, the management master node signature process). Optionally, as shown in FIG. 5, the signature process may include setting the signature bit in the management master node information register (that is, the management host information register) in configuration space A of switch A to 1, and writing the master node number of the master node 100a (that is, the management host number (component identity document, CID) in FIG. 5) and the master node password used for this signature, that is, the host key, into the management master node information register. This completes the acquisition of management authority over switch A by the master node 100a, that is, the master node 100a becomes the management master node of switch A. Optionally, for details of step 3, refer to the description of the embodiment corresponding to FIG. 6 below; details are not repeated here.
Step 4, the enumeration software continues to send an enumeration message through port A1 to read the routing status register in configuration space A of switch A and check whether the visited bit of that register is 0.
Step 5, switch A returns that the visited bit of its routing status register is 0, so the enumeration software determines that switch A has not yet been discovered by enumeration. As shown in FIG. 5, the enumeration software may then assign the device number cid1 to switch A, and switch A may write this device number (that is, cid1) directly into the CID value bit field of the routing status register and, at the same time, set the visited bit from 0 to 1.
Step 6, as described above, because the enumeration software knows that switch A is a switch device, it may also read the port status registers of switch A to learn which ports of switch A have already established a physical link.
Step 7, starting from one physically linked port of switch A, the enumeration software sends an enumeration message to device C and determines from the response returned by device C that device C is a legitimately present device.
Step 8, over the path port A1 → switch A → device C, and following the process of steps 3 to 5, the enumeration software performs the management master node signature on device C and assigns device C a device number that has not been assigned to any other device (cid2 in FIG. 5, as an example).
Step 9, over the path port A1 → switch A → switch B, and following the process of steps 7 and 8, the enumeration software completes the enumeration and discovery of switch B, performs the management master node signature on switch B, and assigns switch B a corresponding device number (cid3 in FIG. 5, as an example).
Step 10, switch B is also a switch device. Following the process of steps 6 to 8, the enumeration software reaches device C over the path port A1 → switch A → switch B → device C. At this point the enumeration software finds that the visited bit in the routing status register of device C is already set (that is, the visited bit is already 1) and that the management master node signature register of device C has already been signed (that is, the signature bit is already 1). The enumeration software therefore does not assign a device number to device C again and only records the new routing path between the master node 100a and device C.
Step 11, following the process of steps 7 and 8, the enumeration software completes the enumeration and discovery of device D over the path port A1 → switch A → switch B → device D, performs the management master node signature on device D, and assigns device D a corresponding device number (cid4 in FIG. 5, as an example).
Step 12, the enumeration software starts from port A2 of the master node 100a and, over the paths port A2 → switch B, port A2 → switch B → device C, port A2 → switch B → device D, port A2 → switch B → switch A, and port A2 → switch B → switch A → device C, reaches switch B, device C, device D, switch A, and device C respectively. Based on the preceding steps, the enumeration software has already completed the enumeration and discovery of these devices through port A1. Therefore, during this pass it takes no further enumeration and discovery action on them, that is, it does not assign device numbers again and only records the new routing paths from the master node 100a to switch A, switch B, device C, and device D.
Step 13, at this point the master node 100a has completed, through the corresponding enumeration software, the discovery and enumeration of all devices and of the topology in the bus system shown in FIG. 5.
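The steps above can be sketched as a breadth-first walk over routing paths. The Python below is an illustrative model only: the `LINKS` table, the node labels, and `enumerate_topology` are assumptions, the `cids` dictionary stands in for the visited bit and CID field held in each device's routing status register, and the management master node signature of step 3 is omitted.

```python
from collections import deque
from typing import Dict, List, Tuple

# Downstream links of the FIG. 5 topology as seen from the master node.
# Endpoints (device C, device D) do not forward enumeration messages.
LINKS: Dict[str, List[str]] = {
    "A1": ["switch_A"],
    "A2": ["switch_B"],
    "switch_A": ["device_C", "switch_B"],
    "switch_B": ["device_C", "device_D", "switch_A"],
    "device_C": [],
    "device_D": [],
}


def enumerate_topology(links: Dict[str, List[str]], ports: List[str]):
    cids: Dict[str, int] = {}                     # node -> device number
    routes: Dict[str, List[Tuple[str, ...]]] = {}  # node -> recorded paths
    next_cid = 1
    for port in ports:
        queue = deque([port, n] for n in links[port])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node not in cids:          # visited bit is 0: first discovery
                cids[node] = next_cid
                next_cid += 1
            routes.setdefault(node, []).append(tuple(path))  # always record
            for nxt in links[node]:
                if nxt not in path:       # do not loop back along this path
                    queue.append(path + [nxt])
    return cids, routes


cids, routes = enumerate_topology(LINKS, ["A1", "A2"])
```

With the link order chosen here, the device numbers come out in the same order as in steps 5, 8, 9, and 11 (switch A, device C, switch B, device D), and the pass from port A2 only adds routing paths.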
As described above, when a master node discovers a device (which may be a switch or a slave node) through enumeration, it may perform a management host signature on the device to obtain management authority over it. This management authority includes, but is not limited to, arbitrating resource contention between devices, handling device exceptions, managing basic device characteristics (such as the maximum supported packet length), and managing device functions, for example whether the device is allowed to use a certain function (such as a function related to the physical link). This is not specifically limited in the embodiments of this application.
Optionally, refer to FIG. 6, which is a schematic flowchart of a management host signature process provided in an embodiment of this application. As shown in FIG. 6, the management host signature process may include the following steps.
S11, the master node sends a configuration message to the device in an attempt to obtain management authority over the device. The master node is, for example, the foregoing master node 100a, and the device is, for example, the foregoing slave node 200a, or switch A, device C, and so on in FIG. 5. The configuration message (for example, a first configuration message) is intended to write the signature bit in the management master node information register of the configuration space to 1, and it may also carry the master node number and the master node password of the master node. A signature bit of 1 indicates that the device has already been signed, that is, it already has a management master node, and other master nodes can no longer obtain management authority over the device. This ensures that the device has a unique management master node over a relatively long period and keeps the entire topology clear and manageable. It should be noted that one master node may obtain management authority over multiple devices, and that the master node password carried in the configuration message may differ from device to device.
S12, the device receives the configuration message sent by the master node and determines that the configuration message is intended to write the signature bit in its management master node information register to 1.
S13, the device checks whether the signature bit in the management master node information register is 0. If so, S14 is performed; otherwise, S15 is performed. A signature bit of 0 indicates that the device has not yet been signed, that is, it has no management master node, and every master node in the system has the opportunity to obtain management authority over the device.
S14, the device sets the signature bit in the management master node information register to 1 and saves the master node password and master node number carried in the configuration message in the management master node information register. At this point the master node has completed the management master node signature on the device and obtained management authority over it. Optionally, the device may ensure that the master node password saved by this signature cannot be read or modified by other master nodes that have not been verified. Optionally, for every subsequent operation that the management master node performs on the device, the device first verifies whether the master node password and master node number carried by the operation match those saved by the device. If they do not match, the device may conclude that the operation was not issued by its management master node and may not respond, which ensures the security of device management.
S15, the device verifies the master node that sent the configuration message and determines whether it is the management master node of the device. If so, S16 is performed; otherwise, S17 is performed. If the ID information of the requester (that is, the source) of the current access, namely the master node number carried in the configuration message, matches the master node number saved in the management master node information register of the device, and the master node password carried in the configuration message matches the master node password saved in that register, the device can determine that the master node that sent this configuration message is its management master node. Correspondingly, if one or more of the following hold, namely the master node number does not match, the master node password does not match, or no master node password is carried, the device can determine that the master node that sent this configuration message is not its management master node.
S16, the device does not modify the signature bit in the management master node information register, that is, it keeps the signature bit at 1, and returns a configuration success response.
S17, the device does not modify the signature bit in the management master node information register, that is, it keeps the signature bit at 1, and returns a configuration failure response. Optionally, the device may instead discard the configuration message directly without responding.
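The device-side decisions in S13 to S17 amount to a small state machine. The Python below is a hedged sketch of it: the register fields, the `Device` class, the byte-string host key, and the boolean return value (True for a configuration success response, False for failure or a silent drop) are assumptions made for illustration. The use of `hmac.compare_digest` for the password comparison is a design choice of this sketch, not something the application specifies.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagementMasterInfoRegister:
    """Hypothetical layout of the management master node information register."""
    signed: int = 0
    master_cid: Optional[int] = None
    host_key: Optional[bytes] = None


class Device:
    def __init__(self) -> None:
        self.reg = ManagementMasterInfoRegister()

    def handle_sign_request(self, master_cid: int,
                            host_key: Optional[bytes]) -> bool:
        # S13: is the device still unsigned?
        if self.reg.signed == 0:
            # S14: first signer wins; store its number and password.
            self.reg = ManagementMasterInfoRegister(1, master_cid, host_key)
            return True
        # S15: already signed, so verify the requester.
        is_manager = (
            host_key is not None
            and self.reg.host_key is not None
            and master_cid == self.reg.master_cid
            and hmac.compare_digest(host_key, self.reg.host_key)
        )
        # S16 (success) or S17 (failure); the signature bit stays 1 either way.
        return is_manager


dev = Device()
first = dev.handle_sign_request(0x100A, b"key-a")     # becomes the manager
repeat = dev.handle_sign_request(0x100A, b"key-a")    # same master, verified
intruder = dev.handle_sign_request(0x100B, b"key-b")  # other master, rejected
no_key = dev.handle_sign_request(0x100A, None)        # missing password
```

Only the first request changes the register; every later request is answered from the stored master node number and password.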
In summary, the embodiments of this application define a mechanism in which whichever master node signs successfully first becomes the management master node. When the system may include multiple master nodes (for example, the master node 100a, the master node 100b, and the master node 100c shown in FIG. 2), this makes device management clearer. In addition, the device can ensure that the master node password configured during the management master node signature cannot be read or modified by non-management master nodes, and a device that has completed the management master node signature verifies the management master node on every subsequent management operation, which greatly improves the security of device management.
Optionally, after a master node has obtained management authority over a device, it may also actively cancel that authority. For example, the master node may receive a request from another master node that wants to obtain management authority over the device, or it may detect that it has an abnormal fault itself. In such cases, to ensure that the device can subsequently have a new, properly working management master node and to maintain the management efficiency of the device, the master node may actively cancel its management authority over the device. Refer to FIG. 7, which is a schematic flowchart of a host canceling management authority provided in an embodiment of this application. As shown in FIG. 7, the process of canceling management authority may include the following steps.
S21, the master node sends a configuration message to the device in an attempt to cancel its management authority over the device. The master node is, for example, the foregoing master node 100a, and the device is, for example, the foregoing slave node 200a, or switch A, device C, and so on in FIG. 5. The configuration message (for example, a second configuration message) is intended to write the signature bit in the management master node information register of the configuration space to 0, and it may also carry the master node number and the master node password of the master node.
S22, the device receives the configuration message sent by the master node and determines that the configuration message is intended to write the signature bit in its management master node information register to 0.
S23, the device verifies the master node that sent the configuration message and determines whether it is the management master node of the device. If so, S24 is performed; otherwise, S25 is performed.
S24, the device sets the signature bit in the management master node information register to 0 and returns a configuration success response. It should be understood that if the master node is the management master node of the device, the device already has a management master node, so the signature bit in its management master node information register is generally 1.
S25, the device does not modify the signature bit in the management master node information register and returns a configuration failure response. Optionally, the device may instead discard the configuration message directly without responding. It should be understood that if the master node is not the management master node of the device, the signature bit in the management master node information register of the device may be either 1 or 0.
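A corresponding sketch of the cancellation check in S23 to S25 follows. As before, the `SignedDevice` class, its field names, and the boolean return value are illustrative assumptions, and the constant-time comparison via `hmac.compare_digest` is a choice of this sketch rather than a requirement stated in the application.

```python
import hmac
from typing import Optional


class SignedDevice:
    """Hypothetical device that already has a management master node."""

    def __init__(self, master_cid: int, host_key: bytes) -> None:
        self.signed = 1
        self.master_cid = master_cid
        self.host_key = host_key

    def handle_cancel_request(self, master_cid: int,
                              host_key: Optional[bytes]) -> bool:
        # S23: only the current management master node may cancel.
        is_manager = (
            self.signed == 1
            and host_key is not None
            and master_cid == self.master_cid
            and hmac.compare_digest(host_key, self.host_key)
        )
        if not is_manager:
            return False   # S25: signature bit left unchanged
        self.signed = 0    # S24: management authority released
        return True


dev = SignedDevice(0x100A, b"key-a")
rejected = dev.handle_cancel_request(0x100B, b"key-b")
bit_after_reject = dev.signed
accepted = dev.handle_cancel_request(0x100A, b"key-a")
```

Once the signature bit is back at 0, the device is again open to the first-signer-wins process described for FIG. 6.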
It should be noted that the embodiments of this application do not specifically limit the execution order of the steps shown in FIG. 6 and FIG. 7.
Further, after a master node has obtained management authority over a device, the embodiments of this application also define a presence confirmation mechanism for the management master node, in order to ensure that the management master node is present in real time and that the entire system (that is, the bus topology) is robust.
Optionally, refer to FIG. 8a, which is a schematic flowchart of a management host presence confirmation process provided in an embodiment of this application. The process may be applied to the system structures shown in FIG. 2 to FIG. 5. The master node involved in FIG. 8a may be, for example, the foregoing master node 100a (that is, the first master node), and the device involved may be, for example, the foregoing slave node 200a (that is, the first slave node), or switch A, switch B, device C, or device D in FIG. 5.
As shown in FIG. 8a, a timer 1 may be configured inside the device. When a master node obtains management authority over the device (for example, when the signature bit of the device is set to 1), timer 1 may start timing. When the value of timer 1 equals X (for example, 200 ms, 3 s, 5 s, or 8 s), the device may send a query message to its management master node to try to learn whether the management master node is present. In other words, the device may actively send query messages to its management master node at a preset time interval (that is, at a preset frequency), and the management master node receives them. Optionally, when the device sends a query message, a timer 2 inside the device may start timing. If the device has still not received a presence message from its management master node when the value of timer 2 equals Y (for example, 300 ms, 1 s, 5 s, or 7 s), a counter in the device may record one timeout. As shown in FIG. 8a, when the number of timeouts reaches K (K is an integer greater than or equal to 1, for example 3, 5, or 7), that is, when the device has sent K query messages to its management master node and has not received a presence message in response to any of them, the device may conclude that its management master node is abnormal and not present. The device may then reset the signature bit in its management master node information register to 0 and may further clear the master node number and master node password saved in that register, thereby canceling the management authority of its original management master node. The device is then in an unsigned state without a management master node. Correspondingly, as shown in FIG. 8a, if the device receives a presence message from its management master node during this period, it can determine that the management master node is present and working normally, and it may clear the current values of timer 2 and the counter, that is, reset timer 2 and the counter to zero.
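The counting logic of FIG. 8a can be sketched without real timers by treating "timer 2 reached Y without a reply" and "presence message received" as two events. The `PresenceMonitor` class, its method names, and its fields in the Python below are hypothetical; real hardware would drive these events from timer 1 and timer 2.

```python
from typing import Optional


class PresenceMonitor:
    """Device-side bookkeeping for the query/presence exchange (assumed model)."""

    def __init__(self, max_timeouts: int, master_cid: int, host_key: bytes) -> None:
        self.max_timeouts = max_timeouts          # K in the description
        self.timeouts = 0                         # the timeout counter
        self.signed = 1
        self.master_cid: Optional[int] = master_cid
        self.host_key: Optional[bytes] = host_key

    def on_presence_message(self) -> None:
        # The management master node answered: reset the timeout counter.
        self.timeouts = 0

    def on_reply_timeout(self) -> None:
        # Timer 2 reached Y with no presence message: count one timeout.
        self.timeouts += 1
        if self.timeouts >= self.max_timeouts:
            # K consecutive misses: revoke the management authority.
            self.signed = 0
            self.master_cid = None
            self.host_key = None


mon = PresenceMonitor(max_timeouts=3, master_cid=0x100A, host_key=b"key-a")
mon.on_reply_timeout()
mon.on_reply_timeout()
mon.on_presence_message()        # a reply arrives before the third miss
still_signed = mon.signed
for _ in range(3):
    mon.on_reply_timeout()       # three misses in a row
```

Because a presence message resets the counter, only K consecutive unanswered queries cause the device to drop its management master node.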
可选地,请参阅图8b,图8b是本申请实施例提供的另一种管理主机在位确认流程示意图。该管理主机在位确认流程可以应用于上述图2-图5所示的系统结构中,图8b中涉及的主节点例如可以为上述主节点100a(即第一主节点),涉及的设备例如可以为上述从节点200a(即第一从节点)、或者图5中的交换机A、交换机B、设备C或者设备D等。Optionally, please refer to FIG. 8b. FIG. 8b is a schematic diagram of another management host presence confirmation process provided by an embodiment of the present application. The presence confirmation process of the management host can be applied to the above-mentioned system structure shown in Fig. 2-Fig. 5, the master node involved in Fig. It is the above-mentioned slave node 200a (that is, the first slave node), or the switch A, the switch B, the device C, or the device D in FIG. 5 .
如图8b所示,主节点内部可以配置有计时器3,当该主节点获取到对该设备的管理权限时,计时器3便可以开始计时。如图8b所示,当计时器3数值等于W时(例如为200ms、3s或者10s等),主节点可以向其管理的设备发送在位消息,即管理主节点可以按照预设的时间间隔(例如第一时间间隔)主动向其管理的设备发送在位消息。相应的,设备可以接收该主节点发送的在位消息,并判断该消息是否来自于其管理主节点,若是,则设备可以确定其管理主节点正常在位,若否,则可以丢弃该消息,不需要做任何响应。As shown in FIG. 8b, a timer 3 may be configured inside the master node, and when the master node obtains the management authority of the device, the timer 3 may start timing. As shown in Figure 8b, when the value of the timer 3 is equal to W (such as 200ms, 3s or 10s, etc.), the master node can send an in-position message to the device it manages, that is, the management master node can follow the preset time interval ( For example, the first time interval) actively sends presence messages to the devices it manages. Correspondingly, the device can receive the in-position message sent by the master node, and judge whether the message comes from its master management node. If so, the device can determine that its master management node is in place normally. If not, it can discard the message. No response is required.
可选地,该设备中也可以配置有相应的计时器,该计时器也可以在主节点获取到对该设备的管理权限时开始计时,当该设备内的计时器等于预设时间(例如为W),且设备仍没有接收到其管理主节点主动发送的在位消息时,该设备可以认为其管理主节点异常,未在位,并且设备可以将其管理主节点信息寄存器中的签名比特重新设置为0,进一步地,还可以清除其管理主节点寄存器中保存的主节点编号和主节点密码,从而取消其原管理主节点的管理权限。相应的,若在此期间设备接收到了其管理主节点发送的在位消息,则可以确定该管理主节点正常在位,同时,设备可以对本地维护的计时器进行清零。Optionally, a corresponding timer can also be configured in the device, and the timer can also start counting when the master node obtains the management authority of the device. When the timer in the device is equal to a preset time (for example, W), and the device still has not received the in-position message actively sent by its management master node, the device can consider that its management master node is abnormal and not in position, and the device can reset the signature bit in its management master node information register If it is set to 0, furthermore, the master node number and master node password stored in its management master node register can be cleared, thereby canceling the management authority of its original management master node. Correspondingly, if the device receives the presence message sent by its master management node during this period, it can be determined that the master management node is normally in position, and at the same time, the device can clear the timer maintained locally.
Optionally, a corresponding timer and a counter may also be configured in the device. Similarly, the timer may start timing when the master node obtains management authority over the device. When the timer in the device reaches a preset time (for example, W) and the device has still not received a presence message actively sent by its managing master node, the counter in the device may record one timeout. When the number of timeouts reaches a preset value (for example, 5, 7, or 10), that is, when the managing master node of the device has not actively sent a presence message for a long time, the device may consider that its managing master node is abnormal and not present. The device may then reset the signature bit in its managing master node information register to 0 and, further, clear the master node number and master node password stored in its managing master node register, thereby revoking the management authority of its original managing master node. Correspondingly, if the device successfully receives a presence message actively sent by its managing master node during this period, it may determine that the managing master node is present and operating normally, and the device may reset its locally maintained timer and counter to zero.
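The device-side timer and timeout counter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `ManagedDevice`, its field names, and the `tick` method that advances the local timer are all hypothetical. Setting `max_timeouts=1` corresponds to the timer-only variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedDevice:
    """Hypothetical model of a device that tracks its managing master node."""
    interval_w: float                      # preset time W, in seconds
    max_timeouts: int                      # timeouts tolerated before revoking
    signature_bit: int = 0                 # 1 while a managing master is signed
    master_id: Optional[int] = None        # stored master node number
    master_password: Optional[int] = None  # stored master node password
    elapsed: float = 0.0                   # local timer
    timeouts: int = 0                      # local timeout counter

    def sign(self, master_id: int, password: int) -> None:
        """Record the managing master node and start timing from zero."""
        self.signature_bit = 1
        self.master_id, self.master_password = master_id, password
        self.elapsed, self.timeouts = 0.0, 0

    def on_presence(self, sender_id: int) -> bool:
        """Handle a presence message; return True if it was accepted."""
        if self.signature_bit == 0 or sender_id != self.master_id:
            return False  # not from the managing master node: discard
        self.elapsed, self.timeouts = 0.0, 0  # clear timer and counter
        return True

    def tick(self, dt: float) -> None:
        """Advance the local timer by dt seconds and count timeouts."""
        if self.signature_bit == 0:
            return
        self.elapsed += dt
        while self.elapsed >= self.interval_w and self.signature_bit == 1:
            self.elapsed -= self.interval_w
            self.timeouts += 1
            if self.timeouts >= self.max_timeouts:
                # Revoke the management authority of the original master.
                self.signature_bit = 0
                self.master_id = self.master_password = None
```

A presence message from any node other than the recorded master leaves the timer and counter untouched, which matches the "discard without response" behavior described for FIG. 8b.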
Optionally, the values of X, Y, K, and W described above may all be configured by the managing master node, which may select suitable values according to actual requirements.
Further, as described above, when the device has not received a presence message from its managing master node for a long time and therefore determines that the managing master node is abnormal and revokes its management authority, the device may also send a broadcast message to at least one master node in the system (for example, master node 100a, master node 100b, and master node 100c in the bus system 10 shown in FIG. 2), in order to maintain the robustness of the overall bus topology and ensure that the device has a managing master node that is present and operating normally. The broadcast message may indicate that the device currently has no managing master node. Optionally, the at least one master node may include the original managing master node. Correspondingly, the at least one master node receives the broadcast message and may, based on it, perform managing-master-node signing on the device in an attempt to obtain management authority over the device and become its new managing master node. It should be understood that, as discussed above, whichever of the at least one master node completes signing first obtains management authority over the device. For the specific signing procedure, refer to the description of the embodiment corresponding to FIG. 6 above; details are not repeated here.
It should be noted that, in general, the presence-confirmation interactions between the managing master node and its devices shown in FIG. 8a and FIG. 8b may occur at a very low frequency, so their impact on bus bandwidth is almost negligible. That is, this embodiment of the present application can ensure that the managing master node is present in real time without affecting data transmission or computing efficiency on the bus, thereby ensuring the robustness of the bus topology.
Optionally, refer to FIG. 9, which is a schematic structural diagram of yet another multipath bus subsystem provided by an embodiment of the present application. As shown in FIG. 9, the multipath bus subsystem 10a may further include master node 100b (that is, a second master node) and N4 fourth switches adjacent to master node 100b (for example, fourth switch 41 and fourth switch 42 in FIG. 9). Any fourth switch may be adjacent to any third switch or connected to it through one or more second switches, where N4 is a positive integer less than or equal to N. For example, fourth switch 41 may be adjacent to third switch 31; as another example, fourth switch 41 may be connected to third switch 31 through second switch 22, and so on. This is not specifically limited in this embodiment of the present application.
Correspondingly, master node 100b may also send enumeration messages to slave node 200a multiple times and, based on the switches that each enumeration message passes through from master node 100b to slave node 200a, determine multiple routing paths between master node 100b and slave node 200a. Each routing path may pass, in sequence, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches. In this way, multipath access by multiple master nodes to one slave node is further implemented: one slave node can be enumerated and discovered by multiple master nodes, and those master nodes can subsequently access it over their respective routing paths and use its resources, and so on. Correspondingly, a routing path between master node 100b and slave node 200a may pass through only fourth switches and third switches; in some possible embodiments, it may pass through only one or more of the N4 fourth switches, or only one or more of the N3 third switches, and so on. This is not specifically limited in this embodiment of the present application.
Optionally, for the process by which master node 100b enumerates and discovers slave node 200a over multiple routing paths, refer to the corresponding description of FIG. 5 above; details are not repeated here.
Optionally, master node 100a and slave node 200a may belong to a first node domain, and master node 100b may belong to a second node domain. The second node domain further includes one or more second slave nodes (for example, slave node 200b in FIG. 2). Optionally, the first node domain may further include multiple switches connected to master node 100a (for example, the N1 first switches), and the second node domain may further include multiple switches connected to master node 100b (for example, the N4 fourth switches). This is not specifically limited in this embodiment of the present application. It should be noted that different node domains may belong to different sub-networks, or may run different system management software (for example, different enumeration software as described above, or different operating systems (OS)).
Optionally, master node 100b may send enumeration messages to devices in other node domains (for example, slave node 200a) only after it has assigned device numbers to the devices in its own second node domain (which may include slave nodes and switches), that is, after it has completed enumeration and discovery within the second node domain. This completes multi-master enumeration and discovery across node domains.
As described above, when the bus system contains multiple node domains and correspondingly multiple master nodes, a master node's enumeration and discovery procedure starts within the node domain to which it currently belongs. Until enumeration and discovery of the devices in this node domain is complete, the node domain may have no data flow connection with any device in another node domain. This supports device enumeration and discovery by multiple master nodes across node domains (that is, across networks or across different system management software) and ensures that no conflict arises in which different enumeration software assigns device numbers to the same device multiple times, or in which the same device number is assigned to different devices.
Further, based on the concept that different master nodes may belong to different node domains, when the aforementioned device (for example, slave node 200a) has revoked the management authority of its original managing master node (for example, master node 100a) and sends a broadcast message to at least one master node in the system, it may first send a first-level broadcast message to the master node in its own node domain (for example, master node 100a). If the master node in its own node domain does not respond, the device may send a second-level broadcast message (that is, a cross-node-domain broadcast message) to master nodes in other node domains (for example, master node 100b and master node 100c), and so on.
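The two-level broadcast, combined with the rule that the first master node to complete signing obtains management authority, can be sketched as below. The function name, the representation of masters as `(name, domain)` pairs listed in the order their signatures would complete, and the `responds` set are assumptions made only for this illustration.

```python
def request_new_master(device_domain, masters, responds):
    """Pick a new managing master for a device that currently has none.

    masters:  list of (master_name, node_domain) pairs, assumed to be ordered
              by how quickly each master would complete signing.
    responds: set of master names that answer a broadcast message.
    Returns (broadcast_level, master_name), or (None, None) if nobody answers.
    """
    # First-level broadcast: masters inside the device's own node domain.
    for name, domain in masters:
        if domain == device_domain and name in responds:
            return 1, name
    # Second-level (cross-node-domain) broadcast: masters in other domains.
    for name, domain in masters:
        if domain != device_domain and name in responds:
            return 2, name
    return None, None


masters = [("100a", "domain1"), ("100b", "domain2"), ("100c", "domain3")]
# Master 100a is absent, so the device falls back to the second level.
print(request_new_master("domain1", masters, {"100b", "100c"}))
```

The sketch stops at two levels; the text's "and so on" suggests further levels could be added in the same way.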
Optionally, refer to FIG. 10a to FIG. 10c, which are schematic structural diagrams of a group of multipath, multi-host bus systems provided by an embodiment of the present application.
As shown in FIG. 10a, the bus system may include multiple hosts (for example, master node 100a and master node 100b), multiple switches (for example, switch 1, switch 2, and switch 3 in FIG. 10a), and multiple slave nodes (for example, the accelerator, SSD, smart network interface card, GPU, and TPU in FIG. 10a). Master node 100a may include port A1 and port A2, and master node 100b may include port B1 and port B2. As FIG. 10a shows, in contrast to a traditional tree structure, this embodiment of the present application implements a flattened graph structure in which multiple switches may be arranged in a matrix and connected vertically and horizontally through the bus. This structure can support multipath, multi-master access to any slave node; such access may include, for example, initial enumeration and discovery as well as subsequent resource use and function management.
Optionally, master node 100a, the switches 1, 2, 3, and 4 connected under master node 100a, and the XPU, accelerator, SSD, and smart network interface card connected to switch 1, switch 3, and switch 4 may belong to the first node domain. Master node 100b, the switches 5, 6, 7, and 8 connected under master node 100b, and the TPU, SSD, GPU, and accelerator connected to switch 6, switch 7, and switch 8 may belong to the second node domain.
Optionally, so that the enumeration processes of multiple master nodes (that is, multiple hosts) do not interfere with one another, a master node does not establish a data flow connection with any device in another node domain before it completes enumeration and discovery within its own node domain. That is, it does not enumerate devices in other node domains, and the devices in its own node domain are not discovered by master nodes in other node domains. To this end, as shown in FIG. 10a, this embodiment of the present application further provides a switch with a special port, which is used to adjoin a switch in another node domain that likewise has a special port. Examples are switch 2, switch 4, switch 5, and switch 7 in FIG. 10a, whose special ports are the ports marked in black. By default (that is, while the master node has not completed enumeration and discovery within its own node domain), when the system enumeration software scans a port of this type, it does not continue discovery and enumeration beyond that port. For example, when master node 100a, enumerating from port A2, reaches the special port marked in black on switch 2, it does not enumerate the device connected under that port (that is, switch 5) but continues enumerating the devices in the first node domain through other ports. As another example, when master node 100b, enumerating from port B1, reaches the special port marked in black on switch 5, it does not enumerate the device connected under that port (that is, switch 2) but continues enumerating the devices in the second node domain through other ports. Correspondingly, after master node 100a and master node 100b have each completed enumeration and discovery of all devices in their respective node domains (for example, all devices in each node domain have been assigned device numbers), the special ports of switch 2 and switch 5 may be "opened", and master node 100a and master node 100b can then perform cross-node-domain enumeration and discovery through these special ports.
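The effect of a closed special port on discovery can be sketched with a breadth-first search that refuses to cross links marked as special until they are opened. The adjacency format, the `via_special_port` flag, the function name `discover`, and the small topology are assumptions for illustration only; they loosely mirror master node 100a, switch 2, and switch 5 in FIG. 10a rather than reproduce the figure.

```python
from collections import deque


def discover(links, master, special_open=False):
    """Return the devices reachable from `master`, in breadth-first order.

    links: {node: [(neighbor, via_special_port), ...]}
    Links that leave through a special (cross-node-domain) port are skipped
    unless special_open is True.
    """
    found = []
    seen = {master}
    queue = deque([master])
    while queue:
        node = queue.popleft()
        for neighbor, via_special_port in links.get(node, []):
            if via_special_port and not special_open:
                continue  # stop discovery at a closed special port
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
    return found


# Hypothetical topology: switch2 reaches switch5 only through a special port.
links = {
    "100a": [("switch1", False), ("switch2", False)],
    "switch1": [("ssd", False)],
    "switch2": [("switch5", True)],
    "switch5": [("tpu", False)],
}
print(discover(links, "100a"))                     # own node domain only
print(discover(links, "100a", special_open=True))  # after the port is opened
```

The sketch only records which devices are discovered. It deliberately does not assign device numbers on the cross-domain pass, since the text states that numbers are assigned by the master node of each device's own node domain.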
For example, refer to FIG. 11a and FIG. 11b, which are schematic diagrams of an enumeration process in a multipath, multi-host scenario provided by an embodiment of the present application. As shown in FIG. 11a, master node 100a and master node 100b run different operating systems, namely OS 1 and OS 2. Master node 100a includes CPU-A0 and CPU-A1 and the corresponding ports A1, A2, A3, and A4; master node 100b includes CPU-B0 and CPU-B1 and the corresponding ports B1, B2, B3, and B4.
As shown in FIG. 11a, at this point master node 100a and master node 100b have each performed part of their enumeration and discovery, and some devices (for example, the GPU and SSD in the first node domain and the accelerator in the second node domain) have not yet been assigned device numbers. The first node domain, where master node 100a is located, includes switch E with device number cid4, whose port A is a special cross-node-domain port. After the enumeration software of master node 100a (for example, OS 1) discovers port A of switch E, it does not discover or enumerate the topology behind port A, because it has not yet finished enumerating the other ports that have established physical links and are not cross-node-domain ports. Similarly, the second node domain, where master node 100b is located, includes switch F with device number cid2, whose port B is a special cross-node-domain port. After the enumeration software of master node 100b (for example, OS 2) discovers port B of switch F, it likewise does not discover or enumerate the topology behind port B, because it has not yet finished enumerating the other ports that have established physical links and are not cross-node-domain ports.
As shown in FIG. 11b, at this point master node 100a and master node 100b have each completed enumeration and discovery of the bus topology and all devices in their respective node domains, and all devices have been assigned device numbers. The system software of master node 100a and master node 100b can therefore turn on the data flow switch of special port A of switch E and of special port B of switch F, respectively, in order to enumerate and discover devices in node domains other than their own. It should be noted that, at this point, the devices in each node domain have completed managing-master-node signing, and in each case it is the master node in that node domain that has obtained managing-master-node authority over the devices in the domain. For example, master node 100a is the managing master node of the TPU, GPU, and SSD in the first node domain, and master node 100b is the managing master node of the GPU and accelerator in the second node domain. Therefore, master node 100a has only data-plane usage rights over the devices in the second node domain and no managing-master-node authority on the management plane; correspondingly, master node 100b has only data-plane usage rights over the devices in the first node domain and no managing-master-node authority on the management plane.
Optionally, after the system software in each node domain has completed discovery of the bus topology and enumeration of all devices in its own node domain, master node 100a's enumeration and discovery of the devices in the second node domain may alternatively be accomplished by having the system software in the second node domain (for example, OS 2) report directly to the system software on master node 100a (for example, OS 1) through an agreed description structure and interface. In this way, the system software on master node 100a can directly obtain the topology information of all devices in the node domain of master node 100b. The same applies to master node 100b's discovery and enumeration of the devices in the first node domain of master node 100a; details are not repeated here.
As described above, the special port provided by this embodiment of the present application causes the enumeration performed independently by each master node to stop at that port: the master node does not discover or enumerate the devices or topology behind the port and does not continue its breadth-first or depth-first search through it. Only after the system software of each master node has completed enumeration and discovery within its own node domain is the data path switch of the special port in that domain turned on, allowing enumeration across master nodes and across node domains.
As shown in FIG. 10b, the bus system may likewise include master node 100a, master node 100b, multiple switches, multiple slave nodes, and so on. For a description of FIG. 10b, refer to the description of the embodiment corresponding to FIG. 10a above; details are not repeated here.
Optionally, as shown in FIG. 10b, switch 2 under master node 100a and switch 5 under master node 100b each include a port capable of network access, that is, a port with network interface card functionality (for example, the ports marked in gray in FIG. 10b). Master node 100a and master node 100b can thus be connected over a network through switch 2 and switch 5; in other words, master node 100b may be a remote device of master node 100a. In this way, a device can be accessed and used not only by a nearby master node connected by wire but also by other master nodes connected remotely and wirelessly. A host can therefore call on remote computing resources and stored data, which better meets its demand for large amounts of computing resources when performing complex computation. This greatly enhances the scalability of the entire bus system and further overcomes the limitations of existing PCIe bus connections.
As shown in FIG. 10c, the bus system may include master node 100a, master node 100b, master node 100c, multiple switches, multiple slave nodes, and so on. The bus system may include both switches with special ports and switches used for network connection. Optionally, for a description of FIG. 10c, refer to the descriptions of the embodiments corresponding to FIG. 10a and FIG. 10b above; details are not repeated here. The bus system shown in FIG. 10c may be a typical graph-structure-based bus system provided by an embodiment of the present application, which makes the bus interconnect considerably more scalable, increases its capacity, and enlarges the design space.
Optionally, the special ports and the ports with network interface card functionality shown in FIG. 10a to FIG. 10c may be indicated directly by the relevant function register group in the configuration space of the switch device, so as to inform the system enumeration software.
It should be noted that FIG. 10a to FIG. 10c are merely examples and do not specifically limit the multipath, multi-master bus system of the embodiments of the present application.
In summary, based on the concept of a graph structure, the embodiments of the present application provide a bus system that breaks the limitation of the top-down tree structure of the existing PCIe bus interconnect topology and implements an interconnect topology with a flattened graph structure. In this way, the interconnect bus protocol can support multi-host and multipathing natively and at low cost, without additional special bus interconnect devices (for example, the CXL switch and type 3 device in FIG. 1 above) and without a complex bus topology discovery and enumeration procedure. On this basis, the embodiments of the present application further provide a managing-host signing procedure, a managing-host presence-confirmation procedure, a multi-host enumeration procedure across node domains, and so on. These ensure that, while multi-host and multipathing are implemented, the entire multi-host, multipath bus system (in other words, the entire graph-structured bus topology) remains clearly manageable, secure, and reliable.
In addition, it is emphasized again that the embodiments of the present application aim, based on the concept of a graph structure, to break the tree structure with a strict top-down hierarchy used in the prior art, so as to implement the multi-host and multipathing described above. The specific connections among the nodes are not specifically limited. In the graph structure of the embodiments of the present application, not all nodes need to be adjacent. Moreover, when a node device (for example, a GPU) has multiple input ports, the node device may adjoin multiple hosts directly rather than through a switch, thereby implementing multi-host and multipathing more conveniently.
It should be noted that, in addition to on-chip bus interconnection (network on chip), the technical solutions provided by the embodiments of the present application may also be applied to device management in very-large-scale network interconnection, for example the Internet or the interconnection of large numbers of servers or computing nodes in a data center. This is not specifically limited in the embodiments of the present application.
Refer to FIG. 12, which is a schematic flowchart of a communication method provided by an embodiment of the present application. The communication method may be applied to a bus system (for example, the bus system shown in FIG. 2, FIG. 5, or FIG. 10a to FIG. 10c). The bus system may include multiple master nodes, multiple switches, and multiple slave nodes, which may form a graph structure through the bus. Any multipath bus subsystem in the graph structure (for example, the multipath bus subsystem 10a shown in FIG. 3 and FIG. 4a to FIG. 4d) may include a first master node, a first slave node, and N switches. The N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node. Any first switch is adjacent to any third switch or connected to it through one or more second switches. N1, N2, and N3 are all positive integers less than or equal to N. The communication method may include the following steps S401 and S402.
Step S401: Send, by the first host, enumeration messages to the first node device multiple times.
Step S402: Determine, based on at least one switch, multiple routing paths between the first host and the first node device, where the at least one switch is the switch or switches that each enumeration message passes through from the first host to the first node device. Each routing path passes, in sequence, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches, where S is a natural number less than or equal to N3.
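To make the notion of "multiple routing paths" concrete, the sketch below lists every loop-free path between a master node and a slave node in a small switch graph. This is an illustrative graph search under assumed names, not the procedure of steps S401 and S402 themselves: in the method, the paths are derived from the switches that each enumeration message actually traverses, whereas the sketch assumes the adjacency is already known. The node names and the topology are hypothetical.

```python
def routing_paths(adjacency, master, slave):
    """Return every loop-free path from master to slave as a list of nodes."""
    paths = []

    def walk(node, path):
        if node == slave:
            paths.append(path)
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                walk(nxt, path + [nxt])

    walk(master, [master])
    return paths


# Hypothetical subsystem: two first switches, one second switch, and one
# third switch between the master node and the slave node.
adjacency = {
    "master": ["first1", "first2"],
    "first1": ["second1", "third1"],
    "first2": ["second1"],
    "second1": ["third1"],
    "third1": ["slave"],
}
for path in routing_paths(adjacency, "master", "slave"):
    print(" -> ".join(path))
```

In this topology the path through `first1` directly to `third1` uses no second switch, which corresponds to the case S = 0.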
In some possible implementations, the multipath bus subsystem further includes a second master node, and the N switches further include N4 fourth switches adjacent to the second master node. Any fourth switch is adjacent to any third switch or connected to it through one or more second switches, and N4 is a positive integer less than or equal to N. The method further includes: sending, by the second master node, enumeration messages to the first slave node multiple times, and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node, where the at least one switch is the switch or switches that each enumeration message passes through from the second master node to the first slave node. Each of the multiple routing paths passes, in sequence, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
In some possible implementations, the multipath bus subsystem includes a first node domain and a second node domain. The first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes. The sending, by the second master node, of enumeration messages to the first slave node multiple times and the determining, based on at least one switch, of multiple routing paths between the second master node and the first slave node include: after all first slave nodes in the first node domain have been assigned device numbers by the first master node and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending, by the second master node, enumeration messages to the first slave node multiple times, and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node.
In some possible implementations, the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain. A first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch. If some or all of the first slave nodes in the first node domain have not yet been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not yet been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state. If all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends enumeration messages to the first slave node multiple times through the second port and the first port, and determines, based on at least one switch, multiple routing paths between the second master node and the first slave node.
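The gating condition on the cross-domain data link can be sketched as follows. This is only an illustrative model under stated assumptions: the `NodeDomain` class, its `device_numbers` table, and the function `cross_domain_link_open` are hypothetical names, and the actual link control in the switches is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeDomain:
    # Slave name -> assigned device number, or None while not yet enumerated.
    device_numbers: Dict[str, Optional[int]] = field(default_factory=dict)

    def fully_enumerated(self) -> bool:
        """True once every slave in the domain holds a device number."""
        return all(num is not None for num in self.device_numbers.values())


def cross_domain_link_open(first: NodeDomain, second: NodeDomain) -> bool:
    """The link between the first port and the second port is open only
    when both domains have finished their own enumeration."""
    return first.fully_enumerated() and second.fully_enumerated()


domain1 = NodeDomain({"slave1a": 1, "slave1b": 2})
domain2 = NodeDomain({"slave2a": 1, "slave2b": None})  # slave2b still pending
state_before = cross_domain_link_open(domain1, domain2)

domain2.device_numbers["slave2b"] = 2                  # second master finishes
state_after = cross_domain_link_open(domain1, domain2)
```

Keeping the link closed until both checks pass means that cross-domain enumeration messages cannot reach a slave node that its own master node has not yet numbered.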
In some possible implementations, the second master node is a remote master node that has a network connection to the first master node through a switch. The method further includes: accessing, by the second master node over the network connection, the first slave node in the first node domain, so as to invoke computing resources in the first slave node or read data stored in the first slave node.
Optionally, for details of the communication method, refer to the description of the embodiments corresponding to FIG. 2 to FIG. 11b above; they are not repeated here.
Optionally, each method procedure in the communication method described in the embodiments of this application may be implemented in software, hardware, or a combination thereof. A hardware implementation may include logic circuits, algorithm circuits, analog circuits, and the like. A software implementation may include program instructions, which may be regarded as a software product that is stored in a memory and can be run by a processor to implement the relevant functions.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium may store a program that, when executed by a processor, causes the processor to perform some or all of the steps of any one of the methods described in the foregoing method embodiments.
An embodiment of this application further provides a computer program. The computer program includes instructions that, when the computer program is executed by a multi-core processor, cause the processor to perform some or all of the steps of any one of the methods described in the foregoing method embodiments.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments. It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions; however, a person skilled in the art should understand that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the methods in the embodiments of this application. The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a double data rate synchronous dynamic random access memory (double data rate, DDR), a flash memory (flash), or a random access memory (RAM).
The foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (27)

  1. A bus system, characterized in that the system is a graph structure formed by multiple master nodes, multiple switches, and multiple slave nodes connected by buses; any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches; N1, N2, and N3 are each a positive integer less than or equal to N;
    the first master node is configured to send enumeration messages to the first slave node multiple times and to determine, based on at least one switch, multiple routing paths between the first master node and the first slave node; the at least one switch is the switch traversed by each enumeration message from the first master node to the first slave node; wherein each of the multiple routing paths passes, in order, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
  2. The system according to claim 1, characterized in that
    the first master node is further configured to query, based on the enumeration message sent, a visible bit in a routing status register of the first slave node, and, if the visible bit is 0, to assign a corresponding device number to the first slave node; wherein a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration;
    the first slave node is configured to save the device number assigned by the first master node in the routing status register and to set the visible bit to 1; wherein a visible bit of 1 indicates that the first slave node has been discovered by enumeration.
  3. The system according to any one of claims 1-2, characterized in that
    the first master node is further configured to send a first configuration message to the first slave node to obtain management authority over the first slave node; the first configuration message carries a master node password and a master node number of the first master node;
    the first slave node is further configured to receive the first configuration message and, based on the first configuration message, to set a signature bit in a managing-master-node information register of the first slave node to 1; wherein a signature bit of 1 indicates that the first slave node currently has a managing master node and that other master nodes in the system cannot obtain management authority over the first slave node;
    the first slave node is further configured to save the master node password and the master node number of the first master node in the managing-master-node information register.
  4. The system according to claim 3, characterized in that
    the first master node is further configured to send a second configuration message to the first slave node to relinquish management authority over the first slave node; the second configuration message carries the master node password and the master node number of the first master node;
    the first slave node is further configured to receive the second configuration message and, if the master node password and the master node number of the first master node carried in the second configuration message match the master node password and the master node number saved in the managing-master-node information register, to set the signature bit in the managing-master-node information register to 0; wherein a signature bit of 0 indicates that the first slave node currently has no managing master node.
  5. The system according to claim 3, characterized in that
    the first master node is further configured to, after obtaining management authority over the first slave node, send a presence message to the first slave node in response to a query message sent by the first slave node; or to send the presence message to the first slave node at a first time interval.
  6. The system according to claim 5, characterized in that
    the first slave node is further configured to, when a preset condition is met, set the signature bit in the managing-master-node information register to 0, so as to revoke the management authority of the first master node over the first slave node; wherein
    the preset condition includes: after the first master node has obtained management authority over the first slave node, the first slave node does not receive the presence message sent by the first master node within a preset time, or the first slave node has sent K query messages to the first master node without receiving the presence message sent by the first master node; K is an integer greater than or equal to 1.
  7. The system according to claim 6, characterized in that
    the first slave node is further configured to send a broadcast message to at least one master node in the system; the broadcast message indicates that the first slave node currently has no managing master node;
    the at least one master node is configured to receive the broadcast message and, based on the broadcast message, to send the first configuration message to the first slave node to obtain management authority over the first slave node.
  8. The system according to any one of claims 1-7, characterized in that the multipath bus subsystem further includes a second master node; the N switches further include N4 fourth switches adjacent to the second master node; wherein any fourth switch is adjacent to any third switch or is connected to it through one or more second switches; N4 is a positive integer less than or equal to N;
    the second master node is configured to send enumeration messages to the first slave node multiple times and to determine, based on at least one switch, multiple routing paths between the second master node and the first slave node; the at least one switch is the switch traversed by each enumeration message from the second master node to the first slave node; wherein each of the multiple routing paths passes, in order, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
  9. The system according to claim 8, characterized in that the multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes; the second master node is specifically configured to:
    after all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, send enumeration messages to the first slave node multiple times and determine, based on at least one switch, multiple routing paths between the second master node and the first slave node.
  10. The system according to claim 9, characterized in that the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain; a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch;
    if the first node domain still contains a first slave node that has not been assigned a device number by the first master node, and/or the second node domain still contains a second slave node that has not been assigned a device number by the second master node, the data link between the first port and the second port is in a closed state;
    if all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends enumeration messages to the first slave node multiple times through the second port and the first port, and determines, based on at least one switch, multiple routing paths between the second master node and the first slave node.
  11. The system according to any one of claims 9-10, characterized in that the second master node is a remote master node that has a network connection to the first master node through a switch; the second master node is specifically configured to: access, over the network connection, the first slave node in the first node domain, so as to invoke computing resources in the first slave node or read data stored in the first slave node.
  12. The system according to any one of claims 1 to 11, characterized in that the first master node is a central processing unit (CPU) in a first terminal;
    the first master node is further configured to invoke, over a network connection, computing resources of at least one second slave node or to read data stored in the at least one second slave node; the second slave node is a graphics processing unit (GPU), a solid-state drive, an accelerator, a network interface card, or a tensor processing unit (TPU) in a second terminal.
  13. The system according to any one of claims 1-12, characterized in that the slave node is any one of a GPU, a solid-state drive, an accelerator, a network interface card, a TPU, an embedded neural-network processing unit (NPU), a digital signal processor (DSP), an image signal processor (ISP), or a switch.
  14. The system according to any one of claims 1-13, characterized in that the master node includes one or more central processing units (CPUs).
  15. A communication method, characterized in that the method is applied to a bus system, the bus system being a graph structure formed by multiple master nodes, multiple switches, and multiple slave nodes connected by buses; any multipath bus subsystem in the graph structure includes a first master node, a first slave node, and N switches; the N switches include N1 first switches adjacent to the first master node, N2 second switches, and N3 third switches adjacent to the first slave node; wherein any first switch is adjacent to any third switch or is connected to it through one or more second switches; N1, N2, and N3 are each a positive integer less than or equal to N; the method includes:
    sending, by the first master node, enumeration messages to the first slave node multiple times, and determining, based on at least one switch, multiple routing paths between the first master node and the first slave node; the at least one switch is the switch traversed by each enumeration message from the first master node to the first slave node; wherein each routing path passes, in order, through at least one or more of the N1 first switches, S of the N2 second switches, and one or more of the N3 third switches; S is a natural number less than or equal to N3.
  16. The method according to claim 15, characterized in that the method further includes:
    querying, by the first master node and based on the enumeration message sent, a visible bit in a routing status register of the first slave node, and, if the visible bit is 0, assigning a corresponding device number to the first slave node; wherein a visible bit of 0 indicates that the first slave node has not yet been discovered by enumeration;
    saving, by the first slave node, the device number assigned by the first master node in the routing status register, and setting the visible bit to 1; wherein a visible bit of 1 indicates that the first slave node has been discovered by enumeration.
  17. The method according to any one of claims 15-16, characterized in that the method further includes:
    sending, by the first master node, a first configuration message to the first slave node to obtain management authority over the first slave node; the first configuration message carries a master node password and a master node number of the first master node;
    receiving, by the first slave node, the first configuration message and, based on the first configuration message, setting a signature bit in a managing-master-node information register of the first slave node to 1; wherein a signature bit of 1 indicates that the first slave node currently has a managing master node and that other master nodes in the system cannot obtain management authority over the first slave node;
    saving, by the first slave node, the master node password and the master node number of the first master node in the managing-master-node information register.
  18. The method according to claim 17, characterized in that the method further includes:
    sending, by the first master node, a second configuration message to the first slave node to relinquish management authority over the first slave node; the second configuration message carries the master node password and the master node number of the first master node;
    receiving, by the first slave node, the second configuration message and, if the master node password and the master node number of the first master node carried in the second configuration message match the master node password and the master node number saved in the managing-master-node information register, setting the signature bit in the managing-master-node information register to 0; wherein a signature bit of 0 indicates that the first slave node currently has no managing master node.
  19. The method according to claim 17, characterized in that the method further includes:
    after management authority over the first slave node has been obtained, sending, by the first master node, a presence message to the first slave node in response to a query message sent by the first slave node; or sending the presence message to the first slave node at a first time interval.
  20. The method according to claim 19, characterized in that the method further includes:
    when a preset condition is met, setting, by the first slave node, the signature bit in the managing-master-node information register to 0, so as to revoke the management authority of the first master node over the first slave node; wherein
    the preset condition includes: after the first master node has obtained management authority over the first slave node, the first slave node does not receive the presence message sent by the first master node within a preset time, or the first slave node has sent K query messages to the first master node without receiving the presence message sent by the first master node; K is an integer greater than or equal to 1.
  21. The method according to claim 20, characterized in that the method further includes:
    sending, by the first slave node, a broadcast message to at least one master node in the system; the broadcast message indicates that the first slave node currently has no managing master node;
    receiving, by the at least one master node, the broadcast message and, based on the broadcast message, sending the first configuration message to the first slave node to obtain management authority over the first slave node.
  22. The method according to any one of claims 15-21, characterized in that the multipath bus subsystem further includes a second master node; the N switches further include N4 fourth switches adjacent to the second master node; wherein any fourth switch is adjacent to any third switch or is connected to it through one or more second switches; N4 is a positive integer less than or equal to N; the method further includes:
    sending, by the second master node, enumeration messages to the first slave node multiple times, and determining, based on at least one switch, multiple routing paths between the second master node and the first slave node; the at least one switch is the switch traversed by each enumeration message from the second master node to the first slave node; wherein each of the multiple routing paths passes, in order, through at least one or more of the N4 fourth switches, S of the N2 second switches, and one or more of the N3 third switches.
  23. The method according to claim 22, characterized in that the multipath bus subsystem includes a first node domain and a second node domain; the first node domain includes the first master node and multiple first slave nodes, and the second node domain includes the second master node and multiple second slave nodes; the sending, by the second master node, of enumeration messages to the first slave node multiple times and the determining, based on at least one switch, of multiple routing paths between the second master node and the first slave node includes:
    after all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, sending, by the second master node, enumeration messages to the first slave node multiple times, and determining, based on at least one switch, the multiple routing paths between the second master node and the first slave node.
  24. The method according to claim 23, characterized in that the N switches further include a first cross-node-domain switch belonging to the first node domain and a second cross-node-domain switch belonging to the second node domain; a first port of the first cross-node-domain switch is connected to a second port of the second cross-node-domain switch;
    if some or all of the first slave nodes in the first node domain have not yet been assigned device numbers by the first master node, and/or some or all of the second slave nodes in the second node domain have not yet been assigned device numbers by the second master node, the data link between the first port and the second port is in a closed state;
    if all first slave nodes in the first node domain have been assigned device numbers by the first master node, and all second slave nodes in the second node domain have been assigned device numbers by the second master node, the data link between the first port and the second port is in an open state, so that the second master node sends enumeration messages to the first slave node multiple times through the second port and the first port, and determines, based on at least one switch, multiple routing paths between the second master node and the first slave node.
  25. The method according to any one of claims 23 to 24, wherein the second master node is a remote master node that has a network connection to the first master node through a switch; and the method further comprises: accessing, by the second master node over the network connection, the first slave node in the first node domain, so as to invoke computing resources in the first slave node or read data stored in the first slave node.
  26. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a computer or a processor, implements the method according to claims 15-25.
  27. A computer program, wherein the computer program includes instructions that, when the computer program is executed by a computer or a processor, cause the computer or the processor to perform the method according to claims 15-25.
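
The gating and path-discovery behavior recited in claims 22 to 24 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class `NodeDomain`, the functions `cross_domain_link_open`, `enumerate_paths`, and `discover_paths`, the sequential numbering scheme, the sample topology `LINKS`, and all node and switch names are hypothetical and chosen only for illustration. A depth-first search stands in for the repeated enumeration messages, each of which would record the switches it traverses.

```python
from dataclasses import dataclass, field


@dataclass
class NodeDomain:
    """A master node plus the slave nodes it numbers (hypothetical model)."""
    master: str
    slaves: list
    device_numbers: dict = field(default_factory=dict)

    def assign_device_numbers(self) -> None:
        # Assumption: numbers are handed out sequentially, starting at 1.
        for number, slave in enumerate(self.slaves, start=1):
            self.device_numbers[slave] = number

    def fully_numbered(self) -> bool:
        return all(slave in self.device_numbers for slave in self.slaves)


def cross_domain_link_open(first: NodeDomain, second: NodeDomain) -> bool:
    """Claim 24 condition: the inter-domain link is open only when every
    slave in both domains already has a device number."""
    return first.fully_numbered() and second.fully_numbered()


def enumerate_paths(links: dict, source: str, target: str) -> list:
    """Collect every loop-free path from source to target (depth-first)."""
    paths = []

    def walk(node, trail):
        if node == target:
            paths.append(trail)
            return
        for nxt in links.get(node, ()):
            if nxt not in trail:  # never revisit a hop on the same path
                walk(nxt, trail + [nxt])

    walk(source, [source])
    return paths


def discover_paths(first, second, links, target):
    """Claim 23 ordering: enumerate across domains only after numbering."""
    if not cross_domain_link_open(first, second):
        return []  # link closed, enumeration messages cannot cross
    return enumerate_paths(links, second.master, target)


# Hypothetical topology: second master M2 -> fourth switch SW4 ->
# second switches SW2a / SW2b -> third switch SW3 -> first slave S1.
LINKS = {
    "M2": ["SW4"],
    "SW4": ["SW2a", "SW2b"],
    "SW2a": ["SW3"],
    "SW2b": ["SW3"],
    "SW3": ["S1"],
}
```

In this sketch, calling `discover_paths` before both domains have run `assign_device_numbers` yields an empty list, which mirrors the closed data link. Once both domains are fully numbered, the same call yields two paths, one through each second switch.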
PCT/CN2022/105758 2021-09-14 2022-07-14 Bus system, communication method, and related device WO2023040447A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111083972.8A CN115811446A (en) 2021-09-14 2021-09-14 Bus system, communication method and related equipment
CN202111083972.8 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023040447A1 (en) 2023-03-23

Family

ID=85482023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105758 WO2023040447A1 (en) 2021-09-14 2022-07-14 Bus system, communication method, and related device

Country Status (2)

Country Link
CN (1) CN115811446A (en)
WO (1) WO2023040447A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497432A (en) * 2011-12-13 2012-06-13 华为技术有限公司 Multi-path accessing method for input/output (I/O) equipment, I/O multi-path manager and system
US20130163474A1 (en) * 2011-12-27 2013-06-27 Prashant R. Chandra Multi-protocol i/o interconnect architecture
CN105721357A (en) * 2016-01-13 2016-06-29 华为技术有限公司 Exchange device, and peripheral component interconnection express (PCIe) system and initialization method thereof
CN106100953A (en) * 2016-05-20 2016-11-09 北京百度网讯科技有限公司 Method, apparatus, and system for generating a PCIe device sharing network
US20200081858A1 (en) * 2018-09-10 2020-03-12 GigaIO Networks, Inc. Methods and apparatus for high-speed data bus connection and fabric management
CN113038299A (en) * 2021-03-02 2021-06-25 深圳市信锐网科技术有限公司 Switch, configuration method, control method and storage medium

Also Published As

Publication number Publication date
CN115811446A (en) 2023-03-17

Similar Documents

Publication Publication Date Title
US11394645B2 (en) System and method for supporting inter subnet partitions in a high performance computing environment
CN103905426B (en) Method and apparatus for securing and isolating host-to-host messaging on a PCIe fabric
TWI357561B (en) Method, system and computer program product for vi
US7685335B2 (en) Virtualized fibre channel adapter for a multi-processor data processing system
JP5763873B2 (en) Method, computer program, and data processing system for initializing shared memory for communication between multiple root complexes of a data processing system
JP5315209B2 (en) Using peripheral interconnect I / O virtualization devices to create redundant configurations
JP5362980B2 (en) Method, program, and system for communicating between a first host system and a second host system in a data processing system (for communication between host systems using socket connections and shared memory) System and method)
WO2020233120A1 (en) Scheduling method and apparatus, and related device
TWI515572B (en) System and method for register access in distributed virtual bridge environment
JP6845431B2 (en) Information processing device and control method of information processing device
US9396101B2 (en) Shared physical memory protocol
US20190286488A1 (en) Resource management method, host, and endpoint
US10810036B1 (en) Traffic management on an interconnect
US7136907B1 (en) Method and system for informing an operating system in a system area network when a new device is connected
JP2008152783A (en) Method, program, and system for communicating between first host system and second host system in data processing system (system and method for communication between host systems using transaction protocol and shared memory)
JP2008152787A (en) Method, program and system for hot plugging component into communication fabric running in data processing system (system and method for hot plug/remove of new component in running pcie fabric)
US20100095080A1 (en) Data Communications Through A Host Fibre Channel Adapter
US20060004837A1 (en) Advanced switching peer-to-peer protocol
JP2008539484A (en) Universal serial bus function delegation
WO2014201623A1 (en) Method, apparatus and system for data transmission, and physical network card
US10397096B2 (en) Path resolution in InfiniBand and ROCE networks
US11824793B2 (en) Unlocking computing resources for decomposable data centers
US11601377B1 (en) Unlocking computing resources for decomposable data centers
WO2023040447A1 (en) Bus system, communication method, and related device
US11003618B1 (en) Out-of-band interconnect control and isolation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22868824

Country of ref document: EP

Kind code of ref document: A1