WO2013086861A1 - Method for accessing multi-path input/output (i/o) equipment, i/o multi-path manager and system - Google Patents


Info

Publication number
WO2013086861A1
Authority
WO
WIPO (PCT)
Prior art keywords
hard partition
hard
pci
node
partition
Prior art date
Application number
PCT/CN2012/079307
Other languages
French (fr)
Chinese (zh)
Inventor
雕峻峰
刘云海
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2013086861A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network

Definitions

  • The present invention relates to the field of communications, and in particular to a method for multi-path access to an I/O device, an I/O multipath manager, and a system.
  • Currently, computing nodes in network communications, such as servers, directly access external I/O (Input/Output) devices through PCI-E (Peripheral Component Interconnect-Express).
  • PCI-E Peripheral Component Interconnect-Express
  • Most such designs are based on a cluster scenario, in which each interface (port) of the PCI-E switch is by default connected to a different system; scenarios based on NUMA (Non-Uniform Memory Access) systems are not well considered.
  • NUMA Non-Uniform Memory Access
  • In a NUMA system, all compute nodes are divided into several node sets, and each node set is electrically isolated from the others. Such a set is called a hard partition (or a large node). Each hard partition includes one or more computing nodes, each of which has an RC (Root Complex) that can connect externally to PCI-E.
  • RC Root Complex
  • a NUMA system is composed of multiple computing nodes aggregated through a NUMA network.
  • the hardware resources of these physical nodes are managed by an OS (Operating System) or a hypervisor.
  • OS Operating System
  • Hypervisor: a management program that, like an OS, can manage the hardware resources of the physical nodes; the NUMA system is taken as an example below.
  • FIG. 1 is a networking diagram showing only one large node (or hard partition). It includes a NUMA aggregation network 1, a hard partition 2 with three compute nodes 21-23, a PCI-E switch 3, and an external I/O device 4. Each compute node has at least one CPU (Central Processing Unit), an NC (Node Controller), and a root complex (RC) located in the IOH (Input/Output Hub). The external I/O device 4 includes a Fibre Channel network card 31 and an Ethernet card 32.
  • CPU Central Processing Unit
  • NC Node Controller
  • It is assumed that compute node 21 is the master node, which handles primary partition startup and resource management, and that compute nodes 22 and 23 are slave nodes.
  • the three compute nodes form a 6-way system through a NUMA aggregation network (actually, this PCI-E switch is also connected to other hard-partitioned compute nodes in the NUMA system).
  • In the system shown in FIG. 1, PCI-E link 01, which connects compute node 21 to the PCI-E switch, is drawn as a solid line, while PCI-E links 02 and 03, which connect compute nodes 22 and 23 to the PCI-E switch, are drawn as dotted lines.
  • FIG. 2 is a schematic diagram of the resource information table of the system shown in FIG. 1.
  • the compute nodes 22 ⁇ 23 cannot directly see I/O devices such as Ethernet cards and Fibre Channel network cards.
  • To access I/O devices such as Ethernet cards, compute nodes 22 and 23 must go over a NUMA link through the NUMA aggregation network and then through the link of compute node 21. This increases the delay.
  • Under heavy traffic, the primary node itself may become an I/O bottleneck, and the PCI-E links from the other slave nodes go unused, so bandwidth utilization is low.
  • Embodiments of the present invention provide a method for multi-path access to an I/O device, an I/O multipath manager, and a system, which make the PCI-E links between all computing nodes and the PCI-E switch valid, eliminating delay and bottlenecks under high traffic and improving bandwidth utilization.
  • A method for multi-path access to an I/O device includes: configuring a PCI-E switch according to received configuration information of a first hard partition, to isolate the hard partitions other than the first hard partition, so that the computing nodes of the first hard partition access only the I/O devices of the first hard partition; and establishing, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, so that the operating system instructs the computing node performing an I/O task to access the I/O devices of the first hard partition according to the mapping relationship.
  • An I/O multipath manager includes: a PCI-E switch configuration module, configured to configure a Peripheral Component Interconnect Express (PCI-E) switch according to received configuration information of a first hard partition, to isolate the hard partitions other than the first hard partition, so that the computing nodes of the first hard partition access only the I/O devices of the first hard partition;
  • and an I/O multipath configuration module, configured to establish, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes and the I/O devices of the first hard partition, so that the operating system instructs the computing node performing an I/O task to access the I/O devices of the first hard partition according to the mapping relationship.
  • a system for providing multi-path access to an I/O device includes:
  • An I/O multipath manager, configured to configure a PCI-E switch according to received configuration information of a first hard partition, to isolate the hard partitions other than the first hard partition, so that the computing nodes of the first hard partition access only the I/O devices of the first hard partition, and to establish, according to the received configuration information, a mapping relationship between the computing nodes and the I/O devices of the first hard partition;
  • wherein the I/O multipath manager is located in the firmware or in an operating system; an aggregation network, configured to connect the computing nodes in the system, so that the system controls the computing nodes through one operating system; and at least two hard partitions, each including at least one computing node;
  • a PCI-E switch, configured to establish connections between the computing nodes and the I/O devices, so that each computing node accesses the I/O devices of its own hard partition over the PCI-E link established between itself and the PCI-E switch;
  • an I/O device, configured to connect the computing nodes to an external network; and a storage device, configured to store firmware, an operating system, and I/O applications.
  • The method for multi-path access to an I/O device, the I/O multipath manager, and the system provided by the embodiments of the present invention make the PCI-E links between all computing nodes and the PCI-E switch valid, so that the slave nodes can also access I/O devices through their own PCI-E links, eliminating delay and bottlenecks under high traffic and improving bandwidth utilization.
  • FIG. 1 is a schematic diagram of a networking diagram of a prior art NUMA system
  • FIG. 2 is a schematic structural diagram of a system resource information table of a prior art NUMA system
  • FIG. 3 is a schematic flowchart of a method for multi-path access to an I/O device according to an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a method for multi-path access to an I/O device according to an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of a method for multi-path access to an I/O device according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of an I/O multipath manager according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of the networking of a NUMA system for multi-path access to an I/O device according to an embodiment of the present invention
  • FIG. 8 is a schematic diagram of the networking of the NUMA system of the multi-path access I/O device according to the embodiment of the present invention
  • FIG. 9 is a schematic structural diagram of a system resource information table of a NUMA system according to an embodiment of the present invention
  • FIG. 10 is a schematic diagram of the networking of an SMP system for multi-path access to an I/O device according to an embodiment of the present invention
  • FIG. 11 is a schematic structural diagram of a system resource information table of an SMP system according to an embodiment of the present invention.
  • As shown in FIG. 3, the method for multi-path access to an I/O device provided by an embodiment of the present invention includes:
  • The master node invokes the I/O multipath manager, which receives and analyzes the configuration information of the first hard partition.
  • the configuration information generally comes from a system management module (not shown) that manages the entire NUMA system by running management software.
  • The I/O multipath manager analyzes, from the configuration information, the number of RCs in the compute nodes of the first hard partition and the number and types of I/O devices of the first hard partition, and identifies the ports of the PCI-E switch that correspond to the compute nodes of the first hard partition and the ports that correspond to its I/O devices.
  • the master node invokes the I/O multipath manager to configure the PCI-E switch to isolate other hard partitions except the first hard partition.
  • Specifically, the ports of the PCI-E switch that correspond to the compute nodes of the first hard partition and the ports that correspond to its I/O devices are configured as one virtual switch. This isolates the I/O devices and I/O accesses of the other hard partitions, so that the compute nodes of the first hard partition can access only the I/O devices of the first hard partition.
  • The master node searches for I/O devices and boots the slave nodes to start initialization. Specifically, according to the number of compute nodes of the first hard partition and the number and types of its I/O devices, the master node scans the I/O device bus of the first hard partition and searches for valid I/O devices. After the search, it assigns addresses and memory to each RC in the master node and to each discovered I/O device, and after the scan completes it boots the slave nodes for initialization.
  • S205. The master node invokes the I/O multipath manager to establish a mapping relationship between the compute nodes of the first hard partition and the I/O devices of the first hard partition.
  • Specifically, the master node invokes the I/O multipath manager to send the addresses of the I/O devices of the first hard partition to the RC of each slave node through a pointer that points to those addresses, so that a mapping relationship is established between the compute nodes and the I/O devices of the first hard partition.
  • the master node invokes an I/O multipath manager to form a system resource information table.
  • The master node invokes the I/O multipath manager to form a system resource information table, which contains the compute nodes and I/O devices of the first hard partition, and sends a pointer to this table to the operating system.
  • S207. Receive an I/O task and allocate hardware resources for it according to the system resource information table. Specifically, the operating system places the received I/O task in the I/O task queue, retrieves the system resource information table through its pointer, determines the processor that executes the current I/O task, allocates memory, and determines from the type of I/O task which I/O device to access.
  • The operating system instructs the processor that performs the I/O task to access the I/O device over the shortest path according to the system resource information table. Specifically, based on the mapping relationship between the compute nodes and I/O devices of the first hard partition recorded in the table, and on the PCI-E links available between the compute nodes and the PCI-E switch, the operating system selects the shortest path for the current I/O task. In general, this path is the PCI-E link between the PCI-E switch and the compute node whose processor performs the current I/O task.
  • When the primary node needs to exit the hard partition because of a failure, resource reallocation, or a similar reason, the procedure is as shown in the figure:
  • The system management module receives an exit request instruction issued by the master node. Normally, when the primary node of the first hard partition needs to exit the first hard partition because of a failure or resource reallocation, it sends the exit request instruction to the system management software through the system management module.
  • The master node receives an exit response instruction from the system management module, instructing it to exit the first hard partition.
  • After receiving the exit request instruction, the system management module selects one of the slave nodes according to the system policy and sends it an instruction to become the new master node.
  • the new master node receives hardware resource information and I/O tasks from the original master node.
  • The hardware resource information includes information on the processors that perform I/O tasks, the I/O devices that must be accessed to perform them, memory, and the PCI-E links required to perform them.
  • S305. The original primary node exits the first hard partition and waits for maintenance or reallocation.
  • the new master node updates the system resource information table.
  • The new primary node invokes the I/O multipath manager to configure the PCI-E switch according to the updated system resource information table, isolating the hard partitions other than the first hard partition. The configuration method is exactly the same as step 203 in FIG. and is not described again.
  • the system in this embodiment may be a NUMA system or an SMP system, and the computing node may be a server.
  • the method for multi-path accessing an I/O device provided by the embodiment of the present invention enables a PCI-E link between all computing nodes and a PCI-E switch to be valid by establishing a mapping relationship between the computing node and the I/O device.
  • As shown in FIG. 6, the I/O multipath manager 10 provided by an embodiment of the present invention includes: a call function interface 101, through which the master node among the compute nodes of the first hard partition invokes the I/O multipath manager via the operating system or firmware;
  • a hard partition resource analysis module 102, configured to receive the configuration information of the first hard partition, analyze the number and addresses of the RCs in the first hard partition and the number and types of its I/O devices, and identify the ports of the PCI-E switch that correspond to the compute nodes of the first hard partition and the ports that correspond to its I/O devices;
  • a PCI-E switch configuration module 103, configured to configure the PCI-E switch according to the received configuration information of the first hard partition, to isolate the other hard partitions, so that the compute nodes of the first hard partition access only the I/O devices of the first hard partition;
  • an I/O multipath configuration module 104, configured to establish, according to the received configuration information of the first hard partition, a mapping relationship between the compute nodes and the I/O devices of the first hard partition, so that the operating system instructs the compute node performing an I/O task to access the I/O devices of the first hard partition according to the mapping relationship.
  • By establishing a mapping relationship between the computing nodes and the I/O devices, the I/O multipath manager provided by the embodiment of the present invention makes the PCI-E links between all computing nodes and the PCI-E switch valid, so that the slave nodes can also access I/O devices through their own PCI-E links, eliminating delay and bottlenecks under high traffic and improving bandwidth utilization.
  • The fourth embodiment of the present invention provides a system for multi-path access to an I/O device.
  • the NUMA system is taken as an example.
  • The system includes the I/O multipath manager 10 as shown in FIG.
  • The I/O multipath manager, located in the firmware 51, is configured to configure the PCI-E switch 3 according to the received configuration information of the first hard partition 2, to isolate the other hard partitions, so that compute nodes 21-23 access only the I/O device 4 of the first hard partition 2, and to establish a mapping relationship between compute nodes 21-23 and the I/O device 4 according to the same configuration information, so that the PCI-E links 01-03 between compute nodes 21-23 and the PCI-E switch 3 become valid.
  • PCI-E links 01-03 are therefore all drawn as solid lines. The I/O multipath manager 10 then associates the mapping relationship between compute nodes 21-23 and the I/O device 4, the processor information, the memory information, and the available PCI-E links 01-03 between compute nodes 21-23 and the PCI-E switch 3 to form a system resource information table.
  • According to the system resource information table, the operating system 52 instructs compute nodes 21-23 to access the I/O device over the shortest path, which is generally the link between the PCI-E switch and the compute node that itself performs the I/O task.
  • The NUMA aggregation network 1 connects all compute nodes through NC aggregation, so that all compute nodes are controlled through one operating system.
  • At least one hard partition 2 (only the first hard partition 2 is drawn in FIG. 7, other hard partitions are not shown), including one master node 21 and two slave nodes 22, 23, of course, more slave nodes can be added.
  • Each computing node includes: a node controller (NC), used to connect the compute node to the NUMA aggregation network; two CPUs, used to perform I/O tasks; and one RC, used to scan I/O devices and to connect the compute node to its corresponding port on the PCI-E switch.
  • the RC is located in an IOH (Input-Output Hub), and the RC can also be located in a CPU or a MUX (Multiplexer).
  • the above computing node can be a server.
  • The PCI-E switch 3 is configured to establish links between compute nodes 21-23 of the first hard partition 2 and the I/O device 4 of the first hard partition. As shown in FIG. 7, the links between compute nodes 21-23 of the first hard partition 2 and the PCI-E switch are all solid lines, and compute nodes 21-23 can directly access the I/O device 4 of the first hard partition 2 through their own links 01-03.
  • the PCI-E switch is also connected to other hard partitions, not shown in the figure.
  • The I/O device 4 includes a Fibre Channel (FC) network card 41 and an Ethernet card, which connect the compute nodes to the external network.
  • The storage device 5 is used to store the firmware 51 and the operating system 52; the firmware 51 includes an I/O multipath manager 511.
  • Another system for multi-path access to an I/O device provided by this embodiment also takes a NUMA system as an example. As shown in FIG. 8, the storage device 5 is configured to store the firmware 51 and the operating system 52, and the I/O multipath manager 10 is located in the operating system 52; the rest is exactly the same as the system shown in FIG. 7 and is not described again.
  • the system resource information table of the NUMA system is shown in FIG. 9.
  • the system of the multi-path access I/O device provided by the embodiment of the present invention enables the PCI-E link between all the computing nodes and the PCI-E switch to be valid by establishing a mapping relationship between the computing node and the I/O device. Therefore, the slave node can also access the I/O device through its own PCI-E link, thereby eliminating the bottleneck in the case of delay and high traffic, and improving the bandwidth utilization.
  • The fifth embodiment of the present invention provides a system for multi-path access to an I/O device.
  • An SMP (Symmetric Multi-Processing) system is used as an example, as shown in FIG.
  • The system includes an I/O multipath manager 10, located in the firmware 51, configured to configure the PCI-E switch 3 according to the received configuration information of the second hard partition 2a, to isolate the hard partitions other than the second hard partition, so that compute nodes 2a1-2a3 access only the I/O device 4 of the second hard partition 2a.
  • PCI-E links 01-03 are all valid.
  • the PCI-E links 01 ⁇ 03 are all solid lines.
  • The I/O multipath manager 10 associates the mapping relationship between compute nodes 2a1-2a3 and the I/O device 4, the processor information, the memory information, and the available PCI-E links 01-03 between compute nodes 2a1-2a3 and the PCI-E switch 3 to form a system resource information table.
  • According to the system resource information table, the operating system 52 instructs compute nodes 2a1-2a3 to access the I/O device over the shortest path, which is generally the link between the PCI-E switch and the compute node that itself performs the I/O task.
  • the I/O multipath manager 10 can also be located in the operating system 52 (not shown in FIG. 10).
  • The SMP aggregation network 1 directly interconnects the CPUs of all compute nodes without requiring an NC, so that all compute nodes are controlled through one operating system.
  • At least two hard partitions (only the second hard partition 2a is shown in FIG. 10; other hard partitions are not shown), including one master node 2a1 and two slave nodes 2a2 and 2a3; more slave nodes can of course be added.
  • Each compute node includes two CPUs, used to perform I/O tasks and to interconnect the nodes directly, and one RC, used to connect the compute node to its corresponding port on the PCI-E switch.
  • the RC is located in the IOH, and the RC may also be located in the CPU or the MUX.
  • the above computing node can be a server.
  • the rest of the system provided by this embodiment is completely the same as the system shown in FIG. 7, and will not be described again.
  • the system resource table of the system is shown in FIG.
  • the system of the multi-path access I/O device provided by the embodiment of the present invention enables the PCI-E link between all the computing nodes and the PCI-E switch to be valid by establishing a mapping relationship between the computing node and the I/O device. Therefore, the slave node can also access the I/O device through its own PCI-E link, thereby eliminating the bottleneck in the case of delay and high traffic, and improving the bandwidth utilization.
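The master-exit procedure described above (exit request, promotion of a slave, hand-over of hardware resource information and I/O tasks, table update, switch reconfiguration) can be sketched in Python as follows. This is only an illustrative sketch: the `Node` and `Partition` classes, the `handle_master_exit` function, and the "promote the first slave" policy are hypothetical assumptions. The text says only that the system management module chooses the new master according to system policy, and the final PCI-E switch reconfiguration is noted in a comment but not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node with its pending I/O tasks."""
    name: str
    io_tasks: list = field(default_factory=list)


@dataclass
class Partition:
    """Hypothetical hard partition: master, slaves, and resource table."""
    master: Node
    slaves: list
    resource_table: dict  # compute node name -> I/O devices on its own link


def handle_master_exit(p):
    """Promote a slave to master and retire the old master (sketch)."""
    if not p.slaves:
        raise RuntimeError("no slave node available to promote")
    old = p.master
    new = p.slaves.pop(0)               # assumed policy: first slave in list
    new.io_tasks.extend(old.io_tasks)   # new master takes over pending tasks
    old.io_tasks.clear()
    p.master = new
    # Update the system resource information table: the departed node's
    # PCI-E link can no longer carry I/O for this partition.
    p.resource_table.pop(old.name, None)
    # The new master would now reconfigure the PCI-E switch (virtual-switch
    # isolation) from the updated table; that step is not modelled here.
    return old


p = Partition(Node("21", ["task1"]), [Node("22"), Node("23")],
              {"21": ["fc_card"], "22": ["fc_card"], "23": ["fc_card"]})
old = handle_master_exit(p)
print(old.name, p.master.name, p.master.io_tasks)  # 21 22 ['task1']
print(sorted(p.resource_table))                     # ['22', '23']
```

Because the remaining slaves keep their own PCI-E links in the table, they continue to serve I/O directly after the master leaves.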

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for multi-path access to input/output (I/O) equipment, an I/O multipath manager, and a system relate to the field of information technology (IT). With the invention, the Peripheral Component Interconnect Express (PCI-E) links between all computing nodes and a PCI-E switch can all be valid, and slave nodes can also access the I/O equipment through their own PCI-E links, which eliminates delay and bottlenecks under high service traffic and increases bandwidth utilization. The method comprises: configuring, according to received configuration information of a first hard partition, the PCI-E switch so that the computing nodes of the first hard partition access only the I/O equipment of the first hard partition; and establishing, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes and the I/O equipment of the first hard partition, so that an operating system instructs the computing nodes executing I/O tasks to access the I/O equipment of the first hard partition according to the mapping relationship. The embodiments of the invention are used for multi-path access to I/O equipment.
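The operating system's use of the mapping relationship, which prefers the PCI-E link of the compute node that executes the I/O task, can be illustrated with the small Python sketch below. The table layout (`devices`, `link_up`), the function name `choose_path`, and the fallback rule for a node whose own link is unavailable are hypothetical assumptions for illustration. The text states only that the shortest path is generally the executing node's own link.

```python
def choose_path(table, executing_node, device):
    """Return the compute node whose PCI-E link should carry an I/O task.

    table maps each compute node to {"devices": set_of_devices, "link_up": bool}.
    """
    own = table.get(executing_node)
    # Preferred: the executing node's own link, which needs no hop across
    # the aggregation network.
    if own is not None and own["link_up"] and device in own["devices"]:
        return executing_node
    # Fallback (assumption, not from the text): any other node whose link is
    # up and which maps to the device, chosen in sorted order.
    for node in sorted(table):
        entry = table[node]
        if entry["link_up"] and device in entry["devices"]:
            return node
    raise LookupError(f"no available PCI-E path to {device!r}")


# Hypothetical resource table for compute nodes 21-23 sharing one FC card.
TABLE = {
    "21": {"devices": {"fc_card"}, "link_up": True},
    "22": {"devices": {"fc_card"}, "link_up": True},
    "23": {"devices": {"fc_card"}, "link_up": False},
}
print(choose_path(TABLE, "22", "fc_card"))  # 22: its own link is used
print(choose_path(TABLE, "23", "fc_card"))  # 21: node 23's link is down
```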

Description

Method for multi-path access to an I/O device, I/O multipath manager, and system

Technical field

The present invention relates to the field of communications, and in particular to a method for multi-path access to an I/O device, an I/O multipath manager, and a system.

Background
Currently, computing nodes in network communications, such as servers, directly access external I/O (Input/Output) devices through PCI-E (Peripheral Component Interconnect-Express). Most such designs are based on a cluster scenario, in which each interface (port) of the PCI-E switch is by default connected to a different system; scenarios based on NUMA (Non-Uniform Memory Access) systems are not well considered. In a NUMA system, all compute nodes are divided into several node sets, and each node set is electrically isolated from the others. Such a set is called a hard partition (or a large node). Each hard partition includes one or more computing nodes, each of which has an RC (Root Complex) that can connect externally to PCI-E.
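The difference between the cluster assumption and the NUMA hard-partition case can be made concrete with a small model of switch-port ownership. In this sketch, the port numbers, the partition names `P1` and `P2`, and the helper functions are all hypothetical. It shows one hard partition owning several upstream ports (one per compute node RC) and checks that no compute node belongs to two hard partitions, as electrical isolation requires.

```python
from collections import defaultdict

# Hypothetical wiring of one PCI-E switch: upstream port -> (hard partition, node).
# In the cluster scenario each port would belong to a different system; in a
# NUMA system one hard partition owns one upstream port per compute node RC.
PORT_OWNER = {
    0: ("P1", "node21"),
    1: ("P1", "node22"),
    2: ("P1", "node23"),
    3: ("P2", "node31"),
}


def ports_by_partition(port_owner):
    """Group switch ports by the hard partition that owns them."""
    groups = defaultdict(list)
    for port, (partition, _node) in sorted(port_owner.items()):
        groups[partition].append(port)
    return dict(groups)


def partitions_disjoint(partitions):
    """True if no compute node belongs to two hard partitions."""
    seen = set()
    for nodes in partitions.values():
        if seen & nodes:
            return False
        seen |= nodes
    return True


print(ports_by_partition(PORT_OWNER))  # {'P1': [0, 1, 2], 'P2': [3]}
```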
A NUMA system is composed of multiple computing nodes aggregated through a NUMA network, and the hardware resources of these physical nodes are managed uniformly by one OS (Operating System) or hypervisor. Taking the NUMA system as an example, FIG. 1 shows a networking diagram with only one large node (or hard partition), including a NUMA aggregation network 1, a hard partition 2 with three compute nodes 21-23, a PCI-E switch 3, and an external I/O device 4. Each compute node has at least one CPU (Central Processing Unit), an NC (Node Controller), and a root complex (RC) located in the IOH (Input/Output Hub); the external I/O device 4 includes a Fibre Channel network card 31 and an Ethernet card 32. It is assumed that compute node 21 is the master node, which handles primary partition startup and resource management, and that compute nodes 22 and 23 are slave nodes. Through the NUMA aggregation network, the three compute nodes form a 6-way system (in fact, this PCI-E switch is also connected to compute nodes of other hard partitions in the NUMA system).
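To make the cost of this arrangement concrete, the following sketch counts how many I/O requests each compute node's PCI-E link carries, and how many requests must cross the NUMA aggregation network. It compares the situation where software sees only the master's link with the situation where every node uses its own link. The request list and the `link_load` function are hypothetical, and treating every request as equal-sized is a simplifying assumption.

```python
from collections import Counter


def link_load(requests, master, multipath):
    """Count I/O requests per compute-node PCI-E link, plus NUMA-network hops.

    requests lists the compute node issuing each I/O request. Without
    multipath, software sees only the master's link, so a slave's request
    crosses the NUMA aggregation network and leaves through the master.
    With multipath, every node uses its own link.
    """
    load = Counter()
    numa_hops = 0
    for node in requests:
        if multipath or node == master:
            load[node] += 1
        else:
            load[master] += 1
            numa_hops += 1  # extra traversal of the aggregation network
    return dict(load), numa_hops


REQUESTS = ["21", "22", "23", "22", "23", "21"]
print(link_load(REQUESTS, "21", multipath=False))  # ({'21': 6}, 4)
print(link_load(REQUESTS, "21", multipath=True))   # ({'21': 2, '22': 2, '23': 2}, 0)
```

In the first case, all traffic is concentrated on link 01 while links 02 and 03 sit idle. This is the bottleneck and low bandwidth utilization that the following paragraph describes.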
In the system shown in FIG. 1, PCI-E link 01, which connects compute node 21 to the PCI-E switch, is drawn as a solid line, while PCI-E links 02 and 03, which connect compute nodes 22 and 23 to the PCI-E switch, are drawn as dotted lines. This is because software (such as firmware, the OS, and I/O applications) can see I/O devices such as Ethernet cards only through the master node's link to the PCI-E switch, and cannot reach valid I/O devices through the slave nodes. FIG. 2 is a schematic diagram of the resource information table of the system shown in FIG. 1. Compute nodes 22 and 23 cannot directly see I/O devices such as Ethernet cards and Fibre Channel network cards; to access them, they must go over a NUMA link through the NUMA aggregation network and then through the link of compute node 21. This increases the delay. Under heavy traffic, the primary node itself becomes an I/O bottleneck, and the PCI-E links from the other slave nodes go unused, so bandwidth utilization is low.

Summary of the invention
Embodiments of the present invention provide a method for multi-path access to an I/O device, an I/O multipath manager, and a system, which make the PCI-E links between all computing nodes and the PCI-E switch valid, eliminating delay and bottlenecks under high traffic and improving bandwidth utilization. The embodiments of the present invention adopt the following technical solutions. In one aspect, a method for multi-path access to an I/O device is provided, including: configuring a PCI-E switch according to received configuration information of a first hard partition, to isolate the hard partitions other than the first hard partition, so that the computing nodes of the first hard partition access only the I/O devices of the first hard partition; and establishing, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, so that the operating system instructs the computing node performing an I/O task to access the I/O devices of the first hard partition according to the mapping relationship.
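The two steps of the method can be illustrated with a toy model. Step 1 groups the partition's switch ports into one virtual switch so that traffic cannot reach other hard partitions. Step 2 maps every compute node of the partition to every I/O device of that partition. The `PartitionConfig` layout, the port numbers, and the function names are hypothetical. Real PCI-E switch partitioning is vendor-specific and is not shown here.

```python
from dataclasses import dataclass


@dataclass
class PartitionConfig:
    """Hypothetical shape of a hard partition's configuration information."""
    name: str
    node_ports: dict    # compute node -> its upstream PCI-E switch port
    device_ports: dict  # I/O device -> its downstream PCI-E switch port


def configure_switch(cfg):
    """Step 1: place every port of the partition in one virtual switch."""
    ports = set(cfg.node_ports.values()) | set(cfg.device_ports.values())
    return {"virtual_switch": cfg.name, "ports": ports}


def can_forward(vswitch, src_port, dst_port):
    """Traffic is forwarded only between ports of the same virtual switch,
    which isolates other hard partitions attached to the same physical switch."""
    return src_port in vswitch["ports"] and dst_port in vswitch["ports"]


def build_mapping(cfg):
    """Step 2: map every compute node to every I/O device of its partition."""
    return {node: sorted(cfg.device_ports) for node in cfg.node_ports}


cfg = PartitionConfig("P1", {"21": 0, "22": 1, "23": 2},
                      {"eth_card": 8, "fc_card": 9})
vs = configure_switch(cfg)
print(can_forward(vs, 1, 9))     # True: node 22 reaches the FC card directly
print(can_forward(vs, 1, 3))     # False: port 3 belongs to another partition
print(build_mapping(cfg)["23"])  # ['eth_card', 'fc_card']
```

Because every compute node, not just the master, appears in the mapping, each node's own PCI-E link becomes a usable path to the partition's I/O devices.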
In another aspect, an I/O multipath manager is provided. The manager includes:
- a PCI-E switch configuration module, configured to configure a Peripheral Component Interconnect Express (PCI-E) switch according to received configuration information of a first hard partition, so as to isolate the hard partitions other than the first hard partition, such that the compute nodes of the first hard partition access only the I/O devices of the first hard partition; and
- an I/O multipath configuration module, configured to establish, according to the received configuration information of the first hard partition, a mapping relationship between the compute nodes and the I/O devices of the first hard partition, so that the operating system, according to the mapping relationship, instructs the compute node that executes an I/O task to access the I/O devices of the first hard partition.

In still another aspect, a system for multipath access to I/O devices is provided, including:
an I/O multipath manager, configured to:
- configure a PCI-E switch according to received configuration information of a first hard partition, so as to isolate the hard partitions other than the first hard partition, such that the compute nodes of the first hard partition access only the I/O devices of the first hard partition; and
- establish, according to the received configuration information, a mapping relationship between the compute nodes and the I/O devices of the first hard partition, so that the operating system, according to the mapping relationship, instructs the compute node that executes an I/O task to access the I/O devices of the first hard partition.
The I/O multipath manager is located in the firmware or in the operating system;
an aggregation network, configured to connect the compute nodes in the system, so that the system controls the compute nodes through a single operating system;
at least two hard partitions, each including at least one compute node;
a PCI-E switch, configured to establish connections between the compute nodes and the I/O devices, so that each compute node accesses the I/O devices of the hard partition to which it belongs over the PCI-E link established between itself and the PCI-E switch;
an I/O device, configured to connect the compute nodes to an external network; and
a storage device, configured to store the firmware, the operating system, and I/O applications.

The method, I/O multipath manager, and system provided by the embodiments of the present invention make the PCI-E links between all compute nodes and the PCI-E switch usable. As a result, slave nodes can also access I/O devices over their own PCI-E links. This removes the added latency and the bottleneck under heavy traffic, and it improves bandwidth utilization.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the drawings needed for describing them. The drawings below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the networking logic of a prior-art NUMA system;
FIG. 2 is a schematic structural diagram of the system resource information table of a prior-art NUMA system;
FIG. 3 is a first schematic flowchart of a method for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 4 is a second schematic flowchart of a method for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 5 is a third schematic flowchart of a method for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an I/O multipath manager according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the networking logic of a NUMA system for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the networking logic of another NUMA system for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of the system resource information table of a NUMA system according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the networking logic of an SMP system for multipath access to I/O devices according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of the system resource information table of an SMP system according to an embodiment of the present invention.

Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

As shown in FIG. 3, the method for multipath access to I/O devices provided by this embodiment of the present invention includes:
S101. Configure the PCI-E switch according to the received configuration information of the first hard partition. This isolates the hard partitions other than the first hard partition, so that the compute nodes of the first hard partition access only the I/O devices of the first hard partition.
S102. Establish a mapping relationship between the compute nodes and the I/O devices of the first hard partition according to the received configuration information of the first hard partition. The operating system then uses this mapping relationship to instruct the compute node that executes an I/O task to access the I/O devices.

In the method provided by this embodiment, a mapping relationship is established between the compute nodes and the I/O devices. This makes the PCI-E links between all compute nodes and the PCI-E switch usable, so slave nodes can also access I/O devices over their own PCI-E links. The added latency and the bottleneck under heavy traffic are removed, and bandwidth utilization improves.

Embodiment 2

As shown in FIG. 4, the method for multipath access to I/O devices provided by this embodiment of the present invention includes:
S201. After the system reset completes, the master node of the first hard partition starts running the firmware. The firmware initializes the processors, memory, and chipset in the master node.

S202. The master node invokes the I/O multipath manager, which receives and analyzes the configuration information of the first hard partition.

Specifically, the configuration information usually comes from a system management module (not shown in the figure), which manages the entire NUMA system by running management software. The I/O multipath manager analyzes the received configuration information to determine:
- the number of root complexes (RCs) in the compute nodes of the first hard partition;
- the number and types of the I/O devices of the first hard partition;
- the ports of the PCI-E switch that correspond to the compute nodes of the first hard partition; and
- the ports of the PCI-E switch that correspond to the I/O devices of the first hard partition.
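The analysis in S202 can be pictured with a small data model. The sketch below is a hypothetical illustration, not the patent's implementation. The class names, field names, port numbers, and the returned dictionary are all assumptions. The code only extracts the items S202 lists: the RC count, the number and types of I/O devices, and the switch ports of the nodes and devices.

```python
from dataclasses import dataclass

# Hypothetical data model for the configuration information of one hard
# partition; the names are illustrative, not taken from the patent.

@dataclass(frozen=True)
class ComputeNodeInfo:
    node_id: int
    rc_count: int      # number of root complexes (RCs) in this node
    switch_port: int   # PCI-E switch port this node is connected to

@dataclass(frozen=True)
class IODeviceInfo:
    name: str
    dev_type: str      # e.g. "ethernet" or "fc"
    switch_port: int   # PCI-E switch port this device is connected to

@dataclass(frozen=True)
class HardPartitionConfig:
    partition_id: int
    nodes: tuple
    devices: tuple

def analyze_config(cfg: HardPartitionConfig) -> dict:
    """Extract the items that step S202 says the manager analyzes."""
    return {
        "rc_count": sum(n.rc_count for n in cfg.nodes),
        "device_count": len(cfg.devices),
        "device_types": sorted({d.dev_type for d in cfg.devices}),
        "node_ports": {n.node_id: n.switch_port for n in cfg.nodes},
        "device_ports": {d.name: d.switch_port for d in cfg.devices},
    }

# Example modeled on the first hard partition of FIG. 7 (port numbers are made up).
cfg = HardPartitionConfig(
    partition_id=2,
    nodes=(ComputeNodeInfo(21, 1, 0), ComputeNodeInfo(22, 1, 1), ComputeNodeInfo(23, 1, 2)),
    devices=(IODeviceInfo("fc41", "fc", 8), IODeviceInfo("eth0", "ethernet", 9)),
)
summary = analyze_config(cfg)
```

The port maps in `summary` are exactly what the switch configuration in S203 needs as input.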
S203. The master node invokes the I/O multipath manager to configure the PCI-E switch, so as to isolate the hard partitions other than the first hard partition.

Specifically, two groups of PCI-E switch ports are configured together as one virtual switch: the ports corresponding to the compute nodes of the first hard partition and the ports corresponding to its I/O devices. This isolates the I/O devices and I/O accesses of the other hard partitions. As a result, the compute nodes of the first hard partition access only the I/O devices of the first hard partition.
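The isolation in S203 can be illustrated with a toy bookkeeping model. The patent does not describe the switch's configuration interface, and real switches expose partitioning through device-specific mechanisms. The class below is therefore a hypothetical sketch. It only shows the rule that traffic may flow only between ports assigned to the same virtual switch.

```python
class PCIeSwitchModel:
    """Toy model of a PCI-E switch whose ports are grouped into virtual switches.

    Hypothetical: it models only the bookkeeping of step S203, not any real
    switch's configuration registers.
    """

    def __init__(self, num_ports: int):
        # Virtual-switch id for each port; None means "not yet assigned".
        self.port_to_vs = [None] * num_ports

    def configure_virtual_switch(self, vs_id: int, ports) -> None:
        ports = list(ports)
        # Validate first so a conflicting request leaves the switch unchanged.
        for p in ports:
            owner = self.port_to_vs[p]
            if owner is not None and owner != vs_id:
                raise ValueError(f"port {p} already belongs to virtual switch {owner}")
        for p in ports:
            self.port_to_vs[p] = vs_id

    def can_route(self, src_port: int, dst_port: int) -> bool:
        # Traffic may only flow between ports of the same virtual switch.
        vs = self.port_to_vs[src_port]
        return vs is not None and vs == self.port_to_vs[dst_port]


# First hard partition: node ports 0-2 and device ports 8-9 (made-up numbers).
switch = PCIeSwitchModel(16)
switch.configure_virtual_switch(2, [0, 1, 2, 8, 9])
# Another hard partition sharing the same physical switch.
switch.configure_virtual_switch(3, [3, 10])
```

Here a node on port 1 can reach the device on port 9 but not the device on port 10, which belongs to the other partition.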
S204. The master node searches for I/O devices and then boots the slave nodes so that they start initializing.

Specifically, the configuration information of the first hard partition provides the number of its compute nodes and the number and types of its I/O devices. Based on this information, the master node uses its internal RC to scan the I/O device buses of the first hard partition one by one and search for valid I/O devices. After finding the I/O devices, it assigns addresses and memory to the RC in the master node and to each device found. After the scan completes, it boots the slave nodes so that they start initializing.

S205. The master node invokes the I/O multipath manager to establish a mapping relationship between the compute nodes and the I/O devices of the first hard partition.

Specifically, the I/O multipath manager sends the RC of each slave node a pointer to the addresses of the I/O devices of the first hard partition. This establishes a mapping relationship between the compute nodes and the I/O devices of the first hard partition.

S206. The master node invokes the I/O multipath manager to build a system resource information table.
Specifically, the master node invokes the I/O multipath manager to build the system resource information table and sends a pointer to the table to the operating system. The table includes:
- the mapping relationship between the compute nodes and the I/O devices of the first hard partition;
- processor information;
- memory information; and
- the PCI-E links available between the compute nodes and the PCI-E switch.

S207. Receive an I/O task and allocate hardware resources to it according to the system resource information table.

Specifically, the operating system places each received I/O task in the I/O task queue and reads the system resource information table through its pointer. It then determines the processor that will execute the current I/O task, allocates memory, and determines from the task type which I/O device to access.
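Steps S204 to S207 can be sketched end to end. The code below makes several assumptions that the patent does not specify:
- the base address and window size used for address assignment;
- sharing the device table by Python reference, to mirror the "pointer" sent to each slave RC;
- leaving processor choice and memory allocation to the caller.

All function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of steps S204-S207; the address values and names
# are illustrative assumptions, not values from the patent.

@dataclass
class RootComplex:
    node_id: int
    device_table: Optional[dict] = None   # filled in during S204/S205

def scan_and_assign(devices_found, base=0xE000_0000, window=0x10_0000):
    """S204: give each device found on the partition's buses an address window."""
    table = {}
    addr = base
    for name, dev_type in devices_found:
        table[name] = {"type": dev_type, "addr": addr}
        addr += window
    return table

def share_mapping(table, slave_rcs):
    """S205: every slave RC receives a reference ("pointer") to the same table."""
    for rc in slave_rcs:
        rc.device_table = table

def build_resource_table(rcs, cpus_by_node, links_by_node):
    """S206: the table whose pointer is handed to the operating system."""
    return {
        "mapping": {rc.node_id: rc.device_table for rc in rcs},
        "cpus": cpus_by_node,     # node_id -> CPU ids
        "links": links_by_node,   # node_id -> PCI-E link to the switch
    }

def allocate(resource_table, task_type, cpu):
    """S207: find the node of the chosen CPU and a device matching the task type.

    Choosing the CPU and allocating memory are left to the OS scheduler here.
    """
    node = next(n for n, cpus in resource_table["cpus"].items() if cpu in cpus)
    devices = resource_table["mapping"][node]
    device = next(name for name, d in devices.items() if d["type"] == task_type)
    return node, device

found = [("fc41", "fc"), ("eth0", "ethernet")]
table = scan_and_assign(found)
master, *slaves = [RootComplex(21), RootComplex(22), RootComplex(23)]
master.device_table = table          # the master's own scan result
share_mapping(table, slaves)
resources = build_resource_table([master, *slaves],
                                 {21: (0, 1), 22: (2, 3), 23: (4, 5)},
                                 {21: "01", 22: "02", 23: "03"})
```

Because every RC holds the same table, an Ethernet task scheduled on CPU 3 resolves to node 22 and device `eth0` without going through the master node.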
S208. According to the system resource information table, the operating system instructs the processor that executes the I/O task to access the I/O device over the shortest path.

Specifically, the operating system selects the shortest path for the current I/O task based on two entries in the system resource information table: the mapping relationship between the compute nodes and the I/O devices of the first hard partition, and the PCI-E links available between the compute nodes and the PCI-E switch. Usually this path is the PCI-E link between the PCI-E switch and the compute node hosting the processor that executes the current I/O task.

When the master node needs to exit the hard partition because of a failure, resource reallocation, or a similar reason, the method further includes the following steps, as shown in FIG. 5:
S301. The system management module receives an exit request instruction from the master node.

Normally, when the master node of the first hard partition needs to exit the partition because of a failure, resource reallocation, or a similar reason, it sends an exit request instruction. The instruction goes through the system management module to the system management software.
S302. The system management module sends an exit response instruction to the master node, instructing the master node to exit the first hard partition.
S303. After receiving the exit request instruction, the system management module instructs one of the slave nodes to become the new master node.

Specifically, the management module selects one of the slave nodes according to the system policy and sends it an instruction to become the new master node.
S304. The new master node receives the hardware resource information and the I/O tasks from the original master node. The hardware resource information covers:
- the processors that execute the I/O tasks;
- the I/O devices that the I/O tasks need to access;
- the memory; and
- the PCI-E links that the I/O tasks need to use.

S305. The original master node exits the first hard partition and waits for repair or reallocation.
S306. The new master node updates the system resource information table.
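The hand-over in S301 to S306 can be sketched as a small state update. The model below is hypothetical. The node fields and the lowest-id promotion policy are assumptions, since the patent leaves the choice of new master to the system policy. It does not model the switch reconfiguration of S307, which reuses the S203 procedure.

```python
from dataclasses import dataclass, field

# Hypothetical model of the master-node hand-over in steps S301-S306.
# The promotion policy and field names are assumptions for illustration.

@dataclass
class Node:
    node_id: int
    io_tasks: list = field(default_factory=list)
    resource_info: dict = field(default_factory=dict)

@dataclass
class Partition:
    master: Node
    slaves: list
    resource_table: dict = field(default_factory=dict)   # node_id -> PCI-E link
    log: list = field(default_factory=list)

def lowest_id_policy(slaves):
    """Placeholder policy; the patent leaves the choice to the system policy."""
    return min(slaves, key=lambda n: n.node_id)

def handle_master_exit(part: Partition, policy=lowest_id_policy):
    if not part.slaves:
        raise RuntimeError("no slave node available to promote")
    old = part.master
    part.log.append(("exit_request", old.node_id))    # S301
    part.log.append(("exit_response", old.node_id))   # S302
    new = policy(part.slaves)                          # S303
    part.log.append(("promote", new.node_id))
    new.io_tasks.extend(old.io_tasks)                  # S304: take over I/O tasks
    new.resource_info.update(old.resource_info)        #       and resource info
    old.io_tasks.clear()
    part.slaves = [n for n in part.slaves if n is not new]
    part.master = new                                  # S305: old master leaves
    part.resource_table.pop(old.node_id, None)         # S306: update the table
    return new, old

part = Partition(
    master=Node(21, io_tasks=["task-1"], resource_info={"cpu": 0}),
    slaves=[Node(22), Node(23)],
    resource_table={21: "01", 22: "02", 23: "03"},
)
new_master, old_master = handle_master_exit(part)
```

After the call, node 22 is the master, owns the pending task, and node 21's link no longer appears in the resource table.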
S307. The new master node invokes the I/O multipath manager to configure the PCI-E switch according to the updated system resource information table, isolating the hard partitions other than the first hard partition. The configuration method is the same as that of step S203 in FIG. 4 and is not described again.

The system in this embodiment may be a NUMA system or an SMP system, and a compute node may be a server.

In the method provided by this embodiment, a mapping relationship is established between the compute nodes and the I/O devices. This makes the PCI-E links between all compute nodes and the PCI-E switch usable, so slave nodes can also access I/O devices over their own PCI-E links. The added latency and the bottleneck under heavy traffic are removed, and bandwidth utilization improves.

Embodiment 3

As shown in FIG. 6, the I/O multipath manager 10 provided by this embodiment of the present invention includes:

a call function interface 101, through which the master node among the compute nodes of the first hard partition invokes the I/O multipath manager via the operating system or the firmware;
a hard partition resource analysis module 102, configured to receive and analyze the configuration information of the first hard partition in order to determine:
- the number and addresses of the RCs in the compute nodes of the first hard partition;
- the number and types of its I/O devices;
- the ports of the PCI-E switch corresponding to its compute nodes; and
- the ports of the PCI-E switch corresponding to its I/O devices;
a PCI-E switch configuration module 103, configured to configure the PCI-E switch according to the received configuration information of the first hard partition. This isolates the hard partitions other than the first hard partition, so that the compute nodes of the first hard partition access only the I/O devices of the first hard partition; and
an I/O multipath configuration module 104, configured to establish a mapping relationship between the compute nodes and the I/O devices of the first hard partition according to its received configuration information. The operating system then uses this mapping relationship to instruct the compute node that executes an I/O task to access the I/O devices of the first hard partition.

In the I/O multipath manager provided by this embodiment, a mapping relationship is established between the compute nodes and the I/O devices. This makes the PCI-E links between all compute nodes and the PCI-E switch usable, so slave nodes can also access I/O devices over their own PCI-E links. The added latency and the bottleneck under heavy traffic are removed, and bandwidth utilization improves.

Embodiment 4

This embodiment describes the system for multipath access to I/O devices using a NUMA system as an example. As shown in FIG. 7, the system includes:

the I/O multipath manager 10 shown in FIG. 6, located in the firmware 51. According to the received configuration information of the first hard partition 2, the manager does the following:
- It configures the PCI-E switch 3 to isolate the hard partitions other than the first hard partition 2, so that the compute nodes 21 to 23 access only the I/O devices 4 of the first hard partition 2.
- It establishes a mapping relationship between the compute nodes 21 to 23 and the I/O devices 4. As a result, the PCI-E links 01 to 03 between the compute nodes 21 to 23 and the PCI-E switch 3 all become usable, which is why they are all drawn as solid lines in FIG. 7.
- It then combines four kinds of information into the system resource information table shown in FIG. 9: the mapping relationship between the compute nodes 21 to 23 and the I/O devices 4, the processor information, the memory information, and the PCI-E links 01 to 03 available between the compute nodes and the PCI-E switch 3.
According to this table, the operating system 52 instructs the compute nodes 21 to 23 to access the I/O devices over the shortest path. This path is usually the link between the PCI-E switch and the compute node that executes the I/O task;
a NUMA aggregation network 1, configured to interconnect all compute nodes through their node controllers (NCs), so that all compute nodes are controlled by a single operating system;

at least one hard partition 2 (only the first hard partition 2 is shown in FIG. 7; the other hard partitions are not shown), including one master node 21 and two slave nodes 22 and 23; more slave nodes may also be added. Each compute node includes:
- one NC, which connects the compute node to the NUMA aggregation network;
- two CPUs, which execute I/O tasks; and
- one RC, which scans for I/O devices and connects the compute node to its corresponding port on the PCI-E switch.
In this embodiment the RC is located in an IOH (Input-Output Hub); it may also be located in the CPU or in a MUX (multiplexer). The compute nodes may be servers;
a PCI-E switch 3, configured to establish links between the compute nodes 21 to 23 and the I/O devices 4 of the first hard partition 2. As shown in FIG. 7, the links between the compute nodes 21 to 23 and the PCI-E switch are all solid lines. Each of the compute nodes 21 to 23 can therefore directly access the I/O devices 4 of the first hard partition 2 over its own link (01, 02, or 03). The PCI-E switch is also connected to other hard partitions, which are not shown in the figure;
the I/O devices 4, including a Fibre Channel (FC) network card 41 and an Ethernet card, which connect the compute nodes to the external network; and

a storage device 5, configured to store the firmware 51 and the operating system 52, where the firmware 51 includes the I/O multipath manager 511.

This embodiment also provides another system for multipath access to I/O devices, again using a NUMA system as an example. As shown in FIG. 8, the storage device 5 stores the firmware 51 and the operating system 52, and the I/O multipath manager 10 is located in the operating system 52. The rest of the system is the same as the system shown in FIG. 7 and is not described again. The system resource information table of this NUMA system is shown in FIG. 9.
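The system resource information table and the shortest-path choice can be illustrated together. The row layout below is a hypothetical stand-in for FIG. 9, whose exact columns are not given in the text. The hop function is a caller-supplied assumption about the aggregation-network topology. Falling back to another node's link when a node's own link is down is also an assumption; the text says only that the chosen path is "usually" the executing node's own link.

```python
from dataclasses import dataclass

# Hypothetical row layout for a system resource information table (cf. FIG. 9)
# and a shortest-path choice in the spirit of step S208.

@dataclass(frozen=True)
class TableRow:
    node_id: int
    cpus: tuple
    link: str              # PCI-E link between this node and the switch
    link_up: bool = True

def build_table(rows):
    return {r.node_id: r for r in rows}

def node_of_cpu(table, cpu):
    for row in table.values():
        if cpu in row.cpus:
            return row.node_id
    raise KeyError(f"CPU {cpu} is not in this hard partition")

def shortest_link(table, cpu, numa_hops):
    """Return the usable link with the fewest aggregation-network hops.

    numa_hops(a, b) is a caller-supplied hop count between nodes a and b;
    a node's own link costs 0 hops. Ties are broken by the lower node id.
    """
    home = node_of_cpu(table, cpu)
    usable = [r for r in table.values() if r.link_up]
    if not usable:
        raise RuntimeError("no usable PCI-E link in this hard partition")
    best = min(usable, key=lambda r: (numa_hops(home, r.node_id), r.node_id))
    return best.link

def one_hop(a, b):
    # Simplest assumed topology: every remote node is one hop away.
    return 0 if a == b else 1

table = build_table([
    TableRow(21, (0, 1), "01"),
    TableRow(22, (2, 3), "02"),
    TableRow(23, (4, 5), "03"),
])
```

With all links up, a task on CPU 3 uses node 22's own link 02, so it does not cross the NUMA aggregation network to reach node 21 as in the prior-art system of FIG. 1.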
In the system provided by this embodiment, a mapping relationship is established between the compute nodes and the I/O devices. This makes the PCI-E links between all compute nodes and the PCI-E switch usable, so slave nodes can also access I/O devices over their own PCI-E links. The added latency and the bottleneck under heavy traffic are removed, and bandwidth utilization improves.

Embodiment 5

This embodiment describes the system for multipath access to I/O devices using an SMP (Symmetric Multi-Processing) system as an example. As shown in FIG. 10, the system includes:

the I/O multipath manager 10 shown in FIG. 6, located in the firmware 51. According to the received configuration information of the second hard partition 2a, the manager does the following:
- It configures the PCI-E switch 3 to isolate the hard partitions other than the second hard partition, so that the compute nodes 2a1 to 2a3 access only the I/O devices 4 of the second hard partition 2a.
- It establishes a mapping relationship between the compute nodes 2a1 to 2a3 and the I/O devices 4. As a result, the PCI-E links 01 to 03 between the compute nodes 2a1 to 2a3 and the PCI-E switch 3 all become usable, and they are all drawn as solid lines in FIG. 10.
- It then combines four kinds of information into the system resource information table shown in FIG. 11: the mapping relationship between the compute nodes 2a1 to 2a3 and the I/O devices 4, the processor information, the memory information, and the PCI-E links 01 to 03 available between the compute nodes and the PCI-E switch 3.
According to this table, the operating system 52 instructs the compute nodes 2a1 to 2a3 to access the I/O devices over the shortest path. This path is usually the link between the PCI-E switch and the compute node that executes the I/O task. The I/O multipath manager 10 may also be located in the operating system 52 (not shown in FIG. 10);
an SMP aggregation network 1, which directly interconnects the CPUs of all compute nodes without NCs, so that all compute nodes are controlled by a single operating system;

at least two hard partitions (only the second hard partition 2a is shown in FIG. 10; the other hard partitions are not shown). The second hard partition includes one master node 2a1 and two slave nodes 2a2 and 2a3; more slave nodes may also be added. Each compute node includes:
- two CPUs, which execute I/O tasks and directly interconnect the nodes; and
- one RC, which connects the compute node to its corresponding port on the PCI-E switch.
In this embodiment the RC is located in the IOH; it may also be located in the CPU or in a MUX. The compute nodes may be servers.

The rest of the system provided by this embodiment is the same as the system shown in FIG. 7 and is not described again. The system resource information table of this system is shown in FIG. 11.

In the system provided by this embodiment, a mapping relationship is established between the compute nodes and the I/O devices. This makes the PCI-E links between all compute nodes and the PCI-E switch usable.
As a result, slave nodes can also access I/O devices over their own PCI-E links. The added latency and the bottleneck under heavy traffic are removed, and bandwidth utilization improves.

The foregoing descriptions are merely specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed here falls within the protection scope of the present invention. Therefore, the protection scope of the present invention is defined by the claims.

Claims

1. A method for multipath access to input/output (I/O) devices, comprising:
configuring a Peripheral Component Interconnect Express (PCI-E) switch according to received configuration information of a first hard partition, so as to isolate hard partitions other than the first hard partition, such that compute nodes of the first hard partition access only I/O devices of the first hard partition; and
establishing, according to the received configuration information of the first hard partition, a mapping relationship between the compute nodes of the first hard partition and the I/O devices of the first hard partition, so that an operating system instructs, according to the mapping relationship, a compute node that executes an I/O task to access the I/O devices of the first hard partition.
2. The method according to claim 1, wherein, before the configuring of the PCI-E switch according to the received configuration information of the first hard partition, the method further comprises:
initializing, by a master node among the compute nodes of the first hard partition, a processor, memory, and a chipset;
receiving, by the master node, the configuration information of the first hard partition; and
booting, by the master node, slave nodes for initialization.
3. The method according to claim 1 or 2, wherein the configuration information of the first hard partition comprises: the number and identifiers of the computing nodes of the first hard partition; the number and types of the I/O devices of the first hard partition; the ports of the PCI-E switch corresponding to the respective computing nodes of the first hard partition; and the ports of the PCI-E switch corresponding to the I/O devices of the first hard partition.
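The four items listed in claim 3 can be pictured as one record per hard partition. The Python sketch below is a hypothetical container (`HardPartitionConfig` and its field names are illustrative, not taken from the patent). The node and device counts are derived from the identifier collections, and `problems()` runs a few basic consistency checks that firmware might plausibly perform before configuring the switch; those checks are an assumption, not part of the claim.

```python
from dataclasses import dataclass


@dataclass
class HardPartitionConfig:
    """Hypothetical record of the configuration information of one hard partition."""
    partition_id: str
    node_ids: list[str]           # identifiers of the partition's computing nodes
    device_types: dict[str, str]  # I/O device identifier -> device type, e.g. "NIC"
    node_ports: dict[str, int]    # computing node -> corresponding PCI-E switch port
    device_ports: dict[str, int]  # I/O device -> corresponding PCI-E switch port

    @property
    def node_count(self) -> int:
        return len(self.node_ids)

    @property
    def device_count(self) -> int:
        return len(self.device_types)

    def problems(self) -> list[str]:
        """Return consistency problems; an empty list means none were found."""
        issues = []
        unported_nodes = set(self.node_ids) - set(self.node_ports)
        if unported_nodes:
            issues.append(f"nodes without a switch port: {sorted(unported_nodes)}")
        unported_devices = set(self.device_types) - set(self.device_ports)
        if unported_devices:
            issues.append(f"devices without a switch port: {sorted(unported_devices)}")
        ports = list(self.node_ports.values()) + list(self.device_ports.values())
        if len(ports) != len(set(ports)):
            issues.append("a switch port is assigned more than once")
        return issues


if __name__ == "__main__":
    cfg = HardPartitionConfig(
        partition_id="hp1",
        node_ids=["node0", "node1"],
        device_types={"nic0": "NIC", "hba0": "HBA"},
        node_ports={"node0": 0, "node1": 1},
        device_ports={"nic0": 4, "hba0": 5},
    )
    print(cfg.node_count, cfg.device_count, cfg.problems())  # 2 2 []
```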
4. The method according to claim 1 or 2, wherein the configuring a PCI-E switch according to the received configuration information of the first hard partition, so as to isolate hard partitions other than the first hard partition such that the computing node of the first hard partition accesses only the I/O devices of the first hard partition, comprises: configuring the ports of the PCI-E switch corresponding to the respective computing nodes of the first hard partition, together with the ports of the PCI-E switch corresponding to the I/O devices of the first hard partition, as one virtual switch that isolates the I/O devices and I/O accesses of the hard partitions other than the first hard partition, such that the computing node of the first hard partition accesses only the I/O devices of the first hard partition.
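Claim 4 groups the switch ports of one hard partition into a virtual switch. The toy Python model below assumes that isolation can be expressed as "two ports may exchange traffic only if they belong to the same virtual switch"; `PcieSwitchModel`, `add_virtual_switch` and `can_route` are hypothetical names. How a real PCI-E switch is programmed to form such virtual switches is vendor-specific and is not described in the claim.

```python
class PcieSwitchModel:
    """Hypothetical toy model of a PCI-E switch whose ports form virtual switches."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.port_to_vs: dict[int, str] = {}  # physical port -> virtual switch name

    def add_virtual_switch(self, vs_name: str,
                           node_ports: list[int], device_ports: list[int]) -> None:
        """Bind one hard partition's node ports and device ports into one virtual switch."""
        ports = set(node_ports) | set(device_ports)
        # Validate every port first so a rejected request leaves the switch unchanged.
        for port in ports:
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} does not exist")
            if port in self.port_to_vs:
                raise ValueError(f"port {port} already belongs to {self.port_to_vs[port]}")
        for port in ports:
            self.port_to_vs[port] = vs_name

    def can_route(self, src_port: int, dst_port: int) -> bool:
        """Allow traffic only between ports of the same virtual switch."""
        vs = self.port_to_vs.get(src_port)
        return vs is not None and vs == self.port_to_vs.get(dst_port)


if __name__ == "__main__":
    switch = PcieSwitchModel(num_ports=8)
    switch.add_virtual_switch("vs-hp1", node_ports=[0, 1], device_ports=[4, 5])
    switch.add_virtual_switch("vs-hp2", node_ports=[2], device_ports=[6])
    print(switch.can_route(1, 4))  # True: both ports are in vs-hp1
    print(switch.can_route(1, 6))  # False: port 6 belongs to vs-hp2
```

Validating before assigning keeps the model consistent: a request that touches a port already owned by another partition is rejected as a whole instead of leaving a half-built virtual switch.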
5. The method according to claim 1 or 2, wherein the establishing, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, such that the operating system instructs, according to the mapping relationship, the computing node performing the I/O task to access the I/O device of the first hard partition, comprises: scanning, by the master node, according to the number and identifiers of the computing nodes of the first hard partition and the number and types of the I/O devices of the first hard partition provided in the configuration information of the first hard partition, the I/O device buses of the first hard partition one by one through a root component inside the master node to search for valid I/O devices, and, after I/O devices are found, allocating addresses and memory to the root component in the master node and to the found I/O devices; sending the addresses allocated to the I/O devices to the root components of the slave nodes, thereby establishing the mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition; forming a system resource information table that includes the mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, processor information, memory information, and information on the PCI-E links available between the computing nodes of the first hard partition and the PCI-E switch; after receiving the I/O task, allocating, by the operating system, hardware resources to the I/O task according to the system resource information table, the hardware resources comprising the processor that performs the I/O task and the I/O device and memory information that the I/O task needs to access; and instructing, by the operating system according to the mapping relationship in the system resource information table, the processor performing the I/O task to select the shortest link among the available PCI-E links to access the I/O device that needs to be accessed.
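The last step of claim 5 has the processor choose the shortest of the available PCI-E links. The Python sketch below assumes that "shortest" can be measured as the interconnect hop count between the node running the I/O task and the node that owns a link (0 when the link belongs to the task's own node); the patent does not specify a metric. `PcieLink`, `pick_shortest_link` and the distance table are hypothetical stand-ins for entries of the system resource information table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PcieLink:
    """Hypothetical entry: one available PCI-E link between a node and the switch."""
    link_id: str
    node: str  # computing node whose root component terminates this link


def pick_shortest_link(task_node: str,
                       device: str,
                       links: list[PcieLink],
                       mapping: dict[str, set[str]],
                       node_distance: dict[tuple[str, str], int]) -> Optional[PcieLink]:
    """Choose the link the processor on `task_node` should use to reach `device`.

    mapping: computing node -> I/O devices reachable through that node's link
    node_distance: (from_node, to_node) -> assumed interconnect hop count
    """
    # Only links whose owning node is mapped to the target device are candidates.
    candidates = [link for link in links if device in mapping.get(link.node, set())]
    if not candidates:
        return None
    # Smallest hop count wins; the link id breaks ties deterministically.
    return min(candidates,
               key=lambda link: (node_distance[(task_node, link.node)], link.link_id))


if __name__ == "__main__":
    links = [PcieLink("L0", "node0"), PcieLink("L1", "node1")]
    mapping = {"node0": {"nic0"}, "node1": {"nic0"}}
    distance = {("node0", "node0"): 0, ("node0", "node1"): 1,
                ("node1", "node0"): 1, ("node1", "node1"): 0}
    print(pick_shortest_link("node1", "nic0", links, mapping, distance).link_id)  # L1
```

With this metric, a task running on a slave node whose own link reaches the target device uses that local link rather than one owned by another node.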
6. An I/O multipath manager, comprising:
a PCI-E switch configuration module, configured to configure a Peripheral Component Interconnect Express (PCI-E) switch according to received configuration information of a first hard partition, so as to isolate hard partitions other than the first hard partition, such that a computing node of the first hard partition accesses only I/O devices of the first hard partition; and
an I/O multipath configuration module, configured to establish, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, such that an operating system instructs, according to the mapping relationship, a computing node performing an I/O task to access an I/O device of the first hard partition.
7. The I/O multipath manager according to claim 6, further comprising: a hard partition resource analysis module, configured to receive the configuration information of the first hard partition, analyze the number and identifiers of the root components in the computing nodes of the first hard partition specified in the configuration information, analyze the number and types of the I/O devices of the first hard partition, and identify the ports of the PCI-E switch corresponding to the computing nodes of the first hard partition and the ports of the PCI-E switch corresponding to the I/O devices of the first hard partition; and a calling function interface, used by the master node among the computing nodes of the first hard partition to enable the I/O multipath manager.
8. A system for multi-path access to I/O devices, comprising: the I/O multipath manager according to claim 6 or 7, configured to configure a PCI-E switch according to received configuration information of a first hard partition so as to isolate hard partitions other than the first hard partition, such that a computing node of the first hard partition accesses only I/O devices of the first hard partition, and to establish, according to the received configuration information of the first hard partition, a mapping relationship between the computing nodes of the first hard partition and the I/O devices of the first hard partition, such that an operating system instructs, according to the mapping relationship, a computing node performing an I/O task to access an I/O device of the first hard partition, the I/O multipath manager residing in firmware or in the operating system; an aggregation network, configured to connect the computing nodes in the system so that the system controls the computing nodes through a single operating system; at least two hard partitions, each hard partition comprising at least one computing node;
a PCI-E switch, configured to establish connections between the computing nodes and the I/O devices, so that each computing node accesses the I/O devices of the hard partition to which it belongs through the PCI-E link established between itself and the PCI-E switch;
I/O devices, configured to connect the computing nodes to an external network; and a storage device, configured to store firmware, the operating system, and I/O applications.
9. The system according to claim 8, wherein the aggregation network comprises: a non-uniform memory access (NUMA) system aggregation network and a symmetric multi-processing (SMP) system aggregation network.
10. The system according to claim 8, wherein each computing node comprises: two central processing units, configured for the computing node to perform I/O tasks; and at least one root component, configured to connect the computing node to PCI-E.
11. The system according to claim 10, wherein, when the aggregation network is a non-uniform memory access system aggregation network, the computing node further comprises: a node controller, configured to connect the computing node to the non-uniform memory access system aggregation network and to control the computing node.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110415345.X 2011-12-13
CN201110415345.XA CN102497432B (en) 2011-12-13 2011-12-13 Multi-path accessing method for input/output (I/O) equipment, I/O multi-path manager and system

Publications (1)

Publication Number Publication Date
WO2013086861A1 (en) 2013-06-20

Family

Family ID: 46189217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/079307 WO2013086861A1 (en) 2011-12-13 2012-07-28 Method for accessing multi-path input/output (i/o) equipment, i/o multi-path manager and system

Country Status (2)

Country Link
CN (1) CN102497432B (en)
WO (1) WO2013086861A1 (en)

Also Published As

Publication number Publication date
CN102497432B (en) 2014-06-25
CN102497432A (en) 2012-06-13

Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 12856777; country of ref document: EP; kind code of ref document: A1.
NENP Non-entry into the national phase. Ref country code: DE.
122 Ep: PCT application non-entry in European phase. Ref document number: 12856777; country of ref document: EP; kind code of ref document: A1.