WO2012149714A1 - Node controller link switching method, processor system and node - Google Patents

Node controller link switching method, processor system and node

Info

Publication number
WO2012149714A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
link
hba
chip
routing table
Prior art date
Application number
PCT/CN2011/078893
Other languages
English (en)
French (fr)
Inventor
谭海波
王振国
俞柏峰
黄平
赵俊峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP11864888.0A (EP2605451B1, en)
Priority to PCT/CN2011/078893 (WO2012149714A1, zh)
Priority to CN201180001863.5A (CN102449621B, zh)
Publication of WO2012149714A1 (zh)
Priority to US13/712,588 (US9015521B2, en)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2017Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where memory access, memory control or I/O control functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876Network utilisation, e.g. volume of load or congestion level
    • H04L43/0882Utilisation of link capacity

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method, a processor system, and a node for switching an NC link.
  • processor systems with more processing capabilities, such as SMP (Symmetric Multi-Processor).
  • SMP Symmetric Multi-Processor
  • System architecture Cluster system
  • MPP Massive Parallel Processing
  • NUMA Non Uniform Memory Access
  • OS operating system
  • the memory of all CPUs and the whole system can be shared to optimize and improve the performance of the processor system.
  • both the SMP system and the NUMA system adopt a dual NC (Node Controller) chip redundant link scheme.
  • NC Node Controller
  • Embodiments of the present invention provide a method, a processor system, and a node for switching an NC link, so as to reduce the cost of a redundant link while maintaining reliability of the processor system.
  • Embodiments of the present invention provide a method for switching a node controller link.
  • the processor system includes two or more nodes that can communicate with each other, and each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU.
  • the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
  • the method includes: when a fault is detected in an NC chip, switching the services on the NC link carried by that NC chip to the corresponding HBA link.
  • An embodiment of the present invention further provides a processor system, where the processor system includes two or more nodes that can communicate with each other;
  • Each node includes a node controller NC chip, a host bus adapter HBA device, and at least one CPU, the NC chip is connected to each CPU in the node, and the HBA device is connected to each CPU in the node;
  • the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
  • the node is configured to: when detecting that the own NC chip is faulty, switch the service on the NC link carried by the NC chip to the corresponding HBA link.
  • An embodiment of the present invention further provides a node, where the node includes:
  • a node controller (NC) chip, a host bus adapter (HBA) device, a controller, and at least one CPU, where the NC chip is connected to each CPU in the node, and the HBA device is connected to each CPU in the node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
  • the controller is configured to: when it detects that the node's own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
  • the method for switching an NC link, the processor system, and the node in the embodiments of the present invention use HBA devices to provide the redundant links. Because an HBA device has low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system.
  • because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system; because the HBA device does not occupy system bus resources, the processor system is easier to expand; and when the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
  • FIG. 1 is a schematic diagram of a first manner of connecting the HBA device to the CPUs according to the present invention.
  • FIG. 2 is a schematic diagram of a second manner of connecting the HBA device to the CPUs according to the present invention.
  • FIG. 3 is a schematic flowchart of an embodiment of the method for switching an NC link according to the present invention.
  • FIG. 4 is a schematic diagram of the NC switching network and the HBA switching network of the present invention.
  • FIG. 5 is a schematic diagram of the NC links and the HBA links of the present invention.
  • FIG. 6 is a schematic diagram of one manner of connecting the NC chip, the HBA device, and the switching device according to the present invention.
  • FIG. 7 is a schematic diagram of another manner of connecting the NC chip, the HBA device, and the switching device according to the present invention.
  • FIG. 8 is a schematic diagram of a specific example of the method for switching an NC link according to the present invention.
  • FIG. 9 is a schematic structural diagram of the processor system of the present invention.
  • FIG. 10 is a schematic block diagram of a first embodiment of the node of the present invention.
  • FIG. 11 is a schematic block diagram of a second embodiment of the node of the present invention.
  • this embodiment provides a method for switching an NC link, where the processor system includes two or more nodes that can communicate with each other, and each node includes an NC chip, a host bus adapter (HBA) device, and at least one CPU.
  • HBA Host Bus Adapter
  • the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
  • the HBA device is plugged into the north bridge chip, and the north bridge chip is connected to each CPU through the front-side bus. See FIG. 1, which takes a node with two CPUs as an example.
  • HBA devices generally use a PCIE (Peripheral Component Interconnect Express) interface, so a PCIE slot can be extended on the north bridge chip to connect the HBA device.
  • PCIE Peripheral Component Interconnect Express
  • Some CPU chips have a PCIE controller integrated inside, so you can directly mount the HBA device on the PCIE slot that the CPU leads out.
  • the method can include the following steps (see Figure 3):
  • the detecting operation may be performed by a switching device (such as a switch, a router, etc.) in the processor system, or a node where the failed NC chip is located; and the node that initiates the switching action may be a node where the failed NC chip is located.
  • a switching device such as a switch, a router, etc.
  • providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system.
  • motherboards usually reserve many PCIE slots, and many HBA devices support hot swapping, which makes connecting and replacing HBA devices convenient and improves the RAS (Reliability, Availability, Serviceability) characteristics of the processor system.
  • RAS Reliability, Availability, Serviceability
  • the HBA device does not occupy system bus resources and does not limit the expansion of the processor system.
  • the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
  • a first routing table and a second routing table are preset in each node. The first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located. The first routing table and the second routing table are associated through the node addresses.
  • the node resources seen through the first routing table and through the second routing table are consistent; the node resources may include CPUs, memory, I/O resources (for example, PCIE devices), and the like.
  • if the NC chip has no switching function, the NC link and the HBA link must each be established through a switching device.
  • in that case, in addition to the node address, the first routing table includes the port of the switching device in the NC switching network that corresponds to the node.
  • likewise, the second routing table includes the port of the switching device in the HBA switching network that corresponds to the node.
  • the NC switching network is the switching network among NC chips, and the HBA switching network is the switching network among HBA devices (see FIG. 4).
  • the NC switching network and the HBA switching network may use two separate switching devices or share a single switching device (see FIG. 5, in which NC links are drawn as solid lines and HBA links as dashed lines).
  • the switching device also needs to store the first routing table and the second routing table, and the operating system may control the switching device to keep them synchronized with the first routing table and the second routing table on each node.
  • under normal conditions the NC link and the HBA link are both always open, and the two ends of each link (for example, two nodes, or a node and a switching device) continuously send handshake signals over the link to check that it is open and usable.
  • if the NC chip in a node is working normally and its corresponding HBA link is not in use, the node may allocate a single thread to the HBA link for handshake-signal training; when the NC chip fails, the node allocates additional threads to the HBA link so that it can take over the services switched from the NC link, which achieves a smooth switchover and maintains the reliability of the processor system.
  • switching the services on the NC link to the corresponding HBA link in S101 may include: the node where the faulty NC chip is located uses the first routing table to look up its own node address, uses the second routing table to look up the HBA device corresponding to that address, and switches the services on the NC link carried by the faulty NC chip to the HBA link carried by that HBA device;
  • the method may further include S102: when the operating system detects that the bandwidth occupancy of the NC link on a node exceeds a threshold, it notifies the node to switch the services that match a preset list from the NC link to the corresponding HBA link.
  • the threshold can be specified in advance by the user and adjusted as needed; it determines whether the traffic exceeds a given load.
  • the preset list enumerates the service types that are suitable for switching from the NC link to the HBA link; these may be services with low real-time requirements, for example services of PCIE devices, external storage devices, or I/O storage devices;
  • the preset list can be cached in a memory of the node. In this way, when the traffic on the NC link reaches a certain load, the service load is balanced and the utilization of the HBA link is improved.
  • for an NC chip with a switching function, the NC link is the link formed directly between NC chips; see FIG. 6.
  • for an NC chip without a switching function, the NC link is the link formed between the NC chip and the switching device; see FIG. 7.
  • the HBA link is always the link formed between the HBA device and the switching device.
  • see FIG. 8: assume two nodes in the processor system, node 1 and node 2, whose NC chips and HBA devices form NC links and HBA links through the same switching device.
  • suppose CPU1 in node 1 needs to access the memory of CPU4 in node 2.
  • under normal conditions, the path by which CPU1 accesses the memory of CPU4 is: CPU1 - NC chip 1 - switching device - NC chip 2 - CPU4 - memory 4.
  • when NC chip 1 fails, so that the NC link between NC chip 1 and the switching device fails, node 1 switches the services on the NC link carried by NC chip 1 to the HBA link carried by HBA device 1.
  • the path by which CPU1 accesses the memory of CPU4 then becomes: CPU1 - HBA device 1 - switching device - NC chip 2 - CPU4 - memory 4.
  • the above method for switching an NC link can be applied to SMP system architectures, NUMA system architectures, clusters, cloud computing, and the like; if the processor system of this embodiment is regarded as a single node as a whole, the method can also be used in an MPP system architecture.
  • the method for switching an NC link in this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system; because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system.
  • This embodiment provides a processor system.
  • the processor system includes two or more nodes that can communicate with each other.
  • Each node includes an NC chip, an HBA device, and at least one CPU.
  • the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
  • the node is configured to: when it detects that its own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
  • the node can also be used to store the preset first routing table and second routing table.
  • the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; the two tables are associated through the node addresses.
  • the node can also be used to: after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match the preset list to the corresponding HBA link.
  • the processor system of this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
  • the node includes:
  • an NC chip 10, a host bus adapter (HBA) device 20, a controller 30, and at least one CPU 40 (FIG. 10 shows three CPUs as an example); the NC chip 10 is connected to each CPU 40 in its node, and the HBA device 20 is connected to each CPU 40 in its node; the NC link carried by the NC chip 10 corresponds to the HBA link carried by the HBA device 20.
  • the controller 30 is configured to: when it detects that the node's own NC chip is faulty, switch the services on the NC link carried by the NC chip 10 to the corresponding HBA link.
  • the node further includes a storage device 50, configured to store a preset first routing table and second routing table, where the first routing table is the routing table of the NC chips in the nodes, and each NC chip corresponds to the address of the node where it is located;
  • the second routing table is the routing table of the HBA devices in the nodes, and each HBA device corresponds to the address of the node where it is located.
  • the first routing table and the second routing table are associated through the node addresses.
  • the controller 30 is also used to: after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match the preset list to the corresponding HBA link.
  • the preset list may be located in the memory of a CPU in the node; or the controller may be separately configured with a memory, and the preset list is located in the memory of the controller; the preset list may also be stored in the storage device 50.
  • the node of this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
  • because Embodiments 2 and 3 have much in common with Embodiment 1, they are described only briefly;
  • for related details, refer to Embodiment 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

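Embodiments of the present invention disclose a method for switching an NC link, a processor system, and a node. The processor system includes two or more nodes that can communicate with each other, and each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU. The NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device. The method includes: when a fault is detected in an NC chip, switching the services on the NC link carried by that NC chip to the corresponding HBA link. Using HBA devices to provide the redundant links effectively reduces the cost of redundant links while maintaining the reliability of the processor system.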

Description

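Method for switching a node controller link, processor system, and node
Technical Field
Embodiments of the present invention relate to the field of communications technologies, and in particular, to a method for switching an NC link, a processor system, and a node.
Background
With the advance of technology, ever higher demands are placed on processor performance. In response, processor systems with greater processing capability have been developed, such as SMP (Symmetric Multi-Processor) systems, Cluster systems, MPP (Massive Parallel Processing) systems, and NUMA (Non Uniform Memory Access) systems. These architectures improve system performance mainly by sharing memory and the I/O bus. For example, in a NUMA system architecture the nodes are connected and exchange information through interconnect modules, and under a single OS (operating system) all CPUs and the memory of the whole system can be shared, which optimizes and improves the performance of the processor system.
At present, both SMP systems and NUMA systems adopt a dual-NC (Node Controller) chip redundant link scheme. When one NC link fails, all services on that NC link are switched to the other, redundant NC link, so that services are not interrupted and the performance of the processor system is not affected, which improves the availability of the whole processor system.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problems: because NC chips are expensive and have a long development cycle, deploying redundant NC links costs too much. Furthermore, the redundant NC links have very low utilization and also occupy processor system bus resources (for example, QPI (QuickPath Interconnect) interfaces or HT (HyperTransport) buses), which hinders expansion of the processor system when system bus resources are scarce.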
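Summary of the Invention
Embodiments of the present invention provide a method for switching an NC link, a processor system, and a node, so as to reduce the cost of redundant links while maintaining the reliability of the processor system.
An embodiment of the present invention provides a method for switching a node controller link. The processor system includes two or more nodes that can communicate with each other; each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device. The method includes: when a fault is detected in an NC chip, switching the services on the NC link carried by that NC chip to the corresponding HBA link.
An embodiment of the present invention further provides a processor system, which includes two or more nodes that can communicate with each other;
each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
the node is configured to: when it detects that its own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
An embodiment of the present invention further provides a node, which includes:
a node controller (NC) chip, a host bus adapter (HBA) device, a controller, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
the controller is configured to: when it detects that the node's own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
The method for switching an NC link, the processor system, and the node in the embodiments of the present invention use HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
Brief Description of the Drawings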
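FIG. 1 is a schematic diagram of a first manner of connecting the HBA device to the CPUs according to the present invention;
FIG. 2 is a schematic diagram of a second manner of connecting the HBA device to the CPUs according to the present invention;
FIG. 3 is a schematic flowchart of an embodiment of the method for switching an NC link according to the present invention;
FIG. 4 is a schematic diagram of the NC switching network and the HBA switching network of the present invention;
FIG. 5 is a schematic diagram of the NC links and the HBA links of the present invention;
FIG. 6 is a schematic diagram of one manner of connecting the NC chip, the HBA device, and the switching device according to the present invention;
FIG. 7 is a schematic diagram of another manner of connecting the NC chip, the HBA device, and the switching device according to the present invention;
FIG. 8 is a schematic diagram of a specific example of the method for switching an NC link according to the present invention;
FIG. 9 is a schematic structural diagram of the processor system of the present invention;
FIG. 10 is a schematic block diagram of a first embodiment of the node of the present invention;
FIG. 11 is a schematic block diagram of a second embodiment of the node of the present invention.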
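Detailed Description
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific implementations.
Embodiment 1
This embodiment provides a method for switching an NC link. The processor system involved includes two or more nodes that can communicate with each other; each node includes an NC chip, a host bus adapter (HBA) device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
The HBA device can be connected to each CPU in its node in either of two ways:
(1) The HBA device is plugged into a north bridge chip, and the north bridge chip is connected to each CPU through the front-side bus. See FIG. 1, which takes a node with two CPUs as an example.
HBA devices generally use a PCIE (Peripheral Component Interconnect Express) interface, so a PCIE slot can be extended on the north bridge chip to connect the HBA device.
(2) See FIG. 2: the HBA device is attached directly to each CPU.
Some CPU chips integrate a PCIE controller, so the HBA device can be attached directly to a PCIE slot led out from the CPU.
The method may include the following steps (see FIG. 3):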
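S101: When a fault is detected in an NC chip, switch the services on the NC link carried by that NC chip to the corresponding HBA link.
The detection may be performed by a switching device in the processor system (for example, a switch or a router) or by the node where the faulty NC chip is located; the switching may be initiated by the node where the faulty NC chip is located.
Because HBA devices have low hardware cost and a simple design, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system.
Motherboards usually reserve many PCIE slots, and many HBA devices support hot swapping, which makes connecting and replacing HBA devices convenient and improves the RAS (Reliability, Availability, Serviceability) characteristics of the processor system. In addition, HBA devices do not occupy system bus resources and so do not limit expansion of the processor system.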
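The correspondence between the NC link carried by the NC chip and the HBA link carried by the HBA device may be established by presetting a first routing table and a second routing table in each node. The first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located. The first routing table and the second routing table are associated through the node addresses.
The node resources seen through the first routing table and through the second routing table are consistent; node resources may include CPUs, memory, I/O resources (for example, PCIE devices), and the like.
If the NC chip has no switching function, the NC link and the HBA link must each be established through a switching device. In that case, in addition to the node address, the first routing table also includes the port of the switching device in the NC switching network that corresponds to the node, and the second routing table also includes the port of the switching device in the HBA switching network that corresponds to the node.
Regardless of whether the NC chips in the nodes have a switching function, both an NC switching network and an HBA switching network exist. The NC switching network is the switching network among NC chips, and the HBA switching network is the switching network among HBA devices (see FIG. 4). The two networks may use two separate switching devices or share a single switching device (see FIG. 5, in which NC links are drawn as solid lines and HBA links as dashed lines). The switching device also needs to store the first routing table and the second routing table; the operating system may control the switching device to keep them synchronized with the first and second routing tables on each node.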
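Under normal conditions the NC link and the HBA link are both always open, and the two ends of each link (for example, two nodes, or a node and a switching device) continuously send handshake signals over the link to check that it is open and usable. If the NC chip in a node is working normally and its corresponding HBA link is not in use, the node may allocate a single thread to the HBA link for handshake-signal training. When the NC chip fails, the node allocates additional threads to that HBA link so that it can take over the services switched from the NC link, which achieves a smooth switchover and maintains the reliability of the processor system.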
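The handshake monitoring and thread allocation described above can be sketched as follows. This is only an illustrative model: the class name `LinkMonitor`, the missed-handshake limit, and the worker counts are assumptions, since the source does not specify how many missed handshakes mark a link as failed or how many threads are added on failover.

```python
class LinkMonitor:
    """Tracks handshake signals on one link (hypothetical helper)."""

    def __init__(self, max_missed: int = 3):
        # max_missed is an assumed tolerance; the source gives no number.
        self.max_missed = max_missed
        self.missed = 0

    def on_handshake(self) -> None:
        # A handshake arrived, so the link is considered open again.
        self.missed = 0

    def on_timeout(self) -> None:
        # A handshake interval passed without a signal from the peer.
        self.missed += 1

    @property
    def healthy(self) -> bool:
        return self.missed < self.max_missed


def hba_worker_count(nc_healthy: bool,
                     standby_workers: int = 1,
                     active_workers: int = 4) -> int:
    """Threads the node gives the HBA link: one for handshake training
    while the NC chip is healthy, more once it must carry NC traffic.
    Both counts are assumed values for illustration."""
    return standby_workers if nc_healthy else active_workers


nc_monitor = LinkMonitor(max_missed=3)
nc_monitor.on_timeout()
nc_monitor.on_timeout()
workers_before = hba_worker_count(nc_monitor.healthy)  # NC link still usable
nc_monitor.on_timeout()
workers_after = hba_worker_count(nc_monitor.healthy)   # NC link declared failed
```

Keeping one thread on the idle HBA link means the standby path is already trained when a failure occurs, so only the worker count changes at switchover.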
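Switching the services on the NC link to the corresponding HBA link in S101 may include: the node where the faulty NC chip is located uses the first routing table to look up its own node address; uses the second routing table to look up the HBA device corresponding to that address;
and switches the services on the NC link carried by the faulty NC chip to the HBA link carried by that HBA device.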
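The three lookup steps of S101 can be sketched as below. The table layout, the `RouteEntry` fields, and the device names are hypothetical; the source only states that both routing tables are keyed to node addresses and may also hold switching-device ports.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RouteEntry:
    """One routing-table row (all field names are illustrative)."""
    device: str                        # e.g. "NC1" or "HBA1"
    node_addr: int                     # address of the node hosting the device
    switch_port: Optional[int] = None  # used only when links run through a switching device


def find_backup_hba(failed_nc: str,
                    first_table: Dict[str, RouteEntry],
                    second_table: Dict[str, RouteEntry]) -> RouteEntry:
    # Step 1: first routing table -> address of the node with the faulty NC chip.
    node_addr = first_table[failed_nc].node_addr
    # Step 2: second routing table -> HBA device registered at the same address.
    for entry in second_table.values():
        if entry.node_addr == node_addr:
            return entry
    raise LookupError(f"no HBA device for node address {node_addr:#x}")


def switch_services(failed_nc: str,
                    first_table: Dict[str, RouteEntry],
                    second_table: Dict[str, RouteEntry],
                    active_link: Dict[int, str]) -> str:
    # Step 3: redirect the node's services from the NC link to the HBA link.
    hba = find_backup_hba(failed_nc, first_table, second_table)
    active_link[hba.node_addr] = hba.device
    return hba.device


first_table = {"NC1": RouteEntry("NC1", 0x1, 3), "NC2": RouteEntry("NC2", 0x2, 4)}
second_table = {"HBA1": RouteEntry("HBA1", 0x1, 7), "HBA2": RouteEntry("HBA2", 0x2, 8)}
active_link = {0x1: "NC1", 0x2: "NC2"}
switched_to = switch_services("NC1", first_table, second_table, active_link)
```

Because the two tables share only the node address, the NC and HBA switching networks can use different ports or different switching devices without affecting the lookup.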
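Further, the method may also include S102: when the operating system detects that the bandwidth occupancy of the NC link on a node exceeds a threshold, it notifies the node to switch the services that match a preset list from the NC link to the corresponding HBA link.
The threshold may be specified in advance by the user and adjusted as needed; it determines whether the traffic exceeds a given load. The preset list enumerates the service types suitable for switching from the NC link to the HBA link. These may be services with low real-time requirements, for example services of PCIE devices, external storage devices, or I/O storage devices. The preset list may be cached in a memory of the node. In this way, when the traffic on the NC link reaches a certain load, the service load is balanced and the utilization of the HBA link is improved.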
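A minimal sketch of the S102 load-balancing step follows, assuming each service reports the fraction of NC link bandwidth it uses. The `Service` type, the kind labels, and the 0.8 threshold are illustrative assumptions; the source leaves the threshold to the user and does not define how occupancy is measured.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Service:
    name: str
    kind: str         # service type, matched against the preset list
    bandwidth: float  # assumed unit: fraction of NC link capacity


def rebalance(nc_services: Iterable[Service],
              preset_list: Set[str],
              threshold: float) -> Tuple[List[Service], List[Service]]:
    """Return (services staying on the NC link, services moved to the HBA link)."""
    services = list(nc_services)
    occupancy = sum(s.bandwidth for s in services)
    if occupancy <= threshold:
        # Below the threshold nothing is moved.
        return services, []
    stay = [s for s in services if s.kind not in preset_list]
    moved = [s for s in services if s.kind in preset_list]
    return stay, moved


# Low real-time service types named in the text; the labels are hypothetical.
PRESET = {"pcie", "external_storage", "io_storage"}

load = [
    Service("remote-memory-access", "memory", 0.5),
    Service("nic-dma", "pcie", 0.25),
    Service("disk-array", "external_storage", 0.25),
]
stay, moved = rebalance(load, PRESET, threshold=0.8)
```

Latency-sensitive traffic such as remote memory access stays on the NC link, while the listed low real-time services move to the otherwise idle HBA link.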
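For an NC chip with a switching function, the NC link is the link formed directly between NC chips; see FIG. 6. For an NC chip without a switching function, the NC link is the link formed between the NC chip and the switching device; see FIG. 7. The HBA link is always the link formed between the HBA device and the switching device.
The switching method is illustrated below with a specific example. See FIG. 8. Assume two nodes in the processor system, node 1 and node 2, whose NC chips and HBA devices form NC links and HBA links through the same switching device. If CPU1 in node 1 needs to access the memory of CPU4 in node 2, the normal path is: CPU1 - NC chip 1 - switching device - NC chip 2 - CPU4 - memory 4. When NC chip 1 fails, so that the NC link between NC chip 1 and the switching device fails, node 1 switches the services on the NC link carried by NC chip 1 to the HBA link carried by HBA device 1, and the path by which CPU1 accesses the memory of CPU4 becomes: CPU1 - HBA device 1 - switching device - NC chip 2 - CPU4 - memory 4.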
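The FIG. 8 example can be traced with a few lines of code. The function name `access_path` is hypothetical and the hop labels simply follow the text; only the first hop out of node 1 changes when NC chip 1 fails.

```python
from typing import List


def access_path(nc_failed: bool) -> List[str]:
    """Hops taken when CPU1 (node 1) reads the memory of CPU4 (node 2)."""
    # Node 1 leaves through its NC chip normally, or its HBA device on failover.
    first_hop = "HBA device 1" if nc_failed else "NC chip 1"
    return ["CPU1", first_hop, "switching device", "NC chip 2", "CPU4", "memory 4"]


normal_path = access_path(nc_failed=False)
failover_path = access_path(nc_failed=True)
print(" - ".join(normal_path))
print(" - ".join(failover_path))
```

Node 2 is unaffected in this example: the shared switching device still delivers the request to NC chip 2, so only the faulty side changes its link.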
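The above method for switching an NC link can be applied to SMP system architectures, NUMA system architectures, clusters, cloud computing, and the like; if the processor system of this embodiment is regarded as a single node as a whole, the method can also be used in an MPP system architecture.
The method for switching an NC link in this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
Embodiment 2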
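This embodiment provides a processor system. See FIG. 9. The processor system includes two or more nodes that can communicate with each other.
Each node includes an NC chip, an HBA device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device.
The node is configured to: when it detects that its own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
The node may also be configured to: store a preset first routing table and second routing table, where the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; and the first routing table and the second routing table are associated through the node addresses.
The node may also be configured to:
after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match the preset list to the corresponding HBA link.
The processor system of this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
Embodiment 3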
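This embodiment provides a node. See FIG. 10. The node includes:
an NC chip 10, a host bus adapter (HBA) device 20, a controller 30, and at least one CPU 40 (FIG. 10 shows three CPUs as an example); the NC chip 10 is connected to each CPU 40 in its node, and the HBA device 20 is connected to each CPU 40 in its node; the NC link carried by the NC chip 10 corresponds to the HBA link carried by the HBA device 20.
The controller 30 is configured to: when it detects that the node's own NC chip is faulty, switch the services on the NC link carried by the NC chip 10 to the corresponding HBA link.
See FIG. 11. The node further includes a storage device 50, configured to store a preset first routing table and second routing table, where the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; and the first routing table and the second routing table are associated through the node addresses. The controller 30 is further configured to:
after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match the preset list to the corresponding HBA link. The preset list may be located in the memory of a CPU in the node; alternatively, the controller may be configured with its own memory that holds the preset list; the preset list may also be stored in the storage device 50.
The node of this embodiment uses HBA devices to provide the redundant links. Because HBA devices have low hardware cost, a simple design, and a short development cycle, providing redundant HBA links with HBA devices effectively reduces the cost of redundant links while maintaining the reliability of the processor system. Because HBA devices connected to the CPU or the north bridge chip support hot swapping, connecting and replacing HBA devices is convenient, which improves the RAS characteristics of the processor system. Because HBA devices do not occupy system bus resources, the processor system is easier to expand. When the services on the NC link reach a certain load, switching some of them to the HBA link balances the service load and improves the utilization of the HBA link.
Because Embodiments 2 and 3 have much in common with Embodiment 1, they are described only briefly; for related details, refer to Embodiment 1.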
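A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.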

Claims

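1. A method for switching a node controller link, characterized in that a processor system includes two or more nodes that can communicate with each other; each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device; and the method includes: when a fault is detected in an NC chip, switching the services on the NC link carried by that NC chip to the corresponding HBA link.
2. The method according to claim 1, characterized in that the correspondence between the NC link carried by the NC chip and the HBA link carried by the HBA device includes: a first routing table and a second routing table are preset in each node; the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; and the first routing table and the second routing table are associated through the node addresses.
3. The method according to claim 2, characterized in that switching the services on the NC link to the corresponding HBA link specifically includes:
using the first routing table to look up the address of the node where the faulty NC chip is located;
using the second routing table to look up the HBA device corresponding to that address; and
switching the services on the NC link carried by the faulty NC chip to the HBA link carried by the corresponding HBA device.
4. The method according to claim 1, characterized in that the method further includes: when the operating system detects that the bandwidth occupancy of the NC link on a node exceeds a threshold, notifying the node to switch the services that match a preset list from the NC link to the corresponding HBA link.
5. The method according to any one of claims 1 to 4, characterized in that the HBA device is connected to each CPU in its node as follows: the HBA device is plugged into a north bridge chip, and the north bridge chip is connected to each CPU through the front-side bus.
6. The method according to any one of claims 1 to 4, characterized in that the HBA device is connected to each CPU in its node as follows: the HBA device is attached directly to each CPU.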
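7. A processor system, characterized in that the processor system includes two or more nodes that can communicate with each other;
each node includes a node controller (NC) chip, a host bus adapter (HBA) device, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
and the node, when it detects that its own NC chip is faulty, switches the services on the NC link carried by the NC chip to the corresponding HBA link.
8. The processor system according to claim 7, characterized in that the node is further configured to: store a preset first routing table and second routing table, where the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; and the first routing table and the second routing table are associated through the node addresses.
9. The processor system according to claim 7, characterized in that the node is further configured to: after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match a preset list to the corresponding HBA link.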
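10. A node, characterized in that the node includes:
a node controller (NC) chip, a host bus adapter (HBA) device, a controller, and at least one CPU; the NC chip is connected to each CPU in its node, and the HBA device is connected to each CPU in its node; the NC link carried by the NC chip corresponds to the HBA link carried by the HBA device;
the controller is configured to: when it detects that the node's own NC chip is faulty, switch the services on the NC link carried by the NC chip to the corresponding HBA link.
11. The node according to claim 10, characterized in that the node further includes a storage device, configured to: store a preset first routing table and second routing table, where the first routing table is the routing table of the NC chips in the nodes, in which each NC chip corresponds to the address of the node where it is located; the second routing table is the routing table of the HBA devices in the nodes, in which each HBA device corresponds to the address of the node where it is located; and the first routing table and the second routing table are associated through the node addresses.
12. The node according to claim 11, characterized in that the controller is further configured to: after receiving a switching notification from the operating system for the NC link of this node, switch the services on the NC link that match a preset list to the corresponding HBA link.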
PCT/CN2011/078893 2011-08-25 2011-08-25 Node controller link switching method, processor system and node WO2012149714A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP11864888.0A EP2605451B1 (en) 2011-08-25 2011-08-25 Node controller link switching method, processor system and node
PCT/CN2011/078893 WO2012149714A1 (zh) 2011-08-25 2011-08-25 Node controller link switching method, processor system and node
CN201180001863.5A CN102449621B (zh) 2011-08-25 2011-08-25 Node controller link switching method, processor system and node
US13/712,588 US9015521B2 (en) 2011-08-25 2012-12-12 Method for switching a node controller link, processor system, and node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/078893 WO2012149714A1 (zh) 2011-08-25 2011-08-25 Node controller link switching method, processor system and node

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/712,588 Continuation US9015521B2 (en) 2011-08-25 2012-12-12 Method for switching a node controller link, processor system, and node

Publications (1)

Publication Number Publication Date
WO2012149714A1 2012-11-08

Family

ID=46010198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/078893 WO2012149714A1 (zh) 2011-08-25 2011-08-25 Node controller link switching method, processor system and node

Country Status (4)

Country Link
US (1) US9015521B2 (zh)
EP (1) EP2605451B1 (zh)
CN (1) CN102449621B (zh)
WO (1) WO2012149714A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880583B (zh) * 2012-08-01 2015-03-11 浪潮(北京)电子信息产业有限公司 Dynamic link configuration apparatus and method for a multi-way server
US9710341B2 (en) * 2014-12-16 2017-07-18 Dell Products L.P. Fault tolerant link width maximization in a data bus
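CN106708551B (zh) * 2015-11-17 2020-01-17 华为技术有限公司 Configuration method and system for hot-adding a central processing unit (CPU)
KR102092660B1 (ko) * 2015-12-29 2020-03-24 후아웨이 테크놀러지 컴퍼니 리미티드 CPU and multi-CPU system management method
CN105700975B (zh) 2016-01-08 2019-05-24 华为技术有限公司 Method and apparatus for hot-removing and hot-adding a central processing unit (CPU)
CN107291653B (zh) * 2016-03-31 2020-06-16 华为技术有限公司 Multiprocessor system and method for configuring a multiprocessor system
CN106776459B (zh) * 2016-12-14 2020-06-26 华为技术有限公司 Signal processing method, node controller chip and multiprocessor system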
EP3605350A4 (en) * 2017-05-04 2020-04-29 Huawei Technologies Co., Ltd. INTERCONNECTION SYSTEM, AND INTERCONNECTION CONTROL METHOD AND APPARATUS
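CN108632142B (zh) * 2018-03-28 2021-02-12 华为技术有限公司 Routing management method and apparatus for a node controller
CN109189699B (zh) * 2018-09-21 2022-03-22 郑州云海信息技术有限公司 Multi-way server communication method and system, intermediate controller and readable storage medium
CN112711503B (zh) * 2020-12-28 2024-03-26 北京同有飞骥科技股份有限公司 Storage test method based on the Phytium 2000+ CPU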

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1805411A (zh) * 2006-01-23 2006-07-19 杭州华为三康技术有限公司 Method for processing label binding
US20080215818A1 (en) * 2006-06-19 2008-09-04 Kornegay Marcus L Structure for silent invalid state transition handling in an smp environment
CN101741831A (zh) * 2008-11-10 2010-06-16 国际商业机器公司 Method, system and apparatus for dynamic physical and virtual multipath input/output
CN102141975A (zh) * 2011-04-01 2011-08-03 华为技术有限公司 Computer system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030187987A1 (en) * 2002-03-29 2003-10-02 Messick Randall E. Storage area network with multiple pathways for command paths
US9264384B1 (en) * 2004-07-22 2016-02-16 Oracle International Corporation Resource virtualization mechanism including virtual host bus adapters
US7430629B2 (en) * 2005-05-12 2008-09-30 International Business Machines Corporation Internet SCSI communication via UNDI services
US20060274787A1 (en) * 2005-06-07 2006-12-07 Fong Pong Adaptive cache design for MPT/MTT tables and TCP context
US7821973B2 (en) * 2006-10-24 2010-10-26 Hewlett-Packard Development Company, L.P. Sharing of host bus adapter context
US7778157B1 (en) * 2007-03-30 2010-08-17 Symantec Operating Corporation Port identifier management for path failover in cluster environments
CN100553189C (zh) 2007-06-15 2009-10-21 南京恩瑞特实业有限公司 Method for implementing multi-link redundancy based on buffer management
US8107360B2 (en) * 2009-03-23 2012-01-31 International Business Machines Corporation Dynamic addition of redundant network in distributed system communications
JP5550089B2 (ja) * 2009-03-30 2014-07-16 エヌイーシーコンピュータテクノ株式会社 Multiprocessor system, node controller, and failure recovery method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2605451A4 *

Also Published As

Publication number Publication date
EP2605451A1 (en) 2013-06-19
US9015521B2 (en) 2015-04-21
EP2605451A4 (en) 2013-08-14
US20130103975A1 (en) 2013-04-25
CN102449621A (zh) 2012-05-09
CN102449621B (zh) 2013-11-06
EP2605451B1 (en) 2015-07-01

Similar Documents

Publication Publication Date Title
WO2012149714A1 (zh) Node controller link switching method, processor system and node
US11507528B2 (en) Pooled memory address translation
US11113196B2 (en) Shared buffered memory routing
US9037898B2 (en) Communication channel failover in a high performance computing (HPC) network
US9208110B2 (en) Raw memory transaction support
US20160283303A1 (en) Reliability, availability, and serviceability in multi-node systems with disaggregated memory
US7783822B2 (en) Systems and methods for improving performance of a routable fabric
US10915370B2 (en) Inter-host communication without data copy in disaggregated systems
JP2017537404A (ja) Memory access method, switch, and multiprocessor system
KR20120037785A (ko) System on chip maintaining load balance and load balancing method thereof
US11714755B2 (en) System and method for scalable hardware-coherent memory nodes
US10776309B2 (en) Method and apparatus to build a monolithic mesh interconnect with structurally heterogenous tiles
GB2366029A (en) Arbitration method to allow multiple translation lookaside buffers to access a common hardware page walker
CN114930312A (zh) Communication method and related apparatus
US12099458B2 (en) Pooled memory address translation
EP3842913A1 (en) Dsp processor and system, and access method for external storage space
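CN114928535A (zh) Route advertisement method and apparatus
CN118535281A (zh) Method, apparatus and electronic device for implementing MSI/MSIX interrupts in jailhouse
CN114745325A (zh) MAC-layer data exchange method and system based on a PCIe bus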

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001863.5

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864888

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011864888

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE